Expectation Data
================

Introduction
------------

For use in continuous integration systems, and other scenarios where
regression tracking is required, wptrunner supports storing and
loading the expected result of each test in a test run. Typically
these expected results will initially be generated by running the
test suite in a baseline build. They may then be edited by humans as
new features are added to the product that change the expected
results. The expected results may also vary for a single product
depending on the platform on which it is run. Therefore, the raw
structured log data is not a suitable format for storing these
results. Instead, something is required that is:

* Human readable

* Human editable

* Machine readable / writable

* Capable of storing test id / result pairs

* Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having
a separate expectation file for each platform, or having a way to
express conditional values within a single file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to the requirement:

* Capable of storing result values that are conditional on the platform.

There are few extant formats that meet these requirements, so
wptrunner uses a bespoke ``expectation manifest`` format, which is
closely based on the standard ``ini`` format.

Directory Layout
----------------

Expectation manifest files must be stored under the ``metadata``
directory passed to the test runner. The directory layout follows that
of web-platform-tests, with each test path having a corresponding
manifest file. Tests that differ only by query string, or reftests
with the same test path but different ref paths, share the same
manifest file. The file name is taken from the last /-separated part
of the path, suffixed with ``.ini``.

As an optimisation, files which produce only default results
(i.e. ``PASS`` or ``OK``) don't require a corresponding manifest file.

For example, a test with the URL::

  /spec/section/file.html?query=param

would have an expectation file::

  metadata/spec/section/file.html.ini

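
The mapping from a test URL to its manifest path can be sketched in a
few lines of Python. This is only an illustration of the layout rule
described above, not wptrunner's own code; the function name
``expectation_path`` and its ``metadata_root`` parameter are
hypothetical.

.. code-block:: python

  import posixpath
  from urllib.parse import urlsplit


  def expectation_path(test_url, metadata_root="metadata"):
      """Return the manifest path for a test URL (illustrative helper)."""
      # Drop the query string and fragment; only the path selects the file.
      path = urlsplit(test_url).path
      # Mirror the test's directory layout under the metadata root and
      # append ".ini" to the last path component.
      return posixpath.join(metadata_root, path.lstrip("/") + ".ini")


  print(expectation_path("/spec/section/file.html?query=param"))
  # prints: metadata/spec/section/file.html.ini

Because the query string is discarded, every variant of
``file.html`` resolves to the same manifest file.
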
.. _wptupdate-label:

Generating Expectation Files
----------------------------

wptrunner provides the tool ``wptupdate`` to generate expectation
files from the results of a set of baseline test runs. The basic
syntax for this is::

  wptupdate [options] [logfile]...

Each ``logfile`` is a structured log file from a previous run. These
can be generated from wptrunner using the ``--log-raw`` option,
e.g. ``--log-raw=structured.log``. The default behaviour is to update
all the test data for the particular combination of hardware and OS
used in the run corresponding to the log data, whilst leaving any
other expectations untouched.

wptupdate takes several useful options:

``--sync``
  Pull the latest version of web-platform-tests from the
  upstream specified in the config file. If this is specified in
  combination with logfiles, it is assumed that the results in the log
  files apply to the post-update tests.

``--no-check-clean``
  Don't attempt to check if the working directory is clean before
  doing the update (assuming that the working directory is a Git or
  Mercurial tree).

``--patch``
  Create a Git commit, or an mq patch, with the changes made by wptupdate.

``--ignore-existing``
  Overwrite all the expectation data for any tests that have a result
  in the passed log files, not just data for the same platform.

``--disable-intermittent``
  When updating test results, disable tests that have inconsistent
  results across many runs. The option may be followed by a message
  giving the reason why the test is disabled. If no message is provided,
  ``unstable`` is the default text.

``--update-intermittent``
  When this option is used, the ``expected`` key (see below) stores
  expected intermittent statuses in addition to the primary expected
  status. If there is more than one status, it appears as a list. The
  default behaviour of this option is to retain any existing intermittent
  statuses in the list unless ``--remove-intermittent`` is specified.

``--remove-intermittent``
  This option is used in conjunction with ``--update-intermittent``.
  When the ``expected`` statuses are updated, any obsolete intermittent
  statuses that did not occur in the specified logfiles are removed from
  the list.

Examples
~~~~~~~~

Update the local copy of web-platform-tests without changing the
expectation data and commit (or create an mq patch for) the result::

  wptupdate --patch --sync

Update all the expectations from a set of cross-platform test runs::

  wptupdate --no-check-clean --patch osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
platform-independent::

  wptupdate --no-check-clean --patch --ignore-existing tests.log

Manifest Format
---------------

The format of the manifest files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square brackets. Within each section are key-value
pairs. There are several notable differences from standard .ini files,
however:

* Sections may be hierarchically nested, with significant whitespace
  indicating nesting depth.

* Only ``:`` is valid as a key/value separator.

A simple example of a manifest file is::

  root_key: root_value

  [section]
    section_key: section_value

    [subsection]
      subsection_key: subsection_value

  [another_section]
    another_key: another_value

Conditional Values
~~~~~~~~~~~~~~~~~~

In order to support values that depend on some external data, the
right hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on new lines
following the key, with significant indentation. Each conditional value
starts with ``if``, followed by a condition terminated with a colon,
and then the value itself, for example::

  key:
    if cond1: value1
    if cond2: value2
    value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``, otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.

Conditions themselves use a Python-like expression syntax. Operands
can be variables, corresponding to data passed in, numbers
(integer or floating point; exponential notation is not supported) or
quote-delimited strings. Equality is tested using ``==`` and
inequality using ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

  key:
    if (a == 2 or a == 3) and b == "abc": value1
    if a == 1 or b != "abc": value2
    value3

Here ``a`` and ``b`` are variables, the values of which will be
supplied when the manifest is used.

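
The first-match evaluation order described above can be sketched in
Python. This is a simplified illustration rather than wptrunner's
parser: the ``resolve`` helper is hypothetical, and it assumes each
condition also happens to be a valid Python expression so that
``eval`` can stand in for a real expression evaluator. ``eval`` should
not be used on untrusted input.

.. code-block:: python

  def resolve(conditionals, default, variables):
      """Return the value of the first condition that holds, else the default.

      conditionals: list of (condition, value) pairs in manifest order.
      """
      for condition, value in conditionals:
          # Assumption: the condition is also a valid Python expression, so
          # eval() with builtins disabled can stand in for a real parser.
          if eval(condition, {"__builtins__": {}}, dict(variables)):
              return value
      return default


  conditionals = [
      ('(a == 2 or a == 3) and b == "abc"', "value1"),
      ('a == 1 or b != "abc"', "value2"),
  ]
  print(resolve(conditionals, "value3", {"a": 3, "b": "abc"}))  # value1
  print(resolve(conditionals, "value3", {"a": 1, "b": "abc"}))  # value2
  print(resolve(conditionals, "value3", {"a": 4, "b": "abc"}))  # value3

The conditions are tried in the order they are written, so a more
specific condition has to appear before a more general one that would
also match.
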
Expectation Manifests
---------------------

When used for expectation data, manifests have the following format:

* A section per test URL described by the manifest, with the section
  heading being the part of the test URL following the last ``/`` in
  the path (this allows multiple tests in a single manifest file with
  the same path part of the URL, but different query parts).

* A subsection per subtest, with the heading being the title of the
  subtest.

* A key ``expected`` giving the expectation value or values of each
  (sub)test.

* A key ``disabled`` which can be set to any value to indicate that
  the (sub)test is disabled and should either not be run (for tests)
  or that its results should be ignored (subtests).

* A key ``restart-after`` which can be set to any value to indicate that
  the runner should restart the browser after running this test (e.g. to
  clear out unwanted state).

* A key ``fuzzy`` that is used for reftests. This is interpreted as a
  list of entries, each in the same form as the ``content`` value of a
  ``<meta name=fuzzy>`` element: an optional reference identifier
  followed by a colon, then a range indicating the maximum permitted
  pixel difference per channel, then a semicolon, then a range
  indicating the maximum permitted total number of differing pixels.
  The reference identifier is either a single relative URL, resolved
  against the base test URL, in which case the fuzziness applies to any
  comparison with that URL, or takes the form lhs URL, comparison,
  rhs URL, in which case the fuzziness only applies to comparisons
  involving that specific pair of URLs. Some illustrative
  examples are given below.

* Variables ``debug``, ``os``, ``version``, ``processor`` and
  ``bits`` that describe the configuration of the browser under
  test. ``debug`` is a boolean indicating whether a build is a debug
  build. ``os`` is a string indicating the operating system, and
  ``version`` a string indicating the particular version of that
  operating system. ``processor`` is a string indicating the
  processor architecture and ``bits`` an integer indicating the
  number of bits of that architecture. This information is typically
  provided by :py:mod:`mozinfo`.

* Top level keys are taken as defaults for the whole file. So, for
  example, a top level key with ``expected: FAIL`` would indicate
  that all tests and subtests in the file are expected to fail,
  unless they have an ``expected`` key of their own.

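
The way top level keys act as defaults can be illustrated with a small
lookup function. This is a sketch under stated assumptions, not the
wptrunner implementation: the nested ``dict`` layout (``tests`` and
``subtests`` keys) and the name ``expected_status`` are hypothetical,
and the final fallback to ``PASS`` is a simplification of the default
results mentioned earlier.

.. code-block:: python

  def expected_status(manifest, test, subtest=None, fallback="PASS"):
      """Look up ``expected`` for a test or subtest, falling back to defaults."""
      section = manifest.get("tests", {}).get(test, {})
      if subtest is not None:
          section = section.get("subtests", {}).get(subtest, {})
      if "expected" in section:
          return section["expected"]
      # No key of its own: use the file-level default, then the fallback.
      return manifest.get("expected", fallback)


  # Hypothetical in-memory form of a manifest with ``expected: FAIL`` at
  # the top level.
  manifest = {
      "expected": "FAIL",
      "tests": {
          "test.html": {
              "expected": "ERROR",
              "subtests": {"first": {"expected": "PASS"}, "second": {}},
          },
          "other.html": {},
      },
  }
  print(expected_status(manifest, "test.html"))            # ERROR
  print(expected_status(manifest, "test.html", "first"))   # PASS
  print(expected_status(manifest, "test.html", "second"))  # FAIL
  print(expected_status(manifest, "other.html"))           # FAIL
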
A simple example manifest might look like::

  [test.html?variant=basic]
    type: testharness

    [Test something unsupported]
      expected: FAIL

    [Test with intermittent statuses]
      expected: [PASS, TIMEOUT]

  [test.html?variant=broken]
    expected: ERROR

  [test.html?variant=unstable]

A more complex manifest with conditional properties might be::

  [canvas_test.html]
    expected:
      if os == "osx": FAIL
      if os == "windows" and version == "XP": FAIL
      PASS

Note that ``PASS`` in the above works, but is unnecessary; ``PASS``
(or ``OK``) is always the default expectation for (sub)tests.

A manifest with fuzzy reftest values might be::

  [reftest.html]
    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison would allow a
difference per channel of at most 10 and at most 200 differing pixels
in total. For any comparison involving ref1.html on the right hand
side, the limits would instead be a difference per channel of not more
than 20 and a total difference count of not less than 200 and not more
than 300. For the specific comparison subtest1.html == ref2.html (both
resolved against the test URL) these limits would instead be 10 to 15
and 0 to 20, respectively.
respectively.