Expectation Data
================

Introduction
------------

For use in continuous integration systems, and other scenarios where
regression tracking is required, wptrunner supports storing and
loading the expected result of each test in a test run. Typically
these expected results will initially be generated by running the
testsuite in a baseline build. They may then be edited by humans as
new features are added to the product that change the expected
results. The expected results may also vary for a single product
depending on the platform on which it is run. Therefore, the raw
structured log data is not a suitable format for storing these
files. Instead something is required that is:

* Human readable

* Human editable

* Machine readable / writable

* Capable of storing test id / result pairs

* Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having
multiple expectation files for each platform, or having a way to
express conditional values within a single file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to the requirement:

* Capable of storing result values that are conditional on the platform.

There are few extant formats that meet these requirements, so
wptrunner uses a bespoke ``expectation manifest`` format, which is
closely based on the standard ``ini`` format.

Directory Layout
----------------

Expectation manifest files must be stored under the ``metadata``
directory passed to the test runner. The directory layout follows that
of web-platform-tests, with each test path having a corresponding
manifest file. Tests that differ only by query string, or reftests
with the same test path but different ref paths, share the same
manifest file. The file name is taken from the last /-separated part
of the path, suffixed with ``.ini``.

As an optimisation, files that produce only default results
(i.e. ``PASS`` or ``OK``) don't require a corresponding manifest file.

For example, a test with the URL::

  /spec/section/file.html?query=param

would have the expectation file::

  metadata/spec/section/file.html.ini
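
As a sketch (the second variant here is hypothetical), two query-string
variants of the same test share one file in the metadata tree::

  metadata/
    spec/
      section/
        file.html.ini   # sections for file.html?query=param and file.html?other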

.. _wptupdate-label:

Generating Expectation Files
----------------------------

wptrunner provides the tool ``wptupdate`` to generate expectation
files from the results of a set of baseline test runs. The basic
syntax for this is::

  wptupdate [options] [logfile]...

Each ``logfile`` is a structured log file from a previous run. These
can be generated from wptrunner using the ``--log-raw`` option,
e.g. ``--log-raw=structured.log``. The default behaviour is to update
all the test data for the particular combination of hardware and OS
used in the run corresponding to the log data, whilst leaving any
other expectations untouched.
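
For example (a minimal sketch; any other wptrunner arguments needed to
select the product and tests to run are elided here), a baseline run
followed by an update might look like::

  wptrunner --log-raw=structured.log
  wptupdate structured.log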

wptupdate takes several useful options:

``--sync``
  Pull the latest version of web-platform-tests from the
  upstream specified in the config file. If this is specified in
  combination with logfiles, it is assumed that the results in the log
  files apply to the post-update tests.

``--no-check-clean``
  Don't attempt to check if the working directory is clean before
  doing the update (assuming that the working directory is a git or
  mercurial tree).

``--patch``
  Create a git commit, or an mq patch, with the changes made by wptupdate.

``--ignore-existing``
  Overwrite all the expectation data for any tests that have a result
  in the passed log files, not just data for the same platform.

``--disable-intermittent``
  When updating test results, disable tests that have inconsistent
  results across many runs. This option can precede a message providing
  a reason why the test is disabled. If no message is provided,
  ``unstable`` is the default text.

``--update-intermittent``
  When this option is used, the ``expected`` key (see below) stores
  expected intermittent statuses in addition to the primary expected
  status. If there is more than one status, it appears as a list. The
  default behaviour of this option is to retain any existing intermittent
  statuses in the list unless ``--remove-intermittent`` is specified.

``--remove-intermittent``
  This option is used in conjunction with ``--update-intermittent``.
  When the ``expected`` statuses are updated, any obsolete intermittent
  statuses that did not occur in the specified logfiles are removed from
  the list.

Examples
~~~~~~~~

Update the local copy of web-platform-tests without changing the
expectation data and commit (or create an mq patch for) the result::

  wptupdate --patch --sync

Update all the expectations from a set of cross-platform test runs::

  wptupdate --no-check-clean --patch osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
platform-independent::

  wptupdate --no-check-clean --patch --ignore-existing tests.log
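
Disable tests that gave inconsistent results across several runs,
recording a reason for later reference (a sketch based on the option
description above; the log file names are illustrative)::

  wptupdate --no-check-clean --disable-intermittent "flaky on CI" run1.log run2.log run3.log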

Manifest Format
---------------

The format of the manifest files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square braces. Within each section are key-value
pairs. There are several notable differences from standard .ini files,
however:

* Sections may be hierarchically nested, with significant whitespace
  indicating nesting depth.

* Only ``:`` is valid as a key/value separator.

A simple example of a manifest file is::

  root_key: root_value

  [section]
    section_key: section_value

    [subsection]
      subsection_key: subsection_value

  [another_section]
    another_key: another_value

The web-platform-tests harness knows about several keys:

`expected`
  Must evaluate to a possible test status indicating the expected
  result of the test. The implicit default is PASS or OK when the
  field isn't present. When `expected` is a list, the first status
  is the primary expected status and the trailing statuses listed are
  expected intermittent statuses.

`disabled`
  Any value indicates that the test is disabled.

`reftype`
  The type of comparison for reftests; either `==` or `!=`.

`refurl`
  The reference URL for reftests.
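
For instance (an illustrative sketch with hypothetical file names), a
manifest section describing a reftest comparison might use the reftest
keys like this::

  [reftest.html]
    reftype: ==
    refurl: ref.html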

Conditional Values
~~~~~~~~~~~~~~~~~~

In order to support values that depend on some external data, the
right hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on a new line
following the key, with significant indentation. Conditional values
are prefixed with ``if`` and terminated with a colon, for example::

  key:
    if cond1: value1
    if cond2: value2
    value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``, otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.

Conditions themselves use a Python-like expression syntax. Operands
can either be variables, corresponding to data passed in, numbers
(integer or floating point; exponential notation is not supported) or
quote-delimited strings. Equality is tested using ``==`` and
inequality by ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

  key:
    if (a == 2 or a == 3) and b == "abc": value1
    if a == 1 or b != "abc": value2
    value3

Here ``a`` and ``b`` are variables whose values will be supplied when
the manifest is used.
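
To make the evaluation order concrete, here is a minimal Python sketch
of the lookup logic described above. It is illustrative only: the
function name is hypothetical, and wptrunner's actual implementation
parses these expressions itself rather than delegating to ``eval``::

  def first_matching_value(conditions, default, variables):
      """Return the value paired with the first condition that is true.

      ``conditions`` is a list of (expression, value) pairs, and
      ``variables`` maps names such as "os" or "bits" to values.
      """
      for expression, value in conditions:
          # The condition syntax is Python-like, so evaluating it in a
          # restricted namespace approximates the manifest semantics.
          if eval(expression, {"__builtins__": {}}, dict(variables)):
              return value
      return default

  value = first_matching_value(
      [('(a == 2 or a == 3) and b == "abc"', "value1"),
       ('a == 1 or b != "abc"', "value2")],
      "value3",
      {"a": 3, "b": "abc"},
  )
  assert value == "value1"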

Expectation Manifests
---------------------

When used for expectation data, manifests have the following format:

* A section per test URL described by the manifest, with the section
  heading being the part of the test URL following the last ``/`` in
  the path (this allows multiple tests in a single manifest file with
  the same path part of the URL, but different query parts).

* A subsection per subtest, with the heading being the title of the
  subtest.

* A key ``expected`` giving the expectation value or values of each
  (sub)test.

* A key ``disabled`` which can be set to any value to indicate that
  the (sub)test is disabled and should either not be run (for tests)
  or that its results should be ignored (subtests).

* A key ``restart-after`` which can be set to any value to indicate that
  the runner should restart the browser after running this test (e.g. to
  clear out unwanted state).

* A key ``fuzzy`` that is used for reftests. This is interpreted as a
  list of entries like ``<meta name=fuzzy>`` content values, each
  consisting of an optional reference identifier followed by a colon,
  then a range indicating the maximum permitted pixel difference per
  channel, then a semicolon, then a range indicating the maximum
  permitted total number of differing pixels. The reference identifier
  is either a single relative URL, resolved against the base test URL,
  in which case the fuzziness applies to any comparison with that URL,
  or takes the form lhs url, comparison, rhs url, in which case the
  fuzziness only applies to that specific pair of URLs. Some
  illustrative examples are given below.

* Variables ``debug``, ``os``, ``version``, ``processor`` and
  ``bits`` that describe the configuration of the browser under
  test. ``debug`` is a boolean indicating whether a build is a debug
  build. ``os`` is a string indicating the operating system, and
  ``version`` a string indicating the particular version of that
  operating system. ``processor`` is a string indicating the
  processor architecture and ``bits`` an integer indicating the
  number of bits. This information is typically provided by
  :py:mod:`mozinfo`.

* Top level keys are taken as defaults for the whole file. So, for
  example, a top level key with ``expected: FAIL`` would indicate
  that all tests and subtests in the file are expected to fail,
  unless they have an ``expected`` key of their own; a sketch of this
  appears after this list.
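
Such a file-wide default might look like the following (hypothetical)
manifest, which expects everything to fail except one subtest::

  expected: FAIL

  [test.html]
    [subtest that actually passes]
      expected: PASS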

A simple example manifest might look like::

  [test.html?variant=basic]
    type: testharness

    [Test something unsupported]
      expected: FAIL

    [Test with intermittent statuses]
      expected: [PASS, TIMEOUT]

  [test.html?variant=broken]
    expected: ERROR

  [test.html?variant=unstable]
    disabled: http://test.bugs.example.org/bugs/12345

A more complex manifest with conditional properties might be::

  [canvas_test.html]
    expected:
      if os == "mac": FAIL
      if os == "windows" and version == "XP": FAIL
      PASS

Note that ``PASS`` in the above works, but is unnecessary; ``PASS``
(or ``OK``) is always the default expectation for (sub)tests.

A manifest with fuzzy reftest values might be::

  [reftest.html]
    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison would be to
require a maximum difference per channel of less than or equal to 10
and less than or equal to 200 total pixels different. For any
comparison involving ref1.html on the right hand side, the limits
would instead be a difference per channel of not more than 20 and a
total difference count of not less than 200 and not more than 300. For
the specific comparison ``subtest1.html == ref2.html`` (both resolved
against the test URL) these limits would instead be 10 to 15 and 0 to
20, respectively.