Test Metadata
Directory Layout
Metadata files must be stored under the ``metadata`` directory passed
to the test runner. The directory layout follows that of
web-platform-tests with each test source path having a corresponding
metadata file. Because the metadata path is based on the source file
path, files that generate multiple URLs e.g. tests with multiple
variants, or multi-global tests generated from an ``any.js`` input
file, share the same metadata file for all their corresponding
tests. The metadata path under the ``metadata`` directory is the same
as the source path under the ``tests`` directory, with an additional
``.ini`` suffix.
For example a test with URL::
generated from a source file with path::

    <tests root>/spec/section/file.html

would have a metadata file::

    <metadata root>/spec/section/file.html.ini

As an optimisation, files which produce only default results
(i.e. ``PASS`` or ``OK``), and which don't have any other associated
metadata, don't require a corresponding metadata file.
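
The path mapping described above can be sketched in a few lines of Python. This is an illustrative helper only, not part of wptrunner; the function name ``metadata_path`` and the ``dom/idlharness.any.js`` file name in the usage note are assumptions made for the example.

```python
from pathlib import PurePosixPath


def metadata_path(metadata_root: str, source_rel_path: str) -> str:
    """Return the metadata file path for a test source file.

    ``source_rel_path`` is the source path relative to the tests root,
    for example ``spec/section/file.html``.
    """
    rel = PurePosixPath(source_rel_path)
    # Keep the same relative path under the metadata root and append
    # ".ini" to the whole file name (the existing suffix is kept).
    return str(PurePosixPath(metadata_root) / rel.parent / (rel.name + ".ini"))
```

Because the mapping is keyed on the source path, every test URL generated from a single source file (for instance a hypothetical ``dom/idlharness.any.js``) resolves to the same ``.ini`` file.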
Directory Metadata
In addition to per-test metadata, default metadata can be applied to
all the tests in a given source location, using a ``__dir__.ini``
metadata file. For example to apply metadata to all tests under
``<tests root>/spec/`` add the metadata in ``<tests
Metadata Format
The format of the metadata files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square brackets. Within each section are key-value
pairs. There are several notable differences from standard .ini files:

* Sections may be hierarchically nested, with significant whitespace
  indicating nesting depth.
* Only ``:`` is valid as a key/value separator.

A simple example of a metadata file is::

    root_key: root_value

    [section]
      section_key: section_value

      [subsection]
        subsection_key: subsection_value

    [another_section]
      another_key: [list, value]

Conditional Values
In order to support values that depend on some external data, the
right hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on a new line
following the key, with significant indentation. Conditional values
are prefixed with ``if`` and terminated with a colon, for example::

    key:
      if cond1: value1
      if cond2: value2
      value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``, otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.
Conditions themselves use a Python-like expression syntax. Operands
can either be variables, corresponding to data passed in, numbers
(integer or floating point; exponential notation is not supported) or
quote-delimited strings. Equality is tested using ``==`` and
inequality by ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

    if (a == 2 or a == 3) and b == "abc": value1
    if a == 1 or b != "abc": value2

Here ``a`` and ``b`` are variables, the value of which will be
supplied when the metadata is used.
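
To make the evaluation order concrete, here is a minimal Python sketch of how a list of conditional values could be resolved against supplied data. It is not wptrunner's parser: it borrows Python's own ``ast`` module, so it accepts a slightly different grammar (for example exponential notation), and the names ``eval_condition`` and ``resolve`` are hypothetical.

```python
import ast


def _eval(node, data):
    """Evaluate the small subset of expression nodes used in conditions."""
    if isinstance(node, ast.BoolOp):
        values = (_eval(v, data) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, data)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = _eval(node.left, data)
        right = _eval(node.comparators[0], data)
        if isinstance(node.ops[0], ast.Eq):
            return left == right
        if isinstance(node.ops[0], ast.NotEq):
            return left != right
    if isinstance(node, ast.Name):
        # Variables are looked up in the supplied data; a missing
        # variable raises KeyError in this sketch.
        return data[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
        return node.value
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def eval_condition(expr, data):
    """Evaluate a condition such as ``a == 1 or b != "abc"``."""
    return bool(_eval(ast.parse(expr, mode="eval").body, data))


def resolve(conditionals, default, data):
    """Return the value of the first true condition, else the default."""
    for condition, value in conditionals:
        if eval_condition(condition, data):
            return value
    return default
```

With the two conditions from the example above and ``value3`` as the unconditional fallback, ``{"a": 3, "b": "abc"}`` selects ``value1``, ``{"a": 1, "b": "abc"}`` selects ``value2``, and ``{"a": 5, "b": "abc"}`` falls through to ``value3``.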
Web-Platform-Tests Metadata
When used for expectation data, metadata files have the following format:
* A section per test URL provided by the corresponding source file,
with the section heading being the part of the test URL following
the last ``/`` in the path (this allows multiple tests in a single
metadata file with the same path part of the URL, but different
query parts). This may be omitted if there's no non-default
metadata for the test.
* A subsection per subtest, with the heading being the title of the
subtest. This may be omitted if there's no non-default metadata for
the subtest.
* The following known keys:
  :expected:
    The expectation value or values of each (sub)test. If this value
    is a list, the first value represents the typical expected test
    outcome, and subsequent values indicate known intermittent
    outcomes, e.g. ``expected: [PASS, ERROR]`` would indicate a test
    that usually passes but has a known-flaky ``ERROR`` outcome.

  :disabled:
    Any value apart from the special value ``@False`` indicates that
    the (sub)test is disabled and should either not be run (for
    tests) or that its results should be ignored (subtests).

  :restart-after:
    Any value apart from the special value ``@False`` indicates that
    the runner should restart the browser after running this test
    (e.g. to clear out unwanted state).

  :fuzzy:
    Used for reftests. This is interpreted as a list of entries, each
    in the same format as a ``<meta name=fuzzy>`` content value: an
    optional reference identifier followed by a colon, then a range
    indicating the maximum permitted pixel difference per channel,
    then a semicolon, then a range indicating the maximum permitted
    total number of differing pixels. The reference identifier is
    either a single relative URL, resolved against the base test URL,
    in which case the fuzziness applies to any comparison with that
    URL, or takes the form lhs URL, comparison, rhs URL, in which
    case the fuzziness only applies to comparisons involving that
    specific pair of URLs. Some illustrative examples are given
    below.

  :implementation-status:
    One of the values ``implementing``, ``not-implementing`` or
    ``default``. This is used in conjunction with the
    ``--skip-implementation-status`` command line argument to
    ``wptrunner`` to ignore certain features where running the test
    is low value.

  :tags:
    A list of labels associated with a given test that can be used in
    conjunction with the ``--tag`` command line argument to
    ``wptrunner`` for test selection.

In addition there are extra arguments which are currently tied to
specific implementations. For example Gecko-based browsers support
``min-asserts``, ``max-asserts``, ``prefs``, ``lsan-disabled``,
``lsan-allowed``, ``lsan-max-stack-depth``, ``leak-allowed``, and
``leak-threshold`` properties.
* Variables taken from the ``RunInfo`` data which describe the
configuration of the test run. Common properties include:
:product: A string giving the name of the browser under test
:browser_channel: A string giving the release channel of the browser under test
:debug: A Boolean indicating whether the build is a debug build
:os: A string indicating the operating system
:version: A string indicating the particular version of that operating system
:processor: A string indicating the processor architecture.
This information is typically provided by :py:mod:`mozinfo`, but
different environments may add additional information, and not all
the properties above are guaranteed to be present in all
environments. The definitive list of available properties for a
specific run may be determined by looking at the ``run_info`` key
in the ``wptreport.json`` output for the run.
* Top level keys are taken as defaults for the whole file. So, for
example, a top level key with ``expected: FAIL`` would indicate
that all tests and subtests in the file are expected to fail,
unless they have an ``expected`` key of their own.
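
As noted in the list above, the properties available for a particular run can be read from the ``run_info`` key of the ``wptreport.json`` output. The following Python sketch shows one way to inspect them; the helper names are hypothetical, and the only assumption about the report is that ``run_info`` is a top-level key holding a JSON object.

```python
import json


def run_info_from_text(report_text: str) -> dict:
    """Extract the ``run_info`` object from the text of a wptreport file."""
    report = json.loads(report_text)
    # Return an empty dict rather than failing if the key is absent.
    return report.get("run_info", {})


def load_run_info(report_path: str) -> dict:
    """Read a wptreport JSON file from disk and return its ``run_info``."""
    with open(report_path, encoding="utf-8") as f:
        return run_info_from_text(f.read())
```

Calling ``sorted(load_run_info("wptreport.json"))`` would list the variable names that conditions can refer to for that run.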
A simple example metadata file might look like::
type: testharness
[Test something unsupported]
expected: FAIL
[Test with intermittent statuses]
expected: [PASS, TIMEOUT]
expected: ERROR
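
The way an ``expected`` value is found and interpreted can be sketched as follows. This is a simplified model, not wptrunner's implementation: sections are represented as plain dictionaries, the lookup order (most specific section first, top-level defaults last) is an assumption based on the description of top-level keys, and the function names are hypothetical.

```python
def lookup(key, *scopes, default=None):
    """Return ``key`` from the most specific scope that defines it.

    ``scopes`` are ordered from most specific (e.g. a subtest section)
    to least specific (the top level of the file).
    """
    for scope in scopes:
        if key in scope:
            return scope[key]
    return default


def split_expected(value):
    """Split an ``expected`` value into (typical status, intermittents)."""
    if isinstance(value, list):
        # The first entry is the typical outcome; the rest are known
        # intermittent outcomes.
        return value[0], value[1:]
    return value, []
```

For example, a subtest section without its own ``expected`` key in a file whose top level sets ``expected: FAIL`` resolves to ``FAIL``, while ``[PASS, TIMEOUT]`` splits into a typical status of ``PASS`` with ``TIMEOUT`` as a known intermittent.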
A more complex metadata file with conditional properties might be::

    expected:
      if os == "mac": FAIL
      if os == "windows" and version == "XP": FAIL
      PASS

Note that ``PASS`` in the above works, but is unnecessary since it's
the default expected result.
A metadata file with fuzzy reftest values might be::
fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]
In this case the default fuzziness for any comparison would be a
maximum difference per channel of less than or equal to 10, and less
than or equal to 200 differing pixels in total. For any comparison
involving ``ref1.html`` on the right hand side, the limits would
instead be a difference per channel of not more than 20 and a total
difference count of not less than 200 and not more than 300. For the
specific comparison ``subtest1.html == ref2.html`` (both resolved
against the test URL) these limits would instead be 10 to 15 and 0 to
20, respectively.
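
The interpretation of the ``fuzzy`` entries above can be checked with a small parser. The sketch below handles only the positional form used in this example (an optional reference, then two ranges separated by a semicolon) and treats a single number ``N`` as the range 0 to ``N``; the ``Fuzzy`` type and function names are hypothetical and do not correspond to wptrunner internals.

```python
from typing import NamedTuple, Optional, Tuple


class Fuzzy(NamedTuple):
    reference: Optional[str]         # None means "any comparison"
    max_difference: Tuple[int, int]  # per-channel difference range
    total_pixels: Tuple[int, int]    # range of differing pixel counts


def parse_range(text: str) -> Tuple[int, int]:
    """Parse ``N`` as (0, N) and ``N-M`` as (N, M)."""
    low, sep, high = text.strip().partition("-")
    if not sep:
        return 0, int(low)
    return int(low), int(high)


def parse_fuzzy(entry: str) -> Fuzzy:
    """Parse one entry such as ``ref1.html:20;200-300``."""
    # The ranges follow the last colon; anything before it is the
    # reference identifier.
    reference, colon, ranges = entry.strip().rpartition(":")
    difference, semicolon, pixels = ranges.partition(";")
    if not semicolon:
        raise ValueError(f"expected two ranges separated by ';': {entry!r}")
    return Fuzzy(reference if colon else None,
                 parse_range(difference),
                 parse_range(pixels))
```

Applied to the three entries in the example, this yields ranges of (0, 10) and (0, 200) for the default, (0, 20) and (200, 300) for ``ref1.html``, and (10, 15) and (0, 20) for ``subtest1.html==ref2.html``.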
Generating Expectation Files
wpt provides the ``wpt update-expectations`` command to generate
expectation files from the results of a set of test runs. The basic
syntax for this is::

    ./wpt update-expectations [options] [logfile]...

Each ``logfile`` is a wptreport log file from a previous run. These
can be generated from wptrunner using the ``--log-wptreport`` option
e.g. ``--log-wptreport=wptreport.json``.
``update-expectations`` takes several options:

--full
    Overwrite all the expectation data for any tests that have a
    result in the passed log files, not just data for the same run.

--disable-intermittent
    When updating test results, disable tests that have inconsistent
    results across many runs. The option can be followed by a message
    giving the reason why the test is disabled. If no message is
    provided, ``unstable`` is the default text.

--update-intermittent
    When this option is used, the ``expected`` key stores expected
    intermittent statuses in addition to the primary expected status.
    If there is more than one status, it appears as a list. The
    default behaviour of this option is to retain any existing
    intermittent statuses in the list unless
    ``--remove-intermittent`` is also specified.

--remove-intermittent
    This option is used in conjunction with ``--update-intermittent``.
    When the ``expected`` statuses are updated, any obsolete
    intermittent statuses that did not occur in the specified log
    files are removed from the list.

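
The interaction between ``--update-intermittent`` and ``--remove-intermittent`` can be modelled roughly as below. This is a sketch of the documented behaviour, not the tool's code: the function name is hypothetical, and choosing the most frequently observed status as the primary one is an assumption made for the example.

```python
from collections import Counter


def update_expected(existing, observed, remove_intermittent=False):
    """Merge observed statuses into an existing ``expected`` value.

    ``existing`` is the current ``expected`` value (a status or a list
    with the primary status first); ``observed`` is the list of
    statuses seen in the log files.
    """
    if isinstance(existing, str):
        existing = [existing]
    if not observed:
        return existing[0] if len(existing) == 1 else list(existing)
    # Assumption: the most frequently observed status becomes primary.
    primary = Counter(observed).most_common(1)[0][0]
    statuses = [primary]
    for status in observed:
        if status not in statuses:
            statuses.append(status)
    if not remove_intermittent:
        # Default: keep existing intermittents even if not seen again.
        for status in existing:
            if status not in statuses:
                statuses.append(status)
    # A single status is written as a plain value, several as a list.
    return statuses[0] if len(statuses) == 1 else statuses
```

Starting from ``[PASS, TIMEOUT]`` with logs showing ``PASS``, ``PASS`` and ``ERROR``, the default behaviour gives ``[PASS, ERROR, TIMEOUT]``, whereas removing obsolete intermittents gives ``[PASS, ERROR]``.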
Property Configuration
In cases where the expectation depends on the run configuration, ``wpt
update-expectations`` is able to generate conditional values. Because
the relevant variables depend on the range of configurations that need
to be covered, it's necessary to specify the list of configuration
variables that should be used. This is done using a JSON file that can
be specified with the ``--properties-file`` command line argument to
``wpt update-expectations``. When this isn't supplied, the defaults
from ``<metadata root>/update_properties.json`` are used, if present.
Properties File Format
The file is JSON formatted with two top-level keys:

:properties:
  A list of property names to consider for conditionals,
  e.g. ``["product", "os"]``.

:dependents:
  An optional dictionary containing properties that should only be
  used as "tie-breakers" when differentiating based on a specific
  top-level property has failed. This is useful when the dependent
  property is always more specific than the top-level property, but
  less understandable when used directly. For example the ``version``
  property covering different OS versions is typically unique amongst
  different operating systems, but using it when the ``os`` property
  would do instead is likely to produce metadata that's too specific
  to the current configuration and more difficult to read. But where
  there are multiple versions of the same operating system with
  different results, it can be necessary. So specifying
  ``{"os": ["version"]}`` as a dependent property means that the
  ``version`` property will only be used if the condition already
  contains the ``os`` property and further conditions are required to
  separate the observed results.

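
The tie-breaker rule can be illustrated with a short Python sketch that computes which properties may still be added to a condition. The real update algorithm is more involved; the function name ``candidate_properties`` is hypothetical, and the configuration simply combines the ``["product", "os"]`` and ``{"os": ["version"]}`` values mentioned above.

```python
import json

# Configuration built from the values discussed in the text.
CONFIG = json.loads("""
{
  "properties": ["product", "os"],
  "dependents": {"os": ["version"]}
}
""")


def candidate_properties(config, used):
    """Return the properties that may extend a condition using ``used``."""
    # Top-level properties are always candidates until they are used.
    candidates = [p for p in config["properties"] if p not in used]
    # A dependent property only becomes a candidate once its parent
    # property is already part of the condition.
    for parent, children in config.get("dependents", {}).items():
        if parent in used:
            candidates.extend(c for c in children if c not in used)
    return candidates
```

With this configuration, ``version`` is not offered for an empty condition, but becomes available once ``os`` is in use.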
So an example ``update_properties.json`` file might look like::

    {
      "properties": ["product", "os"],
      "dependents": {"product": ["browser_channel"], "os": ["version"]}
    }

Update all the expectations from a set of cross-platform test runs::

    wpt update-expectations --full osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
wpt update-expectations tests.log
Why a Custom Format?
Given the use of the metadata files in CI systems, it was desirable to
have something with the following properties:
* Human readable
* Human editable
* Machine readable / writable
* Capable of storing key-value pairs
* Suitable for storing in a version control system (i.e. text-based)
The need for different results per platform means either having a
separate expectation file for each platform, or having a way to
express conditional values within a single file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to the requirement:
* Capable of storing result values that are conditional on the platform.
There are few extant formats that clearly meet these requirements. In
particular although conditional properties could be expressed in many
existing formats, the representation would likely be cumbersome and
error-prone for hand authoring. Therefore it was decided that a custom
format offered the best tradeoffs given the requirements.