| bhr_collection.py |
BigQuery reader for the BHR aggregation pipeline.
Streams hang-report pings from BigQuery for a given build-date window,
filtered down to the columns the downstream pipeline actually consumes.
Ported from the original python_mozetl/mozetl/bhr_collection/bhr_collection.py. The
SQL and FARM_FINGERPRINT-based deterministic sampling are unchanged. The
PySpark BigQuery connector has been replaced with the google-cloud-bigquery
Python client; rows are yielded one at a time rather than materialised into
a Spark DataFrame so memory stays bounded at production sample sizes
(~500K rows × ~5-10 KB each would otherwise need 3-5 GB on a single worker).
The google-cloud-bigquery import is deferred to the call site so this
module can be imported in environments where the package isn't installed
(e.g. unit tests that mock the client). Production runs need the package
present in the runtime — that's handled by the TaskCluster Docker image
in a later phase of the migration.
|
21780 |
- |
| heuristics.py |
Hang-signature heuristics for BHR stack aggregation.
Ported from the frontend's getHangFrames in
https://github.com/mozilla/hang-stats/blob/master/bhr.js so that the daily
aggregation job can trim stacks upstream instead of the frontend doing it on
every page render.
History:
- Originally introduced in python_mozetl issue #410 / PR (heuristics
migration) as a Python port of the JS algorithm, with byte-for-byte parity
verified against 2,080 real recorded samples.
- Moved here as part of the bhr_collection migration from python_mozetl into
mozilla-central. The algorithm is unchanged.
The three heuristics:
1. Nested event loop trim — stop walking the stack at the innermost
``nsThread::ProcessNextEvent`` frame. Frames outside that point are
ancestor event loops that don't help identify the hang.
2. Non-Mozilla code collapse — once we cross into Mozilla code walking from
leaf to root, drop all but the immediate entry-point non-Mozilla frame.
We care HOW Firefox code reached system libraries, not the system internals.
3. SpiderMonkey internals strip — between two JS frames, drop frames that
are recognizable as JS engine internals. The JS interpreter machinery
isn't useful signal for hang triage.
Plus an XPConnect-glue special case that strips XPC_WN_* / XPCWrappedNative
/ XPTC__InvokebyIndex chains when they sit between a JS frame and native code.
The function operates on a symbolicated stack — a list of
``(func_name, lib_name)`` tuples in outer-first (root -> leaf) order.
|
5122 |
- |
| moz.build |
|
329 |
- |
| profile_processor.py |
Columnar data structures and the ProfileProcessor aggregator for BHR.
Ported from python_mozetl/mozetl/bhr_collection/bhr_collection.py as part of
the bhr_collection migration. The semantics are unchanged — these classes
take symbolicated, heuristic-trimmed hang samples and build the columnar
output schema (stackTable / funcTable / stringArray / sampleTable /
annotationsTable / dates / libs) the frontend consumes.
The "(root)" sentinel at index 0 of stackTable / pruneStackCache is
intentional: the frontend's stack walker terminates when prefix == 0, so
keeping that slot reserved is load-bearing.
|
15131 |
- |
| symbolication.py |
Symbol-server I/O and breakpad ``.sym`` parsing for BHR aggregation.
Ported from python_mozetl/mozetl/bhr_collection/bhr_collection.py as part of
the bhr_collection migration. Pure-stdlib relocation; semantics are
unchanged.
The Mozilla symbol server returns text in breakpad's ``.sym`` format. Each
file describes one module: ``PUBLIC`` lines map exported names to addresses,
``FUNC`` lines map function symbols to address ranges. ``make_sym_map``
parses one ``.sym`` blob into a ``{address: symbol}`` dict (plus a sorted
key list for bisecting). ``process_module`` is the per-module pipeline:
fetch the ``.sym``, parse it, resolve each requested offset.
|
9748 |
- |
| tests |
|
|
- |