Name Description Size (bytes) Coverage
bhr_collection.py BigQuery reader for the BHR aggregation pipeline. Streams hang-report pings from BigQuery for a given build-date window, filtered down to the columns the downstream pipeline actually consumes. Ported from the original python_mozetl/mozetl/bhr_collection/bhr_collection.py. The SQL and FARM_FINGERPRINT-based deterministic sampling are unchanged. The PySpark BigQuery connector has been replaced with the google-cloud-bigquery Python client; rows are yielded one at a time rather than materialised into a Spark DataFrame so memory stays bounded at production sample sizes (~500K rows × ~5-10 KB each would otherwise need 3-5 GB on a single worker). The google-cloud-bigquery import is deferred to the call site so this module can be imported in environments where the package isn't installed (e.g. unit tests that mock the client). Production runs need the package present in the runtime — that's handled by the TaskCluster Docker image in a later phase of the migration. 21780 -
heuristics.py Hang-signature heuristics for BHR stack aggregation. Ported from the frontend's getHangFrames in https://github.com/mozilla/hang-stats/blob/master/bhr.js so that the daily aggregation job can trim stacks upstream instead of the frontend doing it on every page render. History: - Originally introduced in python_mozetl issue #410 / PR (heuristics migration) as a Python port of the JS algorithm, with byte-for-byte parity verified against 2,080 real recorded samples. - Moved here as part of the bhr_collection migration from python_mozetl into mozilla-central. The algorithm is unchanged. The three heuristics: 1. Nested event loop trim — stop walking the stack at the innermost ``nsThread::ProcessNextEvent`` frame. Frames outside that point are ancestor event loops that don't help identify the hang. 2. Non-Mozilla code collapse — once we cross into Mozilla code walking from leaf to root, drop all but the immediate entry-point non-Mozilla frame. We care HOW Firefox code reached system libraries, not the system internals. 3. SpiderMonkey internals strip — between two JS frames, drop frames that are recognizable as JS engine internals. The JS interpreter machinery isn't useful signal for hang triage. Plus an XPConnect-glue special case that strips XPC_WN_* / XPCWrappedNative / XPTC__InvokebyIndex chains when they sit between a JS frame and native code. The function operates on a symbolicated stack — a list of ``(func_name, lib_name)`` tuples in outer-first (root -> leaf) order. 5122 -
moz.build 329 -
profile_processor.py Columnar data structures and the ProfileProcessor aggregator for BHR. Ported from python_mozetl/mozetl/bhr_collection/bhr_collection.py as part of the bhr_collection migration. The semantics are unchanged — these classes take symbolicated, heuristic-trimmed hang samples and build the columnar output schema (stackTable / funcTable / stringArray / sampleTable / annotationsTable / dates / libs) the frontend consumes. The "(root)" sentinel at index 0 of stackTable / pruneStackCache is intentional: the frontend's stack walker terminates when prefix == 0, so keeping that slot reserved is load-bearing. 15131 -
symbolication.py Symbol-server I/O and breakpad ``.sym`` parsing for BHR aggregation. Ported from python_mozetl/mozetl/bhr_collection/bhr_collection.py as part of the bhr_collection migration. Pure-stdlib relocation; semantics are unchanged. The Mozilla symbol server returns text in breakpad's ``.sym`` format. Each file describes one module: ``PUBLIC`` lines map exported names to addresses, ``FUNC`` lines map function symbols to address ranges. ``make_sym_map`` parses one ``.sym`` blob into a ``{address: symbol}`` dict (plus a sorted key list for bisecting). ``process_module`` is the per-module pipeline: fetch the ``.sym``, parse it, resolve each requested offset. 9748 -
tests -
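
The bhr_collection.py entry above describes deferring the google-cloud-bigquery import to the call site and yielding rows one at a time. A minimal sketch of that shape follows. It is not the module's actual code: the function name `stream_hang_rows`, the `table` argument, and the `document_id` / `build_date` column names are assumptions made for illustration, and the sampling predicate is only one common way to use FARM_FINGERPRINT for deterministic sampling.

```python
import datetime
from typing import Any, Dict, Iterator, Optional


def stream_hang_rows(
    table: str,
    start: datetime.date,
    end: datetime.date,
    sample_percent: int,
    client: Optional[Any] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield sampled rows one at a time instead of materialising them all.

    Hypothetical helper: table and column names are placeholders.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in 1..100")

    if client is None:
        # Deferred import: importing this module does not require the
        # package, only calling the function without a client does.
        from google.cloud import bigquery

        client = bigquery.Client()

    # Hashing a stable ID makes the sample deterministic: the same row is
    # either always in or always out for a given percentage.
    sql = f"""
        SELECT *
        FROM `{table}`
        WHERE build_date >= '{start.isoformat()}'
          AND build_date < '{end.isoformat()}'
          AND MOD(ABS(FARM_FINGERPRINT(document_id)), 100) < {int(sample_percent)}
    """

    # result() returns an iterator that fetches pages lazily, so only the
    # current page of rows is held in memory.
    for row in client.query(sql).result():
        yield dict(row)
```

Accepting an optional `client` is a design choice of this sketch: it lets unit tests pass a mock without the package being installed, which matches the motivation given in the description.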
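
The first heuristic in the heuristics.py entry above (nested event loop trim) can be sketched as follows. This is an illustration under stated assumptions, not the ported algorithm: the helper name `trim_nested_event_loops` is hypothetical, frames are matched by substring on the function name, and keeping the matched `nsThread::ProcessNextEvent` frame itself in the output is an assumption the description does not settle.

```python
from typing import List, Tuple

Frame = Tuple[str, str]  # (func_name, lib_name)

EVENT_LOOP_FRAME = "nsThread::ProcessNextEvent"


def trim_nested_event_loops(stack: List[Frame]) -> List[Frame]:
    """Drop ancestor event loops from a root -> leaf stack.

    Walks from the leaf toward the root and stops at the first (that is,
    innermost) event-loop frame. Everything closer to the root is removed.
    """
    for index in range(len(stack) - 1, -1, -1):
        func_name, _lib_name = stack[index]
        if EVENT_LOOP_FRAME in func_name:
            # Assumption: the event-loop frame itself is kept.
            return stack[index:]
    # No event-loop frame found: return the stack unchanged (as a copy).
    return list(stack)
```

For a stack with two nested event loops, only the inner loop and the frames beneath it survive; a stack with no event loop frame comes back unchanged.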
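
The profile_processor.py entry above notes that index 0 of stackTable is reserved for a "(root)" sentinel so that a stack walker can terminate there. A toy prefix-linked table shows why that slot matters. The class name `ToyStackTable` and its methods are hypothetical and far simpler than the real columnar schema; here the walk stops when the current index reaches 0, which is one way to read the termination rule in the description.

```python
from typing import Dict, List, Tuple


class ToyStackTable:
    """Prefix-linked stack table with a reserved root slot at index 0."""

    def __init__(self) -> None:
        # Slot 0 is the "(root)" sentinel; its prefix points at itself.
        self.prefix: List[int] = [0]
        self.func: List[str] = ["(root)"]
        self._index: Dict[Tuple[int, str], int] = {}

    def add_stack(self, frames: List[str]) -> int:
        """Intern a root -> leaf list of frames; return the leaf's index."""
        current = 0
        for name in frames:
            key = (current, name)
            if key not in self._index:
                self._index[key] = len(self.func)
                self.prefix.append(current)
                self.func.append(name)
            current = self._index[key]
        return current

    def walk(self, stack_index: int) -> List[str]:
        """Return frame names leaf -> root, stopping at the sentinel."""
        names = []
        while stack_index != 0:
            names.append(self.func[stack_index])
            stack_index = self.prefix[stack_index]
        return names
```

Because no real frame can ever be stored at index 0, a prefix of 0 unambiguously means "no parent". If the sentinel were dropped, the first real frame would occupy slot 0 and the walk would end one frame early.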
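
The symbolication.py entry above describes parsing a breakpad ``.sym`` blob into a ``{address: symbol}`` dict plus a sorted key list for bisecting. A minimal sketch of that idea follows, assuming the standard breakpad record layouts (``FUNC [m] address size parameter_size name`` and ``PUBLIC [m] address parameter_size name``, with hexadecimal addresses). The names `sketch_sym_map` and `resolve_offset` are hypothetical and are not the module's `make_sym_map` or `process_module`; the sketch also ignores the ``FUNC`` size field, so an offset past the end of a function still resolves to the nearest preceding symbol.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def sketch_sym_map(text: str) -> Tuple[Dict[int, str], List[int]]:
    """Parse FUNC and PUBLIC records into an address map and sorted keys."""
    sym_map: Dict[int, str] = {}
    for line in text.splitlines():
        fields = line.split(" ")
        kind, rest = fields[0], fields[1:]
        if kind not in ("FUNC", "PUBLIC"):
            continue  # MODULE, FILE, STACK and line records are skipped
        if rest and rest[0] == "m":
            rest = rest[1:]  # optional "multiple" marker
        # FUNC has address, size, param_size before the name;
        # PUBLIC has address, param_size before the name.
        name_start = 3 if kind == "FUNC" else 2
        if len(rest) <= name_start:
            continue  # malformed record
        address = int(rest[0], 16)
        # Names may contain spaces, e.g. C++ signatures.
        sym_map[address] = " ".join(rest[name_start:])
    return sym_map, sorted(sym_map)


def resolve_offset(
    offset: int, sym_map: Dict[int, str], keys: List[int]
) -> Optional[str]:
    """Return the symbol with the greatest address <= offset, if any."""
    position = bisect.bisect_right(keys, offset) - 1
    if position < 0:
        return None
    return sym_map[keys[position]]
```

Keeping the sorted key list alongside the dict means each lookup is a binary search rather than a scan, which matters when many offsets are resolved against one module.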