code-overview.rst - mozsearch

Profiler Code Overview

######################

This is an overview of the code that implements the Profiler inside Firefox

with some details around tricky subjects, or pointers to more detailed

documentation and/or source code.

It assumes familiarity with Firefox development, including Mercurial (hg), mach,

moz.build files, Try, Phabricator, etc.

It also assumes knowledge of the user-visible part of the Firefox Profiler, that

is: How to use the Firefox Profiler, and what profiles contain that is shown

when capturing a profile. See the main website https://profiler.firefox.com, and

its `documentation <https://profiler.firefox.com/docs/>`_.

For just an "overview", it may look like a huge amount of information, but the

Profiler code is indeed quite expansive, so it takes a lot of words to explain

even just a high-level view of it! For on-the-spot needs, it should be possible

to search for some terms here and follow the clues. But for long-term

maintainers, it would be worth skimming this whole document to get a grasp of

the domain, and return to get some more detailed information before diving into

the code.

WIP note: This document should be correct at the time it is written, but the

profiler code constantly evolves to respond to bugs or to provide new exciting

features, so this document could become obsolete in parts! It should still be

useful as an overview, but its correctness should be verified by looking at the

actual code. If you notice any significant discrepancy or broken links, please

help by

`filing a bug <https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Gecko+Profiler>`_.

*****

Terms

*****

This is the common usage for some frequently-used terms, as understood by the

Dev Tools team. But incorrect usage can sometimes happen, context is key!

* **profiler** (a): Generic name for software that enables the profiling of

  code. (`"Profiling" on Wikipedia <https://en.wikipedia.org/wiki/Profiling_(computer_programming)>`_)

* **Profiler** (the): All parts of the profiler code inside Firefox.

* **Base Profiler** (the): Parts of the Profiler that live in

  mozglue/baseprofiler, and can be used from anywhere, but has limited

  functionality.

* **Gecko Profiler** (the): Parts of the Profiler that live in tools/profiler,

  and can only be used from other code in the XUL library.

* **Profilers** (the): Both the Base Profiler and the Gecko Profiler.

* **profiling session**: This is the time during which the profiler is running

  and collecting data.

* **profile** (a): The output from a profiling session, either as a file, or a

  shared viewable profile on https://profiler.firefox.com

* **Profiler back-end** (the): Other name for the Profiler code inside Firefox,

  to distinguish it from...

* **Profiler front-end** (the): The website https://profiler.firefox.com that

  displays profiles captured by the back-end.

* **Firefox Profiler** (the): The whole suite comprised of the back-end and front-end.

******************

Guiding Principles

******************

When working on the profiler, here are some guiding principles to keep in mind:

* Low profiling overhead in cpu and memory. For the Profiler to provide the best

  value, it should stay out of the way and consume as few resources (in time and

  memory) as possible, so as not to skew the actual Firefox code too much.

* Common data structures and code should be in the Base Profiler when possible.

  WIP note: Deduplication is slowly happening, see

  `meta bug 1557566 <https://bugzilla.mozilla.org/show_bug.cgi?id=1557566>`_.

  This document focuses on the Profiler back-end, and mainly the Gecko Profiler

  (because this is where most of the code lives, the Base Profiler is mostly a

  subset, originally just a cut-down version of the Gecko Profiler); so unless

  specified, descriptions below are about the Gecko Profiler, but know that

  there may be some equivalent code in the Base Profiler as well.

* Use appropriate programming-language features where possible to reduce coding

  errors in both our code, and our users' usage of it. In C++, this can be done

  by using a specific class/struct types for a given usage, to avoid misuse

  (e.g., an generic integer representing a **process** could be incorrectly

  given to a function expecting a **thread**; we have specific types for these

  instead, more below.)

* Follow the

  `Coding Style <https://firefox-source-docs.mozilla.org/code-quality/coding-style/index.html>`_.

* Whenever possible, write tests (if not present already) for code you add or

  modify -- but this may be too difficult in some case, use good judgement and

  at least test manually instead.

******************

Profiler Lifecycle

******************

Here is a high-level view of the Base **or** Gecko Profiler lifecycle, as part

of a Firefox run. The following sections will go into much more details.

* Profiler initialization, preparing some common data.

* Threads de/register themselves as they start and stop.

* During each User/test-controlled profiling session:

  * Profiler start, preparing data structures that will store the profiling data.

  * Periodic sampling from a separate thread, happening at a user-selected

    frequency (usually once every 1-2 ms), and recording snapshots of what

    Firefox is doing:

    * CPU sampling, measuring how much time each thread has spent actually

      running on the CPU.

    * Stack sampling, capturing a stack of functions calls from whichever leaf

      function the program is in at this point in time, up to the top-most

      caller (i.e., at least the ``main()`` function, or its callers if any).

      Note that unlike most external profilers, the Firefox Profiler back-end

      is capable or getting more useful information than just native functions

      calls (compiled from C++ or Rust):

      * Labels added by Firefox developers along the stack, usually to identify

        regions of code that perform "interesting" operations (like layout, file

        I/Os, etc.).

      * JavaScript function calls, including the level of optimization applied.

      * Java function calls.

  * At any time, Markers may record more specific details of what is happening,

    e.g.: User operations, page rendering steps, garbage collection, etc.

  * Optional profiler pause, which stops most recording, usually near the end of

    a session so that no data gets recorded past this point.

  * Profile JSON output, generated from all the recorded profiling data.

  * Profiler stop, tearing down profiling session objects.

* Profiler shutdown.

Note that the Base Profiler can start earlier, and then the data collected so

far, as well as the responsibility for periodic sampling, is handed over to the

Gecko Profiler:

#. (Firefox starts)

#. Base Profiler init

#. Base Profiler start

#. (Firefox loads the libxul library and initializes XPCOM)

#. Gecko Profiler init

#. Gecko Profiler start

#. Handover from Base to Gecko

#. Base Profiler stop

#. (Bulk of the profiling session)

#. JSON generation

#. Gecko Profiler stop

#. Gecko Profiler shutdown

#. (Firefox ends XPCOM)

#. Base Profiler shutdown

#. (Firefox exits)

Base Profiler functions that add data (mostly markers and labels) may be called

from anywhere, and will be recorded by either Profiler. The corresponding

functions in Gecko Profiler can only be called from other libxul code, and can

only be recorded by the Gecko Profiler.

Whenever possible, Gecko Profiler functions should be preferred if accessible,

as they may provide extended functionality (e.g., better stacks with JS in

markers). Otherwise fallback on Base Profiler functions.

***********

Directories

***********

* Non-Profiler supporting code

  * `mfbt <https://searchfox.org/mozilla-central/source/mfbt>`_ - Mostly

    replacements for C++ std library facilities.

  * `mozglue/misc <https://searchfox.org/mozilla-central/source/mozglue/misc>`_

    * `PlatformMutex.h <https://searchfox.org/mozilla-central/source/mozglue/misc/PlatformMutex.h>`_ -

      Mutex base classes.

    * `StackWalk.h <https://searchfox.org/mozilla-central/source/mozglue/misc/StackWalk.h>`_ -

      Stack-walking functions.

    * `TimeStamp.h <https://searchfox.org/mozilla-central/source/mozglue/misc/TimeStamp.h>`_ -

      Timestamps and time durations.

  * `xpcom <https://searchfox.org/mozilla-central/source/xpcom>`_

    * `ds <https://searchfox.org/mozilla-central/source/xpcom/ds>`_ -

      Data structures like arrays, strings.

    * `threads <https://searchfox.org/mozilla-central/source/xpcom/threads>`_ -

      Threading functions.

* Profiler back-end

  * `mozglue/baseprofiler <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler>`_ -

    Base Profiler code, usable from anywhere in Firefox. Because it lives in

    mozglue, it's loaded right at the beginning, so it's possible to start the

    profiler very early, even before Firefox loads its big&heavy "xul" library.

    * `baseprofiler's public <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/public>`_ -

      Public headers, may be #included from anywhere.

    * `baseprofiler's core <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/core>`_ -

      Main implementation code.

    * `baseprofiler's lul <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/lul>`_ -

      Special stack-walking code for Linux.

    * `../tests/TestBaseProfiler.cpp <https://searchfox.org/mozilla-central/source/mozglue/tests/TestBaseProfiler.cpp>`_ -

      Unit tests.

  * `tools/profiler <https://searchfox.org/mozilla-central/source/tools/profiler>`_ -

    Gecko Profiler code, only usable from the xul library. That library is

    loaded a short time after Firefox starts, so the Gecko Profiler is not able

    to profile the early phase of the application, Base Profiler handles that,

    and can pass its collected data to the Gecko Profiler when the latter

    starts.

    * `public <https://searchfox.org/mozilla-central/source/tools/profiler/public>`_ -

      Public headers, may be #included from most libxul code.

    * `core <https://searchfox.org/mozilla-central/source/tools/profiler/core>`_ -

      Main implementation code.

    * `gecko <https://searchfox.org/mozilla-central/source/tools/profiler/gecko>`_ -

      Control from JS, and multi-process/IPC code.

    * `lul <https://searchfox.org/mozilla-central/source/tools/profiler/lul>`_ -

      Special stack-walking code for Linux.

    * `rust-api <https://searchfox.org/mozilla-central/source/tools/profiler/rust-api>`_,

      `rust-helper <https://searchfox.org/mozilla-central/source/tools/profiler/rust-helper>`_

    * `tests <https://searchfox.org/mozilla-central/source/tools/profiler/tests>`_

  * `devtools/client/performance-new <https://searchfox.org/mozilla-central/source/devtools/client/performance-new>`_,

    `devtools/shared/performance-new <https://searchfox.org/mozilla-central/source/devtools/shared/performance-new>`_ -

    Middleware code for about:profiling and devtools panel functionality.

  * js, starting with

    `js/src/vm/GeckoProfiler.h <https://searchfox.org/mozilla-central/source/js/src/vm/GeckoProfiler.h>`_ -

    JavaScript engine support, mostly to capture JS stacks.

  * `toolkit/components/extensions/schemas/geckoProfiler.json <https://searchfox.org/mozilla-central/source/toolkit/components/extensions/schemas/geckoProfiler.json>`_ -

    File that needs to be updated when Profiler features change.

* Profiler front-end

  * Out of scope for this document, but its code and bug repository can be found at:

    https://github.com/firefox-devtools/profiler . Sometimes work needs to be

    done on both the back-end of the front-end, especially when modifying the

    back-end's JSON output format.

*******

Headers

*******

The most central public header is

`GeckoProfiler.h <https://searchfox.org/mozilla-central/source/tools/profiler/public/GeckoProfiler.h>`_,

from which almost everything else can be found, it can be a good starting point

for exploration.

It includes other headers, which together contain important top-level macros and

functions.

WIP note: GeckoProfiler.h used to be the header that contained everything!

To better separate areas of functionality, and to hopefully reduce compilation

times, parts of it have been split into smaller headers, and this work will

continue, see `bug 1681416 <https://bugzilla.mozilla.org/show_bug.cgi?id=1681416>`_.

MOZ_GECKO_PROFILER and Macros

=============================

Mozilla officially supports the Profiler on `tier-1 platforms

<https://firefox-source-docs.mozilla.org/contributing/build/supported.html>`_:

Windows, macos, Linux and Android.

There is also some code running on tier 2-3 platforms (e.g., for FreeBSD), but

the team at Mozilla is not obligated to maintain it; we do try to keep it

running, and some external contributors are keeping an eye on it and provide

patches when things do break.

To reduce the burden on unsupported platforms, a lot of the Profilers code is

only compiled when ``MOZ_GECKO_PROFILER`` is #defined. This means that some

public functions may not always be declared or implemented, and should be

surrounded by guards like ``#ifdef MOZ_GECKO_PROFILER``.

Some commonly-used functions offer an empty definition in the

non-``MOZ_GECKO_PROFILER`` case, so these functions may be called from anywhere

without guard.

Other functions have associated macros that can always be used, and resolve to

nothing on unsupported platforms. E.g.,

``PROFILER_REGISTER_THREAD`` calls ``profiler_register_thread`` where supported,

otherwise does nothing.

WIP note: There is an effort to eventually get rid of ``MOZ_GECKO_PROFILER`` and

its associated macros, see

`bug 1635350 <https://bugzilla.mozilla.org/show_bug.cgi?id=1635350>`_.

RAII "Auto" macros and classes

==============================

A number of functions are intended to be called in pairs, usually to start and

then end some operation. To ease their use, and ensure that both functions are

always called together, they usually have an associated class and/or macro that

may be called only once. This pattern of using an object's destructor to ensure

that some action always eventually happens, is called

`RAII <https://en.cppreference.com/w/cpp/language/raii>`_ in C++, with the

common prefix "auto".

E.g.: In ``MOZ_GECKO_PROFILER`` builds,

`AUTO_PROFILER_INIT <https://searchfox.org/mozilla-central/search?q=AUTO_PROFILER_INIT>`_

instantiates an

`AutoProfilerInit <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AAutoProfilerInit>`_

object, which calls ``profiler_init`` when constructed, and

``profiler_shutdown`` when destroyed.

*********************

Platform Abstractions

*********************

This section describes some platform abstractions that are used throughout the

Profilers. (Other platform abstractions will be described where they are used.)

Process and Thread IDs

======================

The Profiler back-end often uses process and thread IDs (aka "pid" and "tid"),

which are commonly just a number.

For better code correctness, and to hide specific platform details, they are

encapsulated in opaque types

`BaseProfilerProcessId <https://searchfox.org/mozilla-central/search?q=BaseProfilerProcessId>`_

and

`BaseProfilerThreadId <https://searchfox.org/mozilla-central/search?q=BaseProfilerThreadId>`_.

These types should be used wherever possible.

When interfacing with other code, they may be converted using the member

functions ``FromNumber`` and ``ToNumber``.

To find the current process or thread ID, use

`profiler_current_process_id <https://searchfox.org/mozilla-central/search?q=profiler_current_process_id>`_

or

`profiler_current_thread_id <https://searchfox.org/mozilla-central/search?q=profiler_current_thread_id>`_.

The main thread ID is available through

`profiler_main_thread_id <https://searchfox.org/mozilla-central/search?q=profiler_main_thread_id>`_

(assuming

`profiler_init_main_thread_id <https://searchfox.org/mozilla-central/search?q=profiler_init_main_thread_id>`_

was called when the application started -- especially important in stand-alone

test programs.)

And

`profiler_is_main_thread <https://searchfox.org/mozilla-central/search?q=profiler_is_main_thread>`_

is a quick way to find out if the current thread is the main thread.

Locking

=======

The locking primitives in PlatformMutex.h are not supposed to be used as-is, but

through a user-accessible implementation. For the Profilers, this is in

`BaseProfilerDetail.h <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/public/BaseProfilerDetail.h>`_.

In addition to the usual ``Lock``, ``TryLock``, and ``Unlock`` functions,

`BaseProfilerMutex <https://searchfox.org/mozilla-central/search?q=BaseProfilerMutex>`_

objects have a name (which may be helpful when debugging),

they record the thread on which they are locked (making it possible to know if

the mutex is locked on the current thread), and in ``DEBUG`` builds there are

assertions verifying that the mutex is not incorrectly used recursively, to

verify the correct ordering of different Profiler mutexes, and that it is

unlocked before destruction.

Mutexes should preferably be locked within C++ block scopes, or as class

members, by using

`BaseProfilerAutoLock <https://searchfox.org/mozilla-central/search?q=BaseProfilerAutoLock>`_.

Some classes give the option to use a mutex or not (so that single-threaded code

can more efficiently bypass locking operations), for these we have

`BaseProfilerMaybeMutex <https://searchfox.org/mozilla-central/search?q=BaseProfilerMaybeMutex>`_

and

`BaseProfilerMaybeAutoLock <https://searchfox.org/mozilla-central/search?q=BaseProfilerMaybeAutoLock>`_.

There is also a special type of shared lock (aka RWLock, see

`RWLock on wikipedia <https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock>`_),

which may be locked in multiple threads (through ``LockShared`` or preferably

`BaseProfilerAutoLockShared <https://searchfox.org/mozilla-central/search?q=BaseProfilerAutoLockShared>`_),

or locked exclusively, preventing any other locking (through ``LockExclusive`` or preferably

`BaseProfilerAutoLockExclusive <https://searchfox.org/mozilla-central/search?q=BaseProfilerAutoLockExclusive>`_).

*********************

Main Profiler Classes

*********************

Diagram showing the most important Profiler classes, see details in the

following sections:

(As noted, the "RegisteredThread" classes are now obsolete in the Gecko

Profiler, see the "Thread Registration" section below for an updated diagram and

description.)

.. image:: profilerclasses-20220913.png

***********************

Profiler Initialization

***********************

`profiler_init <https://searchfox.org/mozilla-central/search?q=symbol:_Z13profiler_initPv>`_

and

`baseprofiler::profiler_init <https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla12baseprofiler13profiler_initEPv>`_

must be called from the main thread, and are used to prepare important aspects

of the profiler, including:

* Making sure the main thread ID is recorded.

* Handling ``MOZ_PROFILER_HELP=1 ./mach run`` to display the command-line help.

* Creating the ``CorePS`` instance -- more details below.

* Registering the main thread.

* Initializing some platform-specific code.

* Handling other environment variables that are used to immediately start the

  profiler, with optional settings provided in other env-vars.

CorePS

======

The `CorePS class <https://searchfox.org/mozilla-central/search?q=symbol:T_CorePS>`_

has a single instance that should live for the duration of the Firefox

application, and contains important information that could be needed even when

the Profiler is not running.

It includes:

* A static pointer to its single instance.

* The process start time.

* JavaScript-specific data structures.

* A list of registered

  `PageInformations <https://searchfox.org/mozilla-central/search?q=symbol:T_PageInformation>`_,

  used to keep track of the tabs that this process handles.

* A list of

  `BaseProfilerCounts <https://searchfox.org/mozilla-central/search?q=symbol:T_BaseProfilerCount>`_,

  used to record things like the process memory usage.

* The process name, and optionally the "eTLD+1" (roughly sub-domain) that this

  process handles.

* In the Base Profiler only, a list of

  `RegisteredThreads <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%253A%253Abaseprofiler%253A%253ARegisteredThread>`_.

  WIP note: This storage has been reworked in the Gecko Profiler (more below),

  and in practice the Base Profiler only registers the main thread. This should

  eventually disappear as part of the de-duplication work

  (`bug 1557566 <https://bugzilla.mozilla.org/show_bug.cgi?id=1557566>`_).

*******************

Thread Registration

*******************

Threads need to register themselves in order to get fully profiled.

This section describes the main data structures that record the list of

registered threads and their data.

WIP note: There is some work happening to add limited profiling of unregistered

threads, with the hope that more and more functionality could be added to

eventually use the same registration data structures.

Diagram showing the relevant classes, see details in the following sub-sections:

.. image:: profilerthreadregistration-20220913.png

ProfilerThreadRegistry

======================

The

`static ProfilerThreadRegistry object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3Aprofiler%3A%3AThreadRegistry>`_

contains a list of ``OffThreadRef`` objects.

Each ``OffThreadRef`` points to a ``ProfilerThreadRegistration``, and restricts

access to a safe subset of the thread data, and forces a mutex lock if necessary

(more information under ProfilerThreadRegistrationData below).

ProfilerThreadRegistration

==========================

`ProfilerThreadRegistration object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3Aprofiler%3A%3AThreadRegistration>`_

contains a lot of information relevant to its thread, to help with profiling it.

This data is accessible from the thread itself through an ``OnThreadRef``

object, which points to the ``ThreadRegistration``, and restricts access to a

safe subset of thread data, and forces a mutex lock if necessary (more

information under ProfilerThreadRegistrationData below).

ThreadRegistrationData and accessors

====================================

`The ProfilerThreadRegistrationData.h header <https://searchfox.org/mozilla-central/source/tools/profiler/public/ProfilerThreadRegistrationData.h>`_

contains a hierarchy of classes that encapsulate all the thread-related data.

``ThreadRegistrationData`` contains all the actual data members, including:

* Some long-lived

  `ThreadRegistrationInfo <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%253A%253Aprofiler%253A%253AThreadRegistrationInfo>`_,

  containing the thread name, its registration time, the thread ID, and whether

  it's the main thread.

* A ``ProfilingStack`` that gathers developer-provided pseudo-frames, and JS

  frames.

* Some platform-specific ``PlatformData`` (usually required to actually record

  profiling measurements for that thread).

* A pointer to the top of the stack.

* A shared pointer to the thread's ``nsIThread``.

* A pointer to the ``JSContext``.

* An optional pre-allocated ``JsFrame`` buffer used during stack-sampling.

* Some JS flags.

* Sleep-related data (to avoid costly sampling while the thread is known to not

  be doing anything).

* The current ``ThreadProfilingFeatures``, to know what kind of data to record.

* When profiling, a pointer to a ``ProfiledThreadData``, which contains some

  more data needed during and just after profiling.

As described in their respective code comments, each data member is supposed to

be accessed in certain ways, e.g., the ``JSContext`` should only be "written

from thread, read from thread and suspended thread". To enforce these rules,

data members can only be accessed through certain classes, which themselves can

only be instantiated in the correct conditions.

The accessor classes are, from base to most-derived:

* ``ThreadRegistrationData``, not an accessor itself, but it's the base class

  with all the ``protected`` data.

* ``ThreadRegistrationUnlockedConstReader``, giving unlocked ``const`` access to

   the ``ThreadRegistrationInfo``, ``PlatformData``, and stack top.

* ``ThreadRegistrationUnlockedConstReaderAndAtomicRW``, giving unlocked

  access to the atomic data members: ``ProfilingStack``, sleep-related data,

  ``ThreadProfilingFeatures``.

* ``ThreadRegistrationUnlockedRWForLockedProfiler``, giving access that's

  protected by the Profiler's main lock, but doesn't require a

  ``ThreadRegistration`` lock, to the ``ProfiledThreadData``

* ``ThreadRegistrationUnlockedReaderAndAtomicRWOnThread``, giving unlocked

  mutable access, but only on the thread itself, to the ``JSContext``.

* ``ThreadRegistrationLockedRWFromAnyThread``, giving locked access from any

  thread to mutex-protected data: ``ThreadProfilingFeatures``, ``JsFrame``,

  ``nsIThread``, and the JS flags.

* ``ThreadRegistrationLockedRWOnThread``, giving locked access, but only from

  the thread itself, to the ``JSContext`` and a JS flag-related operation.

* ``ThreadRegistration::EmbeddedData``, containing all of the above, and stored

  as a data member in each ``ThreadRegistration``.

To recapitulate, if some code needs some data on the thread, it can use

``ThreadRegistration`` functions to request access (with the required rights,

like a mutex lock).

To access data about another thread, use similar functions from

``ThreadRegistry`` instead.

You may find some examples in the implementations of the functions in

ProfilerThreadState.h (see the following section).

ProfilerThreadState.h functions

===============================

The

`ProfilerThreadState.h <https://searchfox.org/mozilla-central/source/tools/profiler/public/ProfilerThreadState.h>`_

header provides a few helpful functions related to threads, including:

* ``profiler_is_active_and_thread_is_registered``

* ``profiler_thread_is_being_profiled`` (for the current thread or another

  thread, and for a given set of features)

* ``profiler_thread_is_sleeping``

**************

Profiler Start

**************

There are multiple ways to start the profiler, through command line env-vars,

and programmatically in C++ and JS.

The main public C++ function is

`profiler_start <https://searchfox.org/mozilla-central/search?q=symbol:_Z14profiler_startN7mozilla10PowerOfTwoIjEEdjPPKcjyRKNS_5MaybeIdEE%2C_Z14profiler_startN7mozilla10PowerOfTwoIjEEdjPPKcjmRKNS_5MaybeIdEE>`_.

It takes all the features specifications, and returns a promise that gets

resolved when the Profiler has fully started in all processes (multi-process

profiling is described later in this document, for now the focus will be on each

process running its instance of the Profiler). It first calls ``profiler_init``

if needed, and also ``profiler_stop`` if the profiler was already running.

The main implementation, which can be called from multiple sources, is

`locked_profiler_start <https://searchfox.org/mozilla-central/search?q=locked_profiler_start>`_.

It performs a number of operations to start the profiling session, including:

* Record the session start time.

* Pre-allocate some work buffer to capture stacks for markers on the main thread.

* In the Gecko Profiler only: If the Base Profiler was running, take ownership

  of the data collected so far, and stop the Base Profiler (we don't want both

  trying to collect the same data at the same time!)

* Create the ActivePS, which keeps track of most of the profiling session

  information, more about it below.

* For each registered thread found in the ``ThreadRegistry``, check if it's one

  of the threads to profile, and if yes set the appropriate data into the

  corresponding ``ThreadRegistrationData`` (including informing the JS engine to

  start recording profiling data).

* On Android, start the Java sampler.

* If native allocations are to be profiled, setup the appropriate hooks.

* Start the audio callback tracing if requested.

* Set the public shared "active" state, used by many functions to quickly assess

  whether to actually record profiling data.

ActivePS

========

The `ActivePS class <https://searchfox.org/mozilla-central/search?q=symbol:T_ActivePS>`_

has a single instance at a time, that should live for the length of the

profiling session.

It includes:

* The session start time.

* A way to track "generations" (in case an old ActivePS still lives when the

  next one starts, so that in-flight data goes to the correct place.)

* Requested features: Buffer capacity, periodic sampling interval, feature set,

  list of threads to profile, optional: specific tab to profile.

* The profile data storage buffer and its chunk manager (see "Storage" section

  below for details.)

* More data about live and dead profiled threads.

* Optional counters for per-process CPU usage, and power usage.

* A pointer to the ``SamplerThread`` object (see "Periodic Sampling" section

  below for details.)

*******

Storage

*******

During a session, the profiling data is serialized into a buffer, which is made

of "chunks", each of which contains "blocks", which have a size and the "entry"

data.

During a profiling session, there is one main profile buffer, which may be

started by the Base Profiler, and then handed over to the Gecko Profiler when

the latter starts.

The buffer is divided in chunks of equal size, which are allocated before they

are needed. When the data reaches a user-set limit, the oldest chunk is

recycled. This means that for long-enough profiling sessions, only the most

recent data (that could fit under the limit) is kept.

Each chunk stores a sequence of blocks of variable length. The chunk itself

only knows where the first full block starts, and where the last block ends,

which is where the next block will be reserved.

To add an entry to the buffer, a block is reserved, the size is written first

(so that readers can find the start of the next block), and then the entry bytes

are written.

The following sessions give more technical details.

leb128iterator.h

================

`This utility header <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/public/leb128iterator.h>`_

contains some functions to read and write unsigned "LEB128" numbers

(`LEB128 on wikipedia <https://en.wikipedia.org/wiki/LEB128>`_).

They are an efficient way to serialize numbers that are usually small, e.g.,

numbers up to 127 only take one byte, two bytes up to 16,383, etc.

ProfileBufferBlockIndex

=======================

`A ProfileBufferBlockIndex object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferBlockIndex>`_

encapsulates a block index that is known to be the valid start of a block. It is

created when a block is reserved, or when trusted code computes the start of a

block in a chunk.

The more generic

`ProfileBufferIndex <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferIndex>`_

type is used when working inside blocks.

ProfileBufferChunk

==================

`A ProfileBufferChunk <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferChunk>`_

is a variable-sized object. It contains:

* A public copyable header, itself containing:

  * The local offset to the first full block (a chunk may start with the end of

    a block that was started at the end of the previous chunk). That offset in

    the very first chunk is the natural start to read all the data in the

    buffer.

  * The local offset past the last reserved block. This is where the next block

    should be reserved, unless it points past the end of this chunk size.

  * The timestamp when the chunk was first used.

  * The timestamp when the chunk became full.

  * The number of bytes that may be stored in this chunk.

  * The number of reserved blocks.

  * The global index where this chunk starts.

  * The process ID writing into this chunk.

* An owning unique pointer to the next chunk. It may be null for the last chunk

  in a chain.

* In ``DEBUG`` builds, a state variable, which is used to ensure that the chunk

  goes through a known sequence of states (e.g., Created, then InUse, then

  Done, etc.) See the sequence diagram

  `where the member variable is defined <https://searchfox.org/mozilla-central/search?q=symbol:F_%3CT_mozilla%3A%3AProfileBufferChunk%3A%3AInternalHeader%3E_mState>`_.

* The actual buffer data.

Because a ProfileBufferChunk is variable-size, it must be created through its

static ``Create`` function, which takes care of allocating the correct amount

of bytes, at the correct alignment.

Chunk Managers

==============

ProfilerBufferChunkManager

--------------------------

`The ProfileBufferChunkManager abstract class <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferChunkManager>`_

defines the interface of classes that manage chunks.

Concrete implementations are responsible for:

* Creating chunks for their user, with a mechanism to pre-allocate chunks before they are actually needed.

* Taking back and owning chunks when they are "released" (usually when full).

* Automatically destroying or recycling the oldest released chunks.

* Giving temporary access to extant released chunks.

ProfileBufferChunkManagerSingle

-------------------------------

`A ProfileBufferChunkManagerSingle object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferChunkManagerSingle>`_

manages a single chunk.

That chunk is always the same, it is never destroyed. The user may use it and

optionally release it. The manager can then be reset, and that one chunk will

be available again for use.

A request for a second chunk would always fail.

This manager is short-lived and not thread-safe. It is useful when there is some

limited data that needs to be captured without blocking the global profiling

buffer, usually one stack sample. This data may then be extracted and quickly

added to the global buffer.

ProfileBufferChunkManagerWithLocalLimit

---------------------------------------

`A ProfileBufferChunkManagerWithLocalLimit object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferChunkManagerSingle>`_

implements the ``ProfileBufferChunkManager`` interface fully, managing a number

of chunks, and making sure their total combined size stays under a given limit.

This is the main chunk manager user during a profiling session.

Note: It also implements the ``ProfileBufferControlledChunkManager`` interface,

this is explained in the later section "Multi-Process Profiling".

It is thread-safe, and one instance is shared by both Profilers.

ProfileChunkedBuffer

====================

`A ProfileChunkedBuffer object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileChunkedBuffer>`_

uses a ``ProfilerBufferChunkManager`` to store data, and handles the different

C++ types of data that the Profilers want to read/write as entries in buffer

chunks.

Its main function is ``ReserveAndPut``:

* It takes an invocable object (like a lambda) that should return the size of

  the entry to store, this is to potentially avoid costly operations just to

  compute a size, when the profiler may not be running.

* It attempts to reserve the space in its chunks, requesting a new chunk if

  necessary.

* It then calls a provided invocable object with a

  `ProfileBufferEntryWriter <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferEntryWriter>`_,

  which offers a range of functions to help serialize C++ objects. The

  de/serialization functions are found in specializations of

  `ProfileBufferEntryWriter::Serializer <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferEntryWriter%3A%3ASerializer>`_

and

  `ProfileBufferEntryReader::Deserializer <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferEntryReader%3A%3ADeserializer>`_.

More "put" functions use ``ReserveAndPut`` to more easily serialize blocks of

memory, or C++ objects.

``ProfileChunkedBuffer`` is optionally thread-safe, using a

``BaseProfilerMaybeMutex``.

WIP note: Using a mutex makes this storage too noisy for profiling some

real-time (like audio processing).

`Bug 1697953 <https://bugzilla.mozilla.org/show_bug.cgi?id=1697953>`_ will look

at switching to using atomic variables instead.

An alternative would be to use a totally separate non-thread-safe buffers for

each real-time thread that requires it (see

`bug 1754889 <https://bugzilla.mozilla.org/show_bug.cgi?id=1754889>`_).

ProfileBuffer

=============

`A ProfileBuffer object <https://searchfox.org/mozilla-central/search?q=symbol:T_ProfileBuffer>`_

uses a ``ProfileChunkedBuffer`` to store data, and handles the different kinds

of entries that the Profilers want to read/write.

Each entry starts with a tag identifying a kind. These kinds can be found in

`ProfileBufferEntryKinds.h <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/public/ProfileBufferEntryKinds.h>`_.

There are "legacy" kinds, which are small fixed-length entries, such as:

Categories, labels, frame information, counters, etc. These can be stored in

`ProfileBufferEntry objects <https://searchfox.org/mozilla-central/search?q=symbol:T_ProfileBufferEntry>`_

And there are "modern" kinds, which have variable sizes, such as: Markers, CPU

running times, full stacks, etc. These are more directly handled by code that

can access the underlying ``ProfileChunkedBuffer``.

The other major responsibility of a ``ProfileChunkedBuffer`` is to read back all

this data, sometimes during profiling (e.g., to duplicate a stack), but mainly

at the end of a session when generating the output JSON profile.

*****************

Periodic Sampling

*****************

Probably the most important job of the Profiler is to sample stacks of a number

of running threads, to help developers know which functions get used a lot when

performing some operation on Firefox.

This is accomplished from a special thread, which regularly springs into action

and captures all this data.

SamplerThread

=============

`The SamplerThread object <https://searchfox.org/mozilla-central/search?q=symbol:T_SamplerThread>`_

manages the information needed during sampling. It is created when the profiler

starts, and is stored inside the ``ActivePS``, see above for details.

It includes:

* A ``Sampler`` object that contains platform-specific details, which are

  implemented in separate files like platform-win32.cpp, etc.

* The same generation index as its owning ``ActivePS``.

* The requested interval between samples.

* A handle to the thread where the sampling happens, its main function is

  `Run() function <https://searchfox.org/mozilla-central/search?q=symbol:_ZN13SamplerThread3RunEv>`_.

* A list of callbacks to invoke after the next sampling. These may be used by

  tests to wait for sampling to actually happen.

* The unregistered-thread-spy data, and an optional handle on another thread

  that takes care of "spying" on unregistered thread (on platforms where that

  operation is too expensive to run directly on the sampling thread).

The ``Run()`` function takes care of performing the periodic sampling work:

(more details in the following sections)

* Retrieve the sampling parameters.

* Instantiate a ``ProfileBuffer`` on the stack, to capture samples from other threads.

* Loop until a ``break``:

  * Lock the main profiler mutex, and do:

    * Check if sampling should stop, and break from the loop.

    * Clean-up exit profiles (these are profiles sent from dying sub-processes,

      and are kept for as long as they overlap with this process' own buffer range).

    * Record the CPU utilization of the whole process.

    * Record the power consumption.

    * Sample each registered counter, including the memory counter.

    * For each registered thread to be profiled:

      * Record the CPU utilization.

      * If the thread is marked as "still sleeping", record a "same as before"

        sample, otherwise suspend the thread and take a full stack sample.

      * On some threads, record the event delay to compute the

        (un)responsiveness. WIP note: This implementation may change.

    * Record profiling overhead durations.

  * Unlock the main profiler mutex.

  * Invoke registered post-sampling callbacks.

  * Spy on unregistered threads.

  * Based on the requested sampling interval, and how much time this loop took,

    compute when the next sampling loop should start, and make the thread sleep

    for the appropriate amount of time. The goal is to be as regular as

    possible, but if some/all loops take too much time, don't try too hard to

    catch up, because the system is probably under stress already.

  * Go back to the top of the loop.

* If we're here, we hit a loop ``break`` above.

* Invoke registered post-sampling callbacks, to let them know that sampling

  stopped.

CPU Utilization

===============

CPU Utilization is stored as a number of milliseconds that a thread or process

has spent running on the CPU since the previous sampling.

Implementations are platform-dependent, and can be found in

`the GetThreadRunningTimesDiff function <https://searchfox.org/mozilla-central/search?q=symbol:_ZL25GetThreadRunningTimesDiffRK10PSAutoLockRN7mozilla8profiler45ThreadRegistrationUnlockedRWForLockedProfilerE>`_

and

`the GetProcessRunningTimesDiff function <https://searchfox.org/mozilla-central/search?q=symbol:_ZL26GetProcessRunningTimesDiffRK10PSAutoLockR12RunningTimes>`_.

Power Consumption

=================

Energy probes added in 2022.

Stacks

======

Stacks are the sequence of calls going from the entry point in the program

(generally ``main()`` and some OS-specific functions above), down to the

function where code is currently being executed.

Native Frames

-------------

Compiled code, from C++ and Rust source.

Label Frames

------------

Pseudo-frames with arbitrary text, added from any language, mostly C++.

JS, Wasm Frames

---------------

Frames corresponding to JavaScript functions.

Java Frames

-----------

Recorded by the JavaSampler.

Stack Merging

-------------

The above types of frames are all captured in different ways, and when finally

taking an actual stack sample (apart from Java), they get merged into one stack.

All frames have an associated address in the call stack, and can therefore be

merged mostly by ordering them by this stack address. See

`MergeStacks <https://searchfox.org/mozilla-central/search?q=symbol:_ZL11MergeStacksjbRKN7mozilla8profiler51ThreadRegistrationUnlockedReaderAndAtomicRWOnThreadERK9RegistersRK11NativeStackR22ProfilerStackCollectorPN2JS22ProfilingFrameIterator5FrameEj>`_

for the implementation details.

Counters

========

Counters are a special kind of probe, which can be continuously updated during

profiling, and the ``SamplerThread`` will sample their value at every loop.

Memory Counter

--------------

This is the main counter. During a profiling session, hooks into the memory

manager keep track of each de/allocation, so at each sampling we know how many

operations were performed, and what is the current memory usage compared to the

previous sampling.

Profiling Overhead

==================

The ``SamplerThread`` records timestamps between parts of its sampling loop, and

records this as the sampling overhead. This may be useful to determine if the

profiler itself may have used too much of the computer resources, which could

skew the profile and give wrong impressions.

Unregistered Thread Profiling

=============================

At some intervals (not necessarily every sampling loop, depending on the OS),

the profiler may attempt to find unregistered threads, and record some

information about them.

WIP note: This feature is experimental, and data is captured in markers on the

main thread. More work is needed to put this data in tracks like regular

registered threads, and capture more data like stack samples and markers.

*******

Markers

*******

Markers are events with a precise timestamp or time range, they have a name, a

category, options (out of a few choices), and optional marker-type-specific

payload data.

Before describing the implementation, it is useful to be familiar with how

markers are natively added from C++, because this drives how the implementation

takes all this information and eventually outputs it in the final JSON profile.

Adding Markers from C++

=======================

See https://firefox-source-docs.mozilla.org/tools/profiler/markers-guide.html

Implementation

==============

The main function that records markers is

`profiler_add_marker <https://searchfox.org/mozilla-central/search?q=symbol:_Z19profiler_add_markerRKN7mozilla18ProfilerStringViewIcEERKNS_14MarkerCategoryEONS_13MarkerOptionsET_DpRKT0_>`_.

It's a variadic templated function that takes the different the expected

arguments, first checks if the marker should actually be recorded (the profiler

should be running, and the target thread should be profiled), and then calls

into the deeper implementation function ``AddMarkerToBuffer`` with a reference

to the main profiler buffer.

`AddMarkerToBuffer <https://searchfox.org/mozilla-central/search?q=symbol:_Z17AddMarkerToBufferRN7mozilla20ProfileChunkedBufferERKNS_18ProfilerStringViewIcEERKNS_14MarkerCategoryEONS_13MarkerOptionsET_DpRKT0_>`_

takes the marker type as an object, removes it from the function parameter list,

and calls the next function with the marker type as an explicit template

parameter, and also a pointer to the function that can capture the stack

(because it is different between Base and Gecko Profilers, in particular the

latter one knows about JS).

From here, we enter the land of

`BaseProfilerMarkersDetail.h <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/public/BaseProfilerMarkersDetail.h>`_,

which employs some heavy template techniques, in order to most efficiently

serialize the given marker payload arguments, in order to make them

deserializable when outputting the final JSON. In previous implementations, for

each new marker type, a new C++ class derived from a payload abstract class was

required, that had to implement all the constructors and virtual functions to:

* Create the payload object.

* Serialize the payload into the profile buffer.

* Deserialize from the profile buffer to a new payload object.

* Convert the payload into the final output JSON.

Now, the templated functions automatically take care of serializing all given

function call arguments directly (instead of storing them somewhere first), and

preparing a deserialization function that will recreate them on the stack and

directly call the user-provided JSONification function with these arguments.

Continuing from the public ``AddMarkerToBuffer``,

`mozilla::base_profiler_markers_detail::AddMarkerToBuffer <https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla28base_profiler_markers_detail17AddMarkerToBufferERNS_20ProfileChunkedBufferERKNS_18ProfilerStringViewIcEERKNS_14MarkerCategoryEONS_13MarkerOptionsEPFbS2_NS_19StackCaptureOptionsEEDpRKT0_>`_

sets some defaults if not specified by the caller: Target the current thread,

use the current time.

Then if a stack capture was requested, attempt to do it in

the most efficient way, using a pre-allocated buffer if possible.

WIP note: This potential allocation should be avoided in time-critical thread.

There is already a buffer for the main thread (because it's the busiest thread),

but there could be more pre-allocated threads, for specific real-time thread

that need it, or picked from a pool of pre-allocated buffers. See

`bug 1578792 <https://bugzilla.mozilla.org/show_bug.cgi?id=1578792>`_.

From there, `AddMarkerWithOptionalStackToBuffer <https://searchfox.org/mozilla-central/search?q=AddMarkerWithOptionalStackToBuffer>`_

handles ``NoPayload`` markers (usually added with ``PROFILER_MARKER_UNTYPED``)

in a special way, mostly to avoid the extra work associated with handling

payloads. Otherwise it continues with the following function.

`MarkerTypeSerialization<MarkerType>::Serialize <symbol:_ZN7mozilla28base_profiler_markers_detail23MarkerTypeSerialization9SerializeERNS_20ProfileChunkedBufferERKNS_18ProfilerStringViewIcEERKNS_14MarkerCategoryEONS_13MarkerOptionsEDpRKTL0__>`_

retrieves the deserialization tag associated with the marker type. If it's the

first time this marker type is used,

`Streaming::TagForMarkerTypeFunctions <symbol:_ZN7mozilla28base_profiler_markers_detail9Streaming25TagForMarkerTypeFunctionsEPFvRNS_24ProfileBufferEntryReaderERNS_12baseprofiler20SpliceableJSONWriterEEPFNS_4SpanIKcLy18446744073709551615EEEvEPFNS_12MarkerSchemaEvE,_ZN7mozilla28base_profiler_markers_detail9Streaming25TagForMarkerTypeFunctionsEPFvRNS_24ProfileBufferEntryReaderERNS_12baseprofiler20SpliceableJSONWriterEEPFNS_4SpanIKcLm18446744073709551615EEEvEPFNS_12MarkerSchemaEvE,_ZN7mozilla28base_profiler_markers_detail9Streaming25TagForMarkerTypeFunctionsEPFvRNS_24ProfileBufferEntryReaderERNS_12baseprofiler20SpliceableJSONWriterEEPFNS_4SpanIKcLj4294967295EEEvEPFNS_12MarkerSchemaEvE>`_

adds it to the global list (which stores some function pointers used during

deserialization).

Then the main serialization happens in

`StreamFunctionTypeHelper<decltype(MarkerType::StreamJSONMarkerData)>::Serialize <symbol:_ZN7mozilla28base_profiler_markers_detail24StreamFunctionTypeHelperIFT_RNS_12baseprofiler20SpliceableJSONWriterEDpT0_EE9SerializeERNS_20ProfileChunkedBufferERKNS_18ProfilerStringViewIcEERKNS_14MarkerCategoryEONS_13MarkerOptionsEhDpRKS6_>`_.

Deconstructing this mouthful of an template:

* ``MarkerType::StreamJSONMarkerData`` is the user-provided function that will

  eventually produce the final JSON, but here it's only used to know the

  parameter types that it expects.

* ``StreamFunctionTypeHelper`` takes that function prototype, and can extract

  its argument by specializing on ```R(SpliceableJSONWriter&, As...)``, now

  ``As...`` is a parameter pack matching the function parameters.

* Note that ``Serialize`` also takes a parameter pack, which contains all the

  referenced arguments given to the top ``AddBufferToMarker`` call. These two

  packs are supposed to match, at least the given arguments should be

  convertible to the target pack parameter types.

* That specialization's ``Serialize`` function calls the buffer's ``PutObjects``

  variadic function to write all the marker data, that is:

  * The entry kind that must be at the beginning of every buffer entry, in this

    case `ProfileBufferEntryKind::Marker <https://searchfox.org/mozilla-central/source/mozglue/baseprofiler/public/ProfileBufferEntryKinds.h#78>`_.

  * The common marker data (options first, name, category, deserialization tag).

  * Then all the marker-type-specific arguments. Note that the C++ types

    are those extracted from the deserialization function, so we know that

    whatever is serialized here can be later deserialized using those same

    types.

The deserialization side is described in the later section "JSON output of

Markers".

Adding Markers from Rust

========================

See https://firefox-source-docs.mozilla.org/tools/profiler/instrumenting-rust.html#adding-markers

Adding Markers from JS

======================

See https://firefox-source-docs.mozilla.org/tools/profiler/instrumenting-javascript.html

Adding Markers from Java

========================

See https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/geckoview/ProfilerController.java

*************

Profiling Log

*************

During a profiling session, some profiler-related events may be recorded using

`ProfilingLog::Access <https://searchfox.org/mozilla-central/search?q=symbol:_ZN12ProfilingLog6AccessEOT_>`_.

The resulting JSON object is added near the end of the process' JSON generation,

in a top-level property named "profilingLog". This object is free-form, and is

not intended to be displayed, or even read by most people. But it may include

interesting information for advanced users, or could be an early temporary

prototyping ground for new features.

See "profileGatheringLog" for another log related to late events.

WIP note: This was introduced shortly before this documentation, so at this time

it doesn't do much at all.

***************

Profile Capture

***************

Usually at the end of a profiling session, a profile is "captured", and either

saved to disk, or sent to the front-end https://profiler.firefox.com for

analysis. This section describes how the captured data is converted to the

Gecko Profiler JSON format.

FailureLatch

============

`The FailureLatch interface <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AFailureLatch>`_

is used during the JSON generation, in order to catch any unrecoverable error

(such as running Out Of Memory), to exit the process early, and to forward the

error to callers.

There are two main implementations, suffixed "source" as they are the one source

of failure-handling, which is passed as ``FailureLatch&`` throughout the code:

* `FailureLatchInfallibleSource <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AFailureLatchInfallibleSource>`_

  is an "infallible" latch, meaning that it doesn't expect any failure. So if

  a failure actually happened, the program would immediately terminate! (This

  was the default behavior prior to introducing these latches.)

* `FailureLatchSource <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AFailureLatchSource>`_

  is a "fallible" latch, it will record the first failure that happens, and

  "latch" into the failure state. The code should regularly examine this state,

  and return early when possible. Eventually this failure state may be exposed

  to end users.

ProgressLogger, ProportionValue

===============================

`A ProgressLogger object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProgressLogger>`_

is used to track the progress of a long operation, in this case the JSON

generation process.

To match how the JSON generation code works (as a tree of C++ functions calls),

each ``ProgressLogger`` in a function usually records progress from 0 to 100%

locally inside that function. If that function calls a sub-function, it gives it

a sub-logger, which in the caller function is set to represent a local sub-range

(like 20% to 40%), but to the called function it will look like its own local

``ProgressLogger`` that goes from 0 to 100%. The very top ``ProgressLogger``

converts the deepest local progress value to the corresponding global progress.

Progress values are recorded in

`ProportionValue objects <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProportionValue>`_,

which effectively record fractional value with no loss of precision.

This progress is most useful when the parent process is waiting for child

processes to do their work, to make sure progress does happen, otherwise to stop

waiting for frozen processes. More about that in the "Multi-Process Profiling"

section below.

JSONWriter

==========

`A JSONWriter object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AJSONWriter>`_

offers a simple way to create a JSON stream (start/end collections, add

elements, etc.), and calls back into a provided

`JSONWriteFunc interface <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AJSONWriteFunc>`_

to output characters.

While these classes live outside of the Profiler directories, it may sometimes be

worth maintaining and/or modifying them to better serve the Profiler's needs.

But there are other users, so be careful not to break other things!

SpliceableJSONWriter and SpliceableChunkedJSONWriter

====================================================

Because the Profiler deals with large amounts of data (big profiles can take

tens to hundreds of megabytes!), some specialized wrappers add better handling

of these large JSON streams.

`SpliceableJSONWriter <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3Abaseprofiler%3A%3ASpliceableJSONWriter>`_

is a subclass of ``JSONWriter``, and allows the "splicing" of JSON strings,

i.e., being able to take a whole well-formed JSON string, and directly inserting

it as a JSON object in the target JSON being streamed.

It also offers some functions that are often useful for the Profiler, such as:

* Converting a timestamp into a JSON object in the stream, taking care of keeping a nanosecond precision, without unwanted zeroes or nines at the end.

* Adding a number of null elements.

* Adding a unique string index, and add that string to a provided unique-string list if necessary. (More about UniqueStrings below.)

`SpliceableChunkedJSONWriter <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3Abaseprofiler%3A%3ASpliceableChunkedJSONWriter>`_

is a subclass of ``SpliceableJSONWriter``. Its main attribute is that it provides its own writer

(`ChunkedJSONWriteFunc <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3Abaseprofiler%3A%3AChunkedJSONWriteFunc>`_),

which stores the stream as a sequence of "chunks" (heap-allocated buffers).

It starts with a chunk of a default size, and writes incoming data into it,

later allocating more chunks as needed. This avoids having massive buffers being

resized all the time.

It also offers the same splicing abilities as its parent class, but in case an

incoming JSON string comes from another ``SpliceableChunkedJSONWriter``, it's

able to just steal the chunks and add them to its list, thereby avoiding

expensive allocations and copies and destructions.

UniqueStrings

=============

Because a lot of strings would be repeated in profiles (e.g., frequent marker

names), such strings are stored in a separate JSON array of strings, and an

index into this list is used instead of that full string object.

Note that these unique-string indices are currently only located in specific

spots in the JSON tree, they cannot be used just anywhere strings are accepted.

`The UniqueJSONStrings class <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3Abaseprofiler%3A%3AUniqueJSONStrings>`_

stores this list of unique strings in a ``SpliceableChunkedJSONWriter``.

Given a string, it takes care of storing it if encountered for the first time,

and inserts the index into a target ``SpliceableJSONWriter``.

JSON Generation

===============

The "Gecko Profile Format" can be found at

https://github.com/firefox-devtools/profiler/blob/main/docs-developer/gecko-profile-format.md .

The implementation in the back-end is

`locked_profiler_stream_json_for_this_process <https://searchfox.org/mozilla-central/search?q=locked_profiler_stream_json_for_this_process>`_.

It outputs each JSON top-level JSON object, mostly in sequence. See the code for

how each object is output. Note that there is special handling for samples and

markers, as explained in the following section.

ProcessStreamingContext and ThreadStreamingContext

--------------------------------------------------

In JSON profiles, samples and markers are separated by thread and by

samples/markers. Because there are potentially tens to a hundred threads, it

would be very costly to read the full profile buffer once for each of these

groups. So instead the buffer is read once, and all samples and markers are

handled as they are read, and their JSON output is sent to separate JSON

writers.

`A ProcessStreamingContext object <https://searchfox.org/mozilla-central/search?q=symbol:T_ProcessStreamingContext>`_

contains all the information to facilitate this output, including a list of

`ThreadStreamingContext's <https://searchfox.org/mozilla-central/search?q=symbol:T_ThreadStreamingContext>`_,

which each contain one ``SpliceableChunkedJSONWriter`` for the samples, and one

for the markers in this thread.

When reading entries from the profile buffer, samples and markers are found by

their ``ProfileBufferEntryKind``, and as part of deserializing either kind (more

about each below), the thread ID is read, and determines which

``ThreadStreamingContext`` will receive the JSON output.

At the end of this process, all ``SpliceableChunkedJSONWriters`` are efficiently

spliced (mainly a pointer move) into the final JSON output.

JSON output of Samples

----------------------

This work is done in

`ProfileBuffer::DoStreamSamplesAndMarkersToJSON <https://searchfox.org/mozilla-central/search?q=DoStreamSamplesAndMarkersToJSON>`_.

From the main ``ProfileChunkedBuffer``, each entry is visited, its

``ProfileBufferEntryKind`` is read first, and for samples all frames from

captured stack are converted to the appropriate JSON.

`A UniqueStacks object <https://searchfox.org/mozilla-central/search?q=symbol:T_UniqueStacks>`_

is used to de-duplicate frames and even sub-stacks:

* Each unique frame string is written into a JSON array inside a

  ``SpliceableChunkedJSONWriter``, and its index is the frame identifier.

* Each stack level is also de-duplicated, and identifies the associated frame

  string, and points at the calling stack level (i.e., closer to the root).

* Finally, the identifier for the top of the stack is stored, along with a

  timestamp (and potentially some more information) as the sample.

For example, if we have collected the following samples:

#. A -> B -> C

#. A -> B

#. A -> B -> D

The frame table would contain each frame name, something like:

``["A", "B", "C", "D"]``. So the frame containing "A" has index 0, "B" is at 1,

etc.

The stack table would contain each stack level, something like:

``[[0, null], [1, 0], [2, 1], [3, 1]]``. ``[0, null]`` means the frame is 0

("A"), and it has no caller, it's the root frame. ``[1, 0]`` means the frame is

1 ("B"), and its caller is stack 0, which is just the previous one in this

example.

And the three samples stored in the thread data would be therefore be: 2, 1, 3

(E.g.: "2" points in the stack table at the frame [2,1] with "C", and from them

down to "B", then "A").

All this contains all the information needed to reconstruct all full stack

samples.

JSON output of Markers

----------------------

This also happens

`inside ProfileBuffer::DoStreamSamplesAndMarkersToJSON <https://searchfox.org/mozilla-central/search?q=DoStreamSamplesAndMarkersToJSON>`_.

When a ``ProfileBufferEntryKind::Marker`` is encountered,

`the DeserializeAfterKindAndStream function <https://searchfox.org/mozilla-central/search?q=DeserializeAfterKindAndStream>`_

reads the ``MarkerOptions`` (stored as explained above), which include the

thread ID, identifying which ``ThreadStreamingContext``'s

``SpliceableChunkedJSONWriter`` to use.

After that, the common marker data (timing, category, etc.) is output.

Then the ``Streaming::DeserializerTag`` identifies which type of marker this is.

The special case of ``0`` (no payload) means nothing more is output.

Otherwise some more common data is output as part of the payload if present, in

particular the "inner window id" (used to match markers with specific html

frames), and stack.

WIP note: Some of these may move around in the future, see

`bug 1774326 <https://bugzilla.mozilla.org/show_bug.cgi?id=1774326>`_,

`bug 1774328 <https://bugzilla.mozilla.org/show_bug.cgi?id=1774328>`_, and

others.

In case of a C++-written payload, the ``DeserializerTag`` identifies the

``MarkerDataDeserializer`` function to use. This is part of the heavy templated

code in BaseProfilerMarkersDetail.h, the function is defined as

`MarkerTypeSerialization<MarkerType>::Deserialize <https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla28base_profiler_markers_detail23MarkerTypeSerialization11DeserializeERNS_24ProfileBufferEntryReaderERNS_12baseprofiler20SpliceableJSONWriterE>`_,

which outputs the marker type name, and then each marker payload argument. The

latter is done by using the user-defined ``MarkerType::StreamJSONMarkerData``

parameter list, and recursively deserializing each parameter from the profile

buffer into an on-stack variable of a corresponding type, at the end of which

``MarkerType::StreamJSONMarkerData`` can be called with all of these arguments

at it expects, and that function does the actual JSON streaming as the user

programmed.

*************

Profiler Stop

*************

See "Profiler Start" and do the reverse!

There is some special handling of the ``SampleThread`` object, just to ensure

that it gets deleted outside of the main profiler mutex being locked, otherwise

this could result in a deadlock (because it needs to take the lock before being

able to check the state variable indicating that the sampling loop and thread

should end).

*****************

Profiler Shutdown

*****************

See "Profiler Initialization" and do the reverse!

One additional action is handling the optional ``MOZ_PROFILER_SHUTDOWN``

environment variable, to output a profile if the profiler was running.

***********************

Multi-Process Profiling

***********************

All of the above explanations focused on what the profiler is doing is each

process: Starting, running and collecting samples, markers, and more data,

outputting JSON profiles, and stopping.

But Firefox is a multi-process program, since

`Electrolysis aka e10s <https://wiki.mozilla.org/Electrolysis>`_ introduce child

processes to handle web content and extensions, and especially since

`Fission <https://wiki.mozilla.org/Project_Fission>`_ forced even parts of the

same webpage to run in separate processes, mainly for added security. Since then

Firefox can spawn many processes, sometimes 10 to 20 when visiting busy sites.

The following sections explains how profiling Firefox as a whole works.

IPC (Inter-Process Communication)

=================================

See https://firefox-source-docs.mozilla.org/ipc/.

As a quick summary, some message-passing function-like declarations live in

`PProfiler.ipdl <https://searchfox.org/mozilla-central/source/tools/profiler/gecko/PProfiler.ipdl>`_,

and corresponding ``SendX`` and ``RecvX`` C++ functions are respectively

generated in

`PProfilerParent.h <https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/_ipdlheaders/mozilla/PProfilerParent.h>`_,

and virtually declared (for user implementation) in

`PProfilerChild.h <https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/_ipdlheaders/mozilla/PProfilerChild.h>`_.

During Profiling

================

Exit profiles

-------------

One IPC message that is not in PProfiler.ipdl, is

`ShutdownProfile <https://searchfox.org/mozilla-central/search?q=ShutdownProfile%28&path=&case=false&regexp=false>`_

in

`PContent.ipdl <https://searchfox.org/mozilla-central/source/dom/ipc/PContent.ipdl>`_.

It's called from

`ContentChild::ShutdownInternal <https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla3dom12ContentChild16ShutdownInternalEv>`_,

just before a child process ends, and if the profiler was running, to ensure

that the profile data is collected and sent to the parent, for storage in its

``ActivePS``.

See

`ActivePS::AddExitProfile <https://searchfox.org/mozilla-central/search?q=symbol:_ZN8ActivePS14AddExitProfileERK10PSAutoLockRK12nsTSubstringIcE>`_

for details. Note that the current "buffer position at gathering time" (which is

effectively the largest ``ProfileBufferBlockIndex`` that is present in the

global profile buffer) is recorded. Later,

`ClearExpiredExitProfiles <https://searchfox.org/mozilla-central/search?q=ClearExpiredExitProfiles>`_

looks at the **smallest** ``ProfileBufferBlockIndex`` still present in the

buffer (because early chunks may have been discarded to limit memory usage), and

discards exit profiles that were recorded before, because their data is now

older than anything stored in the parent.

Profile Buffer Global Memory Control

------------------------------------

Each process runs its own profiler, with each its own profile chunked buffer. To

keep the overall memory usage of all these buffers under the user-picked limit,

processes work together, with the parent process overseeing things.

Diagram showing the relevant classes, see details in the following sub-sections:

.. image:: fissionprofiler-20200424.png

ProfileBufferControlledChunkManager

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`The ProfileBufferControlledChunkManager interface <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferControlledChunkManager>`_

allows a controller to get notified about all chunk updates, and to force the

destruction/recycling of old chunks.

`The ProfileBufferChunkManagerWithLocalLimit class <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferChunkManagerWithLocalLimit>`_

implements it.

`An Update object <https://searchfox.org/mozilla-central/search?q=symbol:T_mozilla%3A%3AProfileBufferControlledChunkManager%3A%3AUpdate>`_

contains all information related to chunk changes: How much memory is currently

used by the local chunk manager, how much has been "released" (and therefore

could be destroyed/recycled), and a list of all chunks that were released since

the previous update; it also has a special state meaning that the child is

shutting down so there won't be updates anymore. An ``Update`` may be "folded"

into a previous one, to create a combined update equivalent to the two separate

ones one after the other.

Update Handling in the ProfilerChild

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the profiler starts in a child process, the ``ProfilerChild``

`starts to listen for updates <https://searchfox.org/mozilla-central/search?q=symbol:_ZN7mozilla13ProfilerChild17SetupChunkManagerEv>`_.

These updates are stored and folded into previous ones (if any). At some point,

`an AwaitNextChunkManagerUpdate message <https://searchfox.org/mozilla-central/search?q=RecvAwaitNextChunkManagerUpdate>`_

will be received, and any update can be forwarded to the parent. The local

update is cleared, ready to store future updates.

Update Handling in the ProfilerParent

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the profiler starts AND when there are child processes, the

`ProfilerParent's ProfilerParentTracker <https://searchfox.org/mozilla-central/search?q=ProfilerParentTracker>`_

creates

`a ProfileBufferGlobalController <https://searchfox.org/mozilla-central/search?q=ProfileBufferGlobalController>`_,

which starts to listen for updates from the local chunk manager.

The ``ProfilerParentTracker`` is also responsible for keeping track of child

processes, and to regularly

`send them AwaitNextChunkManagerUpdate messages <https://searchfox.org/mozilla-central/search?q=SendAwaitNextChunkManagerUpdate>`_,

that the child's ``ProfilerChild`` answers to with updates. The update may

indicate that the child is shutting down, in which case the tracker will stop

tracking it.

All these updates (from the local chunk manager, and from child processes' own

chunk managers) are processed in

`ProfileBufferGlobalController::HandleChunkManagerNonFinalUpdate <https://searchfox.org/mozilla-central/search?q=HandleChunkManagerNonFinalUpdate>`_.

Based on this stream of updates, it is possible to calculate the total memory

used by all profile buffers in all processes, and to keep track of all chunks

that have been "released" (i.e., are full, and can be destroyed). When the total

memory usage reaches the user-selected limit, the controller can lookup the

oldest chunk, and get it destroyed (either a local call for parent chunks, or by

sending

`a DestroyReleasedChunksAtOrBefore message <https://searchfox.org/mozilla-central/search?q=DestroyReleasedChunksAtOrBefore>`_

to the owning child).

Historical note: Prior to Fission, the Profiler used to keep one fixed-size

circular buffer in each process, but as Fission made the possible number of

processes unlimited, the memory consumption grew too fast, and required the

implementation of the above system. But there may still be mentions of

"circular buffers" in the code or documents; these have effectively been

replaced by chunked buffers, with centralized chunk control.

Gathering Child Profiles

========================

When it's time to capture a full profile, the parent process performs its own

JSON generation (as described above), and sends

`a GatherProfile message <https://searchfox.org/mozilla-central/search?q=GatherProfile%28>`_

to all child processes, which will make them generate their JSON profile and

send it back to the parent.

All child profiles, including the exit profiles collected during profiling, are

stored as elements of a top-level array with property name "processes".

During the gathering phase, while the parent is waiting for child responses, it

regularly sends

`GetGatherProfileProgress messages <https://searchfox.org/mozilla-central/search?q=GetGatherProfileProgress>`_

to all child processes that have not sent their profile yet, and the parent

expects responses within a short timeframe. The response carries a progress

value. If at some point two messages went with no progress was made anywhere

(either there was no response, or the progress value didn't change), the parent

assumes that remaining child processes may be frozen indefinitely, stops the

gathering and considers the JSON generation complete.

During all of the above work, events are logged (especially issues with child

processes), and are added at the end of the JSON profile, in a top-level object

with property name "profileGatheringLog". This object is free-form, and is not

intended to be displayed, or even read by most people. But it may include

interesting information for advanced users regarding the profile-gathering

phase.

Source code

Revision control

Copy as Markdown

Other Tools