
# Content Classifier Service
The Content Classifier Service (`toolkit/components/content-classifier/`) is
the anti-tracking component that classifies network channels against
adblock-format filter lists delivered through Remote Settings. It is a
parallel classification path that runs alongside the older URL Classifier
and its safebrowsing-format hash tables. It covers the same set of features
(trackers, social trackers, fingerprinters, cryptominers, email trackers,
plus allow-list/exception features and dedicated `test_block` /
`test_annotate` features), but is driven by full adblock-syntax rules
evaluated by a Rust engine wrapping the
[`adblock`](https://crates.io/crates/adblock) crate.
This page is a reference for how the service is wired up internally: where
list bytes live, how they get turned into engines, how a channel
classification request flows through it, and which invariants the code
depends on.
## Components
| File | Role |
| --- | --- |
| `nsIContentClassifierService.idl` | XPCOM contract surfaced to JS: `onListsChanged(updated, removed)`, `getFeatureNames()`, and the test-only `NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC` observer notification that fires after every rebuild when `privacy.trackingprotection.content.testing` is enabled. |
| `nsIContentClassifierRemoteSettingsClient.idl` | JS-side contract: `init`, `shutdown`, `getListBytes(listName)`. |
| `ContentClassifierService.{h,cpp}` | The singleton C++ service. Owns the feature table, the per-feature engine map, the four mode-keyed active-engine lists, the mutex, pref/Nimbus observers, async-shutdown blocker, and the build thread. |
| `ContentClassifierRemoteSettingsClient.sys.mjs` | Wraps the `content-classifier-lists` Remote Settings collection. Owns the on-disk attachment cache, registers a sync listener, and pulls bytes on demand. |
| `content_classifier_engine/` (Rust crate) | Wraps the `adblock` crate (v0.12.1, `full-regex-handling` + `single-thread` features) behind a small FFI: `engine_from_rules`, `check_network_request_preparsed`, `engine_destroy`, plus init/teardown for an `nsIEffectiveTLDService`-backed domain resolver. |
| `ContentClassifierEngine.{h,cpp}` | Thread-safe refcounted C++ wrapper around the Rust FFI engine. Extracts request metadata from an `nsIChannel`-derived `ContentClassifierRequest` and calls into Rust. |
| `components.conf`, `moz.build` | Component registration and build setup (cbindgen generates `content_classifier_ffi.h`). |
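The split between `content_classifier_engine/` and `ContentClassifierEngine` comes down to the C++ side owning an opaque Rust engine that many holders share and that must be destroyed exactly once. The sketch below shows that ownership pattern in standard C++, assuming `std::shared_ptr` as a stand-in for Gecko's thread-safe refcounting. `FfiEngine`, `fake_engine_from_rules`, `fake_engine_destroy`, and `EngineHandle` are hypothetical and are not the cbindgen-generated API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the cbindgen-generated FFI; the real
// content_classifier_ffi.h names and signatures differ.
struct FfiEngine {
  std::string rules;
};

// Counts engines created but not yet destroyed (illustration only).
inline int gLiveEngines = 0;

inline FfiEngine* fake_engine_from_rules(const std::string& aRules) {
  ++gLiveEngines;
  return new FfiEngine{aRules};
}

inline void fake_engine_destroy(FfiEngine* aEngine) {
  --gLiveEngines;
  delete aEngine;
}

// Shares one FFI engine among many holders. std::shared_ptr updates its
// reference count atomically, so copies may be made and dropped on
// different threads; fake_engine_destroy runs once, for the last holder.
class EngineHandle {
 public:
  static EngineHandle FromRules(const std::string& aRules) {
    return EngineHandle(std::shared_ptr<FfiEngine>(
        fake_engine_from_rules(aRules), &fake_engine_destroy));
  }

  const FfiEngine* Get() const { return mEngine.get(); }

 private:
  explicit EngineHandle(std::shared_ptr<FfiEngine> aEngine)
      : mEngine(std::move(aEngine)) {}

  std::shared_ptr<FfiEngine> mEngine;
};
```

With this shape, the worker thread can hold a copy while the build thread replaces its map entry; whichever side releases last triggers the destroy call.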
## Features and prefs
The static `kFeatures[]` table (`ContentClassifierService.cpp`) is the single
source of truth for which feature names exist, which Remote Settings list
IDs roll up into each feature's engine, and how matches are reported to the
channel. Each entry carries:
- `mName` — the identifier used in prefs.
- `mListIds` — one or more Remote Settings record names whose attachments
are concatenated into the feature's engine rules.
- `mClassificationFlag` — the
`nsIClassifiedChannel::ClassificationFlags` bit set on the channel for an
annotation match.
- `mLoadedState` / `mReplacedState` / `mAllowedState` —
`nsIWebProgressListener` STATE_LOADED_* / STATE_REPLACED_* /
STATE_ALLOWED_* values logged into the content blocking log.
`mLoadedState == 0` denotes an annotate-without-notify feature.
- `mBlockingErrorCode` — `NS_ERROR_*_URI` passed to
`UrlClassifierCommon::SetBlockedContent` for a cancellation; `NS_OK`
means the feature has no blocking variant and is only ever an annotation.
- `mExceptionOnly` — true if the feature contains only allowlist /
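exception rules. Such a feature must come last in an engines pref list,
because in the cancel path an exception engine only sees matches from the
engines evaluated before it. A console warning will yell at you if it isn't.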
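A `kFeatures[]` entry can be pictured as a plain struct with one field per item above. This sketch is hypothetical: `FeatureEntry`, `kOk`, `CanBlock`, `IsAnnotateWithoutNotify`, and `FindFeature` are illustrative names, plain integers stand in for `nsresult` and the `nsIWebProgressListener` constants, and the real table's types may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for NS_OK; real entries use nsresult values.
constexpr int32_t kOk = 0;

// Hypothetical mirror of one kFeatures[] entry.
struct FeatureEntry {
  std::string_view mName;                  // identifier used in prefs
  std::vector<std::string_view> mListIds;  // Remote Settings record names
  uint32_t mClassificationFlag;            // ClassificationFlags bit
  uint32_t mLoadedState;                   // 0 => annotate without notify
  uint32_t mReplacedState;
  uint32_t mAllowedState;
  int32_t mBlockingErrorCode;              // kOk => annotation only
  bool mExceptionOnly;                     // only allowlist/exception rules
};

// A feature can cancel a channel only if it has a blocking error code.
inline bool CanBlock(const FeatureEntry& aFeature) {
  return aFeature.mBlockingErrorCode != kOk;
}

inline bool IsAnnotateWithoutNotify(const FeatureEntry& aFeature) {
  return aFeature.mLoadedState == 0;
}

// Linear lookup by pref name.
inline const FeatureEntry* FindFeature(const std::vector<FeatureEntry>& aTable,
                                       std::string_view aName) {
  for (const FeatureEntry& feature : aTable) {
    if (feature.mName == aName) {
      return &feature;
    }
  }
  return nullptr;
}
```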
Enable switches (per mode):
- `privacy.trackingprotection.content.protection.enabled`
- `privacy.trackingprotection.content.annotation.enabled`
Feature selection (comma-separated feature names):
- `privacy.trackingprotection.content.protection.engines`
- `privacy.trackingprotection.content.protection.engines.pbmode`
- `privacy.trackingprotection.content.annotation.engines`
- `privacy.trackingprotection.content.annotation.engines.pbmode`
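Reading one of these engines prefs amounts to splitting a comma-separated string, and the `mExceptionOnly` rule is an ordering check on the result. A minimal sketch, assuming surrounding spaces are ignored and that several exception-only features may share the tail of the list. `SplitEnginesPref` and `ExceptionOnlyFeaturesLast` are hypothetical helpers, not functions from the service.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Splits a value such as "tracking,fingerprinting" into trimmed,
// non-empty feature names, preserving pref order.
inline std::vector<std::string> SplitEnginesPref(std::string_view aValue) {
  std::vector<std::string> names;
  std::size_t start = 0;
  while (start <= aValue.size()) {
    std::size_t comma = aValue.find(',', start);
    if (comma == std::string_view::npos) {
      comma = aValue.size();
    }
    std::string_view token = aValue.substr(start, comma - start);
    // Trim ASCII spaces on both ends.
    while (!token.empty() && token.front() == ' ') token.remove_prefix(1);
    while (!token.empty() && token.back() == ' ') token.remove_suffix(1);
    if (!token.empty()) {
      names.emplace_back(token);
    }
    start = comma + 1;
  }
  return names;
}

// True if no regular feature appears after an exception-only feature.
inline bool ExceptionOnlyFeaturesLast(const std::vector<std::string>& aNames,
                                      const std::set<std::string>& aExceptionOnly) {
  bool seenExceptionOnly = false;
  for (const std::string& name : aNames) {
    bool isExceptionOnly = aExceptionOnly.count(name) > 0;
    if (seenExceptionOnly && !isExceptionOnly) {
      return false;
    }
    seenExceptionOnly = seenExceptionOnly || isExceptionOnly;
  }
  return true;
}
```

For example, with `allowlist` as the only exception-only feature, `"tracking,allowlist"` passes the check and `"allowlist,tracking"` fails it (feature names hypothetical).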
Test-only lists fetched over HTTP (used by the `test_block` /
`test_annotate` features so tests don't need a live Remote Settings
collection):
- `privacy.trackingprotection.content.protection.test_list_urls`
- `privacy.trackingprotection.content.annotation.test_list_urls`
All of the above prefs are mapped onto Nimbus feature variables in
`toolkit/components/nimbus/FeatureManifest.yaml`.
## Threading model
Three thread types appear in this code, and the rebuild and classify
paths both deliberately move work between them:
- **Main thread.** All init, pref observers, Remote Settings sync
callbacks, and final channel-side decisions (`MaybeCancelChannel`,
`MaybeAnnotateChannel`) run here.
- **`mBuildThread`** (an `nsISerialEventTarget` task queue, created in
`Init`). The CPU-heavy half of an engine rebuild runs here:
`Engine::from_rules` calls (the actual adblock parser) happen with no
lock held, and the lock-protected `InstallEngine` /
`PopulateAllActiveEnginesFromPreferenceSnapshot` / `PruneInactiveEngines`
steps run here too, just briefly under `mLock`.
- **URL-classifier worker thread.** `ClassifyForCancel` and
`ClassifyForAnnotate` run here, called from
`netwerk/url-classifier/AsyncUrlChannelClassifier.cpp`. Both acquire
`mLock` briefly to snapshot the active-engine list pointer and then
release it before crossing the FFI.
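The worker-thread pattern (take a snapshot under the lock, evaluate after releasing it) can be sketched with standard library types. This assumes a published active-engine list is never mutated, so copying a `std::shared_ptr` to it is a safe snapshot. `Engine`, `EngineList`, and `Classifier` are hypothetical, and the toy substring matcher stands in for the FFI call.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical engine; Matches() stands in for crossing the FFI.
struct Engine {
  std::string name;
  bool Matches(const std::string& aUrl) const {
    return aUrl.find(name) != std::string::npos;  // toy matcher
  }
};

using EngineList = std::vector<std::shared_ptr<const Engine>>;

class Classifier {
 public:
  // Build-thread side: build the new list outside the lock, then swap the
  // pointer under it.
  void Publish(EngineList aEngines) {
    std::shared_ptr<const EngineList> fresh =
        std::make_shared<EngineList>(std::move(aEngines));
    std::lock_guard<std::mutex> lock(mLock);
    mActive = std::move(fresh);
  }

  // Worker-thread side: copy the pointer under the lock, then evaluate
  // with the lock released so a slow match never blocks Publish().
  bool Classify(const std::string& aUrl) const {
    std::shared_ptr<const EngineList> snapshot;
    {
      std::lock_guard<std::mutex> lock(mLock);
      snapshot = mActive;
    }
    if (!snapshot) {
      return false;
    }
    for (const auto& engine : *snapshot) {
      if (engine->Matches(aUrl)) {
        return true;
      }
    }
    return false;
  }

 private:
  mutable std::mutex mLock;
  std::shared_ptr<const EngineList> mActive;  // guarded by mLock
};
```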
The `mozilla::Mutex mLock` is **non-recursive**: reacquiring it on a thread
that already holds it deadlocks that thread. The header and the code keep
the locking discipline explicit by:
- Marking `mInitPhase`, `mEngines`, `mFeatureVersions`,
`mUpdateGeneration`, and the four active-engine arrays as
`MOZ_GUARDED_BY(mLock)`.
- Annotating `InstallEngine`,
`PopulateAllActiveEnginesFromPreferenceSnapshot`, and
`PruneInactiveEngines` with `MOZ_REQUIRES(mLock)`.
- Releasing `mLock` before any call into the engine FFI (so a long
classification cannot stall a rebuild and vice versa).
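You may be tempted to switch to an RWLock. It buys less than you'd think,
because there is effectively only one classifying thread (the URL-classifier
worker) and the lock is only ever held briefly. Worse yet, I don't remember
whether the engine lookup itself is thread-safe, and the `adblock` crate is
built with its `single-thread` feature.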
## List load and engine rebuild
A rebuild is triggered by any of:
- Initial `InitRSClient()` (first time the service sees an active RS
feature).
- A Remote Settings sync push (`onSync` in the JS client).
- A pref change: master enable, an engines selection pref, or one of the
`test_list_urls` prefs.
`onListsChanged(updated, removed)` on the main thread calls
`ProcessListChanges`, which takes a fresh `EnginesPrefsSnapshot` of the
current pref state, walks the active features named in that snapshot, and
selects every feature that either has no engine yet or whose
`mListIds` overlap `updated` ∪ `removed`. That set goes to
`UpdateFeatures`.
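The selection rule in `ProcessListChanges` reduces to a set intersection. A hypothetical sketch: `FeatureLists` and `SelectFeaturesToRebuild` are illustrative names, and the real code walks an `EnginesPrefsSnapshot` and `kFeatures[]` rather than these plain containers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical: an active feature and the Remote Settings list IDs it uses.
struct FeatureLists {
  std::string name;
  std::vector<std::string> listIds;
};

// Picks the active features to rebuild: those with no engine yet, plus
// those whose list IDs intersect (updated union removed).
inline std::vector<std::string> SelectFeaturesToRebuild(
    const std::vector<FeatureLists>& aActive,
    const std::set<std::string>& aHaveEngine,
    const std::vector<std::string>& aUpdated,
    const std::vector<std::string>& aRemoved) {
  std::set<std::string> changed(aUpdated.begin(), aUpdated.end());
  changed.insert(aRemoved.begin(), aRemoved.end());

  std::vector<std::string> selected;
  for (const FeatureLists& feature : aActive) {
    bool needsRebuild = aHaveEngine.count(feature.name) == 0;
    for (std::size_t i = 0; !needsRebuild && i < feature.listIds.size(); ++i) {
      needsRebuild = changed.count(feature.listIds[i]) > 0;
    }
    if (needsRebuild) {
      selected.push_back(feature.name);
    }
  }
  return selected;
}
```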
`UpdateFeatures` (main thread) bumps `mUpdateGeneration` (global) and the
per-feature `mFeatureVersions` entry for every feature it's about to
rebuild, both under `mLock`. It then calls `FetchEngineDataForFeature` for
each of those features to fetch its rule lists.
The `MozPromise<>` returned by each fetch is
collected via `MozPromise::AllSettled`; once all of them have settled
(resolved or rejected), the collected rule arrays plus the captured
generation and per-feature versions are dispatched onto `mBuildThread`.
On `mBuildThread`, with no lock held, the task builds one engine per
feature from the fetched rules (`Engine::from_rules`). The same task then
acquires `mLock` and performs the install / populate / prune step under
it:
- For each freshly built engine, compare the captured per-feature version
to the current `mFeatureVersions` entry. If a newer rebuild has been
issued since this one was dispatched, the captured version is stale and
the engine is dropped on the floor. Otherwise it's stored into `mEngines`
via `InstallEngine`.
- After all installs, compare the captured `mUpdateGeneration` to the
current one. Only if it's still the latest do we run
`PopulateAllActiveEnginesFromPreferenceSnapshot` (rebuild the four
per-mode active-engine arrays from `mEngines`, in pref order) and
`PruneInactiveEngines` (drop entries from `mEngines` not referenced by
any active-engine array).
This versioning-and-recheck pattern is the safety invariant for concurrent
rebuilds: two rebuilds racing through `mBuildThread` can never have the
older one's snapshot overwrite the newer one's results, because the
older one's captured generation no longer matches by the time it tries
to commit.
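The versioning-and-recheck invariant can be modeled in a few dozen lines. In this hypothetical sketch, `EngineStore`, `Ticket`, `BeginUpdate`, and `Commit` are illustrative names, engines are plain strings, and the populate/prune step is reduced to a `true` return value; only the two comparisons are meant to mirror the behavior described above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of the build-thread commit step. Engines are strings.
class EngineStore {
 public:
  // What a rebuild captures when it starts.
  struct Ticket {
    uint64_t generation;
    std::map<std::string, uint64_t> featureVersions;
  };

  // Main thread: bump the global generation and each rebuilt feature's
  // version under the lock, and remember the values for the commit.
  Ticket BeginUpdate(const std::vector<std::string>& aFeatures) {
    std::lock_guard<std::mutex> lock(mLock);
    Ticket ticket{++mUpdateGeneration, {}};
    for (const std::string& feature : aFeatures) {
      ticket.featureVersions[feature] = ++mFeatureVersions[feature];
    }
    return ticket;
  }

  // Build thread, after the unlocked engine builds. Installs each engine
  // whose captured version is still current, then reports whether this
  // rebuild is still the latest (the real code would populate and prune).
  bool Commit(const Ticket& aTicket,
              const std::map<std::string, std::string>& aBuilt) {
    std::lock_guard<std::mutex> lock(mLock);
    for (const auto& [feature, engine] : aBuilt) {
      auto captured = aTicket.featureVersions.find(feature);
      bool current = captured != aTicket.featureVersions.end() &&
                     captured->second == mFeatureVersions[feature];
      if (current) {
        mEngines[feature] = engine;  // InstallEngine
      }  // Otherwise a newer rebuild owns this feature; drop the engine.
    }
    return aTicket.generation == mUpdateGeneration;
  }

  std::string EngineFor(const std::string& aFeature) {
    std::lock_guard<std::mutex> lock(mLock);
    auto it = mEngines.find(aFeature);
    return it == mEngines.end() ? std::string() : it->second;
  }

 private:
  std::mutex mLock;
  uint64_t mUpdateGeneration = 0;                    // guarded by mLock
  std::map<std::string, uint64_t> mFeatureVersions;  // guarded by mLock
  std::map<std::string, std::string> mEngines;       // guarded by mLock
};
```

If two rebuilds of the same feature are started and the older one commits last, it installs nothing and skips the populate step. If an older rebuild touches a different feature than a newer one, it still installs its engine but leaves populate and prune to the newer rebuild.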
Finally, a small task is dispatched back to the main thread to fire
`NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC` (test-only, gated on the
`privacy.trackingprotection.content.testing` pref), which is how the
browser tests await rebuild completion. These notifications need to be
debounced.
## Channel classification
A channel classification request enters from
`netwerk/url-classifier/AsyncUrlChannelClassifier.cpp` on the URL-classifier
worker thread. The caller has already constructed a `ContentClassifierRequest`
on the main thread that extracts the URL, the schemeless site and source
schemeless site (via `nsIEffectiveTLDService`), the request type (mapped
from `ExtContentPolicyType` to an adblock type string), the third-party
flag (via `mozIThirdPartyUtil`), and the PBM flag.
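The request is a bundle of values captured once, up front, on the main thread. The sketch below is hypothetical: `PolicyType` is a reduced stand-in for `ExtContentPolicyType`, `ClassifierRequestSketch` is not the real `ContentClassifierRequest`, and the type strings are modeled on adblock request-type names (for example `script` and `sub_frame`); check the real mapping table and the crate before relying on them.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical, reduced stand-in for ExtContentPolicyType.
enum class PolicyType {
  Subdocument, Script, Image, Stylesheet, XHR, Font, Media, WebSocket, Other
};

// Maps a policy type to an adblock request-type string (assumed names).
inline std::string_view ToAdblockType(PolicyType aType) {
  switch (aType) {
    case PolicyType::Subdocument: return "sub_frame";
    case PolicyType::Script:      return "script";
    case PolicyType::Image:       return "image";
    case PolicyType::Stylesheet:  return "stylesheet";
    case PolicyType::XHR:         return "xmlhttprequest";
    case PolicyType::Font:        return "font";
    case PolicyType::Media:       return "media";
    case PolicyType::WebSocket:   return "websocket";
    case PolicyType::Other:       return "other";
  }
  return "other";
}

// Hypothetical bundle of the values the main thread captures.
struct ClassifierRequestSketch {
  std::string url;
  std::string site;        // schemeless site of the request URL
  std::string sourceSite;  // schemeless site of the requesting context
  std::string_view requestType;
  bool thirdParty;         // computed via mozIThirdPartyUtil in Gecko
  bool privateBrowsing;
};
```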
`ClassifyForCancel` and `ClassifyForAnnotate` both briefly acquire `mLock`
to snapshot the active-engine array for the request's PBM flag and mode,
release the lock, and then call `ClassifyWithEngines` on that snapshot, so
no lock is held while crossing the FFI.
`ClassifyWithEngines` takes an `aIndependentEngines` flag that controls
how engine evaluation chains:
- **Cancel (`aIndependentEngines = false`).** Threads a `matchedSoFar`
flag through every `CheckNetworkRequest` call, so exception-only engines
receive earlier matches as `previously_matched_rule`. Iteration stops once
the aggregated status reaches `ImportantHit` or `ImportantException`, since
either one pins the outcome and later engines can't change it; otherwise it
continues so a trailing exception can still demote an earlier hit.
- **Annotate (`aIndependentEngines = true`).** Each engine sees
`previously_matched_rule = false`, so each evaluates its own rules in
isolation and `MaybeAnnotateChannel` can attribute matches to every
feature whose rules fired.
`ContentClassifierEngine::CheckNetworkRequest` short-circuits to a `Miss`
for first-party requests before crossing the FFI. For genuine
third-party requests, it builds the preparsed request fields once and
calls `content_classifier_engine_check_network_request_preparsed`. The
Rust side constructs an `adblock::Request` via `Request::preparsed`,
calls `Engine::check_network_request_subset(req, previously_matched_rule,
false)`, and writes back `matched`, `important`, and an optional
`exception` rule string.
Each per-engine result is folded into a `ContentClassifierResult` via
`Accumulate`. The status enum is ordered (Miss < Hit < Exception <
ImportantHit < ImportantException), and `Accumulate` promotes
monotonically: any Exception promotes the aggregate over a Hit, and any
Important value pins the status against later non-Important results.
Really, the status enum only matters for annotation.
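The chaining, the first-party short-circuit, and the folding rules can be sketched together. This is a minimal model, not the real `ClassifyWithEngines`: engines are hypothetical callbacks returning a status, `std::max` over the declared enum order plays the role of `Accumulate`, and treating only `Hit` / `ImportantHit` results as "matched so far" is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Declaration order gives Miss < Hit < Exception < ImportantHit <
// ImportantException, so std::max implements monotonic promotion.
enum class Status { Miss, Hit, Exception, ImportantHit, ImportantException };

// Hypothetical engine: given previously_matched_rule, returns its status.
using EngineFn = std::function<Status(bool aPreviouslyMatched)>;

// First-party requests never reach the engine.
inline Status CheckNetworkRequest(const EngineFn& aEngine, bool aThirdParty,
                                  bool aPreviouslyMatched) {
  if (!aThirdParty) {
    return Status::Miss;
  }
  return aEngine(aPreviouslyMatched);
}

inline Status ClassifyWithEngines(const std::vector<EngineFn>& aEngines,
                                  bool aThirdParty, bool aIndependentEngines,
                                  std::vector<Status>* aPerEngine = nullptr) {
  Status aggregate = Status::Miss;
  bool matchedSoFar = false;
  for (const EngineFn& engine : aEngines) {
    // Annotate mode: every engine evaluates in isolation.
    bool previouslyMatched = !aIndependentEngines && matchedSoFar;
    Status result = CheckNetworkRequest(engine, aThirdParty, previouslyMatched);
    if (aPerEngine) {
      aPerEngine->push_back(result);
    }
    matchedSoFar = matchedSoFar || result == Status::Hit ||
                   result == Status::ImportantHit;
    aggregate = std::max(aggregate, result);  // Accumulate
    // Cancel mode: an Important status pins the outcome, so stop early.
    if (!aIndependentEngines && aggregate >= Status::ImportantHit) {
      break;
    }
  }
  return aggregate;
}
```

With an always-hit engine followed by an exception-only engine (one that returns `Exception` only when `previously_matched_rule` is set), the cancel path yields `Exception`; with the order reversed it yields `Hit`, because the exception engine runs before any match exists.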
The worker thread dispatches the result back to the main thread, which
then calls one of:
- `MaybeCancelChannel`: consults `ChannelClassifierUtils::IsAllowListed`,
finds the first matched feature whose `mBlockingErrorCode` is not `NS_OK`,
and hands off to `ChannelClassifierUtils::MaybeBlockChannel`.
- `MaybeAnnotateChannel`: iterates the engine-result list and calls
`ChannelClassifierUtils::AnnotateChannel` for each matched feature with a
non-zero `mLoadedState`, applying the feature's classification flag and
loaded state to the channel.
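The two main-thread decisions reduce to small scans over the matched features. A hypothetical sketch: `MatchedFeature`, `PickCancelCode`, and `CollectAnnotations` are illustrative names, `0` stands in for `NS_OK`, the `IsAllowListed` check is omitted, and OR-ing classification flags together is a simplification of calling `AnnotateChannel` once per feature.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-feature data the main thread consults after a match.
struct MatchedFeature {
  std::string name;
  int32_t blockingErrorCode;    // 0 stands in for NS_OK
  uint32_t classificationFlag;  // ClassificationFlags bit
  uint32_t loadedState;         // 0 => skipped by the annotate path
};

// Cancel: the first matched feature with a real blocking code decides the
// error to block with. (The allow-list check is omitted here.)
inline std::optional<int32_t> PickCancelCode(
    const std::vector<MatchedFeature>& aMatched) {
  for (const MatchedFeature& feature : aMatched) {
    if (feature.blockingErrorCode != 0) {
      return feature.blockingErrorCode;
    }
  }
  return std::nullopt;
}

// Annotate: combine the flags of every matched feature with a non-zero
// loaded state and collect those states for the content blocking log.
inline uint32_t CollectAnnotations(const std::vector<MatchedFeature>& aMatched,
                                   std::vector<uint32_t>* aLoadedStates) {
  uint32_t flags = 0;
  for (const MatchedFeature& feature : aMatched) {
    if (feature.loadedState == 0) {
      continue;
    }
    flags |= feature.classificationFlag;
    aLoadedStates->push_back(feature.loadedState);
  }
  return flags;
}
```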