
# Content Classifier Service

The Content Classifier Service (`toolkit/components/content-classifier/`) is the anti-tracking component that classifies network channels against adblock-format filter lists delivered through Remote Settings. It is a parallel classification path layered alongside the older URL Classifier and its Safe Browsing-format hash tables. It covers the same set of features (trackers, social trackers, fingerprinters, cryptominers, email trackers, plus allow-list/exception features and dedicated `test_block` / `test_annotate` features), but its rules use full adblock syntax and are evaluated by a Rust engine wrapping the [`adblock`](https://crates.io/crates/adblock) crate.

This page is a reference for how the service is wired up internally: where list bytes live, how they get turned into engines, how a channel classification request flows through it, and which invariants the code depends on.

## Components

| File | Role |
| --- | --- |
| `nsIContentClassifierService.idl` | XPCOM contract surfaced to JS: `onListsChanged(updated, removed)`, `getFeatureNames()`, and the test-only `NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC` observer notification that fires after every rebuild. |
| `nsIContentClassifierRemoteSettingsClient.idl` | JS-side contract: `init`, `shutdown`, `getListBytes(listName)`. |
| `ContentClassifierService.{h,cpp}` | The singleton C++ service. Owns the feature table, the per-feature engine map, the four mode-keyed active-engine lists, the mutex, pref/Nimbus observers, async-shutdown blocker, and the build thread. |
| `ContentClassifierRemoteSettingsClient.sys.mjs` | Wraps the `content-classifier-lists` Remote Settings collection. Owns the on-disk attachment cache, registers a sync listener, and pulls bytes on demand. |
| `content_classifier_engine/` (Rust crate) | Wraps the `adblock` crate (v0.12.1, `full-regex-handling` + `single-thread` features) behind a small FFI: `engine_from_rules`, `check_network_request_preparsed`, `engine_destroy`, plus init/teardown for an `nsIEffectiveTLDService`-backed domain resolver. |
| `ContentClassifierEngine.{h,cpp}` | Thread-safe refcounted C++ wrapper around the Rust FFI engine. Extracts request metadata from an `nsIChannel`-derived `ContentClassifierRequest` and calls into Rust. |
| `components.conf`, `moz.build` | Component registration and build setup (cbindgen generates `content_classifier_ffi.h`). |

## Features and prefs

The static `kFeatures[]` table (`ContentClassifierService.cpp`) is the single source of truth for which feature names exist, which Remote Settings list IDs roll up into each feature's engine, and how matches are reported to the channel. Each entry carries:

- `mName`: the identifier used in prefs.
- `mListIds`: one or more Remote Settings record names whose attachments are concatenated into the feature's engine rules.
- `mClassificationFlag`: the `nsIClassifiedChannel::ClassificationFlags` bit set on the channel for an annotation match.
- `mLoadedState` / `mReplacedState` / `mAllowedState`: the `nsIWebProgressListener` `STATE_LOADED_*` / `STATE_REPLACED_*` / `STATE_ALLOWED_*` values logged into the content blocking log. `mLoadedState == 0` denotes an annotate-without-notify feature.
- `mBlockingErrorCode`: the `NS_ERROR_*_URI` code passed to `UrlClassifierCommon::SetBlockedContent` for a cancellation; `NS_OK` means the feature has no blocking variant and is only ever an annotation.
- `mExceptionOnly`: true if the feature contains only allowlist / exception rules. Such a feature must be last in a list of features, since on the cancel path it only sees matches propagated from the engines evaluated before it. A console warning is logged if it is placed anywhere else.
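
To make the table shape concrete, here is a standalone sketch of one `kFeatures[]` entry. It assumes simplified `std` types in place of the real XPCOM ones: `nsresult_t` and `kNS_OK` are hypothetical stand-ins for `nsresult` and `NS_OK`, and the two helper methods are illustrative only, not part of the real table.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-ins for XPCOM's nsresult and NS_OK.
using nsresult_t = uint32_t;
constexpr nsresult_t kNS_OK = 0;

// Sketch of one kFeatures[] entry: field names follow the list above,
// field types are simplified assumptions.
struct FeatureEntry {
  std::string_view mName;                  // identifier used in prefs
  std::vector<std::string_view> mListIds;  // RS records concatenated into rules
  uint32_t mClassificationFlag;            // ClassificationFlags bit on a match
  uint32_t mLoadedState;                   // 0 => annotate-without-notify
  uint32_t mReplacedState;
  uint32_t mAllowedState;
  nsresult_t mBlockingErrorCode;           // kNS_OK => annotation only
  bool mExceptionOnly;                     // only allowlist/exception rules

  // Illustrative helpers (hypothetical, not in the real table).
  bool HasBlockingVariant() const { return mBlockingErrorCode != kNS_OK; }
  bool NotifiesOnLoad() const { return mLoadedState != 0; }
};
```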

Enable switches (per mode):

- `privacy.trackingprotection.content.protection.enabled`
- `privacy.trackingprotection.content.annotation.enabled`

Feature selection (comma-separated feature names):

- `privacy.trackingprotection.content.protection.engines`
- `privacy.trackingprotection.content.protection.engines.pbmode`
- `privacy.trackingprotection.content.annotation.engines`
- `privacy.trackingprotection.content.annotation.engines.pbmode`
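
A sketch of how an engines pref value might be turned into an ordered feature list, including the exception-only-last check. It is a standalone model under stated assumptions: `Feature`, `SplitPrefList`, `ResolvedEngines`, and `ResolveEngines` are hypothetical names, unknown feature names are simply skipped, and "last" is read as "after every feature that is not exception-only".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct Feature {
  std::string name;
  bool exceptionOnly;  // mirrors mExceptionOnly
};

// Splits "a, b,c" into {"a", "b", "c"}, trimming spaces and skipping empties.
std::vector<std::string> SplitPrefList(std::string_view value) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (start <= value.size()) {
    std::size_t comma = value.find(',', start);
    if (comma == std::string_view::npos) comma = value.size();
    std::string_view item = value.substr(start, comma - start);
    std::size_t first = item.find_first_not_of(' ');
    if (first != std::string_view::npos) {
      std::size_t last = item.find_last_not_of(' ');
      out.emplace_back(item.substr(first, last - first + 1));
    }
    start = comma + 1;
  }
  return out;
}

struct ResolvedEngines {
  std::vector<const Feature*> features;  // in pref order
  bool orderOk = true;  // false if an ordinary feature follows an exception-only one
};

ResolvedEngines ResolveEngines(std::string_view pref,
                               const std::vector<Feature>& table) {
  ResolvedEngines result;
  bool seenExceptionOnly = false;
  for (const std::string& name : SplitPrefList(pref)) {
    auto it = std::find_if(table.begin(), table.end(),
                           [&](const Feature& f) { return f.name == name; });
    if (it == table.end()) continue;  // unknown name: skipped in this sketch
    if (!it->exceptionOnly && seenExceptionOnly) result.orderOk = false;
    seenExceptionOnly = seenExceptionOnly || it->exceptionOnly;
    result.features.push_back(&*it);
  }
  return result;
}
```

A caller would log the console warning whenever `orderOk` comes back false.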

Test-only lists fetched over HTTP (used by the `test_block` / `test_annotate` features so tests don't need a live Remote Settings collection):

- `privacy.trackingprotection.content.protection.test_list_urls`
- `privacy.trackingprotection.content.annotation.test_list_urls`

All of the above prefs are mapped onto Nimbus feature variables in `toolkit/components/nimbus/FeatureManifest.yaml`.

## Threading model

Three thread types appear in this code, and the rebuild and classify paths both deliberately move work between them:

- **Main thread.** All init, pref observers, Remote Settings sync callbacks, and final channel-side decisions (`MaybeCancelChannel`, `MaybeAnnotateChannel`) run here.
- **`mBuildThread`** (an `nsISerialEventTarget` task queue, created in `Init`). The CPU-heavy half of an engine rebuild runs here: `Engine::from_rules` calls (the actual adblock parser) happen with no lock held, and the lock-protected `InstallEngine` / `PopulateAllActiveEnginesFromPreferenceSnapshot` / `PruneInactiveEngines` steps also run here, briefly under `mLock`.
- **URL-classifier worker thread.** `ClassifyForCancel` and `ClassifyForAnnotate` run here, called from `netwerk/url-classifier/AsyncUrlChannelClassifier.cpp`. Both acquire `mLock` briefly to snapshot the active-engine list pointer and then release it before crossing the FFI.
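
The snapshot-then-release discipline on the classifier path can be sketched in standard C++: copy the shared pointer to the active list under the mutex, drop the mutex, then do the slow evaluation. `Engine`, `Service`, `Classify`, and `Install` are hypothetical names, `std::shared_ptr` stands in for the real refcounting, and a toy substring match stands in for the Rust engine.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the refcounted engine wrapper: matches by substring.
struct Engine {
  std::string pattern;
  bool Matches(const std::string& url) const {
    return url.find(pattern) != std::string::npos;
  }
};

using EngineList = std::vector<Engine>;

class Service {
 public:
  // Classifier thread: hold the lock only long enough to copy the pointer.
  bool Classify(const std::string& url) {
    std::shared_ptr<const EngineList> snapshot;
    {
      std::lock_guard<std::mutex> guard(mLock);
      snapshot = mActive;
    }
    // Lock released: slow matching cannot block a concurrent Install().
    for (const Engine& engine : *snapshot) {
      if (engine.Matches(url)) return true;
    }
    return false;
  }

  // Build thread: prepare the new list unlocked, then swap it in under the lock.
  void Install(EngineList engines) {
    std::shared_ptr<const EngineList> fresh =
        std::make_shared<EngineList>(std::move(engines));
    std::lock_guard<std::mutex> guard(mLock);
    mActive = std::move(fresh);
  }

 private:
  std::mutex mLock;  // non-recursive, like mozilla::Mutex
  std::shared_ptr<const EngineList> mActive =
      std::make_shared<EngineList>();  // guarded by mLock
};
```

Because each snapshot holds its own reference, an `Install()` that replaces the list mid-classification does not free the engines that classification is still using.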

The `mozilla::Mutex mLock` is **non-recursive**: reacquiring it while already held deadlocks the calling thread. The code guards against this and other lock misuse by:

- Marking `mInitPhase`, `mEngines`, `mFeatureVersions`, `mUpdateGeneration`, and the four active-engine arrays as `MOZ_GUARDED_BY(mLock)`, so Clang's thread-safety analysis flags any access made without the lock.
- Annotating `InstallEngine`, `PopulateAllActiveEnginesFromPreferenceSnapshot`, and `PruneInactiveEngines` with `MOZ_REQUIRES(mLock)`, so callers must already hold the lock and the functions never take it themselves.
- Releasing `mLock` before any call into the engine FFI, so a long classification cannot stall a rebuild and vice versa.

You may be tempted to switch to a reader-writer lock (`RWLock`). It buys less than you'd think: there is effectively only one classifying thread, so there are no concurrent readers to benefit. Worse, it has not been verified whether concurrent engine lookups are thread-safe.

## List load and engine rebuild

A rebuild is triggered by any of:

- The initial `InitRSClient()` call (the first time the service sees an active Remote Settings feature).
- A Remote Settings sync push (`onSync` in the JS client).
- A pref change: the master enable pref, an engines selection pref, or one of the `test_list_urls` prefs.

`onListsChanged(updated, removed)` on the main thread calls `ProcessListChanges`, which takes a fresh `EnginesPrefsSnapshot` of the current pref state, walks the active features named in that snapshot, and selects every feature that either has no engine yet or whose `mListIds` overlap `updated` ∪ `removed`. That set goes to `UpdateFeatures`.
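
The selection in `ProcessListChanges` amounts to a set-intersection test. A standalone sketch: the `FeatureSpec` type and `SelectFeaturesToRebuild` are hypothetical, and where the real code consults the `EnginesPrefsSnapshot` and the `mEngines` map, this version takes plain containers.

```cpp
#include <set>
#include <string>
#include <vector>

struct FeatureSpec {
  std::string name;
  std::vector<std::string> listIds;  // mirrors mListIds
};

// Returns the active features needing a rebuild: those with no engine yet,
// or whose list IDs intersect updated ∪ removed.
std::vector<std::string> SelectFeaturesToRebuild(
    const std::vector<FeatureSpec>& activeFeatures,
    const std::set<std::string>& haveEngine,
    const std::vector<std::string>& updated,
    const std::vector<std::string>& removed) {
  std::set<std::string> changed(updated.begin(), updated.end());
  changed.insert(removed.begin(), removed.end());

  std::vector<std::string> out;
  for (const FeatureSpec& feature : activeFeatures) {
    bool needsRebuild = haveEngine.count(feature.name) == 0;
    for (const std::string& id : feature.listIds) {
      if (needsRebuild) break;
      needsRebuild = changed.count(id) != 0;
    }
    if (needsRebuild) out.push_back(feature.name);
  }
  return out;
}
```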

`UpdateFeatures` (main thread) bumps `mUpdateGeneration` (global) and the per-feature `mFeatureVersions` entry for every feature it's about to rebuild, both under `mLock`. It then calls `FetchEngineDataForFeature` for each of those features to get its rule lists.

The `MozPromise<>` returned by each fetch is collected via `MozPromise::AllSettled`; once all of them have settled, the collected rule arrays plus the captured generation and per-feature versions are dispatched onto `mBuildThread`.
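
The bookkeeping half of `UpdateFeatures` can be modeled as counter bumps under one lock acquisition that yield a "ticket" for the later build step. This is a sketch: `VersionState`, `BuildTicket`, and `BeginUpdate` are hypothetical names, and `std::mutex` stands in for `mozilla::Mutex`.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// What the main thread captures and ships to the build thread with the rules.
struct BuildTicket {
  uint64_t generation;                              // snapshot of mUpdateGeneration
  std::map<std::string, uint64_t> featureVersions;  // per rebuilt feature
};

class VersionState {
 public:
  // Bumps the global generation and each listed feature's version under a
  // single lock acquisition, and returns the values just assigned.
  BuildTicket BeginUpdate(const std::vector<std::string>& features) {
    std::lock_guard<std::mutex> guard(mLock);
    BuildTicket ticket{++mUpdateGeneration, {}};
    for (const std::string& name : features) {
      ticket.featureVersions[name] = ++mFeatureVersions[name];
    }
    return ticket;
  }

 private:
  std::mutex mLock;
  uint64_t mUpdateGeneration = 0;                    // guarded by mLock
  std::map<std::string, uint64_t> mFeatureVersions;  // guarded by mLock
};
```

Every rebuild gets a strictly larger generation, and each feature's version only grows, which is what lets the build thread later tell a stale result from a current one.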

On `mBuildThread`, with no lock held, the task builds each feature's rule engine (`Engine::from_rules`). The same task then reacquires `mLock` and performs the install / populate / prune step under it:

- For each freshly built engine, compare the captured per-feature version to the current `mFeatureVersions` entry. If a newer rebuild has been issued since this one was dispatched, the captured version is stale and the engine is dropped on the floor. Otherwise it's stored into `mEngines` via `InstallEngine`.
- After all installs, compare the captured `mUpdateGeneration` to the current one. Only if it's still the latest do we run `PopulateAllActiveEnginesFromPreferenceSnapshot` (rebuild the four per-mode active-engine arrays from `mEngines`, in pref order) and `PruneInactiveEngines` (drop entries from `mEngines` not referenced by any active-engine array).
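
The commit step can be sketched as one function that runs entirely under the lock. Assumptions: `Registry`, `Commit`, `BuildTicket`, and `BuiltEngine` are hypothetical; only one mode's active list is modeled instead of four; the ticket carries the pref-ordered feature names from the snapshot; and a string stands in for a compiled engine.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct BuildTicket {
  uint64_t generation;                              // captured mUpdateGeneration
  std::map<std::string, uint64_t> featureVersions;  // captured per-feature versions
  std::vector<std::string> prefOrder;               // active features, pref order
};

struct BuiltEngine {
  std::string feature;
  std::string rules;  // stand-in for a compiled engine
};

struct Registry {
  std::mutex mLock;
  uint64_t mUpdateGeneration = 0;                    // guarded by mLock
  std::map<std::string, uint64_t> mFeatureVersions;  // guarded by mLock
  std::map<std::string, BuiltEngine> mEngines;       // guarded by mLock
  std::vector<std::string> mActive;                  // guarded by mLock

  // Runs on the build thread after the engines were built with no lock held.
  void Commit(const BuildTicket& ticket, std::vector<BuiltEngine> built) {
    std::lock_guard<std::mutex> guard(mLock);
    // Install: keep an engine only if no newer rebuild of its feature exists.
    for (BuiltEngine& engine : built) {
      auto captured = ticket.featureVersions.find(engine.feature);
      if (captured == ticket.featureVersions.end() ||
          captured->second != mFeatureVersions[engine.feature]) {
        continue;  // stale: dropped on the floor
      }
      const std::string key = engine.feature;
      mEngines[key] = std::move(engine);
    }
    // Populate and prune only if this is still the latest rebuild.
    if (ticket.generation != mUpdateGeneration) return;
    mActive.clear();
    for (const std::string& name : ticket.prefOrder) {
      if (mEngines.count(name) != 0) mActive.push_back(name);
    }
    std::set<std::string> live(mActive.begin(), mActive.end());
    for (auto it = mEngines.begin(); it != mEngines.end();) {
      it = live.count(it->first) != 0 ? std::next(it) : mEngines.erase(it);
    }
  }
};
```

Whichever order two racing commits arrive in, only the one whose captured versions still match installs its engine, and only the latest generation touches the active list.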

This versioning-and-recheck pattern is the safety invariant for concurrent rebuilds: two rebuilds racing through `mBuildThread` can never have the older one's snapshot overwrite the newer one's results, because the older one's captured generation no longer matches by the time it tries to commit.

Finally, a small task is dispatched back to the main thread to fire `NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC` (test-only, gated on the `privacy.trackingprotection.content.testing` pref), which is how the browser tests await rebuild completion. One notification fires per rebuild; these notifications still need to be debounced.

## Channel classification

A channel classification request enters from `netwerk/url-classifier/AsyncUrlChannelClassifier.cpp` on the URL-classifier worker thread. The caller has already constructed a `ContentClassifierRequest` on the main thread that extracts the URL, the schemeless site and source schemeless site (via `nsIEffectiveTLDService`), the request type (mapped from `ExtContentPolicyType` to an adblock type string), the third-party flag (via `mozIThirdPartyUtil`), and the PBM flag.

`ClassifyForCancel` and `ClassifyForAnnotate` both acquire `mLock` just long enough to snapshot the appropriate active-engine array (chosen by the PBM flag and the mode), release it, and then call `ClassifyWithEngines` on that snapshot, so no lock is held while crossing the FFI.
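
Picking one of the four active-engine arrays is effectively a two-bit index. A sketch, assuming the `protection` prefs feed the cancel path and the `annotation` prefs feed the annotate path; `Mode`, `ActiveEngines`, `Index`, and `PrefName` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

enum class Mode { Cancel, Annotate };

// Four active lists, one per (mode, private browsing) combination.
struct ActiveEngines {
  std::array<std::vector<std::string>, 4> lists;

  static std::size_t Index(Mode mode, bool isPrivate) {
    return (mode == Mode::Annotate ? 2 : 0) + (isPrivate ? 1 : 0);
  }

  const std::vector<std::string>& For(Mode mode, bool isPrivate) const {
    return lists[Index(mode, isPrivate)];
  }

  // Assumed source pref for each list (protection feeds cancel).
  static const char* PrefName(Mode mode, bool isPrivate) {
    static const char* const kPrefs[4] = {
        "privacy.trackingprotection.content.protection.engines",
        "privacy.trackingprotection.content.protection.engines.pbmode",
        "privacy.trackingprotection.content.annotation.engines",
        "privacy.trackingprotection.content.annotation.engines.pbmode",
    };
    return kPrefs[Index(mode, isPrivate)];
  }
};
```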

`ClassifyWithEngines` takes an `aIndependentEngines` flag that controls how engine evaluation chains:

- **Cancel (`aIndependentEngines = false`).** Threads a `matchedSoFar` flag through every `CheckNetworkRequest` call so exception-only engines see the propagated `matched_rule`. Iteration stops once the aggregated status reaches `ImportantHit` or `ImportantException`, since either pins the outcome and later engines can't change it; otherwise it continues so a trailing exception can still demote an earlier hit.
- **Annotate (`aIndependentEngines = true`).** Each engine sees `previously_matched_rule = false`, so each evaluates its own rules in isolation and `MaybeAnnotateChannel` can attribute matches to every feature whose rules fired.
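
A standalone model of the two chaining modes. `ToyEngine`, `ClassifyResult`, and this `ClassifyWithEngines` signature are hypothetical; a fixed per-engine verdict replaces the Rust call; the aggregate uses `std::max` on the ordered status enum described in the next paragraphs; and, as a simplifying assumption, only an earlier hit (not an earlier exception) sets `matchedSoFar`.

```cpp
#include <algorithm>
#include <vector>

// Ordered so that std::max implements promotion.
enum class Status { Miss, Hit, Exception, ImportantHit, ImportantException };

// Toy engine: a fixed verdict for the request under test.
struct ToyEngine {
  Status verdict;
  bool exceptionOnly;  // allowlist engine: only reports after an earlier hit

  Status Check(bool previouslyMatched) const {
    if (exceptionOnly && !previouslyMatched) return Status::Miss;
    return verdict;
  }
};

struct ClassifyResult {
  Status aggregate = Status::Miss;
  std::vector<Status> perEngine;  // one entry per engine actually evaluated
};

ClassifyResult ClassifyWithEngines(const std::vector<ToyEngine>& engines,
                                   bool independentEngines) {
  ClassifyResult result;
  bool matchedSoFar = false;
  for (const ToyEngine& engine : engines) {
    // Annotate mode: every engine starts fresh. Cancel mode: chain the flag.
    Status s = engine.Check(independentEngines ? false : matchedSoFar);
    result.perEngine.push_back(s);
    result.aggregate = std::max(result.aggregate, s);
    if (independentEngines) continue;
    matchedSoFar = matchedSoFar || s == Status::Hit || s == Status::ImportantHit;
    // An Important status pins the outcome, so later engines are skipped.
    if (result.aggregate >= Status::ImportantHit) break;
  }
  return result;
}
```

With a blocking engine followed by an allowlist engine, the cancel path ends at `Exception` (the hit is demoted), while the annotate path reports the blocking engine's `Hit` and a `Miss` for the allowlist engine.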

`ContentClassifierEngine::CheckNetworkRequest` short-circuits to a `Miss` for first-party requests before crossing the FFI. For genuine third-party requests, it builds the preparsed request fields once and calls `content_classifier_engine_check_network_request_preparsed`. The Rust side constructs an `adblock::Request` via `Request::preparsed`, calls `Engine::check_network_request_subset(req, previously_matched_rule, false)`, and writes back `matched`, `important`, and an optional `exception` rule string.

Each per-engine result is folded into a `ContentClassifierResult` via `Accumulate`. The status enum is ordered (`Miss` < `Hit` < `Exception` < `ImportantHit` < `ImportantException`), and `Accumulate` promotes monotonically: any Exception promotes the aggregate over a Hit, and any Important value pins the status against later non-Important results. In practice, the status enum only matters for annotation.
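
Assuming `Accumulate` behaves like taking the maximum under that ordering, the whole fold reduces to a few lines. `Result` and this `Accumulate` are a sketch, not the real `ContentClassifierResult`.

```cpp
#include <algorithm>

enum class Status { Miss, Hit, Exception, ImportantHit, ImportantException };

// The promotion rules depend only on this declaration order.
static_assert(Status::Hit < Status::Exception, "Exception outranks Hit");
static_assert(Status::Exception < Status::ImportantHit,
              "Important values outrank every non-Important value");

struct Result {
  Status status = Status::Miss;

  // Monotone fold: the aggregate only moves up the ordering, so an Exception
  // beats a Hit and an Important value survives later non-Important ones.
  void Accumulate(Status s) { status = std::max(status, s); }
};
```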

The worker thread dispatches the result back to the main thread, which then calls one of:

- `MaybeCancelChannel`: consults `ChannelClassifierUtils::IsAllowListed`, finds the first matched feature whose `mBlockingErrorCode` is not `NS_OK`, and hands off to `ChannelClassifierUtils::MaybeBlockChannel`.
- `MaybeAnnotateChannel`: iterates the engine-result list and calls `ChannelClassifierUtils::AnnotateChannel` for each matched feature with a non-zero `mLoadedState`, applying the feature's classification flag and loaded state to the channel.
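
The two main-thread decisions can be sketched as pure functions over the per-engine results. Assumptions: `FeatureInfo`, `EngineResult`, `CancelDecision`, and `AnnotationFlags` are hypothetical; the allow-list check is reduced to a boolean; `0` stands in for `NS_OK`; and the annotation flags are ORed together where the real code calls `AnnotateChannel` once per feature.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified per-feature data (see the kFeatures[] fields).
struct FeatureInfo {
  std::string name;
  uint32_t classificationFlag;
  uint32_t loadedState;        // 0: skipped by the annotate path
  uint32_t blockingErrorCode;  // 0 stands in for NS_OK
};

struct EngineResult {
  const FeatureInfo* feature;
  bool matched;
};

// Cancel: unless allow-listed, block with the first matched feature that
// has a blocking variant.
std::optional<uint32_t> CancelDecision(const std::vector<EngineResult>& results,
                                       bool allowListed) {
  if (allowListed) return std::nullopt;
  for (const EngineResult& r : results) {
    if (r.matched && r.feature->blockingErrorCode != 0) {
      return r.feature->blockingErrorCode;
    }
  }
  return std::nullopt;
}

// Annotate: collect the flags of every matched feature with a loaded state.
uint32_t AnnotationFlags(const std::vector<EngineResult>& results) {
  uint32_t flags = 0;
  for (const EngineResult& r : results) {
    if (r.matched && r.feature->loadedState != 0) {
      flags |= r.feature->classificationFlag;
    }
  }
  return flags;
}
```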
