WebExtensions AI API
====================

.. note::

   The extension developer is responsible for complying with `Mozilla's add-on policies <https://extensionworkshop.com/documentation/publish/add-on-policies/>`_
   as well as regulatory rules, such as the `EU AI Act <https://www.europarl.europa.eu/thinktank/en/document/EPRS_BRI(2021)698792>`_,
   when providing AI features.

The Firefox AI Platform API can be used from web extensions via a trial API we added in Firefox 134.
This API is enabled by default in Nightly. For Beta and Release, toggle the following flags in `about:config`:

- `browser.ml.enable` → true
- `extensions.ml.enabled` → true

WebExtensions that request the `trialML` optional permission are able to use the API.
The permission is added to your manifest.json file as follows:

.. code-block:: json

   {
     "optional_permissions": ["trialML"]
   }

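Because `trialML` is an optional permission, the extension also has to request it at runtime
before using the API. Below is a minimal sketch using the standard `browser.permissions.request`
API, which must be called from a user gesture; the `#enable-ai` button is a hypothetical element
in the extension's own page:

.. code-block:: javascript

   // Ask the user to grant the trialML permission in response to a click.
   document.querySelector("#enable-ai").addEventListener("click", async () => {
     const granted = await browser.permissions.request({
       permissions: ["trialML"],
     });
     if (!granted) {
       console.warn("trialML permission was not granted");
     }
   });
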
The WebExtensions inference API wraps the Firefox AI API and comes in four endpoints under
the `browser.trial.ml` namespace:

- **createEngine**: creates an inference engine.
- **runEngine**: runs an inference engine.
- **onProgress**: registers a listener for engine events.
- **deleteCachedModels**: deletes cached model files.

Below is a full example of using the engine to summarize a piece of content:

.. code-block:: javascript

   // 1. Initialize the event listener.
   browser.trial.ml.onProgress.addListener(progressData => {
     console.log(progressData);
   });

   // 2. Create the engine; this may trigger model downloads.
   await browser.trial.ml.createEngine({
     modelHub: "huggingface",
     taskName: "summarization",
   });

   // 3. Call the engine.
   const res = await browser.trial.ml.runEngine({
     args: ["This is the text to summarize"],
   });

   // 4. Get the results.
   console.log(res[0]["summary_text"]);

   // 5. Delete the downloaded model files.
   await browser.trial.ml.deleteCachedModels();

The `createEngine` call will trigger downloads if the model files are not already cached in IndexedDB.
This means the first call to `createEngine` can take a while, which needs to be taken
into account when building the web extension. Subsequent calls will be much faster.

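Because the first download can be slow, it's a good idea to give the user feedback while it runs.
A minimal sketch, treating the shape of `progressData` as opaque and assuming a hypothetical
`#status` element in the extension's page:

.. code-block:: javascript

   const status = document.querySelector("#status");

   // Surface engine events while the model files are downloading.
   browser.trial.ml.onProgress.addListener(progressData => {
     // The exact fields of progressData are not specified here,
     // so we simply serialize the whole object for display.
     status.textContent = `Preparing model… ${JSON.stringify(progressData)}`;
   });

   status.textContent = "Downloading model files (first run only)…";
   await browser.trial.ml.createEngine({
     modelHub: "huggingface",
     taskName: "summarization",
   });
   status.textContent = "Model ready.";
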
Engine arguments
----------------
When calling the API, the object you pass can contain the following arguments (a subset of the arguments of the platform API):

- **taskName**: The name of the task the pipeline is configured for. Required.
- **modelHub**: The model hub to use, either `huggingface` or `mozilla`. When set, `modelHubRootUrl` and `modelHubUrlTemplate` are ignored.
- **modelId**: The identifier of the specific model to be used by the pipeline.
- **modelRevision**: The revision of the specific model to be used by the pipeline.
- **tokenizerId**: The identifier of the tokenizer associated with the model, used for pre-processing inputs.
- **tokenizerRevision**: The revision of the tokenizer associated with the model, used for pre-processing inputs.
- **processorId**: The identifier of any processor required by the model, used for additional input processing.
- **processorRevision**: The revision of any processor required by the model, used for additional input processing.
- **dtype**: The quantization level to use.
- **device**: The device to run on, `wasm` or `gpu`.

Besides `taskName`, all other arguments are optional, and the API will pick sane defaults.

Note that model files can be very large; it's recommended to use quantized versions to reduce the size of the downloads.
We also have not activated all tasks for this first version: we have not yet implemented a streaming API for
the inference tasks, which makes it impractical to run tasks on audio, video, or large amounts of data.

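For example, here is a hedged sketch of overriding the defaults to request a quantized model
running on the GPU. The `q8` value follows Transformers.js dtype naming and is an assumption;
the quantization levels actually available depend on the model:

.. code-block:: javascript

   await browser.trial.ml.createEngine({
     taskName: "summarization",
     modelHub: "huggingface",
     modelId: "Xenova/distilbart-cnn-6-6",
     dtype: "q8",   // assumed quantization level, per Transformers.js conventions
     device: "gpu", // or "wasm"
   });
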
Default models
--------------
Below is a list of supported tasks and their default models, which are picked if you don't provide one:

- **text-classification**: Xenova/distilbert-base-uncased-finetuned-sst-2-english
- **token-classification**: Xenova/bert-base-multilingual-cased-ner-hrl
- **question-answering**: Xenova/distilbert-base-cased-distilled-squad
- **fill-mask**: Xenova/bert-base-uncased
- **summarization**: Xenova/distilbart-cnn-6-6
- **translation**: Xenova/t5-small
- **text2text-generation**: Xenova/flan-t5-small
- **text-generation**: Xenova/gpt2
- **zero-shot-classification**: Xenova/distilbert-base-uncased-mnli
- **image-to-text**: Mozilla/distilvit
- **image-classification**: Xenova/vit-base-patch16-224
- **image-segmentation**: Xenova/detr-resnet-50-panoptic
- **zero-shot-image-classification**: Xenova/clip-vit-base-patch32
- **object-detection**: Xenova/detr-resnet-50
- **zero-shot-object-detection**: Xenova/owlvit-base-patch32
- **document-question-answering**: Xenova/donut-base-finetuned-docvqa
- **image-to-image**: Xenova/swin2SR-classical-sr-x2-64
- **depth-estimation**: Xenova/dpt-large
- **feature-extraction**: Xenova/all-MiniLM-L6-v2
- **image-feature-extraction**: Xenova/vit-base-patch16-224-in21k

Any model on Hugging Face that is compatible with Transformers.js should work.

Once the engine is created, the `runEngine` API executes the task. To know what to pass in `args`
and `options`, refer to the `Transformers.js documentation <https://huggingface.co/docs/transformers.js/index#tasks>`_.
In practice, `args` is the first argument passed to the Transformers.js pipeline API, and `options` is the second.
So this example:

.. code-block:: javascript

   const gen = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
   const output = await gen('some text', { max_new_tokens: 100 });

becomes:

.. code-block:: javascript

   await browser.trial.ml.createEngine({
     modelHub: "huggingface",
     taskName: "summarization",
     modelId: "Xenova/distilbart-cnn-6-6",
   });

   const output = await browser.trial.ml.runEngine({
     args: ["some text"],
     options: { max_new_tokens: 100 },
   });

Limitations
-----------
This trial API comes with a few limitations.

Besides restricting some tasks, Firefox will not let web extensions download models that are not
in our model hub or from an allowed organization on Hugging Face.
The two allowed organizations on Hugging Face for now are `Mozilla <https://huggingface.co/Mozilla>`_ and `Xenova <https://huggingface.co/Xenova>`_, which provide over a thousand models to play with.
We are planning to add more organizations in the future and to provide a process for web extension developers
to ask for their models to be added to our list.

Extensions are also not able to run several engines in parallel, to avoid resource conflicts.
This means that if you want to run different tasks, they need to run in sequence, as in the sketch below.
This limitation might be relaxed in the future as well.

Last but not least, if device memory gets too low, an engine running in an extension might
be torn down and an error thrown.

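A minimal sketch of running two tasks in sequence while guarding against the engine being torn
down under memory pressure; the error handling is illustrative, since the exact error thrown is
not specified here:

.. code-block:: javascript

   async function runTask(taskName, args) {
     // Only one engine at a time, so (re)create it for each task.
     await browser.trial.ml.createEngine({ modelHub: "huggingface", taskName });
     try {
       return await browser.trial.ml.runEngine({ args });
     } catch (e) {
       // The engine may have been deleted under low-memory conditions.
       console.error(`Inference failed for ${taskName}:`, e);
       return null;
     }
   }

   // Run the tasks one after the other, never in parallel.
   const summary = await runTask("summarization", ["Long text to summarize"]);
   const sentiment = await runTask("text-classification", ["I love this feature!"]);
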
Full example
------------
We've implemented a full example that leverages our `image-to-text` model to generate an image caption on right-click. :ref:`See the README <Trial Inference API Extension Example>`.