canonical.rs |
Canonicalization of types.
The unit of canonicalization is a recursion group. Having "unnecessary"
types in a recursion group can "break" canonicalization of other types
within that same recursion group, as can reordering types within a recursion
group.
It is an invariant that all types defined before the recursion group we are
currently canonicalizing have already been canonicalized themselves.
Canonicalizing a recursion group then proceeds as follows:
* First we walk each of its `SubType` elements and put their type references
(i.e. their `PackedIndex`es) into canonical form. Canonicalizing a
`PackedIndex` means switching it from indexing into the Wasm module's
types space into either
1. Referencing an already-canonicalized type, for types outside of this
recursion group. Because inter-group type references can only go
towards types defined before this recursion group, we know the type is
already canonicalized and we have a `CoreTypeId` for each of those
types. This updates the `PackedIndex` into a `CoreTypeId`.
2. Indexing into the current recursion group, for intra-group type
references.
Note that (2) has the effect of making the "same" structure of mutual type
recursion look identical across recursion groups:
```wat
;; Before
(rec (struct (field (module-type 1))) (struct (field (module-type 0))))
(rec (struct (field (module-type 3))) (struct (field (module-type 2))))
;; After
(rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0))))
(rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0))))
```
* Now that the recursion group's elements are in canonical form, we can
"simply" hash cons whole rec groups at a time. The `TypesList` morally
maintains a hash map from `Vec<SubType>` to `RecGroupId` and we can do
get-or-create operations on it. I say "morally" because we don't actually
duplicate the `Vec<SubType>` key in that hash map since those elements are
already stored in the `TypeList`'s internal `SnapshotList<CoreType>`. This
means we need to do some low-level hash table fiddling with the
`hashbrown` crate.
And that's it! That is the whole canonicalization algorithm.
Some more random things to note:
* Because we essentially already have to do the check to canonicalize, and
to avoid additional passes over the types, the canonicalization pass also
checks that type references are in bounds. These are the only errors that
can be returned from canonicalization.
* Canonicalizing requires the `Module` to translate type indices to
actual `CoreTypeId`s.
* It is important that *after* we have canonicalized all types, we don't
need the `Module` anymore. This makes sure that we can, for example,
intern all types from the same store into the same `TypeList`. Which in
turn lets us type check function imports of a same-store instance's
exported functions and we don't need to translate from one module's
canonical representation to another module's canonical representation or
perform additional expensive checks to see if the types match or not
(since the whole point of canonicalization is to avoid that!). |
17370 |