Name Description Size
canonical.rs Canonicalization of types. The unit of canonicalization is a recursion group. Having "unnecessary" types in a recursion group can "break" canonicalization of other types within that same recursion group, as can reordering types within a recursion group. It is an invariant that all types defined before the recursion group we are currently canonicalizing have already been canonicalized themselves. Canonicalizing a recursion group then proceeds as follows: * First we walk each of its `SubType` elements and put their type references (i.e. their `PackedIndex`es) into canonical form. Canonicalizing a `PackedIndex` means switching it from indexing into the Wasm module's types space into either 1. Referencing an already-canonicalized type, for types outside of this recursion group. Because inter-group type references can only go towards types defined before this recursion group, we know the type is already canonicalized and we have a `CoreTypeId` for each of those types. This updates the `PackedIndex` into a `CoreTypeId`. 2. Indexing into the current recursion group, for intra-group type references. Note that (2) has the effect of making the "same" structure of mutual type recursion look identical across recursion groups: ```wat ;; Before (rec (struct (field (module-type 1))) (struct (field (module-type 0)))) (rec (struct (field (module-type 3))) (struct (field (module-type 2)))) ;; After (rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0)))) (rec (struct (field (rec-group-type 1))) (struct (field (rec-group-type 0)))) ``` * Now that the recursion group's elements are in canonical form, we can "simply" hash cons whole rec groups at a time. The `TypesList` morally maintains a hash map from `Vec<SubType>` to `RecGroupId` and we can do get-or-create operations on it. I say "morally" because we don't actually duplicate the `Vec<SubType>` key in that hash map since those elements are already stored in the `TypeList`'s internal `SnapshotList<CoreType>`. This means we need to do some low-level hash table fiddling with the `hashbrown` crate. And that's it! That is the whole canonicalization algorithm. Some more random things to note: * Because we essentially already have to do the check to canonicalize, and to avoid additional passes over the types, the canonicalization pass also checks that type references are in bounds. These are the only errors that can be returned from canonicalization. * Canonicalizing requires the `Module` to translate type indices to actual `CoreTypeId`s. * It is important that *after* we have canonicalized all types, we don't need the `Module` anymore. This makes sure that we can, for example, intern all types from the same store into the same `TypeList`. Which in turn lets us type check function imports of a same-store instance's exported functions and we don't need to translate from one module's canonical representation to another module's canonical representation or perform additional expensive checks to see if the types match or not (since the whole point of canonicalization is to avoid that!). 17370