UnrollLoops.cpp - mozsearch

mozilla-central/js/src/jit/UnrollLoops.cpp (file symbol)

Enable keyboard shortcuts

Source code

Revision control

Copy as Markdown

Other Tools

/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*-

 * vim: set ts=8 sts=2 et sw=2 tw=80:

 * This Source Code Form is subject to the terms of the Mozilla Public

 * License, v. 2.0. If a copy of the MPL was not distributed with this

 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

#include "jit/UnrollLoops.h"

#include "mozilla/Maybe.h"

#include <stdio.h>

#include "jit/DominatorTree.h"

#include "jit/IonAnalysis.h"

#include "jit/MIRGraph.h"

namespace js {

namespace jit {

// [SMDOC] Loop unroller implementation summary

//

// This is a simple loop unroller, intended (initially at least) to handle only

// wasm loops, with the aims of amortizing the per-iteration interrupt check

// cost, and of lifting an initial iteration outside the loop so as to

// facilitate subsequent optimizations.  Unrolling and peeling can be selected

// independently, so the available choices are: peeling only, unrolling only,

// or both peeling and unrolling.

//

// The flow of control (for a single function) is roughly:

//

// (0) The top level function is UnrollLoops.

//

// (1) UnrollLoops calls FindInitialCandidates to identify "initial candidate"

//     loops in the function.  These are innermost loops.  Non-innermost loops

//     cannot be unrolled/peeled.

//

// (2) FindInitialCandidates counts blocks and value defs in each initial

//     candidate, creating an initial score which increases with loop size.

//

// (3) Back in UnrollLoops, initial candidates whose initial score does not

//     exceed InitialCandidateThreshold proceed to the next stage in the

//     process: they are passed to function AnalyzeLoop.

//

// (4) AnalyzeLoop, as called from UnrollLoops, checks many structural

//     invariants on each loop, since the core unroll machinery depends on

//     those invariants.  It also makes a final determination for each loop, as

//     to whether the loop should be peeled, unrolled, peeled and unrolled, or

//     be left unchanged.  In the latter case it returns one of a few reasons

//     why the loop is to be left unchanged.

//

// (5) AnalyzeLoop also identifies, for a loop, its basic blocks, the set of

//     blocks it jumps to when exiting, and the values defined in the loop

//     which are used afterwards.

//

// (6) Back in UnrollLoops, we now know which loops are suitable for

//     processing, what processing is to happen to them, and what their blocks

//     are.  This is used to construct and initialize a `struct UnrollState`

//     for each loop candidate that made it this far.

//

// (7) Before a loop can be unrolled, single-argument phi nodes need to be

//     added to its exit target blocks, to "catch" values defined in the loop

//     that are used afterwards.  This is a limited conversion into what is

//     called "Loop-Closed SSA form" in the GCC/Clang literature.  UnrollLoops

//     calls AddClosingPhisForLoop to get this done.  This is done for all

//     loops before proceeding further.

//

// (8) Finally, we get to the core of the unrolling: the loops are handed to

//     UnrollAndOrPeelLoop in turn.  This has extensive comments; a summary

//     of the actions is:

//     - blocks, phis and normal instructions are cloned so as to create

//       the extra loop body copies.

//     - the new blocks are added to the loop's UnrollState::blockTable.

//     - during cloning, value uses are remapped to the correct iteration

//       using an MDefinitionRemapper.

//     - also during cloning, the new values are parked in a ValueTable for

//       later reference.

//     - control flow edges and predecessor vectors in the new and original

//       block sets are fixed up, using RemapEdgeSource and

//       RemapEdgeDestination.

//     - header block phi nodes are fixed up in both the body copies

//       and the original.

//     - new exiting blocks are created, so as to maintain the invariant

//       that there are no critical edges.

//     - for exit target blocks, both their predecessor arrays and phi nodes

//       (as installed by AddClosingPhisForLoop) are augmented to handle

//       the new exit edges.

//     - Wasm interrupt checks in all but the last body copy are nop'd out.

//     - If peeling is required, the back edge of the unrolled loop is changed

//       so it points at the second body copy, not the first (the original).

//     - Finally, the new blocks are installed in the MIRGraph.

//

// (9) Back in UnrollLoops, once all loops have been processed,

//     some function-level fixups are done:

//     - the blocks are renumbered

//     - the dominator tree is rebuilt

//

// UnrollLoops now returns to OptimizeMIR, handing back an indication of how

// many loops were processed.  OptimizeMIR will tidy up the resulting function

// by re-running GVN, empty block removal, and possibly other transformations.

// The core logic in UnrollAndOrPeelLoop is heavily commented but still

// nearly incomprehensible.  For debugging and modification it is helpful to

// use the Dump{Block,Value}Table* routines provided.

// The logic below functions without reference to block IDs or value IDs.

// Much of the logic uses simple scans over vectors (classes BlockVector,

// Matrix, SimpleSet, MDefinitionRemapper) rather than more complex data

// structures.  The loops we process are small, which keeps the costs of these

// scans low.  The InitialCandidateThreshold value used limits most processed

// loops to about 6 blocks and 30-ish value definitions.

// The idiom `blockTable.rowContains(0, block)` in the logic below means

// "is `block` in the original loop?"

// =====================================================================

//

// Tunables

// As a crude first pass for filtering out unsuitable candidates, top level

// function UnrollLoops calculates a size-related value for each candidate,

// with lower values indicating more desirability for unrolling/peeling.

// Candidates with a score above this value will not be unrolled.  Reducing the

// value therefore reduces the aggressiveness of the unroller.

const float InitialCandidateThreshold = 100.0;

// On further analysis of candidates that pass the above test, we have more

// detailed limitations on size.  The specified action will not be taken for a

// loop if the associated metric exceeds the constants here.  Hence increasing

// these values increases the aggressiveness of the unroller, but only to point

// where the `InitialCandidateThreshold` would have cut it off.

// Loops with more than this number of basic blocks or values (phis and normal

// definitions) will be excluded from peeling.  The actual numbers are pretty

// much pulled out of a hat.

const size_t MaxBlocksForPeel = 12;

const size_t MaxValuesForPeel = 150;

// Loops with more than this number of blocks or values will be excluded from

// unrolling.  Unrolling adds multiple copies of the loop body, whereas peeling

// adds only one, so it seems prudent to make unrolling less aggressive than

// peeling.

const size_t MaxBlocksForUnroll = 10;

const size_t MaxValuesForUnroll = 100;

// =====================================================================

//

// BlockVector, a vector of MBasicBlock*s.

using BlockVector = mozilla::Vector<MBasicBlock*, 32, SystemAllocPolicy>;

static bool BlockVectorContains(const BlockVector& vec,

                                const MBasicBlock* block) {

  for (const MBasicBlock* b : vec) {

    if (b == block) {

      return true;

  return false;

// =====================================================================

//

// Matrix, a 2-D array of pointers

template <typename T, int N, typename AP>

class Matrix {

  static_assert(std::is_pointer<T>::value);

  mozilla::Vector<T, N, AP> vec_;

  size_t size1_ = 0;

  size_t size2_ = 0;

  inline size_t index(size_t ix1, size_t ix2) const {

    MOZ_ASSERT(ix1 < size1_ && ix2 < size2_);

    return ix1 * size2_ + ix2;

 public:

  [[nodiscard]] bool init(size_t size1, size_t size2) {

    // Not already init'd.

    MOZ_ASSERT(size1_ == 0 && size2_ == 0 && vec_.empty());

    // This would be pointless.

    MOZ_ASSERT(size1 > 0 && size2 > 0);

    // So we can safely multiply them.  InitialCandidateThreshold ensures we'll

    // never have to process a loop anywhere near this big.

    MOZ_RELEASE_ASSERT(size1 <= 1000 && size2 <= 1000);

    size1_ = size1;

    size2_ = size2;

    if (!vec_.resize(size1 * size2)) {

      return false;

    // Resizing also zero-initialises all the `vec_` entries.

    return true;

  inline size_t size1() const {

    MOZ_ASSERT(size1_ * size2_ == vec_.length());

    return size1_;

  inline size_t size2() const {

    MOZ_ASSERT(size1_ * size2_ == vec_.length());

    return size2_;

  inline T get(size_t ix1, size_t ix2) const { return vec_[index(ix1, ix2)]; }

  inline void set(size_t ix1, size_t ix2, T value) {

    vec_[index(ix1, ix2)] = value;

  // Does the matrix slice `[ix1, ..]` contain `value`?

  inline bool rowContains(size_t ix1, const T value) const {

    return findInRow(ix1, value).isSome();

  // Find the column number (ix2) such that entry `[ix1, ix2] == value`.

  mozilla::Maybe<size_t> findInRow(size_t ix1, const T value) const {

    for (size_t ix2 = 0; ix2 < size2_; ix2++) {

      if (get(ix1, ix2) == value) {

        return mozilla::Some(ix2);

    return mozilla::Nothing();

};

using BlockTable = Matrix<MBasicBlock*, 32, SystemAllocPolicy>;

using ValueTable = Matrix<MDefinition*, 128, SystemAllocPolicy>;

// =====================================================================

//

// Debug printing

#ifdef JS_JITSPEW

// For debugging, dump specific rows (loop body copies) of a block table.

static void DumpBlockTableRows(const BlockTable& table, int32_t firstCix,

                               int32_t lastCix, const char* tag) {

  Fprinter& printer(JitSpewPrinter());

  printer.printf("<<<< %s\n", tag);

  for (int32_t cix = firstCix; cix <= lastCix; cix++) {

    if (cix == 0) {

      printer.printf("  -------- Original --------\n");

    } else {

      printer.printf("  -------- Copy %u --------\n", cix);

    for (uint32_t bix = 0; bix < table.size2(); bix++) {

      DumpMIRBlock(printer, table.get(uint32_t(cix), bix),

                   /*showHashedPointers=*/true);

  printer.printf(">>>>\n");

// Dump an entire block table.

static void DumpBlockTable(const BlockTable& table, const char* tag) {

  DumpBlockTableRows(table, 0, int32_t(table.size1()) - 1, tag);

// Dump just the original loop body in a block table.

static void DumpBlockTableRowZero(const BlockTable& table, const char* tag) {

  DumpBlockTableRows(table, 0, 0, tag);

// Dump a value table.

static void DumpValueTable(const ValueTable& table, const char* tag) {

  Fprinter& printer(JitSpewPrinter());

  printer.printf("<<<< %s\n", tag);

  for (uint32_t cix = 0; cix < table.size1(); cix++) {

    if (cix == 0) {

      printer.printf("  -------- Original --------\n");

    } else {

      printer.printf("  -------- Copy %u --------\n", cix);

    for (uint32_t vix = 0; vix < table.size2(); vix++) {

      printer.printf("    ");

      DumpMIRDefinition(printer, table.get(cix, vix),

                        /*showHashedPointers=*/true);

      printer.printf("\n");

  printer.printf(">>>>\n");

#endif  // JS_JITSPEW

// =====================================================================

//

// SimpleSet, a set of pointers

template <typename T, int N, typename AP>

class SimpleSet {

  static_assert(std::is_pointer<T>::value);

  mozilla::Vector<T, N, AP> vec_;

 public:

  [[nodiscard]] bool copy(const SimpleSet<T, N, AP>& other) {

    vec_.clear();

    for (T elem : other.vec_) {

      if (!vec_.append(elem)) {

        return false;

    return true;

  bool contains(T t) const {

    for (auto* existing : vec_) {

      if (existing == t) {

        return true;

    return false;

  [[nodiscard]] bool add(T t) {

    MOZ_ASSERT(t);

    return contains(t) ? true : vec_.append(t);

  bool empty() const { return vec_.empty(); }

  size_t size() const { return vec_.length(); }

  T get(size_t ix) const { return vec_[ix]; }

};

using BlockSet = SimpleSet<MBasicBlock*, 8, SystemAllocPolicy>;

using ValueSet = SimpleSet<MDefinition*, 64, SystemAllocPolicy>;

// =====================================================================

//

// MDefinitionRemapper, a very simple MDefinition*-remapping class.

class MDefinitionRemapper {

  using Pair = std::pair<MDefinition*, MDefinition*>;

  mozilla::Vector<Pair, 32, SystemAllocPolicy> pairs;

 public:

  MDefinitionRemapper() {}

  // Register `original` as a key in the mapper, and map it to itself.

  [[nodiscard]] bool enregister(MDefinition* original) {

    MOZ_ASSERT(original);

    for (auto& p : pairs) {

      (void)p;

      MOZ_ASSERT(p.first != original);  // not mapped at all

      MOZ_ASSERT(p.second == p.first);  // no `update` calls happened yet

    return pairs.append(std::pair(original, original));

  // Look up the binding for `original` in the mapper.

  MDefinition* lookup(const MDefinition* original) const {

    MOZ_ASSERT(original);

    MDefinition* res = nullptr;

    for (auto& p : pairs) {

      if (p.first == original) {

        res = p.second;

        break;

    return res;

  // Update the mapper so as to map `original` to `replacement`.  `original`

  // must have previously been registered with ::enregister.

  void update(const MDefinition* original, MDefinition* replacement) {

    MOZ_ASSERT(original && replacement);

    MOZ_ASSERT(original != replacement);

    for (auto& p : pairs) {

      if (p.first == original) {

        p.second = replacement;

        return;

    MOZ_CRASH();  // asked to update a non-existent key

};

// =====================================================================

//

// MakeReplacementInstruction / MakeReplacementPhi

// Make a cloned version of `ins`, but with its arguments remapped by `mapper`.

static MInstruction* MakeReplacementInstruction(

    TempAllocator& alloc, const MDefinitionRemapper& mapper,

    const MInstruction* ins) {

  MDefinitionVector inputs(alloc);

  for (size_t i = 0; i < ins->numOperands(); i++) {

    MDefinition* old = ins->getOperand(i);

    MDefinition* replacement = mapper.lookup(old);

    if (!replacement) {

      // The mapper didn't map it, which means that `old` is a value that's not

      // defined in the loop body.  So leave it unchanged.

      replacement = old;

    if (!inputs.append(replacement)) {

      return nullptr;

  return ins->clone(alloc, inputs);

// Make a cloned version of `phi`, but with its arguments remapped by `mapper`.

static MPhi* MakeReplacementPhi(TempAllocator& alloc,

                                const MDefinitionRemapper& mapper,

                                const MPhi* phi) {

  MDefinitionVector inputs(alloc);

  for (size_t i = 0; i < phi->numOperands(); i++) {

    MDefinition* old = phi->getOperand(i);

    MDefinition* replacement = mapper.lookup(old);

    if (!replacement) {

      replacement = old;

    if (!inputs.append(replacement)) {

      return nullptr;

  return phi->clone(alloc, inputs);

// =====================================================================

//

// UnrollState

enum class UnrollMode { Peel, Unroll, PeelAndUnroll };

struct UnrollState {

  // What should happen to the specified loop.

  const UnrollMode mode;

  // The central data structure: BlockTable, a 2-D array of block pointers.

//

  // To create copies of the original loop body, we need to copy the original

  // blocks and fix up the inter-block edges and phis appropriately.  This

  // edge-fixup is done without reference to the block IDs; instead it is done

  // purely on the basis of the MBasicBlock* values themselves.  Hence it does

  // not make any assumption about the ordering of the block ID values.

//

  // The block copies are managed using a 2-D array of block pointers, via

  // which we can uniformly access the original and new blocks.  The array is

  // indexed by `(copy index, block index in copy)`.  Both indices are

  // zero-based.  The sequencing of blocks implied by "block index in copy" is

  // the sequence produced by the RPO iterator as it enumerates the blocks in

  // the loop.

//

  // Per the above, note that "block index" (`bix` in the code) is a zero-based

  // index into the RPO sequence of blocks in the loop and is not to be

  // confused with the pre-existing concept of "block ID".

//

  // Note also that the required unroll factor (including that for the peeled

  // copy, if any) is equal to the first dimension of `blockTable`, and the

  // number of blocks in the original loop is implied by the second dimension

  // of `blockTable`.  Hence neither value needs to be stored explicitly.

  BlockTable blockTable;

  // The set of blocks arrived at (directly) by exiting branches in the

  // original loop.  FIXME: try to make const

  /*const*/ BlockSet exitTargetBlocks;

  // The set of values (MDefinition*s) defined in the original loop and used

  // afterwards.

  /*const*/ ValueSet exitingValues;

  explicit UnrollState(UnrollMode mode) : mode(mode) {}

  bool doPeeling() const {

    return mode == UnrollMode::Peel || mode == UnrollMode::PeelAndUnroll;

  bool doUnrolling() const {

    return mode == UnrollMode::Unroll || mode == UnrollMode::PeelAndUnroll;

};

// =====================================================================

//

// AnalyzeLoop and AnalysisResult

// Summarises the result of a detailed analysis of a candidate loop, producing

// either a recommendation about what to do with it, or a reason why it should

// be left unchanged.

enum class AnalysisResult {

  // OOM during analysis

  OOM,

  // The chosen transformation for the loop ..

  Peel,

  Unroll,

  PeelAndUnroll,

  // .. or .. reasons why the loop can't be unrolled:

//

  // The unrolling/peeling machinery below relies on the fact that MIR loops

  // are heavily constrained in their structure.  AnalyzeLoop checks the

  // relevant invariants carefully, with this value indicating non-compliance.

  // Top level function UnrollLoops will MOZ_CRASH if this happens.

  BadInvariants,

  // Cannot be handled by the unroller: if the loop defines value(s) that are

  // used after the loop, then we can only handle the case where it has one

  // exiting branch.

  TooComplex,

  // The loop contains MIR nodes that are uncloneable

  Uncloneable,

  // The loop has too many blocks or values

  TooLarge,

  // The loop is otherwise unsuitable:

  // * contains a call or table switch

  // * is an infinite loop

  Unsuitable

};

#ifdef JS_JITSPEW

static const char* Name_of_AnalysisResult(AnalysisResult res) {

  switch (res) {

    case AnalysisResult::OOM:

      return "OOM";

    case AnalysisResult::Peel:

      return "Peel";

    case AnalysisResult::Unroll:

      return "Unroll";

    case AnalysisResult::PeelAndUnroll:

      return "PeelAndUnroll";

    case AnalysisResult::BadInvariants:

      return "BadInvariants";

    case AnalysisResult::TooComplex:

      return "TooComplex";

    case AnalysisResult::Uncloneable:

      return "Uncloneable";

    case AnalysisResult::TooLarge:

      return "TooLarge";

    case AnalysisResult::Unsuitable:

      return "Unsuitable";

    default:

      MOZ_CRASH();

#endif

// Examine the original loop in `originalBlocks` for unrolling suitability,

// and, if acceptable, collect auxiliary information:

//

// * the set of blocks that are direct exit targets for the loop

// * the set of values defined in the loop AND used afterwards

static AnalysisResult AnalyzeLoop(const BlockVector& originalBlocks,

                                  BlockSet* exitTargetBlocks,

                                  ValueSet* exitingValues) {

  MOZ_ASSERT(exitTargetBlocks->empty());

  MOZ_ASSERT(exitingValues->empty());

  // ==== BEGIN check invariants on the loop structure ====

  // The unroller/peeler below depends on these invariants.

  // In summary (where nBlocks == originalBlocks.length()):

//

  // * At least 2 blocks in original loop (1 for header phis + possible insns,

  //   >= 0 for "ordinary" blocks, 1 for the backedge jump)

//

  // * originalBlocks[0] is a loop header

  // * originalBlocks[nBlocks-1] == originalBlocks[0]->backedge

//

  // * all blocks in original loop have at least one insn

  // * all blocks in original loop have a control insn as their last insn

  // * all blocks in original loop have non-control insns for all other insns

  // * all blocks in original loop have loop depth > 0

//

  // * none of originalBlocks[0 .. nBlocks-2] (inclusive) jumps to header

//

  // * originalBlocks[nBlocks-1] jumps unconditionally to header

//

  // * original header node has 2 preds

  // * original header node pred[0] is from outside the loop, and

  //   pred[1] is from inside the loop.

//

  // * original header node phis all have 2 args

  // * original header node phis have arg[0] from outside the loop

  //   (because pred[0] is the loop-entry edge)

  // * original header node phis have their arg[1] as

  //   -- (almost always) defined in the loop -- loop carried variable

  //   -- (very occasionally) defined outside the loop -- in which case

  //      this is just a vanilla phi, nothing to do with the loop.

  //   Hence there is nothing to check for phi arg[1]s, but leaving this

  //   here for clarity.

//

  // Aside: how can a loop head phi not depend on a value defined within the

  // loop?  Consider the strange-but-valid case

//

  //    x = 5 ; loop { .. x = 7; .. }

//

  // The loop head phi will merge the initial value (5) and the loop-defined

  // value in the usual way.  Imagine now the loop is LICMd; the loop defined

  // value `x = 7;` is observed to be invariant and so is lifted out of the

  // loop.  Hence the loop head phi will have both args defined outside the

  // loop.

  // * At least 2 blocks in original loop (1 for header phis + possible insns,

  //   >= 0 for "ordinary" blocks, 1 for the backedge jump)

  const size_t numBlocksInOriginal = originalBlocks.length();

  if (numBlocksInOriginal < 2) {

    return AnalysisResult::BadInvariants;

  // * originalBlocks[0] is a loop header

  // * originalBlocks[nBlocks-1] == originalBlocks[0]->backedge

    MBasicBlock* header = originalBlocks[0];

    if (!header->isLoopHeader() ||

        header->backedge() != originalBlocks[numBlocksInOriginal - 1]) {

      return AnalysisResult::BadInvariants;

  // * all blocks in original loop have at least one insn

  // * all blocks in original loop have a control insn as their last insn

  // * all blocks in original loop have non-control insns for all other insns

  // * all blocks in original loop have loop depth > 0

  for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

    MBasicBlock* block = originalBlocks[bix];

    // This is a bit lame, but ..

    size_t numInsns = 0;

    for (MInstructionIterator insnIter(block->begin());

         insnIter != block->end(); insnIter++) {

      numInsns++;

    if (numInsns < 1) {

      return AnalysisResult::BadInvariants;

    bool ok = true;

    size_t curInsn = 0;

    for (MInstructionIterator insnIter(block->begin());

         insnIter != block->end(); insnIter++) {

      MInstruction* insn = *insnIter;

      if (curInsn == numInsns - 1) {

        ok = ok && insn->isControlInstruction();

      } else {

        ok = ok && !insn->isControlInstruction();

      curInsn++;

    MOZ_ASSERT(curInsn == numInsns);  // internal logic

    ok = ok && block->loopDepth() > 0;

    if (!ok) {

      return AnalysisResult::BadInvariants;

  // * none of originalBlocks[0 .. nBlocks-2] (inclusive) jumps to header

    MBasicBlock* header = originalBlocks[0];

    for (uint32_t bix = 0; bix < numBlocksInOriginal - 1; bix++) {

      MBasicBlock* block = originalBlocks[bix];

      if (!block->hasLastIns()) {

        return AnalysisResult::BadInvariants;

      MControlInstruction* lastIns = block->lastIns();

      for (size_t i = 0; i < lastIns->numSuccessors(); i++) {

        if (lastIns->getSuccessor(i) == header) {

          return AnalysisResult::BadInvariants;

  // originalBlocks[nBlocks-1] jumps unconditionally to header

    MBasicBlock* header = originalBlocks[0];

    MBasicBlock* backedge = originalBlocks[numBlocksInOriginal - 1];

    if (!backedge->hasLastIns()) {

      return AnalysisResult::BadInvariants;

    MControlInstruction* lastIns = backedge->lastIns();

    if (!lastIns->isGoto() || lastIns->toGoto()->target() != header) {

      return AnalysisResult::BadInvariants;

  // * original header node has 2 preds

  // * original header node pred[0] is from outside the loop.

  //   pred[1] is from inside the loop.

    MBasicBlock* header = originalBlocks[0];

    if (header->numPredecessors() != 2 ||

        BlockVectorContains(originalBlocks, header->getPredecessor(0)) ||

        !BlockVectorContains(originalBlocks, header->getPredecessor(1))) {

      return AnalysisResult::BadInvariants;

  // * original header node phis all have 2 args

  // * original header node phis have arg[0] from outside the loop

  //   (because pred[0] is the loop-entry edge)

  // * original header node phis have their arg[1] as

  //   -- (almost always) defined in the loop -- loop carried variable

  //   -- (very occasionally) defined outside the loop -- in which case

  //      this is just a vanilla phi, nothing to do with the loop.

  //   Hence there is nothing to check for phi arg[1]s, but leaving this

  //   here for clarity.

    MBasicBlock* header = originalBlocks[0];

    for (MPhiIterator phiIter(header->phisBegin());

         phiIter != header->phisEnd(); phiIter++) {

      MPhi* phi = *phiIter;

      if (phi->numOperands() != 2 ||

          BlockVectorContains(originalBlocks, phi->getOperand(0)->block())) {

        return AnalysisResult::BadInvariants;

  // ==== END check invariants on the loop structure ====

  // Check that all the insns are cloneable.

  for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

    MBasicBlock* block = originalBlocks[bix];

    for (MInstructionIterator insIter(block->begin()); insIter != block->end();

         insIter++) {

      MInstruction* ins = *insIter;

      if (ins->isWasmCallCatchable() || ins->isWasmCallUncatchable() ||

          ins->isWasmReturnCall() || ins->isTableSwitch()

          /* TODO: other node kinds unsuitable for unrolling? */) {

        return AnalysisResult::Unsuitable;

      if (!ins->canClone()) {

        // `ins->opName()` will show the name of the insn kind, if you want to

        // see it

        return AnalysisResult::Uncloneable;

  // More analysis: make up a set of blocks that are not in the loop, but which

  // are jumped to from within the loop.  We will need this later.

  for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

    MBasicBlock* block = originalBlocks[bix];

    MControlInstruction* lastIns = block->lastIns();  // asserted safe above

    for (size_t i = 0; i < lastIns->numSuccessors(); i++) {

      MBasicBlock* succ = lastIns->getSuccessor(i);

      // See if `succ` is any of the blocks in the (original) loop.  If not,

      // add it to `exitTargetBlocks`.

      if (!BlockVectorContains(originalBlocks, succ)) {

        if (!exitTargetBlocks->add(succ)) {

          return AnalysisResult::OOM;

  // If the loop has zero exits, we consider it sufficiently mutant to ignore

  // (it's an infinite loop?)

  if (exitTargetBlocks->empty()) {

    return AnalysisResult::Unsuitable;

  // Even more analysis: make up a set of MDefinition*s that are defined in the

  // original loop and which are used after the loop.  We will later need to

  // add phi nodes that generate the correct values for these, since, after

  // unrolling, these values will have multiple definition points (one per

  // unrolled iteration).  While we're at it, count the number of values

  // defined in the original loop.

  size_t numValuesInOriginal = 0;

  for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

    MBasicBlock* block = originalBlocks[bix];

    // Check all phi nodes ..

    for (MPhiIterator phiIter(block->phisBegin()); phiIter != block->phisEnd();

         phiIter++) {

      numValuesInOriginal++;

      MPhi* phi = *phiIter;

      bool usedAfterLoop = false;

      // Is the value of `phi` used after the loop?

      for (MUseIterator useIter(phi->usesBegin()); useIter != phi->usesEnd();

           useIter++) {

        MUse* use = *useIter;

        if (!BlockVectorContains(originalBlocks, use->consumer()->block())) {

          usedAfterLoop = true;

          break;

      if (usedAfterLoop && !exitingValues->add(phi)) {

        return AnalysisResult::OOM;

    // .. and all normal nodes ..

    for (MInstructionIterator insnIter(block->begin());

         insnIter != block->end(); insnIter++) {

      numValuesInOriginal++;

      MInstruction* insn = *insnIter;

      bool usedAfterLoop = false;

      // Is the value of `insn` used after the loop?

      for (MUseIterator useIter(insn->usesBegin()); useIter != insn->usesEnd();

           useIter++) {

        MUse* use = *useIter;

        if (!BlockVectorContains(originalBlocks, use->consumer()->block())) {

          usedAfterLoop = true;

          break;

      if (usedAfterLoop && !exitingValues->add(insn)) {

        return AnalysisResult::OOM;

  // Due to limitations in AddClosingPhisForLoop (see comments there),

  // currently we can only unroll loops where, either:

//

  // * no value defined in the loop is used afterwards, in which case we handle

  //   any number of exits from the loop.

//

  // * one or more values defined in the loop is used afterwards, in which case

  //   there must be one exit point from the loop.

//

  // IOW we can't handle multiple exits *and* values defined in the loop.

  if (exitTargetBlocks->size() >= 2 && exitingValues->size() > 0) {

    return AnalysisResult::TooComplex;

  // There is an invariant that a loop exit target block does not contain any

  // phis.  This is really just the no-critical-edges invariant.

//

  // By definition, a loop exit block has at least one successor in the loop

  // and at least one successor outside the loop.  If the loop exit target has

  // phis, that would imply it has multiple predecessors.  But then the edge

  // would be critical, because it connects a block with multiple successors to

  // a block with multiple predecessors.

//

  // So enforce that; it simplifies the addition of loop-closing phis that will

  // happen shortly.

  for (size_t i = 0; i < exitTargetBlocks->size(); i++) {

    MBasicBlock* exitTargetBlock = exitTargetBlocks->get(i);

    if (!exitTargetBlock->phisEmpty()) {

      return AnalysisResult::BadInvariants;

    // Also we need this, since we'll be adding single-argument phis to this

    // block during loop-closing.

    if (exitTargetBlock->numPredecessors() != 1) {

      return AnalysisResult::BadInvariants;

  // At this point, we've worked through all invariant- and structural- reasons

  // why we can't/shouldn't unroll/peel this loop.  Now we need to decide which

  // (or both) of those actions should happen.

  bool peel = true;

  bool unroll = true;

  if (numBlocksInOriginal > MaxBlocksForPeel ||

      numValuesInOriginal > MaxValuesForPeel) {

    peel = false;

  if (numBlocksInOriginal > MaxBlocksForUnroll ||

      numValuesInOriginal > MaxValuesForUnroll) {

    unroll = false;

  if (peel) {

    return unroll ? AnalysisResult::PeelAndUnroll : AnalysisResult::Peel;

  } else {

    return unroll ? AnalysisResult::Unroll : AnalysisResult::TooLarge;

// =====================================================================

//

// AddClosingPhisForLoop

// Add "loop-closing phis" for values defined in a loop and used afterwards.

// In GCC parlance, the resulting form is called LCSSA (Loop-Closed SSA).  This

// is a very limited intervention in that it can only handle a single exit in a

// loop, thereby restricting the places where the phis should go to the

// one-and-only exit block, and avoiding the need to compute iterated dominance

// frontiers.

[[nodiscard]]

static bool AddClosingPhisForLoop(TempAllocator& alloc,

                                  const UnrollState& state) {

  // If no values exit the loop, we don't care how many exit target blocks

  // there are, and need to make no changes.

  if (state.exitingValues.empty()) {

    return true;

  // Ensured by AnalyzeLoop

  MOZ_ASSERT(state.exitTargetBlocks.size() == 1);

  MBasicBlock* targetBlock = state.exitTargetBlocks.get(0);

  // Ensured by AnalyzeLoop.  Required because we're adding single-arg phis

  // to `targetBlock`.

  MOZ_ASSERT(targetBlock->numPredecessors() == 1);

  for (size_t i = 0; i < state.exitingValues.size(); i++) {

    MDefinition* exitingValue = state.exitingValues.get(i);

    // In `targetBlock`, add a single-argument phi node that "catches"

    // `exitingValue`, and change all uses of `exitingValue` to be the phi

    // node instead.

    MPhi* phi = MPhi::New(alloc, exitingValue->type());

    if (!phi) {

      return false;

    targetBlock->addPhi(phi);

    // Change all uses of `exitingValue` outside of the loop into uses of

    // `phi`.

    for (MUseIterator useIter(exitingValue->usesBegin()),

         useIterEnd(exitingValue->usesEnd());

         useIter != useIterEnd;

         /**/) {

      MUse* use = *useIter++;

      if (state.blockTable.rowContains(0, use->consumer()->block())) {

        continue;

      MOZ_ASSERT(use->consumer());

      use->replaceProducer(phi);

    // Finally, make `phi` be the one-and-only user of `exitingValue`.

    if (!phi->addInputFallible(exitingValue)) {

      return false;

  // Invariant that should hold at this point: for each exiting value, all uses

  // of it outside the loop should only be phi nodes.

  for (size_t i = 0; i < state.exitingValues.size(); i++) {

    MDefinition* exitingValue = state.exitingValues.get(i);

    for (MUseIterator useIter(exitingValue->usesBegin());

         useIter != exitingValue->usesEnd(); useIter++) {

      mozilla::DebugOnly<MUse*> use = *useIter;

      MOZ_ASSERT_IF(!state.blockTable.rowContains(0, use->consumer()->block()),

                    use->consumer()->isDefinition() &&

                        use->consumer()->toDefinition()->isPhi());

  return true;

// =====================================================================

//

// UnrollAndOrPeelLoop

[[nodiscard]]

static bool UnrollAndOrPeelLoop(MIRGraph& graph, UnrollState& state) {

  // Prerequisites (assumed):

  // * AnalyzeLoop has approved this loop for peeling and/or unrolling

  // * AddClosingPhisForLoop has been called for it

//

  // We expect `state` to be ready to go, with the caveat that only the first

  // row of entries in `state.blockTable` have been filled in (these are the

  // original loop blocks).  All other entries are null and will be filled in

  // by this routine.

//

  // Almost all the logic here is to do with unrolling.  That's because peeling

  // (done far below) can be done by a trivial modification of the loop

  // resulting from unrolling: make the backedge jump to the second copy of the

  // loop body rather than the first.  That leaves the first copy (that is, the

  // original loop body) in a "will only be executed once" state, as we desire.

  // We need this so often it's worth pulling out specially.

  const MBasicBlock* originalHeader = state.blockTable.get(0, 0);

  MOZ_ASSERT(originalHeader->isLoopHeader());

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_UnrollDetails)) {

    Fprinter& printer(JitSpewPrinter());

    printer.printf(

        "<<<< ORIGINAL FUNCTION (after LCSSA-ification of chosen loops)\n");

    DumpMIRGraph(printer, graph, /*showHashedPointers=*/true);

    printer.printf(">>>>\n");

#endif

  // Decide on the total unroll factor.  "Copy #0" is the original loop, so the

  // number of new copies is `unrollFactor - 1`.  For example, if the original

  // loop was `loop { B; }` then with `unrollFactor == 3`, the unrolled loop

  // will be `loop { B; B; B; }`.  Note that the value acquired here includes

  // the copy needed for peeling, if that has been requested.

  const uint32_t unrollFactor = state.blockTable.size1();

  // The logic below will fail if this doesn't hold.

  MOZ_ASSERT(unrollFactor >= 2);

  // The number of blocks in the original loop.

  const uint32_t numBlocksInOriginal = state.blockTable.size2();

  // The number of values (phis and normal values) in the original loop.

  uint32_t numValuesInOriginal = 0;

  for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

    MBasicBlock* block = state.blockTable.get(0, bix);

    // This is very feeble, but I can't think of anything better.

    for (MPhiIterator phiIter(block->phisBegin()); phiIter != block->phisEnd();

         phiIter++) {

      numValuesInOriginal++;

    for (MInstructionIterator insnIter(block->begin());

         insnIter != block->end(); insnIter++) {

      numValuesInOriginal++;

  // `numValuesInOriginal` is const after this point.

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_UnrollDetails)) {

    DumpBlockTableRowZero(state.blockTable, "ORIGINAL LOOP");

    Fprinter& printer(JitSpewPrinter());

    printer.printf("<<<< EXIT TARGET BLOCKS: ");

    for (size_t i = 0; i < state.exitTargetBlocks.size(); i++) {

      MBasicBlock* targetBlock = state.exitTargetBlocks.get(i);

      DumpMIRBlockID(printer, targetBlock, /*showHashedPointers=*/true);

      printer.printf(" ");

    printer.printf(">>>>\n");

    printer.printf("<<<< EXITING VALUES: ");

    for (size_t i = 0; i < state.exitingValues.size(); i++) {

      MDefinition* exitingValue = state.exitingValues.get(i);

      DumpMIRDefinitionID(printer, exitingValue, /*showHashedPointers=*/true);

      printer.printf(" ");

    printer.printf(">>>>\n");

#endif

  // At this point, `state.blockTable` row zero contains the original blocks,

  // and all other rows are nulled out.

  // Set up the value table.  This is a structure similar to

  // `state.blockTable`: a 2-D array of values.  This is local to this routine

  // rather than being part of `state` because it is only needed here, and can

  // take quite a lot of space.

//

  // As with the block table, the main (actually, only) purpose of the value

  // table is to make it possible to find equivalent definitions in different

  // iterations.  Also as above, "value index" (`vix` in the code) is a

  // zero-based index into the sequence of values (including phis) in the loop

  // as generated by the RPO traversal of the loop's blocks.

  ValueTable valueTable;

  if (!valueTable.init(unrollFactor, numValuesInOriginal)) {

    return false;

  // Copy the original values into row zero of `valueTable`, parking them

  // end-to-end, without regard to block boundaries.  All other rows remain

  // nulled out for now.

    uint32_t vix = 0;

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* block = state.blockTable.get(0, bix);

      for (MPhiIterator phiIter(block->phisBegin());

           phiIter != block->phisEnd(); phiIter++) {

        valueTable.set(0, vix, *phiIter);

        vix++;

      for (MInstructionIterator insnIter(block->begin());

           insnIter != block->end(); insnIter++) {

        valueTable.set(0, vix, *insnIter);

        vix++;

    MOZ_ASSERT(vix == numValuesInOriginal);

  // Set up the value remapper.  This maps MDefinitions* in the original body

  // to their "current" definition in new bodies.  The only allowed keys are

  // MDefinition*s from the original body.  To enforce this, the keys have to

  // be "registered" before use.

  MDefinitionRemapper mapper;

  for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

    MBasicBlock* originalBlock = state.blockTable.get(0, bix);

    // Create initial entries in `mapper` for both the normal insns and phi

    // nodes in `originalBlock`.

    for (MPhiIterator phiIter(originalBlock->phisBegin());

         phiIter != originalBlock->phisEnd(); phiIter++) {

      MPhi* originalPhi = *phiIter;

      if (!mapper.enregister(originalPhi)) {

        return false;

    for (MInstructionIterator insnIter(originalBlock->begin());

         insnIter != originalBlock->end(); insnIter++) {

      MInstruction* originalInsn = *insnIter;

      if (!mapper.enregister(originalInsn)) {

        return false;

  // Now we begin with the very core of the unrolling activity.

  // Create new empty blocks for copy indices 1 .. (unrollFactor-1)

  const CompileInfo& info = originalHeader->info();

  for (uint32_t cix = 1; cix < unrollFactor; cix++) {

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* empty = MBasicBlock::New(graph, info, /*pred=*/nullptr,

                                            MBasicBlock::Kind::NORMAL);

      if (!empty) {

        return false;

      empty->setLoopDepth(state.blockTable.get(0, bix)->loopDepth());

      // We don't bother to set the block's ID, because the logic below not

      // depend on it. In any case the blocks will get renumbered at the end of

      // UnrollLoops, so it doesn't matter what we choose here.  Unrolling

      // still works even if we don't set the ID to anything.

      MOZ_ASSERT(!state.blockTable.get(cix, bix));

      state.blockTable.set(cix, bix, empty);

  // Copy verbatim, blocks in the original loop, into our new copies.

  // For each copy of the loop ..

  for (uint32_t cix = 1; cix < unrollFactor; cix++) {

    // For each block in the original copy ..

    uint32_t vix = 0;

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* originalBlock = state.blockTable.get(0, bix);

      MBasicBlock* clonedBlock = state.blockTable.get(cix, bix);

      // Clone the phi nodes, but don't update the mapper until all original

      // arguments have been looked up.  This is required because the phi nodes

      // collectively implement a parallel assignment; there is no implied

      // sequentiality.

      mozilla::Vector<std::pair<const MPhi*, MPhi*>, 16, SystemAllocPolicy>

          mapperUpdates;

      for (MPhiIterator phiIter(originalBlock->phisBegin());

           phiIter != originalBlock->phisEnd(); phiIter++) {

        const MPhi* originalPhi = *phiIter;

        MPhi* clonedPhi =

            MakeReplacementPhi(graph.alloc(), mapper, originalPhi);

        if (!clonedPhi) {

          return false;

        clonedBlock->addPhi(clonedPhi);

        MOZ_ASSERT(!valueTable.get(cix, vix));

        valueTable.set(cix, vix, clonedPhi);

        vix++;

        if (!mapperUpdates.append(std::pair(originalPhi, clonedPhi))) {

          return false;

      for (auto& p : mapperUpdates) {

        // Update the value mapper.  The `originalPhi` values must be part of

        // the original loop body and so must already have a key in `mapper`.

        mapper.update(p.first, p.second);

      // Cloning the instructions is simpler, since we can incrementally update

      // the mapper.

      for (MInstructionIterator insnIter(originalBlock->begin());

           insnIter != originalBlock->end(); insnIter++) {

        const MInstruction* originalInsn = *insnIter;

        MInstruction* clonedInsn =

            MakeReplacementInstruction(graph.alloc(), mapper, originalInsn);

        if (!clonedInsn) {

          return false;

        clonedBlock->insertAtEnd(clonedInsn);

        MOZ_ASSERT(!valueTable.get(cix, vix));

        valueTable.set(cix, vix, clonedInsn);

        vix++;

        // Update the value mapper.  `originalInsn` must be part of the

        // original loop body and so must already have a key in `mapper`.

        mapper.update(originalInsn, clonedInsn);

      // Clone the block's predecessor array

      MOZ_ASSERT(clonedBlock->numPredecessors() == 0);

      for (uint32_t i = 0; i < originalBlock->numPredecessors(); i++) {

        MBasicBlock* pred = originalBlock->getPredecessor(i);

        if (!clonedBlock->appendPredecessor(pred)) {

          return false;

    MOZ_ASSERT(vix == numValuesInOriginal);

    // Check we didn't mess up the valueTable ordering relative to the original

    // (best effort assertion).

    for (vix = 0; vix < numValuesInOriginal; vix++) {

      MOZ_ASSERT(valueTable.get(cix, vix)->op() ==

                 valueTable.get(0, vix)->op());

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_UnrollDetails)) {

    DumpValueTable(valueTable, "VALUE TABLE (final)");

#endif

  // Fix up (remap) inter-block edges in both the original and copies, using

  // uniform indexing as above (original = index 0, copies = indices >= 1):

  // For a block with copy-index N, inspect each edge.  Determine the role the

  // edge-destination (target) plays in the original copy:

//

  // * backedge

  //     -> remap to header node of copy (N + 1) % unrollFactor

  //        (so, generates a backedge in the last copy)

  // * internal jump within the original body

  //     -> remap to corresponding block in copy N

  // * jump to after the original body

  //     -> unchanged (it is an exiting edge)

  // * jump to before the original body

  //     -> this can't happen

  //        (but if it did, it would be just another exit block)

  auto RemapEdgeDestination =

      [=, &state](uint32_t cix, MBasicBlock* oldDestination) -> MBasicBlock* {

    // `oldDestination` is the target of a control-flow insn in the original

    // loop.  Figure out what its replacement should be, for copy index `cix`

    // (with `cix == 0` meaning the original copy).

    // If it is a back edge, remap to the header node of the next copy, with

    // wraparound for the last copy.

    if (oldDestination == state.blockTable.get(0, 0)) {

      MOZ_ASSERT(oldDestination == originalHeader);

      return state.blockTable.get((cix + 1) % unrollFactor, 0);

    // If it is to some other node within the original body, remap to the

    // corresponding node in this copy.

    for (uint32_t i = 1; i < numBlocksInOriginal; i++) {

      if (oldDestination == state.blockTable.get(0, i)) {

        return state.blockTable.get(cix, i);

    // The target of the edge is not in the original loop.  Either it's to

    // after the loop (an exiting branch) or it is to before the loop, although

    // that would be bizarre and probably illegal.  Either way, leave it

    // unchanged.

    return oldDestination;

};

  // Similarly, for an edge that claims to come from `oldSource` in the

  // original loop, determine where it should come from if we imagine that it

  // applies to copy index `cix`.

  auto RemapEdgeSource = [=, &state](uint32_t cix,

                                     MBasicBlock* oldSource) -> MBasicBlock* {

    if (oldSource == state.blockTable.get(0, numBlocksInOriginal - 1)) {

      // Source is backedge node in the original loop.  Remap to the backedge

      // node in the previous iteration.

      MOZ_ASSERT(oldSource == originalHeader->backedge());

      return state.blockTable.get(cix == 0 ? (unrollFactor - 1) : (cix - 1),

                                  numBlocksInOriginal - 1);

    for (uint32_t i = 0; i < numBlocksInOriginal - 1; i++) {

      if (oldSource == state.blockTable.get(0, i)) {

        // Source is a non-backedge node in the original loop.  Remap it to

        // the equivalent node in this copy.

        return state.blockTable.get(cix, i);

    // Source is not in the original loop.  Leave unchanged.

    return oldSource;

};

  // Work through all control-flow instructions in all copies and remap their

  // edges.

  // For each copy of the loop, including the original ..

  for (uint32_t cix = 0; cix < unrollFactor; cix++) {

    // For each block in the copy ..

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* b = state.blockTable.get(cix, bix);

      MControlInstruction* lastI = b->lastIns();

      if (lastI->isGoto()) {

        MGoto* insn = lastI->toGoto();

        insn->setTarget(RemapEdgeDestination(cix, insn->target()));

      } else if (lastI->isTest()) {

        MTest* insn = lastI->toTest();

        insn->setIfTrue(RemapEdgeDestination(cix, insn->ifTrue()));

        insn->setIfFalse(RemapEdgeDestination(cix, insn->ifFalse()));

      } else {

        // The other existing control instructions are MTableSwitch,

        // MUnreachable, MWasmTrap, MWasmReturn, and MWasmReturnVoid.

        // MTableSwitch is ruled out above.  The others end a block without a

        // successor, which means that they can't be loop blocks

        // (MarkLoopBlocks marks blocks that are predecessors of the back

        // edge.)  So we don't expect to see them here.

        MOZ_CRASH();

  // Also we need to remap within the each block's predecessors vector.

  // We have to do this backwards to stop RemapEdgeSource from asserting.

  // For each copy of the loop, including the original ..

  for (int32_t cix = unrollFactor - 1; cix >= 0; cix--) {

    // For each block in the copy ..

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* block = state.blockTable.get(cix, bix);

      for (uint32_t i = 0; i < block->numPredecessors(); i++) {

        MBasicBlock* oldPred = block->getPredecessor(i);

        MBasicBlock* newPred = RemapEdgeSource(cix, oldPred);

        block->setPredecessor(i, newPred);

  // Clean up the header node copies (but not for the original copy) since they

  // all still claim to have two predecessors, the first of which is the entry

  // edge from outside the loop.  This is no longer true, so delete that

  // predecessor.  Remove the corresponding Phi args too.  This leaves

  // one-argument phis, meh.

  for (uint32_t cix = 1; cix < unrollFactor; cix++) {

    MBasicBlock* block = state.blockTable.get(cix, 0);

    MOZ_ASSERT(block->numPredecessors() == 2);  // from AnalyzeLoop

    block->erasePredecessor(0);

    MOZ_ASSERT(block->numPredecessors() == 1);  // duh

    for (MPhiIterator phiIter(block->phisBegin()); phiIter != block->phisEnd();

         phiIter++) {

      MPhi* phi = *phiIter;

      MOZ_ASSERT(phi->numOperands() == 2);  // from AnalyzeLoop

      phi->removeOperand(0);

      MOZ_ASSERT(phi->numOperands() == 1);

  // Fix up the phi nodes in the original header; the previous step did not do

  // that.  Specifically, the second arg in each phi -- which are values that

  // flow in from the backedge block -- need to be the ones from the last body

  // copy; but currently they are from the original copy.  At this point, the

  // `mapper` should be able to tell us the correct value.

  for (MPhiIterator phiIter(originalHeader->phisBegin());

       phiIter != originalHeader->phisEnd(); phiIter++) {

    MPhi* phi = *phiIter;

    MOZ_ASSERT(phi->numOperands() == 2);  // from AnalyzeLoop

    // phi arg[0] is defined before the loop.  Ensured by AnalyzeLoop.

    MOZ_ASSERT(!state.blockTable.rowContains(0, phi->getOperand(0)->block()));

    // phi arg[1] is almost always defined in the loop (hence it is a

    // loop-carried value) but is very occasionally defined before the loop

    // (so it is unrelated to the loop).  Proceed with maximum paranoia.

    MDefinition* operand1 = phi->getOperand(1);

    MDefinition* replacement = mapper.lookup(operand1);

    // If we didn't find a replacement (unlikely but possible) then

    // `operand1` must be defined before the loop.

    MOZ_ASSERT((replacement == nullptr) ==

               !state.blockTable.rowContains(0, operand1->block()));

    if (replacement) {

      phi->replaceOperand(1, replacement);

  // Now we need to fix up the exit target blocks, by adding sources of exiting

  // edges to their `predecessors_` vectors and updating their phi nodes

  // accordingly.  Recall that those phi nodes were previously installed in

  // these blocks by AddClosingPhisForLoop.

//

  // Currently the exit target blocks mention only predecessors in the original

  // loop, but they also need to mention equivalent predecessors in each of the

  // loop copies.

  for (uint32_t cix = 0; cix < unrollFactor; cix++) {

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* block = state.blockTable.get(cix, bix);

      MControlInstruction* lastIns = block->lastIns();

      for (size_t i = 0; i < lastIns->numSuccessors(); i++) {

        MBasicBlock* succ = lastIns->getSuccessor(i);

        if (!state.exitTargetBlocks.contains(succ)) {

          // This isn't an exiting edge -- not interesting.

          continue;

        // If `cix` is zero (the original body), we expect `succ` to already

        // list `block` as a predecessor.  If `cix` is nonzero (a body copy)

        // then we expect `succ` *not* to list it, and we must add it.

        mozilla::DebugOnly<bool> found = false;

        for (uint32_t j = 0; j < succ->numPredecessors(); j++) {

          if (block == succ->getPredecessor(j)) {

            found = true;

            break;

        MOZ_ASSERT((cix == 0) == found);

        if (cix > 0) {

          if (!succ->appendPredecessor(block)) {

            return false;

          for (MPhiIterator phiIter(succ->phisBegin());

               phiIter != succ->phisEnd(); phiIter++) {

            MPhi* phi = *phiIter;

            // "+ 1" because we just added a predecessor

            MOZ_ASSERT(phi->numOperands() + 1 == succ->numPredecessors());

            MDefinition* exitingValue = phi->getOperand(0);

            MOZ_ASSERT(state.exitingValues.contains(exitingValue));

            // `exitingValue` is defined in the original loop body.  We need to

            // find the equivalent value in copy `cix` so we can add it as an

            // argument to the phi.  This bit is the one-and-only reason for

            // `valueTable`s existence.

            mozilla::Maybe<size_t> colIndex =

                valueTable.findInRow(0, exitingValue);

            MOZ_ASSERT(colIndex.isSome());  // We must find it!

            MDefinition* equivValue = valueTable.get(cix, colIndex.value());

            if (!phi->addInputFallible(equivValue)) {

              return false;

  // At this point the control flow is correct.  But we may now have critical

  // edges where none existed before.  Consider a loop {Block1, Block2} with an

  // exit target Block9:

//

  //   Block1

  //     stuff1

  //     Goto Block2

  //   Block2

  //     stuff2

  //     Test true->Block1(== continue) false->Block9(== break)

  //   ...

  //   Block9

  //     this is a loop-exit target

  //     preds = Block2

//

  // After factor-of-2 unrolling:

//

  //   Block1

  //     stuff1

  //     Goto Block2

  //   Block2

  //     stuff2

  //     Test true->Block3(== continue) false->Block9(== break)

  //   Block3

  //     stuff3

  //     Goto Block4

  //   Block4

  //     stuff4

  //     Test true->Block1(== continue) false->Block9(== break)

  //   ...

  //   Block9

  //     preds = Block2, Block4

//

  // Now the exiting edges (Block2 to Block9 and Block4 to Block9) are critical.

//

  // The obvious fix is to insert a new block on each exiting edge.

  mozilla::Vector<MBasicBlock*, 16, SystemAllocPolicy> splitterBlocks;

  for (uint32_t cix = 0; cix < unrollFactor; cix++) {

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* block = state.blockTable.get(cix, bix);

      // Inspect all successors of `block`.  If any are exiting edges, create a

      // new block and route through that instead.

      MControlInstruction* lastIns = block->lastIns();

      for (size_t i = 0; i < lastIns->numSuccessors(); i++) {

        MBasicBlock* succ = lastIns->getSuccessor(i);

        if (!state.exitTargetBlocks.contains(succ)) {

          // This isn't an exiting edge -- not interesting.

          continue;

        // Invent a new block.

        MBasicBlock* splitter =

            MBasicBlock::New(graph, info, block, MBasicBlock::Kind::SPLIT_EDGE);

        if (!splitter || !splitterBlocks.append(splitter)) {

          return false;

        splitter->setLoopDepth(succ->loopDepth());

        // As with empty-block creation earlier in this routine, we don't

        // bother to set the block's ID since none of the logic depends on it,

        // and it will eventually be renumbered anyway by UnrollLoops.

        MGoto* jump = MGoto::New(graph.alloc(), succ);

        if (!jump) {

          return false;

        splitter->insertAtEnd(jump);

        // For the target, fix up the `predecessor_` entry.

        mozilla::DebugOnly<bool> found = false;

        for (uint32_t j = 0; j < succ->numPredecessors(); j++) {

          if (succ->getPredecessor(j) == block) {

            succ->setPredecessor(j, splitter);

            found = true;

            break;

        // Finally, reroute the jump to `succ` via `splitter`.

        lastIns->replaceSuccessor(i, splitter);

        MOZ_ASSERT(found);

  // Now the control flow is correct *and* we don't have any critical edges.

  // Fix up the `successorWithPhis_` and `positionInPhiSuccessor_` fields in

  // both the unrolled loop and the splitter blocks, thusly:

//

  // For all blocks `B` in the unrolled loop, and for all splitter blocks, fix

  // up the `B.successorWithPhis_` and `B.positionInPhiSuccessor_` fields.

  // Each block may have at maximum one successor that has phi nodes, in which

  // case `B.successorWithPhis_` should point at that block.  And

  // `B.positionInPhiSuccessor_` should indicate the index that successor's

  // `predecessors_` vector, of B.

    // Make up a list of blocks to fix up, then process them.

    mozilla::Vector<MBasicBlock*, 32 + 16, SystemAllocPolicy> workList;

    for (uint32_t cix = 0; cix < unrollFactor; cix++) {

      for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

        MBasicBlock* block = state.blockTable.get(cix, bix);

        if (!workList.append(block)) {

          return false;

    for (MBasicBlock* block : splitterBlocks) {

      if (!workList.append(block)) {

        return false;

    // Now process them.

    for (MBasicBlock* block : workList) {

      // Inspect `block`s successors.

      MBasicBlock* succWithPhis = nullptr;

      MControlInstruction* lastIns = block->lastIns();

      // Determine `succWithPhis`, if one exists.

      for (size_t i = 0; i < lastIns->numSuccessors(); i++) {

        MBasicBlock* succ = lastIns->getSuccessor(i);

        if (succ->phisEmpty()) {

          continue;

        MOZ_ASSERT(succ);

        // `succ` is a successor to `block` that has phis.  So we can't already

        // have seen a (different) successor.

        if (!succWithPhis) {

          succWithPhis = succ;

        } else if (succWithPhis == succ) {

          // ignore duplicate successors

        } else {

          // More than one successor with phis.  Structural invariants

          // invalidated.  Give up.

          MOZ_CRASH();

      // Annotate `block`.

      if (!succWithPhis) {

        block->setSuccessorWithPhis(nullptr, 0);

      } else {

        MOZ_ASSERT(!succWithPhis->phisEmpty());

        // Figure out which of `succWithPhis`s predecessors `block` is.

        // Should we assert `succWithPhis->predecessors_` is duplicate-free?

        uint32_t predIx = UINT32_MAX;  // invalid

        for (uint32_t i = 0; i < succWithPhis->numPredecessors(); i++) {

          if (succWithPhis->getPredecessor(i) == block) {

            predIx = i;

            break;

        MOZ_ASSERT(predIx != UINT32_MAX);

        block->setSuccessorWithPhis(succWithPhis, predIx);

  // Find and remove the MWasmInterruptCheck in all but the last iteration.

  // We don't assume that there is an interrupt check, since

  // wasm::FunctionCompiler::fillArray, at least, generates a loop with no

  // check.

  for (uint32_t cix = 0; cix < unrollFactor - 1; cix++) {

    for (uint32_t vix = 0; vix < numValuesInOriginal; vix++) {

      MDefinition* ins = valueTable.get(cix, vix);

      if (!ins->isWasmInterruptCheck()) {

        continue;

      MWasmInterruptCheck* ic = ins->toWasmInterruptCheck();

      ic->block()->discard(ic);

      valueTable.set(cix, vix, nullptr);

  // We're all done with unrolling.  If peeling was also requested, make a

  // minor modification to the unrolled loop: change the backedge so that,

  // instead of jumping to the first body copy, make it jump to the second body

  // copy.  This leaves the first body copy in a

  // will-only-ever-be-executed-once state, hence achieving peeling.

  if (state.doPeeling()) {

    MBasicBlock* backedge =

        state.blockTable.get(unrollFactor - 1, numBlocksInOriginal - 1);

    MBasicBlock* header0 = state.blockTable.get(0, 0);

    MBasicBlock* header1 = state.blockTable.get(1, 0);

    MOZ_ASSERT(header0 == originalHeader);

    // Fix up the back edge, to make it jump to the second copy of the loop

    // body

    MOZ_ASSERT(backedge->hasLastIns());  // should always hold (AnalyzeLoop)

    MOZ_ASSERT(backedge->lastIns()->isGoto());  // AnalyzeLoop ensures

    MGoto* backedgeG = backedge->lastIns()->toGoto();

    MOZ_ASSERT(backedgeG->target() == header0);  // AnalyzeLoop ensures

    backedgeG->setTarget(header1);

    header0->clearLoopHeader();

    header1->setLoopHeader();

    for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

      MBasicBlock* originalBlock = state.blockTable.get(0, bix);

      MOZ_ASSERT(originalBlock->loopDepth() > 0);

      originalBlock->setLoopDepth(originalBlock->loopDepth() - 1);

    // Fix up the preds of the original header

    MOZ_ASSERT(header0->numPredecessors() == 2);

    MOZ_ASSERT(header0->getPredecessor(1) == backedge);

    header0->erasePredecessor(1);

    // Fix up the preds of the new header

    MOZ_ASSERT(header1->numPredecessors() == 1);

    if (!header1->appendPredecessor(backedge)) {

      return false;

    // Fix up phis of both headers, by moving the second operand of each phi of

    // `header0` to be the second operand of the corresponding phi in

    // `header1`.

    MPhiIterator phiIter0(header0->phisBegin());

    MPhiIterator phiIter1(header1->phisBegin());

    while (true) {

      bool finished0 = phiIter0 == header0->phisEnd();

      mozilla::DebugOnly<bool> finished1 = phiIter1 == header1->phisEnd();

      // The two blocks must have the same number of phis.

      MOZ_ASSERT(finished0 == finished1);

      if (finished0) {

        break;

      MPhi* phi0 = *phiIter0;

      MPhi* phi1 = *phiIter1;

      MOZ_ASSERT(phi0->numOperands() == 2);

      MOZ_ASSERT(phi1->numOperands() == 1);

      MDefinition* phi0arg1 = phi0->getOperand(1);

      phi0->removeOperand(1);

      if (!phi1->addInputFallible(phi0arg1)) {

        return false;

      phiIter0++;

      phiIter1++;

    // Fix the succ-w-phis entry for `backEdge`.  In very rare circumstances,

    // the loop may have no header phis, so this must be conditional.

    if (backedge->successorWithPhis()) {

      MOZ_ASSERT(backedge->successorWithPhis() == header0);

      MOZ_ASSERT(backedge->positionInPhiSuccessor() == 1);

      backedge->setSuccessorWithPhis(header1, 1);

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_UnrollDetails)) {

    DumpBlockTable(state.blockTable, "LOOP BLOCKS (final)");

    Fprinter& printer(JitSpewPrinter());

    printer.printf("<<<< SPLITTER BLOCKS\n");

    for (MBasicBlock* block : splitterBlocks) {

      DumpMIRBlock(printer, block, /*showHashedPointers=*/true);

    printer.printf(">>>>\n");

#endif

  // Add the new blocks to the graph.

    // We want to add them to the end of the existing loop.

    MBasicBlock* cursor = state.blockTable.get(0, numBlocksInOriginal - 1);

    for (uint32_t cix = 1; cix < unrollFactor; cix++) {

      for (uint32_t bix = 0; bix < numBlocksInOriginal; bix++) {

        MBasicBlock* addMe = state.blockTable.get(cix, bix);

        graph.insertBlockAfter(cursor, addMe);

        cursor = addMe;

    for (MBasicBlock* addMe : splitterBlocks) {

      graph.insertBlockAfter(cursor, addMe);

      cursor = addMe;

  // UnrollLoops will renumber the blocks and redo the dominator tree once all

  // loops have been unrolled.

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_UnrollDetails)) {

    Fprinter& printer(JitSpewPrinter());

    printer.printf("<<<< FUNCTION AFTER UNROLLING\n");

    DumpMIRGraph(printer, graph, /*showHashedPointers=*/true);

    printer.printf(">>>>\n");

#endif

  return true;

// =====================================================================

//

// FindInitialCandidates, which finds "initial candidates" for

// unrolling/peeling.  See SMDOC at the top of this file.

struct InitialCandidate {

  MBasicBlock* header;

  uint32_t numBlocks;

  float score;

  bool valid;

  InitialCandidate(MBasicBlock* header, uint32_t numBlocks)

      : header(header), numBlocks(numBlocks), score(0.0), valid(true) {}

};

using InitialCandidateVector =

    mozilla::Vector<InitialCandidate, 64, SystemAllocPolicy>;

[[nodiscard]]

bool FindInitialCandidates(MIRGraph& graph,

                           InitialCandidateVector& initialCandidates) {

  // Find initial candidate loops, and place them in `initialCandidates`.

  MOZ_ASSERT(initialCandidates.empty());

  // TODO: possibly use a linear-time algorithm to find innermost loops, as

  // BackTrackingAllocator.cpp does, rather than the scheme below.

  // See bug 1959074.

  // Collect up the loops in the function.  The iteration order doesn't matter

  // here, but we have to choose something.

  for (auto rpoIter(graph.rpoBegin()), rpoIterEnd(graph.rpoEnd());

       rpoIter != rpoIterEnd; ++rpoIter) {

    MBasicBlock* header = *rpoIter;

    if (!header->isLoopHeader()) {

      continue;

    bool hasOsrEntry;

    size_t numBlocks = MarkLoopBlocks(graph, header, &hasOsrEntry);

    if (numBlocks == 0) {

      // Bogus; skip.

      continue;

    UnmarkLoopBlocks(graph, header);

    if (hasOsrEntry) {

      // Unhandled; skip.

      continue;

    MOZ_RELEASE_ASSERT(numBlocks <= UINT32_MAX);

    if (!initialCandidates.append(InitialCandidate(header, numBlocks))) {

      return false;

  // Sort the loops by size (in blocks).

  std::sort(initialCandidates.begin(), initialCandidates.end(),

            [](const InitialCandidate& cand1, const InitialCandidate& cand2) {

              return cand1.numBlocks < cand2.numBlocks;

});

  // Remove non-innermost candidates from consideration.  We expect

  // `initialCandidates` to describe contiguous, properly nested candidates.

  // For each loop in `initialCandidates`, scan forwards in the vector and mark

  // as invalid, any subsequent loops that contain it.

  for (size_t i = 0; i < initialCandidates.length(); i++) {

    const InitialCandidate& candI = initialCandidates[i];

    MOZ_ASSERT(candI.numBlocks > 0);

    for (size_t j = i + 1; j < initialCandidates.length(); j++) {

      InitialCandidate& candJ = initialCandidates[j];

      MOZ_ASSERT(candJ.numBlocks > 0);

      // Because sorted by size:

      MOZ_ASSERT(candI.numBlocks <= candJ.numBlocks);

      if (!candJ.valid) {

        // Already invalidated

        continue;

      // Determine relationship of I vs J.  Either non-overlapping, or I is

      // completely contained inside J.  In which case invalidate J.

      MOZ_ASSERT(candI.numBlocks > 0);

      if (candI.header->id() + candI.numBlocks <= candJ.header->id()) {

        continue;  // no overlap

      if (candJ.header->id() + candJ.numBlocks <= candI.header->id()) {

        continue;  // no overlap

      // "I is completely contained within J"

      MOZ_ASSERT(candJ.header->id() <= candI.header->id());

      MOZ_ASSERT(candI.header->id() + candI.numBlocks <=

                 candJ.header->id() + candJ.numBlocks);

      // and they aren't identical

      MOZ_ASSERT(candI.header->id() != candJ.header->id() ||

                 candI.numBlocks != candJ.numBlocks);

      candJ.valid = false;

  // Slide the valid ones to the start of the vector and sort by starting block

  // ID.  This makes it easy to verify that the cands are non-overlapping.

    size_t wr = 0;

    size_t rd;

    for (rd = 0; rd < initialCandidates.length(); rd++) {

      if (initialCandidates[rd].valid) {

        initialCandidates[wr++] = initialCandidates[rd];

    while (wr < rd) {

      initialCandidates.popBack();

      wr++;

  // Sort the loops by header ID.

  std::sort(initialCandidates.begin(), initialCandidates.end(),

            [](const InitialCandidate& cand1, const InitialCandidate& cand2) {

              return cand1.header->id() < cand2.header->id();

});

  // Now we can conveniently assert non-overlappingness.

  for (size_t i = 1; i < initialCandidates.length(); i++) {

    MOZ_ASSERT(initialCandidates[i - 1].header->id() +

                   initialCandidates[i - 1].numBlocks <=

               initialCandidates[i].header->id());

  // Heuristics: calculate a "score" for each loop, indicating our desire to

  // unroll it.  **Lower** values indicate a higher desirability for unrolling.

//

  // A loop gets a lower score if:

  // * it is small

  // * it has few blocks

  // * it is deeply nested (currently ignored)

  for (InitialCandidate& cand : initialCandidates) {

    uint32_t nBlocks = cand.numBlocks;

    uint32_t nInsns = 0;

    uint32_t nPhis = 0;

    // Visit each block in the loop

    mozilla::DebugOnly<uint32_t> numBlocksVisited = 0;

    const MBasicBlock* backedge = cand.header->backedge();

    for (auto i(graph.rpoBegin(cand.header)); /**/; ++i) {

      MOZ_ASSERT(i != graph.rpoEnd(),

                 "UnrollLoops: loop blocks overrun graph end");

      MBasicBlock* block = *i;

      numBlocksVisited++;

      for (MInstructionIterator iter(block->begin()); iter != block->end();

           iter++) {

        nInsns++;

      for (MPhiIterator iter(block->phisBegin()); iter != block->phisEnd();

           iter++) {

        nPhis++;

      if (block == backedge) {

        break;

    MOZ_ASSERT(numBlocksVisited == nBlocks);

    cand.score =

        10.0 * float(nBlocks) + 2.0 * float(nInsns) + 1.0 * float(nPhis);

  // Sort the loops by `score`.

  std::sort(initialCandidates.begin(), initialCandidates.end(),

            [](const InitialCandidate& cand1, const InitialCandidate& cand2) {

              return cand1.score < cand2.score;

});

  return true;

// =====================================================================

//

// UnrollLoops, the top level of the unroller.

bool UnrollLoops(const MIRGenerator* mir, MIRGraph& graph, bool* changed) {

  *changed = false;

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_Unroll)) {

    JitSpew(JitSpew_Unroll, "BEGIN UnrollLoops");

#endif

  // Make up a vector of initial candidates, which are innermost loops only,

  // and not too big.

  InitialCandidateVector initialCandidates;

  if (!FindInitialCandidates(graph, initialCandidates)) {

    return false;

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_Unroll)) {

    for (const InitialCandidate& cand : initialCandidates) {

      MOZ_ASSERT(cand.valid);

      JitSpew(JitSpew_Unroll, "   initial cand:  score=%-5.1f  blocks=%u-%u",

              cand.score, cand.header->id(),

              cand.header->id() + cand.numBlocks - 1);

#endif

  // Perform a more expensive analysis of the candidates, leading to a further

  // reduction in the set.  For those remaining, an UnrollState is created and

  // initialized.  Also a decision is made whether to unroll only, peel only,

  // or both.

  mozilla::Vector<UnrollState, 32, SystemAllocPolicy> unrollStates;

  for (const InitialCandidate& cand : initialCandidates) {

    if (cand.score > InitialCandidateThreshold) {

      // Give up.  We just sorted them by `score`, so the rest will be even

      // more undesirable for unrolling.

      break;

    // Park the loop's blocks in a vector, so we can hand it off to

    // AnalyzeLoops for examination.  This is the place where the original

    // block sequence, as seen by the entire rest of the machinery, is fixed.

    BlockVector originalBlocks;

    const MBasicBlock* backedge = cand.header->backedge();

    for (auto blockIter(graph.rpoBegin(cand.header)); /**/; ++blockIter) {

      MOZ_ASSERT(blockIter != graph.rpoEnd());

      MBasicBlock* block = *blockIter;

      if (!originalBlocks.append(block)) {

        return false;

      if (block == backedge) {

        break;

    // Analyze the candidate in detail.  If unrolling and/or peeling is

    // approved, this also collects up for us, the loop's target block set and

    // exiting value set.

    BlockSet exitTargetBlocks;

    ValueSet exitingValues;

    AnalysisResult res =

        AnalyzeLoop(originalBlocks, &exitTargetBlocks, &exitingValues);

#ifdef JS_JITSPEW

    if (JitSpewEnabled(JitSpew_Unroll)) {

      MOZ_ASSERT(cand.valid);

      JitSpew(JitSpew_Unroll, "  %13s:  score=%-5.1f  blocks=%u-%u",

              Name_of_AnalysisResult(res), cand.score, cand.header->id(),

              cand.header->id() + cand.numBlocks - 1);

#endif

    switch (res) {

      case AnalysisResult::OOM:

        return false;

      case AnalysisResult::Peel:

      case AnalysisResult::Unroll:

      case AnalysisResult::PeelAndUnroll:

        break;

      case AnalysisResult::BadInvariants:

        MOZ_CRASH("js::jit::UnrollLoops: unexpected incoming MIR");

      case AnalysisResult::TooComplex:

      case AnalysisResult::Uncloneable:

      case AnalysisResult::TooLarge:

      case AnalysisResult::Unsuitable:

        // Ignore this loop

        continue;

      default:

        MOZ_CRASH();

    // The candidate has been accepted for unrolling and/or peeling.  Make an

    // initial UnrollState for it and add it to our collection thereof.

    UnrollMode mode;

    if (res == AnalysisResult::Peel) {

      mode = UnrollMode::Peel;

    } else if (res == AnalysisResult::Unroll) {

      mode = UnrollMode::Unroll;

    } else if (res == AnalysisResult::PeelAndUnroll) {

      mode = UnrollMode::PeelAndUnroll;

    } else {

      MOZ_CRASH();

    if (!unrollStates.emplaceBack(mode)) {

      return false;

    UnrollState* state = &unrollStates.back();

    if (!state->exitTargetBlocks.copy(exitTargetBlocks) ||

        !state->exitingValues.copy(exitingValues)) {

      return false;

    // Ignoring peeling, how many times should we unroll a loop?

    // For example, if the original loop was `loop { B; }` then with

    // `basicUnrollingFactor = 3`, the unrolled loop will be `loop { B; B; B;

    // }`.  If peeling is also requested then we will unroll one more time than

    // this, giving overall result `B; loop { B; B; B; }`.

    uint32_t basicUnrollingFactor = JS::Prefs::wasm_unroll_factor();

    if (basicUnrollingFactor < 2) {

      // It needs to be at least 2, else we're not unrolling at all.

      basicUnrollingFactor = 2;

    } else if (basicUnrollingFactor > 8) {

      // Unrolling gives rapidly diminishing marginal gains.  Even beyond 4

      // there's little benefit to be had.

      basicUnrollingFactor = 8;

    // Decide on the total number of duplicates of the loop body we'll need

    // (including the original).

    uint32_t unrollingFactor;

    if (res == AnalysisResult::Peel) {

      // 1 for the peeled iteration,

      // 1 for the loop (since we are not unrolling it)

      unrollingFactor = 1 + 1;

    } else if (res == AnalysisResult::Unroll) {

      // No peeled iteration, we unroll only

      unrollingFactor = 0 + basicUnrollingFactor;

    } else if (res == AnalysisResult::PeelAndUnroll) {

      // 1 peeled iteration, and we unroll the loop

      unrollingFactor = 1 + basicUnrollingFactor;

    } else {

      MOZ_CRASH();

    // Set up UnrollState::blockTable and copy in the initial loop blocks.

    if (!state->blockTable.init(unrollingFactor, originalBlocks.length())) {

      return false;

    for (uint32_t bix = 0; bix < originalBlocks.length(); bix++) {

      state->blockTable.set(0, bix, originalBlocks[bix]);

  // Now, finally, we have an initialized vector of UnrollStates ready to go.

  // After this point, `originalBlocks` is no longer used.  Also, because of

  // the checking done in AnalyzeLoop, we know the transformations can't fail

  // (apart from OOMing).

  // Add "loop-closing phis" for values defined in the loop and used

  // afterwards.  See comments on AddClosingPhisForLoop.

  for (const UnrollState& state : unrollStates) {

    if (!AddClosingPhisForLoop(graph.alloc(), state)) {

      return false;

  // Actually do the unrolling and/or peeling.

  uint32_t numLoopsPeeled = 0;

  uint32_t numLoopsUnrolled = 0;

  uint32_t numLoopsPeeledAndUnrolled = 0;

  for (UnrollState& state : unrollStates) {

    if (!UnrollAndOrPeelLoop(graph, state)) {

      return false;

    // Update stats.

    switch (state.mode) {

      case UnrollMode::Peel:

        numLoopsPeeled++;

        break;

      case UnrollMode::Unroll:

        numLoopsUnrolled++;

        break;

      case UnrollMode::PeelAndUnroll:

        numLoopsPeeledAndUnrolled++;

        break;

      default:

        MOZ_CRASH();

  // Do function-level fixups, if anything changed.

  if (!unrollStates.empty()) {

    RenumberBlocks(graph);

    ClearDominatorTree(graph);

    if (!BuildDominatorTree(mir, graph)) {

      return false;

  uint32_t numLoopsChanged =

      numLoopsPeeled + numLoopsUnrolled + numLoopsPeeledAndUnrolled;

#ifdef JS_JITSPEW

  if (JitSpewEnabled(JitSpew_Unroll)) {

    if (numLoopsChanged == 0) {

      JitSpew(JitSpew_Unroll, "END   UnrollLoops");

    } else {

      JitSpew(JitSpew_Unroll,

              "END UnrollLoops, %u processed (P=%u, U=%u, P&U=%u)",

              numLoopsChanged, numLoopsPeeled, numLoopsUnrolled,

              numLoopsPeeledAndUnrolled);

#endif

  *changed = numLoopsChanged > 0;

  return true;

}  // namespace jit

}  // namespace js