| xts_avx512_clmul.cpp |
We need to perform N doublings on each block, which amounts to multiplying by
x^N. The carryless multiplication itself works for any N. Curiously, the
constraint is the shift: x86 SIMD has no instruction that shifts a whole 128-bit
lane by an arbitrary number of bits. Only byte-granular lane shifts are
available (psrldq, aka _mm_srli_si128, and its AVX2/AVX512 counterparts), so we
can only raise to powers where N is a multiple of 8.
|
2481 |
- |
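The multiple-of-8 restriction described in the xts_avx512_clmul.cpp comment can be illustrated with a small scalar sketch. It is not the vectorized code from that file: the `Tweak` struct and the helpers `xts_double`, `clmul_0x87`, and `xts_mul_x8` are hypothetical names introduced here, and the sketch assumes the usual XTS convention (little-endian tweak, reduction polynomial x^128 + x^7 + x^2 + x + 1, i.e. the constant 0x87). It shows that a one-byte shift plus a carryless multiply of the spilled byte matches eight single doublings.

```cpp
#include <cstdint>

// Hypothetical 128-bit tweak, little-endian: lo holds bits 0..63, hi bits 64..127.
struct Tweak {
   std::uint64_t lo;
   std::uint64_t hi;
};

// One doubling: multiply by x modulo x^128 + x^7 + x^2 + x + 1.
inline Tweak xts_double(Tweak t) {
   const std::uint64_t carry = t.hi >> 63;   // bit shifted out of the top
   t.hi = (t.hi << 1) | (t.lo >> 63);
   t.lo = (t.lo << 1) ^ (carry * 0x87);      // fold x^128 back in as 0x87
   return t;
}

// Scalar stand-in for a carryless multiply of an 8-bit value by 0x87.
inline std::uint64_t clmul_0x87(std::uint8_t v) {
   std::uint64_t r = 0;
   for(int i = 0; i < 8; ++i) {
      if((v >> i) & 1) {
         r ^= static_cast<std::uint64_t>(0x87) << i;
      }
   }
   return r;
}

// Eight doublings at once: shift the whole value left by one byte, then
// reduce the spilled top byte with a single carryless multiply.
inline Tweak xts_mul_x8(Tweak t) {
   const std::uint8_t spill = static_cast<std::uint8_t>(t.hi >> 56);
   t.hi = (t.hi << 8) | (t.lo >> 56);
   t.lo = (t.lo << 8) ^ clmul_0x87(spill);
   return t;
}
```

The product of the spilled byte and 0x87 has degree at most 14, so it needs no second reduction. The same argument holds for any whole number of bytes that fits the multiplier width, which is why a byte-granular lane shift is sufficient when N is a multiple of 8.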