| aria_avx512_gfni.cpp |
ARIA has two S-box pairs, S1/X1 (the Rijndael sbox and its inverse)
and S2/X2 (another sbox and its inverse), all of which can be described
as an affine transformation combined with an inversion in GF(2^8).
A very helpful reference for this implementation was
"AVX-Based Acceleration of ARIA Block Cipher Algorithm"
by Yoo, Kivilinna, Cho.
IEEE Access, Vol. 11, 2023 (DOI: 10.1109/ACCESS.2023.3298026)
<https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10190597>
The paper describes the sbox decompositions (Section IV.A.1):
S1(x) = A_S1(inv(x)) -> affineinv(AFF_S1, x, 0x63)
S2(x) = A_S2(inv(x)) -> affineinv(AFF_S2, x, 0xE2)
X1(x) = inv(A_{S1^-1}(x)) -> y = affine(AFF_X1, x, 0x05) then affineinv(I, y, 0)
X2(x) = inv(A_{S2^-1}(x)) -> y = affine(AFF_X2, x, 0x2C) then affineinv(I, y, 0)
where inv(x) = x^-1 in GF(2^8), implemented by the GFNI affineinv instruction,
I is the identity matrix, and the AFF_* matrices are the constants that follow.
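As a scalar illustration of the decompositions above, the following sketch models what affine/affineinv compute for a single byte and checks the S1 case against known Rijndael sbox values. The helper names (gf_mul, gf_inv, affine_ref, affineinv_ref) are hypothetical and not part of this file. AES_AFFINE is the standard Rijndael affine matrix written in the GFNI row layout; it is assumed, not confirmed here, to correspond to AFF_S1. The S2/X1/X2 matrices are not reproduced.

```cpp
#include <cstdint>

// Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, the field used by GFNI.
constexpr uint8_t gf_mul(uint8_t a, uint8_t b) {
   uint8_t r = 0;
   for(int i = 0; i < 8; ++i) {
      if(b & 1) {
         r ^= a;
      }
      const bool carry = (a & 0x80) != 0;
      a = static_cast<uint8_t>(a << 1);
      if(carry) {
         a ^= 0x1B;
      }
      b >>= 1;
   }
   return r;
}

// inv(x) = x^254 in GF(2^8); this maps 0 to 0, as the GFNI instruction does.
constexpr uint8_t gf_inv(uint8_t x) {
   uint8_t r = 1;
   for(int i = 0; i < 254; ++i) {
      r = gf_mul(r, x);
   }
   return r;
}

// Scalar model of the GFNI affine operation on one byte:
// output bit i is parity(byte[7-i] of A AND x), then the constant c is XORed in.
constexpr uint8_t affine_ref(uint64_t A, uint8_t x, uint8_t c) {
   uint8_t out = 0;
   for(int i = 0; i < 8; ++i) {
      uint8_t t = static_cast<uint8_t>(A >> (8 * (7 - i))) & x;
      t ^= t >> 4;
      t ^= t >> 2;
      t ^= t >> 1;
      out |= static_cast<uint8_t>((t & 1) << i);
   }
   return static_cast<uint8_t>(out ^ c);
}

// Scalar model of affineinv: invert in GF(2^8) first, then apply the affine map.
constexpr uint8_t affineinv_ref(uint64_t A, uint8_t x, uint8_t c) {
   return affine_ref(A, gf_inv(x), c);
}

// Assumption: standard Rijndael affine matrix and identity matrix in GFNI layout.
constexpr uint64_t AES_AFFINE = 0xF1E3C78F1F3E7CF8;
constexpr uint64_t IDENTITY = 0x0102040810204080;

// S1(x) = affineinv(AES_AFFINE, x, 0x63) reproduces known Rijndael sbox entries.
static_assert(affineinv_ref(AES_AFFINE, 0x00, 0x63) == 0x63, "S1(0x00)");
static_assert(affineinv_ref(AES_AFFINE, 0x53, 0x63) == 0xED, "S1(0x53)");
// affineinv(I, y, 0) is a bare field inversion, as used for the second step of X1/X2.
static_assert(affineinv_ref(IDENTITY, 0x53, 0x00) == 0xCA, "inv(0x53)");
```

The functions are constexpr so the checks run at compile time; the real code performs the same per-byte computation across whole vector registers with the GFNI instructions.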
The approach used here diverges from the implementation described in the
paper, which used AVX-512 to compute 64 blocks in parallel. This implementation
instead takes advantage of the fact that AVX-512/GFNI can use 4 different GFNI
affine constants in a single call, and so needs only 16-block chunks. This
leads to less register pressure and (imo) a simpler implementation, albeit likely
giving up some performance with larger input sizes.
|
15283 |
- |