third_party/botan/src/lib/block/aria/aria_avx512_gfni

Name	Description	Size	Coverage
aria_avx512_gfni.cpp	ARIA has two S-box pairs, S1/X1 (the Rijndael sbox and its inverse) and S2/X2 (another sbox and its inverse), all of which can be described as an affine transformation applied to an inversion in GF(2^8). A very helpful reference for this implementation was "AVX-Based Acceleration of ARIA Block Cipher Algorithm" by Yoo, Kivilinna, Cho. IEEE Access, Vol. 11, 2023 (DOI: 10.1109/ACCESS.2023.3298026) <https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10190597> The paper describes the sbox decompositions (Section IV. A. 1.): S1(x) = A_S1(inv(x)) -> affineinv(AFF_S1, x, 0x63); S2(x) = A_S2(inv(x)) -> affineinv(AFF_S2, x, 0xE2); X1(x) = inv(A_{S1^-1}(x)) -> affine(AFF_X1, x, 0x05) then affineinv(I, y, 0); X2(x) = inv(A_{S2^-1}(x)) -> affine(AFF_X2, x, 0x2C) then affineinv(I, y, 0); where inv(x) = x^-1 in GF(2^8), implemented by the GFNI affineinv instruction, I is the identity matrix, and the AFF_* matrices are the constants that follow. The approach used here diverges from the implementation described in the paper, which used AVX-512 to compute 64 blocks in parallel. This implementation instead takes advantage of the fact that AVX-512/GFNI can use 4 different GFNI affine constants in a single call, and so needs only 16-block chunks. This leads to less register pressure and (arguably) a simpler implementation, albeit likely giving up some performance with larger input sizes.	15283	-
info.txt		204	-
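
The S1/X1 decomposition described for aria_avx512_gfni.cpp can be checked with a portable scalar model. The sketch below is not the code in that file: it emulates one byte lane of the GFNI affine and affineinv operations in plain C++, and the names AFF_S1_MODEL, AFF_X1_MODEL and AFF_IDENTITY are hypothetical. Their values are assumptions derived from the Rijndael affine maps using the row ordering GFNI documents (matrix byte 7-i produces output bit i); the actual AFF_* constants in the source are not shown here and may be written differently. S2/X2 are omitted because their matrices are not given in the description.

```cpp
#include <cassert>
#include <cstdint>

// Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, the field used by GFNI.
static uint8_t gf_mul(uint8_t a, uint8_t b) {
   uint8_t r = 0;
   for(int i = 0; i < 8; ++i) {
      if(b & 1) { r ^= a; }
      const bool carry = (a & 0x80) != 0;
      a = static_cast<uint8_t>(a << 1);
      if(carry) { a ^= 0x1B; }
      b >>= 1;
   }
   return r;
}

// inv(x) = x^254, which maps 0 to 0 and every other x to its inverse.
static uint8_t gf_inv(uint8_t x) {
   uint8_t r = 1;
   for(int i = 0; i < 254; ++i) { r = gf_mul(r, x); }
   return r;
}

static uint8_t parity(uint8_t v) {
   v ^= v >> 4;
   v ^= v >> 2;
   v ^= v >> 1;
   return v & 1;
}

// Scalar model of one byte lane of the GFNI affine operation:
// output bit i = parity(matrix byte [7 - i] AND x) XOR bit i of c.
static uint8_t affine(uint64_t m, uint8_t x, uint8_t c) {
   uint8_t out = 0;
   for(int i = 0; i < 8; ++i) {
      const uint8_t row = static_cast<uint8_t>(m >> (8 * (7 - i)));
      out |= static_cast<uint8_t>(parity(row & x) << i);
   }
   return static_cast<uint8_t>(out ^ c);
}

// Scalar model of affineinv: the same affine map applied to inv(x).
static uint8_t affineinv(uint64_t m, uint8_t x, uint8_t c) {
   return affine(m, gf_inv(x), c);
}

// Hypothetical constants (assumptions, not copied from the Botan source).
constexpr uint64_t AFF_S1_MODEL = 0xF1E3C78F1F3E7CF8;  // Rijndael affine map
constexpr uint64_t AFF_X1_MODEL = 0xA44992254A942952;  // inverse Rijndael affine map
constexpr uint64_t AFF_IDENTITY = 0x0102040810204080;  // identity matrix I

// S1(x) = A_S1(inv(x)) -> affineinv(AFF_S1, x, 0x63)
static uint8_t S1(uint8_t x) {
   return affineinv(AFF_S1_MODEL, x, 0x63);
}

// X1(x) = inv(A_{S1^-1}(x)) -> affine(AFF_X1, x, 0x05) then affineinv(I, y, 0)
static uint8_t X1(uint8_t x) {
   const uint8_t y = affine(AFF_X1_MODEL, x, 0x05);
   return affineinv(AFF_IDENTITY, y, 0x00);
}

// Checks the model against known Rijndael sbox entries and the S1/X1 round trip.
static void self_check() {
   assert(S1(0x00) == 0x63);
   assert(S1(0x01) == 0x7C);
   for(int x = 0; x < 256; ++x) {
      assert(X1(S1(static_cast<uint8_t>(x))) == x);
   }
}
```

Calling self_check() from a test harness exercises all 256 inputs. A vectorized version would replace affine and affineinv with the corresponding GFNI instructions, which apply the same per-byte map across a whole register; the description's point about using 4 different affine constants in a single call relies on the matrix operand being supplied per 64-bit lane.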