Name Description Size
mmxfrag.c MMX acceleration of fragment reconstruction for motion compensation. Originally written by Rudolf Marek. Additional optimization by Nils Pipenbrinck. Note: Loops are unrolled for best performance. The iteration each instruction belongs to is marked in the comments as #i. 12007
mmxidct.c MMX acceleration of Theora's iDCT. Originally written by Rudolf Marek, based on code from On2's VP3. 16489
mmxloop.h On entry, mm0={a0,...,a7}, mm1={b0,...,b7}, mm2={c0,...,c7}, mm3={d0,...,d7}. On exit, mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)} and mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}; mm0 and mm3 are clobbered. 11198
mmxstate.c MMX acceleration of complete fragment reconstruction algorithm. Originally written by Rudolf Marek. 8719
sse2idct.c SSE2 acceleration of Theora's iDCT. 17556
sse2trans.h On x86-64 we can transpose in-place without spilling registers. By clever choices of the order to apply the butterflies and the order of their outputs, we can take the rows in order and output the columns in order without any extra operations and using just one temporary register. 8518
x86cpu.c On x86-64, gcc seems to be able to figure out how to save %rbx for us when compiling with -fPIC. 6606
x86cpu.h 1422
x86int.h x86-64 guarantees SIMD support up through at least SSE2. If the best routine we have available only needs SSE2 (which at the moment covers all of them), then we can avoid runtime detection and the indirect call. 5490
x86state.c This table has been modified from OC_FZIG_ZAG by baking a 4x4 transpose into each quadrant of the destination. 3458
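
The x86state.c description mentions a zig-zag table with a 4x4 transpose baked into each quadrant of the destination. The following is a minimal sketch of how such a table could be derived, not the code from x86state.c itself. It assumes OC_FZIG_ZAG holds the conventional 8x8 zig-zag order as raster indices (row*8+col); the names zigzag, quad_transpose and build_quad_zigzag are hypothetical and used only for illustration.

```c
#include <assert.h>

/* Conventional 8x8 zig-zag scan: entry zzi is the raster index (row*8+col)
   of the zzi-th coefficient. ASSUMPTION: this is what OC_FZIG_ZAG holds. */
static const unsigned char zigzag[64]={
   0, 1, 8,16, 9, 2, 3,10,
  17,24,32,25,18,11, 4, 5,
  12,19,26,33,40,48,41,34,
  27,20,13, 6, 7,14,21,28,
  35,42,49,56,57,50,43,36,
  29,22,15,23,30,37,44,51,
  58,59,52,45,38,31,39,46,
  53,60,61,54,47,55,62,63
};

/* Hypothetical helper: transpose a raster index inside its 4x4 quadrant. */
static unsigned char quad_transpose(unsigned char idx){
  unsigned row;
  unsigned col;
  unsigned new_row;
  unsigned new_col;
  assert(idx<64);
  row=idx>>3;
  col=idx&7;
  /* Bit 2 of each coordinate selects the quadrant and is kept;
     the low two bits (position inside the quadrant) are swapped. */
  new_row=(row&4)|(col&3);
  new_col=(col&4)|(row&3);
  return (unsigned char)(new_row<<3|new_col);
}

/* Hypothetical helper: fill out[64] with the zig-zag table after applying
   the per-quadrant transpose to every destination index. */
static void build_quad_zigzag(unsigned char out[64]){
  int zzi;
  for(zzi=0;zzi<64;zzi++)out[zzi]=quad_transpose(zigzag[zzi]);
}
```

Calling build_quad_zigzag on a 64-byte buffer yields a permutation of 0..63 in which coefficients are written directly to their transposed positions. Under the assumption above, this is why a table of this kind would let a later pass skip a separate 4x4 transpose step.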