mmxfrag.c |
MMX acceleration of fragment reconstruction for motion compensation.
Originally written by Rudolf Marek.
Additional optimization by Nils Pipenbrinck.
Note: Loops are unrolled for best performance.
The iteration each instruction belongs to is marked in the comments as #i. |
12007 |
mmxidct.c |
MMX acceleration of Theora's iDCT.
Originally written by Rudolf Marek, based on code from On2's VP3. |
16489 |
mmxloop.h |
On entry, mm0={a0,...,a7}, mm1={b0,...,b7}, mm2={c0,...,c7}, mm3={d0,...,d7}.
On exit, mm1={b0+lflim(R_0,L),...,b7+lflim(R_7,L)} and
mm2={c0-lflim(R_0,L),...,c7-lflim(R_7,L)}; mm0 and mm3 are clobbered. |
11198 |
mmxstate.c |
MMX acceleration of complete fragment reconstruction algorithm.
Originally written by Rudolf Marek. |
8719 |
sse2idct.c |
SSE2 acceleration of Theora's iDCT. |
17556 |
sse2trans.h |
On x86-64 we can transpose in-place without spilling registers.
By clever choices of the order to apply the butterflies and the order of
their outputs, we can take the rows in order and output the columns in order
without any extra operations and using just one temporary register. |
8518 |
x86cpu.c |
On x86-64, gcc seems to be able to figure out how to save %rbx for us when
compiling with -fPIC. |
6606 |
x86cpu.h |
|
1422 |
x86int.h |
x86-64 guarantees SIMD support up through at least SSE2.
If the best routine we have available only needs SSE2 (which at the moment
covers all of them), then we can avoid runtime detection and the indirect
call. |
5490 |
x86state.c |
This table has been modified from OC_FZIG_ZAG by baking a 4x4 transpose into
each quadrant of the destination. |
3458 |
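
The mmxloop.h entry describes its register contract in terms of lflim(R_i,L) without defining R_i or lflim. Below is a minimal scalar sketch of that contract in C. The listing does not give these definitions: the filter response R_i = (a_i - 3*b_i + 3*c_i - d_i + 4) >> 3, the piecewise definition of lflim, and the clamping of results to 0..255 are assumptions taken from the loop-filter description in the Theora specification. The names lflim_ref, clamp255, and loop_filter8_ref are hypothetical and do not appear in the library.

```c
#include <stdlib.h>

/* Hypothetical reference for lflim(R,L), as assumed from the Theora
   specification: small responses pass through unchanged, responses with
   L <= |R| < 2L ramp back toward zero, and anything with |R| >= 2L
   is zeroed. */
static int lflim_ref(int r, int l) {
    int mag = abs(r);
    if (mag >= 2 * l) return 0;
    if (mag < l) return r;
    return r < 0 ? -r - 2 * l : -r + 2 * l;
}

/* Clamp to the unsigned 8-bit pixel range (assumed saturation step). */
static unsigned char clamp255(int v) {
    return (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Scalar equivalent of the documented contract: a and d play the role of
   mm0 and mm3 (read only here), b and c play the role of mm1 and mm2 and
   are updated in place.
   Note: >> on a negative int is implementation-defined in C; this sketch
   assumes the usual arithmetic shift. */
static void loop_filter8_ref(const unsigned char a[8], unsigned char b[8],
                             unsigned char c[8], const unsigned char d[8],
                             int l) {
    int i;
    for (i = 0; i < 8; i++) {
        int r = (a[i] - 3 * b[i] + 3 * c[i] - d[i] + 4) >> 3;
        int f = lflim_ref(r, l);
        b[i] = clamp255(b[i] + f);
        c[i] = clamp255(c[i] - f);
    }
}
```

A scalar routine like this is mainly useful as a cross-check for the SIMD path: feeding both the same eight-pixel columns and the same limit L should give identical b and c outputs if the assumptions above hold.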