Author | SHA1 | Message | Date |
---|---|---|---|
|
34454c761f |
SBR DSP x86: implement SSE sbr_sum_square_sse
The 32bits targets have been compiled with -mfpmath=sse for proper reference. sbr_sum_square C /32bits: 82c (unrolled)/102c C /64bits: 69c (unrolled)/82c SSE/32bits: 42c SSE/64bits: 31c Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> |
13 years ago |
|
2e74a5abc2 |
SBR DSP: use intptr_t for the ixh parameter.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> |
13 years ago |
|
be822d77b6 |
aacsbr: ARM NEON optimised sbrdsp functions
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9. Signed-off-by: Mans Rullgard <mans@mansr.com> |
13 years ago |
|
aac46e088d |
aacsbr: move some simdable loops to function pointers
This prepares for assembly optimisations by moving the most time-consuming loops to functions called through pointers in a new context. Signed-off-by: Mans Rullgard <mans@mansr.com> |
13 years ago |