Diego Biurrun
01621202aa
build: miscellaneous cosmetics
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
10 years ago
Diego Biurrun
1a094af638
fft: Split MDCT bits off from FFT
10 years ago
Janne Grunau
a0fc780a20
arm64: int32_to_float_fmul neon asm
3% faster dts decoding on a cortex-a57.
                                 cortex-a57 cortex-a53
int32_to_float_fmul_array8_c:        1270.9     4475.6
int32_to_float_fmul_array8_neon:      328.6      569.2
int32_to_float_fmul_scalar_c:         928.5     4119.6
int32_to_float_fmul_scalar_neon:      309.1      524.1
10 years ago
Janne Grunau
705f5e5e15
arm64: port synth_filter_float_neon from arm
~25% faster dts decoding overall. The checkasm CPU cycle numbers are
not that useful since synth_filter_float() calls FFTContext.imdct_half().
                         cortex-a57 cortex-a53
synth_filter_float_c:        1866.2     3490.9
synth_filter_float_neon:      915.0     1531.5
With fftc.imdct_half forced to imdct_half_neon:
                         cortex-a57 cortex-a53
synth_filter_float_c:        1718.4     3025.3
synth_filter_float_neon:      926.2     1530.1
10 years ago
Janne Grunau
c33c1fa8af
arm64: convert dcadsp neon asm from arm
~2% faster dts decoding overall.
                    cortex-a57 cortex-a53
dca_decode_hf_c:         474.8     1659.9
dca_decode_hf_neon:      225.2      301.1
dca_lfe_fir0_c:          913.2     1537.7
dca_lfe_fir0_neon:       286.8      451.9
dca_lfe_fir1_c:          848.7     1711.5
dca_lfe_fir1_neon:       387.1      506.4
10 years ago
Janne Grunau
f56d8d8dd7
h264: aarch64: intra prediction optimisations
10 years ago
Diego Biurrun
3d5d46233c
opus: Factor out imdct15 into a standalone component
It will be reused by the AAC decoder.
11 years ago
Janne Grunau
d3f5b94762
aarch64: opus NEON iMDCT and FFT
Opus CELT decoding is 11% faster and the iMDCT is over 2.5 times faster on
Apple's A7.
12 years ago
Janne Grunau
3956a5e0ea
aarch64: NEON vorbis_inverse_coupling
From the ARMv7 NEON version. 16 times faster than the C version; overall
more than 12% faster vorbis decoding on Apple's A7.
12 years ago
Janne Grunau
8f9fe6ae34
aarch64: NEON fixed/floating point MPADSP apply_window
30%/25% (fixed/float) faster mp3 decoding on Apple's A7. The floating
point decoder is approximately 7% faster.
12 years ago
Janne Grunau
ee2bc5974f
aarch64: NEON float (i)MDCT
Approximately as fast as the ARM NEON version on Apple's A7.
12 years ago
Janne Grunau
650c4300d9
aarch64: NEON float FFT
Approximately as fast as the ARM NEON version on Apple's A7.
12 years ago
Janne Grunau
d3789eeeed
aarch64: implement videodsp.prefetch
8% faster h264 decoding on Apple A7.
12 years ago
Diego Biurrun
0e083d7e43
build: Group general components separately from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
12 years ago
Janne Grunau
fe96769bed
aarch64: port neon clobber test from arm
12 years ago
Janne Grunau
36e3b1f2fd
aarch64: h264 loop filter NEON optimizations
Ported from ARMv7 NEON.
12 years ago
Janne Grunau
c65d67ef50
aarch64: hpeldsp NEON optimizations
Ported from ARMv7 NEON.
12 years ago
Janne Grunau
d5dd8c7bf0
aarch64: h264 qpel NEON optimizations
Ported from ARMv7 NEON.
12 years ago
Janne Grunau
8438b3f09f
aarch64: h264 idct NEON assembler optimizations
Ported from ARMv7 NEON.
12 years ago
Janne Grunau
71617884a2
aarch64: h264 chroma motion compensation NEON optimizations
Since RV40 and VC-1 use almost the same algorithm, optimizations for
those two decoders are easy to do and are included.
12 years ago