This prevents crashes when trying to read beyond the end of the buffer
while decoding frame data.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Unrolling the main loop to process, instead of 4 elements:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.
Assigning STEP to a register is a loss. Output address (Y) is almost always
unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Since we are clipping before we shift the values to
16 or 32 bits, we should not shift the min/max clip
values to compensate.
Fixes 8 and 24 bit lossy decoding.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
If the PNG filter is enabled, a PNG-style filter will run over the
input buffer, writing into the buffer. Therefore, if no zlib compression
was used, ensure that we copy into a temporary buffer, otherwise we
overwrite user-provided input data.
This prevents crashers and errors further down when reading nodes in the
empty tree.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
The operations that use it require it to be promoted to a larger (natural)
type and thus perform sign extension on it.
While an optimal compiler may account for this, gcc 4.6 (for x86 Windows)
fails. Using the natural integer type provides a 2% speedup for Win64
and 1% for Win32.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.