Martin Storsjö
557519189f
swscale: yuv2planeX 8bit >=sse2 functions need aligned stack on x86-32.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
13 years ago
Diego Biurrun
baaab6069a
build: Move all arch OBJS declarations into arch subdirectory Makefiles.
13 years ago
Henrik Gramner
729f90e268
x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
13 years ago
Ronald S. Bultje
45fdcc8e2d
swscale: convert hscale() to use named arguments.
13 years ago
Ronald S. Bultje
aba7a827aa
swscale: convert hscale to cpuflags().
13 years ago
Ronald S. Bultje
2254b559cb
swscale: make filterPos 32bit.
Fixes overflows for large image sizes.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
13 years ago
Ronald S. Bultje
dccb2cd3f9
swscale: make %rep unconditional.
Fixes pre-processing with latest versions of nasm.
13 years ago
Ronald S. Bultje
8249a23fc1
swscale: remove now unnecessary hack.
13 years ago
Ronald S. Bultje
1d8c4af396
swscale: take first/lastline over/underflows into account for MMX.
Fixes crashes for extremely large resizes (several 100-fold).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
13 years ago
Ronald S. Bultje
b18f8cbf3d
Revert two swscale commits.
Revert "swscale: update context offsets after removal of AlpMmxFilter."
(commit a95e3fa90b
)
and
Revert "swscale: Remove some write-only variables related to alpha handling."
(commit 9d03cb9fc5
).
They broke alpha handling - it's the evil inline asm that still uses that
variable, so it's not truely write-only.
13 years ago
Ronald S. Bultje
1bab6f852c
swscale: make access to filter data conditional on filter type.
Prevents crashes on 1-tap filter (unscaled). Also rename "bguf" argument
to "vbuf", seems that was a typo.
13 years ago
Ronald S. Bultje
a95e3fa90b
swscale: update context offsets after removal of AlpMmxFilter.
13 years ago
Diego Biurrun
9d03cb9fc5
swscale: Remove some write-only variables related to alpha handling.
13 years ago
Ronald S. Bultje
771bab7f57
swscale: fix crashes in yuv2yuvX on x86-32.
They were introduced in an earlier commit that introduced use of named
arguments. One cause was a typo, a second cause appears to be a bug in
x264asm that I work around by not using named arguments.
13 years ago
Ronald S. Bultje
3e23badd83
swscale: convert yuv2yuvX() to using named arguments.
13 years ago
Ronald S. Bultje
8c433d8a03
swscale: rename "dstw" to "w" to prevent name collisions.
"dstw" can collide with the word-version of the "dst" argument, causing
all kind of weird stuff down the pipe.
13 years ago
Ronald S. Bultje
ef66a0ed2e
swscale: use named registers in yuv2yuv1_plane() place.
Most of the function had been converted before, but I forgot this
particular location.
13 years ago
Ronald S. Bultje
783487ae44
swscale: sign-extend integer function argument to qword on x86-64.
13 years ago
Ronald S. Bultje
ef1c785f11
swscale: make yuv2yuv1 use named registers.
13 years ago
Ronald S. Bultje
b7542dd3d7
swscale: fix V plane memory location in bilinear/unscaled RGB/YUYV case.
Fixes bug 221.
CC: libav-stable@libav.org
13 years ago
Ronald S. Bultje
7e4d9d5d45
win64: add a XMM clobber test configure option.
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
13 years ago
Ronald S. Bultje
de53b9068a
swscale: implement MMX, SSE2 and AVX functions for RGB32 input.
13 years ago
Ronald S. Bultje
378c5ef9ae
swscale: enable dithering in MMX functions.
This was accidently disabled.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
13 years ago
Ronald S. Bultje
212f161caa
swscale: make rgb24 function macros slightly smaller.
13 years ago
Ronald S. Bultje
b5d08c27c3
swscale: convert rgb/bgr24ToY/UV_mmx functions from inline asm to yasm.
Also implement sse2/ssse3/avx versions.
13 years ago
Ronald S. Bultje
3b15a6d742
config.asm: change %ifdef directives to %if directives.
This allows combining multiple conditionals in a single statement.
13 years ago
Ronald S. Bultje
3c172a4106
swscale: change yuv2yuvX code to use cpuflag().
13 years ago
Ronald S. Bultje
b14fa5572c
swscale: fix crash in fast_bilinear code when compiled with -mred-zone.
Additional comments from Måns Rullgard have been integrated
by Reinhard Tartler.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
13 years ago
Oka Motofumi
cd44521625
swscale: specify register type.
Fixes a compilation failure on win64.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
13 years ago
Ronald S. Bultje
2170a0e6ad
swscale: convert yuy2/uyvy/nv12/nv21ToY/UV from inline asm to yasm.
Also implement SSE2/AVX variants.
13 years ago
Ronald S. Bultje
6ea64339c5
swscale: split scale.asm.
scale.asm keeps horizontal scaling functions, whereas output.asm gets
the vertical scaling/output functions.
13 years ago
Diego Biurrun
3c62a71486
swscale_mmx: drop no longer required parameters from VSCALEX macros
13 years ago
Diego Biurrun
52de07e1f1
swscale: Mark yuv2planeX_8_mmx as MMX2; it contains MMX2 instructions.
13 years ago
Mans Rullgard
373211d828
Remove extraneous semicolons
These semicolons cause invalid empty top-level declarations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Ronald S. Bultje
8283f90a52
swscale: handle unaligned buffers in yuv2plane1
The issue had been introduced in
c435653627
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
13 years ago
Ronald S. Bultje
c435653627
swscale: write yuv2plane1 MMX/SSE2/SSE4/AVX functions.
13 years ago
Ronald S. Bultje
9e66b892e8
swscale: add missing colons to x86 assembly yuv2planeX.
This fixes assembling using "nasm".
13 years ago
Ronald S. Bultje
6cacecdca3
swscale: make yuv2yuvX_10_sse2/avx 8/9/16-bits aware.
Also implement MMX/MMX2 versions and SSE4 versions.
13 years ago
Kieran Kunhya
7fbbf95293
yuv2planeX10 SIMD
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
13 years ago
Kieran Kunhya
34e8d147b3
Split out yuv2yuv1 luma and chroma in order to make them generic DSP functions
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
13 years ago
Ronald S. Bultje
6aa3cac6bf
swscale: use aligned move for storage into temporary buffer.
The intermediate buffer is always aligned.
13 years ago
Ronald S. Bultje
e0c3e07387
sws: implement MMX/SSE2/SSSE3/SSE4 versions for horizontal scaling.
Speed: from 3.9x to 9.6x speed improvement over C, and some small
(up to 15%) speed improvements over existing MMX code (particularly
for bigger filters).
13 years ago
Ronald S. Bultje
3f04ab4fcd
swscale: split hScale() function pointer into h[cy]Scale().
This allows using more specific implementations for chroma/luma, e.g.
we can make assumptions on filterSize being constant, thus avoiding
that test at runtime.
14 years ago
Ronald S. Bultje
28c1115a91
swscale: use 15-bit intermediates for 9/10-bit scaling.
14 years ago
Ronald S. Bultje
5c391a161a
swscale: rename uv_off/uv_off2 to uv_off_px/byte.
14 years ago
Ronald S. Bultje
4e3e333a79
swscale: error dithering for 16/9/10-bit to 8-bit.
Based on a somewhat similar idea in FFmpeg's swscale copy.
14 years ago
Ronald S. Bultje
42d622fab3
swscale: fix 16-bit scaling when output is 8-bits.
We would use the second half of the U plane buffer, rather than the
V plane buffer, to output the V plane pixels.
14 years ago
Ronald S. Bultje
8a8d0ce208
swscale: for >8bit scaling, read in native bit-depth.
For 9/10bit, it means we don't have to upscale to 16bit before
actual scaling or pixel format conversion, and thus a performance
gain.
14 years ago
Ronald S. Bultje
ef1ee362b3
swscale: implement >8bit scaling support.
This means that precision is retained when scaling between sample
formats with >8 bits per component (48bit RGB, 16bit grayscale,
9/10/16bit YUV).
14 years ago
Ronald S. Bultje
13a099799e
swscale: change prototypes of scaled YUV output functions.
Remove unused variables "flags" and "dstFormat" in yuv2packed1,
merge source rows per plane for yuv2packed[12], and make every
source argument int16_t (some where invalidly set to uint16_t).
This prevents stack pollution and is part of the Great Evil Plan
to simplify swscale.
14 years ago