FFmpeg

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	8b11a89c06	aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0 By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1 vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5 vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5 vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3 vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4 vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8 I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. This is cherrypicked from libav commits `cad42fadcd` and `a0c443a398`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Storsjö	388f6e6715	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub16_add_neon: 3188.1 2435.4 2499.0 1969.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.7 16582.3 14207.6 12000.3 By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 274.6 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2064.0 1534.8 1719.4 1248.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 2135.0 1477.2 1736.3 1249.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2446.7 1828.7 1993.6 1494.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2832.4 2118.3 2266.5 1735.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.7 2475.3 2523.5 1983.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 756.2 456.7 862.0 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10682.2 8190.4 8539.2 6762.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10813.5 8014.9 8518.3 6762.8 vp9_inv_dct_dct_32x32_sub8_add_neon: 11859.6 9313.0 9347.4 7514.5 vp9_inv_dct_dct_32x32_sub12_add_neon: 12946.6 10752.4 10192.2 8280.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 14074.6 11946.5 11001.4 9008.6 vp9_inv_dct_dct_32x32_sub20_add_neon: 15269.9 13662.7 11816.1 9762.6 vp9_inv_dct_dct_32x32_sub24_add_neon: 16327.9 14940.1 12626.7 10516.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17462.7 15776.1 13446.2 11264.7 vp9_inv_dct_dct_32x32_sub32_add_neon: 18575.5 17157.0 14249.3 12015.1 I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively. This is cherrypicked from libav commit `9c8bc74c2b`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Storsjö	ecd343aa1f	arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination This avoids reloading them if they haven't been clobbered, if the first pass also was idct. This is similar to what was done in the aarch64 version. This is cherrypicked from libav commit `3c87039a40`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Storsjö	37cb224e3e	aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it This is cherrypicked from libav commit `2f99117f6f`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Storsjö	f69dd26df5	arm: vp9itxfm: Rename a macro parameter to fit better Since the same parameter is used for both input and output, the name inout is more fitting. This matches the naming used below in the dmbutterfly macro. This is cherrypicked from libav commit `79566ec8c7`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Storsjö	4a5874ea8d	arm/aarch64: vp9itxfm: Fix indentation of macro arguments This is cherrypicked from libav commit `721bc37522`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Storsjö	a95e7de41d	aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter The clobbering tests in checkasm are only invoked when testing correctness, so this bug didn't show up when benchmarking the dc-only version. This is cherrypicked from libav commit `4d960a1185`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Janne Grunau	a71cd8439f	arm: vp9itxfm: Simplify the stack alignment code This is one instruction less for thumb, and only have got 1/2 arm/thumb specific instructions. This is cherrypicked from libav commit `e5b0fc170f`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Janne Grunau	cb220eeef9	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};' The latter is 1 cycle faster on a cortex-a53 and since the operands are bytewise (or larger) bitmask (impossible to overflow to zero) both are equivalent. This is cherrypicked from libav commit `e7ae8f7a71`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Janne Grunau	62ea07d797	aarch64: vp9: use alternative returns in the core loop filter function Since aarch64 has enough free general purpose registers use them to branch to the appropriate storage code. 1-2 cycles faster for the functions using loop_filter 8/16, ... on a cortex-a53. Mixed results (up to 2 cycles faster/slower) on a cortex-a57. This is cherrypicked from libav commit `d7595de0b2`. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Michael Bradshaw	3ac46a0a62	ffmpeg: Add -time_base option to hint the time base Signed-off-by: Michael Bradshaw <mjbshaw@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Paul B Mahol	743052ec5b	avcodec/cinepakenc: remove CVID from long description Signed-off-by: Paul B Mahol <onemda@gmail.com>	9 years ago
Carl Eugen Hoyos	935404923d	Cosmetics: Reindent after last commit.	9 years ago
Carl Eugen Hoyos	c723108e25	lavf/matroskaenc: Do not write two CodecID elements for rawvideo. Fixes ticket #6068.	9 years ago
Martin Vignali	1412e5a004	fate/psd : add test for bitmap and duotone The duotone file is interpreted as gray Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Martin Vignali	31e722e9da	libavcodec/psd : add test for channel depth/channel count in bitmap mode Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Matthieu Bouron	e109c54a69	swresample/arm: cosmetic fixes	9 years ago
Matthieu Bouron	0265aec565	swresample/aarch64: add ff_resample_common_apply_filter_{x4,x8}_{float,s16}_neon	9 years ago
Paul B Mahol	2eaee6e79b	avcodec/qdrw: skip long comment for now Fixes part of #5918. Signed-off-by: Paul B Mahol <onemda@gmail.com>	9 years ago
Steinar H. Gunderson	d68d7198be	speedhq: Align blocks variable properly. Seemingly ff_clear_block_sse assumed that the block array is aligned, so make sure it is. Fixes ticket #6079 Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Alexandra Hájková	4795e4f61f	alac: Convert to the new bitstream reader	10 years ago
Alexandra Hájková	b1e7394ea0	rtp: Convert to the new bitstream reader	9 years ago
Alexandra Hájková	a895292f27	mov: Convert to the new bitstream reader	9 years ago
Luca Barbato	44129e3804	avconv: Do not pass NULL to avio_tell The null demuxer does not have a backing AVIOContext.	9 years ago
Luca Barbato	f8f7ad758d	qsv: Set the correct range for la_depth Setting an invalid range for it makes the encoder behave inconsistently.	9 years ago
James Almer	6596b34954	avcodec/lossless_videodsp: add missing call to ff_llviddsp_init_ppc() Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	6d4c9f2ade	lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16 Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	47f212329e	huffyuvdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	cf9ef83960	huffyuvencdsp: move shared functions to a new lossless_videoencdsp context Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	30c1f27299	huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	5ac1dd8e23	lossless_videodsp: move shared functions from huffyuvdsp Several codecs other than huffyuv use them. Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Steven Liu	3222786c5a	avformat/hlsenc: refine the hlsenc code Because oc already points to hls->avf or hls->vtt_avf here, it does not need to be pointed there once again Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	9 years ago
Steven Liu	b97e9cba0b	avformat/hlsenc: fix hlsenc bug on Windows systems When hlsenc uses the flags second_level_segment_index, second_level_segment_size and second_level_segment_duration, the rename is ok but the output filename always uses the old filename, so move the rename operation to after closing the ts file and before opening the new segment Reported-by: Christian Johannesen <chrisjohannesen@gmail.com> Reviewed-by: Bodecs Bela <bodecsb@vivanet.hu> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	9 years ago
Steven Liu	aa7982577c	cmdutils_opencl: fix resource_leak cid 1396852 CID: 1396852 check the devices_list alloc status, and release the devices_list when alloc devices error Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	9 years ago
Thomas Turner	08fdf965c9	avutil/tests/audio_fifo.c: pass by reference for efficiency and change datatype to const Signed-off-by: Thomas Turner <thomastdt@googlemail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
James Almer	1d4d0ee4b0	avutil/reverse: move the ff_reverse declaration to a separate header Fixes compilation with hardcoded tables after `eaff1aa09e` and `e71b8119e7` Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Carl Eugen Hoyos	2f94b305ac	lavf/mxf: Add a universal label for ProRes used in FCP. Fixes ticket #6075.	9 years ago
Sergey Kudryashov	a9b33b5a37	libavfilter/af_biquads: warn about clipping only after frame with clipping	9 years ago
Anton Khirnov	1202b71269	theora: export cropping information instead of handling it internally	9 years ago
Anton Khirnov	c3e84820d6	h264dec: export cropping information instead of handling it internally	9 years ago
Anton Khirnov	4fded0480f	h264dec: be more explicit in handling container cropping The current condition can trigger in cases where it shouldn't, with unexpected results. Make sure that: - container cropping is really based on the original dimensions from the caller - those dimensions are discarded on size change The code is still quite hacky and eventually should be deprecated and removed, with the decision about which cropping is used delegated to the caller.	9 years ago
Anton Khirnov	a02ae1c683	hevcdec: export cropping information instead of handling it internally	9 years ago
Anton Khirnov	019ab88a95	lavc: add an option for exporting cropping information to the caller Also, add generic code for handling cropping, so the decoders can export just the cropping size and not bother with the rest.	9 years ago
Anton Khirnov	52627248e4	frame: add a cropping rectangle to AVFrame Extend the width/height doxy to clarify that it should store coded values.	9 years ago
Anton Khirnov	b68e353136	qsvdec: do not sync PIX_FMT_QSV surfaces Introducing enforced sync points in arbitrary places is bad for performance. Since the vast majority of receiving code (QSV VPP or encoders, retrieving frames through hwcontext) will do the syncing, this change should not be visible to most callers. But bumping micro just in case. This is also consistent with what VAAPI hwaccel does.	9 years ago
Steve Lhomme	ac3c3ee678	dxva2: allow an empty array of ID3D11VideoDecoderOutputView We can pick the correct slice index directly from the ID3D11VideoDecoderOutputView casted from data[3]. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Steve Lhomme	f67235a28c	dxva2: get the slice number directly from the surface in D3D11VA No need to loop through the known surfaces, we'll use the requested surface anyway. The loop is only done for DXVA2. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Nicolas George	f7191ccad6	lavfi: remove stray semicolons. Hopefully fix compilation with suncc.	9 years ago
Carl Eugen Hoyos	f31bac596f	lavf/dss: Do not fail randomly if dss_sp input contains 0xff. Fixes decoding the sample from ticket #6072 with ffmpeg.	9 years ago
Nicolas George	aaae459a85	lavfi: reindent after previous commit.	9 years ago


87105 Commits (f8d7b5febba075035a94de5d7d1dc9083ad2f3ed)