This fixes errors like the following when building non-PIC binaries with
armv6 as the baseline:
Error: invalid literal constant: pool needs to be closer
Signed-off-by: Martin Storsjö <martin@martin.st>
Otherwise it can be non-zero the next time decode_lowdelay is called, causing
slice_params_buf not to be allocated, leading to a NULL pointer dereference.
The problem was introduced in commit
dcad4677d6.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This fixes a double-free detected by AddressSanitizer.
The problem was introduced in commit
dcad4677d6.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
"Vidvox Hap", not "Vidvox Hap encoder" or "Vidvox Hap decoder". Fixes
the bad name in "ffmpeg -codecs" and matches other codec naming.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Two-channel convolution using a complex FFT improves speed significantly.
Not sure whether it should be enabled by default, so it is disabled by
default.
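A minimal sketch of the trick, assuming a real filter shared by both
channels and circular convolution for brevity (names are illustrative,
not the actual filter code); a naive O(N^2) DFT stands in for the FFT
so the example is self-contained:

    #include <complex.h>
    #include <math.h>
    #include <stdio.h>

    #define N 8

    static void dft(const double complex *in, double complex *out, int inverse)
    {
        const double pi = acos(-1.0);
        for (int k = 0; k < N; k++) {
            double complex sum = 0;
            for (int n = 0; n < N; n++)
                sum += in[n] * cexp((inverse ? 2.0 : -2.0) * I * pi * k * n / N);
            out[k] = inverse ? sum / N : sum;
        }
    }

    int main(void)
    {
        double left[N]  = { 1, 2, 3, 4, 0, 0, 0, 0 };
        double right[N] = { 4, 3, 2, 1, 0, 0, 0, 0 };
        double h[N]     = { 0.5, 0.5, 0, 0, 0, 0, 0, 0 }; /* shared real filter */
        double complex z[N], Z[N], ht[N], H[N], y[N];

        /* Pack the two real channels into one complex signal: z = left + i*right. */
        for (int n = 0; n < N; n++) {
            z[n]  = left[n] + I * right[n];
            ht[n] = h[n];
        }
        dft(z, Z, 0);
        dft(ht, H, 0);

        /* One complex multiply filters both channels at once, because
         * Z[k]*H[k] = (L[k] + i*R[k]) * H[k] and both convolutions are real. */
        for (int k = 0; k < N; k++)
            Z[k] *= H[k];
        dft(Z, y, 1);

        for (int n = 0; n < N; n++)
            printf("left: %6.3f  right: %6.3f\n", creal(y[n]), cimag(y[n]));
        return 0;
    }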
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
This is in the same vein as c981b1145a.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Takes a frame associated with a hardware context as input and maps it
to something else (another hardware frame or normal memory) for other
processing. If the frame to map was originally in the target format
(but mapped to something else), the original frame is output.
Also supports mapping backwards, where only the output has a hardware
context. The link immediately before will be supplied with mapped
hardware frames which it can write directly into, and this filter
then unmaps them back to the actual hardware frames.
Adds the new av_hwframe_map() function, which allows mapping between
hardware frames and normal memory, along with internal support for
implementing it.
Also adds av_hwframe_ctx_create_derived(), for creating a hardware
frames context associated with one device using frames mapped from
another by some hardware-specific means.
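As a rough usage sketch (assuming hw_frame already carries a valid
hw_frames_ctx set up by a decoder or filter; error paths abbreviated):

    #include <libavutil/frame.h>
    #include <libavutil/hwcontext.h>

    static int read_back(AVFrame *hw_frame, AVFrame **out)
    {
        AVHWFramesContext *fc = (AVHWFramesContext *)hw_frame->hw_frames_ctx->data;
        AVFrame *sw = av_frame_alloc();
        int ret;

        if (!sw)
            return AVERROR(ENOMEM);
        /* Request a mapping to the software format of the frames context;
         * depending on the hwcontext this can be a zero-copy view of the
         * surface rather than a full download. */
        sw->format = fc->sw_format;
        ret = av_hwframe_map(sw, hw_frame, AV_HWFRAME_MAP_READ);
        if (ret < 0) {
            av_frame_free(&sw);
            return ret;
        }
        /* sw->data is now CPU-accessible; the mapping keeps a reference to
         * the hardware frame until sw is unreffed/freed. */
        *out = sw;
        return 0;
    }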
This is somewhat of a magic number, which can be understood from reading
section "7.1.2 Exponent Strategy" of the AC-3 specification. In short:
three exponents, each represented as a number 0-4, are grouped together
and base-5 encoded, so the maximal correct value is 25*4 + 5*4 + 4 = 124.
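As a small illustration (not the actual ac3dec code), ungrouping one such
value could look like:

    /* grp = e0*25 + e1*5 + e2, each exponent delta in 0..4,
     * so the largest valid grp is 4*25 + 4*5 + 4 = 124 */
    static void ungroup_exps(int grp, int exps[3])
    {
        exps[0] = grp / 25;
        exps[1] = (grp / 5) % 5;
        exps[2] = grp % 5;
    }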
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The framework will allocate a buffer and copy the data to it, which takes
time. But it avoids constantly creating and destroying the shared memory
segment, and that saves more time.
On my setup,
from ~200 to ~300 FPS at full screen (1920×1200),
from ~1400 to ~3300 at smaller size (640×480),
similar to legacy x11grab and confirmed by others.
Plus, shared memory segments are a scarce resource, so allocating
potentially many of them is a bad idea.
Note: if the application were to drop all references to the
buffer before the next call to av_read_frame(), then passing
the shared memory segment as a refcounted buffer would be
even more efficient, but that is hard to guarantee, and it does
not happen with the ffmpeg command-line tool. Using a small
number of preallocated buffers and resorting to a copy when
the pool is exhausted would be a way to get the best of both
worlds.
According to spec ISO_IEC_15444_12: "For any media stream for which no
segment index is present, referred to as non-indexed stream, the media
stream associated with the first Segment Index box in the segment serves
as a reference stream in a sense that it also describes the subsegments
for any non-indexed media stream."
Signed-off-by: Sasi Inguva <isasi@google.com>
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
libavcodec/dnxhdenc.c(326) : warning C4028: formal parameter 1 different from declaration
libavcodec/dnxhdenc.c(329) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(58) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(63) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(66) : warning C4028: formal parameter 1 different from declaration
libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 1 different from declaration
libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 2 different from declaration
Avoids a forward-declaration in the following commit.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The include of config.h was added in 2012 in 1d9c2dc8, due to
the use of CONFIG_SNOW_ENCODER ifdefs within options_table.h.
When the snow codec was dropped later (in a0c5917f8 in 2013),
this include no longer served any purpose.
options_table.h is included in builds for the host as well, when
building documentation. config.h should not be included in code
that is built for the host, since it can contain workarounds
for the target compiler/environment, like adding a missing define
of restrict, or defining getenv(x) to NULL for environments that lack
getenv.
The seemingly innocent include reordering in 2025d37871 broke
builds that have getenv(x) defined to NULL in config.h (Windows CE
and Windows Phone/RT), since libavcodec/options_table.h includes
config.h, while libavformat/options_table.h ends up bringing in
more system headers, and those system headers can contain a proper
declaration of getenv, which clashes with the getenv define in config.h.
This was avoided earlier as long as libavformat/options_table.h (or
avformat.h) was included before libavcodec/options_table.h.
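A reduced illustration of the clash (hypothetical, not taken from the tree):

    /* from config.h on targets without getenv */
    #define getenv(x) NULL

    /* a later system header then declares the real function:
     *     char *getenv(const char *name);
     * with the macro active, that prototype expands to "char *NULL;",
     * which does not compile */
    #include <stdlib.h>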
This fixes builds for Windows Phone/RT and CE.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
The filter coefficients are signed values, such that the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16-bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16-bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.
(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)
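A scalar sketch of the idea (the real code is NEON assembly; names here
are illustrative):

    #include <stdint.h>

    static int16_t sat_add16(int16_t a, int16_t b)
    {
        int32_t s = (int32_t)a + b;            /* like vqadd.s16 */
        return s > INT16_MAX ? INT16_MAX : s < INT16_MIN ? INT16_MIN : s;
    }

    static int16_t filter8(const uint8_t *src, const int8_t *filt)
    {
        int big = filt[3] >= filt[4] ? 3 : 4;  /* largest coefficient */
        int16_t sum = 0;

        for (int i = 0; i < 8; i++)
            if (i != big)                      /* this partial sum stays in range */
                sum += src[i] * filt[i];       /* each product fits in int16 */
        /* add the largest product last, clamping instead of widening to 32 bit */
        return sat_add16(sum, src[big] * filt[big]);
    }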
Examples of relative speedup compared to the C version, from checkasm:
Cortex A7 A8 A9 A53
vp9_avg4_neon: 1.71 1.15 1.42 1.49
vp9_avg8_neon: 2.51 3.63 3.14 2.58
vp9_avg16_neon: 2.95 6.76 3.01 2.84
vp9_avg32_neon: 3.29 6.64 2.85 3.00
vp9_avg64_neon: 3.47 6.67 3.14 2.80
vp9_avg_8tap_smooth_4h_neon: 3.22 4.73 2.76 4.67
vp9_avg_8tap_smooth_4hv_neon: 3.67 4.76 3.28 4.71
vp9_avg_8tap_smooth_4v_neon: 5.52 7.60 4.60 6.31
vp9_avg_8tap_smooth_8h_neon: 6.22 9.04 5.12 9.32
vp9_avg_8tap_smooth_8hv_neon: 6.38 8.21 5.72 8.17
vp9_avg_8tap_smooth_8v_neon: 9.22 12.66 8.15 11.10
vp9_avg_8tap_smooth_64h_neon: 7.02 10.23 5.54 11.58
vp9_avg_8tap_smooth_64hv_neon: 6.76 9.46 5.93 9.40
vp9_avg_8tap_smooth_64v_neon: 10.76 14.13 9.46 13.37
vp9_put4_neon: 1.11 1.47 1.00 1.21
vp9_put8_neon: 1.23 2.17 1.94 1.48
vp9_put16_neon: 1.63 4.02 1.73 1.97
vp9_put32_neon: 1.56 4.92 2.00 1.96
vp9_put64_neon: 2.10 5.28 2.03 2.35
vp9_put_8tap_smooth_4h_neon: 3.11 4.35 2.63 4.35
vp9_put_8tap_smooth_4hv_neon: 3.67 4.69 3.25 4.71
vp9_put_8tap_smooth_4v_neon: 5.45 7.27 4.49 6.52
vp9_put_8tap_smooth_8h_neon: 5.97 8.18 4.81 8.56
vp9_put_8tap_smooth_8hv_neon: 6.39 7.90 5.64 8.15
vp9_put_8tap_smooth_8v_neon: 9.03 11.84 8.07 11.51
vp9_put_8tap_smooth_64h_neon: 6.78 9.48 4.88 10.89
vp9_put_8tap_smooth_64hv_neon: 6.99 8.87 5.94 9.56
vp9_put_8tap_smooth_64v_neon: 10.69 13.30 9.43 14.34
For the larger 8tap filters, the speedup vs C code is around 5-14x.
This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).
Absolute runtimes from checkasm:
Cortex A7 A8 A9 A53
vp9_put_8tap_smooth_64h_neon: 20150.3 14489.4 19733.6 10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3 19736.4 21907.7 25027.7
vp9_put_8tap_smooth_64v_neon: 14455.0 12303.9 13746.4 9628.9
libvpx vpx_convolve8_vert_neon: 42090.0 17706.2 17659.9 16941.2
Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.
The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes it match the pattern already used for VP8 MC functions.
This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.
Signed-off-by: Martin Storsjö <martin@martin.st>
This was broken by the following Libav commit:
4c387c7 ppc: dsputil: do unaligned block accesses correctly
The following tests fail due to this:
fate-checkasm
fate-vsynth1-dnxhd-2k-hr-hq fate-vsynth1-dnxhd-edge1-hr
fate-vsynth1-dnxhd-edge2-hr fate-vsynth1-dnxhd-edge3-hr
fate-vsynth1-dnxhd-hr-sq-mov fate-vsynth1-dnxhd-hr-hq-mov
fate-vsynth2-dnxhd-2k-hr-hq fate-vsynth2-dnxhd-edge1-hr
fate-vsynth2-dnxhd-edge2-hr fate-vsynth2-dnxhd-edge3-hr
fate-vsynth2-dnxhd-hr-sq-mov fate-vsynth2-dnxhd-hr-hq-mov
fate-vsynth3-dnxhd-2k-hr-hq fate-vsynth3-dnxhd-edge1-hr
fate-vsynth3-dnxhd-edge2-hr fate-vsynth3-dnxhd-edge3-hr
fate-vsynth3-dnxhd-hr-sq-mov fate-vsynth3-dnxhd-hr-hq-mov
Fixes trac ticket #5508.
Reviewed-by: Carl Eugen Hoyos <ceffmpeg@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
The parser depends on the codec and thus must not be used with a different one.
If it is, the 'avctx->codec_id == s->parser->codec_ids[0] ...' assert in
av_parser_parse2 gets triggered.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
The old code had to retain a partial frame across two calls in
the case of separate interlaced fields. Now, we know that we'll
get both fields within the same receive_frame call, and so we
don't need to manage the frame as private state any more.
It's not possible to return EAGAIN when we've passed input EOF and are
in draining mode. If we do return EAGAIN, we're saying there's no way to
get any more output - which isn't true in many cases.
So let's handle these cases in an internal loop as best we can.
It seems that without all the other 1:1 heuristics, we don't have
a fundamental problem trusting the interlaced flag on output
pictures. That's a relief.
I'm not sure why, but the mpeg4_unpack_bframes bsf is not
interacting well with seeking. Looking at the code, it should be
ok, with possibly one warning shown, but I see it getting stuck
for an extended period of time after a seek where a packed frame
is cached to be shown later.
So, I gave up on that and went back to making the old hardware
based path work. Turns out that it wasn't broken except that some
samples have a 6 byte drop packet which I wasn't accounting for.
Now it works again and seeks are good.
The new decode API allows for m:n decode patterns, which is what
you need to use this hardware in a sane way. There are so many
situations where 1:1 doesn't happen naturally that it's a miracle
I got it working as well as I did.
With this change, we can throw all of the crazy heuristics and
sleeps(!) out, and things work correctly.
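For reference, the standard m:n pattern looks roughly like this (generic
send/receive loop, not CrystalHD-specific code; error handling trimmed):

    #include <libavcodec/avcodec.h>

    static int decode(AVCodecContext *avctx, AVFrame *frame, const AVPacket *pkt)
    {
        int ret = avcodec_send_packet(avctx, pkt);  /* pkt == NULL starts draining */
        if (ret < 0)
            return ret;

        /* One packet may produce zero, one or several frames: keep receiving
         * until the decoder wants more input (EAGAIN) or is drained (EOF). */
        while ((ret = avcodec_receive_frame(avctx, frame)) >= 0) {
            /* ... use frame ... */
            av_frame_unref(frame);
        }
        return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
    }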
Why on earth the hardware returns garbage for the first sample of
a decoded picture is anyone's guess. The simplest reasonable way
to patch it up is to copy the first sample of the second line. This
should result in the correct chroma values (because the data was
originally 4:2:0 upsampled to 4:2:2) even if the luma isn't.
Also adds a new flag to mark filters which are aware of hwframes and
will perform this task themselves, and marks all appropriate filters
with this flag.
This is required to allow software-mapped hardware frames to work,
because we need to have the frames context available for any later
mapping operation in the filter graph.
The output from the filter graph should only propagate further to an
encoder if the hardware format actually matches the visible format
(mapped frames are valid here and have an hw_frames_ctx, but this
should not be given to the encoder as its hardware context).
This avoids potential rounding errors and guarantees the source aspect
ratio is preserved.
Keep writing pixel values when Stereo 3D Mode is enabled and for WebM,
as the format doesn't support anything else.
This fixes ticket #5743, implementing the suggestion from ticket #5903.
Signed-off-by: James Almer <jamrial@gmail.com>
This avoids continuity check failures in concatenated streams.
Reviewed-by: Steven Liu <lingjiujianke@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>