Alex Converse
c226fc5bfb
aacenc: Prevent premature termination of the two loop search.
Originally committed as revision 24476 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Alex Converse
81824fe059
aacdec: Only load and write each predictor variable once.
This is slightly faster and opens the door for further optimization.
Originally committed as revision 24475 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Alex Converse
70c99adb48
aacdec: 4% faster main profile decoding.
Originally committed as revision 24474 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Alex Converse
51ffd3a62f
aacenc: Favor log2f() and sqrtf() over log2() and sqrt().
Originally committed as revision 24473 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Alex Converse
04d72abf17
aacenc: Factorize some scalefactor utilities.
Originally committed as revision 24472 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Eli Friedman
3611e7a309
Inline asm for VP56 arith coder
Inline asm gets cmov far more reliably than trying to trick gcc into
generating it, which is useful since it's 2% faster overall.
Patch by Eli Friedman <eli.friedman at gmail>
Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
ca18a478e3
VP8: Inline traversing vp8_small_mvtree
Much faster read_mv_component, slightly faster overall
Originally committed as revision 24470 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
7697cdcf95
VP8: Use vp56_rac_get_prob_branchy when the bit is only used by an if()
Originally committed as revision 24469 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
fe1b5d974a
Decode DCT tokens by branching to a different code path for each branch
on the huffman tree, instead of traversing the tree in a while loop.
Based on the similar optimization in libvpx's detokenize.c
10% faster at normal bitrates, and 30% faster for high-bitrate intra-only video
Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
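The DCT token commit above replaces a generic tree walk with one code path per tree branch. The following toy sketch contrasts the two styles. It is not the VP8 token tree: the `BitSrc` type, `get_bit`, and the four-symbol tree are assumptions made for illustration, and the real decoder reads arithmetic-coded bits with per-node probabilities rather than raw bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bit source: one bit per array element. */
typedef struct {
    const unsigned char *bits;
    size_t len;
    size_t pos;
} BitSrc;

static int get_bit(BitSrc *s)
{
    assert(s->pos < s->len);
    return s->bits[s->pos++];
}

/* Toy tree as a table: an entry > 0 is the index of the next node,
 * an entry <= 0 is a leaf holding the negated symbol. */
static const signed char toy_tree[3][2] = {
    {  0,  1 },  /* node 0: bit 0 -> symbol 0, bit 1 -> node 1 */
    { -1,  2 },  /* node 1: bit 0 -> symbol 1, bit 1 -> node 2 */
    { -2, -3 },  /* node 2: bit 0 -> symbol 2, bit 1 -> symbol 3 */
};

/* Generic traversal: one table lookup per bit inside a loop. */
static int decode_loop(BitSrc *s)
{
    int node = 0;
    do {
        node = toy_tree[node][get_bit(s)];
    } while (node > 0);
    return -node;
}

/* Branchy variant: the same tree written out as explicit branches, so each
 * leaf has its own code path and no table lookup is needed. */
static int decode_branchy(BitSrc *s)
{
    if (!get_bit(s)) return 0;
    if (!get_bit(s)) return 1;
    if (!get_bit(s)) return 2;
    return 3;
}
```

Both functions return the same symbol for the same input bits; the branchy form trades code size for the chance to specialize the work done at each leaf.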
David Conrad
5474ec2ac8
Move renormalization of the VP56 arith decoder to before decoding a bit
No difference at the moment, but allows a future branchy variant
of vp56_rac_get_prob to be significantly faster
Originally committed as revision 24467 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
b3d755ec8b
Split renorm of vp56 arith decoder to its own function
Originally committed as revision 24466 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
David Conrad
24675b8093
vp56's arith decoder's code_word is only 16 bits, no need for unsigned long
Originally committed as revision 24465 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
13a1304bb3
Add myself to VP8 copyright and maintainers.
Also add Ronald to maintainers.
Originally committed as revision 24464 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
414ac27d8f
VP8: always_inline some things to force gcc to do the right thing
Mostly seems to help in the MC code, which gets a hundred cycles faster.
Originally committed as revision 24463 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
06d50ca804
VP8: use AV_RL24 instead of defining a new RL24.
Originally committed as revision 24462 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
9fddd14a8e
VP8: Slightly faster MV selection
Don't clamp best mv unless it's actually used.
Originally committed as revision 24461 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
14767f35ed
VP8: use AV_ZERO32 instead of AV_WN32A where relevant
Originally committed as revision 24460 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
09959ec46e
VP8: eliminate redundant code in r24458
Originally committed as revision 24459 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
a71abb714e
VP8: shave a few clocks off check_intra_pred_mode
Originally committed as revision 24458 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
0087aa47d0
VP8: fix broken sign bias code in MV pred
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.
Originally committed as revision 24456 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
3ae079a3c8
VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.
Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
3df56f4118
VP8: Clean up some variable shadowing.
Originally committed as revision 24454 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
51c9156438
VP8 asm: cosmetics (spacing)
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
8a467b2d44
VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
ef38842f0b
VP8: smarter prefetching
Don't prefetch reference frames that were used less than 1/32nd of the time so
far in the frame.
This improves speed by up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-refs.
This is a very common property of videos encoded by libvpx.
Originally committed as revision 24451 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
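The 1/32 heuristic from the prefetching commit above might look roughly like the sketch below. The function name and parameters are hypothetical, and the exact comparison and rounding used in the decoder may differ. The idea shown is only that a reference frame is prefetched when its use count so far exceeds 1/32 of the macroblocks decoded so far in the frame.

```c
/* Hypothetical sketch of the heuristic.
 * ref_uses: how many macroblocks so far in this frame used the reference.
 * mbs_done: how many macroblocks of this frame have been decoded so far.
 * Returns nonzero if the reference looks worth prefetching. */
static int should_prefetch(unsigned ref_uses, unsigned mbs_done)
{
    /* mbs_done >> 5 is mbs_done / 32, rounded down. */
    return ref_uses > (mbs_done >> 5);
}
```

With 64 macroblocks decoded, a reference used twice or less is skipped under this sketch, while one used three times is prefetched.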
Baptiste Coudurier
9479415e4e
In the h264 parser, return immediately if buf_size is 0 to avoid printing an
erroneous message for the last frame.
Originally committed as revision 24450 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c25c776708
VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.
Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b74f70d646
VP8: avoid a memset for non-i4x4 blocks with no coefficients
Originally committed as revision 24447 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
145d31865d
Get rid of more unnecessary dereferences in VP8 deblocking
Originally committed as revision 24446 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
867215336d
Shut up an uninitialized variable GCC warning in VP8.
Originally committed as revision 24445 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c4211046d2
Smarter VP8 prefetching
Prefetch all refs (including altref), but only if they've been used so far this
frame.
~2.5% faster overall.
TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple blocks of a rarely-used ref don't force us to prefetch
it.
Originally committed as revision 24444 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
8cfae560ad
Fix stupid bug in VP8 prefetching code
Originally committed as revision 24443 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
2a38c2e99a
Eliminate a LUT in escape decoding in VP8 decode_block_coeffs
Originally committed as revision 24441 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
d292c3455e
Eliminate some repeated dereferences in VP8 inter_predict
Originally committed as revision 24438 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Ronald S. Bultje
dc5eec8085
Use pextrw for SSE4 mbedge filter result writing; speeds up by 5-10 cycles on
CPUs supporting it.
Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
James Zern
7eb185e0a3
Map settings for 2-pass libvpx encoding.
Patch by James Zern, jzern at google
Originally committed as revision 24430 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b946111fde
Eliminate a pointless memset for intra blocks in P-frames in VP8
Originally committed as revision 24429 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
b9a7186bf4
VP8: Don't store segment in macroblock struct anymore.
Not necessary with the previous patch.
Originally committed as revision 24427 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c55e0d34ba
Convert VP8 macroblock structures to a ring buffer.
Uses a slightly nonintuitive ring buffer size of (width+height*2) to simplify
addressing logic.
Also split out the segmentation map to a separate structure, necessary to
implement the ring buffer.
Originally committed as revision 24426 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
968570d65f
Calculate deblock strength per-MB instead of per-row
Gives better cache locality, since the VP8Macroblock structs are still in cache.
Inspired by the way x264 does it.
Originally committed as revision 24417 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
d1c58fce20
Avoid tracking i4x4 modes in P-frames in VP8
As in the previous commit, they aren't used for context selection, so it saves
memory this way.
Originally committed as revision 24416 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
158e062c95
Avoid useless fill_rectangle in P-frames in VP8
In VP8, i4x4 only uses contexts based on neighbors in I-frames.
Originally committed as revision 24415 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
7bf254c41d
Optimize partition mv decoding in VP8
Originally committed as revision 24414 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
c0498b3031
Take shortcuts for mv0 case in VP8 MC
Avoid edge emulation -- it isn't needed if there isn't any subpel.
Originally committed as revision 24413 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
702e8d3376
Much faster VP8 mv and mode prediction
Originally committed as revision 24412 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
d229ae2b62
Convert vp56_mv to 16-bit.
Saves nothing except a bit of memory/cache now, but will allow future
optimizations.
Originally committed as revision 24411 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Jason Garrett-Glaser
d864dee8ab
Add prefetching to VP8 decoder
~5% faster overall, probably depends on CPU and resolution.
Originally committed as revision 24410 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Ronald S. Bultje
003243c3c2
Fix and enable horizontal >=SSE2 mbedge loopfilter.
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Loren Merritt
c7b1d9768c
relicense h264 deblock sse2 to lgpl
Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago
Loren Merritt
532e769701
sync yasm macros from x264
Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk
15 years ago