This reverts commit f8bed30d8b. The reason
for this is that the overlap filter, which runs after IDCT, should run
on unclamped values, and thus IDCT and put_pixels() cannot be merged if
we want to attempt to be bitexact.
Merging these functions allows merging some loops, which makes the
results (particularly after SIMD optimizations) much faster.
(cherry picked from commit f8bed30d8b)