The loop counts are guaranteed to be multiples of at least 8, so this
unrolling is safe and makes better use of the available execution ports.

For the int32 version: 68 -> 58c.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
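
For illustration only, a minimal C sketch of why a length guarantee makes this kind of unrolling safe. The kernel, the function names scale_by_4/scale_by_8 and the step sizes of 4 and 8 are assumptions made for the example; they are not the code touched by this change. Because n is a multiple of 8, the wider loop needs no tail handling, and its eight independent statements give the CPU more work to spread across execution ports per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: 4 elements per iteration. */
static void scale_by_4(int32_t *dst, const int32_t *src, size_t n, int32_t k)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        dst[i + 0] = src[i + 0] * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
    }
}

/* Hypothetical unrolled variant: 8 elements per iteration.
 * Safe without a remainder loop only because n is a multiple of 8. */
static void scale_by_8(int32_t *dst, const int32_t *src, size_t n, int32_t k)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        dst[i + 0] = src[i + 0] * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
        dst[i + 4] = src[i + 4] * k;
        dst[i + 5] = src[i + 5] * k;
        dst[i + 6] = src[i + 6] * k;
        dst[i + 7] = src[i + 7] * k;
    }
}
```

Both functions produce identical output for any n that is a multiple of 8; only the amount of independent work per loop iteration differs.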