=============================================
SNOW Video Codec Specification Draft 20070103
=============================================

Intro:
======
This specification describes the snow syntax and semantics as well as
how to decode snow.
The decoding process is precisely described and any compliant decoder
MUST produce exactly the same output for a spec-conformant snow stream.
For encoding, though, any process which generates a stream compliant with
the syntactic and semantic requirements and which is decodable by the
process described in this spec shall be considered a conformant
snow encoder.
Definitions:
============

MUST    the specific part must be done to conform to this standard
SHOULD  it is recommended to be done that way, but not strictly required

ilog2(x) is the rounded-down logarithm of x with base 2
ilog2(0) = 0
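A minimal C sketch of this ilog2 (the function name matches the spec; the implementation itself is illustrative):

```c
/* ilog2(x): floor(log2(x)) for x > 0, and 0 for x == 0, matching the
 * definition above. */
static int ilog2(unsigned x)
{
    int n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}
```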
Type definitions:
=================

b   1-bit range coded
u   unsigned scalar value range coded
s   signed scalar value range coded
Bitstream syntax:
=================

frame:
    header
    prediction
    residual

header:
    keyframe                                            b   MID_STATE
    if(keyframe || always_reset)
        reset_contexts
    if(keyframe){
        version                                         u   header_state
        always_reset                                    b   header_state
        temporal_decomposition_type                     u   header_state
        temporal_decomposition_count                    u   header_state
        spatial_decomposition_count                     u   header_state
        colorspace_type                                 u   header_state
        chroma_h_shift                                  u   header_state
        chroma_v_shift                                  u   header_state
        spatial_scalability                             b   header_state
        max_ref_frames-1                                u   header_state
        qlogs
    }
    if(!keyframe){
        if(!always_reset)
            update_mc                                   b   header_state
        if(always_reset || update_mc){
            for(plane=0; plane<2; plane++){
                diag_mc                                 b   header_state
                htaps/2-1                               u   header_state
                for(i= p->htaps/2; i; i--)
                    |hcoeff[i]|                         u   header_state
            }
        }
    }
    spatial_decomposition_type                          s   header_state
    qlog                                                s   header_state
    mv_scale                                            s   header_state
    qbias                                               s   header_state
    block_max_depth                                     s   header_state

qlogs:
    for(plane=0; plane<2; plane++){
        quant_table[plane][0][0]                        s   header_state
        for(level=0; level < spatial_decomposition_count; level++){
            quant_table[plane][level][1]                s   header_state
            quant_table[plane][level][3]                s   header_state
        }
    }

reset_contexts
    *_state[*]= MID_STATE
prediction:
    for(y=0; y<block_count_vertical; y++)
        for(x=0; x<block_count_horizontal; x++)
            block(0)

block(level):
    if(keyframe){
        intra=1
        y_diff=cb_diff=cr_diff=0
    }else{
        if(level!=max_block_depth){
            s_context= 2*left->level + 2*top->level + topleft->level + topright->level
            leaf                                        b   block_state[4 + s_context]
        }
        if(level==max_block_depth || leaf){
            intra                                       b   block_state[1 + left->intra + top->intra]
            if(intra){
                y_diff                                  s   block_state[32]
                cb_diff                                 s   block_state[64]
                cr_diff                                 s   block_state[96]
            }else{
                ref_context= ilog2(2*left->ref) + ilog2(2*top->ref)
                if(ref_frames > 1)
                    ref                                 u   block_state[128 + 1024 + 32*ref_context]
                mx_context= ilog2(2*abs(left->mx - top->mx))
                my_context= ilog2(2*abs(left->my - top->my))
                mvx_diff                                s   block_state[128 + 32*(mx_context + 16*!!ref)]
                mvy_diff                                s   block_state[128 + 32*(my_context + 16*!!ref)]
            }
        }else{
            block(level+1)
            block(level+1)
            block(level+1)
            block(level+1)
        }
    }

residual:
    FIXME
Tag description:
----------------

version
    0
    this MUST NOT change within a bitstream

always_reset
    if 1 then the range coder contexts will be reset after each frame

temporal_decomposition_type
    0

temporal_decomposition_count
    0

spatial_decomposition_count
    FIXME

colorspace_type
    0
    this MUST NOT change within a bitstream

chroma_h_shift
    log2(luma.width / chroma.width)
    this MUST NOT change within a bitstream

chroma_v_shift
    log2(luma.height / chroma.height)
    this MUST NOT change within a bitstream

spatial_scalability
    0

max_ref_frames
    maximum number of reference frames
    this MUST NOT change within a bitstream

update_mc
    indicates that motion compensation filter parameters are stored in the
    header

diag_mc
    flag to enable faster diagonal interpolation
    this SHOULD be 1 unless it turns out to be covered by a valid patent

htaps
    number of half-pel interpolation filter taps, MUST be even, >0 and <10

hcoeff
    half-pel interpolation filter coefficients, hcoeff[0] are the 2 middle
    coefficients, [1] are the next outer ones and so on, resulting in a filter
    like: ...eff[2], hcoeff[1], hcoeff[0], hcoeff[0], hcoeff[1], hcoeff[2] ...
    the sign of the coefficients is not explicitly stored but alternates
    after each coeff and coeff[0] is positive, so ...,+,-,+,-,+,+,-,+,-,+,...
    hcoeff[0] is not explicitly stored but found by subtracting the sum
    of all stored coefficients with signs from 32
    hcoeff[0]= 32 - hcoeff[1] - hcoeff[2] - ...
    a good choice for hcoeff and htaps is
    htaps= 6
    hcoeff={40,-10,2}
    an alternative which requires more computations at both encoder and
    decoder side and may or may not be better is
    htaps= 8
    hcoeff={42,-14,6,-2}
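The sign alternation and the derivation of hcoeff[0] can be sketched as follows (expand_hcoeff and its argument layout are illustrative, not part of the syntax):

```c
/* Expand the stored magnitudes |hcoeff[1]| .. |hcoeff[htaps/2 - 1]| into
 * signed coefficients.  Signs alternate starting with a positive
 * hcoeff[0], and hcoeff[0] itself is derived so that the signed
 * coefficients sum to 32. */
static void expand_hcoeff(const int *stored_abs, int htaps, int *hcoeff)
{
    int sum = 0;
    for (int i = 1; i < htaps / 2; i++) {
        hcoeff[i] = (i & 1) ? -stored_abs[i] : stored_abs[i];
        sum      += hcoeff[i];
    }
    hcoeff[0] = 32 - sum;  /* hcoeff[0] = 32 - hcoeff[1] - hcoeff[2] - ... */
}
```

For the recommended 6-tap filter, stored magnitudes {10, 2} yield hcoeff = {40, -10, 2}.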
ref_frames
    minimum of the number of available reference frames and max_ref_frames,
    for example the first frame after a keyframe always has ref_frames=1

spatial_decomposition_type
    wavelet type
    0 is a 9/7 symmetric compact integer wavelet
    1 is a 5/3 symmetric compact integer wavelet
    others are reserved
    stored as delta from last, last is reset to 0 if always_reset || keyframe

qlog
    quality (logarithmic quantizer scale)
    stored as delta from last, last is reset to 0 if always_reset || keyframe

mv_scale
    stored as delta from last, last is reset to 0 if always_reset || keyframe
    FIXME check that everything works fine if this changes between frames

qbias
    dequantization bias
    stored as delta from last, last is reset to 0 if always_reset || keyframe

block_max_depth
    maximum depth of the block tree
    stored as delta from last, last is reset to 0 if always_reset || keyframe

quant_table
    quantization table
Range Coder:
============
FIXME

Neighboring Blocks:
===================
left and top are set to the respective blocks unless they are outside of
the image, in which case they are set to the Null block

top-left is set to the top-left block unless it is outside of the image, in
which case it is set to the left block

if this block has no larger parent block or it is at the left side of its
parent block and the top-right block is not outside of the image, then the
top-right block is used for top-right, else the top-left block is used

Null block
    y,cb,cr are 128
    level, ref, mx and my are 0
Motion Vector Prediction:
=========================
1. the motion vectors of all the neighboring blocks are scaled to
   compensate for the difference of reference frames

   scaled_mv= (mv * (256 * (current_reference+1) / (mv.reference+1)) + 128)>>8

2. the median of the scaled left, top and top-right vectors is used as
   motion vector prediction

3. the used motion vector is the sum of the predictor and
   (mvx_diff, mvy_diff)*mv_scale
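The three steps above can be sketched as follows for one vector component (the block bookkeeping is omitted; scale_mv, mid_pred and predict_mv are illustrative names):

```c
/* median of three values, used for step 2 */
static int mid_pred(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }
    if (b > c)   b = c;
    return a > b ? a : b;
}

/* step 1: scale a neighbor's vector to compensate for the
 * reference frame difference */
static int scale_mv(int mv, int mv_ref, int cur_ref)
{
    return (mv * (256 * (cur_ref + 1) / (mv_ref + 1)) + 128) >> 8;
}

/* steps 2 and 3: median prediction plus the decoded difference */
static int predict_mv(int left, int top, int topright, int diff, int mv_scale)
{
    return mid_pred(left, top, topright) + diff * mv_scale;
}
```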
Intra DC Prediction:
====================
the luma and chroma values of the left block are used as predictors

the used luma and chroma is the sum of the predictor and y_diff, cb_diff, cr_diff
to reverse this in the decoder apply the following:

block[y][x].dc[0] += block[y][x-1].dc[0];
block[y][x].dc[1] += block[y][x-1].dc[1];
block[y][x].dc[2] += block[y][x-1].dc[2];
block[*][-1].dc[*]= 128;
Motion Compensation:
====================

Halfpel interpolation:
----------------------
halfpel interpolation is done by convolution with the halfpel filter stored
in the header:

horizontal halfpel samples are found by
H1[y][x] =    hcoeff[0]*(F[y][x  ] + F[y][x+1])
            + hcoeff[1]*(F[y][x-1] + F[y][x+2])
            + hcoeff[2]*(F[y][x-2] + F[y][x+3])
            + ...
h1[y][x] = (H1[y][x] + 32)>>6;

vertical halfpel samples are found by
H2[y][x] =    hcoeff[0]*(F[y  ][x] + F[y+1][x])
            + hcoeff[1]*(F[y-1][x] + F[y+2][x])
            + ...
h2[y][x] = (H2[y][x] + 32)>>6;

vertical+horizontal halfpel samples are found by
H3[y][x] =    hcoeff[0]*(H2[y][x  ] + H2[y][x+1])
            + hcoeff[1]*(H2[y][x-1] + H2[y][x+2])
            + ...
or
H3[y][x] =    hcoeff[0]*(H1[y  ][x] + H1[y+1][x])
            + hcoeff[1]*(H1[y-1][x] + H1[y+2][x])
            + ...
h3[y][x] = (H3[y][x] + 2048)>>12;
                F   H1  F
                |   |   |
                |   |   |
                |   |   |
                F   H1  F
                |   |   |
                |   |   |
                |   |   |
F-------F-------F-> H1<-F-------F-------F
                v   v   v
                H2  H3  H2
                ^   ^   ^
F-------F-------F-> H1<-F-------F-------F
                |   |   |
                |   |   |
                |   |   |
                F   H1  F
                |   |   |
                |   |   |
                |   |   |
                F   H1  F

unavailable fullpel samples (outside the picture for example) shall be equal
to the closest available fullpel sample
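A sketch of the horizontal halfpel computation with the recommended 6-tap filter, including the edge clamping just described (function and parameter names are illustrative):

```c
/* clamp an index to [0, w-1]: unavailable fullpel samples read as the
 * closest available one */
static int clip_idx(int x, int w)
{
    return x < 0 ? 0 : (x >= w ? w - 1 : x);
}

/* h1[y][x] for one row of fullpel samples, using the recommended
 * hcoeff = {40, -10, 2}; the symmetric taps pair F[x-i] with F[x+i+1] */
static int halfpel_h(const unsigned char *row, int w, int x)
{
    static const int hcoeff[3] = { 40, -10, 2 };
    int H1 = 0;
    for (int i = 0; i < 3; i++)
        H1 += hcoeff[i] * (row[clip_idx(x - i,     w)] +
                           row[clip_idx(x + i + 1, w)]);
    return (H1 + 32) >> 6;  /* h1[y][x] = (H1[y][x] + 32) >> 6 */
}
```

On a flat region the filter passes the value through unchanged, since the coefficients sum to 64 over both sides.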
Smaller pel interpolation:
--------------------------
if diag_mc is set then points which lie on a line between 2 vertically,
horizontally or diagonally adjacent halfpel points shall be interpolated
linearly, with rounding to nearest and halfway values rounded up.

points which lie on 2 diagonals at the same time should only use the one
diagonal not containing the fullpel point
      F-->O---q---O<--h1->O---q---O<--F
      v  \         /  v  \         /  v
      O   O       O   O   O       O   O
      |         /     |     \         |
      q       q       q       q       q
      |     /         |         \     |
      O   O       O   O   O       O   O
      ^  /         \  ^  /         \  ^
     h2-->O---q---O<--h3->O---q---O<--h2
      v  \         /  v  \         /  v
      O   O       O   O   O       O   O
      |     \         |         /     |
      q       q       q       q       q
      |         \     |     /         |
      O   O       O   O   O       O   O
      ^  /         \  ^  /         \  ^
      F-->O---q---O<--h1->O---q---O<--F
the remaining points shall be bilinearly interpolated from the
up to 4 surrounding halfpel and fullpel points, again rounding should be to
nearest and halfway values rounded up

compliant snow decoders MUST support 1-1/8 pel luma and 1/2-1/16 pel chroma
interpolation at least
Overlapped block motion compensation:
-------------------------------------
FIXME

LL band prediction:
===================
Each sample in the LL0 subband is predicted by the median of the left, top and
left+top-topleft samples, samples outside the subband shall be considered to
be 0. To reverse this prediction in the decoder apply the following.

for(y=0; y<height; y++){
    for(x=0; x<width; x++){
        sample[y][x] += median(sample[y-1][x],
                               sample[y][x-1],
                               sample[y-1][x]+sample[y][x-1]-sample[y-1][x-1]);
    }
}
sample[-1][*]=sample[*][-1]= 0;

width,height here are the width and height of the LL0 subband, not of the final
video
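A self-contained version of the pseudocode above, with median() spelled out and the outside-the-subband zeros handled inline (the flat array layout and function names are illustrative):

```c
/* median of three values */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }
    if (b > c)   b = c;
    return a > b ? a : b;
}

/* reverse the LL0 prediction in place; s holds width*height samples in
 * row-major order, and samples outside the subband read as 0 */
static void ll_unpredict(int *s, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            int top  = y        ? s[(y - 1) * width + x]     : 0;
            int left = x        ? s[y * width + x - 1]       : 0;
            int tl   = (y && x) ? s[(y - 1) * width + x - 1] : 0;
            s[y * width + x] += median3(top, left, top + left - tl);
        }
}
```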
Dequantization:
===============
FIXME

Wavelet Transform:
==================
Snow supports 2 wavelet transforms, the symmetric biorthogonal 5/3 integer
transform and an integer approximation of the symmetric biorthogonal 9/7
Daubechies wavelet.

2D IDWT (inverse discrete wavelet transform)
--------------------------------------------
The 2D IDWT applies a 2D filter recursively, each time combining the
4 lowest frequency subbands into a single subband until only 1 subband
remains.
The 2D filter is done by first applying a 1D filter in the vertical direction
and then applying it in the horizontal one.
  ---------------     ---------------     ---------------     ---------------
 |LL0|HL0|       |   |   |   |       |   |       |       |   |       |       |
 |---+---|  HL1  |   | L0|H0 |  HL1  |   |  LL1  |  HL1  |   |       |       |
 |LH0|HH0|       |   |   |   |       |   |       |       |   |       |       |
 |-------+-------|-->|-------+-------|-->|-------+-------|-->|   L1  |  H1   |-->...
 |       |       |   |       |       |   |       |       |   |       |       |
 |  LH1  |  HH1  |   |  LH1  |  HH1  |   |  LH1  |  HH1  |   |       |       |
 |       |       |   |       |       |   |       |       |   |       |       |
  ---------------     ---------------     ---------------     ---------------
1D Filter:
----------
1. interleave the samples of the low and high frequency subbands like
   s={L0, H0, L1, H1, L2, H2, L3, H3, ... }
   note, this can end with a L or a H, the number of elements shall be w
   s[-1] shall be considered equivalent to s[1 ]
   s[w ] shall be considered equivalent to s[w-2]

2. perform the lifting steps in order as described below

5/3 Integer filter:
1. s[i] -= (s[i-1] + s[i+1] + 2)>>2; for all even i < w
2. s[i] += (s[i-1] + s[i+1]    )>>1; for all odd  i < w
        \ | /|\ | /|\ | /|\ | /|\
         \|/ | \|/ | \|/ | \|/ |
          +  |  +  |  +  |  +  |   -1/4
         /|\ | /|\ | /|\ | /|\ |
        / | \|/ | \|/ | \|/ | \|/
          |  +  |  +  |  +  |  +   +1/2
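The two lifting steps, together with the mirrored boundary rule from the 1D filter description, can be sketched as:

```c
/* mirror an index per s[-1] == s[1] and s[w] == s[w-2] */
static int mirror(int i, int w)
{
    if (i < 0)  return -i;
    if (i >= w) return 2 * w - 2 - i;
    return i;
}

/* 5/3 inverse lifting on the interleaved buffer s[0..w-1] */
static void lift53(int *s, int w)
{
    for (int i = 0; i < w; i += 2)   /* step 1, even (low) samples */
        s[i] -= (s[mirror(i - 1, w)] + s[mirror(i + 1, w)] + 2) >> 2;
    for (int i = 1; i < w; i += 2)   /* step 2, odd (high) samples */
        s[i] += (s[mirror(i - 1, w)] + s[mirror(i + 1, w)]) >> 1;
}
```

A DC-only input {c, 0, c, 0, ...} reconstructs to the constant signal c, as expected of a lowpass/highpass split.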
snow's 9/7 Integer filter:
1. s[i] -= (3*(s[i-1] + s[i+1]) + 4)>>3;           for all even i < w
2. s[i] -=     s[i-1] + s[i+1]         ;           for all odd  i < w
3. s[i] += (   s[i-1] + s[i+1] + 4*s[i] + 8)>>4;   for all even i < w
4. s[i] += (3*(s[i-1] + s[i+1])        )>>1;       for all odd  i < w

        \ | /|\ | /|\ | /|\ | /|\
         \|/ | \|/ | \|/ | \|/ |
          +  |  +  |  +  |  +  |   -3/8
         /|\ | /|\ | /|\ | /|\ |
        / | \|/ | \|/ | \|/ | \|/
         (|  + (|  + (|  + (|  +   -1
        \ + /|\ + /|\ + /|\ + /|\  +1/4
         \|/ | \|/ | \|/ | \|/ |
          +  |  +  |  +  |  +  |   +1/16
         /|\ | /|\ | /|\ | /|\ |
        / | \|/ | \|/ | \|/ | \|/
          |  +  |  +  |  +  |  +   +3/2
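Likewise the four 9/7 lifting steps, with the same mirrored boundary handling (the steps must run in order, since step 3 reads the values updated by steps 1 and 2):

```c
/* mirror an index per s[-1] == s[1] and s[w] == s[w-2] */
static int mirror(int i, int w)
{
    if (i < 0)  return -i;
    if (i >= w) return 2 * w - 2 - i;
    return i;
}

/* snow's 9/7 inverse lifting on the interleaved buffer s[0..w-1] */
static void lift97(int *s, int w)
{
    for (int i = 0; i < w; i += 2)   /* step 1 */
        s[i] -= (3 * (s[mirror(i - 1, w)] + s[mirror(i + 1, w)]) + 4) >> 3;
    for (int i = 1; i < w; i += 2)   /* step 2 */
        s[i] -=       s[mirror(i - 1, w)] + s[mirror(i + 1, w)];
    for (int i = 0; i < w; i += 2)   /* step 3 */
        s[i] += (s[mirror(i - 1, w)] + s[mirror(i + 1, w)] + 4 * s[i] + 8) >> 4;
    for (int i = 1; i < w; i += 2)   /* step 4 */
        s[i] += (3 * (s[mirror(i - 1, w)] + s[mirror(i + 1, w)])) >> 1;
}
```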
optimization tips:
the following are exactly identical
(3a)>>1 == a + (a>>1)
(a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2
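Both identities can be checked mechanically (assuming arithmetic right shift on negative values, as elsewhere in this spec):

```c
#include <assert.h>

/* brute-force check of the two shift identities above over a range of
 * values, including negative ones */
static void check_shift_identities(void)
{
    for (int a = -1024; a <= 1024; a++) {
        assert(((3 * a) >> 1) == a + (a >> 1));
        for (int b = -64; b <= 64; b++)
            assert(((a + 4 * b + 8) >> 4) == (((a >> 2) + b + 2) >> 2));
    }
}
```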
16bit implementation note:
The IDWT can be implemented with 16 bits, but this requires some care to
prevent overflows; the following lists the minimum number of bits needed
for some terms

1. lifting step
   A= s[i-1] + s[i+1]                     16bit
   3*A + 4                                18bit
   A + (A>>1) + 2                         17bit

3. lifting step
   s[i-1] + s[i+1]                        17bit

4. lifting step
   3*(s[i-1] + s[i+1])                    17bit
TODO:
=====
Important:
finetune initial contexts
spatial_decomposition_count per frame?
flip wavelet?
try to use the wavelet transformed predicted image (motion compensated image) as context for coding the residual coefficients
try the MV length as context for coding the residual coefficients
use extradata for stuff which is in the keyframes now?
the MV median predictor is patented IIRC
change MC so per picture halfpel interpolation can be done and finish the implementation of it
compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
try different range coder state transition tables for different contexts

Not Important:
spatial_scalability b vs u (!= 0 breaks syntax anyway so we can add a u later)
Credits:
========
Michael Niedermayer
Loren Merritt

Copyright:
==========
GPL + GFDL + whatever is needed to make this a RFC