=============================================
SNOW Video Codec Specification Draft 20070103
=============================================

Intro:
======
This specification describes the snow syntax and semantics as well as
how to decode snow.
The decoding process is precisely described, and any compliant decoder
MUST produce exactly the same output for a spec-conformant snow stream.
For encoding, though, any process which generates a stream compliant with
the syntactic and semantic requirements and which is decodable by
the process described in this spec shall be considered a conformant
snow encoder.

Definitions:
============

MUST    the specific part must be done to conform to this standard
SHOULD  it is recommended to be done that way, but not strictly required

ilog2(x) is the rounded down logarithm of x to base 2
ilog2(0) = 0
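
As an illustration only, ilog2 as defined above can be sketched in C as follows.
The unsigned argument type is an assumption of this sketch; the spec does not
state a type.

```c
/* Rounded-down base-2 logarithm; ilog2(0) is defined as 0, as in the text. */
int ilog2(unsigned x)
{
    int n = 0;
    while (x > 1) {   /* each halving of x adds one to the logarithm */
        x >>= 1;
        n++;
    }
    return n;
}
```

For example, ilog2(1) is 0 and ilog2(1024) is 10.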

Type definitions:
=================

b   1-bit range coded
u   unsigned scalar value range coded
s   signed scalar value range coded

Bitstream syntax:
=================

frame:
    header
    prediction
    residual

header:
    keyframe                            b   MID_STATE
    if(keyframe || always_reset)
        reset_contexts
    if(keyframe){
        version                         u   header_state
        always_reset                    b   header_state
        temporal_decomposition_type     u   header_state
        temporal_decomposition_count    u   header_state
        spatial_decomposition_count     u   header_state
        colorspace_type                 u   header_state
        chroma_h_shift                  u   header_state
        chroma_v_shift                  u   header_state
        spatial_scalability             b   header_state
        max_ref_frames-1                u   header_state
        qlogs
    }
    if(!keyframe){
        update_mc                       b   header_state
        if(update_mc){
            for(plane=0; plane<2; plane++){
                diag_mc                 b   header_state
                htaps/2-1               u   header_state
                for(i= p->htaps/2; i; i--)
                    |hcoeff[i]|         u   header_state
            }
        }
        update_qlogs                    b   header_state
        if(update_qlogs){
            spatial_decomposition_count u   header_state
            qlogs
        }
    }

    spatial_decomposition_type          s   header_state
    qlog                                s   header_state
    mv_scale                            s   header_state
    qbias                               s   header_state
    block_max_depth                     s   header_state

qlogs:
    for(plane=0; plane<2; plane++){
        quant_table[plane][0][0]        s   header_state
        for(level=0; level < spatial_decomposition_count; level++){
            quant_table[plane][level][1]s   header_state
            quant_table[plane][level][3]s   header_state
        }
    }

reset_contexts
    *_state[*]= MID_STATE

prediction:
    for(y=0; y<block_count_vertical; y++)
        for(x=0; x<block_count_horizontal; x++)
            block(0)

block(level):
    mvx_diff=mvy_diff=y_diff=cb_diff=cr_diff=0
    if(keyframe){
        intra=1
    }else{
        if(level!=block_max_depth){
            s_context= 2*left->level + 2*top->level + topleft->level + topright->level
            leaf                        b   block_state[4 + s_context]
        }
        if(level==block_max_depth || leaf){
            intra                       b   block_state[1 + left->intra + top->intra]
            if(intra){
                y_diff                  s   block_state[32]
                cb_diff                 s   block_state[64]
                cr_diff                 s   block_state[96]
            }else{
                ref_context= ilog2(2*left->ref) + ilog2(2*top->ref)
                if(ref_frames > 1)
                    ref                 u   block_state[128 + 1024 + 32*ref_context]
                mx_context= ilog2(2*abs(left->mx - top->mx))
                my_context= ilog2(2*abs(left->my - top->my))
                mvx_diff                s   block_state[128 + 32*(mx_context + 16*!!ref)]
                mvy_diff                s   block_state[128 + 32*(my_context + 16*!!ref)]
            }
        }else{
            block(level+1)
            block(level+1)
            block(level+1)
            block(level+1)
        }
    }

residual:
    residual2(luma)
    residual2(chroma_cr)
    residual2(chroma_cb)

residual2:
    for(level=0; level<spatial_decomposition_count; level++){
        if(level==0)
            subband(LL, 0)
        subband(HL, level)
        subband(LH, level)
        subband(HH, level)
    }

subband:
    FIXME

Tag description:
----------------

version
    0
    this MUST NOT change within a bitstream

always_reset
    if 1 then the range coder contexts will be reset after each frame

temporal_decomposition_type
    0

temporal_decomposition_count
    0

spatial_decomposition_count
    FIXME

colorspace_type
    0
    this MUST NOT change within a bitstream

chroma_h_shift
    log2(luma.width / chroma.width)
    this MUST NOT change within a bitstream

chroma_v_shift
    log2(luma.height / chroma.height)
    this MUST NOT change within a bitstream

spatial_scalability
    0

max_ref_frames
    maximum number of reference frames
    this MUST NOT change within a bitstream

update_mc
    indicates that motion compensation filter parameters are stored in the
    header

diag_mc
    flag to enable faster diagonal interpolation
    this SHOULD be 1 unless it turns out to be covered by a valid patent

htaps
    number of half pel interpolation filter taps, MUST be even, >0 and <10

hcoeff
    half pel interpolation filter coefficients, hcoeff[0] are the 2 middle
    coefficients, hcoeff[1] are the next outer ones and so on, resulting in a
    filter like:
    ..., hcoeff[2], hcoeff[1], hcoeff[0], hcoeff[0], hcoeff[1], hcoeff[2], ...
    the sign of the coefficients is not explicitly stored but alternates
    after each coeff and hcoeff[0] is positive, so ...,+,-,+,-,+,+,-,+,-,+,...
    hcoeff[0] is not explicitly stored but found by subtracting the sum
    of all stored coefficients with signs from 32
    hcoeff[0]= 32 - hcoeff[1] - hcoeff[2] - ...
    a good choice for hcoeff and htaps is
        htaps= 6
        hcoeff={40,-10,2}
    an alternative which requires more computations at both encoder and
    decoder side and may or may not be better is
        htaps= 8
        hcoeff={42,-14,6,-2}

ref_frames
    minimum of the number of available reference frames and max_ref_frames
    for example the first frame after a key frame always has ref_frames=1

spatial_decomposition_type
    wavelet type
    0 is a 9/7 symmetric compact integer wavelet
    1 is a 5/3 symmetric compact integer wavelet
    others are reserved
    stored as delta from last, last is reset to 0 if always_reset || keyframe

qlog
    quality (logarithmic quantizer scale)
    stored as delta from last, last is reset to 0 if always_reset || keyframe

mv_scale
    stored as delta from last, last is reset to 0 if always_reset || keyframe
    FIXME check that everything works fine if this changes between frames

qbias
    dequantization bias
    stored as delta from last, last is reset to 0 if always_reset || keyframe

block_max_depth
    maximum depth of the block tree
    stored as delta from last, last is reset to 0 if always_reset || keyframe

quant_table
    quantization table
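
The hcoeff reconstruction described above (alternating signs, hcoeff[0] derived
from the sum) can be sketched in C as follows. This is a minimal sketch, not the
reference decoder: build_hcoeff, mag and n are hypothetical names, and n is
assumed to be the number of distinct coefficients (3 for the htaps=6 example).

```c
/*
 * mag[i] holds the stored magnitude |hcoeff[i]| for i = 1..n-1
 * (mag[0] is ignored, since hcoeff[0] is not stored).
 * On return hcoeff[0..n-1] holds the signed filter coefficients.
 */
void build_hcoeff(const int *mag, int n, int *hcoeff)
{
    int sum = 0;
    for (int i = n - 1; i >= 1; i--) {
        /* signs alternate: odd indices negative, even indices positive */
        hcoeff[i] = (i & 1) ? -mag[i] : mag[i];
        sum += hcoeff[i];
    }
    /* the middle coefficient makes all coefficients sum to 32 */
    hcoeff[0] = 32 - sum;
}
```

With magnitudes 10 and 2 this yields {40,-10,2}, and with 14, 6 and 2 it yields
{42,-14,6,-2}, matching the two filters suggested above.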

Highlevel bitstream structure:
==============================
 --------------------------------------------
|                   Header                   |
 --------------------------------------------
|    ------------------------------------    |
|   |               Block0               |   |
|   |             split?                 |   |
|   |     yes              no            |   |
|   |  .........         intra?          |   |
|   | : Block01 :    yes         no      |   |
|   | : Block02 :  .......   ..........  |   |
|   | : Block03 : :  y DC : : ref index: |   |
|   | : Block04 : : cb DC : : motion x : |   |
|   |  .........  : cr DC : : motion y : |   |
|   |              .......   ..........  |   |
|    ------------------------------------    |
|    ------------------------------------    |
|   |               Block1               |   |
|                    ...                     |
 --------------------------------------------
| ------------   ------------   ------------ |
|| Y subbands | | Cb subbands| | Cr subbands||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |LL0||HL0| | | |LL0||HL0| | | |LL0||HL0| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |LH0||HH0| | | |LH0||HH0| | | |LH0||HH0| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |HL1||LH1| | | |HL1||LH1| | | |HL1||LH1| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |HH1||HL2| | | |HH1||HL2| | | |HH1||HL2| ||
||    ...     | |    ...     | |    ...     ||
| ------------   ------------   ------------ |
 --------------------------------------------

Decoding process:
=================

                                         ------------
                                        |            |
                                        |  Subbands  |
                   ------------         |            |
                  |            |         ------------
                  |  Intra DC  |               |
                  |            |    LL0 subband prediction
                   ------------                |
                                \        Dequantization
 -------------------             \             |
|  Reference frames |             \           IDWT
| -------   ------- |    Motion    \           |
||Frame 0| |Frame 1|| Compensation  .   OBMC   v      -------
| -------   ------- | --------------. \------> + --->|Frame n|-->output
| -------   ------- |                                 -------
||Frame 2| |Frame 3||<----------------------------------/
|        ...        |
 -------------------

Range Coder:
============
FIXME

Neighboring Blocks:
===================
left and top are set to the respective blocks unless they are outside of
the image in which case they are set to the Null block

top-left is set to the top left block unless it is outside of the image in
which case it is set to the left block

if this block has no larger parent block or it is at the left side of its
parent block and the top right block is not outside of the image then the
top right block is used for top-right else the top-left block is used

Null block
y,cb,cr are 128
level, ref, mx and my are 0

Motion Vector Prediction:
=========================
1. the motion vectors of all the neighboring blocks are scaled to
compensate for the difference of reference frames

scaled_mv= (mv * (256 * (current_reference+1) / (mv.reference+1)) + 128)>>8

2. the median of the scaled left, top and top-right vectors is used as
motion vector prediction

3. the used motion vector is the sum of the predictor and
   (mvx_diff, mvy_diff)*mv_scale
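
The three steps above can be sketched in C for one vector component as follows.
The function names are hypothetical, and the sketch assumes that >> on a
negative int is an arithmetic shift, as on common compilers.

```c
/* Step 1: scale a neighbor's vector component to the current reference. */
int scale_mv(int mv, int mv_reference, int current_reference)
{
    return (mv * (256 * (current_reference + 1) / (mv_reference + 1)) + 128) >> 8;
}

/* Step 2: median of three values. */
int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) b = c;                         /* b = min(b, c) */
    return a > b ? a : b;                     /* max(a, min(b, c)) */
}

/* Step 3: add the decoded difference, scaled by mv_scale, to the predictor. */
int reconstruct_mv(int left, int top, int topright, int diff, int mv_scale)
{
    return median3(left, top, topright) + diff * mv_scale;
}
```

The same functions would be applied separately to the x and y components, with
left, top and topright already passed through scale_mv.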

Intra DC Prediction:
====================
the luma and chroma values of the left block are used as predictors

the used luma and chroma is the sum of the predictor and y_diff, cb_diff, cr_diff
to reverse this in the decoder apply the following:
block[y][x].dc[0] = block[y][x-1].dc[0] +  y_diff;
block[y][x].dc[1] = block[y][x-1].dc[1] + cb_diff;
block[y][x].dc[2] = block[y][x-1].dc[2] + cr_diff;
block[*][-1].dc[*]= 128;

Motion Compensation:
====================

Halfpel interpolation:
----------------------
halfpel interpolation is done by convolution with the halfpel filter stored
in the header:

horizontal halfpel samples are found by
H1[y][x] =    hcoeff[0]*(F[y][x  ] + F[y][x+1])
            + hcoeff[1]*(F[y][x-1] + F[y][x+2])
            + hcoeff[2]*(F[y][x-2] + F[y][x+3])
            + ...
h1[y][x] = (H1[y][x] + 32)>>6;

vertical halfpel samples are found by
H2[y][x] =    hcoeff[0]*(F[y  ][x] + F[y+1][x])
            + hcoeff[1]*(F[y-1][x] + F[y+2][x])
            + ...
h2[y][x] = (H2[y][x] + 32)>>6;

vertical+horizontal halfpel samples are found by
H3[y][x] =    hcoeff[0]*(H2[y][x  ] + H2[y][x+1])
            + hcoeff[1]*(H2[y][x-1] + H2[y][x+2])
            + ...
or equivalently
H3[y][x] =    hcoeff[0]*(H1[y  ][x] + H1[y+1][x])
            + hcoeff[1]*(H1[y-1][x] + H1[y+2][x])
            + ...
h3[y][x] = (H3[y][x] + 2048)>>12;


                   F   H1  F
                   |   |   |
                   |   |   |
                   |   |   |
                   F   H1  F
                   |   |   |
                   |   |   |
                   |   |   |
   F-------F-------F-> H1<-F-------F-------F
                   v   v   v
                  H2   H3  H2
                   ^   ^   ^
   F-------F-------F-> H1<-F-------F-------F
                   |   |   |
                   |   |   |
                   |   |   |
                   F   H1  F
                   |   |   |
                   |   |   |
                   |   |   |
                   F   H1  F


unavailable fullpel samples (outside the picture for example) shall be equal
to the closest available fullpel sample
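
As a minimal sketch of the horizontal case above, the following C code computes
one h1 sample, replacing unavailable fullpel samples by the closest available
one. The names clamp_int, fullpel and halfpel_h are hypothetical, the plane is
assumed to be stored row by row with no padding, and ncoeff is assumed to be
the number of distinct coefficients. It also assumes >> on a negative int is an
arithmetic shift.

```c
/* Clamp v into [lo, hi]. */
int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Fullpel sample F[y][x]; coordinates outside the w x h picture are
 * mapped to the closest available sample. */
int fullpel(const unsigned char *f, int w, int h, int x, int y)
{
    return f[clamp_int(y, 0, h - 1) * w + clamp_int(x, 0, w - 1)];
}

/* h1[y][x]: the halfpel sample between F[y][x] and F[y][x+1]. */
int halfpel_h(const unsigned char *f, int w, int h, int x, int y,
              const int *hcoeff, int ncoeff)
{
    int sum = 0;   /* this is H1[y][x] */
    for (int i = 0; i < ncoeff; i++)
        sum += hcoeff[i] * (fullpel(f, w, h, x - i, y) +
                            fullpel(f, w, h, x + 1 + i, y));
    return (sum + 32) >> 6;
}
```

Because each coefficient is applied to two samples and the coefficients sum to
32, the total filter gain is 64, which is why the result is shifted right by 6.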

Smaller pel interpolation:
--------------------------
if diag_mc is set then points which lie on a line between 2 vertically,
horizontally or diagonally adjacent halfpel points shall be interpolated
linearly with rounding to nearest and halfway values rounded up.
points which lie on 2 diagonals at the same time should only use the one
diagonal not containing the fullpel point


 F-->O---q---O<--h1->O---q---O<--F
 v \           / v \           / v
 O   O       O   O   O       O   O
 |         /     |     \         |
 q       q       q       q       q
 |     /         |         \     |
 O   O       O   O   O       O   O
 ^ /           \ ^ /           \ ^
h2-->O---q---O<--h3->O---q---O<--h2
 v \           / v \           / v
 O   O       O   O   O       O   O
 |     \         |         /     |
 q       q       q       q       q
 |         \     |     /         |
 O   O       O   O   O       O   O
 ^ /           \ ^ /           \ ^
 F-->O---q---O<--h1->O---q---O<--F


the remaining points shall be bilinearly interpolated from the
up to 4 surrounding halfpel and fullpel points, again rounding should be to
nearest and halfway values rounded up

compliant snow decoders MUST support 1-1/8 pel luma and 1/2-1/16 pel chroma
interpolation at least
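
The rounding rule used above (round to nearest, halfway values rounded up) can
be illustrated for a single line between two points as follows. This is only a
sketch of the rounding for non-negative sample values: lerp_round and its
parameters are hypothetical names, and the spec text does not prescribe this
exact formulation.

```c
#include <assert.h>

/*
 * Linear interpolation between samples a and b at position k / 2^shift
 * along the line from a to b (k = 0 gives a, k = 2^shift gives b).
 * Adding half of the divisor before shifting rounds to nearest, with
 * halfway values rounded up.
 */
int lerp_round(int a, int b, int k, int shift)
{
    assert(shift >= 1 && k >= 0 && k <= (1 << shift));
    int n = 1 << shift;
    return (a * (n - k) + b * k + (n >> 1)) >> shift;
}
```

For the midpoint q between two halfpel points this reduces to (a + b + 1)>>1,
so 10 and 13 give 12.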

Overlapped block motion compensation:
-------------------------------------
FIXME

LL band prediction:
===================
Each sample in the LL0 subband is predicted by the median of the left, top and
left+top-topleft samples, samples outside the subband shall be considered to
be 0. To reverse this prediction in the decoder apply the following.
for(y=0; y<height; y++){
    for(x=0; x<width; x++){
        sample[y][x] += median(sample[y-1][x],
                               sample[y][x-1],
                               sample[y-1][x]+sample[y][x-1]-sample[y-1][x-1]);
    }
}
sample[-1][*]=sample[*][-1]= 0;
width,height here are the width and height of the LL0 subband not of the final
video
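
The loop above reads sample[-1][*] and sample[*][-1], which plain C arrays do
not provide. A minimal runnable sketch with explicit boundary handling might
look as follows; ll_unpredict, ll_median3 and the row-major layout without
padding are assumptions of this sketch.

```c
/* Median of three values. */
int ll_median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) b = c;                         /* b = min(b, c) */
    return a > b ? a : b;                     /* max(a, min(b, c)) */
}

/* Reverse the LL0 prediction in place; s holds width*height residuals
 * row by row. Samples outside the subband are treated as 0. */
void ll_unpredict(int *s, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int top     = y > 0 ? s[(y - 1) * width + x] : 0;
            int left    = x > 0 ? s[y * width + x - 1] : 0;
            int topleft = (x > 0 && y > 0) ? s[(y - 1) * width + x - 1] : 0;
            s[y * width + x] += ll_median3(top, left, top + left - topleft);
        }
    }
}
```

Because the loop runs in raster order, top, left and topleft are already
reconstructed values when they are read.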

Dequantization:
===============
FIXME

Wavelet Transform:
==================

Snow supports 2 wavelet transforms, the symmetric biorthogonal 5/3 integer
transform and an integer approximation of the symmetric biorthogonal 9/7
Daubechies wavelet.

2D IDWT (inverse discrete wavelet transform)
--------------------------------------------
The 2D IDWT applies a 2D filter recursively, each time combining the
4 lowest frequency subbands into a single subband until only 1 subband
remains.
The 2D filter is done by first applying a 1D filter in the vertical direction
and then applying it in the horizontal one.
 ---------------    ---------------    ---------------    ---------------
|LL0|HL0|       |  |   |   |       |  |       |       |  |       |       |
|---+---|  HL1  |  | L0|H0 |  HL1  |  |  LL1  |  HL1  |  |       |       |
|LH0|HH0|       |  |   |   |       |  |       |       |  |       |       |
|-------+-------|->|-------+-------|->|-------+-------|->|   L1  |  H1   |->...
|       |       |  |       |       |  |       |       |  |       |       |
|  LH1  |  HH1  |  |  LH1  |  HH1  |  |  LH1  |  HH1  |  |       |       |
|       |       |  |       |       |  |       |       |  |       |       |
 ---------------    ---------------    ---------------    ---------------


1D Filter:
----------
1. interleave the samples of the low and high frequency subbands like
s={L0, H0, L1, H1, L2, H2, L3, H3, ... }
note, this can end with a L or a H, the number of elements shall be w
s[-1] shall be considered equivalent to s[1  ]
s[w ] shall be considered equivalent to s[w-2]

2. perform the lifting steps in order as described below

5/3 Integer filter:
1. s[i] -= (s[i-1] + s[i+1] + 2)>>2; for all even i < w
2. s[i] += (s[i-1] + s[i+1]    )>>1; for all odd  i < w

 \ | /|\ | /|\ | /|\ | /|\
  \|/ | \|/ | \|/ | \|/ |
   +  |  +  |  +  |  +  |   -1/4
  /|\ | /|\ | /|\ | /|\ |
 / | \|/ | \|/ | \|/ | \|/
   |  +  |  +  |  +  |  +   +1/2
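
The interleaving convention and the two 5/3 lifting steps above can be sketched
in C as follows. idwt53_1d and mirror53 are hypothetical names; the sketch
assumes s already holds the interleaved samples, w >= 2, and that >> on a
negative int is an arithmetic shift.

```c
#include <assert.h>

/* Map an out-of-range index onto the signal: s[-1] -> s[1], s[w] -> s[w-2]. */
int mirror53(int i, int w)
{
    if (i < 0)
        return -i;
    if (i >= w)
        return 2 * (w - 1) - i;
    return i;
}

/* In-place inverse 5/3 lifting on w interleaved samples (L0, H0, L1, ...). */
void idwt53_1d(int *s, int w)
{
    assert(w >= 2);
    /* step 1: all even i */
    for (int i = 0; i < w; i += 2)
        s[i] -= (s[mirror53(i - 1, w)] + s[mirror53(i + 1, w)] + 2) >> 2;
    /* step 2: all odd i, using the even samples updated in step 1 */
    for (int i = 1; i < w; i += 2)
        s[i] += (s[mirror53(i - 1, w)] + s[mirror53(i + 1, w)]) >> 1;
}
```

Each step must be completed for the whole signal before the next one starts,
since step 2 reads the results of step 1.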


snow's 9/7 Integer filter:
1. s[i] -= (3*(s[i-1] + s[i+1])         + 4)>>3; for all even i < w
2. s[i] -=     s[i-1] + s[i+1]                 ; for all odd  i < w
3. s[i] += (   s[i-1] + s[i+1] + 4*s[i] + 8)>>4; for all even i < w
4. s[i] += (3*(s[i-1] + s[i+1])            )>>1; for all odd  i < w

 \ | /|\ | /|\ | /|\ | /|\
  \|/ | \|/ | \|/ | \|/ |
   +  |  +  |  +  |  +  |   -3/8
  /|\ | /|\ | /|\ | /|\ |
 / | \|/ | \|/ | \|/ | \|/
  (|  + (|  + (|  + (|  +   -1
 \ + /|\ + /|\ + /|\ + /|\  +1/4
  \|/ | \|/ | \|/ | \|/ |
   +  |  +  |  +  |  +  |   +1/16
  /|\ | /|\ | /|\ | /|\ |
 / | \|/ | \|/ | \|/ | \|/
   |  +  |  +  |  +  |  +   +3/2
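
The four 9/7 lifting steps above can be sketched in C in the same way.
idwt97_1d and mirror97 are hypothetical names; the sketch assumes s already
holds the interleaved samples, w >= 2, and that >> on a negative int is an
arithmetic shift.

```c
#include <assert.h>

/* Map an out-of-range index onto the signal: s[-1] -> s[1], s[w] -> s[w-2]. */
int mirror97(int i, int w)
{
    if (i < 0)
        return -i;
    if (i >= w)
        return 2 * (w - 1) - i;
    return i;
}

/* In-place inverse 9/7 lifting on w interleaved samples (L0, H0, L1, ...). */
void idwt97_1d(int *s, int w)
{
    int i;
    assert(w >= 2);
    for (i = 0; i < w; i += 2)   /* step 1: even i */
        s[i] -= (3 * (s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)]) + 4) >> 3;
    for (i = 1; i < w; i += 2)   /* step 2: odd i */
        s[i] -= s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)];
    for (i = 0; i < w; i += 2)   /* step 3: even i */
        s[i] += (s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)] + 4 * s[i] + 8) >> 4;
    for (i = 1; i < w; i += 2)   /* step 4: odd i */
        s[i] += (3 * (s[mirror97(i - 1, w)] + s[mirror97(i + 1, w)])) >> 1;
}
```

As a sanity check, a low band of constant value with an all-zero high band
reconstructs to a constant signal.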

optimization tips:
following are exactly identical
(3a)>>1 == a + (a>>1)
(a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2
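
The two identities above can be checked by brute force over a range of values.
identities_hold is a hypothetical helper; the check assumes >> on a negative
int is an arithmetic shift (rounding toward minus infinity), which the
identities depend on for negative inputs.

```c
/* Returns 1 if both identities hold for every a and b in [lo, hi], else 0. */
int identities_hold(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        if (((3 * a) >> 1) != a + (a >> 1))
            return 0;
        for (int b = lo; b <= hi; b++)
            if (((a + 4 * b + 8) >> 4) != (((a >> 2) + b + 2) >> 2))
                return 0;
    }
    return 1;
}
```

The second form keeps intermediate values smaller, which matters for the 16bit
implementation discussed next.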

16bit implementation note:
The IDWT can be implemented with 16bits, but this requires some care to
prevent overflows. The following list gives the minimum number of bits needed
for some terms
1. lifting step
A= s[i-1] + s[i+1]                              16bit
3*A + 4                                         18bit
A + (A>>1) + 2                                  17bit

3. lifting step
s[i-1] + s[i+1]                                 17bit

4. lifting step
3*(s[i-1] + s[i+1])                             17bit


TODO:
=====
Important:
finetune initial contexts
flip wavelet?
try to use the wavelet transformed predicted image (motion compensated image) as context for coding the residual coefficients
try the MV length as context for coding the residual coefficients
use extradata for stuff which is in the keyframes now?
the MV median predictor is patented IIRC
implement per picture halfpel interpolation
try different range coder state transition tables for different contexts

Not Important:
compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
spatial_scalability b vs u (!= 0 breaks syntax anyway so we can add a u later)


Credits:
========
Michael Niedermayer
Loren Merritt


Copyright:
==========
GPL + GFDL + whatever is needed to make this a RFC