  1. #LyX 1.6.1 created this file. For more info see http://www.lyx.org/
  2. \lyxformat 345
  3. \begin_document
  4. \begin_header
  5. \textclass scrbook
  6. \use_default_options true
  7. \language english
  8. \inputencoding auto
  9. \font_roman default
  10. \font_sans default
  11. \font_typewriter default
  12. \font_default_family default
  13. \font_sc false
  14. \font_osf false
  15. \font_sf_scale 100
  16. \font_tt_scale 100
  17. \graphics default
  18. \paperfontsize 10
  19. \spacing single
  20. \use_hyperref false
  21. \papersize letterpaper
  22. \use_geometry true
  23. \use_amsmath 2
  24. \use_esint 2
  25. \cite_engine basic
  26. \use_bibtopic false
  27. \paperorientation portrait
  28. \leftmargin 2cm
  29. \topmargin 2cm
  30. \rightmargin 2cm
  31. \bottommargin 2cm
  32. \secnumdepth 3
  33. \tocdepth 3
  34. \paragraph_separation indent
  35. \defskip medskip
  36. \quotes_language english
  37. \papercolumns 1
  38. \papersides 1
  39. \paperpagestyle headings
  40. \tracking_changes false
  41. \output_changes false
  42. \author ""
  43. \author ""
  44. \end_header
  45. \begin_body
  46. \begin_layout Title
  47. The Speex Manual
  48. \begin_inset Newline newline
  49. \end_inset
  50. Version 1.2
  51. \end_layout
  52. \begin_layout Author
  53. Jean-Marc Valin
  54. \end_layout
  55. \begin_layout Standard
  56. \begin_inset Newpage newpage
  57. \end_inset
  58. \end_layout
  59. \begin_layout Standard
  60. Copyright
  61. \begin_inset ERT
  62. status collapsed
  63. \begin_layout Plain Layout
  64. \backslash
  65. copyright
  66. \end_layout
  67. \end_inset
  68. 2002-2008 Jean-Marc Valin/Xiph.org Foundation
  69. \end_layout
  70. \begin_layout Standard
  71. Permission is granted to copy, distribute and/or modify this document under
  72. the terms of the GNU Free Documentation License, Version 1.1 or any later
  73. version published by the Free Software Foundation; with no Invariant Sections,
  74. with no Front-Cover Texts, and with no Back-Cover Texts.
  75. A copy of the license is included in the section entitled "GNU Free Documentati
  76. on License".
  77. \end_layout
  78. \begin_layout Standard
  79. \begin_inset Newpage newpage
  80. \end_inset
  81. \begin_inset CommandInset toc
  82. LatexCommand tableofcontents
  83. \end_inset
  84. \begin_inset Newpage newpage
  85. \end_inset
  86. \end_layout
  87. \begin_layout Standard
  88. \begin_inset FloatList table
  89. \end_inset
  90. \begin_inset Newpage newpage
  91. \end_inset
  92. \end_layout
  93. \begin_layout Chapter
  94. Introduction to Speex
  95. \end_layout
  96. \begin_layout Standard
  97. The Speex codec (
  98. \family typewriter
  99. http://www.speex.org/
  100. \family default
  101. ) exists because there is a need for a speech codec that is open-source
  102. and free from software patent royalties.
  103. These are essential conditions for being usable in any open-source software.
  104. In essence, Speex is to speech what Vorbis is to audio/music.
  105. Unlike many other speech codecs, Speex is not designed for mobile phones
  106. but rather for packet networks and voice over IP (VoIP) applications.
  107. File-based compression is of course also supported.
  108. \end_layout
  109. \begin_layout Standard
  110. The Speex codec is designed to be very flexible and support a wide range
  111. of speech quality and bit-rate.
  112. Support for very good quality speech also means that Speex can encode wideband
  113. speech (16 kHz sampling rate) in addition to narrowband speech (telephone
  114. quality, 8 kHz sampling rate).
  115. \end_layout
  116. \begin_layout Standard
  117. Designing for VoIP instead of mobile phones means that Speex is robust to
  118. lost packets, but not to corrupted ones.
  119. This is based on the assumption that in VoIP, packets either arrive unaltered
  120. or don't arrive at all.
  121. Because Speex is targeted at a wide range of devices, it has modest (adjustable
  122. ) complexity and a small memory footprint.
  123. \end_layout
  124. \begin_layout Standard
  125. All the design goals led to the choice of CELP
  126. \begin_inset Index
  127. status collapsed
  128. \begin_layout Plain Layout
  129. CELP
  130. \end_layout
  131. \end_inset
  132. as the encoding technique.
  133. One of the main reasons is that CELP has long proved that it could work
  134. reliably and scale well to both low bit-rates (e.g.
  135. DoD CELP @ 4.8 kbps) and high bit-rates (e.g.
  136. G.728 @ 16 kbps).
  137. \end_layout
  138. \begin_layout Section
  139. Getting help
  140. \begin_inset CommandInset label
  141. LatexCommand label
  142. name "sec:Getting-help"
  143. \end_inset
  144. \end_layout
  145. \begin_layout Standard
  146. As with many open-source projects, there are several ways to get help with Speex.
  147. These include:
  148. \end_layout
  149. \begin_layout Itemize
  150. This manual
  151. \end_layout
  152. \begin_layout Itemize
  153. Other documentation on the Speex website (http://www.speex.org/)
  154. \end_layout
  155. \begin_layout Itemize
  156. Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not
  157. just for developers)
  158. \end_layout
  159. \begin_layout Itemize
  160. IRC: The main channel is #speex on irc.freenode.net.
  161. Note that due to time zone differences, it may take a while to get a response,
  162. so please be patient.
  163. \end_layout
  164. \begin_layout Itemize
  165. Email the author privately at jean-marc.valin@usherbrooke.ca
  166. \series bold
  167. only
  168. \series default
  169. for private/delicate topics you do not wish to discuss publicly.
  170. \end_layout
  171. \begin_layout Standard
  172. Before asking for help (mailing list or IRC),
  173. \series bold
  174. it is important to first read this manual
  175. \series default
  176. (OK, so if you made it here it's already a good sign).
  177. It is generally considered rude to ask on a mailing list about topics that
  178. are clearly detailed in the documentation.
  179. On the other hand, it's perfectly OK (and encouraged) to ask for clarifications
  180. about something covered in the manual.
  181. This manual does not (yet) cover everything about Speex, so everyone is
  182. encouraged to ask questions, send comments, feature requests, or just let
  183. us know how Speex is being used.
  184. \end_layout
  185. \begin_layout Standard
  186. Here are some additional guidelines related to the mailing list.
  187. Before reporting bugs in Speex to the list, it is strongly recommended
  188. (if possible) to first test whether these bugs can be reproduced using
  189. the speexenc and speexdec (see Section
  190. \begin_inset CommandInset ref
  191. LatexCommand ref
  192. reference "sec:Command-line-encoder/decoder"
  193. \end_inset
  194. ) command-line utilities.
  195. Bugs reported based on 3rd party code are both harder to find and far too
  196. often caused by errors that have nothing to do with Speex.
  197. \end_layout
  198. \begin_layout Section
  199. About this document
  200. \end_layout
  201. \begin_layout Standard
  202. This document is organized as follows.
  203. Section
  204. \begin_inset CommandInset ref
  205. LatexCommand ref
  206. reference "sec:Feature-description"
  207. \end_inset
  208. describes the different Speex features and defines many basic terms that
  209. are used throughout this manual.
  210. Section
  211. \begin_inset CommandInset ref
  212. LatexCommand ref
  213. reference "sec:Command-line-encoder/decoder"
  214. \end_inset
  215. documents the standard command-line tools provided in the Speex distribution.
  216. Section
  217. \begin_inset CommandInset ref
  218. LatexCommand ref
  219. reference "sec:Programming-with-Speex"
  220. \end_inset
  221. includes detailed instructions about programming using the libspeex
  222. \begin_inset Index
  223. status collapsed
  224. \begin_layout Plain Layout
  225. libspeex
  226. \end_layout
  227. \end_inset
  228. API.
  229. Section
  230. \begin_inset CommandInset ref
  231. LatexCommand ref
  232. reference "sec:Formats-and-standards"
  233. \end_inset
  234. has some information related to Speex and standards.
  235. \end_layout
  236. \begin_layout Standard
  237. The last three sections describe the algorithms used in Speex.
  238. These sections require signal processing knowledge, but are not required
  239. for merely using Speex.
  240. They are intended for people who want to understand how Speex really works
  241. and/or want to do research based on Speex.
  242. Section
  243. \begin_inset CommandInset ref
  244. LatexCommand ref
  245. reference "sec:Introduction-to-CELP"
  246. \end_inset
  247. explains the general idea behind CELP, while sections
  248. \begin_inset CommandInset ref
  249. LatexCommand ref
  250. reference "sec:Speex-narrowband-mode"
  251. \end_inset
  252. and
  253. \begin_inset CommandInset ref
  254. LatexCommand ref
  255. reference "sec:Speex-wideband-mode"
  256. \end_inset
  257. are specific to Speex.
  258. \end_layout
  259. \begin_layout Standard
  260. \begin_inset Newpage newpage
  261. \end_inset
  262. \end_layout
  263. \begin_layout Chapter
  264. Codec description
  265. \begin_inset CommandInset label
  266. LatexCommand label
  267. name "sec:Feature-description"
  268. \end_inset
  269. \end_layout
  270. \begin_layout Standard
  271. This section describes Speex and its features in more detail.
  272. \end_layout
  273. \begin_layout Section
  274. Concepts
  275. \end_layout
  276. \begin_layout Standard
  277. Before introducing all the Speex features, here are some concepts in speech
  278. coding that make the rest of the manual easier to understand.
  279. Although some are general concepts in speech/audio processing, others are
  280. specific to Speex.
  281. \end_layout
  282. \begin_layout Subsection*
  283. Sampling rate
  284. \begin_inset Index
  285. status collapsed
  286. \begin_layout Plain Layout
  287. sampling rate
  288. \end_layout
  289. \end_inset
  290. \end_layout
  291. \begin_layout Standard
  292. The sampling rate expressed in Hertz (Hz) is the number of samples taken
  293. from a signal per second.
  294. For a sampling rate of
  295. \begin_inset Formula $F_{s}$
  296. \end_inset
  297. kHz, the highest frequency that can be represented is equal to
  298. \begin_inset Formula $F_{s}/2$
  299. \end_inset
  300. kHz (
  301. \begin_inset Formula $F_{s}/2$
  302. \end_inset
  303. is known as the Nyquist frequency).
  304. This is a fundamental property in signal processing and is described by
  305. the sampling theorem.
  306. Speex is mainly designed for three different sampling rates: 8 kHz, 16
  307. kHz, and 32 kHz.
  308. These are respectively referred to as narrowband
  309. \begin_inset Index
  310. status collapsed
  311. \begin_layout Plain Layout
  312. narrowband
  313. \end_layout
  314. \end_inset
  315. , wideband
  316. \begin_inset Index
  317. status collapsed
  318. \begin_layout Plain Layout
  319. wideband
  320. \end_layout
  321. \end_inset
  322. and ultra-wideband
  323. \begin_inset Index
  324. status collapsed
  325. \begin_layout Plain Layout
  326. ultra-wideband
  327. \end_layout
  328. \end_inset
  329. .
  330. \end_layout
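\begin_layout Standard
As a worked example of the relation above, the three sampling rates give
Nyquist frequencies of
\begin_inset Formula $8/2=4$
\end_inset
kHz (narrowband),
\begin_inset Formula $16/2=8$
\end_inset
kHz (wideband) and
\begin_inset Formula $32/2=16$
\end_inset
kHz (ultra-wideband).
\end_layout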
  331. \begin_layout Subsection*
  332. Bit-rate
  333. \end_layout
  334. \begin_layout Standard
  335. When encoding a speech signal, the bit-rate is defined as the number of
  336. bits per unit of time required to encode the speech.
  337. It is measured in
  338. \emph on
  339. bits per second
  340. \emph default
  341. (bps), or more commonly
  342. \emph on
  343. kilobits per second
  344. \emph default
  345. .
  346. It is important to make the distinction between
  347. \emph on
  348. kilo
  349. \series bold
  350. bits
  351. \series default
  352. \emph default
  353. \emph on
  354. per second
  355. \emph default
  356. (k
  357. \series bold
  358. b
  359. \series default
  360. ps) and
  361. \emph on
  362. kilo
  363. \series bold
  364. bytes
  365. \series default
  366. \emph default
  367. \emph on
  368. per second
  369. \emph default
  370. (k
  371. \series bold
  372. B
  373. \series default
  374. ps).
  375. \end_layout
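\begin_layout Standard
As a simple illustration, assuming 8-bit bytes, the two units differ by a
factor of eight:
\begin_inset Formula $8\,\mathrm{kbps}/8=1\,\mathrm{kBps}$
\end_inset
, so a stream encoded at 8 kbps needs about 1 kBps of storage or bandwidth
(container and packet overhead not included).
\end_layout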
  376. \begin_layout Subsection*
  377. Quality
  378. \begin_inset Index
  379. status collapsed
  380. \begin_layout Plain Layout
  381. quality
  382. \end_layout
  383. \end_inset
  384. (variable)
  385. \end_layout
  386. \begin_layout Standard
  387. Speex is a lossy codec, which means that it achieves compression at the
  388. expense of fidelity of the input speech signal.
  389. Unlike some other speech codecs, it is possible to control the trade-off
  390. made between quality and bit-rate.
  391. The Speex encoding process is controlled most of the time by a quality
  392. parameter that ranges from 0 to 10.
  393. In constant bit-rate
  394. \begin_inset Index
  395. status collapsed
  396. \begin_layout Plain Layout
  397. constant bit-rate
  398. \end_layout
  399. \end_inset
  400. (CBR) operation, the quality parameter is an integer, while for variable
  401. bit-rate (VBR), the parameter is a float.
  402. \end_layout
  403. \begin_layout Subsection*
  404. Complexity
  405. \begin_inset Index
  406. status collapsed
  407. \begin_layout Plain Layout
  408. complexity
  409. \end_layout
  410. \end_inset
  411. (variable)
  412. \end_layout
  413. \begin_layout Standard
  414. With Speex, it is possible to vary the complexity allowed for the encoder.
  415. This is done by controlling how the search is performed with an integer
  416. ranging from 1 to 10 in a way that's similar to the -1 to -9 options to
  417. \emph on
  418. gzip
  419. \emph default
  420. and
  421. \emph on
  422. bzip2
  423. \emph default
  424. compression utilities.
  425. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher
  426. than at complexity 10, but the CPU requirements for complexity 10 are about
  427. 5 times higher than for complexity 1.
  428. In practice, the best trade-off is between complexity 2 and 4, though higher
  429. settings are often useful when encoding non-speech sounds like DTMF
  430. \begin_inset Index
  431. status collapsed
  432. \begin_layout Plain Layout
  433. DTMF
  434. \end_layout
  435. \end_inset
  436. tones.
  437. \end_layout
  438. \begin_layout Subsection*
  439. Variable Bit-Rate
  440. \begin_inset Index
  441. status collapsed
  442. \begin_layout Plain Layout
  443. variable bit-rate
  444. \end_layout
  445. \end_inset
  446. (VBR)
  447. \end_layout
  448. \begin_layout Standard
  449. Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically
  450. to adapt to the
  451. \begin_inset Quotes eld
  452. \end_inset
  453. difficulty
  454. \begin_inset Quotes erd
  455. \end_inset
  456. of the audio being encoded.
  457. In the case of Speex, sounds like vowels and high-energy transients
  458. require a higher bit-rate to achieve good quality, while fricatives (e.g.
  459. s and f sounds) can be coded adequately with fewer bits.
  460. For this reason, VBR can achieve lower bit-rate for the same quality, or
  461. a better quality for a certain bit-rate.
  462. Despite its advantages, VBR has two main drawbacks: first, by only specifying
  463. quality, there is no guarantee about the final average bit-rate.
  464. Second, for some real-time applications like voice over IP (VoIP), what
  465. counts is the maximum bit-rate, which must be low enough for the communication
  466. channel.
  467. \end_layout
  468. \begin_layout Subsection*
  469. Average Bit-Rate
  470. \begin_inset Index
  471. status collapsed
  472. \begin_layout Plain Layout
  473. average bit-rate
  474. \end_layout
  475. \end_inset
  476. (ABR)
  477. \end_layout
  478. \begin_layout Standard
  479. Average bit-rate solves one of the problems of VBR, as it dynamically adjusts
  480. VBR quality in order to meet a specific target bit-rate.
  481. Because the quality/bit-rate is adjusted in real-time (open-loop), the
  482. global quality will be slightly lower than that obtained by encoding in
  483. VBR with exactly the right quality setting to meet the target average bit-rate.
  484. \end_layout
  485. \begin_layout Subsection*
  486. Voice Activity Detection
  487. \begin_inset Index
  488. status collapsed
  489. \begin_layout Plain Layout
  490. voice activity detection
  491. \end_layout
  492. \end_inset
  493. (VAD)
  494. \end_layout
  495. \begin_layout Standard
  496. When enabled, voice activity detection detects whether the audio being encoded
  497. is speech or silence/background noise.
  498. VAD is always implicitly activated when encoding in VBR, so the option
  499. is only useful in non-VBR operation.
  500. In this case, Speex detects non-speech periods and encodes them with just
  501. enough bits to reproduce the background noise.
  502. This is called
  503. \begin_inset Quotes eld
  504. \end_inset
  505. comfort noise generation
  506. \begin_inset Quotes erd
  507. \end_inset
  508. (CNG).
  509. \end_layout
  510. \begin_layout Subsection*
  511. Discontinuous Transmission
  512. \begin_inset Index
  513. status collapsed
  514. \begin_layout Plain Layout
  515. discontinuous transmission
  516. \end_layout
  517. \end_inset
  518. (DTX)
  519. \end_layout
  520. \begin_layout Standard
  521. Discontinuous transmission is an addition to VAD/VBR operation that makes it
  522. possible to stop transmitting completely when the background noise is stationary.
  523. In file-based operation, since we cannot just stop writing to the file,
  524. only 5 bits are used for such frames (corresponding to 250 bps).
  525. \end_layout
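\begin_layout Standard
The 250 bps figure can be checked by hand, assuming a 20 ms frame (50 frames
per second); that frame size is not stated in this paragraph but is consistent
with the delay figures given under Latency and algorithmic delay:
\begin_inset Formula $5\,\mathrm{bits}\times50\,\mathrm{frames/s}=250\,\mathrm{bps}$
\end_inset
.
\end_layout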
  526. \begin_layout Subsection*
  527. Perceptual enhancement
  528. \begin_inset Index
  529. status collapsed
  530. \begin_layout Plain Layout
  531. perceptual enhancement
  532. \end_layout
  533. \end_inset
  534. \end_layout
  535. \begin_layout Standard
  536. Perceptual enhancement is a part of the decoder which, when turned on, attempts
  537. to reduce the perception of the noise/distortion produced by the encoding/decod
  538. ing process.
  539. In most cases, perceptual enhancement brings the sound further from the
  540. original
  541. \emph on
  542. objectively
  543. \emph default
  544. (e.g.
  545. considering only SNR), but in the end it still
  546. \emph on
  547. sounds
  548. \emph default
  549. better (subjective improvement).
  550. \end_layout
  551. \begin_layout Subsection*
  552. Latency and algorithmic delay
  553. \begin_inset Index
  554. status collapsed
  555. \begin_layout Plain Layout
  556. algorithmic delay
  557. \end_layout
  558. \end_inset
  559. \end_layout
  560. \begin_layout Standard
  561. Every speech codec introduces a delay in the transmission.
  562. For Speex, this delay is equal to the frame size, plus some amount of
  563. \begin_inset Quotes eld
  564. \end_inset
  565. look-ahead
  566. \begin_inset Quotes erd
  567. \end_inset
  568. required to process each frame.
  569. In narrowband operation (8 kHz), the look-ahead is 10 ms; in wideband operation
  570. (16 kHz), it is 13.9 ms; and in ultra-wideband operation (32
  571. kHz), it is 15.9 ms, resulting in algorithmic delays of 30 ms,
  572. 33.9 ms and 35.9 ms, respectively.
  573. These values don't account for the CPU time it takes to encode or decode
  574. the frames.
  575. \end_layout
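\begin_layout Standard
These totals are consistent with a frame size of 20 ms in all three modes
(an inference from the numbers above rather than a value stated in this
paragraph):
\begin_inset Formula $20+10=30$
\end_inset
ms for narrowband,
\begin_inset Formula $20+13.9=33.9$
\end_inset
ms for wideband and
\begin_inset Formula $20+15.9=35.9$
\end_inset
ms for ultra-wideband.
\end_layout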
  576. \begin_layout Section
  577. Codec
  578. \end_layout
  579. \begin_layout Standard
  580. The main characteristics of Speex can be summarized as follows:
  581. \end_layout
  582. \begin_layout Itemize
  583. Free software/open-source
  584. \begin_inset Index
  585. status collapsed
  586. \begin_layout Plain Layout
  587. open-source
  588. \end_layout
  589. \end_inset
  590. , patent
  591. \begin_inset Index
  592. status collapsed
  593. \begin_layout Plain Layout
  594. patent
  595. \end_layout
  596. \end_inset
  597. and royalty-free
  598. \end_layout
  599. \begin_layout Itemize
  600. Integration of narrowband
  601. \begin_inset Index
  602. status collapsed
  603. \begin_layout Plain Layout
  604. narrowband
  605. \end_layout
  606. \end_inset
  607. and wideband
  608. \begin_inset Index
  609. status collapsed
  610. \begin_layout Plain Layout
  611. wideband
  612. \end_layout
  613. \end_inset
  614. using an embedded bit-stream
  615. \end_layout
  616. \begin_layout Itemize
  617. Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
  618. \end_layout
  619. \begin_layout Itemize
  620. Dynamic bit-rate switching (AMR) and Variable Bit-Rate
  621. \begin_inset Index
  622. status collapsed
  623. \begin_layout Plain Layout
  624. variable bit-rate
  625. \end_layout
  626. \end_inset
  627. (VBR) operation
  628. \end_layout
  629. \begin_layout Itemize
  630. Voice Activity Detection
  631. \begin_inset Index
  632. status collapsed
  633. \begin_layout Plain Layout
  634. voice activity detection
  635. \end_layout
  636. \end_inset
  637. (VAD, integrated with VBR) and discontinuous transmission (DTX)
  638. \end_layout
  639. \begin_layout Itemize
  640. Variable complexity
  641. \begin_inset Index
  642. status collapsed
  643. \begin_layout Plain Layout
  644. complexity
  645. \end_layout
  646. \end_inset
  647. \end_layout
  648. \begin_layout Itemize
  649. Embedded wideband structure (scalable sampling rate)
  650. \end_layout
  651. \begin_layout Itemize
  652. Ultra-wideband sampling rate at 32 kHz
  653. \end_layout
  654. \begin_layout Itemize
  655. Intensity stereo encoding option
  656. \end_layout
  657. \begin_layout Itemize
  658. Fixed-point implementation
  659. \end_layout
  660. \begin_layout Section
  661. Preprocessor
  662. \end_layout
  663. \begin_layout Standard
  664. This part refers to the preprocessor module introduced in the 1.1.x branch.
  665. The preprocessor is designed to be used on the audio
  666. \emph on
  667. before
  668. \emph default
  669. running the encoder.
  670. The preprocessor provides three main functionalities:
  671. \end_layout
  672. \begin_layout Itemize
  673. noise suppression
  674. \end_layout
  675. \begin_layout Itemize
  676. automatic gain control (AGC)
  677. \end_layout
  678. \begin_layout Itemize
  679. voice activity detection (VAD)
  680. \end_layout
  681. \begin_layout Standard
  682. The denoiser can be used to reduce the amount of background noise present
  683. in the input signal.
  684. This provides higher quality speech whether or not the denoised signal
  685. is encoded with Speex (or at all).
  686. However, when using the denoised signal with the codec, there is an additional
  687. benefit.
  688. Speech codecs in general (Speex included) tend to perform poorly on noisy
  689. input, and they tend to amplify the noise.
  690. The denoiser greatly reduces this effect.
  691. \end_layout
  692. \begin_layout Standard
  693. Automatic gain control (AGC) is a feature that deals with the fact that
  694. the recording volume may vary by a large amount between different setups.
  695. The AGC provides a way to adjust a signal to a reference volume.
  696. This is useful for voice over IP because it removes the need for manual
  697. adjustment of the microphone gain.
  698. A secondary advantage is that by setting the microphone gain to a conservative
  699. (low) level, it is easier to avoid clipping.
  700. \end_layout
  701. \begin_layout Standard
  702. The voice activity detector (VAD) provided by the preprocessor is more advanced
  703. than the one directly provided in the codec.
  704. \end_layout
  705. \begin_layout Section
  706. Adaptive Jitter Buffer
  707. \end_layout
  708. \begin_layout Standard
  709. When transmitting voice (or any content for that matter) over UDP or RTP,
  710. packets may be lost, arrive with different delays, or even arrive out of order.
  711. The purpose of a jitter buffer is to reorder packets and buffer them long
  712. enough (but no longer than necessary) so they can be sent to be decoded.
  713. \end_layout
  714. \begin_layout Section
  715. Acoustic Echo Canceller
  716. \end_layout
  717. \begin_layout Standard
  718. In any hands-free communication system (Fig.
  719. \begin_inset CommandInset ref
  720. LatexCommand ref
  721. reference "fig:Acoustic-echo-model"
  722. \end_inset
  723. ), speech from the remote end is played in the local loudspeaker, propagates
  724. in the room and is captured by the microphone.
  725. If the audio captured from the microphone is sent directly to the remote
  726. end, then the remote user hears an echo of their own voice.
  727. An acoustic echo canceller is designed to remove the acoustic echo before
  728. it is sent to the remote end.
  729. It is important to understand that the echo canceller is meant to improve
  730. the quality on the
  731. \series bold
  732. remote
  733. \series default
  734. end.
  735. For those who care a lot about mouth-to-ear delays, it should be noted that
  736. unlike the Speex codec, resampler and preprocessor, this acoustic echo canceller
  737. does not introduce any latency.
  738. \end_layout
  739. \begin_layout Standard
  740. \begin_inset Float figure
  741. wide false
  742. sideways false
  743. status open
  744. \begin_layout Plain Layout
  745. \begin_inset ERT
  746. status collapsed
  747. \begin_layout Plain Layout
  748. \backslash
  749. begin{center}
  750. \end_layout
  751. \end_inset
  752. \begin_inset Graphics
  753. filename echo_path.eps
  754. width 10cm
  755. \end_inset
  756. \begin_inset ERT
  757. status collapsed
  758. \begin_layout Plain Layout
  759. \backslash
  760. end{center}
  761. \end_layout
  762. \end_inset
  763. \end_layout
  764. \begin_layout Plain Layout
  765. \begin_inset Caption
  766. \begin_layout Plain Layout
  767. Acoustic echo model
  768. \begin_inset CommandInset label
  769. LatexCommand label
  770. name "fig:Acoustic-echo-model"
  771. \end_inset
  772. \end_layout
  773. \end_inset
  774. \end_layout
  775. \end_inset
  776. \end_layout
  777. \begin_layout Section
  778. Resampler
  779. \end_layout
  780. \begin_layout Standard
  781. In some cases, it may be useful to convert audio from one sampling rate
  782. to another.
  783. There are many reasons for that.
  784. It can be for mixing streams that have different sampling rates, for supporting
  785. sampling rates that the soundcard doesn't support, for transcoding, etc.
  786. That's why there is now a resampler that is part of the Speex project.
  787. This resampler can be used to convert between any two arbitrary rates (the
  788. ratio must only be a rational number) and there is control over the quality/com
  789. plexity tradeoff.
  790. Keep in mind that the resampler introduces some delay in the audio stream, the
  791. size of which depends on the resampler quality setting.
  792. Refer to the resampler API documentation to find out how to get exact delay values.
  793. \end_layout
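\begin_layout Standard
As an example of such a rational ratio (the two rates are chosen here only
for illustration), converting from 44.1 kHz to 48 kHz corresponds to
\begin_inset Formula $48000/44100=160/147$
\end_inset
.
\end_layout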
  794. \begin_layout Section
  795. Integration
  796. \end_layout
  797. \begin_layout Standard
  798. Knowing
  799. \emph on
  800. how
  801. \emph default
  802. to use each of the components is not that useful unless we know
  803. \emph on
  804. where
  805. \emph default
  806. to use them.
  807. Figure
  808. \begin_inset CommandInset ref
  809. LatexCommand ref
  810. reference "fig:Integration-VoIP"
  811. \end_inset
  812. shows where each of the components would be used in a typical VoIP client.
  813. Components in dotted lines are optional, though they may be very useful
  814. in some circumstances.
  815. There are several important things to note from this figure.
  816. The AEC must be placed as close as possible to the playback and capture.
  817. Only the resampling may be closer.
  818. Also, it is very important to use the same clock for both mic capture and
  819. speaker/headphones playback.
  820. \end_layout
  821. \begin_layout Standard
  822. \begin_inset Float figure
  823. wide false
  824. sideways false
  825. status open
  826. \begin_layout Plain Layout
  827. \begin_inset ERT
  828. status collapsed
  829. \begin_layout Plain Layout
  830. \backslash
  831. begin{center}
  832. \end_layout
  833. \end_inset
  834. \begin_inset Graphics
  835. filename components.eps
  836. width 80text%
  837. \end_inset
  838. \begin_inset ERT
  839. status collapsed
  840. \begin_layout Plain Layout
  841. \backslash
  842. end{center}
  843. \end_layout
  844. \end_inset
  845. \end_layout
  846. \begin_layout Plain Layout
  847. \begin_inset Caption
  848. \begin_layout Plain Layout
  849. Integration of all the components in a VoIP client.
  850. \begin_inset CommandInset label
  851. LatexCommand label
  852. name "fig:Integration-VoIP"
  853. \end_inset
  854. \end_layout
  855. \end_inset
  856. \end_layout
  857. \end_inset
  858. \end_layout
  859. \begin_layout Standard
  860. \begin_inset Newpage newpage
  861. \end_inset
  862. \end_layout
  863. \begin_layout Chapter
  864. Compiling and Porting
  865. \end_layout
  866. \begin_layout Standard
  867. Compiling Speex under UNIX/Linux or any other platform supported by autoconf
  868. (e.g.
  869. Win32/cygwin) is as easy as typing:
  870. \end_layout
  871. \begin_layout LyX-Code
  872. % ./configure [options]
  873. \end_layout
  874. \begin_layout LyX-Code
  875. % make
  876. \end_layout
  877. \begin_layout LyX-Code
  878. % make install
  879. \end_layout
  880. \begin_layout Standard
  881. The options supported by the Speex configure script are:
  882. \end_layout
  883. \begin_layout Description
  884. --prefix=<path> Specifies the base path for installing Speex (e.g.
  885. /usr)
  886. \end_layout
  887. \begin_layout Description
  888. --enable-shared/--disable-shared Whether to compile shared libraries
  889. \end_layout
  890. \begin_layout Description
  891. --enable-static/--disable-static Whether to compile static libraries
  892. \end_layout
  893. \begin_layout Description
  894. --disable-wideband Disable the wideband part of Speex (typically to save
  895. space)
  896. \end_layout
  897. \begin_layout Description
  898. --enable-valgrind Enable extra hints for valgrind for debugging purposes
  899. (do not use by default)
  900. \end_layout
  901. \begin_layout Description
  902. --enable-sse Enable use of SSE instructions (x86/float only)
  903. \end_layout
  904. \begin_layout Description
  905. --enable-fixed-point
  906. \begin_inset Index
  907. status collapsed
  908. \begin_layout Plain Layout
  909. fixed-point
  910. \end_layout
  911. \end_inset
  912. Compile Speex for a processor that does not have a floating point unit
  913. (FPU)
  914. \end_layout
  915. \begin_layout Description
  916. --enable-arm4-asm Enable assembly specific to the ARMv4 architecture (gcc
  917. only)
  918. \end_layout
  919. \begin_layout Description
  920. --enable-arm5e-asm Enable assembly specific to the ARMv5E architecture (gcc
  921. only)
  922. \end_layout
  923. \begin_layout Description
  924. --enable-fixed-point-debug Use only for debugging the fixed-point
  925. \begin_inset Index
  926. status collapsed
  927. \begin_layout Plain Layout
  928. fixed-point
  929. \end_layout
  930. \end_inset
  931. code (very slow)
  932. \end_layout
  933. \begin_layout Description
  934. --enable-ti-c55x Enable support for the TI C5x family
  935. \end_layout
  936. \begin_layout Description
  937. --enable-blackfin-asm Enable assembly specific to the Blackfin DSP architecture
  938. (gcc only)
  939. \end_layout
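\begin_layout Standard
As an illustration only, a fixed-point build without wideband support, installed
under /usr, could combine the options above as follows (this particular
combination is an assumption chosen for the example, not a recommended
configuration):
\end_layout
\begin_layout LyX-Code
% ./configure --prefix=/usr --enable-fixed-point --disable-wideband
\end_layout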
  940. \begin_layout Section
  941. Platforms
  942. \end_layout
  943. \begin_layout Standard
  944. Speex is known to compile and work on a large number of architectures, both
  945. floating-point and fixed-point.
  946. In general, any architecture that can natively compute the multiplication
  947. of two signed 16-bit numbers (32-bit result) and runs at a sufficient clock
  948. rate (architecture-dependent) is capable of running Speex.
  949. Architectures on which Speex is
  950. \series bold
  951. known
  952. \series default
  953. to work (it probably works on many others) are:
  954. \end_layout
  955. \begin_layout Itemize
  956. x86 & x86-64
  957. \end_layout
  958. \begin_layout Itemize
  959. Power
  960. \end_layout
  961. \begin_layout Itemize
  962. SPARC
  963. \end_layout
  964. \begin_layout Itemize
  965. ARM
  966. \end_layout
  967. \begin_layout Itemize
  968. Blackfin
  969. \end_layout
  970. \begin_layout Itemize
  971. Coldfire (68k family)
  972. \end_layout
  973. \begin_layout Itemize
  974. TI C54xx & C55xx
  975. \end_layout
  976. \begin_layout Itemize
  977. TI C6xxx
  978. \end_layout
  979. \begin_layout Itemize
  980. TriMedia (experimental)
  981. \end_layout
  982. \begin_layout Standard
  983. Operating systems on top of which Speex is known to work include (it probably
  984. works on many others):
  985. \end_layout
  986. \begin_layout Itemize
  987. Linux
  988. \end_layout
  989. \begin_layout Itemize
  990. \begin_inset Formula $\mu$
  991. \end_inset
  992. Clinux
  993. \end_layout
  994. \begin_layout Itemize
  995. MacOS X
  996. \end_layout
  997. \begin_layout Itemize
  998. BSD
  999. \end_layout
  1000. \begin_layout Itemize
  1001. Other UNIX/POSIX variants
  1002. \end_layout
  1003. \begin_layout Itemize
  1004. Symbian
  1005. \end_layout
  1006. \begin_layout Standard
  1007. The source code directory includes additional information for compiling on
  1008. certain architectures or operating systems in README.xxx files.
  1009. \end_layout
  1010. \begin_layout Section
  1011. Porting and Optimising
  1012. \end_layout
  1013. \begin_layout Standard
  1014. Here are a few things to consider when porting or optimising Speex for a
  1015. new platform or an existing one.
  1016. \end_layout
  1017. \begin_layout Subsection
  1018. CPU optimisation
  1019. \end_layout
  1020. \begin_layout Standard
  1021. The single factor that will affect the CPU usage of Speex the most is whether
  1022. it is compiled for floating point or fixed-point.
  1023. If your CPU/DSP does not have a floating-point unit (FPU), then compiling
  1024. as fixed-point will be orders of magnitude faster.
  1025. If there is an FPU present, then it is important to test which version
  1026. is faster.
  1027. On the x86 architecture, floating-point is
  1028. \series bold
  1029. generally
  1030. \series default
  1031. faster, but not always.
  1032. To compile Speex as fixed-point, you need to pass --enable-fixed-point to the
  1033. configure script or define the FIXED_POINT macro for the compiler.
  1034. As of 1.2beta3, it is now possible to disable the floating-point compatibility
  1035. API, which means that your code can link without a float emulation library.
  1036. To do that configure with --disable-float-api or define the DISABLE_FLOAT_API
  1037. macro.
  1038. Until the VBR feature is ported to fixed-point, you will also need to configure
  1039. with --disable-vbr or define DISABLE_VBR.
  1040. \end_layout
  1041. \begin_layout Standard
  1042. Other important things to check on some DSP architectures are:
  1043. \end_layout
  1044. \begin_layout Itemize
  1045. Make sure the cache is set to write-back mode
  1046. \end_layout
  1047. \begin_layout Itemize
  1048. If the chip has SRAM instead of cache, make sure as much code and data as possible
  1049. are in SRAM, rather than in RAM
  1050. \end_layout
  1051. \begin_layout Standard
  1052. If you are going to be writing assembly, then the following functions are
  1053. \series bold
  1054. usually
  1055. \series default
  1056. the first ones you should consider optimising:
  1057. \end_layout
  1058. \begin_layout Itemize
  1059. \begin_inset listings
  1060. inline true
  1061. status collapsed
  1062. \begin_layout Plain Layout
  1063. filter_mem16()
  1064. \end_layout
  1065. \end_inset
  1066. \end_layout
  1067. \begin_layout Itemize
  1068. \begin_inset listings
  1069. inline true
  1070. status collapsed
  1071. \begin_layout Plain Layout
  1072. iir_mem16()
  1073. \end_layout
  1074. \end_inset
  1075. \end_layout
  1076. \begin_layout Itemize
  1077. \begin_inset listings
  1078. inline true
  1079. status collapsed
  1080. \begin_layout Plain Layout
  1081. vq_nbest()
  1082. \end_layout
  1083. \end_inset
  1084. \end_layout
  1085. \begin_layout Itemize
  1086. \begin_inset listings
  1087. inline true
  1088. status collapsed
  1089. \begin_layout Plain Layout
  1090. pitch_xcorr()
  1091. \end_layout
  1092. \end_inset
  1093. \end_layout
  1094. \begin_layout Itemize
  1095. \begin_inset listings
  1096. inline true
  1097. status collapsed
  1098. \begin_layout Plain Layout
  1099. interp_pitch()
  1100. \end_layout
  1101. \end_inset
  1102. \end_layout
  1103. \begin_layout Standard
  1104. The filtering functions
  1105. \begin_inset listings
  1106. inline true
  1107. status collapsed
  1108. \begin_layout Plain Layout
  1109. filter_mem16()
  1110. \end_layout
  1111. \end_inset
  1112. and
  1113. \begin_inset listings
  1114. inline true
  1115. status collapsed
  1116. \begin_layout Plain Layout
  1117. iir_mem16()
  1118. \end_layout
  1119. \end_inset
  1120. are implemented in the direct form II transposed (DF2T).
  1121. However, for architectures based on multiply-accumulate (MAC), DF2T requires
  1122. frequent reload of the accumulator, which can make the code very slow.
  1123. For these architectures (e.g.
  1124. Blackfin and Coldfire), a better approach is to implement those functions
  1125. as direct form I (DF1), which is easier to express in terms of MAC.
  1126. When doing that however,
  1127. \series bold
  1128. it is important to make sure that the DF1 implementation still behaves like
  1129. the original DF2T implementation when it comes to memory values
  1130. \series default
  1131. .
  1132. This is necessary because the filter is time-varying and must compute exactly
  1133. the same value (not counting machine rounding) on any encoder or decoder.
  1134. \end_layout
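\begin_layout Standard
The relation between the two filter forms can be sketched in plain C.
This is a floating-point illustration with constant coefficients, not the
Speex fixed-point code: the function names, the use of double and the helper
that converts a DF1 history into DF2T memory values are all assumptions made
for the example.
It implements the all-pole recursion y[n] = x[n] - a[0]*y[n-1] - ...
- a[ord-1]*y[n-ord] in both forms.
\end_layout

```c
#include <assert.h>
#include <stddef.h>

/* Direct form II transposed: mem[] holds partial sums, so the
 * accumulator has to be reloaded for every tap. */
static void iir_df2t(const double *x, const double *a, double *y,
                     size_t n, size_t ord, double *mem)
{
    assert(ord >= 1);
    for (size_t i = 0; i < n; i++) {
        double yi = x[i] + mem[0];
        for (size_t j = 0; j + 1 < ord; j++)
            mem[j] = mem[j + 1] - a[j] * yi;
        mem[ord - 1] = -a[ord - 1] * yi;
        y[i] = yi;
    }
}

/* Direct form I: hist[k] holds y[n-1-k]; each output sample is a
 * single multiply-accumulate chain. */
static void iir_df1(const double *x, const double *a, double *y,
                    size_t n, size_t ord, double *hist)
{
    assert(ord >= 1);
    for (size_t i = 0; i < n; i++) {
        double acc = x[i];
        for (size_t k = 0; k < ord; k++)
            acc -= a[k] * hist[k];
        for (size_t k = ord - 1; k > 0; k--)
            hist[k] = hist[k - 1];
        hist[0] = acc;
        y[i] = acc;
    }
}

/* Rebuild the DF2T memory from a DF1 output history
 * (valid for constant coefficients only). */
static void df1_hist_to_df2t_mem(const double *hist, const double *a,
                                 double *mem, size_t ord)
{
    for (size_t j = 0; j < ord; j++) {
        double acc = 0.0;
        for (size_t k = j; k < ord; k++)
            acc -= a[k] * hist[k - j];
        mem[j] = acc;
    }
}

/* Largest absolute difference between two arrays. */
static double max_abs_diff(const double *p, const double *q, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = p[i] - q[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

\begin_layout Standard
Starting from zeroed state, both functions produce the same output up to
rounding, and the converted history matches the DF2T memory.
In Speex the coefficients change over time, so a real DF1 port has to keep
its stored state consistent with DF2T across coefficient updates, which this
sketch does not attempt.
\end_layout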
  1135. \begin_layout Subsection
  1136. Memory optimisation
  1137. \end_layout
  1138. \begin_layout Standard
  1139. Memory optimisation is mainly something that should be considered for small
  1140. embedded platforms.
  1141. For PCs, Speex is already so tiny that it's just not worth doing any of
  1142. the things suggested here.
  1143. There are several ways to reduce the memory usage of Speex, both in terms
  1144. of code size and data size.
  1145. For optimising code size, the trick is to first remove features you do
  1146. not need.
  1147. Some examples of things that can easily be disabled
  1148. \series bold
  1149. if you don't need them
  1150. \series default
  1151. are:
  1152. \end_layout
  1153. \begin_layout Itemize
  1154. Wideband support (--disable-wideband)
  1155. \end_layout
  1156. \begin_layout Itemize
  1157. Support for stereo (removing stereo.c)
  1158. \end_layout
  1159. \begin_layout Itemize
  1160. VBR support (--disable-vbr or DISABLE_VBR)
  1161. \end_layout
  1162. \begin_layout Itemize
  1163. Static codebooks that are not needed for the bit-rates you are using (*_table.c
  1164. files)
  1165. \end_layout
  1166. \begin_layout Standard
  1167. Speex also has several methods for allocating temporary arrays.
  1168. When using a compiler that supports C99 properly (as of 2007, Microsoft
  1169. compilers don't, but gcc does), it is best to define VAR_ARRAYS.
  1170. That makes use of the variable-size array feature of C99.
  1171. The next best is to define USE_ALLOCA so that Speex can use alloca() to
  1172. allocate the temporary arrays.
  1173. Note that on many systems, alloca() is buggy so it may not work.
  1174. If neither VAR_ARRAYS nor USE_ALLOCA is defined, then Speex falls back
  1175. to allocating a large
  1176. \begin_inset Quotes eld
  1177. \end_inset
  1178. scratch space
  1179. \begin_inset Quotes erd
  1180. \end_inset
  1181. and doing its own internal allocation.
  1182. The main disadvantage of this solution is that it is wasteful.
  1183. It needs to allocate enough stack for the worst case scenario (worst bit-rate,
  1184. highest complexity setting, ...) and by default, the memory isn't shared between
  1185. multiple encoder/decoder states.
  1186. Still, if the
  1187. \begin_inset Quotes eld
  1188. \end_inset
  1189. manual
  1190. \begin_inset Quotes erd
  1191. \end_inset
  1192. allocation is the only option left, there are a few things that can be
  1193. improved.
  1194. By overriding the speex_alloc_scratch() call in os_support.h, it is possible
  1195. to always return the same memory area for all states
  1196. \begin_inset Foot
  1197. status collapsed
  1198. \begin_layout Plain Layout
  1199. In this case, one must be careful with threads
  1200. \end_layout
  1201. \end_inset
  1202. .
  1203. In addition to that, by redefining NB_ENC_STACK and NB_DEC_STACK (or
  1204. similar for wideband), it is possible to only allocate memory for a scenario
  1205. that is known in advance.
  1206. In this case, it is important to measure the amount of memory required
  1207. for the specific sampling rate, bit-rate and complexity level being used.
  1208. \end_layout
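\begin_layout Standard
The idea of handing the same scratch area to every state, and of measuring
how much of it is really used, can be sketched as follows.
This is not the Speex allocator: the Scratch type, the scratch_* functions
and SCRATCH_WORDS are hypothetical names invented for the illustration, and
the sketch is not thread-safe.
\end_layout

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_WORDS 1024   /* hypothetical size, in doubles */

typedef struct {
    double *base;     /* start of the shared, double-aligned area */
    size_t  capacity; /* total size, in doubles */
    size_t  used;     /* doubles currently handed out */
    size_t  peak;     /* high-water mark, useful for sizing */
} Scratch;

/* One static area shared by every state that attaches to it. */
static double shared_area[SCRATCH_WORDS];

/* Every caller gets a view on the same memory. */
static Scratch scratch_attach(void)
{
    Scratch s = { shared_area, SCRATCH_WORDS, 0, 0 };
    return s;
}

/* Bump-allocate count elements of elem_size bytes; NULL when full. */
static void *scratch_push(Scratch *s, size_t count, size_t elem_size)
{
    size_t bytes = count * elem_size;
    size_t words = (bytes + sizeof(double) - 1) / sizeof(double);
    if (words > s->capacity - s->used)
        return NULL;
    void *p = s->base + s->used;
    s->used += words;
    if (s->used > s->peak)
        s->peak = s->used;
    return p;
}

/* Roll back to an earlier mark, e.g. at the end of a frame. */
static void scratch_release(Scratch *s, size_t mark)
{
    assert(mark <= s->used);
    s->used = mark;
}
```

\begin_layout Standard
Running the worst-case configuration once and reading the peak field gives
a measured figure for sizing the area, which is the kind of measurement
recommended above before shrinking the stack constants.
\end_layout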
  1209. \begin_layout Standard
  1210. \begin_inset Newpage newpage
  1211. \end_inset
  1212. \end_layout
  1213. \begin_layout Chapter
  1214. Command-line encoder/decoder
  1215. \begin_inset CommandInset label
  1216. LatexCommand label
  1217. name "sec:Command-line-encoder/decoder"
  1218. \end_inset
  1219. \end_layout
  1220. \begin_layout Standard
  1221. The base Speex distribution includes a command-line encoder (
  1222. \emph on
  1223. speexenc
  1224. \emph default
  1225. ) and decoder (
  1226. \emph on
  1227. speexdec
  1228. \emph default
  1229. ).
  1230. Those tools produce and read Speex files encapsulated in the Ogg container.
  1231. Although it is possible to encapsulate Speex in any container, Ogg is the
  1232. recommended container for files.
  1233. This section describes how to use the command line tools for Speex files
  1234. in Ogg.
  1235. \end_layout
  1236. \begin_layout Section
  1237. \emph on
  1238. speexenc
  1239. \begin_inset Index
  1240. status collapsed
  1241. \begin_layout Plain Layout
  1242. speexenc
  1243. \end_layout
  1244. \end_inset
  1245. \end_layout
  1246. \begin_layout Standard
  1247. The
  1248. \emph on
  1249. speexenc
  1250. \emph default
  1251. utility is used to create Speex files from raw PCM or wave files.
  1252. It can be used by calling:
  1253. \end_layout
  1254. \begin_layout LyX-Code
  1255. speexenc [options] input_file output_file
  1256. \end_layout
  1257. \begin_layout Standard
  1258. The value '-' for input_file or output_file corresponds respectively to
  1259. stdin and stdout.
  1260. The valid options are:
  1261. \end_layout
  1262. \begin_layout Description
  1263. --narrowband
  1264. \begin_inset space ~
  1265. \end_inset
  1266. (-n) Tell Speex to treat the input as narrowband (8 kHz).
  1267. This is the default
  1268. \end_layout
  1269. \begin_layout Description
  1270. --wideband
  1271. \begin_inset space ~
  1272. \end_inset
  1273. (-w) Tell Speex to treat the input as wideband (16 kHz)
  1274. \end_layout
  1275. \begin_layout Description
  1276. --ultra-wideband
  1277. \begin_inset space ~
  1278. \end_inset
  1279. (-u) Tell Speex to treat the input as
  1280. \begin_inset Quotes eld
  1281. \end_inset
  1282. ultra-wideband
  1283. \begin_inset Quotes erd
  1284. \end_inset
  1285. (32 kHz)
  1286. \end_layout
  1287. \begin_layout Description
  1288. --quality
  1289. \begin_inset space ~
  1290. \end_inset
  1291. n Set the encoding quality (0-10), default is 8
  1292. \end_layout
  1293. \begin_layout Description
  1294. --bitrate
  1295. \begin_inset space ~
  1296. \end_inset
  1297. n Encoding bit-rate (use bit-rate n or lower)
  1298. \end_layout
  1299. \begin_layout Description
  1300. --vbr Enable VBR (Variable Bit-Rate), disabled by default
  1301. \end_layout
  1302. \begin_layout Description
  1303. --abr
  1304. \begin_inset space ~
  1305. \end_inset
  1306. n Enable ABR (Average Bit-Rate) at n kbps, disabled by default
  1307. \end_layout
  1308. \begin_layout Description
  1309. --vad Enable VAD (Voice Activity Detection), disabled by default
  1310. \end_layout
  1311. \begin_layout Description
  1312. --dtx Enable DTX (Discontinuous Transmission), disabled by default
  1313. \end_layout
  1314. \begin_layout Description
  1315. --nframes
  1316. \begin_inset space ~
  1317. \end_inset
  1318. n Pack n frames in each Ogg packet (this saves space at low bit-rates)
  1319. \end_layout
  1320. \begin_layout Description
  1321. --comp
  1322. \begin_inset space ~
  1323. \end_inset
  1324. n Set encoding speed/quality tradeoff.
  1325. The higher the value of n, the slower the encoding (default is 3)
  1326. \end_layout
  1327. \begin_layout Description
  1328. -V Verbose operation, print bit-rate currently in use
  1329. \end_layout
  1330. \begin_layout Description
  1331. --help
  1332. \begin_inset space ~
  1333. \end_inset
  1334. (-h) Print the help
  1335. \end_layout
  1336. \begin_layout Description
  1337. --version
  1338. \begin_inset space ~
  1339. \end_inset
  1340. (-v) Print version information
  1341. \end_layout
  1342. \begin_layout Subsection*
  1343. Speex comments
  1344. \end_layout
  1345. \begin_layout Description
  1346. --comment Add the given string as an extra comment.
  1347. This may be used multiple times.
  1348. \end_layout
  1349. \begin_layout Description
  1350. --author Author of this track.
  1351. \end_layout
  1352. \begin_layout Description
  1353. --title Title for this track.
  1354. \end_layout
  1355. \begin_layout Subsection*
  1356. Raw input options
  1357. \end_layout
  1358. \begin_layout Description
  1359. --rate
  1360. \begin_inset space ~
  1361. \end_inset
  1362. n Sampling rate for raw input
  1363. \end_layout
  1364. \begin_layout Description
  1365. --stereo Consider raw input as stereo
  1366. \end_layout
  1367. \begin_layout Description
  1368. --le Raw input is little-endian
  1369. \end_layout
  1370. \begin_layout Description
  1371. --be Raw input is big-endian
  1372. \end_layout
  1373. \begin_layout Description
  1374. --8bit Raw input is 8-bit unsigned
  1375. \end_layout
  1376. \begin_layout Description
  1377. --16bit Raw input is 16-bit signed
  1378. \end_layout
  1379. \begin_layout Section
  1380. \emph on
  1381. speexdec
  1382. \begin_inset Index
  1383. status collapsed
  1384. \begin_layout Plain Layout
  1385. speexdec
  1386. \end_layout
  1387. \end_inset
  1388. \end_layout
  1389. \begin_layout Standard
  1390. The
  1391. \emph on
  1392. speexdec
  1393. \emph default
  1394. utility is used to decode Speex files and can be used by calling:
  1395. \end_layout
  1396. \begin_layout LyX-Code
  1397. speexdec [options] speex_file [output_file]
  1398. \end_layout
  1399. \begin_layout Standard
  1400. The value '-' for speex_file or output_file corresponds respectively to
  1401. stdin and stdout.
  1402. Also, when no output_file is specified, the file is played to the soundcard.
  1403. The valid options are:
  1404. \end_layout
  1405. \begin_layout Description
  1406. --enh enable post-filter (default)
  1407. \end_layout
  1408. \begin_layout Description
  1409. --no-enh disable post-filter
  1410. \end_layout
  1411. \begin_layout Description
  1412. --force-nb Force decoding in narrowband
  1413. \end_layout
  1414. \begin_layout Description
  1415. --force-wb Force decoding in wideband
  1416. \end_layout
  1417. \begin_layout Description
  1418. --force-uwb Force decoding in ultra-wideband
  1419. \end_layout
  1420. \begin_layout Description
  1421. --mono Force decoding in mono
  1422. \end_layout
  1423. \begin_layout Description
  1424. --stereo Force decoding in stereo
  1425. \end_layout
  1426. \begin_layout Description
  1427. --rate
  1428. \begin_inset space ~
  1429. \end_inset
  1430. n Force decoding at n Hz sampling rate
  1431. \end_layout
  1432. \begin_layout Description
  1433. --packet-loss
  1434. \begin_inset space ~
  1435. \end_inset
  1436. n Simulate n % random packet loss
  1437. \end_layout
  1438. \begin_layout Description
  1439. -V Verbose operation, print bit-rate currently in use
  1440. \end_layout
  1441. \begin_layout Description
  1442. --help
  1443. \begin_inset space ~
  1444. \end_inset
  1445. (-h) Print the help
  1446. \end_layout
  1447. \begin_layout Description
  1448. --version
  1449. \begin_inset space ~
  1450. \end_inset
  1451. (-v) Print version information
  1452. \end_layout
  1453. \begin_layout Standard
  1454. \begin_inset Newpage newpage
  1455. \end_inset
  1456. \end_layout
  1457. \begin_layout Chapter
  1458. Using the Speex Codec API (
  1459. \emph on
  1460. libspeex
  1461. \emph default
  1462. \begin_inset Index
  1463. status collapsed
  1464. \begin_layout Plain Layout
  1465. libspeex
  1466. \end_layout
  1467. \end_inset
  1468. )
  1469. \begin_inset CommandInset label
  1470. LatexCommand label
  1471. name "sec:Programming-with-Speex"
  1472. \end_inset
  1473. \end_layout
  1474. \begin_layout Standard
  1475. The
  1476. \emph on
  1477. libspeex
  1478. \emph default
  1479. library contains all the functions for encoding and decoding speech with
  1480. the Speex codec.
  1481. When linking on a UNIX system, one must add
  1482. \emph on
  1483. -lspeex -lm
  1484. \emph default
  1485. to the compiler command line.
  1486. One important thing to know is that
  1487. \series bold
  1488. libspeex calls are reentrant, but not thread-safe
  1489. \series default
  1490. .
  1491. That means that it is fine to use calls from many threads, but
  1492. \series bold
  1493. calls using the same state from multiple threads must be protected by mutexes
  1494. \series default
  1495. .
  1496. Examples of code can also be found in Appendix
  1497. \begin_inset CommandInset ref
  1498. LatexCommand ref
  1499. reference "sec:Sample-code"
  1500. \end_inset
  1501. and the complete API documentation is included in the Documentation section
  1502. of the Speex website (http://www.speex.org/).
  1503. \end_layout
  1504. \begin_layout Section
  1505. Encoding
  1506. \begin_inset CommandInset label
  1507. LatexCommand label
  1508. name "sub:Encoding"
  1509. \end_inset
  1510. \end_layout
  1511. \begin_layout Standard
  1512. In order to encode speech using Speex, one first needs to:
  1513. \end_layout
  1514. \begin_layout Standard
  1515. \begin_inset listings
  1516. inline false
  1517. status open
  1518. \begin_layout Plain Layout
  1519. #include <speex/speex.h>
  1520. \end_layout
  1521. \end_inset
  1522. Then in the code, a Speex bit-packing struct must be declared, along with
  1523. a Speex encoder state:
  1524. \begin_inset listings
  1525. inline false
  1526. status open
  1527. \begin_layout Plain Layout
  1528. SpeexBits bits;
  1529. \end_layout
  1530. \begin_layout Plain Layout
  1531. void *enc_state;
  1532. \end_layout
  1533. \end_inset
  1534. The two are initialized by:
  1535. \begin_inset listings
  1536. inline false
  1537. status open
  1538. \begin_layout Plain Layout
  1539. speex_bits_init(&bits);
  1540. \end_layout
  1541. \begin_layout Plain Layout
  1542. enc_state = speex_encoder_init(&speex_nb_mode);
  1543. \end_layout
  1544. \end_inset
  1545. \end_layout
  1546. \begin_layout Standard
  1547. For wideband coding,
  1548. \emph on
  1549. speex_nb_mode
  1550. \emph default
  1551. will be replaced by
  1552. \emph on
  1553. speex_wb_mode
  1554. \emph default
  1555. .
  1556. In most cases, you will need to know the frame size used at the sampling
  1557. rate you are using.
  1558. You can get that value in the
  1559. \emph on
  1560. frame_size
  1561. \emph default
  1562. variable (expressed in
  1563. \series bold
  1564. samples
  1565. \series default
  1566. , not bytes) with:
  1567. \end_layout
  1568. \begin_layout Standard
  1569. \begin_inset listings
  1570. inline false
  1571. status open
  1572. \begin_layout Plain Layout
  1573. speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);
  1574. \end_layout
  1575. \end_inset
  1576. \end_layout
  1577. \begin_layout Standard
  1578. In practice,
  1579. \emph on
  1580. frame_size
  1581. \emph default
  1582. will correspond to 20 ms when using 8, 16, or 32 kHz sampling rate.
  1583. There are many parameters that can be set for the Speex encoder, but the
  1584. most useful one is the quality parameter that controls the quality vs bit-rate
  1585. tradeoff.
  1586. This is set by:
  1587. \end_layout
  1588. \begin_layout Standard
  1589. \begin_inset listings
  1590. inline false
  1591. status open
  1592. \begin_layout Plain Layout
  1593. speex_encoder_ctl(enc_state,SPEEX_SET_QUALITY,&quality);
  1594. \end_layout
  1595. \end_inset
  1596. where
  1597. \emph on
  1598. quality
  1599. \emph default
  1600. is an integer value ranging from 0 to 10 (inclusive).
  1601. The mapping between quality and bit-rate is described in Fig.
  1602. \begin_inset CommandInset ref
  1603. LatexCommand ref
  1604. reference "cap:quality_vs_bps"
  1605. \end_inset
  1606. for narrowband.
  1607. \end_layout
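\begin_layout Standard
Since the frame size is expressed in samples and corresponds to 20 ms, the
expected values are easy to cross-check by hand.
The helpers below are a small sketch of that arithmetic and are not part of
libspeex: the function names and FRAME_MS are assumptions, and the value
actually reported by SPEEX_GET_FRAME_SIZE should always take precedence.
\end_layout

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MS 20  /* frame duration stated in the text */

/* Number of samples in one 20 ms frame at the given sampling rate. */
static int32_t samples_per_frame(int32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (int32_t)((int64_t)sample_rate_hz * FRAME_MS / 1000);
}

/* Size in bytes of one frame of 16-bit PCM input. */
static size_t pcm_bytes_per_frame(int32_t sample_rate_hz)
{
    return (size_t)samples_per_frame(sample_rate_hz) * sizeof(int16_t);
}
```

\begin_layout Standard
For 8, 16 and 32 kHz this gives 160, 320 and 640 samples per frame, so an
input buffer for one narrowband frame of 16-bit samples takes 320 bytes.
\end_layout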
  1608. \begin_layout Standard
  1609. Once the initialization is done, for every input frame:
  1610. \end_layout
  1611. \begin_layout Standard
  1612. \begin_inset listings
  1613. inline false
  1614. status open
  1615. \begin_layout Plain Layout
  1616. speex_bits_reset(&bits);
  1617. \end_layout
  1618. \begin_layout Plain Layout
  1619. speex_encode_int(enc_state, input_frame, &bits);
  1620. \end_layout
  1621. \begin_layout Plain Layout
  1622. nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
  1623. \end_layout
  1624. \end_inset
  1625. \end_layout
  1626. \begin_layout Standard
  1627. where
  1628. \emph on
  1629. input_frame
  1630. \emph default
  1631. is a
  1632. \emph on
  1633. (
  1634. \emph default
  1635. short
  1636. \emph on
  1637. *)
  1638. \emph default
  1639. pointing to the beginning of a speech frame,
  1640. \emph on
  1641. byte_ptr
  1642. \emph default
  1643. is a
  1644. \emph on
  1645. (char *)
  1646. \emph default
  1647. where the encoded frame will be written,
  1648. \emph on
  1649. MAX_NB_BYTES
  1650. \emph default
  1651. is the maximum number of bytes that can be written to
  1652. \emph on
  1653. byte_ptr
  1654. \emph default
  1655. without causing an overflow and
  1656. \emph on
  1657. nbBytes
  1658. \emph default
  1659. is the number of bytes actually written to
  1660. \emph on
  1661. byte_ptr
  1662. \emph default
  1663. (the encoded size in bytes).
  1664. Before calling speex_bits_write, it is possible to find the number of bytes
  1665. that need to be written by calling
  1666. \family typewriter
  1667. speex_bits_nbytes(&bits)
  1668. \family default
  1669. , which returns a number of bytes.
  1670. \end_layout
  1671. \begin_layout Standard
  1672. It is still possible to use the
  1673. \emph on
  1674. speex_encode()
  1675. \emph default
  1676. function, which takes a
  1677. \emph on
  1678. (float *)
  1679. \emph default
  1680. for the audio.
  1681. However, this would make an eventual port to an FPU-less platform (like
  1682. ARM) more complicated.
  1683. Internally,
  1684. \emph on
  1685. speex_encode()
  1686. \emph default
  1687. and
  1688. \emph on
  1689. speex_encode_int()
  1690. \emph default
  1691. are processed in the same way.
  1692. Whether the encoder uses the fixed-point version is only decided by the
  1693. compile-time flags, not at the API level.
  1694. \end_layout
  1695. \begin_layout Standard
  1696. After you're done with the encoding, free all resources with:
  1697. \end_layout
  1698. \begin_layout Standard
  1699. \begin_inset listings
  1700. inline false
  1701. status open
  1702. \begin_layout Plain Layout
  1703. speex_bits_destroy(&bits);
  1704. \end_layout
  1705. \begin_layout Plain Layout
  1706. speex_encoder_destroy(enc_state);
  1707. \end_layout
  1708. \end_inset
  1709. \end_layout
  1710. \begin_layout Standard
  1711. That's about it for the encoder.
  1712. \end_layout
  1713. \begin_layout Section
  1714. Decoding
  1715. \begin_inset CommandInset label
  1716. LatexCommand label
  1717. name "sub:Decoding"
  1718. \end_inset
  1719. \end_layout
  1720. \begin_layout Standard
  1721. In order to decode speech using Speex, you first need to:
  1722. \begin_inset listings
  1723. inline false
  1724. status open
  1725. \begin_layout Plain Layout
  1726. #include <speex/speex.h>
  1727. \end_layout
  1728. \end_inset
  1729. You also need to declare a Speex bit-packing struct
  1730. \begin_inset listings
  1731. inline false
  1732. status open
  1733. \begin_layout Plain Layout
  1734. SpeexBits bits;
  1735. \end_layout
  1736. \end_inset
  1737. and a Speex decoder state
  1738. \begin_inset listings
  1739. inline false
  1740. status open
  1741. \begin_layout Plain Layout
  1742. void *dec_state;
  1743. \end_layout
  1744. \end_inset
  1745. The two are initialized by:
  1746. \begin_inset listings
  1747. inline false
  1748. status open
  1749. \begin_layout Plain Layout
  1750. speex_bits_init(&bits);
  1751. \end_layout
  1752. \begin_layout Plain Layout
  1753. dec_state = speex_decoder_init(&speex_nb_mode);
  1754. \end_layout
  1755. \end_inset
  1756. \end_layout
  1757. \begin_layout Standard
  1758. For wideband decoding,
  1759. \emph on
  1760. speex_nb_mode
  1761. \emph default
  1762. will be replaced by
  1763. \emph on
  1764. speex_wb_mode
  1765. \emph default
  1766. .
  1767. If you need to obtain the size of the frames that will be used by the decoder,
  1768. you can get that value in the
  1769. \emph on
  1770. frame_size
  1771. \emph default
  1772. variable (expressed in
  1773. \series bold
  1774. samples
  1775. \series default
  1776. , not bytes) with:
  1777. \end_layout
  1778. \begin_layout Standard
  1779. \begin_inset listings
  1780. inline false
  1781. status open
  1782. \begin_layout Plain Layout
  1783. speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
  1784. \end_layout
  1785. \end_inset
  1786. \end_layout
  1787. \begin_layout Standard
  1788. There is also a parameter that can be set for the decoder: whether or not
  1789. to use a perceptual enhancer.
  1790. This can be set by:
  1791. \end_layout
  1792. \begin_layout Standard
  1793. \begin_inset listings
  1794. inline false
  1795. status open
  1796. \begin_layout Plain Layout
  1797. speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
  1798. \end_layout
  1799. \end_inset
  1800. \end_layout
  1801. \begin_layout Standard
  1802. where
  1803. \emph on
  1804. enh
  1805. \emph default
  1806. is an int with value 0 to have the enhancer disabled and 1 to have it enabled.
  1807. As of 1.2-beta1, the default is now to enable the enhancer.
  1808. \end_layout
  1809. \begin_layout Standard
  1810. Again, once the decoder initialization is done, for every input frame:
  1811. \end_layout
  1812. \begin_layout Standard
  1813. \begin_inset listings
  1814. inline false
  1815. status open
  1816. \begin_layout Plain Layout
  1817. speex_bits_read_from(&bits, input_bytes, nbBytes);
  1818. \end_layout
  1819. \begin_layout Plain Layout
  1820. speex_decode_int(dec_state, &bits, output_frame);
  1821. \end_layout
  1822. \end_inset
  1823. where input_bytes is a
  1824. \emph on
  1825. (char *)
  1826. \emph default
  1827. containing the bit-stream data received for a frame,
  1828. \emph on
  1829. nbBytes
  1830. \emph default
  1831. is the size (in bytes) of that bit-stream, and
  1832. \emph on
  1833. output_frame
  1834. \emph default
  1835. is a
  1836. \emph on
  1837. (short *)
  1838. \emph default
  1839. and points to the area where the decoded speech frame will be written.
  1840. Passing NULL as the second argument of speex_decode_int() indicates that the
  1841. bits for the current frame are not available.
  1842. When a frame is lost, the Speex decoder will do its best to "guess" the
  1843. correct signal.
  1844. \end_layout
  1845. \begin_layout Standard
  1846. As for the encoder, the
  1847. \emph on
  1848. speex_decode()
  1849. \emph default
  1850. function can still be used, with a
  1851. \emph on
  1852. (float *)
  1853. \emph default
  1854. as the output for the audio.
  1855. After you're done with the decoding, free all resources with:
  1856. \end_layout
  1857. \begin_layout Standard
  1858. \begin_inset listings
  1859. inline false
  1860. status open
  1861. \begin_layout Plain Layout
  1862. speex_bits_destroy(&bits);
  1863. \end_layout
  1864. \begin_layout Plain Layout
  1865. speex_decoder_destroy(dec_state);
  1866. \end_layout
  1867. \end_inset
  1868. \end_layout
  1869. \begin_layout Section
  1870. Codec Options (speex_*_ctl)
  1871. \begin_inset CommandInset label
  1872. LatexCommand label
  1873. name "sub:Codec-Options"
  1874. \end_inset
  1875. \end_layout
  1876. \begin_layout Quote
  1877. \align center
  1878. \emph on
  1879. Entities should not be multiplied beyond necessity -- William of Ockham.
  1880. \end_layout
  1881. \begin_layout Quote
  1882. \align center
  1883. \emph on
  1884. Just because there's an option for it doesn't mean you have to turn it on
  1885. -- me.
  1886. \end_layout
  1887. \begin_layout Standard
  1888. The Speex encoder and decoder support many options and requests that can
  1889. be accessed through the
  1890. \emph on
  1891. speex_encoder_ctl
  1892. \emph default
  1893. and
  1894. \emph on
  1895. speex_decoder_ctl
  1896. \emph default
  1897. functions.
  1898. These functions are similar to the
  1899. \emph on
  1900. ioctl
  1901. \emph default
  1902. system call and their prototypes are:
  1903. \end_layout
  1904. \begin_layout Standard
  1905. \begin_inset listings
  1906. inline false
  1907. status open
  1908. \begin_layout Plain Layout
  1909. int speex_encoder_ctl(void *encoder, int request, void *ptr);
  1910. \end_layout
  1911. \begin_layout Plain Layout
  1912. int speex_decoder_ctl(void *decoder, int request, void *ptr);
  1913. \end_layout
  1914. \end_inset
  1915. \end_layout
  1916. \begin_layout Standard
  1917. Despite those functions, the defaults are usually good for many applications
  1918. and
  1919. \series bold
  1920. optional settings should only be used when one understands them and knows
  1921. that they are needed
  1922. \series default
  1923. .
  1924. A common error is to attempt to set many unnecessary settings.
  1925. \end_layout
  1926. \begin_layout Standard
  1927. Here is a list of the values allowed for the requests.
  1928. Some only apply to the encoder or the decoder.
  1929. Because the last argument is of type
  1930. \begin_inset listings
  1931. inline true
  1932. status collapsed
  1933. \begin_layout Plain Layout
  1934. void *
  1935. \end_layout
  1936. \end_inset
  1937. , the
  1938. \begin_inset listings
  1939. inline true
  1940. status collapsed
  1941. \begin_layout Plain Layout
  1942. _ctl()
  1943. \end_layout
  1944. \end_inset
  1945. functions are
  1946. \series bold
  1947. not type safe
  1948. \series default
  1949. , and should thus be used with care.
  1950. The type
  1951. \begin_inset listings
  1952. inline true
  1953. status collapsed
  1954. \begin_layout Plain Layout
  1955. spx_int32_t
  1956. \end_layout
  1957. \end_inset
  1958. is the same as the C99
  1959. \begin_inset listings
  1960. inline true
  1961. status collapsed
  1962. \begin_layout Plain Layout
  1963. int32_t
  1964. \end_layout
  1965. \end_inset
  1966. type.
  1967. \end_layout
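\begin_layout Standard
Because every request passes its value through a void pointer, the compiler cannot check that the pointed-to type matches the request.
The sketch below illustrates one way to contain that risk with small typed wrappers.
It does not use libspeex: demo_ctl, DemoState and the DEMO_* request codes are hypothetical stand-ins that only mimic the shape of the _ctl() prototypes shown above.
\end_layout

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t spx_int32_t;   /* per the text: same as the C99 int32_t */

/* Hypothetical request codes and state; stand-ins, not libspeex definitions. */
enum { DEMO_SET_QUALITY, DEMO_GET_QUALITY, DEMO_SET_VBR_QUALITY };

typedef struct {
    spx_int32_t quality;
    float       vbr_quality;
} DemoState;

/* Untyped entry point with the same shape as the _ctl() functions:
 * the request code alone decides how ptr is interpreted. */
static void demo_ctl(void *state, int request, void *ptr)
{
    DemoState *st = state;
    assert(st != NULL && ptr != NULL);
    switch (request) {
    case DEMO_SET_QUALITY:     st->quality = *(spx_int32_t *)ptr;     break;
    case DEMO_GET_QUALITY:     *(spx_int32_t *)ptr = st->quality;     break;
    case DEMO_SET_VBR_QUALITY: st->vbr_quality = *(float *)ptr;       break;
    default:                   /* unknown request: ignored here */    break;
    }
}

/* Typed wrappers: the compiler now checks the argument type,
 * and each wrapper passes a pointer to exactly the type its request expects. */
static void demo_set_quality(DemoState *st, spx_int32_t q)
{
    demo_ctl(st, DEMO_SET_QUALITY, &q);
}

static spx_int32_t demo_get_quality(DemoState *st)
{
    spx_int32_t q = 0;
    demo_ctl(st, DEMO_GET_QUALITY, &q);
    return q;
}

static void demo_set_vbr_quality(DemoState *st, float q)
{
    demo_ctl(st, DEMO_SET_VBR_QUALITY, &q);
}
```

\begin_layout Standard
With the real library the same pattern would wrap speex_encoder_ctl and speex_decoder_ctl, passing the address of an spx_int32_t for the integer requests listed below and the address of a float for the VBR quality requests.
\end_layout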
  1968. \begin_layout Description
  1969. SPEEX_SET_ENH
  1970. \begin_inset Formula $\ddagger$
  1971. \end_inset
  1972. Set perceptual enhancer
  1973. \begin_inset Index
  1974. status collapsed
  1975. \begin_layout Plain Layout
  1976. perceptual enhancement
  1977. \end_layout
  1978. \end_inset
  1979. to on (1) or off (0) (
  1980. \begin_inset listings
  1981. inline true
  1982. status collapsed
  1983. \begin_layout Plain Layout
  1984. spx_int32_t
  1985. \end_layout
  1986. \end_inset
  1987. , default is on)
  1988. \end_layout
  1989. \begin_layout Description
  1990. SPEEX_GET_ENH
  1991. \begin_inset Formula $\ddagger$
  1992. \end_inset
  1993. Get perceptual enhancer status (
  1994. \begin_inset listings
  1995. inline true
  1996. status collapsed
  1997. \begin_layout Plain Layout
  1998. spx_int32_t
  1999. \end_layout
  2000. \end_inset
  2001. )
  2002. \end_layout
  2003. \begin_layout Description
  2004. SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current
  2005. mode (
  2006. \begin_inset listings
  2007. inline true
  2008. status collapsed
  2009. \begin_layout Plain Layout
  2010. spx_int32_t
  2011. \end_layout
  2012. \end_inset
  2013. )
  2014. \end_layout
  2015. \begin_layout Description
  2016. SPEEX_SET_QUALITY
  2017. \begin_inset Formula $\dagger$
  2018. \end_inset
  2019. Set the encoder speech quality (
  2020. \begin_inset listings
  2021. inline true
  2022. status collapsed
  2023. \begin_layout Plain Layout
  2024. spx_int32_t
  2025. \end_layout
  2026. \end_inset
  2027. from 0 to 10, default is 8)
  2028. \end_layout
  2029. \begin_layout Description
  2030. SPEEX_GET_QUALITY
  2031. \begin_inset Formula $\dagger$
  2032. \end_inset
  2033. Get the current encoder speech quality (
  2034. \begin_inset listings
  2035. inline true
  2036. status collapsed
  2037. \begin_layout Plain Layout
  2038. spx_int32_t
  2039. \end_layout
  2040. \end_inset
  2041. from 0 to 10)
  2042. \end_layout
  2043. \begin_layout Description
  2044. SPEEX_SET_MODE
  2045. \begin_inset Formula $\dagger$
  2046. \end_inset
  2047. Set the mode number, as specified in the RTP spec (
  2048. \begin_inset listings
  2049. inline true
  2050. status collapsed
  2051. \begin_layout Plain Layout
  2052. spx_int32_t
  2053. \end_layout
  2054. \end_inset
  2055. )
  2056. \end_layout
  2057. \begin_layout Description
  2058. SPEEX_GET_MODE
  2059. \begin_inset Formula $\dagger$
  2060. \end_inset
  2061. Get the current mode number, as specified in the RTP spec (
  2062. \begin_inset listings
  2063. inline true
  2064. status collapsed
  2065. \begin_layout Plain Layout
  2066. spx_int32_t
  2067. \end_layout
  2068. \end_inset
  2069. )
  2070. \end_layout
  2071. \begin_layout Description
  2072. SPEEX_SET_VBR
  2073. \begin_inset Formula $\dagger$
  2074. \end_inset
  2075. Set variable bit-rate (VBR) to on (1) or off (0) (
  2076. \begin_inset listings
  2077. inline true
  2078. status collapsed
  2079. \begin_layout Plain Layout
  2080. spx_int32_t
  2081. \end_layout
  2082. \end_inset
  2083. , default is off)
  2084. \end_layout
  2085. \begin_layout Description
  2086. SPEEX_GET_VBR
  2087. \begin_inset Formula $\dagger$
  2088. \end_inset
  2089. Get variable bit-rate
  2090. \begin_inset Index
  2091. status collapsed
  2092. \begin_layout Plain Layout
  2093. variable bit-rate
  2094. \end_layout
  2095. \end_inset
  2096. (VBR) status (
  2097. \begin_inset listings
  2098. inline true
  2099. status collapsed
  2100. \begin_layout Plain Layout
  2101. spx_int32_t
  2102. \end_layout
  2103. \end_inset
  2104. )
  2105. \end_layout
  2106. \begin_layout Description
  2107. SPEEX_SET_VBR_QUALITY
  2108. \begin_inset Formula $\dagger$
  2109. \end_inset
  2110. Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0)
  2111. \end_layout
  2112. \begin_layout Description
  2113. SPEEX_GET_VBR_QUALITY
  2114. \begin_inset Formula $\dagger$
  2115. \end_inset
  2116. Get the current encoder VBR speech quality (float 0.0 to 10.0)
  2117. \end_layout
  2118. \begin_layout Description
  2119. SPEEX_SET_COMPLEXITY
  2120. \begin_inset Formula $\dagger$
  2121. \end_inset
  2122. Set the CPU resources allowed for the encoder (
  2123. \begin_inset listings
  2124. inline true
  2125. status collapsed
  2126. \begin_layout Plain Layout
  2127. spx_int32_t
  2128. \end_layout
  2129. \end_inset
  2130. from 1 to 10, default is 2)
  2131. \end_layout
  2132. \begin_layout Description
  2133. SPEEX_GET_COMPLEXITY
  2134. \begin_inset Formula $\dagger$
  2135. \end_inset
  2136. Get the CPU resources allowed for the encoder (
  2137. \begin_inset listings
  2138. inline true
  2139. status collapsed
  2140. \begin_layout Plain Layout
  2141. spx_int32_t
  2142. \end_layout
  2143. \end_inset
  2144. from 1 to 10)
  2145. \end_layout
  2146. \begin_layout Description
  2147. SPEEX_SET_BITRATE
  2148. \begin_inset Formula $\dagger$
  2149. \end_inset
  2150. Set the bit-rate to the closest value not exceeding the parameter (
  2151. \begin_inset listings
  2152. inline true
  2153. status collapsed
  2154. \begin_layout Plain Layout
  2155. spx_int32_t
  2156. \end_layout
  2157. \end_inset
  2158. in bits per second)
  2159. \end_layout
  2160. \begin_layout Description
  2161. SPEEX_GET_BITRATE Get the current bit-rate in use (
  2162. \begin_inset listings
  2163. inline true
  2164. status collapsed
  2165. \begin_layout Plain Layout
  2166. spx_int32_t
  2167. \end_layout
  2168. \end_inset
  2169. in bits per second)
  2170. \end_layout
  2171. \begin_layout Description
  2172. SPEEX_SET_SAMPLING_RATE Set real sampling rate (
  2173. \begin_inset listings
  2174. inline true
  2175. status collapsed
  2176. \begin_layout Plain Layout
  2177. spx_int32_t
  2178. \end_layout
  2179. \end_inset
  2180. in Hz)
  2181. \end_layout
  2182. \begin_layout Description
  2183. SPEEX_GET_SAMPLING_RATE Get real sampling rate (
  2184. \begin_inset listings
  2185. inline true
  2186. status collapsed
  2187. \begin_layout Plain Layout
  2188. spx_int32_t
  2189. \end_layout
  2190. \end_inset
  2191. in Hz)
  2192. \end_layout
  2193. \begin_layout Description
  2194. SPEEX_RESET_STATE Reset the encoder/decoder state to its original state,
  2195. clearing all memories (no argument)
  2196. \end_layout
  2197. \begin_layout Description
  2198. SPEEX_SET_VAD
  2199. \begin_inset Formula $\dagger$
  2200. \end_inset
  2201. Set voice activity detection
  2202. \begin_inset Index
  2203. status collapsed
  2204. \begin_layout Plain Layout
  2205. voice activity detection
  2206. \end_layout
  2207. \end_inset
  2208. (VAD) to on (1) or off (0) (
  2209. \begin_inset listings
  2210. inline true
  2211. status collapsed
  2212. \begin_layout Plain Layout
  2213. spx_int32_t
  2214. \end_layout
  2215. \end_inset
  2216. , default is off)
  2217. \end_layout
  2218. \begin_layout Description
  2219. SPEEX_GET_VAD
  2220. \begin_inset Formula $\dagger$
  2221. \end_inset
  2222. Get voice activity detection (VAD) status (
  2223. \begin_inset listings
  2224. inline true
  2225. status collapsed
  2226. \begin_layout Plain Layout
  2227. spx_int32_t
  2228. \end_layout
  2229. \end_inset
  2230. )
  2231. \end_layout
  2232. \begin_layout Description
  2233. SPEEX_SET_DTX
  2234. \begin_inset Formula $\dagger$
  2235. \end_inset
  2236. Set discontinuous transmission
  2237. \begin_inset Index
  2238. status collapsed
  2239. \begin_layout Plain Layout
  2240. discontinuous transmission
  2241. \end_layout
  2242. \end_inset
  2243. (DTX) to on (1) or off (0) (
  2244. \begin_inset listings
  2245. inline true
  2246. status collapsed
  2247. \begin_layout Plain Layout
  2248. spx_int32_t
  2249. \end_layout
  2250. \end_inset
  2251. , default is off)
  2252. \end_layout
  2253. \begin_layout Description
  2254. SPEEX_GET_DTX
  2255. \begin_inset Formula $\dagger$
  2256. \end_inset
  2257. Get discontinuous transmission (DTX) status (
  2258. \begin_inset listings
  2259. inline true
  2260. status collapsed
  2261. \begin_layout Plain Layout
  2262. spx_int32_t
  2263. \end_layout
  2264. \end_inset
  2265. )
  2266. \end_layout
  2267. \begin_layout Description
  2268. SPEEX_SET_ABR
  2269. \begin_inset Formula $\dagger$
  2270. \end_inset
  2271. Set average bit-rate
  2272. \begin_inset Index
  2273. status collapsed
  2274. \begin_layout Plain Layout
  2275. average bit-rate
  2276. \end_layout
  2277. \end_inset
  2278. (ABR) to a value n (
  2279. \begin_inset listings
  2280. inline true
  2281. status collapsed
  2282. \begin_layout Plain Layout
  2283. spx_int32_t
  2284. \end_layout
  2285. \end_inset
  2286. in bits per second)
  2287. \end_layout
  2288. \begin_layout Description
  2289. SPEEX_GET_ABR
  2290. \begin_inset Formula $\dagger$
  2291. \end_inset
  2292. Get average bit-rate (ABR) setting (
  2293. \begin_inset listings
  2294. inline true
  2295. status collapsed
  2296. \begin_layout Plain Layout
  2297. spx_int32_t
  2298. \end_layout
  2299. \end_inset
  2300. in bits per second)
  2301. \end_layout
  2302. \begin_layout Description
  2303. SPEEX_SET_PLC_TUNING
  2304. \begin_inset Formula $\dagger$
  2305. \end_inset
  2306. Tell the encoder to optimize encoding for a certain percentage of packet
  2307. loss (
  2308. \begin_inset listings
  2309. inline true
  2310. status collapsed
  2311. \begin_layout Plain Layout
  2312. spx_int32_t
  2313. \end_layout
  2314. \end_inset
  2315. in percent)
  2316. \end_layout
  2317. \begin_layout Description
  2318. SPEEX_GET_PLC_TUNING
  2319. \begin_inset Formula $\dagger$
  2320. \end_inset
  2321. Get the current tuning of the encoder for PLC (
  2322. \begin_inset listings
  2323. inline true
  2324. status collapsed
  2325. \begin_layout Plain Layout
  2326. spx_int32_t
  2327. \end_layout
  2328. \end_inset
  2329. in percent)
  2330. \end_layout
  2331. \begin_layout Description
  2332. SPEEX_GET_LOOKAHEAD Get the lookahead used by Speex; the encoder and the
  2333. decoder each report their own value.
  2334. Sum the encoder and decoder lookahead values to get the total codec lookahead.
  2335. \end_layout
  2336. \begin_layout Description
  2337. SPEEX_SET_VBR_MAX_BITRATE
  2338. \begin_inset Formula $\dagger$
  2339. \end_inset
  2340. Set the maximum bit-rate allowed in VBR operation (
  2341. \begin_inset listings
  2342. inline true
  2343. status collapsed
  2344. \begin_layout Plain Layout
  2345. spx_int32_t
  2346. \end_layout
  2347. \end_inset
  2348. in bits per second)
  2349. \end_layout
  2350. \begin_layout Description
  2351. SPEEX_GET_VBR_MAX_BITRATE
  2352. \begin_inset Formula $\dagger$
  2353. \end_inset
  2354. Get the current maximum bit-rate allowed in VBR operation (
  2355. \begin_inset listings
  2356. inline true
  2357. status collapsed
  2358. \begin_layout Plain Layout
  2359. spx_int32_t
  2360. \end_layout
  2361. \end_inset
  2362. in bits per second)
  2363. \end_layout
  2364. \begin_layout Description
  2365. SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (
  2366. \begin_inset listings
  2367. inline true
  2368. status collapsed
  2369. \begin_layout Plain Layout
  2370. spx_int32_t
  2371. \end_layout
  2372. \end_inset
  2373. , default is on)
  2374. \end_layout
  2375. \begin_layout Description
  2376. SPEEX_GET_HIGHPASS Get the current high-pass filter status (
  2377. \begin_inset listings
  2378. inline true
  2379. status collapsed
  2380. \begin_layout Plain Layout
  2381. spx_int32_t
  2382. \end_layout
  2383. \end_inset
  2384. )
  2385. \end_layout
  2386. \begin_layout Description
  2387. \begin_inset Formula $\dagger$
  2388. \end_inset
  2389. applies only to the encoder
  2390. \end_layout
  2391. \begin_layout Description
  2392. \begin_inset Formula $\ddagger$
  2393. \end_inset
  2394. applies only to the decoder
  2395. \end_layout
  2396. \begin_layout Section
  2397. Mode queries
  2398. \begin_inset CommandInset label
  2399. LatexCommand label
  2400. name "sub:Mode-queries"
  2401. \end_inset
  2402. \end_layout
  2403. \begin_layout Standard
  2404. Speex modes have a query system similar to the speex_encoder_ctl and speex_decod
  2405. er_ctl calls.
  2406. Since modes are read-only, it is only possible to get information about
  2407. a particular mode.
  2408. The function used to do that is:
  2409. \begin_inset listings
  2410. inline false
  2411. status open
  2412. \begin_layout Plain Layout
  2413. void speex_mode_query(SpeexMode *mode, int request, void *ptr);
  2414. \end_layout
  2415. \end_inset
  2416. The admissible values for request are (unless otherwise noted, the values
  2417. are returned through
  2418. \emph on
  2419. ptr
  2420. \emph default
  2421. ):
  2422. \end_layout
  2423. \begin_layout Description
  2424. SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
  2425. \end_layout
  2426. \begin_layout Description
  2427. SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through
  2428. \emph on
  2429. ptr
  2430. \emph default
  2431. (integer in bps).
  2432. \end_layout
  2433. \begin_layout Section
  2434. Packing and in-band signalling
  2435. \begin_inset Index
  2436. status collapsed
  2437. \begin_layout Plain Layout
  2438. in-band signalling
  2439. \end_layout
  2440. \end_inset
  2441. \end_layout
  2442. \begin_layout Standard
  2443. Sometimes it is desirable to pack more than one frame per packet (or other
  2444. basic unit of storage).
  2445. The proper way to do it is to call speex_encode
  2446. \begin_inset Formula $N$
  2447. \end_inset
  2448. times before writing the stream with speex_bits_write.
  2449. In cases where the number of frames is not determined by an out-of-band
  2450. mechanism, it is possible to include a terminator code.
  2451. That terminator consists of the code 15 (decimal) encoded with 5 bits,
  2452. as shown in Table
  2453. \begin_inset CommandInset ref
  2454. LatexCommand ref
  2455. reference "cap:quality_vs_bps"
  2456. \end_inset
  2457. .
  2458. Note that as of version 1.0.2, calling speex_bits_write automatically inserts
  2459. the terminator so as to fill the last byte.
  2460. This doesn't involve any overhead and makes sure Speex can always detect
  2461. when there are no more frames in a packet.
  2462. \end_layout
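\begin_layout Standard
The packing rule can be illustrated without the library.
The sketch below is a hypothetical, minimal bit writer (BitWriter, bw_put and bw_put_terminator are illustrative names, not the SpeexBits API) that appends some frame bits and then the terminator, code 15 written on 5 bits.
Writing bits most significant bit first is an assumption of this sketch.
\end_layout

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal bit writer; not the SpeexBits API. */
typedef struct {
    uint8_t buf[64];
    size_t  nbits;               /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw)
{
    memset(bw, 0, sizeof *bw);
}

/* Append the low `count` bits of `value`, most significant bit first
 * (bit order is an assumption of this sketch). */
static void bw_put(BitWriter *bw, uint32_t value, unsigned count)
{
    assert(count <= 32 && bw->nbits + count <= 8 * sizeof bw->buf);
    while (count-- > 0) {
        uint32_t bit = (value >> count) & 1u;
        bw->buf[bw->nbits / 8] |= (uint8_t)(bit << (7 - bw->nbits % 8));
        bw->nbits++;
    }
}

/* Terminator: code 15 on 5 bits, i.e. the bit pattern 01111. */
static void bw_put_terminator(BitWriter *bw)
{
    bw_put(bw, 15u, 5);
}

/* Number of bytes needed to hold everything written so far. */
static size_t bw_bytes(const BitWriter *bw)
{
    return (bw->nbits + 7) / 8;
}
```

\begin_layout Standard
A packet built this way holds the bits of each encoded frame in turn, followed by a single call to bw_put_terminator, so a reader that meets code 15 knows that no further frame follows.
\end_layout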
  2463. \begin_layout Standard
  2464. It is also possible to send in-band
  2465. \begin_inset Quotes eld
  2466. \end_inset
  2467. messages
  2468. \begin_inset Quotes erd
  2469. \end_inset
  2470. to the other side.
  2471. All these messages are encoded as
  2472. \begin_inset Quotes eld
  2473. \end_inset
  2474. pseudo-frames
  2475. \begin_inset Quotes erd
  2476. \end_inset
  2477. of mode 14 which contain a 4-bit message type code, followed by the message.
  2478. Table
  2479. \begin_inset CommandInset ref
  2480. LatexCommand ref
  2481. reference "cap:In-band-signalling-codes"
  2482. \end_inset
  2483. lists the available codes, their meaning and the size of the message that
  2484. follows.
  2485. Most of these messages are requests that are sent to the encoder or decoder
  2486. on the other end, which is free to comply with or ignore them.
  2487. By default, all in-band messages are ignored.
  2488. \end_layout
  2489. \begin_layout Standard
  2490. \begin_inset Float table
  2491. placement htbp
  2492. wide false
  2493. sideways false
  2494. status open
  2495. \begin_layout Plain Layout
  2496. \begin_inset ERT
  2497. status collapsed
  2498. \begin_layout Plain Layout
  2499. \backslash
  2500. begin{center}
  2501. \end_layout
  2502. \end_inset
  2503. \begin_inset Tabular
  2504. <lyxtabular version="3" rows="17" columns="3">
  2505. <features>
  2506. <column alignment="center" valignment="top" width="0pt">
  2507. <column alignment="center" valignment="top" width="0pt">
  2508. <column alignment="center" valignment="top" width="0pt">
  2509. <row>
  2510. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  2511. \begin_inset Text
  2512. \begin_layout Plain Layout
  2513. Code
  2514. \end_layout
  2515. \end_inset
  2516. </cell>
  2517. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  2518. \begin_inset Text
  2519. \begin_layout Plain Layout
  2520. Size (bits)
  2521. \end_layout
  2522. \end_inset
  2523. </cell>
  2524. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  2525. \begin_inset Text
  2526. \begin_layout Plain Layout
  2527. Content
  2528. \end_layout
  2529. \end_inset
  2530. </cell>
  2531. </row>
  2532. <row>
  2533. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2534. \begin_inset Text
  2535. \begin_layout Plain Layout
  2536. 0
  2537. \end_layout
  2538. \end_inset
  2539. </cell>
  2540. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2541. \begin_inset Text
  2542. \begin_layout Plain Layout
  2543. 1
  2544. \end_layout
  2545. \end_inset
  2546. </cell>
  2547. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2548. \begin_inset Text
  2549. \begin_layout Plain Layout
  2550. Asks decoder to set perceptual enhancement off (0) or on (1)
  2551. \end_layout
  2552. \end_inset
  2553. </cell>
  2554. </row>
  2555. <row>
  2556. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2557. \begin_inset Text
  2558. \begin_layout Plain Layout
  2559. 1
  2560. \end_layout
  2561. \end_inset
  2562. </cell>
  2563. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2564. \begin_inset Text
  2565. \begin_layout Plain Layout
  2566. 1
  2567. \end_layout
  2568. \end_inset
  2569. </cell>
  2570. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2571. \begin_inset Text
  2572. \begin_layout Plain Layout
  2573. Asks (if 1) the encoder to be less
  2574. \begin_inset Quotes eld
  2575. \end_inset
  2576. aggressive
  2577. \begin_inset Quotes erd
  2578. \end_inset
  2579. due to high packet loss
  2580. \end_layout
  2581. \end_inset
  2582. </cell>
  2583. </row>
  2584. <row>
  2585. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2586. \begin_inset Text
  2587. \begin_layout Plain Layout
  2588. 2
  2589. \end_layout
  2590. \end_inset
  2591. </cell>
  2592. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2593. \begin_inset Text
  2594. \begin_layout Plain Layout
  2595. 4
  2596. \end_layout
  2597. \end_inset
  2598. </cell>
  2599. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2600. \begin_inset Text
  2601. \begin_layout Plain Layout
  2602. Asks encoder to switch to mode N
  2603. \end_layout
  2604. \end_inset
  2605. </cell>
  2606. </row>
  2607. <row>
  2608. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2609. \begin_inset Text
  2610. \begin_layout Plain Layout
  2611. 3
  2612. \end_layout
  2613. \end_inset
  2614. </cell>
  2615. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2616. \begin_inset Text
  2617. \begin_layout Plain Layout
  2618. 4
  2619. \end_layout
  2620. \end_inset
  2621. </cell>
  2622. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2623. \begin_inset Text
  2624. \begin_layout Plain Layout
  2625. Asks encoder to switch to mode N for low-band
  2626. \end_layout
  2627. \end_inset
  2628. </cell>
  2629. </row>
  2630. <row>
  2631. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2632. \begin_inset Text
  2633. \begin_layout Plain Layout
  2634. 4
  2635. \end_layout
  2636. \end_inset
  2637. </cell>
  2638. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2639. \begin_inset Text
  2640. \begin_layout Plain Layout
  2641. 4
  2642. \end_layout
  2643. \end_inset
  2644. </cell>
  2645. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2646. \begin_inset Text
  2647. \begin_layout Plain Layout
  2648. Asks encoder to switch to mode N for high-band
  2649. \end_layout
  2650. \end_inset
  2651. </cell>
  2652. </row>
  2653. <row>
  2654. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2655. \begin_inset Text
  2656. \begin_layout Plain Layout
  2657. 5
  2658. \end_layout
  2659. \end_inset
  2660. </cell>
  2661. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2662. \begin_inset Text
  2663. \begin_layout Plain Layout
  2664. 4
  2665. \end_layout
  2666. \end_inset
  2667. </cell>
  2668. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2669. \begin_inset Text
  2670. \begin_layout Plain Layout
  2671. Asks encoder to switch to quality N for VBR
  2672. \end_layout
  2673. \end_inset
  2674. </cell>
  2675. </row>
  2676. <row>
  2677. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2678. \begin_inset Text
  2679. \begin_layout Plain Layout
  2680. 6
  2681. \end_layout
  2682. \end_inset
  2683. </cell>
  2684. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2685. \begin_inset Text
  2686. \begin_layout Plain Layout
  2687. 4
  2688. \end_layout
  2689. \end_inset
  2690. </cell>
  2691. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2692. \begin_inset Text
  2693. \begin_layout Plain Layout
  2694. Request acknowledge (0=no, 1=all, 2=only for in-band data)
  2695. \end_layout
  2696. \end_inset
  2697. </cell>
  2698. </row>
  2699. <row>
  2700. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2701. \begin_inset Text
  2702. \begin_layout Plain Layout
  2703. 7
  2704. \end_layout
  2705. \end_inset
  2706. </cell>
  2707. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2708. \begin_inset Text
  2709. \begin_layout Plain Layout
  2710. 4
  2711. \end_layout
  2712. \end_inset
  2713. </cell>
  2714. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2715. \begin_inset Text
  2716. \begin_layout Plain Layout
  2717. Asks encoder to set CBR (0), VAD (1), DTX (3), VBR (5), VBR+DTX (7)
  2718. \end_layout
  2719. \end_inset
  2720. </cell>
  2721. </row>
  2722. <row>
  2723. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2724. \begin_inset Text
  2725. \begin_layout Plain Layout
  2726. 8
  2727. \end_layout
  2728. \end_inset
  2729. </cell>
  2730. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2731. \begin_inset Text
  2732. \begin_layout Plain Layout
  2733. 8
  2734. \end_layout
  2735. \end_inset
  2736. </cell>
  2737. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2738. \begin_inset Text
  2739. \begin_layout Plain Layout
  2740. Transmit (8-bit) character to the other end
  2741. \end_layout
  2742. \end_inset
  2743. </cell>
  2744. </row>
  2745. <row>
  2746. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2747. \begin_inset Text
  2748. \begin_layout Plain Layout
  2749. 9
  2750. \end_layout
  2751. \end_inset
  2752. </cell>
  2753. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2754. \begin_inset Text
  2755. \begin_layout Plain Layout
  2756. 8
  2757. \end_layout
  2758. \end_inset
  2759. </cell>
  2760. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2761. \begin_inset Text
  2762. \begin_layout Plain Layout
  2763. Intensity stereo information
  2764. \end_layout
  2765. \end_inset
  2766. </cell>
  2767. </row>
  2768. <row>
  2769. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2770. \begin_inset Text
  2771. \begin_layout Plain Layout
  2772. 10
  2773. \end_layout
  2774. \end_inset
  2775. </cell>
  2776. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2777. \begin_inset Text
  2778. \begin_layout Plain Layout
  2779. 16
  2780. \end_layout
  2781. \end_inset
  2782. </cell>
  2783. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2784. \begin_inset Text
  2785. \begin_layout Plain Layout
  2786. Announce maximum bit-rate acceptable (N in bytes/second)
  2787. \end_layout
  2788. \end_inset
  2789. </cell>
  2790. </row>
  2791. <row>
  2792. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2793. \begin_inset Text
  2794. \begin_layout Plain Layout
  2795. 11
  2796. \end_layout
  2797. \end_inset
  2798. </cell>
  2799. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2800. \begin_inset Text
  2801. \begin_layout Plain Layout
  2802. 16
  2803. \end_layout
  2804. \end_inset
  2805. </cell>
  2806. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2807. \begin_inset Text
  2808. \begin_layout Plain Layout
  2809. reserved
  2810. \end_layout
  2811. \end_inset
  2812. </cell>
  2813. </row>
  2814. <row>
  2815. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2816. \begin_inset Text
  2817. \begin_layout Plain Layout
  2818. 12
  2819. \end_layout
  2820. \end_inset
  2821. </cell>
  2822. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2823. \begin_inset Text
  2824. \begin_layout Plain Layout
  2825. 32
  2826. \end_layout
  2827. \end_inset
  2828. </cell>
  2829. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2830. \begin_inset Text
  2831. \begin_layout Plain Layout
  2832. Acknowledge receiving packet N
  2833. \end_layout
  2834. \end_inset
  2835. </cell>
  2836. </row>
  2837. <row>
  2838. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2839. \begin_inset Text
  2840. \begin_layout Plain Layout
  2841. 13
  2842. \end_layout
  2843. \end_inset
  2844. </cell>
  2845. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2846. \begin_inset Text
  2847. \begin_layout Plain Layout
  2848. 32
  2849. \end_layout
  2850. \end_inset
  2851. </cell>
  2852. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2853. \begin_inset Text
  2854. \begin_layout Plain Layout
  2855. reserved
  2856. \end_layout
  2857. \end_inset
  2858. </cell>
  2859. </row>
  2860. <row>
  2861. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2862. \begin_inset Text
  2863. \begin_layout Plain Layout
  2864. 14
  2865. \end_layout
  2866. \end_inset
  2867. </cell>
  2868. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  2869. \begin_inset Text
  2870. \begin_layout Plain Layout
  2871. 64
  2872. \end_layout
  2873. \end_inset
  2874. </cell>
  2875. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  2876. \begin_inset Text
  2877. \begin_layout Plain Layout
  2878. reserved
  2879. \end_layout
  2880. \end_inset
  2881. </cell>
  2882. </row>
  2883. <row>
  2884. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  2885. \begin_inset Text
  2886. \begin_layout Plain Layout
  2887. 15
  2888. \end_layout
  2889. \end_inset
  2890. </cell>
  2891. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  2892. \begin_inset Text
  2893. \begin_layout Plain Layout
  2894. 64
  2895. \end_layout
  2896. \end_inset
  2897. </cell>
  2898. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  2899. \begin_inset Text
  2900. \begin_layout Plain Layout
  2901. reserved
  2902. \end_layout
  2903. \end_inset
  2904. </cell>
  2905. </row>
  2906. </lyxtabular>
  2907. \end_inset
  2908. \begin_inset ERT
  2909. status collapsed
  2910. \begin_layout Plain Layout
  2911. \backslash
  2912. end{center}
  2913. \end_layout
  2914. \end_inset
  2915. \end_layout
  2916. \begin_layout Plain Layout
  2917. \begin_inset Caption
  2918. \begin_layout Plain Layout
  2919. In-band signalling codes
  2920. \begin_inset CommandInset label
  2921. LatexCommand label
  2922. name "cap:In-band-signalling-codes"
  2923. \end_inset
  2924. \end_layout
  2925. \end_inset
  2926. \end_layout
  2927. \end_inset
  2928. \end_layout
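\begin_layout Standard
The size column makes it possible to compute how many bits an in-band message occupies, for instance to skip a request that the receiver chooses to ignore.
The sketch below copies the sizes from the table above.
The function name is hypothetical, and the sketch assumes that the mode number 14 is written on 5 bits, like the terminator code, ahead of the 4-bit message code.
\end_layout

```c
#include <assert.h>

/* Payload size in bits for each 4-bit in-band message code,
 * copied from the table of in-band signalling codes. */
static const int inband_payload_bits[16] = {
    1, 1, 4, 4, 4, 4, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64
};

enum {
    MODE_ID_BITS     = 5,  /* assumption: same field width as the terminator */
    INBAND_CODE_BITS = 4   /* 4-bit message type code, as stated in the text */
};

/* Total length in bits of a mode-14 pseudo-frame carrying message `code`:
 * mode number, message code, then the payload. */
static int inband_frame_bits(unsigned code)
{
    assert(code < 16);
    return MODE_ID_BITS + INBAND_CODE_BITS + inband_payload_bits[code];
}
```

\begin_layout Standard
Under these assumptions a character message (code 8) occupies 17 bits in total, and an acknowledgement (code 12) occupies 41 bits.
\end_layout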
  2929. \begin_layout Standard
  2930. Finally, applications may define custom in-band messages using mode 13.
  2931. The size of the message in bytes is encoded with 5 bits, so that the decoder
  2932. can skip it if it doesn't know how to interpret it.
  2933. \begin_inset Newpage newpage
  2934. \end_inset
  2935. \end_layout
  2936. \begin_layout Chapter
  2937. Speech Processing API (
  2938. \emph on
  2939. libspeexdsp
  2940. \emph default
  2941. )
  2942. \end_layout
  2943. \begin_layout Standard
  2944. As of version 1.2beta3, the non-codec parts of the Speex package are now
  2945. in a separate library called
  2946. \emph on
  2947. libspeexdsp
  2948. \emph default
  2949. .
  2950. This library includes the preprocessor, the acoustic echo canceller, the
  2951. jitter buffer, and the resampler.
  2952. In a UNIX environment, it can be linked into a program by adding
  2953. \emph on
  2954. -lspeexdsp -lm
  2955. \emph default
  2956. to the compiler command line.
  2957. Just like for libspeex,
  2958. \series bold
  2959. libspeexdsp calls are reentrant, but not thread-safe
  2960. \series default
  2961. .
  2962. That means that it is fine to use calls from many threads, but
  2963. \series bold
  2964. calls using the same state from multiple threads must be protected by mutexes
  2965. \series default
  2966. .
  2967. \end_layout
  2968. \begin_layout Section
  2969. Preprocessor
  2970. \begin_inset CommandInset label
  2971. LatexCommand label
  2972. name "sub:Preprocessor"
  2973. \end_inset
  2974. \end_layout
  2975. \begin_layout Standard
  2976. \noindent
  2977. In order to use the Speex preprocessor
  2978. \begin_inset Index
  2979. status collapsed
  2980. \begin_layout Plain Layout
  2981. preprocessor
  2982. \end_layout
  2983. \end_inset
  2984. , you first need to:
  2985. \begin_inset listings
  2986. inline false
  2987. status open
  2988. \begin_layout Plain Layout
  2989. #include <speex/speex_preprocess.h>
  2990. \end_layout
  2991. \end_inset
  2992. \end_layout
  2993. \begin_layout Standard
  2994. \noindent
  2995. Then, a preprocessor state can be created as:
  2996. \begin_inset listings
  2997. inline false
  2998. status open
  2999. \begin_layout Plain Layout
  3000. SpeexPreprocessState *preprocess_state = speex_preprocess_state_init(frame_size,
  3001. sampling_rate);
  3002. \end_layout
  3003. \end_inset
  3004. \end_layout
  3005. \begin_layout Standard
  3006. \noindent
  3007. and it is recommended to use the same value for
  3008. \family typewriter
  3009. frame_size
  3010. \family default
  3011. as is used by the encoder (20
  3012. \emph on
  3013. ms
  3014. \emph default
  3015. ).
  3016. \end_layout
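As a minimal sketch of the recommendation above, the frame size in samples for a 20 ms frame can be derived from the sampling rate. The helper name frame_size_samples is hypothetical and not part of libspeexdsp; its result is what would be passed as frame_size to speex_preprocess_state_init().

```c
#include <assert.h>

/* Hypothetical helper (not a libspeexdsp function): number of samples in
 * one frame lasting frame_ms milliseconds at sampling_rate Hz. */
int frame_size_samples(int sampling_rate, int frame_ms)
{
    assert(sampling_rate > 0 && frame_ms > 0);
    /* Multiply in long before dividing to avoid intermediate overflow. */
    return (int)(((long)sampling_rate * frame_ms) / 1000);
}
```

For narrowband audio at 8000 Hz, a 20 ms frame works out to 160 samples, which matches the narrowband frame size used by the codec.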
  3017. \begin_layout Standard
  3018. For each input frame, you need to call:
  3019. \end_layout
  3020. \begin_layout Standard
  3021. \begin_inset listings
  3022. inline false
  3023. status open
  3024. \begin_layout Plain Layout
  3025. speex_preprocess_run(preprocess_state, audio_frame);
  3026. \end_layout
  3027. \end_inset
  3028. \end_layout
  3029. \begin_layout Standard
  3030. \noindent
  3031. where
  3032. \family typewriter
  3033. audio_frame
  3034. \family default
  3035. is used both as input and output.
  3036. In cases where the output audio is not useful for a certain frame, it is
  3037. possible to use instead:
  3038. \end_layout
  3039. \begin_layout Standard
  3040. \begin_inset listings
  3041. inline false
  3042. status open
  3043. \begin_layout Plain Layout
  3044. speex_preprocess_estimate_update(preprocess_state, audio_frame);
  3045. \end_layout
  3046. \end_inset
  3047. \end_layout
  3048. \begin_layout Standard
  3049. \noindent
  3050. This call will update all the preprocessor internal state variables without
  3051. computing the output audio, thus saving some CPU cycles.
  3052. \end_layout
  3053. \begin_layout Standard
  3054. The behaviour of the preprocessor can be changed using:
  3055. \end_layout
  3056. \begin_layout Standard
  3057. \begin_inset listings
  3058. inline false
  3059. status open
  3060. \begin_layout Plain Layout
  3061. speex_preprocess_ctl(preprocess_state, request, ptr);
  3062. \end_layout
  3063. \end_inset
  3064. \end_layout
  3065. \begin_layout Standard
  3066. \noindent
  3067. which is used in the same way as the encoder and decoder equivalent.
  3068. Options are listed in Section
  3069. \begin_inset CommandInset ref
  3070. LatexCommand ref
  3071. reference "sub:Preprocessor-options"
  3072. \end_inset
  3073. .
  3074. \end_layout
  3075. \begin_layout Standard
  3076. The preprocessor state can be destroyed using:
  3077. \end_layout
  3078. \begin_layout Standard
  3079. \begin_inset listings
  3080. inline false
  3081. status open
  3082. \begin_layout Plain Layout
  3083. speex_preprocess_state_destroy(preprocess_state);
  3084. \end_layout
  3085. \end_inset
  3086. \end_layout
  3087. \begin_layout Subsection
  3088. Preprocessor options
  3089. \begin_inset CommandInset label
  3090. LatexCommand label
  3091. name "sub:Preprocessor-options"
  3092. \end_inset
  3093. \end_layout
  3094. \begin_layout Standard
  3095. As with the codec, the preprocessor also has options that can be controlled
  3096. using an ioctl()-like call.
  3097. The available options are:
  3098. \end_layout
  3099. \begin_layout Description
  3100. SPEEX_PREPROCESS_SET_DENOISE Turns denoising on(1) or off(0) (
  3101. \begin_inset listings
  3102. inline true
  3103. status collapsed
  3104. \begin_layout Plain Layout
  3105. spx_int32_t
  3106. \end_layout
  3107. \end_inset
  3108. )
  3109. \end_layout
  3110. \begin_layout Description
  3111. SPEEX_PREPROCESS_GET_DENOISE Get denoising status (
  3112. \begin_inset listings
  3113. inline true
  3114. status collapsed
  3115. \begin_layout Plain Layout
  3116. spx_int32_t
  3117. \end_layout
  3118. \end_inset
  3119. )
  3120. \end_layout
  3121. \begin_layout Description
  3122. SPEEX_PREPROCESS_SET_AGC Turns automatic gain control (AGC) on(1) or off(0)
  3123. (
  3124. \begin_inset listings
  3125. inline true
  3126. status collapsed
  3127. \begin_layout Plain Layout
  3128. spx_int32_t
  3129. \end_layout
  3130. \end_inset
  3131. )
  3132. \end_layout
  3133. \begin_layout Description
  3134. SPEEX_PREPROCESS_GET_AGC Get AGC status (
  3135. \begin_inset listings
  3136. inline true
  3137. status collapsed
  3138. \begin_layout Plain Layout
  3139. spx_int32_t
  3140. \end_layout
  3141. \end_inset
  3142. )
  3143. \end_layout
  3144. \begin_layout Description
  3145. SPEEX_PREPROCESS_SET_VAD Turns voice activity detector (VAD) on(1) or off(0)
  3146. (
  3147. \begin_inset listings
  3148. inline true
  3149. status collapsed
  3150. \begin_layout Plain Layout
  3151. spx_int32_t
  3152. \end_layout
  3153. \end_inset
  3154. )
  3155. \end_layout
  3156. \begin_layout Description
  3157. SPEEX_PREPROCESS_GET_VAD Get VAD status (
  3158. \begin_inset listings
  3159. inline true
  3160. status collapsed
  3161. \begin_layout Plain Layout
  3162. spx_int32_t
  3163. \end_layout
  3164. \end_inset
  3165. )
  3166. \end_layout
  3167. \begin_layout Description
  3168. SPEEX_PREPROCESS_SET_AGC_LEVEL
  3169. \end_layout
  3170. \begin_layout Description
  3171. SPEEX_PREPROCESS_GET_AGC_LEVEL
  3172. \end_layout
  3173. \begin_layout Description
  3174. SPEEX_PREPROCESS_SET_DEREVERB Turns reverberation removal on(1) or off(0)
  3175. (
  3176. \begin_inset listings
  3177. inline true
  3178. status collapsed
  3179. \begin_layout Plain Layout
  3180. spx_int32_t
  3181. \end_layout
  3182. \end_inset
  3183. )
  3184. \end_layout
  3185. \begin_layout Description
  3186. SPEEX_PREPROCESS_GET_DEREVERB Get reverberation removal status (
  3187. \begin_inset listings
  3188. inline true
  3189. status collapsed
  3190. \begin_layout Plain Layout
  3191. spx_int32_t
  3192. \end_layout
  3193. \end_inset
  3194. )
  3195. \end_layout
  3196. \begin_layout Description
  3197. SPEEX_PREPROCESS_SET_DEREVERB_LEVEL Not working yet, do not use
  3198. \end_layout
  3199. \begin_layout Description
  3200. SPEEX_PREPROCESS_GET_DEREVERB_LEVEL Not working yet, do not use
  3201. \end_layout
  3202. \begin_layout Description
  3203. SPEEX_PREPROCESS_SET_DEREVERB_DECAY Not working yet, do not use
  3204. \end_layout
  3205. \begin_layout Description
  3206. SPEEX_PREPROCESS_GET_DEREVERB_DECAY Not working yet, do not use
  3207. \end_layout
  3208. \begin_layout Description
  3209. SPEEX_PREPROCESS_SET_PROB_START
  3210. \end_layout
  3211. \begin_layout Description
  3212. SPEEX_PREPROCESS_GET_PROB_START
  3213. \end_layout
  3214. \begin_layout Description
  3215. SPEEX_PREPROCESS_SET_PROB_CONTINUE
  3216. \end_layout
  3217. \begin_layout Description
  3218. SPEEX_PREPROCESS_GET_PROB_CONTINUE
  3219. \end_layout
  3220. \begin_layout Description
  3221. SPEEX_PREPROCESS_SET_NOISE_SUPPRESS Set maximum attenuation of the noise
  3222. in dB (negative
  3223. \begin_inset listings
  3224. inline true
  3225. status collapsed
  3226. \begin_layout Plain Layout
  3227. spx_int32_t
  3228. \end_layout
  3229. \end_inset
  3230. )
  3231. \end_layout
  3232. \begin_layout Description
  3233. SPEEX_PREPROCESS_GET_NOISE_SUPPRESS Get maximum attenuation of the noise
  3234. in dB (negative
  3235. \begin_inset listings
  3236. inline true
  3237. status collapsed
  3238. \begin_layout Plain Layout
  3239. spx_int32_t
  3240. \end_layout
  3241. \end_inset
  3242. )
  3243. \end_layout
  3244. \begin_layout Description
  3245. SPEEX_PREPROCESS_SET_ECHO_SUPPRESS Set maximum attenuation of the residual
  3246. echo in dB (negative
  3247. \begin_inset listings
  3248. inline true
  3249. status collapsed
  3250. \begin_layout Plain Layout
  3251. spx_int32_t
  3252. \end_layout
  3253. \end_inset
  3254. )
  3255. \end_layout
  3256. \begin_layout Description
  3257. SPEEX_PREPROCESS_GET_ECHO_SUPPRESS Get maximum attenuation of the residual
  3258. echo in dB (negative
  3259. \begin_inset listings
  3260. inline true
  3261. status collapsed
  3262. \begin_layout Plain Layout
  3263. spx_int32_t
  3264. \end_layout
  3265. \end_inset
  3266. )
  3267. \end_layout
  3268. \begin_layout Description
  3269. SPEEX_PREPROCESS_SET_ECHO_SUPPRESS_ACTIVE Set maximum attenuation of the
  3270. echo in dB when near end is active (negative
  3271. \begin_inset listings
  3272. inline true
  3273. status collapsed
  3274. \begin_layout Plain Layout
  3275. spx_int32_t
  3276. \end_layout
  3277. \end_inset
  3278. )
  3279. \end_layout
  3280. \begin_layout Description
  3281. SPEEX_PREPROCESS_GET_ECHO_SUPPRESS_ACTIVE Get maximum attenuation of the
  3282. echo in dB when near end is active (negative
  3283. \begin_inset listings
  3284. inline true
  3285. status collapsed
  3286. \begin_layout Plain Layout
  3287. spx_int32_t
  3288. \end_layout
  3289. \end_inset
  3290. )
  3291. \end_layout
  3292. \begin_layout Description
  3293. SPEEX_PREPROCESS_SET_ECHO_STATE Set the associated echo canceller for residual
  3294. echo suppression (pointer or NULL for no residual echo suppression)
  3295. \end_layout
  3296. \begin_layout Description
  3297. SPEEX_PREPROCESS_GET_ECHO_STATE Get the associated echo canceller (pointer)
  3298. \end_layout
  3299. \begin_layout Section
  3300. Echo Cancellation
  3301. \begin_inset CommandInset label
  3302. LatexCommand label
  3303. name "sub:Echo-Cancellation"
  3304. \end_inset
  3305. \end_layout
  3306. \begin_layout Standard
  3307. The Speex library now includes an echo cancellation
  3308. \begin_inset Index
  3309. status collapsed
  3310. \begin_layout Plain Layout
  3311. echo cancellation
  3312. \end_layout
  3313. \end_inset
  3314. algorithm suitable for Acoustic Echo Cancellation
  3315. \begin_inset Index
  3316. status collapsed
  3317. \begin_layout Plain Layout
  3318. acoustic echo cancellation
  3319. \end_layout
  3320. \end_inset
  3321. (AEC).
  3322. In order to use the echo canceller, you first need to
  3323. \end_layout
  3324. \begin_layout Standard
  3325. \begin_inset listings
  3326. inline false
  3327. status open
  3328. \begin_layout Plain Layout
  3329. #include <speex/speex_echo.h>
  3330. \end_layout
  3331. \end_inset
  3332. \end_layout
  3333. \begin_layout Standard
  3334. Then, an echo canceller state can be created by:
  3335. \end_layout
  3336. \begin_layout Standard
  3337. \begin_inset listings
  3338. inline false
  3339. status open
  3340. \begin_layout Plain Layout
  3341. SpeexEchoState *echo_state = speex_echo_state_init(frame_size, filter_length);
  3342. \end_layout
  3343. \end_inset
  3344. \end_layout
  3345. \begin_layout Standard
  3346. where
  3347. \family typewriter
  3348. frame_size
  3349. \family default
  3350. is the amount of data (in samples) you want to process at once and
  3351. \family typewriter
  3352. filter_length
  3353. \family default
  3354. is the length (in samples) of the echo cancelling filter you want to use
  3355. (also known as
  3356. \shape italic
  3357. tail length
  3358. \shape default
  3359. \begin_inset Index
  3360. status collapsed
  3361. \begin_layout Plain Layout
  3362. tail length
  3363. \end_layout
  3364. \end_inset
  3365. ).
  3366. It is recommended to use a frame size in the order of 20 ms (or equal to
  3367. the codec frame size) and make sure it is easy to perform an FFT of that
  3368. size (powers of two are better than prime sizes).
  3369. The recommended tail length is approximately one third of the room reverberation
  3370. time.
  3371. For example, in a small room, reverberation time is in the order of 300
  3372. ms, so a tail length of 100 ms is a good choice (800 samples at 8000 Hz
  3373. sampling rate).
  3374. \end_layout
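The sizing rules above can be sketched as two small helpers. Both names (tail_length_samples and is_power_of_two) are hypothetical and not part of libspeexdsp. They only illustrate the arithmetic behind choosing filter_length and checking whether a candidate frame_size is a power of two.

```c
#include <assert.h>

/* Hypothetical helper: tail length in samples, taken as one third of the
 * room reverberation time (given in milliseconds). */
int tail_length_samples(int sampling_rate, int reverb_ms)
{
    assert(sampling_rate > 0 && reverb_ms > 0);
    /* samples = rate * (reverb_ms / 3) / 1000, done in one division */
    return (int)(((long)sampling_rate * reverb_ms) / 3000);
}

/* Hypothetical helper: returns 1 if n is a power of two, 0 otherwise. */
int is_power_of_two(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}
```

With a 300 ms reverberation time at 8000 Hz this gives the 800-sample tail mentioned above. A frame size such as 160 is not a power of two, but it is far from prime, so it remains a reasonable choice when it matches the codec frame size.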
  3375. \begin_layout Standard
  3376. Once the echo canceller state is created, audio can be processed by:
  3377. \end_layout
  3378. \begin_layout Standard
  3379. \begin_inset listings
  3380. inline false
  3381. status open
  3382. \begin_layout Plain Layout
  3383. speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);
  3384. \end_layout
  3385. \end_inset
  3386. \end_layout
  3387. \begin_layout Standard
  3388. where
  3389. \family typewriter
  3390. input_frame
  3391. \family default
  3392. is the audio as captured by the microphone,
  3393. \family typewriter
  3394. echo_frame
  3395. \family default
  3396. is the signal that was played in the speaker (and needs to be removed)
  3397. and
  3398. \family typewriter
  3399. output_frame
  3400. \family default
  3401. is the signal with echo removed.
  3402. \end_layout
  3403. \begin_layout Standard
  3404. One important thing to keep in mind is the relationship between
  3405. \family typewriter
  3406. input_frame
  3407. \family default
  3408. and
  3409. \family typewriter
  3410. echo_frame
  3411. \family default
  3412. .
  3413. It is important that, at any time, any echo that is present in the input
  3414. has already been sent to the echo canceller as
  3415. \family typewriter
  3416. echo_frame
  3417. \family default
  3418. .
  3419. In other words, the echo canceller cannot remove a signal that it hasn't
  3420. yet received.
  3421. On the other hand, the delay between the input signal and the echo signal
  3422. must be small enough because otherwise part of the echo cancellation filter
  3423. is inefficient.
  3424. In the ideal case, your code would look like:
  3425. \begin_inset listings
  3426. lstparams "breaklines=true"
  3427. inline false
  3428. status open
  3429. \begin_layout Plain Layout
  3430. write_to_soundcard(echo_frame, frame_size);
  3431. \end_layout
  3432. \begin_layout Plain Layout
  3433. read_from_soundcard(input_frame, frame_size);
  3434. \end_layout
  3435. \begin_layout Plain Layout
  3436. speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);
  3437. \end_layout
  3438. \end_inset
  3439. \end_layout
  3440. \begin_layout Standard
  3441. If you wish to further reduce the echo present in the signal, you can do
  3442. so by associating the echo canceller to the preprocessor (see Section
  3443. \begin_inset CommandInset ref
  3444. LatexCommand ref
  3445. reference "sub:Preprocessor"
  3446. \end_inset
  3447. ).
  3448. This is done by calling:
  3449. \begin_inset listings
  3450. lstparams "breaklines=true"
  3451. inline false
  3452. status open
  3453. \begin_layout Plain Layout
  3454. speex_preprocess_ctl(preprocess_state, SPEEX_PREPROCESS_SET_ECHO_STATE,
  3455. echo_state);
  3456. \end_layout
  3457. \end_inset
  3458. in the initialisation.
  3459. \end_layout
  3460. \begin_layout Standard
  3461. As of version 1.2beta2, there is an alternative, simpler API that can be
  3462. used instead of
  3463. \emph on
  3464. speex_echo_cancellation()
  3465. \emph default
  3466. .
  3467. When audio capture and playback are handled asynchronously (e.g.
  3468. in different threads or using the
  3469. \emph on
  3470. poll()
  3471. \emph default
  3472. or
  3473. \emph on
  3474. select()
  3475. \emph default
  3476. system call), it can be difficult to keep track of which input_frame goes
  3477. with which echo_frame.
  3478. Instead, the playback context/thread can simply call:
  3479. \end_layout
  3480. \begin_layout Standard
  3481. \begin_inset listings
  3482. inline false
  3483. status open
  3484. \begin_layout Plain Layout
  3485. speex_echo_playback(echo_state, echo_frame);
  3486. \end_layout
  3487. \end_inset
  3488. \end_layout
  3489. \begin_layout Standard
  3490. every time an audio frame is played.
  3491. Then, the capture context/thread calls:
  3492. \end_layout
  3493. \begin_layout Standard
  3494. \begin_inset listings
  3495. inline false
  3496. status open
  3497. \begin_layout Plain Layout
  3498. speex_echo_capture(echo_state, input_frame, output_frame);
  3499. \end_layout
  3500. \end_inset
  3501. \end_layout
  3502. \begin_layout Standard
  3503. for every frame captured.
  3504. Internally,
  3505. \emph on
  3506. speex_echo_playback()
  3507. \emph default
  3508. simply buffers the playback frame so it can be used by
  3509. \emph on
  3510. speex_echo_capture()
  3511. \emph default
  3512. to call
  3513. \emph on
  3514. speex_echo_cancellation()
  3515. \emph default
  3516. .
  3517. A side effect of using this alternate API is that the playback audio is
  3518. delayed by two frames, which is the normal delay caused by the soundcard.
  3519. When capture and playback are already synchronised,
  3520. \emph on
  3521. speex_echo_cancellation()
  3522. \emph default
  3523. is preferable since it gives better control on the exact input/echo timing.
  3524. \end_layout
  3525. \begin_layout Standard
  3526. The echo cancellation state can be destroyed with:
  3527. \end_layout
  3528. \begin_layout Standard
  3529. \begin_inset listings
  3530. inline false
  3531. status open
  3532. \begin_layout Plain Layout
  3533. speex_echo_state_destroy(echo_state);
  3534. \end_layout
  3535. \end_inset
  3536. \end_layout
  3537. \begin_layout Standard
  3538. It is also possible to reset the state of the echo canceller so it can be
  3539. reused without the need to create another state with:
  3540. \end_layout
  3541. \begin_layout Standard
  3542. \begin_inset listings
  3543. inline false
  3544. status open
  3545. \begin_layout Plain Layout
  3546. speex_echo_state_reset(echo_state);
  3547. \end_layout
  3548. \end_inset
  3549. \end_layout
  3550. \begin_layout Subsection
  3551. Troubleshooting
  3552. \end_layout
  3553. \begin_layout Standard
  3554. There are several things that may prevent the echo canceller from working
  3555. properly.
  3556. One of them is a bug (or something suboptimal) in the code, but there are
  3557. many others you should consider first:
  3558. \end_layout
  3559. \begin_layout Itemize
  3560. Using a different soundcard to do the capture and playback will
  3561. \series bold
  3562. not
  3563. \series default
  3564. work, regardless of what you may think.
  3565. The only exception to that is if the two cards can be made to have their
  3566. sampling clock
  3567. \begin_inset Quotes eld
  3568. \end_inset
  3569. locked
  3570. \begin_inset Quotes erd
  3571. \end_inset
  3572. on the same clock source.
  3573. If not, the clocks will always have a small amount of drift, which will
  3574. prevent the echo canceller from adapting.
  3575. \end_layout
  3576. \begin_layout Itemize
  3577. The delay between the record and playback signals must be minimal.
  3578. Any signal played has to
  3579. \begin_inset Quotes eld
  3580. \end_inset
  3581. appear
  3582. \begin_inset Quotes erd
  3583. \end_inset
  3584. on the playback (far end) signal slightly before the echo canceller
  3585. \begin_inset Quotes eld
  3586. \end_inset
  3587. sees
  3588. \begin_inset Quotes erd
  3589. \end_inset
  3590. it in the near end signal, but excessive delay means that part of the filter
  3591. length is wasted.
  3592. In the worst situations, the delay is such that it is longer than the filter
  3593. length, in which case, no echo can be cancelled.
  3594. \end_layout
  3595. \begin_layout Itemize
  3596. When it comes to echo tail length (filter length), longer is
  3597. \series bold
  3598. not
  3599. \series default
  3600. better.
  3601. Actually, the longer the tail length, the longer it takes for the filter
  3602. to adapt.
  3603. Of course, a tail length that is too short will not cancel enough echo,
  3604. but the most common problem seen is that people set a very long tail length
  3605. and then wonder why no echo is being cancelled.
  3606. \end_layout
  3607. \begin_layout Itemize
  3608. Non-linear distortion cannot (by definition) be modeled by the linear adaptive
  3609. filter used in the echo canceller and thus cannot be cancelled.
  3610. Use good audio gear and avoid saturation/clipping.
  3611. \end_layout
  3612. \begin_layout Standard
  3613. Also useful is reading
  3614. \emph on
  3615. Echo Cancellation Demystified
  3616. \emph default
  3617. by Alexey Frunze
  3618. \begin_inset Foot
  3619. status collapsed
  3620. \begin_layout Plain Layout
  3621. http://www.embeddedstar.com/articles/2003/7/article20030720-1.html
  3622. \end_layout
  3623. \end_inset
  3624. , which explains the fundamental principles of echo cancellation.
  3625. The details of the algorithm described in the article are different, but
  3626. the general ideas of echo cancellation through adaptive filters are the
  3627. same.
  3628. \end_layout
  3629. \begin_layout Standard
  3630. As of version 1.2beta2, a new
  3631. \family typewriter
  3632. echo_diagnostic.m
  3633. \family default
  3634. tool is included in the source distribution.
  3635. The first step is to define DUMP_ECHO_CANCEL_DATA during the build.
  3636. This causes the echo canceller to automatically save the near-end, far-end
  3637. and output signals to files (aec_rec.sw, aec_play.sw, and aec_out.sw).
  3638. These are exactly what the AEC receives and outputs.
  3639. From there, it is necessary to start Octave and type:
  3640. \end_layout
  3641. \begin_layout Standard
  3642. \begin_inset listings
  3643. lstparams "language=Matlab"
  3644. inline false
  3645. status open
  3646. \begin_layout Plain Layout
  3647. echo_diagnostic('aec_rec.sw', 'aec_play.sw', 'aec_diagnostic.sw', 1024);
  3648. \end_layout
  3649. \end_inset
  3650. \end_layout
  3651. \begin_layout Standard
  3652. The value of 1024 is the filter length and can be changed.
  3653. There will be some (hopefully) useful messages printed and echo cancelled
  3654. audio will be saved to aec_diagnostic.sw.
  3655. If even that output is bad (almost no cancellation) then there is probably
  3656. a problem with the playback or recording process.
  3657. \end_layout
  3658. \begin_layout Section
  3659. Jitter Buffer
  3660. \end_layout
  3661. \begin_layout Standard
  3662. The jitter buffer can be enabled by including:
  3663. \begin_inset listings
  3664. lstparams "breaklines=true"
  3665. inline false
  3666. status open
  3667. \begin_layout Plain Layout
  3668. #include <speex/speex_jitter.h>
  3669. \end_layout
  3670. \end_inset
  3671. and a new jitter buffer state can be initialised by:
  3672. \end_layout
  3673. \begin_layout Standard
  3674. \begin_inset listings
  3675. lstparams "breaklines=true"
  3676. inline false
  3677. status open
  3678. \begin_layout Plain Layout
  3679. JitterBuffer *state = jitter_buffer_init(step);
  3680. \end_layout
  3681. \end_inset
  3682. \end_layout
  3683. \begin_layout Standard
  3684. where the
  3685. \begin_inset listings
  3686. inline true
  3687. status collapsed
  3688. \begin_layout Plain Layout
  3689. step
  3690. \end_layout
  3691. \end_inset
  3692. argument is the default time step (in timestamp units) used for adjusting
  3693. the delay and doing concealment.
  3694. A value of 1 is always correct, but higher values may be more convenient
  3695. sometimes.
  3696. For example, if you are only able to do concealment on 20ms frames, there
  3697. is no point in the jitter buffer asking you to do it on one sample.
  3698. Another example is that for video, it makes no sense to adjust the delay
  3699. by less than a full frame.
  3700. The value provided can always be changed at a later time.
  3701. \end_layout
  3702. \begin_layout Standard
  3703. The jitter buffer API is based on the
  3704. \begin_inset listings
  3705. inline true
  3706. status open
  3707. \begin_layout Plain Layout
  3708. JitterBufferPacket
  3709. \end_layout
  3710. \end_inset
  3711. type, which is defined as:
  3712. \begin_inset listings
  3713. inline false
  3714. status open
  3715. \begin_layout Plain Layout
  3716. typedef struct {
  3717. \end_layout
  3718. \begin_layout Plain Layout
  3719. char *data; /* Data bytes contained in the packet */
  3720. \end_layout
  3721. \begin_layout Plain Layout
  3722. spx_uint32_t len; /* Length of the packet in bytes */
  3723. \end_layout
  3724. \begin_layout Plain Layout
  3725. spx_uint32_t timestamp; /* Timestamp for the packet */
  3726. \end_layout
  3727. \begin_layout Plain Layout
  3728. spx_uint32_t span; /* Time covered by the packet (timestamp units)
  3729. */
  3730. \end_layout
  3731. \begin_layout Plain Layout
  3732. } JitterBufferPacket;
  3733. \end_layout
  3734. \end_inset
  3735. \end_layout
  3736. \begin_layout Standard
  3737. As an example, for audio the timestamp field would be what is obtained from
  3738. the RTP timestamp field and the span would be the number of samples that
  3739. are encoded in the packet.
  3740. For Speex narrowband, span would be 160 if only one frame is included in
  3741. the packet.
  3742. \end_layout
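The field mapping described above can be sketched as follows. To keep the example self-contained, it uses a local stand-in type, DemoPacket, that mirrors the JitterBufferPacket fields with standard fixed-width integers. Real code would include speex_jitter.h and use the library type. The make_packet helper and the NB_FRAME_SAMPLES constant are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for JitterBufferPacket, for illustration only. */
typedef struct {
    char *data;          /* Data bytes contained in the packet */
    uint32_t len;        /* Length of the packet in bytes */
    uint32_t timestamp;  /* Timestamp for the packet */
    uint32_t span;       /* Time covered by the packet (timestamp units) */
} DemoPacket;

/* Speex narrowband: one frame covers 160 samples (20 ms at 8000 Hz). */
#define NB_FRAME_SAMPLES 160u

/* Hypothetical helper: fill a packet from a received payload, the RTP
 * timestamp that came with it, and the number of frames it carries. */
DemoPacket make_packet(char *payload, uint32_t payload_len,
                       uint32_t rtp_timestamp, uint32_t nb_frames)
{
    DemoPacket p;
    assert(payload != NULL && nb_frames > 0);
    p.data = payload;
    p.len = payload_len;
    p.timestamp = rtp_timestamp;
    p.span = nb_frames * NB_FRAME_SAMPLES;
    return p;
}
```

A packet carrying a single narrowband frame therefore has a span of 160, and one carrying two frames has a span of 320.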
  3743. \begin_layout Standard
  3744. When a packet arrives, it needs to be inserted into the jitter buffer by:
  3745. \begin_inset listings
  3746. inline false
  3747. status open
  3748. \begin_layout Plain Layout
  3749. JitterBufferPacket packet;
  3750. \end_layout
  3751. \begin_layout Plain Layout
  3752. /* Fill in each field in the packet struct */
  3753. \end_layout
  3754. \begin_layout Plain Layout
  3755. jitter_buffer_put(state, &packet);
  3756. \end_layout
  3757. \end_inset
  3758. \end_layout
  3759. \begin_layout Standard
  3760. When the decoder is ready to decode a packet, the packet to be decoded can
  3761. be obtained by:
  3762. \begin_inset listings
  3763. inline false
  3764. status open
  3765. \begin_layout Plain Layout
  3766. int start_offset;
  3767. \end_layout
  3768. \begin_layout Plain Layout
  3769. err = jitter_buffer_get(state, &packet, desired_span, &start_offset);
  3770. \end_layout
  3771. \end_inset
  3772. \end_layout
  3773. \begin_layout Standard
  3774. If
  3775. \begin_inset listings
  3776. inline true
  3777. status open
  3778. \begin_layout Plain Layout
  3779. jitter_buffer_put()
  3780. \end_layout
  3781. \end_inset
  3782. and
  3783. \begin_inset listings
  3784. inline true
  3785. status collapsed
  3786. \begin_layout Plain Layout
  3787. jitter_buffer_get()
  3788. \end_layout
  3789. \end_inset
  3790. are called from different threads, then
  3791. \series bold
  3792. you need to protect the jitter buffer state with a mutex
  3793. \series default
  3794. .
  3795. \end_layout
  3796. \begin_layout Standard
  3797. Because the jitter buffer is designed not to use an explicit timer, it needs
  3798. to be told about the time explicitly.
  3799. This is done by calling:
  3800. \begin_inset listings
  3801. inline false
  3802. status open
  3803. \begin_layout Plain Layout
  3804. jitter_buffer_tick(state);
  3805. \end_layout
  3806. \end_inset
  3807. \end_layout
  3808. \begin_layout Standard
  3809. This needs to be done periodically in the playing thread.
  3810. This will be the last jitter buffer call before going to sleep (until more
  3811. data is played back).
  3812. In some cases, it may be preferable to use
  3813. \begin_inset listings
  3814. inline false
  3815. status open
  3816. \begin_layout Plain Layout
  3817. jitter_buffer_remaining_span(state, remaining);
  3818. \end_layout
  3819. \end_inset
  3820. \end_layout
  3821. \begin_layout Standard
  3822. The second argument is used to specify that we are still holding data that
  3823. has not been written to the playback device.
  3824. For instance, if 256 samples were needed by the soundcard (specified by
  3825. \begin_inset listings
  3826. inline true
  3827. status collapsed
  3828. \begin_layout Plain Layout
  3829. desired_span
  3830. \end_layout
  3831. \end_inset
  3832. ), but
  3833. \begin_inset listings
  3834. inline true
  3835. status collapsed
  3836. \begin_layout Plain Layout
  3837. jitter_buffer_get()
  3838. \end_layout
  3839. \end_inset
  3840. returned 320 samples, we would have
  3841. \begin_inset listings
  3842. inline true
  3843. status open
  3844. \begin_layout Plain Layout
  3845. remaining=64
  3846. \end_layout
  3847. \end_inset
  3848. .
  3849. \end_layout
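The arithmetic in the example above can be written as a small helper. The name remaining_span is hypothetical. Clamping the result to zero when fewer samples were returned than requested is an assumption made here, not documented library behaviour.

```c
#include <assert.h>

/* Hypothetical helper: amount of data (in timestamp units) returned by the
 * jitter buffer that has not yet been written to the playback device. */
int remaining_span(int returned_span, int desired_span)
{
    assert(returned_span >= 0 && desired_span >= 0);
    /* Assumption: nothing is left over if less than desired was returned. */
    return returned_span > desired_span ? returned_span - desired_span : 0;
}
```

With 320 samples returned and 256 needed by the soundcard, the helper yields 64, which is the value that would be passed as the second argument of jitter_buffer_remaining_span().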
  3850. \begin_layout Section
  3851. Resampler
  3852. \end_layout
  3853. \begin_layout Standard
  3854. Speex includes a resampling module.
  3855. To make use of the resampler, it is necessary to include its header file:
  3856. \end_layout
  3857. \begin_layout Standard
  3858. \begin_inset listings
  3859. inline false
  3860. status open
  3861. \begin_layout Plain Layout
  3862. #include <speex/speex_resampler.h>
  3863. \end_layout
  3864. \end_inset
  3865. \end_layout
  3866. \begin_layout Standard
  3867. For each stream that is to be resampled, it is necessary to create a resampler
  3868. state with:
  3869. \end_layout
  3870. \begin_layout Standard
  3871. \begin_inset listings
  3872. inline false
  3873. status open
  3874. \begin_layout Plain Layout
  3875. SpeexResamplerState *resampler;
  3876. \end_layout
  3877. \begin_layout Plain Layout
  3878. resampler = speex_resampler_init(nb_channels, input_rate, output_rate, quality,
  3879. &err);
  3880. \end_layout
  3881. \end_inset
  3882. \end_layout
  3883. \begin_layout Standard
  3884. where
  3885. \begin_inset listings
  3886. inline true
  3887. status collapsed
  3888. \begin_layout Plain Layout
  3889. nb_channels
  3890. \end_layout
  3891. \end_inset
  3892. is the number of channels that will be used (either interleaved or non-interlea
  3893. ved),
  3894. \begin_inset listings
  3895. inline true
  3896. status collapsed
  3897. \begin_layout Plain Layout
  3898. input_rate
  3899. \end_layout
  3900. \end_inset
  3901. is the sampling rate of the input stream,
  3902. \begin_inset listings
  3903. inline true
  3904. status collapsed
  3905. \begin_layout Plain Layout
  3906. output_rate
  3907. \end_layout
  3908. \end_inset
  3909. is the sampling rate of the output stream and
  3910. \begin_inset listings
  3911. inline true
  3912. status collapsed
  3913. \begin_layout Plain Layout
  3914. quality
  3915. \end_layout
  3916. \end_inset
  3917. is the requested quality setting (0 to 10).
  3918. The quality parameter is useful for controlling the quality/complexity/latency
  3919. tradeoff.
  3920. Using a higher quality setting means less noise/aliasing, a higher complexity
  3921. and a higher latency.
  3922. Usually, a quality of 3 is acceptable for most desktop uses and quality
  3923. 10 is mostly recommended for pro audio work.
  3924. Quality 0 usually has a decent sound (certainly better than using linear
  3925. interpolation resampling), but artifacts may be heard.
  3926. \end_layout
  3927. \begin_layout Standard
  3928. The actual resampling is performed using
  3929. \end_layout
  3930. \begin_layout Standard
  3931. \begin_inset listings
  3932. inline false
  3933. status open
  3934. \begin_layout Plain Layout
  3935. err = speex_resampler_process_int(resampler, channelID, in, &in_length,
  3936. out, &out_length);
  3937. \end_layout
  3938. \end_inset
  3939. where
  3940. \begin_inset listings
  3941. inline true
  3942. status collapsed
  3943. \begin_layout Plain Layout
  3944. channelID
  3945. \end_layout
  3946. \end_inset
  3947. is the ID of the channel to be processed.
  3948. For a mono stream, use 0.
  3949. The
  3950. \emph on
  3951. in
  3952. \emph default
  3953. pointer points to the first sample of the input buffer for the selected
  3954. channel and
  3955. \begin_inset listings
  3956. inline true
  3957. status collapsed
  3958. \begin_layout Plain Layout
  3959. out
  3960. \end_layout
  3961. \end_inset
  3962. points to the first sample of the output.
  3963. The sizes of the input and output buffers are specified by
  3964. \begin_inset listings
  3965. inline true
  3966. status collapsed
  3967. \begin_layout Plain Layout
  3968. in_length
  3969. \end_layout
  3970. \end_inset
  3971. and
  3972. \begin_inset listings
  3973. inline true
  3974. status collapsed
  3975. \begin_layout Plain Layout
  3976. out_length
  3977. \end_layout
  3978. \end_inset
  3979. respectively.
  3980. Upon completion, these values are replaced by the number of samples read
  3981. and written by the resampler.
  3982. Unless an error occurs, either all input samples will be read or all output
  3983. samples will be written to (or both).
  3984. For floating-point samples, the function
  3985. \begin_inset listings
  3986. inline true
  3987. status open
  3988. \begin_layout Plain Layout
  3989. speex_resampler_process_float()
  3990. \end_layout
  3991. \end_inset
  3992. behaves similarly.
  3993. \end_layout
  3994. \begin_layout Standard
  3995. It is also possible to process multiple channels at once.
  3996. To do that, you can use speex_resampler_process_interleaved_int() or
  3997. \begin_inset listings
  3998. inline true
  3999. status open
  4000. \begin_layout Plain Layout
  4001. speex_resampler_process_interleaved_float()
  4002. \end_layout
  4003. \end_inset
  4004. .
  4005. The arguments are the same except that there is no
  4006. \begin_inset listings
  4007. inline true
  4008. status collapsed
  4009. \begin_layout Plain Layout
  4010. channelID
  4011. \end_layout
  4012. \end_inset
  4013. argument.
  4014. Note that the
  4015. \series bold
  4016. length parameters are per-channel
  4017. \series default
  4018. .
  4019. So if you have 1024 samples for each of 4 channels, you pass 1024 and not
  4020. 4096.
  4021. \end_layout
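The per-channel convention is easier to see with the buffer layout written out. The sketch below is not part of the Speex API: the helper names `interleaved_total` and `extract_channel` are hypothetical, and it assumes the usual interleaved layout in which sample i of channel c sits at index i * channels + c.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of values an interleaved buffer must hold for `frames` samples
   per channel. The length arguments discussed above take `frames`,
   not this total. */
size_t interleaved_total(size_t frames, size_t channels)
{
    return frames * channels;
}

/* Copy channel `ch` out of an interleaved buffer, assuming sample i of
   channel ch is stored at index i * channels + ch. */
void extract_channel(const int16_t *interleaved, size_t frames,
                     size_t channels, size_t ch, int16_t *out)
{
    for (size_t i = 0; i < frames; i++)
        out[i] = interleaved[i * channels + ch];
}
```

With 1024 samples for each of 4 channels, the buffer holds `interleaved_total(1024, 4)` values, but the length passed to the interleaved processing functions is still 1024.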
  4022. \begin_layout Standard
  4023. The resampler allows changing the quality and input/output sampling frequencies
  4024. on the fly without glitches.
  4025. This can be done with calls such as
  4026. \begin_inset listings
  4027. inline true
  4028. status open
  4029. \begin_layout Plain Layout
  4030. speex_resampler_set_quality()
  4031. \end_layout
  4032. \end_inset
  4033. and
  4034. \begin_inset listings
  4035. inline true
  4036. status open
  4037. \begin_layout Plain Layout
  4038. speex_resampler_set_rate()
  4039. \end_layout
  4040. \end_inset
  4041. .
  4042. The only side effect is that a new filter has to be computed, which consumes
  4043. many CPU cycles.
  4044. \end_layout
  4045. \begin_layout Standard
  4046. When resampling a file, it is often desirable to have the output file perfectly
  4047. synchronised with the input.
  4048. To do that, you need to call
  4049. \begin_inset listings
  4050. inline true
  4051. status open
  4052. \begin_layout Plain Layout
  4053. speex_resampler_skip_zeros()
  4054. \end_layout
  4055. \end_inset
  4056. \series bold
  4057. before
  4058. \series default
  4059. you start processing any samples.
  4060. For real-time applications (e.g.
  4061. VoIP), it is not recommended to do that as the first processed frame will
  4062. be shorter to compensate for the delay (the skipped zeros).
  4063. Instead, in real-time applications you may want to know how much delay
  4064. is introduced by the resampler.
  4065. This can be done at run-time with the
  4066. \begin_inset listings
  4067. inline true
  4068. status open
  4069. \begin_layout Plain Layout
  4070. speex_resampler_get_input_latency()
  4071. \end_layout
  4072. \end_inset
  4073. and
  4074. \begin_inset listings
  4075. inline true
  4076. status open
  4077. \begin_layout Plain Layout
  4078. speex_resampler_get_output_latency()
  4079. \end_layout
  4080. \end_inset
  4081. functions.
  4082. The first function returns the delay measured in samples at the input sampling
  4083. rate, while the second returns the delay measured in samples at the output sampling rate.
  4084. \end_layout
  4085. \begin_layout Standard
  4086. To destroy a resampler state, just call
  4087. \begin_inset listings
  4088. inline true
  4089. status open
  4090. \begin_layout Plain Layout
  4091. speex_resampler_destroy()
  4092. \end_layout
  4093. \end_inset
  4094. .
  4095. \end_layout
  4096. \begin_layout Section
  4097. Ring Buffer
  4098. \end_layout
  4099. \begin_layout Standard
  4100. In some cases, it is necessary to interface components that use different
  4101. block sizes.
  4102. For example, it is possible that the soundcard does not support reading/writing
  4103. in blocks of 20
  4104. \begin_inset space ~
  4105. \end_inset
  4106. ms, or complicated resampling ratios sometimes mean that the blocks don't
  4107. always have the same duration.
  4108. In those cases, it is often necessary to buffer a bit of audio using a
  4109. ring buffer.
  4110. \end_layout
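As an illustration of the idea, here is a minimal sketch of a fixed-capacity ring buffer for 16-bit samples. It is a generic example rather than the buffer code shipped with Speex: the type `RingBuffer`, the functions `rb_init`, `rb_write` and `rb_read`, and the capacity of 1024 samples are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define RB_CAPACITY 1024   /* arbitrary capacity chosen for the sketch */

typedef struct {
    int16_t data[RB_CAPACITY];
    size_t  read_pos;   /* index of the oldest stored sample */
    size_t  count;      /* number of samples currently stored */
} RingBuffer;

void rb_init(RingBuffer *rb)
{
    rb->read_pos = 0;
    rb->count = 0;
}

/* Append up to n samples; returns how many were actually stored. */
size_t rb_write(RingBuffer *rb, const int16_t *src, size_t n)
{
    size_t space = RB_CAPACITY - rb->count;
    size_t write_pos = (rb->read_pos + rb->count) % RB_CAPACITY;
    if (n > space)
        n = space;
    for (size_t i = 0; i < n; i++) {
        rb->data[write_pos] = src[i];
        write_pos = (write_pos + 1) % RB_CAPACITY;
    }
    rb->count += n;
    return n;
}

/* Remove up to n of the oldest samples; returns how many were copied. */
size_t rb_read(RingBuffer *rb, int16_t *dst, size_t n)
{
    if (n > rb->count)
        n = rb->count;
    for (size_t i = 0; i < n; i++) {
        dst[i] = rb->data[rb->read_pos];
        rb->read_pos = (rb->read_pos + 1) % RB_CAPACITY;
    }
    rb->count -= n;
    return n;
}
```

A producer that delivers blocks of one size calls `rb_write()`, and a consumer that needs blocks of another size calls `rb_read()` once enough samples have accumulated. Both functions report short counts instead of blocking, so the caller decides how to handle overruns and underruns.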
  4111. \begin_layout Standard
  4112. \begin_inset Newpage newpage
  4113. \end_inset
  4114. \end_layout
  4115. \begin_layout Chapter
  4116. Formats and standards
  4117. \begin_inset Index
  4118. status collapsed
  4119. \begin_layout Plain Layout
  4120. standards
  4121. \end_layout
  4122. \end_inset
  4123. \begin_inset CommandInset label
  4124. LatexCommand label
  4125. name "sec:Formats-and-standards"
  4126. \end_inset
  4127. \end_layout
  4128. \begin_layout Standard
  4129. Speex can encode speech in both narrowband and wideband and provides different
  4130. bit-rates.
  4131. However, not all features need to be supported by a certain implementation
  4132. or device.
  4133. In order to be called
  4134. \begin_inset Quotes eld
  4135. \end_inset
  4136. Speex compatible
  4137. \begin_inset Quotes erd
  4138. \end_inset
  4139. (whatever that means), an implementation must implement at least a basic
  4140. set of features.
  4141. \end_layout
  4142. \begin_layout Standard
  4143. At the minimum, all narrowband modes of operation MUST be supported at the
  4144. decoder.
  4145. This includes the decoding of a wideband bit-stream by the narrowband decoder
  4146. \begin_inset Foot
  4147. status collapsed
  4148. \begin_layout Plain Layout
  4149. The wideband bit-stream contains an embedded narrowband bit-stream which
  4150. can be decoded alone
  4151. \end_layout
  4152. \end_inset
  4153. .
  4154. If present, a wideband decoder MUST be able to decode a narrowband stream,
  4155. and MAY either be able to decode all wideband modes or be able to decode
  4156. the embedded narrowband part of all modes (which includes ignoring the
  4157. high-band bits).
  4158. \end_layout
  4159. \begin_layout Standard
  4160. For encoders, at least one narrowband or wideband mode MUST be supported.
  4161. The main reason why all encoding modes do not have to be supported is that
  4162. some platforms may not be able to handle the complexity of encoding in
  4163. some modes.
  4164. \end_layout
  4165. \begin_layout Section
  4166. RTP
  4167. \begin_inset Index
  4168. status collapsed
  4169. \begin_layout Plain Layout
  4170. RTP
  4171. \end_layout
  4172. \end_inset
  4173. Payload Format
  4174. \end_layout
  4175. \begin_layout Standard
  4176. The RTP payload draft is included in appendix
  4177. \begin_inset CommandInset ref
  4178. LatexCommand ref
  4179. reference "sec:IETF-draft"
  4180. \end_inset
  4181. and the latest version is available at
  4182. \begin_inset Flex URL
  4183. status collapsed
  4184. \begin_layout Plain Layout
  4185. http://www.speex.org/drafts/latest
  4186. \end_layout
  4187. \end_inset
  4188. .
  4189. This draft has been sent (2003/02/26) to the Internet Engineering Task
  4190. Force (IETF) and will be discussed at the March 18th meeting in San Francisco.
  4191. \end_layout
  4192. \begin_layout Section
  4193. MIME Type
  4194. \end_layout
  4195. \begin_layout Standard
  4196. For now, you should use the MIME type audio/x-speex for Speex-in-Ogg.
  4197. We will apply for type
  4198. \family typewriter
  4199. audio/speex
  4200. \family default
  4201. in the near future.
  4202. \end_layout
  4203. \begin_layout Section
  4204. Ogg
  4205. \begin_inset Index
  4206. status collapsed
  4207. \begin_layout Plain Layout
  4208. Ogg
  4209. \end_layout
  4210. \end_inset
  4211. file format
  4212. \end_layout
  4213. \begin_layout Standard
  4214. Speex bit-streams can be stored in Ogg files.
  4215. In this case, the first packet of the Ogg file contains the Speex header
  4216. described in table
  4217. \begin_inset CommandInset ref
  4218. LatexCommand ref
  4219. reference "cap:ogg_speex_header"
  4220. \end_inset
  4221. .
  4222. All integer fields in the headers are stored as little-endian.
  4223. The
  4224. \family typewriter
  4225. speex_string
  4226. \family default
  4227. field must contain the
  4228. \begin_inset Quotes eld
  4229. \end_inset
  4230. \family typewriter
  4231. Speex
  4232. \family default
  4233. \begin_inset space ~
  4234. \end_inset
  4235. \begin_inset space ~
  4236. \end_inset
  4237. \begin_inset space ~
  4238. \end_inset
  4239. \begin_inset Quotes erd
  4240. \end_inset
  4241. (with 3 trailing spaces), which identifies the bit-stream.
  4242. The next field,
  4243. \family typewriter
  4244. speex_version
  4245. \family default
  4246. contains the version of Speex that encoded the file.
  4247. For now, refer to speex_header.[ch] for more info.
  4248. The
  4249. \emph on
  4250. beginning of stream
  4251. \emph default
  4252. (
  4253. \family typewriter
  4254. b_o_s
  4255. \family default
  4256. ) flag is set to 1 for the header.
  4257. The header packet has
  4258. \family typewriter
  4259. packetno=0
  4260. \family default
  4261. and
  4262. \family typewriter
  4263. granulepos=0
  4264. \family default
  4265. .
  4266. \end_layout
  4267. \begin_layout Standard
  4268. The second packet contains the Speex comment header.
  4269. The format used is the Vorbis comment format described here: http://www.xiph.org/
  4270. ogg/vorbis/doc/v-comment.html .
  4271. This packet has
  4272. \family typewriter
  4273. packetno=1
  4274. \family default
  4275. and
  4276. \family typewriter
  4277. granulepos=0
  4278. \family default
  4279. .
  4280. \end_layout
  4281. \begin_layout Standard
  4282. The third and subsequent packets each contain one or more (number found
  4283. in header) Speex frames.
  4284. These are identified with
  4285. \family typewriter
  4286. packetno
  4287. \family default
  4288. starting from 2 and the
  4289. \family typewriter
  4290. granulepos
  4291. \family default
  4292. is the number of the last sample encoded in that packet.
  4293. The last of these packets has the
  4294. \emph on
  4295. end of stream
  4296. \emph default
  4297. (
  4298. \family typewriter
  4299. e_o_s
  4300. \family default
  4301. ) flag set to 1.
  4302. \end_layout
  4303. \begin_layout Standard
  4304. \begin_inset Float table
  4305. placement htbp
  4306. wide true
  4307. sideways false
  4308. status open
  4309. \begin_layout Plain Layout
  4310. \begin_inset ERT
  4311. status collapsed
  4312. \begin_layout Plain Layout
  4313. \backslash
  4314. begin{center}
  4315. \end_layout
  4316. \end_inset
  4317. \begin_inset Tabular
  4318. <lyxtabular version="3" rows="16" columns="3">
  4319. <features>
  4320. <column alignment="center" valignment="top" width="0pt">
  4321. <column alignment="center" valignment="top" width="0pt">
  4322. <column alignment="center" valignment="top" width="0pt">
  4323. <row>
  4324. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  4325. \begin_inset Text
  4326. \begin_layout Plain Layout
  4327. Field
  4328. \end_layout
  4329. \end_inset
  4330. </cell>
  4331. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  4332. \begin_inset Text
  4333. \begin_layout Plain Layout
  4334. Type
  4335. \end_layout
  4336. \end_inset
  4337. </cell>
  4338. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  4339. \begin_inset Text
  4340. \begin_layout Plain Layout
  4341. Size
  4342. \end_layout
  4343. \end_inset
  4344. </cell>
  4345. </row>
  4346. <row>
  4347. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4348. \begin_inset Text
  4349. \begin_layout Plain Layout
  4350. speex_string
  4351. \end_layout
  4352. \end_inset
  4353. </cell>
  4354. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4355. \begin_inset Text
  4356. \begin_layout Plain Layout
  4357. char[]
  4358. \end_layout
  4359. \end_inset
  4360. </cell>
  4361. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4362. \begin_inset Text
  4363. \begin_layout Plain Layout
  4364. 8
  4365. \end_layout
  4366. \end_inset
  4367. </cell>
  4368. </row>
  4369. <row>
  4370. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4371. \begin_inset Text
  4372. \begin_layout Plain Layout
  4373. speex_version
  4374. \end_layout
  4375. \end_inset
  4376. </cell>
  4377. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4378. \begin_inset Text
  4379. \begin_layout Plain Layout
  4380. char[]
  4381. \end_layout
  4382. \end_inset
  4383. </cell>
  4384. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4385. \begin_inset Text
  4386. \begin_layout Plain Layout
  4387. 20
  4388. \end_layout
  4389. \end_inset
  4390. </cell>
  4391. </row>
  4392. <row>
  4393. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4394. \begin_inset Text
  4395. \begin_layout Plain Layout
  4396. speex_version_id
  4397. \end_layout
  4398. \end_inset
  4399. </cell>
  4400. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4401. \begin_inset Text
  4402. \begin_layout Plain Layout
  4403. int
  4404. \end_layout
  4405. \end_inset
  4406. </cell>
  4407. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4408. \begin_inset Text
  4409. \begin_layout Plain Layout
  4410. 4
  4411. \end_layout
  4412. \end_inset
  4413. </cell>
  4414. </row>
  4415. <row>
  4416. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4417. \begin_inset Text
  4418. \begin_layout Plain Layout
  4419. header_size
  4420. \end_layout
  4421. \end_inset
  4422. </cell>
  4423. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4424. \begin_inset Text
  4425. \begin_layout Plain Layout
  4426. int
  4427. \end_layout
  4428. \end_inset
  4429. </cell>
  4430. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4431. \begin_inset Text
  4432. \begin_layout Plain Layout
  4433. 4
  4434. \end_layout
  4435. \end_inset
  4436. </cell>
  4437. </row>
  4438. <row>
  4439. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4440. \begin_inset Text
  4441. \begin_layout Plain Layout
  4442. rate
  4443. \end_layout
  4444. \end_inset
  4445. </cell>
  4446. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4447. \begin_inset Text
  4448. \begin_layout Plain Layout
  4449. int
  4450. \end_layout
  4451. \end_inset
  4452. </cell>
  4453. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4454. \begin_inset Text
  4455. \begin_layout Plain Layout
  4456. 4
  4457. \end_layout
  4458. \end_inset
  4459. </cell>
  4460. </row>
  4461. <row>
  4462. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4463. \begin_inset Text
  4464. \begin_layout Plain Layout
  4465. mode
  4466. \end_layout
  4467. \end_inset
  4468. </cell>
  4469. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4470. \begin_inset Text
  4471. \begin_layout Plain Layout
  4472. int
  4473. \end_layout
  4474. \end_inset
  4475. </cell>
  4476. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4477. \begin_inset Text
  4478. \begin_layout Plain Layout
  4479. 4
  4480. \end_layout
  4481. \end_inset
  4482. </cell>
  4483. </row>
  4484. <row>
  4485. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4486. \begin_inset Text
  4487. \begin_layout Plain Layout
  4488. mode_bitstream_version
  4489. \end_layout
  4490. \end_inset
  4491. </cell>
  4492. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4493. \begin_inset Text
  4494. \begin_layout Plain Layout
  4495. int
  4496. \end_layout
  4497. \end_inset
  4498. </cell>
  4499. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4500. \begin_inset Text
  4501. \begin_layout Plain Layout
  4502. 4
  4503. \end_layout
  4504. \end_inset
  4505. </cell>
  4506. </row>
  4507. <row>
  4508. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4509. \begin_inset Text
  4510. \begin_layout Plain Layout
  4511. nb_channels
  4512. \end_layout
  4513. \end_inset
  4514. </cell>
  4515. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4516. \begin_inset Text
  4517. \begin_layout Plain Layout
  4518. int
  4519. \end_layout
  4520. \end_inset
  4521. </cell>
  4522. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4523. \begin_inset Text
  4524. \begin_layout Plain Layout
  4525. 4
  4526. \end_layout
  4527. \end_inset
  4528. </cell>
  4529. </row>
  4530. <row>
  4531. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4532. \begin_inset Text
  4533. \begin_layout Plain Layout
  4534. bitrate
  4535. \end_layout
  4536. \end_inset
  4537. </cell>
  4538. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4539. \begin_inset Text
  4540. \begin_layout Plain Layout
  4541. int
  4542. \end_layout
  4543. \end_inset
  4544. </cell>
  4545. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4546. \begin_inset Text
  4547. \begin_layout Plain Layout
  4548. 4
  4549. \end_layout
  4550. \end_inset
  4551. </cell>
  4552. </row>
  4553. <row>
  4554. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4555. \begin_inset Text
  4556. \begin_layout Plain Layout
  4557. frame_size
  4558. \end_layout
  4559. \end_inset
  4560. </cell>
  4561. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4562. \begin_inset Text
  4563. \begin_layout Plain Layout
  4564. int
  4565. \end_layout
  4566. \end_inset
  4567. </cell>
  4568. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4569. \begin_inset Text
  4570. \begin_layout Plain Layout
  4571. 4
  4572. \end_layout
  4573. \end_inset
  4574. </cell>
  4575. </row>
  4576. <row>
  4577. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4578. \begin_inset Text
  4579. \begin_layout Plain Layout
  4580. vbr
  4581. \end_layout
  4582. \end_inset
  4583. </cell>
  4584. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4585. \begin_inset Text
  4586. \begin_layout Plain Layout
  4587. int
  4588. \end_layout
  4589. \end_inset
  4590. </cell>
  4591. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4592. \begin_inset Text
  4593. \begin_layout Plain Layout
  4594. 4
  4595. \end_layout
  4596. \end_inset
  4597. </cell>
  4598. </row>
  4599. <row>
  4600. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4601. \begin_inset Text
  4602. \begin_layout Plain Layout
  4603. frames_per_packet
  4604. \end_layout
  4605. \end_inset
  4606. </cell>
  4607. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4608. \begin_inset Text
  4609. \begin_layout Plain Layout
  4610. int
  4611. \end_layout
  4612. \end_inset
  4613. </cell>
  4614. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4615. \begin_inset Text
  4616. \begin_layout Plain Layout
  4617. 4
  4618. \end_layout
  4619. \end_inset
  4620. </cell>
  4621. </row>
  4622. <row>
  4623. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4624. \begin_inset Text
  4625. \begin_layout Plain Layout
  4626. extra_headers
  4627. \end_layout
  4628. \end_inset
  4629. </cell>
  4630. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4631. \begin_inset Text
  4632. \begin_layout Plain Layout
  4633. int
  4634. \end_layout
  4635. \end_inset
  4636. </cell>
  4637. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4638. \begin_inset Text
  4639. \begin_layout Plain Layout
  4640. 4
  4641. \end_layout
  4642. \end_inset
  4643. </cell>
  4644. </row>
  4645. <row>
  4646. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4647. \begin_inset Text
  4648. \begin_layout Plain Layout
  4649. reserved1
  4650. \end_layout
  4651. \end_inset
  4652. </cell>
  4653. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  4654. \begin_inset Text
  4655. \begin_layout Plain Layout
  4656. int
  4657. \end_layout
  4658. \end_inset
  4659. </cell>
  4660. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  4661. \begin_inset Text
  4662. \begin_layout Plain Layout
  4663. 4
  4664. \end_layout
  4665. \end_inset
  4666. </cell>
  4667. </row>
  4668. <row>
  4669. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  4670. \begin_inset Text
  4671. \begin_layout Plain Layout
  4672. reserved2
  4673. \end_layout
  4674. \end_inset
  4675. </cell>
  4676. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  4677. \begin_inset Text
  4678. \begin_layout Plain Layout
  4679. int
  4680. \end_layout
  4681. \end_inset
  4682. </cell>
  4683. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  4684. \begin_inset Text
  4685. \begin_layout Plain Layout
  4686. 4
  4687. \end_layout
  4688. \end_inset
  4689. </cell>
  4690. </row>
  4691. </lyxtabular>
  4692. \end_inset
  4693. \begin_inset ERT
  4694. status collapsed
  4695. \begin_layout Plain Layout
  4696. \backslash
  4697. end{center}
  4698. \end_layout
  4699. \end_inset
  4700. \end_layout
  4701. \begin_layout Plain Layout
  4702. \begin_inset Caption
  4703. \begin_layout Plain Layout
  4704. Ogg/Speex header packet
  4705. \begin_inset CommandInset label
  4706. LatexCommand label
  4707. name "cap:ogg_speex_header"
  4708. \end_inset
  4709. \end_layout
  4710. \end_inset
  4711. \end_layout
  4712. \end_inset
  4713. \end_layout
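The header layout in the table can be read with a few lines of C. This is a minimal sketch, not the code in speex_header.[ch]: the struct `ParsedSpeexHeader` and the functions `read_le32` and `parse_speex_header` are hypothetical names, and the 80-byte total is simply the sum of the sizes in the table (8 + 20 + 13 x 4). It assumes the fields are packed in table order with no padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPEEX_HEADER_BYTES 80   /* 8 + 20 + 13 * 4, summed from the table */

/* Hypothetical container for the parsed fields (not the library's struct). */
typedef struct {
    char    speex_string[8];
    char    speex_version[20];
    int32_t speex_version_id;
    int32_t header_size;
    int32_t rate;
    int32_t mode;
    int32_t mode_bitstream_version;
    int32_t nb_channels;
    int32_t bitrate;
    int32_t frame_size;
    int32_t vbr;
    int32_t frames_per_packet;
    int32_t extra_headers;
    int32_t reserved1;
    int32_t reserved2;
} ParsedSpeexHeader;

/* Assemble a little-endian 32-bit integer independently of host byte order. */
int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Returns 0 on success, -1 if the packet is too short or the
   identification string is not "Speex" followed by 3 spaces. */
int parse_speex_header(const unsigned char *pkt, size_t len,
                       ParsedSpeexHeader *h)
{
    int32_t *dst[13] = {
        &h->speex_version_id, &h->header_size, &h->rate, &h->mode,
        &h->mode_bitstream_version, &h->nb_channels, &h->bitrate,
        &h->frame_size, &h->vbr, &h->frames_per_packet,
        &h->extra_headers, &h->reserved1, &h->reserved2
    };
    if (len < SPEEX_HEADER_BYTES || memcmp(pkt, "Speex   ", 8) != 0)
        return -1;
    memcpy(h->speex_string, pkt, 8);
    memcpy(h->speex_version, pkt + 8, 20);
    /* The 13 integer fields follow the two strings, 4 bytes each. */
    for (int i = 0; i < 13; i++)
        *dst[i] = read_le32(pkt + 28 + 4 * i);
    return 0;
}
```

Reading the integers byte by byte avoids any dependence on the host's endianness or struct padding, which is why the sketch does not simply cast the packet to a struct.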
  4714. \begin_layout Standard
  4715. \begin_inset ERT
  4716. status collapsed
  4717. \begin_layout Plain Layout
  4718. \backslash
  4719. clearpage
  4720. \end_layout
  4721. \end_inset
  4722. \end_layout
  4723. \begin_layout Chapter
  4724. Introduction to CELP Coding
  4725. \begin_inset Index
  4726. status collapsed
  4727. \begin_layout Plain Layout
  4728. CELP
  4729. \end_layout
  4730. \end_inset
  4731. \begin_inset CommandInset label
  4732. LatexCommand label
  4733. name "sec:Introduction-to-CELP"
  4734. \end_inset
  4735. \end_layout
  4736. \begin_layout Quote
  4737. \align center
  4738. \emph on
  4739. Do not meddle in the affairs of poles, for they are subtle and quick to
  4740. leave the unit circle.
  4741. \end_layout
  4742. \begin_layout Standard
  4743. Speex is based on CELP, which stands for Code Excited Linear Prediction.
  4744. This section attempts to introduce the principles behind CELP, so if you
  4745. are already familiar with CELP, you can safely skip to section
  4746. \begin_inset CommandInset ref
  4747. LatexCommand ref
  4748. reference "sec:Speex-narrowband-mode"
  4749. \end_inset
  4750. .
  4751. The CELP technique is based on three ideas:
  4752. \end_layout
  4753. \begin_layout Enumerate
  4754. The use of a linear prediction (LP) model to model the vocal tract
  4755. \end_layout
  4756. \begin_layout Enumerate
  4757. The use of (adaptive and fixed) codebook entries as input (excitation) of
  4758. the LP model
  4759. \end_layout
  4760. \begin_layout Enumerate
  4761. The search performed in closed-loop in a
  4762. \begin_inset Quotes eld
  4763. \end_inset
  4764. perceptually weighted domain
  4765. \begin_inset Quotes erd
  4766. \end_inset
  4767. \end_layout
  4768. \begin_layout Standard
  4769. This section describes the basic ideas behind CELP.
  4770. This is still a work in progress.
  4771. \end_layout
  4772. \begin_layout Section
  4773. Source-Filter Model of Speech Production
  4774. \end_layout
  4775. \begin_layout Standard
  4776. The source-filter model of speech production assumes that the vocal cords
  4777. are the source of spectrally flat sound (the excitation signal), and that
  4778. the vocal tract acts as a filter to spectrally shape the various sounds
  4779. of speech.
  4780. While still an approximation, the model is widely used in speech coding
  4781. because of its simplicity. Its use is also the reason why most speech codecs
  4782. (Speex included) perform badly on music signals.
  4783. The different phonemes can be distinguished by their excitation (source)
  4784. and spectral shape (filter).
  4785. Voiced sounds (e.g.
  4786. vowels) have an excitation signal that is periodic and that can be approximated
  4787. by an impulse train in the time domain or by regularly-spaced harmonics
  4788. in the frequency domain.
  4789. On the other hand, fricatives (such as the "s", "sh" and "f" sounds) have
  4790. an excitation signal that is similar to white Gaussian noise.
  4791. So-called voiced fricatives (such as "z" and "v") have an excitation signal
  4792. composed of a harmonic part and a noisy part.
  4793. \end_layout
  4794. \begin_layout Standard
  4795. The source-filter model is usually tied to the use of linear prediction.
  4796. The CELP model is based on the source-filter model, as can be seen from the
  4797. CELP decoder illustrated in Figure
  4798. \begin_inset CommandInset ref
  4799. LatexCommand ref
  4800. reference "fig:The-CELP-model"
  4801. \end_inset
  4802. .
  4803. \end_layout
  4804. \begin_layout Standard
  4805. \begin_inset Float figure
  4806. wide false
  4807. sideways false
  4808. status open
  4809. \begin_layout Plain Layout
  4810. \begin_inset ERT
  4811. status collapsed
  4812. \begin_layout Plain Layout
  4813. \backslash
  4814. begin{center}
  4815. \end_layout
  4816. \end_inset
  4817. \begin_inset Graphics
  4818. filename celp_decoder.eps
  4819. width 45page%
  4820. keepAspectRatio
  4821. \end_inset
  4822. \begin_inset ERT
  4823. status collapsed
  4824. \begin_layout Plain Layout
  4825. \backslash
  4826. end{center}
  4827. \end_layout
  4828. \end_inset
  4829. \end_layout
  4830. \begin_layout Plain Layout
  4831. \begin_inset Caption
  4832. \begin_layout Plain Layout
  4833. The CELP model of speech synthesis (decoder)
  4834. \begin_inset CommandInset label
  4835. LatexCommand label
  4836. name "fig:The-CELP-model"
  4837. \end_inset
  4838. \end_layout
  4839. \end_inset
  4840. \end_layout
  4841. \end_inset
  4842. \end_layout
  4843. \begin_layout Section
  4844. Linear Prediction Coefficients (LPC)
  4845. \begin_inset Index
  4846. status collapsed
  4847. \begin_layout Plain Layout
  4848. linear prediction
  4849. \end_layout
  4850. \end_inset
  4851. \end_layout
  4852. \begin_layout Standard
  4853. Linear prediction is at the base of many speech coding techniques, including
  4854. CELP.
  4855. The idea behind it is to predict the signal
  4856. \begin_inset Formula $x[n]$
  4857. \end_inset
  4858. using a linear combination of its past samples:
  4859. \end_layout
  4860. \begin_layout Standard
  4861. \begin_inset Formula \[
  4862. y[n]=\sum_{i=1}^{N}a_{i}x[n-i]\]
  4863. \end_inset
  4864. where
  4865. \begin_inset Formula $y[n]$
  4866. \end_inset
  4867. is the linear prediction of
  4868. \begin_inset Formula $x[n]$
  4869. \end_inset
  4870. .
  4871. The prediction error is thus given by:
  4872. \begin_inset Formula \[
  4873. e[n]=x[n]-y[n]=x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\]
  4874. \end_inset
  4875. \end_layout
  4876. \begin_layout Standard
  4877. The goal of the LPC analysis is to find the best prediction coefficients
  4878. \begin_inset Formula $a_{i}$
  4879. \end_inset
  4880. which minimize the quadratic error function:
  4881. \begin_inset Formula \[
  4882. E=\sum_{n=0}^{L-1}\left[e[n]\right]^{2}=\sum_{n=0}^{L-1}\left[x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\right]^{2}\]
  4883. \end_inset
  4884. That can be done by making all derivatives
  4885. \begin_inset Formula $\frac{\partial E}{\partial a_{i}}$
  4886. \end_inset
  4887. equal to zero:
  4888. \begin_inset Formula \[
  4889. \frac{\partial E}{\partial a_{i}}=\frac{\partial}{\partial a_{i}}\sum_{n=0}^{L-1}\left[x[n]-\sum_{j=1}^{N}a_{j}x[n-j]\right]^{2}=0\]
  4890. \end_inset
  4891. \end_layout
  4892. \begin_layout Standard
  4893. For an order
  4894. \begin_inset Formula $N$
  4895. \end_inset
  4896. filter, the filter coefficients
  4897. \begin_inset Formula $a_{i}$
  4898. \end_inset
  4899. are found by solving the
  4900. \begin_inset Formula $N\times N$
  4901. \end_inset
  4902. linear system
  4903. \begin_inset Formula $\mathbf{Ra}=\mathbf{r}$
  4904. \end_inset
  4905. , where
  4906. \begin_inset Formula \[
  4907. \mathbf{R}=\left[\begin{array}{cccc}
  4908. R(0) & R(1) & \cdots & R(N-1)\\
  4909. R(1) & R(0) & \cdots & R(N-2)\\
  4910. \vdots & \vdots & \ddots & \vdots\\
  4911. R(N-1) & R(N-2) & \cdots & R(0)\end{array}\right]\]
  4912. \end_inset
  4913. \begin_inset Formula \[
  4914. \mathbf{r}=\left[\begin{array}{c}
  4915. R(1)\\
  4916. R(2)\\
  4917. \vdots\\
  4918. R(N)\end{array}\right]\]
  4919. \end_inset
  4920. with
  4921. \begin_inset Formula $R(m)$
  4922. \end_inset
  4923. , the auto-correlation
  4924. \begin_inset Index
  4925. status collapsed
  4926. \begin_layout Plain Layout
  4927. auto-correlation
  4928. \end_layout
  4929. \end_inset
  4930. of the signal
  4931. \begin_inset Formula $x[n]$
  4932. \end_inset
  4933. , computed as:
  4934. \end_layout
  4935. \begin_layout Standard
  4936. \begin_inset Formula \[
  4937. R(m)=\sum_{i=0}^{L-1}x[i]x[i-m]\]
  4938. \end_inset
  4939. \end_layout
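The step from the zero-derivative condition to the linear system can be sketched as follows. The inner summation index is written j to keep it distinct from the index i of the coefficient being differentiated, and the last line assumes (as in the autocorrelation method) that the cross terms depend only on the lag difference.

```latex
\begin{align*}
\frac{\partial E}{\partial a_{i}}
  &= -2\sum_{n=0}^{L-1} x[n-i]\left(x[n]-\sum_{j=1}^{N}a_{j}x[n-j]\right)=0 \\
\Rightarrow\quad
  \sum_{j=1}^{N}a_{j}\sum_{n=0}^{L-1}x[n-i]\,x[n-j]
  &= \sum_{n=0}^{L-1}x[n]\,x[n-i], \qquad i=1,\ldots,N \\
\Rightarrow\quad
  \sum_{j=1}^{N}a_{j}\,R(|i-j|) &= R(i), \qquad i=1,\ldots,N
\end{align*}
```

The last line is row i of the system Ra = r, with R(|i-j|) as the entry in row i and column j of the matrix.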
  4940. \begin_layout Standard
  4941. Because
  4942. \begin_inset Formula $\mathbf{R}$
  4943. \end_inset
  4944. is Hermitian Toeplitz, the Levinson-Durbin
  4945. \begin_inset Index
  4946. status collapsed
  4947. \begin_layout Plain Layout
  4948. Levinson-Durbin
  4949. \end_layout
  4950. \end_inset
  4951. algorithm can be used, making the solution to the problem
  4952. \begin_inset Formula $\mathcal{O}\left(N^{2}\right)$
  4953. \end_inset
  4954. instead of
  4955. \begin_inset Formula $\mathcal{O}\left(N^{3}\right)$
  4956. \end_inset
  4957. .
  4958. Also, it can be proven that all the roots of the prediction error filter
  4959. \begin_inset Formula $A(z)=1-\sum_{i=1}^{N}a_{i}z^{-i}$
  4960. \end_inset
  4961. are within the unit circle, which means that
  4962. \begin_inset Formula $1/A(z)$
  4963. \end_inset
  4964. is always stable.
  4965. This holds in theory; in practice, because of finite precision, two techniques
  4966. are commonly used to make sure the filter is stable.
  4967. First, we multiply
  4968. \begin_inset Formula $R(0)$
  4969. \end_inset
  4970. by a number slightly above one (such as 1.0001), which is equivalent to
  4971. adding noise to the signal.
  4972. Second, we can apply a window to the auto-correlation, which is equivalent
  4973. to filtering in the frequency domain, reducing sharp resonances.
  4974. \end_layout
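\begin_layout Standard
The steps above can be sketched in a few lines of Python.
This is only an illustration, not the Speex implementation: it assumes the prediction convention x[n] ~ a_1 x[n-1] + ... + a_N x[n-N], so that the coefficients solve R a = r, and it treats samples outside the frame as zero.
The function names and the lag_window argument are hypothetical.
\end_layout

```python
def autocorrelation(x, order):
    """R(m) = sum_i x[i] * x[i-m] for m = 0..order (samples before the frame are zero)."""
    n = len(x)
    return [sum(x[i] * x[i - m] for i in range(m, n)) for m in range(order + 1)]


def condition(r, noise_floor=1.0001, lag_window=None):
    """Apply the two stabilising tricks: scale R(0), then optionally window the lags."""
    r = list(r)
    r[0] *= noise_floor  # equivalent to adding a little white noise to the signal
    if lag_window is not None:
        r = [ri * wi for ri, wi in zip(r, lag_window)]
    return r


def levinson_durbin(r, order):
    """Solve the Toeplitz system R a = r in O(order^2) operations.

    r holds R(0)..R(order). Returns (a, err) where a[j] is the coefficient
    a_(j+1) and err is the final prediction error energy.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Degenerate (for example silent) frame: pad with zeros and stop.
            a = a + [0.0] * (order - len(a))
            break
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err
```

\begin_layout Standard
A typical call would be levinson_durbin(condition(autocorrelation(frame, 10)), 10), where the order of 10 is only an example value.
\end_layout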
  4975. \begin_layout Section
  4976. Pitch Prediction
  4977. \begin_inset Index
  4978. status collapsed
  4979. \begin_layout Plain Layout
  4980. pitch
  4981. \end_layout
  4982. \end_inset
  4983. \end_layout
  4984. \begin_layout Standard
  4985. During voiced segments, the speech signal is periodic, so it is possible
  4986. to take advantage of that property by approximating the excitation signal
  4987. \begin_inset Formula $e[n]$
  4988. \end_inset
  4989. by a gain times the past of the excitation:
  4990. \end_layout
  4991. \begin_layout Standard
  4992. \begin_inset Formula \[
  4993. e[n]\simeq p[n]=\beta e[n-T]\ ,\]
  4994. \end_inset
  4995. where
  4996. \begin_inset Formula $T$
  4997. \end_inset
  4998. is the pitch period and
  4999. \begin_inset Formula $\beta$
  5000. \end_inset
  5001. is the pitch gain.
  5002. We call that long-term prediction since the excitation is predicted from
  5003. \begin_inset Formula $e[n-T]$
  5004. \end_inset
  5005. with
  5006. \begin_inset Formula $T\gg N$
  5007. \end_inset
  5008. .
  5009. \end_layout
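\begin_layout Standard
As an illustration of long-term prediction, the sketch below picks a pitch period T and a gain beta for one block of excitation.
It is a simple open-loop, least-squares estimate chosen for clarity, not the search that Speex performs, and the function names are hypothetical.
For a fixed T, the gain minimising the squared error between e[n] and beta e[n-T] is the ratio of their cross-correlation to the energy of the delayed signal.
\end_layout

```python
def correlation_terms(e, start, length, T):
    """Cross-correlation of e[n] with e[n-T] and energy of e[n-T] over one block."""
    num = sum(e[n] * e[n - T] for n in range(start, start + length))
    den = sum(e[n - T] ** 2 for n in range(start, start + length))
    return num, den


def pitch_gain(e, start, length, T):
    """Least-squares gain beta for the predictor p[n] = beta * e[n-T]."""
    num, den = correlation_terms(e, start, length, T)
    return num / den if den > 0.0 else 0.0


def find_pitch(e, start, length, t_min, t_max):
    """Return (T, beta) maximising the normalised correlation over t_min..t_max."""
    if start < t_max:
        raise ValueError("not enough past excitation for the largest lag")
    best_T, best_score = t_min, -1.0
    for T in range(t_min, t_max + 1):
        num, den = correlation_terms(e, start, length, T)
        # num^2 / den is the error energy removed by the best gain for this T.
        score = (num * num) / den if den > 0.0 and num > 0.0 else 0.0
        if score > best_score:
            best_T, best_score = T, score
    return best_T, pitch_gain(e, start, length, best_T)
```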
  5010. \begin_layout Section
  5011. Innovation Codebook
  5012. \end_layout
  5013. \begin_layout Standard
  5014. The final excitation
  5015. \begin_inset Formula $e[n]$
  5016. \end_inset
  5017. will be the sum of the pitch prediction and an
  5018. \emph on
  5019. innovation
  5020. \emph default
  5021. signal
  5022. \begin_inset Formula $c[n]$
  5023. \end_inset
  5024. taken from a fixed codebook, hence the name
  5025. \emph on
  5026. Code
  5027. \emph default
  5028. Excited Linear Prediction.
  5029. The final excitation is given by
  5030. \end_layout
  5031. \begin_layout Standard
  5032. \begin_inset Formula \[
  5033. e[n]=p[n]+c[n]=\beta e[n-T]+c[n]\ .\]
  5034. \end_inset
  5035. The quantization of
  5036. \begin_inset Formula $c[n]$
  5037. \end_inset
  5038. is where most of the bits in a CELP codec are allocated.
  5039. It represents the information that couldn't be obtained either from linear
  5040. prediction or pitch prediction.
  5041. In the
  5042. \emph on
  5043. z
  5044. \emph default
  5045. -domain we can represent the final signal
  5046. \begin_inset Formula $X(z)$
  5047. \end_inset
  5048. as
  5049. \begin_inset Formula \[
  5050. X(z)=\frac{C(z)}{A(z)\left(1-\beta z^{-T}\right)}\]
  5051. \end_inset
  5052. \end_layout
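\begin_layout Standard
The excitation equation can be read directly as a recursion.
The sketch below rebuilds one block of e[n] from the past excitation, a pitch period T, a gain beta and an innovation c[n]; the function name and argument layout are assumptions made for illustration.
When T is shorter than the block, the recursion reuses samples it has just produced, which matches the 1/(1 - beta z^-T) term in the z-domain expression.
\end_layout

```python
def build_excitation(history, c, beta, T):
    """Return e[n] = beta * e[n-T] + c[n] for one block.

    history holds the past excitation (most recent sample last) and must
    contain at least T samples.
    """
    if T < 1 or T > len(history):
        raise ValueError("pitch period must lie within the available history")
    e = list(history)
    base = len(e)
    for n, cn in enumerate(c):
        # e[base + n - T] is either a past sample or one generated earlier in this block.
        e.append(beta * e[base + n - T] + cn)
    return e[base:]
```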
  5053. \begin_layout Section
  5054. Noise Weighting
  5055. \begin_inset Index
  5056. status collapsed
  5057. \begin_layout Plain Layout
  5058. error weighting
  5059. \end_layout
  5060. \end_inset
  5061. \begin_inset Index
  5062. status collapsed
  5063. \begin_layout Plain Layout
  5064. analysis-by-synthesis
  5065. \end_layout
  5066. \end_inset
  5067. \end_layout
  5068. \begin_layout Standard
  5069. Most (if not all) modern audio codecs attempt to
  5070. \begin_inset Quotes eld
  5071. \end_inset
  5072. shape
  5073. \begin_inset Quotes erd
  5074. \end_inset
  5075. the noise so that it appears mostly in the frequency regions where the
  5076. ear cannot detect it.
  5077. For example, the ear is more tolerant to noise in parts of the spectrum
  5078. that are louder and
  5079. \emph on
  5080. vice versa
  5081. \emph default
  5082. .
  5083. In order to maximize speech quality, CELP codecs minimize the mean square
  5084. of the error (noise) in the perceptually weighted domain.
  5085. This means that a perceptual noise weighting filter
  5086. \begin_inset Formula $W(z)$
  5087. \end_inset
  5088. is applied to the error signal in the encoder.
  5089. In most CELP codecs,
  5090. \begin_inset Formula $W(z)$
  5091. \end_inset
  5092. is a pole-zero weighting filter derived from the linear prediction coefficients
  5093. (LPC), generally using bandwidth expansion.
  5094. With the spectral envelope represented by the synthesis filter
  5095. \begin_inset Formula $1/A(z)$
  5096. \end_inset
  5097. , CELP codecs typically derive the noise weighting filter as
  5098. \begin_inset Formula \begin{equation}
  5099. W(z)=\frac{A(z/\gamma_{1})}{A(z/\gamma_{2})}\ ,\label{eq:gamma-weighting}\end{equation}
  5100. \end_inset
  5101. where
  5102. \begin_inset Formula $\gamma_{1}=0.9$
  5103. \end_inset
  5104. and
  5105. \begin_inset Formula $\gamma_{2}=0.6$
  5106. \end_inset
  5107. in the Speex reference implementation.
  5108. If a filter
  5109. \begin_inset Formula $A(z)$
  5110. \end_inset
  5111. has (complex) poles at
  5112. \begin_inset Formula $p_{i}$
  5113. \end_inset
  5114. in the
  5115. \begin_inset Formula $z$
  5116. \end_inset
  5117. -plane, the filter
  5118. \begin_inset Formula $A(z/\gamma)$
  5119. \end_inset
  5120. will have its poles at
  5121. \begin_inset Formula $p'_{i}=\gamma p_{i}$
  5122. \end_inset
  5123. , making it a flatter version of
  5124. \begin_inset Formula $A(z)$
  5125. \end_inset
  5126. .
  5127. \end_layout
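\begin_layout Standard
Bandwidth expansion is cheap to compute: replacing z by z/gamma multiplies the coefficient of z^-k by gamma^k.
The sketch below builds the numerator and denominator of W(z) this way and applies the filter in direct form.
It assumes A(z) is given as a coefficient list [1, c_1, ..., c_N] for the powers z^0, z^-1, ..., z^-N; the function names are hypothetical and the code is not taken from the Speex sources.
\end_layout

```python
def bandwidth_expand(a_poly, gamma):
    """Coefficients of A(z/gamma): the z^-k coefficient is scaled by gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a_poly)]


def weighting_filter(x, a_poly, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2), starting from zero state."""
    if a_poly[0] != 1.0:
        raise ValueError("A(z) is expected to have a leading coefficient of 1")
    num = bandwidth_expand(a_poly, gamma1)
    den = bandwidth_expand(a_poly, gamma2)
    y = []
    for n in range(len(x)):
        # Feed-forward part: A(z/gamma1) applied to the input.
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        # Feedback part: 1 / A(z/gamma2), using previously computed outputs.
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y
```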
  5128. \begin_layout Standard
  5129. The weighting filter is applied to the error signal used to optimize the
  5130. codebook search through analysis-by-synthesis (AbS).
  5131. This results in a spectral shape of the noise that tends towards
  5132. \begin_inset Formula $1/W(z)$
  5133. \end_inset
  5134. .
  5135. While the simplicity of the model has been an important reason for the
  5136. success of CELP, it remains that
  5137. \begin_inset Formula $W(z)$
  5138. \end_inset
  5139. is a very rough approximation for the perceptually optimal noise weighting
  5140. function.
  5141. Fig.
  5142. \begin_inset CommandInset ref
  5143. LatexCommand ref
  5144. reference "cap:Standard-noise-shaping"
  5145. \end_inset
  5146. illustrates the noise shaping that results from Eq.
  5147. \begin_inset CommandInset ref
  5148. LatexCommand ref
  5149. reference "eq:gamma-weighting"
  5150. \end_inset
  5151. .
  5152. Throughout this manual, we refer to
  5153. \begin_inset Formula $W(z)$
  5154. \end_inset
  5155. as the noise weighting filter and to
  5156. \begin_inset Formula $1/W(z)$
  5157. \end_inset
  5158. as the noise shaping filter (or curve).
  5159. \end_layout
  5160. \begin_layout Standard
  5161. \begin_inset Float figure
  5162. wide false
  5163. sideways false
  5164. status open
  5165. \begin_layout Plain Layout
  5166. \begin_inset ERT
  5167. status collapsed
  5168. \begin_layout Plain Layout
  5169. \backslash
  5170. begin{center}
  5171. \end_layout
  5172. \end_inset
  5173. \begin_inset Graphics
  5174. filename ref_shaping.eps
  5175. width 45page%
  5176. keepAspectRatio
  5177. \end_inset
  5178. \begin_inset ERT
  5179. status collapsed
  5180. \begin_layout Plain Layout
  5181. \backslash
  5182. end{center}
  5183. \end_layout
  5184. \end_inset
  5185. \end_layout
  5186. \begin_layout Plain Layout
  5187. \begin_inset Caption
  5188. \begin_layout Plain Layout
  5189. Standard noise shaping in CELP.
  5190. Arbitrary y-axis offset.
  5191. \begin_inset CommandInset label
  5192. LatexCommand label
  5193. name "cap:Standard-noise-shaping"
  5194. \end_inset
  5195. \end_layout
  5196. \end_inset
  5197. \end_layout
  5198. \end_inset
  5199. \end_layout
  5200. \begin_layout Section
  5201. Analysis-by-Synthesis
  5202. \end_layout
  5203. \begin_layout Standard
  5204. One of the main principles behind CELP is called Analysis-by-Synthesis (AbS),
  5205. meaning that the encoding (analysis) is performed by perceptually optimising
  5206. the decoded (synthesis) signal in a closed loop.
  5207. In theory, the best CELP stream would be produced by trying all possible
  5208. bit combinations and selecting the one that produces the best-sounding
  5209. decoded signal.
  5210. This is obviously not possible in practice for two reasons: the required
  5211. complexity is beyond any currently available hardware and the
  5212. \begin_inset Quotes eld
  5213. \end_inset
  5214. best sounding
  5215. \begin_inset Quotes erd
  5216. \end_inset
  5217. selection criterion implies a human listener.
  5218. \end_layout
  5219. \begin_layout Standard
  5220. In order to achieve real-time encoding using limited computing resources,
  5221. the CELP optimisation is broken down into smaller, more manageable, sequential
  5222. searches using the perceptual weighting function described earlier.
  5223. \end_layout
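\begin_layout Standard
One of these smaller searches can be sketched as follows.
Given a target signal in the weighted domain, each codebook entry is passed through the impulse response h of the weighted synthesis filter, its optimal gain is computed in closed form, and the entry with the smallest squared error is kept.
This is a generic CELP-style search written for illustration under those assumptions; the names are hypothetical and it is not the search used by the Speex encoder.
\end_layout

```python
def convolve_truncated(c, h):
    """Zero-state response of impulse response h to input c, truncated to len(c)."""
    return [
        sum(c[k] * h[i - k] for k in range(i + 1) if i - k < len(h))
        for i in range(len(c))
    ]


def search_codebook(target, codebook, h):
    """Return (index, gain, error) of the entry that best matches target."""
    best = (None, 0.0, float("inf"))
    for index, c in enumerate(codebook):
        y = convolve_truncated(c, h)  # synthesis step
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares gain for this entry
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))  # analysis step
        if err < best[2]:
            best = (index, gain, err)
    return best
```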
  5224. \begin_layout Standard
  5225. \begin_inset Newpage newpage
  5226. \end_inset
  5227. \end_layout
  5228. \begin_layout Chapter
  5229. The Speex Decoder Specification
  5230. \end_layout
  5231. \begin_layout Section
  5232. Narrowband decoder
  5233. \end_layout
  5234. \begin_layout Standard
  5235. <Insert decoder figure here>
  5236. \end_layout
  5237. \begin_layout Subsection
  5238. Narrowband modes
  5239. \end_layout
  5240. \begin_layout Standard
  5241. There are 9 different narrowband modes defined for Speex (numbered 0 to 8),
  5242. with bit-rates ranging from 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used
  5243. for speech.
  5244. The bit-allocation for each mode is detailed in table
  5245. \begin_inset CommandInset ref
  5246. LatexCommand ref
  5247. reference "cap:bits-narrowband"
  5248. \end_inset
  5249. .
  5250. Each frame starts with a wideband bit, followed by the mode ID encoded with 4 bits, which allows a range
  5251. from 0 to 15, though only the values 0 to 8 are used (the others are reserved).
  5252. The parameters are listed in the table in the order they are packed in
  5253. the bit-stream.
  5254. All frame-based parameters are packed before sub-frame parameters.
  5255. The parameters for a certain sub-frame are all packed before the following
  5256. sub-frame is packed.
  5257. The
  5258. \begin_inset Quotes eld
  5259. \end_inset
  5260. OL
  5261. \begin_inset Quotes erd
  5262. \end_inset
  5263. in the parameter description means that the parameter is an open loop
  5264. estimation based on the whole frame.
  5265. \end_layout
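\begin_layout Standard
Reading the start of a frame can be sketched as below, following the order of the first two table rows: a 1-bit wideband flag, then the 4-bit mode ID.
The BitReader class is a hypothetical helper, not part of the Speex API, and the sketch assumes that bits are packed most-significant bit first within each byte.
\end_layout

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative helper)."""

    def __init__(self, data):
        self.data = bytes(data)
        self.pos = 0  # position in bits from the start of the buffer

    def read(self, nbits):
        """Read nbits as an unsigned integer, most significant bit first."""
        if self.pos + nbits > 8 * len(self.data):
            raise ValueError("not enough bits left in the buffer")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_frame_header(reader):
    """Return (wideband_bit, mode_id) from the start of a frame."""
    wideband_bit = reader.read(1)
    mode_id = reader.read(4)  # values 0..15 are representable
    return wideband_bit, mode_id
```

\begin_layout Standard
For example, the byte 0x18 (binary 00011000) yields a wideband bit of 0 and a mode ID of 3.
\end_layout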
  5266. \begin_layout Standard
  5267. \begin_inset Float table
  5268. placement h
  5269. wide true
  5270. sideways false
  5271. status open
  5272. \begin_layout Plain Layout
  5273. \begin_inset ERT
  5274. status collapsed
  5275. \begin_layout Plain Layout
  5276. \backslash
  5277. begin{center}
  5278. \end_layout
  5279. \end_inset
  5280. \begin_inset Tabular
  5281. <lyxtabular version="3" rows="12" columns="11">
  5282. <features>
  5283. <column alignment="center" valignment="top" width="0pt">
  5284. <column alignment="center" valignment="top" width="0pt">
  5285. <column alignment="center" valignment="top" width="0pt">
  5286. <column alignment="center" valignment="top" width="0pt">
  5287. <column alignment="center" valignment="top" width="0pt">
  5288. <column alignment="center" valignment="top" width="0pt">
  5289. <column alignment="center" valignment="top" width="0pt">
  5290. <column alignment="center" valignment="top" width="0pt">
  5291. <column alignment="center" valignment="top" width="0pt">
  5292. <column alignment="center" valignment="top" width="0pt">
  5293. <column alignment="center" valignment="top" width="0pt">
  5294. <row>
  5295. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5296. \begin_inset Text
  5297. \begin_layout Plain Layout
  5298. Parameter
  5299. \end_layout
  5300. \end_inset
  5301. </cell>
  5302. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5303. \begin_inset Text
  5304. \begin_layout Plain Layout
  5305. Update rate
  5306. \end_layout
  5307. \end_inset
  5308. </cell>
  5309. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5310. \begin_inset Text
  5311. \begin_layout Plain Layout
  5312. 0
  5313. \end_layout
  5314. \end_inset
  5315. </cell>
  5316. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5317. \begin_inset Text
  5318. \begin_layout Plain Layout
  5319. 1
  5320. \end_layout
  5321. \end_inset
  5322. </cell>
  5323. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5324. \begin_inset Text
  5325. \begin_layout Plain Layout
  5326. 2
  5327. \end_layout
  5328. \end_inset
  5329. </cell>
  5330. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5331. \begin_inset Text
  5332. \begin_layout Plain Layout
  5333. 3
  5334. \end_layout
  5335. \end_inset
  5336. </cell>
  5337. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5338. \begin_inset Text
  5339. \begin_layout Plain Layout
  5340. 4
  5341. \end_layout
  5342. \end_inset
  5343. </cell>
  5344. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5345. \begin_inset Text
  5346. \begin_layout Plain Layout
  5347. 5
  5348. \end_layout
  5349. \end_inset
  5350. </cell>
  5351. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5352. \begin_inset Text
  5353. \begin_layout Plain Layout
  5354. 6
  5355. \end_layout
  5356. \end_inset
  5357. </cell>
  5358. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  5359. \begin_inset Text
  5360. \begin_layout Plain Layout
  5361. 7
  5362. \end_layout
  5363. \end_inset
  5364. </cell>
  5365. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  5366. \begin_inset Text
  5367. \begin_layout Plain Layout
  5368. 8
  5369. \end_layout
  5370. \end_inset
  5371. </cell>
  5372. </row>
  5373. <row>
  5374. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5375. \begin_inset Text
  5376. \begin_layout Plain Layout
  5377. Wideband bit
  5378. \end_layout
  5379. \end_inset
  5380. </cell>
  5381. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5382. \begin_inset Text
  5383. \begin_layout Plain Layout
  5384. frame
  5385. \end_layout
  5386. \end_inset
  5387. </cell>
  5388. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5389. \begin_inset Text
  5390. \begin_layout Plain Layout
  5391. 1
  5392. \end_layout
  5393. \end_inset
  5394. </cell>
  5395. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5396. \begin_inset Text
  5397. \begin_layout Plain Layout
  5398. 1
  5399. \end_layout
  5400. \end_inset
  5401. </cell>
  5402. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5403. \begin_inset Text
  5404. \begin_layout Plain Layout
  5405. 1
  5406. \end_layout
  5407. \end_inset
  5408. </cell>
  5409. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5410. \begin_inset Text
  5411. \begin_layout Plain Layout
  5412. 1
  5413. \end_layout
  5414. \end_inset
  5415. </cell>
  5416. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5417. \begin_inset Text
  5418. \begin_layout Plain Layout
  5419. 1
  5420. \end_layout
  5421. \end_inset
  5422. </cell>
  5423. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5424. \begin_inset Text
  5425. \begin_layout Plain Layout
  5426. 1
  5427. \end_layout
  5428. \end_inset
  5429. </cell>
  5430. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5431. \begin_inset Text
  5432. \begin_layout Plain Layout
  5433. 1
  5434. \end_layout
  5435. \end_inset
  5436. </cell>
  5437. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5438. \begin_inset Text
  5439. \begin_layout Plain Layout
  5440. 1
  5441. \end_layout
  5442. \end_inset
  5443. </cell>
  5444. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5445. \begin_inset Text
  5446. \begin_layout Plain Layout
  5447. 1
  5448. \end_layout
  5449. \end_inset
  5450. </cell>
  5451. </row>
  5452. <row>
  5453. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5454. \begin_inset Text
  5455. \begin_layout Plain Layout
  5456. Mode ID
  5457. \end_layout
  5458. \end_inset
  5459. </cell>
  5460. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5461. \begin_inset Text
  5462. \begin_layout Plain Layout
  5463. frame
  5464. \end_layout
  5465. \end_inset
  5466. </cell>
  5467. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5468. \begin_inset Text
  5469. \begin_layout Plain Layout
  5470. 4
  5471. \end_layout
  5472. \end_inset
  5473. </cell>
  5474. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5475. \begin_inset Text
  5476. \begin_layout Plain Layout
  5477. 4
  5478. \end_layout
  5479. \end_inset
  5480. </cell>
  5481. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5482. \begin_inset Text
  5483. \begin_layout Plain Layout
  5484. 4
  5485. \end_layout
  5486. \end_inset
  5487. </cell>
  5488. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5489. \begin_inset Text
  5490. \begin_layout Plain Layout
  5491. 4
  5492. \end_layout
  5493. \end_inset
  5494. </cell>
  5495. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5496. \begin_inset Text
  5497. \begin_layout Plain Layout
  5498. 4
  5499. \end_layout
  5500. \end_inset
  5501. </cell>
  5502. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5503. \begin_inset Text
  5504. \begin_layout Plain Layout
  5505. 4
  5506. \end_layout
  5507. \end_inset
  5508. </cell>
  5509. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5510. \begin_inset Text
  5511. \begin_layout Plain Layout
  5512. 4
  5513. \end_layout
  5514. \end_inset
  5515. </cell>
  5516. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5517. \begin_inset Text
  5518. \begin_layout Plain Layout
  5519. 4
  5520. \end_layout
  5521. \end_inset
  5522. </cell>
  5523. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5524. \begin_inset Text
  5525. \begin_layout Plain Layout
  5526. 4
  5527. \end_layout
  5528. \end_inset
  5529. </cell>
  5530. </row>
  5531. <row>
  5532. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5533. \begin_inset Text
  5534. \begin_layout Plain Layout
  5535. LSP
  5536. \end_layout
  5537. \end_inset
  5538. </cell>
  5539. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5540. \begin_inset Text
  5541. \begin_layout Plain Layout
  5542. frame
  5543. \end_layout
  5544. \end_inset
  5545. </cell>
  5546. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5547. \begin_inset Text
  5548. \begin_layout Plain Layout
  5549. 0
  5550. \end_layout
  5551. \end_inset
  5552. </cell>
  5553. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5554. \begin_inset Text
  5555. \begin_layout Plain Layout
  5556. 18
  5557. \end_layout
  5558. \end_inset
  5559. </cell>
  5560. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5561. \begin_inset Text
  5562. \begin_layout Plain Layout
  5563. 18
  5564. \end_layout
  5565. \end_inset
  5566. </cell>
  5567. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5568. \begin_inset Text
  5569. \begin_layout Plain Layout
  5570. 18
  5571. \end_layout
  5572. \end_inset
  5573. </cell>
  5574. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5575. \begin_inset Text
  5576. \begin_layout Plain Layout
  5577. 18
  5578. \end_layout
  5579. \end_inset
  5580. </cell>
  5581. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5582. \begin_inset Text
  5583. \begin_layout Plain Layout
  5584. 30
  5585. \end_layout
  5586. \end_inset
  5587. </cell>
  5588. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5589. \begin_inset Text
  5590. \begin_layout Plain Layout
  5591. 30
  5592. \end_layout
  5593. \end_inset
  5594. </cell>
  5595. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5596. \begin_inset Text
  5597. \begin_layout Plain Layout
  5598. 30
  5599. \end_layout
  5600. \end_inset
  5601. </cell>
  5602. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5603. \begin_inset Text
  5604. \begin_layout Plain Layout
  5605. 18
  5606. \end_layout
  5607. \end_inset
  5608. </cell>
  5609. </row>
  5610. <row>
  5611. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5612. \begin_inset Text
  5613. \begin_layout Plain Layout
  5614. OL pitch
  5615. \end_layout
  5616. \end_inset
  5617. </cell>
  5618. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5619. \begin_inset Text
  5620. \begin_layout Plain Layout
  5621. frame
  5622. \end_layout
  5623. \end_inset
  5624. </cell>
  5625. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5626. \begin_inset Text
  5627. \begin_layout Plain Layout
  5628. 0
  5629. \end_layout
  5630. \end_inset
  5631. </cell>
  5632. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5633. \begin_inset Text
  5634. \begin_layout Plain Layout
  5635. 7
  5636. \end_layout
  5637. \end_inset
  5638. </cell>
  5639. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5640. \begin_inset Text
  5641. \begin_layout Plain Layout
  5642. 7
  5643. \end_layout
  5644. \end_inset
  5645. </cell>
  5646. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5647. \begin_inset Text
  5648. \begin_layout Plain Layout
  5649. 0
  5650. \end_layout
  5651. \end_inset
  5652. </cell>
  5653. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5654. \begin_inset Text
  5655. \begin_layout Plain Layout
  5656. 0
  5657. \end_layout
  5658. \end_inset
  5659. </cell>
  5660. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5661. \begin_inset Text
  5662. \begin_layout Plain Layout
  5663. 0
  5664. \end_layout
  5665. \end_inset
  5666. </cell>
  5667. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5668. \begin_inset Text
  5669. \begin_layout Plain Layout
  5670. 0
  5671. \end_layout
  5672. \end_inset
  5673. </cell>
  5674. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5675. \begin_inset Text
  5676. \begin_layout Plain Layout
  5677. 0
  5678. \end_layout
  5679. \end_inset
  5680. </cell>
  5681. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5682. \begin_inset Text
  5683. \begin_layout Plain Layout
  5684. 7
  5685. \end_layout
  5686. \end_inset
  5687. </cell>
  5688. </row>
  5689. <row>
  5690. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5691. \begin_inset Text
  5692. \begin_layout Plain Layout
  5693. OL pitch gain
  5694. \end_layout
  5695. \end_inset
  5696. </cell>
  5697. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5698. \begin_inset Text
  5699. \begin_layout Plain Layout
  5700. frame
  5701. \end_layout
  5702. \end_inset
  5703. </cell>
  5704. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5705. \begin_inset Text
  5706. \begin_layout Plain Layout
  5707. 0
  5708. \end_layout
  5709. \end_inset
  5710. </cell>
  5711. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5712. \begin_inset Text
  5713. \begin_layout Plain Layout
  5714. 4
  5715. \end_layout
  5716. \end_inset
  5717. </cell>
  5718. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5719. \begin_inset Text
  5720. \begin_layout Plain Layout
  5721. 0
  5722. \end_layout
  5723. \end_inset
  5724. </cell>
  5725. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5726. \begin_inset Text
  5727. \begin_layout Plain Layout
  5728. 0
  5729. \end_layout
  5730. \end_inset
  5731. </cell>
  5732. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5733. \begin_inset Text
  5734. \begin_layout Plain Layout
  5735. 0
  5736. \end_layout
  5737. \end_inset
  5738. </cell>
  5739. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5740. \begin_inset Text
  5741. \begin_layout Plain Layout
  5742. 0
  5743. \end_layout
  5744. \end_inset
  5745. </cell>
  5746. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5747. \begin_inset Text
  5748. \begin_layout Plain Layout
  5749. 0
  5750. \end_layout
  5751. \end_inset
  5752. </cell>
  5753. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5754. \begin_inset Text
  5755. \begin_layout Plain Layout
  5756. 0
  5757. \end_layout
  5758. \end_inset
  5759. </cell>
  5760. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5761. \begin_inset Text
  5762. \begin_layout Plain Layout
  5763. 4
  5764. \end_layout
  5765. \end_inset
  5766. </cell>
  5767. </row>
  5768. <row>
  5769. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5770. \begin_inset Text
  5771. \begin_layout Plain Layout
  5772. OL Exc gain
  5773. \end_layout
  5774. \end_inset
  5775. </cell>
  5776. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5777. \begin_inset Text
  5778. \begin_layout Plain Layout
  5779. frame
  5780. \end_layout
  5781. \end_inset
  5782. </cell>
  5783. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5784. \begin_inset Text
  5785. \begin_layout Plain Layout
  5786. 0
  5787. \end_layout
  5788. \end_inset
  5789. </cell>
  5790. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5791. \begin_inset Text
  5792. \begin_layout Plain Layout
  5793. 5
  5794. \end_layout
  5795. \end_inset
  5796. </cell>
  5797. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5798. \begin_inset Text
  5799. \begin_layout Plain Layout
  5800. 5
  5801. \end_layout
  5802. \end_inset
  5803. </cell>
  5804. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5805. \begin_inset Text
  5806. \begin_layout Plain Layout
  5807. 5
  5808. \end_layout
  5809. \end_inset
  5810. </cell>
  5811. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5812. \begin_inset Text
  5813. \begin_layout Plain Layout
  5814. 5
  5815. \end_layout
  5816. \end_inset
  5817. </cell>
  5818. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5819. \begin_inset Text
  5820. \begin_layout Plain Layout
  5821. 5
  5822. \end_layout
  5823. \end_inset
  5824. </cell>
  5825. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5826. \begin_inset Text
  5827. \begin_layout Plain Layout
  5828. 5
  5829. \end_layout
  5830. \end_inset
  5831. </cell>
  5832. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5833. \begin_inset Text
  5834. \begin_layout Plain Layout
  5835. 5
  5836. \end_layout
  5837. \end_inset
  5838. </cell>
  5839. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5840. \begin_inset Text
  5841. \begin_layout Plain Layout
  5842. 5
  5843. \end_layout
  5844. \end_inset
  5845. </cell>
  5846. </row>
  5847. <row>
  5848. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5849. \begin_inset Text
  5850. \begin_layout Plain Layout
  5851. Fine pitch
  5852. \end_layout
  5853. \end_inset
  5854. </cell>
  5855. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5856. \begin_inset Text
  5857. \begin_layout Plain Layout
  5858. sub-frame
  5859. \end_layout
  5860. \end_inset
  5861. </cell>
  5862. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5863. \begin_inset Text
  5864. \begin_layout Plain Layout
  5865. 0
  5866. \end_layout
  5867. \end_inset
  5868. </cell>
  5869. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5870. \begin_inset Text
  5871. \begin_layout Plain Layout
  5872. 0
  5873. \end_layout
  5874. \end_inset
  5875. </cell>
  5876. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5877. \begin_inset Text
  5878. \begin_layout Plain Layout
  5879. 0
  5880. \end_layout
  5881. \end_inset
  5882. </cell>
  5883. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5884. \begin_inset Text
  5885. \begin_layout Plain Layout
  5886. 7
  5887. \end_layout
  5888. \end_inset
  5889. </cell>
  5890. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5891. \begin_inset Text
  5892. \begin_layout Plain Layout
  5893. 7
  5894. \end_layout
  5895. \end_inset
  5896. </cell>
  5897. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5898. \begin_inset Text
  5899. \begin_layout Plain Layout
  5900. 7
  5901. \end_layout
  5902. \end_inset
  5903. </cell>
  5904. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5905. \begin_inset Text
  5906. \begin_layout Plain Layout
  5907. 7
  5908. \end_layout
  5909. \end_inset
  5910. </cell>
  5911. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5912. \begin_inset Text
  5913. \begin_layout Plain Layout
  5914. 7
  5915. \end_layout
  5916. \end_inset
  5917. </cell>
  5918. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5919. \begin_inset Text
  5920. \begin_layout Plain Layout
  5921. 0
  5922. \end_layout
  5923. \end_inset
  5924. </cell>
  5925. </row>
  5926. <row>
  5927. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5928. \begin_inset Text
  5929. \begin_layout Plain Layout
  5930. Pitch gain
  5931. \end_layout
  5932. \end_inset
  5933. </cell>
  5934. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5935. \begin_inset Text
  5936. \begin_layout Plain Layout
  5937. sub-frame
  5938. \end_layout
  5939. \end_inset
  5940. </cell>
  5941. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5942. \begin_inset Text
  5943. \begin_layout Plain Layout
  5944. 0
  5945. \end_layout
  5946. \end_inset
  5947. </cell>
  5948. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5949. \begin_inset Text
  5950. \begin_layout Plain Layout
  5951. 0
  5952. \end_layout
  5953. \end_inset
  5954. </cell>
  5955. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5956. \begin_inset Text
  5957. \begin_layout Plain Layout
  5958. 5
  5959. \end_layout
  5960. \end_inset
  5961. </cell>
  5962. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5963. \begin_inset Text
  5964. \begin_layout Plain Layout
  5965. 5
  5966. \end_layout
  5967. \end_inset
  5968. </cell>
  5969. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5970. \begin_inset Text
  5971. \begin_layout Plain Layout
  5972. 5
  5973. \end_layout
  5974. \end_inset
  5975. </cell>
  5976. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5977. \begin_inset Text
  5978. \begin_layout Plain Layout
  5979. 7
  5980. \end_layout
  5981. \end_inset
  5982. </cell>
  5983. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5984. \begin_inset Text
  5985. \begin_layout Plain Layout
  5986. 7
  5987. \end_layout
  5988. \end_inset
  5989. </cell>
  5990. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  5991. \begin_inset Text
  5992. \begin_layout Plain Layout
  5993. 7
  5994. \end_layout
  5995. \end_inset
  5996. </cell>
  5997. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  5998. \begin_inset Text
  5999. \begin_layout Plain Layout
  6000. 0
  6001. \end_layout
  6002. \end_inset
  6003. </cell>
  6004. </row>
  6005. <row>
  6006. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6007. \begin_inset Text
  6008. \begin_layout Plain Layout
  6009. Innovation gain
  6010. \end_layout
  6011. \end_inset
  6012. </cell>
  6013. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6014. \begin_inset Text
  6015. \begin_layout Plain Layout
  6016. sub-frame
  6017. \end_layout
  6018. \end_inset
  6019. </cell>
  6020. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6021. \begin_inset Text
  6022. \begin_layout Plain Layout
  6023. 0
  6024. \end_layout
  6025. \end_inset
  6026. </cell>
  6027. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6028. \begin_inset Text
  6029. \begin_layout Plain Layout
  6030. 1
  6031. \end_layout
  6032. \end_inset
  6033. </cell>
  6034. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6035. \begin_inset Text
  6036. \begin_layout Plain Layout
  6037. 0
  6038. \end_layout
  6039. \end_inset
  6040. </cell>
  6041. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6042. \begin_inset Text
  6043. \begin_layout Plain Layout
  6044. 1
  6045. \end_layout
  6046. \end_inset
  6047. </cell>
  6048. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6049. \begin_inset Text
  6050. \begin_layout Plain Layout
  6051. 1
  6052. \end_layout
  6053. \end_inset
  6054. </cell>
  6055. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6056. \begin_inset Text
  6057. \begin_layout Plain Layout
  6058. 3
  6059. \end_layout
  6060. \end_inset
  6061. </cell>
  6062. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6063. \begin_inset Text
  6064. \begin_layout Plain Layout
  6065. 3
  6066. \end_layout
  6067. \end_inset
  6068. </cell>
  6069. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6070. \begin_inset Text
  6071. \begin_layout Plain Layout
  6072. 3
  6073. \end_layout
  6074. \end_inset
  6075. </cell>
  6076. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  6077. \begin_inset Text
  6078. \begin_layout Plain Layout
  6079. 0
  6080. \end_layout
  6081. \end_inset
  6082. </cell>
  6083. </row>
  6084. <row>
  6085. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6086. \begin_inset Text
  6087. \begin_layout Plain Layout
  6088. Innovation VQ
  6089. \end_layout
  6090. \end_inset
  6091. </cell>
  6092. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6093. \begin_inset Text
  6094. \begin_layout Plain Layout
  6095. sub-frame
  6096. \end_layout
  6097. \end_inset
  6098. </cell>
  6099. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6100. \begin_inset Text
  6101. \begin_layout Plain Layout
  6102. 0
  6103. \end_layout
  6104. \end_inset
  6105. </cell>
  6106. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6107. \begin_inset Text
  6108. \begin_layout Plain Layout
  6109. 0
  6110. \end_layout
  6111. \end_inset
  6112. </cell>
  6113. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6114. \begin_inset Text
  6115. \begin_layout Plain Layout
  6116. 16
  6117. \end_layout
  6118. \end_inset
  6119. </cell>
  6120. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6121. \begin_inset Text
  6122. \begin_layout Plain Layout
  6123. 20
  6124. \end_layout
  6125. \end_inset
  6126. </cell>
  6127. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6128. \begin_inset Text
  6129. \begin_layout Plain Layout
  6130. 35
  6131. \end_layout
  6132. \end_inset
  6133. </cell>
  6134. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6135. \begin_inset Text
  6136. \begin_layout Plain Layout
  6137. 48
  6138. \end_layout
  6139. \end_inset
  6140. </cell>
  6141. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6142. \begin_inset Text
  6143. \begin_layout Plain Layout
  6144. 64
  6145. \end_layout
  6146. \end_inset
  6147. </cell>
  6148. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6149. \begin_inset Text
  6150. \begin_layout Plain Layout
  6151. 96
  6152. \end_layout
  6153. \end_inset
  6154. </cell>
  6155. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  6156. \begin_inset Text
  6157. \begin_layout Plain Layout
  6158. 10
  6159. \end_layout
  6160. \end_inset
  6161. </cell>
  6162. </row>
  6163. <row>
  6164. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6165. \begin_inset Text
  6166. \begin_layout Plain Layout
  6167. Total
  6168. \end_layout
  6169. \end_inset
  6170. </cell>
  6171. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6172. \begin_inset Text
  6173. \begin_layout Plain Layout
  6174. frame
  6175. \end_layout
  6176. \end_inset
  6177. </cell>
  6178. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6179. \begin_inset Text
  6180. \begin_layout Plain Layout
  6181. 5
  6182. \end_layout
  6183. \end_inset
  6184. </cell>
  6185. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6186. \begin_inset Text
  6187. \begin_layout Plain Layout
  6188. 43
  6189. \end_layout
  6190. \end_inset
  6191. </cell>
  6192. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6193. \begin_inset Text
  6194. \begin_layout Plain Layout
  6195. 119
  6196. \end_layout
  6197. \end_inset
  6198. </cell>
  6199. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6200. \begin_inset Text
  6201. \begin_layout Plain Layout
  6202. 160
  6203. \end_layout
  6204. \end_inset
  6205. </cell>
  6206. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6207. \begin_inset Text
  6208. \begin_layout Plain Layout
  6209. 220
  6210. \end_layout
  6211. \end_inset
  6212. </cell>
  6213. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6214. \begin_inset Text
  6215. \begin_layout Plain Layout
  6216. 300
  6217. \end_layout
  6218. \end_inset
  6219. </cell>
  6220. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6221. \begin_inset Text
  6222. \begin_layout Plain Layout
  6223. 364
  6224. \end_layout
  6225. \end_inset
  6226. </cell>
  6227. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6228. \begin_inset Text
  6229. \begin_layout Plain Layout
  6230. 492
  6231. \end_layout
  6232. \end_inset
  6233. </cell>
  6234. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  6235. \begin_inset Text
  6236. \begin_layout Plain Layout
  6237. 79
  6238. \end_layout
  6239. \end_inset
  6240. </cell>
  6241. </row>
  6242. </lyxtabular>
  6243. \end_inset
  6244. \begin_inset ERT
  6245. status collapsed
  6246. \begin_layout Plain Layout
  6247. \backslash
  6248. end{center}
  6249. \end_layout
  6250. \end_inset
  6251. \end_layout
  6252. \begin_layout Plain Layout
  6253. \begin_inset Caption
  6254. \begin_layout Plain Layout
  6255. Bit allocation for narrowband modes
  6256. \begin_inset CommandInset label
  6257. LatexCommand label
  6258. name "cap:bits-narrowband"
  6259. \end_inset
  6260. \end_layout
  6261. \end_inset
  6262. \end_layout
  6263. \end_inset
  6264. \end_layout
  6265. \begin_layout Subsection
  6266. LSP decoding
  6267. \end_layout
  6268. \begin_layout Standard
  6269. Depending on the mode, LSP parameters are encoded using either 18 bits or
  6270. 30 bits.
  6271. \end_layout
  6272. \begin_layout Standard
  6273. Interpolation
  6274. \end_layout
  6275. \begin_layout Standard
  6276. Safe margin
  6277. \end_layout
  6278. \begin_layout Subsection
  6279. Adaptive codebook
  6280. \end_layout
  6281. \begin_layout Standard
  6282. For rates of 8 kbit/s and above, the pitch period is encoded for each sub-frame.
  6283. The real period is
  6284. \begin_inset Formula $T=p_{i}+17$
  6285. \end_inset
  6286. where
  6287. \begin_inset Formula $p_{i}$
  6288. \end_inset
  6289. is a value encoded with 7 bits and 17 corresponds to the minimum pitch.
  6290. The maximum period is 144.
  6291. At 5.95 kbit/s (mode 2), the pitch period is similarly encoded, but only
  6292. once for the frame.
  6293. Each sub-frame then has a 2-bit offset that is added to the pitch value
  6294. of the frame.
  6295. In that case, the pitch for each sub-frame is equal to
  6296. \begin_inset Formula $T-1+offset$
  6297. \end_inset
  6298. .
  6299. For rates below 5.95 kbit/s, only the per-frame pitch is used and the pitch
  6300. is constant for all sub-frames.
  6301. \end_layout
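\begin_layout Standard
As a rough illustration of the pitch decoding rules above, the following Python sketch maps a 7-bit index to a pitch period and applies the per-sub-frame offsets used at 5.95 kbit/s.
The function names and the constant MIN_PITCH are hypothetical and are not taken from the Speex source.
\end_layout

```python
MIN_PITCH = 17  # minimum pitch period stated in the text


def pitch_from_index(p):
    """Map a 7-bit index p (0..127) to the pitch period T = p + 17."""
    if not 0 <= p <= 127:
        raise ValueError("pitch index must fit in 7 bits")
    return p + MIN_PITCH


def subframe_pitches_mode2(p, offsets):
    """Per-sub-frame pitch at 5.95 kbit/s: T - 1 + offset (2-bit offset)."""
    if any(not 0 <= off <= 3 for off in offsets):
        raise ValueError("each offset must fit in 2 bits")
    T = pitch_from_index(p)
    return [T - 1 + off for off in offsets]
```

\begin_layout Standard
With a 7-bit index the period covers the range 17 to 144, which matches the maximum period given above.
\end_layout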
  6302. \begin_layout Standard
  6303. Speex uses a 3-tap predictor for rates of 5.95 kbit/s and above.
  6304. The three gain values are obtained from a 5-bit or a 7-bit codebook, depending
  6305. on the mode.
  6306. \end_layout
  6307. \begin_layout Subsection
  6308. Innovation codebook
  6309. \end_layout
  6310. \begin_layout Standard
  6311. Split codebook, size and entries depend on bit-rate
  6312. \end_layout
  6313. \begin_layout Standard
  6314. A 5-bit gain is encoded on a per-frame basis
  6315. \end_layout
  6316. \begin_layout Standard
  6317. Depending on the mode, higher resolution per sub-frame
  6318. \end_layout
  6319. \begin_layout Standard
  6320. innovation sub-vectors concatenated, gain applied
  6321. \end_layout
  6322. \begin_layout Subsection
  6323. Perceptual enhancement
  6324. \end_layout
  6325. \begin_layout Standard
  6326. Optional, implementation-defined.
  6327. \end_layout
  6328. \begin_layout Subsection
  6329. Bit-stream definition
  6330. \end_layout
  6331. \begin_layout Standard
  6332. This section defines the bit-stream that is transmitted on the wire.
  6333. One Speex packet consists of 1 frame header and 4 sub-frames:
  6334. \end_layout
  6335. \begin_layout Standard
  6336. \begin_inset Tabular
  6337. <lyxtabular version="3" rows="1" columns="5">
  6338. <features>
  6339. <column alignment="center" valignment="top" width="0">
  6340. <column alignment="center" valignment="top" width="0">
  6341. <column alignment="center" valignment="top" width="0">
  6342. <column alignment="center" valignment="top" width="0">
  6343. <column alignment="center" valignment="top" width="0">
  6344. <row>
  6345. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6346. \begin_inset Text
  6347. \begin_layout Plain Layout
  6348. Frame Header
  6349. \end_layout
  6350. \end_inset
  6351. </cell>
  6352. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6353. \begin_inset Text
  6354. \begin_layout Plain Layout
  6355. Subframe 1
  6356. \end_layout
  6357. \end_inset
  6358. </cell>
  6359. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6360. \begin_inset Text
  6361. \begin_layout Plain Layout
  6362. Subframe 2
  6363. \end_layout
  6364. \end_inset
  6365. </cell>
  6366. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6367. \begin_inset Text
  6368. \begin_layout Plain Layout
  6369. Subframe 3
  6370. \end_layout
  6371. \end_inset
  6372. </cell>
  6373. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  6374. \begin_inset Text
  6375. \begin_layout Plain Layout
  6376. Subframe 4
  6377. \end_layout
  6378. \end_inset
  6379. </cell>
  6380. </row>
  6381. </lyxtabular>
  6382. \end_inset
  6383. \end_layout
  6384. \begin_layout Standard
  6385. The frame header has a variable length that depends on the decoding mode and submode.
  6386. The narrowband frame header is defined as follows:
  6387. \end_layout
  6388. \begin_layout Standard
  6389. \begin_inset Tabular
  6390. <lyxtabular version="3" rows="1" columns="6">
  6391. <features>
  6392. <column alignment="center" valignment="top" width="0">
  6393. <column alignment="center" valignment="top" width="0">
  6394. <column alignment="center" valignment="top" width="0">
  6395. <column alignment="center" valignment="top" width="0">
  6396. <column alignment="center" valignment="top" width="0">
  6397. <column alignment="center" valignment="top" width="0">
  6398. <row>
  6399. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6400. \begin_inset Text
  6401. \begin_layout Plain Layout
  6402. wb-bit
  6403. \end_layout
  6404. \end_inset
  6405. </cell>
  6406. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6407. \begin_inset Text
  6408. \begin_layout Plain Layout
  6409. modeid
  6410. \end_layout
  6411. \end_inset
  6412. </cell>
  6413. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6414. \begin_inset Text
  6415. \begin_layout Plain Layout
  6416. LSP
  6417. \end_layout
  6418. \end_inset
  6419. </cell>
  6420. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6421. \begin_inset Text
  6422. \begin_layout Plain Layout
  6423. OL-pitch
  6424. \end_layout
  6425. \end_inset
  6426. </cell>
  6427. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6428. \begin_inset Text
  6429. \begin_layout Plain Layout
  6430. OL-pitchgain
  6431. \end_layout
  6432. \end_inset
  6433. </cell>
  6434. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  6435. \begin_inset Text
  6436. \begin_layout Plain Layout
  6437. OL-ExcGain
  6438. \end_layout
  6439. \end_inset
  6440. </cell>
  6441. </row>
  6442. </lyxtabular>
  6443. \end_inset
  6444. \end_layout
  6445. \begin_layout Standard
  6446. wb-bit: Wideband bit (1 bit) 0=narrowband, 1=wideband
  6447. \end_layout
  6448. \begin_layout Standard
  6449. modeid: Mode identifier (4 bits)
  6450. \end_layout
  6451. \begin_layout Standard
  6452. LSP: Line Spectral Pairs (0, 18 or 30 bits)
  6453. \end_layout
  6454. \begin_layout Standard
  6455. OL-pitch: Open Loop Pitch (0 or 7 bits)
  6456. \end_layout
  6457. \begin_layout Standard
  6458. OL-pitchgain: Open Loop Pitch Gain (0 or 4 bits)
  6459. \end_layout
  6460. \begin_layout Standard
  6461. OL-ExcGain: Open Loop Excitation Gain (0 or 5 bits)
  6462. \end_layout
  6463. \begin_layout Standard
  6464. ...
  6465. \end_layout
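\begin_layout Standard
The first two header fields can be read with a small bit reader, as in the Python sketch below.
It assumes that bits are packed most-significant-bit first within each byte; that ordering, the BitReader class and the function name are assumptions made for illustration and are not defined by this section.
Only the fixed-width wb-bit and modeid fields are read, because the widths of the remaining fields depend on the mode.
\end_layout

```python
class BitReader:
    """Minimal reader that returns unsigned fields from a byte string.

    Assumption: bits are consumed MSB-first within each byte.
    """

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits from the start of the packet

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_header_prefix(packet):
    """Return (wb_bit, modeid): a 1-bit flag followed by a 4-bit mode id."""
    reader = BitReader(packet)
    wb_bit = reader.read(1)
    modeid = reader.read(4)
    return wb_bit, modeid
```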
  6466. \begin_layout Standard
  6467. Each subframe is defined as follows:
  6468. \end_layout
  6469. \begin_layout Standard
  6470. \begin_inset Tabular
  6471. <lyxtabular version="3" rows="1" columns="4">
  6472. <features>
  6473. <column alignment="center" valignment="top" width="0">
  6474. <column alignment="center" valignment="top" width="0">
  6475. <column alignment="center" valignment="top" width="0">
  6476. <column alignment="center" valignment="top" width="0">
  6477. <row>
  6478. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6479. \begin_inset Text
  6480. \begin_layout Plain Layout
  6481. FinePitch
  6482. \end_layout
  6483. \end_inset
  6484. </cell>
  6485. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6486. \begin_inset Text
  6487. \begin_layout Plain Layout
  6488. PitchGain
  6489. \end_layout
  6490. \end_inset
  6491. </cell>
  6492. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6493. \begin_inset Text
  6494. \begin_layout Plain Layout
  6495. InnovationGain
  6496. \end_layout
  6497. \end_inset
  6498. </cell>
  6499. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  6500. \begin_inset Text
  6501. \begin_layout Plain Layout
  6502. Innovation VQ
  6503. \end_layout
  6504. \end_inset
  6505. </cell>
  6506. </row>
  6507. </lyxtabular>
  6508. \end_inset
  6509. \end_layout
  6510. \begin_layout Standard
  6511. FinePitch: (0 or 7 bits)
  6512. \end_layout
  6513. \begin_layout Standard
  6514. PitchGain: (0, 5, or 7 bits)
  6515. \end_layout
  6516. \begin_layout Standard
  6517. InnovationGain: (0, 1, or 3 bits)
  6518. \end_layout
  6519. \begin_layout Standard
  6520. Innovation VQ: (0-96 bits)
  6521. \end_layout
  6522. \begin_layout Standard
  6523. ...
  6524. \end_layout
  6525. \begin_layout Subsection
  6526. Sample decoder
  6527. \end_layout
  6528. \begin_layout Standard
  6529. This section contains some sample source code, showing how a basic Speex
  6530. decoder can be implemented.
  6531. The sample decoder supports narrowband submode 3 only, with no advanced features
  6532. such as perceptual enhancement or VBR.
  6533. \end_layout
  6534. \begin_layout Standard
  6535. ...
  6536. \end_layout
  6537. \begin_layout Subsection
  6538. Lookup tables
  6539. \end_layout
  6540. \begin_layout Standard
  6541. The Speex decoder includes a set of lookup tables and codebooks, which are
  6542. used to convert between values of different domains.
  6543. This includes:
  6544. \end_layout
  6545. \begin_layout Standard
  6546. - Excitation 10x16 (3200 bps)
  6547. \end_layout
  6548. \begin_layout Standard
  6549. - Excitation 10x32 (4000 bps)
  6550. \end_layout
  6551. \begin_layout Standard
  6552. - Excitation 20x32 (2000 bps)
  6553. \end_layout
  6554. \begin_layout Standard
  6555. - Excitation 5x256 (12800 bps)
  6556. \end_layout
  6557. \begin_layout Standard
  6558. - Excitation 5x64 (9600 bps)
  6559. \end_layout
  6560. \begin_layout Standard
  6561. - Excitation 8x128 (7000 bps)
  6562. \end_layout
  6563. \begin_layout Standard
  6564. - Codebook for 3-tap pitch prediction gain (Normal and Low Bitrate)
  6565. \end_layout
  6566. \begin_layout Standard
  6567. - Codebook for LSPs in narrowband CELP mode
  6568. \end_layout
  6569. \begin_layout Standard
  6570. ...
  6571. \end_layout
  6572. \begin_layout Standard
  6573. The exact lookup tables are included here for reference.
  6574. \end_layout
  6575. \begin_layout Section
  6576. Wideband embedded decoder
  6577. \end_layout
  6578. \begin_layout Standard
  6579. QMF filter.
  6580. Narrowband signal decoded using narrowband decoder
  6581. \end_layout
  6582. \begin_layout Standard
  6583. For the high band, the decoder is similar to the narrowband decoder, with
  6584. the main difference being that there is no adaptive codebook.
  6585. \end_layout
  6586. \begin_layout Standard
  6587. Gain is per-subframe
  6588. \end_layout
  6589. \begin_layout Chapter
  6590. Speex narrowband mode
  6591. \begin_inset CommandInset label
  6592. LatexCommand label
  6593. name "sec:Speex-narrowband-mode"
  6594. \end_inset
  6595. \begin_inset Index
  6596. status collapsed
  6597. \begin_layout Plain Layout
  6598. narrowband
  6599. \end_layout
  6600. \end_inset
  6601. \end_layout
  6602. \begin_layout Standard
  6603. This section looks at how Speex works for narrowband (
  6604. \begin_inset Formula $8\:\mathrm{kHz}$
  6605. \end_inset
  6606. sampling rate) operation.
  6607. The frame size for this mode is
  6608. \begin_inset Formula $20\:\mathrm{ms}$
  6609. \end_inset
  6610. , corresponding to 160 samples.
  6611. Each frame is also subdivided into 4 sub-frames of 40 samples each.
  6612. \end_layout
  6613. \begin_layout Standard
  6614. Many design decisions were based on the original goals and assumptions:
  6615. \end_layout
  6616. \begin_layout Itemize
  6617. Minimizing the amount of information extracted from past frames (for robustness
  6618. to packet loss)
  6619. \end_layout
  6620. \begin_layout Itemize
  6621. Dynamically-selectable codebooks (LSP, pitch and innovation)
  6622. \end_layout
  6623. \begin_layout Itemize
  6624. Sub-vector fixed (innovation) codebooks
  6625. \end_layout
  6626. \begin_layout Section
  6627. Whole-Frame Analysis
  6628. \begin_inset Index
  6629. status collapsed
  6630. \begin_layout Plain Layout
  6631. linear prediction
  6632. \end_layout
  6633. \end_inset
  6634. \end_layout
  6635. \begin_layout Standard
  6636. In narrowband, Speex frames are 20 ms long (160 samples) and are subdivided
  6637. into 4 sub-frames of 5 ms each (40 samples).
  6638. For most narrowband bit-rates (8 kbps and above), the only parameters encoded
  6639. at the frame level are the Line Spectral Pairs (LSP) and a global excitation
  6640. gain
  6641. \begin_inset Formula $g_{frame}$
  6642. \end_inset
  6643. , as shown in Fig.
  6644. \begin_inset CommandInset ref
  6645. LatexCommand ref
  6646. reference "cap:Frame-open-loop-analysis"
  6647. \end_inset
  6648. .
  6649. All other parameters are encoded at the sub-frame level.
  6650. \end_layout
  6651. \begin_layout Standard
  6652. Linear prediction analysis is performed once per frame using an asymmetric
  6653. Hamming window centered on the fourth sub-frame.
  6654. Because linear prediction coefficients (LPC) are not robust to quantization,
  6655. they are first converted to line spectral pairs (LSP)
  6656. \begin_inset Index
  6657. status collapsed
  6658. \begin_layout Plain Layout
  6659. line spectral pair
  6660. \end_layout
  6661. \end_inset
  6662. .
  6663. The LSPs are considered to be associated with the
  6664. \begin_inset Formula $4^{th}$
  6665. \end_inset
  6666. sub-frame, and the LSPs associated with the first 3 sub-frames are linearly
  6667. interpolated using the current and previous LSP coefficients.
  6668. The LSP coefficients are then converted back to the LPC filter
  6669. \begin_inset Formula $\hat{A}(z)$
  6670. \end_inset
  6671. .
  6672. The non-quantized interpolated filter is denoted
  6673. \begin_inset Formula $A(z)$
  6674. \end_inset
  6675. and can be used for the weighting filter
  6676. \begin_inset Formula $W(z)$
  6677. \end_inset
  6678. because it does not need to be available to the decoder.
  6679. \end_layout
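\begin_layout Standard
The linear interpolation of the LSPs across sub-frames can be sketched in Python as follows.
The weight (k+1)/4 for sub-frame k is an assumption chosen so that the fourth sub-frame uses the current LSPs unchanged, as described above; the function name is hypothetical.
\end_layout

```python
def interpolate_lsp(prev_lsp, curr_lsp, subframe, n_subframes=4):
    """Linearly interpolate LSPs for sub-frame index 0..n_subframes-1.

    Assumed weighting: w = (subframe + 1) / n_subframes, so the last
    sub-frame is exactly the current frame's LSP vector.
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same order")
    w = (subframe + 1) / n_subframes
    return [(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)]
```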
  6680. \begin_layout Standard
  6681. To make Speex more robust to packet loss, no prediction is applied on the
  6682. LSP coefficients prior to quantization.
  6683. The LSPs are encoded using vector quantization (VQ) with 30 bits for higher
  6684. quality modes and 18 bits for lower quality.
  6685. \end_layout
  6686. \begin_layout Standard
  6687. \begin_inset Float figure
  6688. wide false
  6689. sideways false
  6690. status open
  6691. \begin_layout Plain Layout
  6692. \begin_inset ERT
  6693. status collapsed
  6694. \begin_layout Plain Layout
  6695. \backslash
  6696. begin{center}
  6697. \end_layout
  6698. \end_inset
  6699. \begin_inset Graphics
  6700. filename speex_analysis.eps
  6701. width 35page%
  6702. \end_inset
  6703. \begin_inset ERT
  6704. status collapsed
  6705. \begin_layout Plain Layout
  6706. \backslash
  6707. end{center}
  6708. \end_layout
  6709. \end_inset
  6710. \end_layout
  6711. \begin_layout Plain Layout
  6712. \begin_inset Caption
  6713. \begin_layout Plain Layout
  6714. Frame open-loop analysis
  6715. \begin_inset CommandInset label
  6716. LatexCommand label
  6717. name "cap:Frame-open-loop-analysis"
  6718. \end_inset
  6719. \end_layout
  6720. \end_inset
  6721. \end_layout
  6722. \end_inset
  6723. \end_layout
  6724. \begin_layout Section
  6725. Sub-Frame Analysis-by-Synthesis
  6726. \end_layout
  6727. \begin_layout Standard
  6728. \begin_inset Float figure
  6729. wide false
  6730. sideways false
  6731. status open
  6732. \begin_layout Plain Layout
  6733. \begin_inset ERT
  6734. status collapsed
  6735. \begin_layout Plain Layout
  6736. \backslash
  6737. begin{center}
  6738. \end_layout
  6739. \end_inset
  6740. \begin_inset Graphics
  6741. filename speex_abs.eps
  6742. lyxscale 75
  6743. width 40page%
  6744. \end_inset
  6745. \begin_inset ERT
  6746. status collapsed
  6747. \begin_layout Plain Layout
  6748. \backslash
  6749. end{center}
  6750. \end_layout
  6751. \end_inset
  6752. \end_layout
  6753. \begin_layout Plain Layout
  6754. \begin_inset Caption
  6755. \begin_layout Plain Layout
  6756. Analysis-by-synthesis closed-loop optimization on a sub-frame.
  6757. \begin_inset CommandInset label
  6758. LatexCommand label
  6759. name "cap:Sub-frame-AbS"
  6760. \end_inset
  6761. \end_layout
  6762. \end_inset
  6763. \end_layout
  6764. \end_inset
  6765. \end_layout
  6766. \begin_layout Standard
  6767. The analysis-by-synthesis (AbS) encoder loop is described in Fig.
  6768. \begin_inset CommandInset ref
  6769. LatexCommand ref
  6770. reference "cap:Sub-frame-AbS"
  6771. \end_inset
  6772. .
  6773. There are three main aspects where Speex significantly differs from most
  6774. other CELP codecs.
  6775. First, while most recent CELP codecs make use of fractional pitch estimation
  6776. with a single gain, Speex uses an integer to encode the pitch period, but
  6777. uses a 3-tap predictor (3 gains).
  6778. The adaptive codebook contribution
  6779. \begin_inset Formula $e_{a}[n]$
  6780. \end_inset
  6781. can thus be expressed as:
  6782. \begin_inset Formula \begin{equation}
  6783. e_{a}[n]=g_{0}e[n-T-1]+g_{1}e[n-T]+g_{2}e[n-T+1]\label{eq:adaptive-3tap}\end{equation}
  6784. \end_inset
  6785. where
  6786. \begin_inset Formula $g_{0}$
  6787. \end_inset
  6788. ,
  6789. \begin_inset Formula $g_{1}$
  6790. \end_inset
  6791. and
  6792. \begin_inset Formula $g_{2}$
  6793. \end_inset
  6794. are the jointly quantized pitch gains and
  6795. \begin_inset Formula $e[n]$
  6796. \end_inset
  6797. is the codec excitation memory.
  6798. It is worth noting that when the pitch is smaller than the sub-frame size,
  6799. we repeat the excitation at a period
  6800. \begin_inset Formula $T$
  6801. \end_inset
  6802. .
  6803. For example, when
  6804. \begin_inset Formula $n-T+1\geq0$
  6805. \end_inset
  6806. , we use
  6807. \begin_inset Formula $n-2T+1$
  6808. \end_inset
  6809. instead.
  6810. In most modes, the pitch period is encoded with 7 bits in the
  6811. \begin_inset Formula $\left[17,144\right]$
  6812. \end_inset
  6813. range and the
  6814. \begin_inset Formula $g_{i}$
  6815. \end_inset
  6816. coefficients are vector-quantized using 7 bits at higher bit-rates (15
  6817. kbps narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband
  6818. and below).
  6819. \end_layout
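\begin_layout Standard
The 3-tap adaptive codebook contribution can be sketched in Python as shown below.
The list past holds the excitation memory, with its last element being e[-1]; any index that would fall inside the current sub-frame is moved back by whole periods T until it points into the memory.
This is a simplified illustration with hypothetical names, not the reference implementation.
\end_layout

```python
def adaptive_contribution(past, T, gains, n_sub=40):
    """Compute e_a[n] = g0*e[n-T-1] + g1*e[n-T] + g2*e[n-T+1] for one sub-frame.

    past  : excitation memory, past[-1] is e[-1] (needs at least T+1 samples)
    T     : integer pitch period
    gains : (g0, g1, g2), the jointly quantized pitch gains
    """
    if len(past) < T + 1:
        raise ValueError("excitation memory is too short for this pitch")
    g0, g1, g2 = gains

    def e(m):
        # Indices >= 0 lie in the current sub-frame: repeat at period T.
        while m >= 0:
            m -= T
        return past[m]

    return [g0 * e(n - T - 1) + g1 * e(n - T) + g2 * e(n - T + 1)
            for n in range(n_sub)]
```

\begin_layout Standard
With gains (0, 1, 0) and a period of 20, the output simply repeats the last 20 samples of the memory twice, which makes the periodic extension easy to check.
\end_layout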
  6820. \begin_layout Standard
  6821. Many current CELP codecs use moving average (MA) prediction to encode the
  6822. fixed codebook gain.
  6823. This provides slightly better coding at the expense of introducing a dependency
  6824. on previously encoded frames.
  6825. A second difference is that Speex encodes the fixed codebook gain as the
  6826. product of the global excitation gain
  6827. \begin_inset Formula $g_{frame}$
  6828. \end_inset
  6829. with a sub-frame gain correction
  6830. \begin_inset Formula $g_{subf}$
  6831. \end_inset
  6832. .
  6833. This increases robustness to packet loss by eliminating the inter-frame
  6834. dependency.
  6835. The sub-frame gain correction is encoded before the fixed codebook is searched
  6836. (not closed-loop optimized) and uses between 0 and 3 bits per sub-frame,
  6837. depending on the bit-rate.
  6838. \end_layout
  6839. \begin_layout Standard
  6840. The third difference is that Speex uses sub-vector quantization of the
  6841. innovation (fixed codebook) signal instead of an algebraic codebook.
  6842. Each sub-frame is divided into sub-vectors of lengths ranging between 5
  6843. and 20 samples.
  6844. Each sub-vector is chosen from a bitrate-dependent codebook and all sub-vectors
  6845. are concatenated to form a sub-frame.
  6846. As an example, the 3.95 kbps mode uses a sub-vector size of 20 samples with
  6847. 32 entries in the codebook (5 bits).
  6848. This means that the innovation is encoded with 10 bits per sub-frame, or
  6849. 2000 bps.
  6850. On the other hand, the 18.2 kbps mode uses a sub-vector size of 5 samples
  6851. with 256 entries in the codebook (8 bits), so the innovation uses 64 bits
  6852. per sub-frame, or 12800 bps.
  6853. \end_layout
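\begin_layout Standard
The bit counts quoted above follow directly from the sub-vector size and the codebook size.
The Python sketch below reproduces the arithmetic, assuming 40-sample sub-frames, 200 sub-frames per second, a sub-vector size that divides 40, and a power-of-two number of codebook entries; the names are hypothetical.
\end_layout

```python
SUBFRAME_SIZE = 40          # samples per narrowband sub-frame
SUBFRAMES_PER_SECOND = 200  # 4 sub-frames per 20 ms frame


def innovation_bits(subvector_size, codebook_entries):
    """Return (bits per sub-frame, bits per second) for the innovation."""
    n_subvectors = SUBFRAME_SIZE // subvector_size
    bits_per_index = (codebook_entries - 1).bit_length()
    bits = n_subvectors * bits_per_index
    return bits, bits * SUBFRAMES_PER_SECOND
```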
  6854. \begin_layout Section
  6855. Bit-rates
  6856. \end_layout
  6857. \begin_layout Standard
  6858. So far, no MOS (Mean Opinion Score
  6859. \begin_inset Index
  6860. status collapsed
  6861. \begin_layout Plain Layout
  6862. mean opinion score
  6863. \end_layout
  6864. \end_inset
  6865. ) subjective evaluation has been performed for Speex.
  6866. In order to give an idea of the quality achievable with it, table
  6867. \begin_inset CommandInset ref
  6868. LatexCommand ref
  6869. reference "cap:quality_vs_bps"
  6870. \end_inset
  6871. presents my own subjective opinion on it.
  6872. It should be noted that different people will perceive the quality differently
  6873. and that the person who designed the codec often has a bias (one way or
  6874. another) when it comes to subjective evaluation.
  6875. Finally, for most codecs (including Speex), encoding quality varies with
  6876. the input.
  6877. Note that the complexity is only approximate (within 0.5 mflops and using
  6878. the lowest complexity setting).
  6879. Decoding requires approximately 0.5 mflops
  6880. \begin_inset Index
  6881. status collapsed
  6882. \begin_layout Plain Layout
  6883. complexity
  6884. \end_layout
  6885. \end_inset
  6886. in most modes (1 mflops with perceptual enhancement).
  6887. \end_layout
  6888. \begin_layout Standard
  6889. \begin_inset Float table
  6890. placement h
  6891. wide true
  6892. sideways false
  6893. status open
  6894. \begin_layout Plain Layout
  6895. \begin_inset ERT
  6896. status collapsed
  6897. \begin_layout Plain Layout
  6898. \backslash
  6899. begin{center}
  6900. \end_layout
  6901. \end_inset
  6902. \begin_inset Tabular
  6903. <lyxtabular version="3" rows="17" columns="5">
  6904. <features>
  6905. <column alignment="center" valignment="top" width="0pt">
  6906. <column alignment="center" valignment="top" width="0pt">
  6907. <column alignment="center" valignment="top" width="0pt">
  6908. <column alignment="center" valignment="top" width="0pt">
  6909. <column alignment="center" valignment="top" width="0pt">
  6910. <row>
  6911. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6912. \begin_inset Text
  6913. \begin_layout Plain Layout
  6914. Mode
  6915. \end_layout
  6916. \end_inset
  6917. </cell>
  6918. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6919. \begin_inset Text
  6920. \begin_layout Plain Layout
  6921. Quality
  6922. \end_layout
  6923. \end_inset
  6924. </cell>
  6925. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6926. \begin_inset Text
  6927. \begin_layout Plain Layout
  6928. Bit-rate
  6929. \begin_inset Index
  6930. status collapsed
  6931. \begin_layout Plain Layout
  6932. bit-rate
  6933. \end_layout
  6934. \end_inset
  6935. (bps)
  6936. \end_layout
  6937. \end_inset
  6938. </cell>
  6939. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  6940. \begin_inset Text
  6941. \begin_layout Plain Layout
  6942. mflops
  6943. \begin_inset Index
  6944. status collapsed
  6945. \begin_layout Plain Layout
  6946. complexity
  6947. \end_layout
  6948. \end_inset
  6949. \end_layout
  6950. \end_inset
  6951. </cell>
  6952. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  6953. \begin_inset Text
  6954. \begin_layout Plain Layout
  6955. Quality/description
  6956. \end_layout
  6957. \end_inset
  6958. </cell>
  6959. </row>
  6960. <row>
  6961. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6962. \begin_inset Text
  6963. \begin_layout Plain Layout
  6964. 0
  6965. \end_layout
  6966. \end_inset
  6967. </cell>
  6968. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6969. \begin_inset Text
  6970. \begin_layout Plain Layout
  6971. -
  6972. \end_layout
  6973. \end_inset
  6974. </cell>
  6975. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6976. \begin_inset Text
  6977. \begin_layout Plain Layout
  6978. 250
  6979. \end_layout
  6980. \end_inset
  6981. </cell>
  6982. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6983. \begin_inset Text
  6984. \begin_layout Plain Layout
  6985. 0
  6986. \end_layout
  6987. \end_inset
  6988. </cell>
  6989. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  6990. \begin_inset Text
  6991. \begin_layout Plain Layout
  6992. No transmission (DTX)
  6993. \end_layout
  6994. \end_inset
  6995. </cell>
  6996. </row>
  6997. <row>
  6998. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  6999. \begin_inset Text
  7000. \begin_layout Plain Layout
  7001. 1
  7002. \end_layout
  7003. \end_inset
  7004. </cell>
  7005. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7006. \begin_inset Text
  7007. \begin_layout Plain Layout
  7008. 0
  7009. \end_layout
  7010. \end_inset
  7011. </cell>
  7012. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7013. \begin_inset Text
  7014. \begin_layout Plain Layout
  7015. 2,150
  7016. \end_layout
  7017. \end_inset
  7018. </cell>
  7019. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7020. \begin_inset Text
  7021. \begin_layout Plain Layout
  7022. 6
  7023. \end_layout
  7024. \end_inset
  7025. </cell>
  7026. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7027. \begin_inset Text
  7028. \begin_layout Plain Layout
  7029. Vocoder (mostly for comfort noise)
  7030. \end_layout
  7031. \end_inset
  7032. </cell>
  7033. </row>
  7034. <row>
  7035. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7036. \begin_inset Text
  7037. \begin_layout Plain Layout
  7038. 2
  7039. \end_layout
  7040. \end_inset
  7041. </cell>
  7042. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7043. \begin_inset Text
  7044. \begin_layout Plain Layout
  7045. 2
  7046. \end_layout
  7047. \end_inset
  7048. </cell>
  7049. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7050. \begin_inset Text
  7051. \begin_layout Plain Layout
  7052. 5,950
  7053. \end_layout
  7054. \end_inset
  7055. </cell>
  7056. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7057. \begin_inset Text
  7058. \begin_layout Plain Layout
  7059. 9
  7060. \end_layout
  7061. \end_inset
  7062. </cell>
  7063. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7064. \begin_inset Text
  7065. \begin_layout Plain Layout
  7066. Very noticeable artifacts/noise, good intelligibility
  7067. \end_layout
  7068. \end_inset
  7069. </cell>
  7070. </row>
  7071. <row>
  7072. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7073. \begin_inset Text
  7074. \begin_layout Plain Layout
  7075. 3
  7076. \end_layout
  7077. \end_inset
  7078. </cell>
  7079. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7080. \begin_inset Text
  7081. \begin_layout Plain Layout
  7082. 3-4
  7083. \end_layout
  7084. \end_inset
  7085. </cell>
  7086. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7087. \begin_inset Text
  7088. \begin_layout Plain Layout
  7089. 8,000
  7090. \end_layout
  7091. \end_inset
  7092. </cell>
  7093. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7094. \begin_inset Text
  7095. \begin_layout Plain Layout
  7096. 10
  7097. \end_layout
  7098. \end_inset
  7099. </cell>
  7100. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7101. \begin_inset Text
  7102. \begin_layout Plain Layout
  7103. Artifacts/noise sometimes noticeable
  7104. \end_layout
  7105. \end_inset
  7106. </cell>
  7107. </row>
  7108. <row>
  7109. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7110. \begin_inset Text
  7111. \begin_layout Plain Layout
  7112. 4
  7113. \end_layout
  7114. \end_inset
  7115. </cell>
  7116. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7117. \begin_inset Text
  7118. \begin_layout Plain Layout
  7119. 5-6
  7120. \end_layout
  7121. \end_inset
  7122. </cell>
  7123. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7124. \begin_inset Text
  7125. \begin_layout Plain Layout
  7126. 11,000
  7127. \end_layout
  7128. \end_inset
  7129. </cell>
  7130. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7131. \begin_inset Text
  7132. \begin_layout Plain Layout
  7133. 14
  7134. \end_layout
  7135. \end_inset
  7136. </cell>
  7137. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7138. \begin_inset Text
  7139. \begin_layout Plain Layout
  7140. Artifacts usually noticeable only with headphones
  7141. \end_layout
  7142. \end_inset
  7143. </cell>
  7144. </row>
  7145. <row>
  7146. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7147. \begin_inset Text
  7148. \begin_layout Plain Layout
  7149. 5
  7150. \end_layout
  7151. \end_inset
  7152. </cell>
  7153. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7154. \begin_inset Text
  7155. \begin_layout Plain Layout
  7156. 7-8
  7157. \end_layout
  7158. \end_inset
  7159. </cell>
  7160. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7161. \begin_inset Text
  7162. \begin_layout Plain Layout
  7163. 15,000
  7164. \end_layout
  7165. \end_inset
  7166. </cell>
  7167. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7168. \begin_inset Text
  7169. \begin_layout Plain Layout
  7170. 11
  7171. \end_layout
  7172. \end_inset
  7173. </cell>
  7174. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7175. \begin_inset Text
  7176. \begin_layout Plain Layout
  7177. Need good headphones to tell the difference
  7178. \end_layout
  7179. \end_inset
  7180. </cell>
  7181. </row>
  7182. <row>
  7183. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7184. \begin_inset Text
  7185. \begin_layout Plain Layout
  7186. 6
  7187. \end_layout
  7188. \end_inset
  7189. </cell>
  7190. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7191. \begin_inset Text
  7192. \begin_layout Plain Layout
  7193. 9
  7194. \end_layout
  7195. \end_inset
  7196. </cell>
  7197. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7198. \begin_inset Text
  7199. \begin_layout Plain Layout
  7200. 18,200
  7201. \end_layout
  7202. \end_inset
  7203. </cell>
  7204. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7205. \begin_inset Text
  7206. \begin_layout Plain Layout
  7207. 17.5
  7208. \end_layout
  7209. \end_inset
  7210. </cell>
  7211. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7212. \begin_inset Text
  7213. \begin_layout Plain Layout
  7214. Hard to tell the difference even with good headphones
  7215. \end_layout
  7216. \end_inset
  7217. </cell>
  7218. </row>
  7219. <row>
  7220. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7221. \begin_inset Text
  7222. \begin_layout Plain Layout
  7223. 7
  7224. \end_layout
  7225. \end_inset
  7226. </cell>
  7227. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7228. \begin_inset Text
  7229. \begin_layout Plain Layout
  7230. 10
  7231. \end_layout
  7232. \end_inset
  7233. </cell>
  7234. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7235. \begin_inset Text
  7236. \begin_layout Plain Layout
  7237. 24,600
  7238. \end_layout
  7239. \end_inset
  7240. </cell>
  7241. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7242. \begin_inset Text
  7243. \begin_layout Plain Layout
  7244. 14.5
  7245. \end_layout
  7246. \end_inset
  7247. </cell>
  7248. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7249. \begin_inset Text
  7250. \begin_layout Plain Layout
  7251. Completely transparent for voice, good quality music
  7252. \end_layout
  7253. \end_inset
  7254. </cell>
  7255. </row>
  7256. <row>
  7257. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7258. \begin_inset Text
  7259. \begin_layout Plain Layout
  7260. 8
  7261. \end_layout
  7262. \end_inset
  7263. </cell>
  7264. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7265. \begin_inset Text
  7266. \begin_layout Plain Layout
  7267. 1
  7268. \end_layout
  7269. \end_inset
  7270. </cell>
  7271. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7272. \begin_inset Text
  7273. \begin_layout Plain Layout
  7274. 3,950
  7275. \end_layout
  7276. \end_inset
  7277. </cell>
  7278. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7279. \begin_inset Text
  7280. \begin_layout Plain Layout
  7281. 10.5
  7282. \end_layout
  7283. \end_inset
  7284. </cell>
  7285. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7286. \begin_inset Text
  7287. \begin_layout Plain Layout
  7288. Very noticeable artifacts/noise, good intelligibility
  7289. \end_layout
  7290. \end_inset
  7291. </cell>
  7292. </row>
  7293. <row>
  7294. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7295. \begin_inset Text
  7296. \begin_layout Plain Layout
  7297. 9
  7298. \end_layout
  7299. \end_inset
  7300. </cell>
  7301. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7302. \begin_inset Text
  7303. \begin_layout Plain Layout
  7304. -
  7305. \end_layout
  7306. \end_inset
  7307. </cell>
  7308. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7309. \begin_inset Text
  7310. \begin_layout Plain Layout
  7311. -
  7312. \end_layout
  7313. \end_inset
  7314. </cell>
  7315. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7316. \begin_inset Text
  7317. \begin_layout Plain Layout
  7318. -
  7319. \end_layout
  7320. \end_inset
  7321. </cell>
  7322. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7323. \begin_inset Text
  7324. \begin_layout Plain Layout
  7325. reserved
  7326. \end_layout
  7327. \end_inset
  7328. </cell>
  7329. </row>
  7330. <row>
  7331. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7332. \begin_inset Text
  7333. \begin_layout Plain Layout
  7334. 10
  7335. \end_layout
  7336. \end_inset
  7337. </cell>
  7338. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7339. \begin_inset Text
  7340. \begin_layout Plain Layout
  7341. -
  7342. \end_layout
  7343. \end_inset
  7344. </cell>
  7345. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7346. \begin_inset Text
  7347. \begin_layout Plain Layout
  7348. -
  7349. \end_layout
  7350. \end_inset
  7351. </cell>
  7352. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7353. \begin_inset Text
  7354. \begin_layout Plain Layout
  7355. -
  7356. \end_layout
  7357. \end_inset
  7358. </cell>
  7359. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7360. \begin_inset Text
  7361. \begin_layout Plain Layout
  7362. reserved
  7363. \end_layout
  7364. \end_inset
  7365. </cell>
  7366. </row>
  7367. <row>
  7368. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7369. \begin_inset Text
  7370. \begin_layout Plain Layout
  7371. 11
  7372. \end_layout
  7373. \end_inset
  7374. </cell>
  7375. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7376. \begin_inset Text
  7377. \begin_layout Plain Layout
  7378. -
  7379. \end_layout
  7380. \end_inset
  7381. </cell>
  7382. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7383. \begin_inset Text
  7384. \begin_layout Plain Layout
  7385. -
  7386. \end_layout
  7387. \end_inset
  7388. </cell>
  7389. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7390. \begin_inset Text
  7391. \begin_layout Plain Layout
  7392. -
  7393. \end_layout
  7394. \end_inset
  7395. </cell>
  7396. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7397. \begin_inset Text
  7398. \begin_layout Plain Layout
  7399. reserved
  7400. \end_layout
  7401. \end_inset
  7402. </cell>
  7403. </row>
  7404. <row>
  7405. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7406. \begin_inset Text
  7407. \begin_layout Plain Layout
  7408. 12
  7409. \end_layout
  7410. \end_inset
  7411. </cell>
  7412. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7413. \begin_inset Text
  7414. \begin_layout Plain Layout
  7415. -
  7416. \end_layout
  7417. \end_inset
  7418. </cell>
  7419. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7420. \begin_inset Text
  7421. \begin_layout Plain Layout
  7422. -
  7423. \end_layout
  7424. \end_inset
  7425. </cell>
  7426. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7427. \begin_inset Text
  7428. \begin_layout Plain Layout
  7429. -
  7430. \end_layout
  7431. \end_inset
  7432. </cell>
  7433. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7434. \begin_inset Text
  7435. \begin_layout Plain Layout
  7436. reserved
  7437. \end_layout
  7438. \end_inset
  7439. </cell>
  7440. </row>
  7441. <row>
  7442. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7443. \begin_inset Text
  7444. \begin_layout Plain Layout
  7445. 13
  7446. \end_layout
  7447. \end_inset
  7448. </cell>
  7449. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7450. \begin_inset Text
  7451. \begin_layout Plain Layout
  7452. -
  7453. \end_layout
  7454. \end_inset
  7455. </cell>
  7456. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7457. \begin_inset Text
  7458. \begin_layout Plain Layout
  7459. -
  7460. \end_layout
  7461. \end_inset
  7462. </cell>
  7463. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7464. \begin_inset Text
  7465. \begin_layout Plain Layout
  7466. -
  7467. \end_layout
  7468. \end_inset
  7469. </cell>
  7470. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7471. \begin_inset Text
  7472. \begin_layout Plain Layout
  7473. Application-defined, interpreted by callback or skipped
  7474. \end_layout
  7475. \end_inset
  7476. </cell>
  7477. </row>
  7478. <row>
  7479. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7480. \begin_inset Text
  7481. \begin_layout Plain Layout
  7482. 14
  7483. \end_layout
  7484. \end_inset
  7485. </cell>
  7486. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7487. \begin_inset Text
  7488. \begin_layout Plain Layout
  7489. -
  7490. \end_layout
  7491. \end_inset
  7492. </cell>
  7493. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7494. \begin_inset Text
  7495. \begin_layout Plain Layout
  7496. -
  7497. \end_layout
  7498. \end_inset
  7499. </cell>
  7500. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7501. \begin_inset Text
  7502. \begin_layout Plain Layout
  7503. -
  7504. \end_layout
  7505. \end_inset
  7506. </cell>
  7507. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7508. \begin_inset Text
  7509. \begin_layout Plain Layout
  7510. Speex in-band signaling
  7511. \end_layout
  7512. \end_inset
  7513. </cell>
  7514. </row>
  7515. <row>
  7516. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7517. \begin_inset Text
  7518. \begin_layout Plain Layout
  7519. 15
  7520. \end_layout
  7521. \end_inset
  7522. </cell>
  7523. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7524. \begin_inset Text
  7525. \begin_layout Plain Layout
  7526. -
  7527. \end_layout
  7528. \end_inset
  7529. </cell>
  7530. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7531. \begin_inset Text
  7532. \begin_layout Plain Layout
  7533. -
  7534. \end_layout
  7535. \end_inset
  7536. </cell>
  7537. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7538. \begin_inset Text
  7539. \begin_layout Plain Layout
  7540. -
  7541. \end_layout
  7542. \end_inset
  7543. </cell>
  7544. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  7545. \begin_inset Text
  7546. \begin_layout Plain Layout
  7547. Terminator code
  7548. \end_layout
  7549. \end_inset
  7550. </cell>
  7551. </row>
  7552. </lyxtabular>
  7553. \end_inset
  7554. \begin_inset ERT
  7555. status collapsed
  7556. \begin_layout Plain Layout
  7557. \backslash
  7558. end{center}
  7559. \end_layout
  7560. \end_inset
  7561. \end_layout
  7562. \begin_layout Plain Layout
  7563. \begin_inset Caption
  7564. \begin_layout Plain Layout
  7565. Quality versus bit-rate
  7566. \begin_inset CommandInset label
  7567. LatexCommand label
  7568. name "cap:quality_vs_bps"
  7569. \end_inset
  7570. \end_layout
  7571. \end_inset
  7572. \end_layout
  7573. \end_inset
  7574. \end_layout
  7575. \begin_layout Section
  7576. Perceptual enhancement
  7577. \begin_inset Index
  7578. status collapsed
  7579. \begin_layout Plain Layout
  7580. perceptual enhancement
  7581. \end_layout
  7582. \end_inset
  7583. \end_layout
  7584. \begin_layout Standard
  7585. \series bold
  7586. This section was only valid for version 1.1.12 and earlier.
  7587. It does not apply to version 1.2-beta1 (and later), for which the new perceptual
  7588. enhancement is not yet documented.
  7589. \end_layout
  7590. \begin_layout Standard
  7591. This part of the codec only applies to the decoder and can even be changed
  7592. without affecting inter-operability.
  7593. For that reason, the implementation provided and described here should
  7594. only be considered as a reference implementation.
  7595. The enhancement system is divided into two parts.
  7596. First, the synthesis filter
  7597. \begin_inset Formula $S(z)=1/A(z)$
  7598. \end_inset
  7599. is replaced by an enhanced filter:
  7600. \begin_inset Formula \[
  7601. S'(z)=\frac{A\left(z/a_{2}\right)A\left(z/a_{3}\right)}{A\left(z\right)A\left(z/a_{1}\right)}\]
  7602. \end_inset
  7603. where
  7604. \begin_inset Formula $a_{1}$
  7605. \end_inset
  7606. and
  7607. \begin_inset Formula $a_{2}$
  7608. \end_inset
  7609. depend on the mode in use and
  7610. \begin_inset Formula $a_{3}=\frac{1}{r}\left(1-\frac{1-ra_{1}}{1-ra_{2}}\right)$
  7611. \end_inset
  7612. with
  7613. \begin_inset Formula $r=0.9$
  7614. \end_inset
  7615. .
  7616. The second part of the enhancement consists of using a comb filter to enhance
  7617. the pitch in the excitation domain.
  7618. \end_layout
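\begin_layout Standard
As a rough illustration of the first part, the sketch below evaluates the third coefficient from the formula above and shows how the coefficients of a scaled polynomial such as A(z/a) are obtained from those of A(z).
The numeric values chosen for the first two coefficients are hypothetical (the real ones depend on the mode), the function names are not taken from the Speex source, and the comb filter is not covered.
\end_layout

```python
def third_coefficient(a1, a2, r=0.9):
    """a3 = (1/r) * (1 - (1 - r*a1) / (1 - r*a2)), as in the formula above."""
    return (1.0 - (1.0 - r * a1) / (1.0 - r * a2)) / r


def scale_polynomial(lpc, a):
    """Coefficients of A(z/a), given those of A(z).

    lpc[k] is the coefficient of z^-k in A(z) (lpc[0] is normally 1);
    replacing z by z/a multiplies that coefficient by a**k.
    """
    return [c * a ** k for k, c in enumerate(lpc)]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    a1, a2 = 0.7, 0.5
    a3 = third_coefficient(a1, a2)
    print("a3 =", a3)
    # Toy second-order polynomial, not taken from a real frame.
    print(scale_polynomial([1.0, -1.2, 0.5], a3))
```

\begin_layout Standard
Rearranging the definition gives (1 - r a3)(1 - r a2) = 1 - r a1, which is a convenient way to check an implementation.
\end_layout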
  7619. \begin_layout Standard
  7620. \begin_inset Newpage newpage
  7621. \end_inset
  7622. \end_layout
  7623. \begin_layout Chapter
  7624. Speex wideband mode (sub-band CELP)
  7625. \begin_inset Index
  7626. status collapsed
  7627. \begin_layout Plain Layout
  7628. wideband
  7629. \end_layout
  7630. \end_inset
  7631. \begin_inset CommandInset label
  7632. LatexCommand label
  7633. name "sec:Speex-wideband-mode"
  7634. \end_inset
  7635. \end_layout
  7636. \begin_layout Standard
  7637. For wideband, the Speex approach uses a
  7638. \emph on
  7639. q
  7640. \emph default
  7641. uadrature
  7642. \emph on
  7643. m
  7644. \emph default
  7645. irror
  7646. \emph on
  7647. f
  7648. \emph default
  7649. ilter
  7650. \begin_inset Index
  7651. status collapsed
  7652. \begin_layout Plain Layout
  7653. quadrature mirror filter
  7654. \end_layout
  7655. \end_inset
  7656. (QMF) to split the band in two.
  7657. The signal sampled at 16 kHz is thus divided into two signals sampled at 8 kHz, one representing
  7658. the low band (0-4 kHz), the other the high band (4-8 kHz).
  7659. The low band is encoded with the narrowband mode described in section
  7660. \begin_inset CommandInset ref
  7661. LatexCommand ref
  7662. reference "sec:Speex-narrowband-mode"
  7663. \end_inset
  7664. in such a way that the resulting
  7665. \begin_inset Quotes eld
  7666. \end_inset
  7667. embedded narrowband bit-stream
  7668. \begin_inset Quotes erd
  7669. \end_inset
  7670. can also be decoded with the narrowband decoder.
  7671. Since the low band encoding has already been described, only the high band
  7672. encoding is described in this section.
  7673. \end_layout
  7674. \begin_layout Section
  7675. Linear Prediction
  7676. \end_layout
  7677. \begin_layout Standard
  7678. The linear prediction part used for the high-band is very similar to what
  7679. is done for narrowband.
  7680. The only difference is that only 12 bits are used to encode the high-band
  7681. LSPs, using a multi-stage vector quantizer (MSVQ).
  7682. The first stage quantizes the 10 coefficients with 6 bits and the remaining
  7683. quantization error is then quantized with another 6 bits.
  7684. \end_layout
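\begin_layout Standard
The two-stage structure described above can be sketched as follows.
This is only a toy illustration: the codebooks are filled with random values rather than the trained Speex tables, the vector length of 10 follows the text, and all names are hypothetical.
\end_layout

```python
import random


def nearest(codebook, target):
    """Index of the entry with the smallest squared error to target."""
    def squared_error(entry):
        return sum((e - t) ** 2 for e, t in zip(entry, target))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def msvq_encode(lsp, stage1, stage2):
    """Return one index per stage (6 bits each for 64-entry codebooks)."""
    i = nearest(stage1, lsp)
    # The second stage quantizes what the first stage left over.
    residual = [x - c for x, c in zip(lsp, stage1[i])]
    j = nearest(stage2, residual)
    return i, j


def msvq_decode(i, j, stage1, stage2):
    """Reconstruction is the sum of the two selected entries."""
    return [a + b for a, b in zip(stage1[i], stage2[j])]


# Toy codebooks with random contents (NOT the real Speex tables).
_rng = random.Random(0)
DIM = 10
STAGE1 = [[_rng.uniform(0.0, 3.14) for _ in range(DIM)] for _ in range(64)]
STAGE2 = [[_rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(64)]

if __name__ == "__main__":
    vector = [0.3 * (k + 1) for k in range(DIM)]
    i, j = msvq_encode(vector, STAGE1, STAGE2)
    print("indices:", i, j)
    print("decoded:", msvq_decode(i, j, STAGE1, STAGE2))
```

\begin_layout Standard
Two 64-entry codebooks cost 6 + 6 = 12 bits per frame, whereas a single codebook indexed with 12 bits would need 4096 entries.
\end_layout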
  7685. \begin_layout Section
  7686. Pitch Prediction
  7687. \end_layout
  7688. \begin_layout Standard
  7689. That part is easy: there's no pitch prediction for the high-band.
  7690. There are two reasons for that.
  7691. First, there is usually little harmonic structure in this band (above 4
  7692. kHz).
  7693. Second, it would be very hard to implement since the QMF folds the 4-8
  7694. kHz band into 4-0 kHz (reversing the frequency axis), which means that
  7695. the location of the harmonics is no longer at multiples of the fundamental
  7696. (pitch).
  7697. \end_layout
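\begin_layout Standard
The effect of the frequency reversal on the harmonics can be seen in a small numerical sketch.
It assumes an idealized fold in which a component at frequency f in the 4-8 kHz band appears at 8000 - f after the QMF; the fundamental of 300 Hz is an arbitrary example and the function names are illustrative.
\end_layout

```python
SPLIT_HZ = 4000   # boundary between the low band and the high band
TOP_HZ = 8000     # upper edge of the spectrum of the 16 kHz signal


def fold(f_hz):
    """Idealized QMF fold: 4-8 kHz maps onto 4-0 kHz (axis reversed)."""
    return TOP_HZ - f_hz


def folded_harmonics(f0_hz):
    """Positions, after folding, of the harmonics of f0 lying above 4 kHz."""
    harmonics = range(f0_hz, TOP_HZ, f0_hz)
    return [fold(f) for f in harmonics if f > SPLIT_HZ]


if __name__ == "__main__":
    f0 = 300
    for f in folded_harmonics(f0):
        print(f, "Hz, multiple of f0:", f % f0 == 0)
```

\begin_layout Standard
With a 300 Hz fundamental, the harmonics at 4200, 4500 and 4800 Hz land at 3800, 3500 and 3200 Hz, none of which is a multiple of 300 Hz.
The folded harmonics would stay on the harmonic grid only in the special case where 8000 Hz is itself a multiple of the fundamental.
\end_layout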
  7698. \begin_layout Section
  7699. Excitation Quantization
  7700. \end_layout
  7701. \begin_layout Standard
  7702. The high-band excitation is coded in the same way as for narrowband.
  7703. \end_layout
  7704. \begin_layout Section
  7705. Bit allocation
  7706. \end_layout
  7707. \begin_layout Standard
  7708. For the wideband mode, the entire narrowband frame is packed before the
  7709. high-band is encoded.
  7710. The narrowband part of the bit-stream is as defined in table
  7711. \begin_inset CommandInset ref
  7712. LatexCommand ref
  7713. reference "cap:bits-narrowband"
  7714. \end_inset
  7715. .
  7716. The high-band follows, as described in table
  7717. \begin_inset CommandInset ref
  7718. LatexCommand ref
  7719. reference "cap:bits-wideband"
  7720. \end_inset
  7721. .
  7722. For wideband, the mode ID is the same as the Speex quality setting and
  7723. is defined in table
  7724. \begin_inset CommandInset ref
  7725. LatexCommand ref
  7726. reference "tab:wideband-quality"
  7727. \end_inset
  7728. .
  7729. This also means that a wideband frame may be correctly decoded by a narrowband
  7730. decoder, with the only caveat that, if more than one frame is packed in the
  7731. same packet, the decoder will need to skip the high-band parts in order
  7732. to sync with the bit-stream.
  7733. \end_layout
  7734. \begin_layout Standard
  7735. \begin_inset Float table
  7736. placement h
  7737. wide true
  7738. sideways false
  7739. status open
  7740. \begin_layout Plain Layout
  7741. \begin_inset ERT
  7742. status collapsed
  7743. \begin_layout Plain Layout
  7744. \backslash
  7745. begin{center}
  7746. \end_layout
  7747. \end_inset
  7748. \begin_inset Tabular
  7749. <lyxtabular version="3" rows="7" columns="7">
  7750. <features>
  7751. <column alignment="center" valignment="top" width="0pt">
  7752. <column alignment="center" valignment="top" width="0pt">
  7753. <column alignment="center" valignment="top" width="0pt">
  7754. <column alignment="center" valignment="top" width="0pt">
  7755. <column alignment="center" valignment="top" width="0pt">
  7756. <column alignment="center" valignment="top" width="0pt">
  7757. <column alignment="center" valignment="top" width="0pt">
  7758. <row>
  7759. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7760. \begin_inset Text
  7761. \begin_layout Plain Layout
  7762. Parameter
  7763. \end_layout
  7764. \end_inset
  7765. </cell>
  7766. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7767. \begin_inset Text
  7768. \begin_layout Plain Layout
  7769. Update rate
  7770. \end_layout
  7771. \end_inset
  7772. </cell>
  7773. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7774. \begin_inset Text
  7775. \begin_layout Plain Layout
  7776. 0
  7777. \end_layout
  7778. \end_inset
  7779. </cell>
  7780. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7781. \begin_inset Text
  7782. \begin_layout Plain Layout
  7783. 1
  7784. \end_layout
  7785. \end_inset
  7786. </cell>
  7787. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7788. \begin_inset Text
  7789. \begin_layout Plain Layout
  7790. 2
  7791. \end_layout
  7792. \end_inset
  7793. </cell>
  7794. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  7795. \begin_inset Text
  7796. \begin_layout Plain Layout
  7797. 3
  7798. \end_layout
  7799. \end_inset
  7800. </cell>
  7801. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  7802. \begin_inset Text
  7803. \begin_layout Plain Layout
  7804. 4
  7805. \end_layout
  7806. \end_inset
  7807. </cell>
  7808. </row>
  7809. <row>
  7810. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7811. \begin_inset Text
  7812. \begin_layout Plain Layout
  7813. Wideband bit
  7814. \end_layout
  7815. \end_inset
  7816. </cell>
  7817. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7818. \begin_inset Text
  7819. \begin_layout Plain Layout
  7820. frame
  7821. \end_layout
  7822. \end_inset
  7823. </cell>
  7824. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7825. \begin_inset Text
  7826. \begin_layout Plain Layout
  7827. 1
  7828. \end_layout
  7829. \end_inset
  7830. </cell>
  7831. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7832. \begin_inset Text
  7833. \begin_layout Plain Layout
  7834. 1
  7835. \end_layout
  7836. \end_inset
  7837. </cell>
  7838. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7839. \begin_inset Text
  7840. \begin_layout Plain Layout
  7841. 1
  7842. \end_layout
  7843. \end_inset
  7844. </cell>
  7845. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7846. \begin_inset Text
  7847. \begin_layout Plain Layout
  7848. 1
  7849. \end_layout
  7850. \end_inset
  7851. </cell>
  7852. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7853. \begin_inset Text
  7854. \begin_layout Plain Layout
  7855. 1
  7856. \end_layout
  7857. \end_inset
  7858. </cell>
  7859. </row>
  7860. <row>
  7861. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7862. \begin_inset Text
  7863. \begin_layout Plain Layout
  7864. Mode ID
  7865. \end_layout
  7866. \end_inset
  7867. </cell>
  7868. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7869. \begin_inset Text
  7870. \begin_layout Plain Layout
  7871. frame
  7872. \end_layout
  7873. \end_inset
  7874. </cell>
  7875. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7876. \begin_inset Text
  7877. \begin_layout Plain Layout
  7878. 3
  7879. \end_layout
  7880. \end_inset
  7881. </cell>
  7882. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7883. \begin_inset Text
  7884. \begin_layout Plain Layout
  7885. 3
  7886. \end_layout
  7887. \end_inset
  7888. </cell>
  7889. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7890. \begin_inset Text
  7891. \begin_layout Plain Layout
  7892. 3
  7893. \end_layout
  7894. \end_inset
  7895. </cell>
  7896. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7897. \begin_inset Text
  7898. \begin_layout Plain Layout
  7899. 3
  7900. \end_layout
  7901. \end_inset
  7902. </cell>
  7903. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7904. \begin_inset Text
  7905. \begin_layout Plain Layout
  7906. 3
  7907. \end_layout
  7908. \end_inset
  7909. </cell>
  7910. </row>
  7911. <row>
  7912. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7913. \begin_inset Text
  7914. \begin_layout Plain Layout
  7915. LSP
  7916. \end_layout
  7917. \end_inset
  7918. </cell>
  7919. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7920. \begin_inset Text
  7921. \begin_layout Plain Layout
  7922. frame
  7923. \end_layout
  7924. \end_inset
  7925. </cell>
  7926. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7927. \begin_inset Text
  7928. \begin_layout Plain Layout
  7929. 0
  7930. \end_layout
  7931. \end_inset
  7932. </cell>
  7933. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7934. \begin_inset Text
  7935. \begin_layout Plain Layout
  7936. 12
  7937. \end_layout
  7938. \end_inset
  7939. </cell>
  7940. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7941. \begin_inset Text
  7942. \begin_layout Plain Layout
  7943. 12
  7944. \end_layout
  7945. \end_inset
  7946. </cell>
  7947. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7948. \begin_inset Text
  7949. \begin_layout Plain Layout
  7950. 12
  7951. \end_layout
  7952. \end_inset
  7953. </cell>
  7954. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  7955. \begin_inset Text
  7956. \begin_layout Plain Layout
  7957. 12
  7958. \end_layout
  7959. \end_inset
  7960. </cell>
  7961. </row>
  7962. <row>
  7963. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7964. \begin_inset Text
  7965. \begin_layout Plain Layout
  7966. Excitation gain
  7967. \end_layout
  7968. \end_inset
  7969. </cell>
  7970. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7971. \begin_inset Text
  7972. \begin_layout Plain Layout
  7973. sub-frame
  7974. \end_layout
  7975. \end_inset
  7976. </cell>
  7977. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7978. \begin_inset Text
  7979. \begin_layout Plain Layout
  7980. 0
  7981. \end_layout
  7982. \end_inset
  7983. </cell>
  7984. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7985. \begin_inset Text
  7986. \begin_layout Plain Layout
  7987. 5
  7988. \end_layout
  7989. \end_inset
  7990. </cell>
  7991. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7992. \begin_inset Text
  7993. \begin_layout Plain Layout
  7994. 4
  7995. \end_layout
  7996. \end_inset
  7997. </cell>
  7998. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  7999. \begin_inset Text
  8000. \begin_layout Plain Layout
  8001. 4
  8002. \end_layout
  8003. \end_inset
  8004. </cell>
  8005. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8006. \begin_inset Text
  8007. \begin_layout Plain Layout
  8008. 4
  8009. \end_layout
  8010. \end_inset
  8011. </cell>
  8012. </row>
  8013. <row>
  8014. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8015. \begin_inset Text
  8016. \begin_layout Plain Layout
  8017. Excitation VQ
  8018. \end_layout
  8019. \end_inset
  8020. </cell>
  8021. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8022. \begin_inset Text
  8023. \begin_layout Plain Layout
  8024. sub-frame
  8025. \end_layout
  8026. \end_inset
  8027. </cell>
  8028. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8029. \begin_inset Text
  8030. \begin_layout Plain Layout
  8031. 0
  8032. \end_layout
  8033. \end_inset
  8034. </cell>
  8035. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8036. \begin_inset Text
  8037. \begin_layout Plain Layout
  8038. 0
  8039. \end_layout
  8040. \end_inset
  8041. </cell>
  8042. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8043. \begin_inset Text
  8044. \begin_layout Plain Layout
  8045. 20
  8046. \end_layout
  8047. \end_inset
  8048. </cell>
  8049. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8050. \begin_inset Text
  8051. \begin_layout Plain Layout
  8052. 40
  8053. \end_layout
  8054. \end_inset
  8055. </cell>
  8056. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  8057. \begin_inset Text
  8058. \begin_layout Plain Layout
  8059. 80
  8060. \end_layout
  8061. \end_inset
  8062. </cell>
  8063. </row>
  8064. <row>
  8065. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8066. \begin_inset Text
  8067. \begin_layout Plain Layout
  8068. Total
  8069. \end_layout
  8070. \end_inset
  8071. </cell>
  8072. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8073. \begin_inset Text
  8074. \begin_layout Plain Layout
  8075. frame
  8076. \end_layout
  8077. \end_inset
  8078. </cell>
  8079. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8080. \begin_inset Text
  8081. \begin_layout Plain Layout
  8082. 4
  8083. \end_layout
  8084. \end_inset
  8085. </cell>
  8086. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8087. \begin_inset Text
  8088. \begin_layout Plain Layout
  8089. 36
  8090. \end_layout
  8091. \end_inset
  8092. </cell>
  8093. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8094. \begin_inset Text
  8095. \begin_layout Plain Layout
  8096. 112
  8097. \end_layout
  8098. \end_inset
  8099. </cell>
  8100. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8101. \begin_inset Text
  8102. \begin_layout Plain Layout
  8103. 192
  8104. \end_layout
  8105. \end_inset
  8106. </cell>
  8107. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  8108. \begin_inset Text
  8109. \begin_layout Plain Layout
  8110. 352
  8111. \end_layout
  8112. \end_inset
  8113. </cell>
  8114. </row>
  8115. </lyxtabular>
  8116. \end_inset
  8117. \begin_inset ERT
  8118. status collapsed
  8119. \begin_layout Plain Layout
  8120. \backslash
  8121. end{center}
  8122. \end_layout
  8123. \end_inset
  8124. \end_layout
  8125. \begin_layout Plain Layout
  8126. \begin_inset Caption
  8127. \begin_layout Plain Layout
  8128. Bit allocation for high-band in wideband mode
  8129. \begin_inset CommandInset label
  8130. LatexCommand label
  8131. name "cap:bits-wideband"
  8132. \end_inset
  8133. \end_layout
  8134. \end_inset
  8135. \end_layout
  8136. \end_inset
  8137. \end_layout
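\begin_layout Standard
The totals in the table can be reproduced from the individual rows: parameters updated once per frame are counted once, and parameters updated per sub-frame are counted once for each sub-frame.
 The following C sketch is only an illustration and is not part of the Speex sources; the array and function names are hypothetical, and the values are copied from the table.
 It assumes four sub-frames per frame (which is what the totals imply) and, for the bit-rate conversion, a 20 ms frame (50 frames per second).
\end_layout

```c
#include <assert.h>

#define HB_MODES     5
#define HB_SUBFRAMES 4   /* assumption: four sub-frames per frame */
#define HB_FRAMES_PER_SECOND 50   /* assumption: 20 ms frames */

/* Bit counts copied from the table, indexed by high-band mode 0..4. */
static const int hb_wideband_bit[HB_MODES] = {1, 1, 1, 1, 1};      /* per frame */
static const int hb_mode_id[HB_MODES]      = {3, 3, 3, 3, 3};      /* per frame */
static const int hb_lsp[HB_MODES]          = {0, 12, 12, 12, 12};  /* per frame */
static const int hb_gain[HB_MODES]         = {0, 5, 4, 4, 4};      /* per sub-frame */
static const int hb_vq[HB_MODES]           = {0, 0, 20, 40, 80};   /* per sub-frame */

/* Total number of high-band bits in one frame for the given mode. */
int hb_bits_per_frame(int mode)
{
    assert(mode >= 0 && mode < HB_MODES);
    return hb_wideband_bit[mode] + hb_mode_id[mode] + hb_lsp[mode]
         + HB_SUBFRAMES * (hb_gain[mode] + hb_vq[mode]);
}

/* High-band contribution to the bit-rate, in bits per second. */
int hb_bits_per_second(int mode)
{
    return hb_bits_per_frame(mode) * HB_FRAMES_PER_SECOND;
}
```

\begin_layout Standard
For example, mode 1 gives 1 + 3 + 12 + 4 x 5 = 36 bits per frame, which matches the Total row, and corresponds to 1,800 bps under the 20 ms frame assumption.
\end_layout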
  8138. \begin_layout Standard
  8139. \begin_inset Float table
  8140. placement h
  8141. wide true
  8142. sideways false
  8143. status open
  8144. \begin_layout Plain Layout
  8145. \begin_inset ERT
  8146. status collapsed
  8147. \begin_layout Plain Layout
  8148. \backslash
  8149. begin{center}
  8150. \end_layout
  8151. \end_inset
  8152. \begin_inset Tabular
  8153. <lyxtabular version="3" rows="12" columns="3">
  8154. <features>
  8155. <column alignment="center" valignment="top" width="0pt">
  8156. <column alignment="center" valignment="top" width="0pt">
  8157. <column alignment="center" valignment="top" width="0pt">
  8158. <row>
  8159. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8160. \begin_inset Text
  8161. \begin_layout Plain Layout
  8162. Mode/Quality
  8163. \end_layout
  8164. \end_inset
  8165. </cell>
  8166. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8167. \begin_inset Text
  8168. \begin_layout Plain Layout
  8169. Bit-rate
  8170. \begin_inset Index
  8171. status collapsed
  8172. \begin_layout Plain Layout
  8173. bit-rate
  8174. \end_layout
  8175. \end_inset
  8176. (bps)
  8177. \end_layout
  8178. \end_inset
  8179. </cell>
  8180. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  8181. \begin_inset Text
  8182. \begin_layout Plain Layout
  8183. Quality/description
  8184. \end_layout
  8185. \end_inset
  8186. </cell>
  8187. </row>
  8188. <row>
  8189. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8190. \begin_inset Text
  8191. \begin_layout Plain Layout
  8192. 0
  8193. \end_layout
  8194. \end_inset
  8195. </cell>
  8196. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8197. \begin_inset Text
  8198. \begin_layout Plain Layout
  8199. 3,950
  8200. \end_layout
  8201. \end_inset
  8202. </cell>
  8203. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8204. \begin_inset Text
  8205. \begin_layout Plain Layout
  8206. Barely intelligible (mostly for comfort noise)
  8207. \end_layout
  8208. \end_inset
  8209. </cell>
  8210. </row>
  8211. <row>
  8212. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8213. \begin_inset Text
  8214. \begin_layout Plain Layout
  8215. 1
  8216. \end_layout
  8217. \end_inset
  8218. </cell>
  8219. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8220. \begin_inset Text
  8221. \begin_layout Plain Layout
  8222. 5,750
  8223. \end_layout
  8224. \end_inset
  8225. </cell>
  8226. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8227. \begin_inset Text
  8228. \begin_layout Plain Layout
  8229. Very noticeable artifacts/noise, poor intelligibility
  8230. \end_layout
  8231. \end_inset
  8232. </cell>
  8233. </row>
  8234. <row>
  8235. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8236. \begin_inset Text
  8237. \begin_layout Plain Layout
  8238. 2
  8239. \end_layout
  8240. \end_inset
  8241. </cell>
  8242. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8243. \begin_inset Text
  8244. \begin_layout Plain Layout
  8245. 7,750
  8246. \end_layout
  8247. \end_inset
  8248. </cell>
  8249. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8250. \begin_inset Text
  8251. \begin_layout Plain Layout
  8252. Very noticeable artifacts/noise, good intelligibility
  8253. \end_layout
  8254. \end_inset
  8255. </cell>
  8256. </row>
  8257. <row>
  8258. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8259. \begin_inset Text
  8260. \begin_layout Plain Layout
  8261. 3
  8262. \end_layout
  8263. \end_inset
  8264. </cell>
  8265. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8266. \begin_inset Text
  8267. \begin_layout Plain Layout
  8268. 9,800
  8269. \end_layout
  8270. \end_inset
  8271. </cell>
  8272. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8273. \begin_inset Text
  8274. \begin_layout Plain Layout
  8275. Artifacts/noise sometimes annoying
  8276. \end_layout
  8277. \end_inset
  8278. </cell>
  8279. </row>
  8280. <row>
  8281. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8282. \begin_inset Text
  8283. \begin_layout Plain Layout
  8284. 4
  8285. \end_layout
  8286. \end_inset
  8287. </cell>
  8288. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8289. \begin_inset Text
  8290. \begin_layout Plain Layout
  8291. 12,800
  8292. \end_layout
  8293. \end_inset
  8294. </cell>
  8295. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8296. \begin_inset Text
  8297. \begin_layout Plain Layout
  8298. Artifacts/noise usually noticeable
  8299. \end_layout
  8300. \end_inset
  8301. </cell>
  8302. </row>
  8303. <row>
  8304. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8305. \begin_inset Text
  8306. \begin_layout Plain Layout
  8307. 5
  8308. \end_layout
  8309. \end_inset
  8310. </cell>
  8311. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8312. \begin_inset Text
  8313. \begin_layout Plain Layout
  8314. 16,800
  8315. \end_layout
  8316. \end_inset
  8317. </cell>
  8318. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8319. \begin_inset Text
  8320. \begin_layout Plain Layout
  8321. Artifacts/noise sometimes noticeable
  8322. \end_layout
  8323. \end_inset
  8324. </cell>
  8325. </row>
  8326. <row>
  8327. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8328. \begin_inset Text
  8329. \begin_layout Plain Layout
  8330. 6
  8331. \end_layout
  8332. \end_inset
  8333. </cell>
  8334. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8335. \begin_inset Text
  8336. \begin_layout Plain Layout
  8337. 20,600
  8338. \end_layout
  8339. \end_inset
  8340. </cell>
  8341. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8342. \begin_inset Text
  8343. \begin_layout Plain Layout
  8344. Need good headphones to tell the difference
  8345. \end_layout
  8346. \end_inset
  8347. </cell>
  8348. </row>
  8349. <row>
  8350. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8351. \begin_inset Text
  8352. \begin_layout Plain Layout
  8353. 7
  8354. \end_layout
  8355. \end_inset
  8356. </cell>
  8357. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8358. \begin_inset Text
  8359. \begin_layout Plain Layout
  8360. 23,800
  8361. \end_layout
  8362. \end_inset
  8363. </cell>
  8364. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8365. \begin_inset Text
  8366. \begin_layout Plain Layout
  8367. Need good headphones to tell the difference
  8368. \end_layout
  8369. \end_inset
  8370. </cell>
  8371. </row>
  8372. <row>
  8373. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8374. \begin_inset Text
  8375. \begin_layout Plain Layout
  8376. 8
  8377. \end_layout
  8378. \end_inset
  8379. </cell>
  8380. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8381. \begin_inset Text
  8382. \begin_layout Plain Layout
  8383. 27,800
  8384. \end_layout
  8385. \end_inset
  8386. </cell>
  8387. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8388. \begin_inset Text
  8389. \begin_layout Plain Layout
  8390. Hard to tell the difference even with good headphones
  8391. \end_layout
  8392. \end_inset
  8393. </cell>
  8394. </row>
  8395. <row>
  8396. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8397. \begin_inset Text
  8398. \begin_layout Plain Layout
  8399. 9
  8400. \end_layout
  8401. \end_inset
  8402. </cell>
  8403. <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
  8404. \begin_inset Text
  8405. \begin_layout Plain Layout
  8406. 34,200
  8407. \end_layout
  8408. \end_inset
  8409. </cell>
  8410. <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
  8411. \begin_inset Text
  8412. \begin_layout Plain Layout
  8413. Hard to tell the difference even with good headphones
  8414. \end_layout
  8415. \end_inset
  8416. </cell>
  8417. </row>
  8418. <row>
  8419. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8420. \begin_inset Text
  8421. \begin_layout Plain Layout
  8422. 10
  8423. \end_layout
  8424. \end_inset
  8425. </cell>
  8426. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
  8427. \begin_inset Text
  8428. \begin_layout Plain Layout
  8429. 42,200
  8430. \end_layout
  8431. \end_inset
  8432. </cell>
  8433. <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
  8434. \begin_inset Text
  8435. \begin_layout Plain Layout
  8436. Completely transparent for voice, good quality music
  8437. \end_layout
  8438. \end_inset
  8439. </cell>
  8440. </row>
  8441. </lyxtabular>
  8442. \end_inset
  8443. \begin_inset ERT
  8444. status collapsed
  8445. \begin_layout Plain Layout
  8446. \backslash
  8447. end{center}
  8448. \end_layout
  8449. \end_inset
  8450. \end_layout
  8451. \begin_layout Plain Layout
  8452. \begin_inset Caption
  8453. \begin_layout Plain Layout
  8454. Quality versus bit-rate for the wideband encoder
  8455. \begin_inset CommandInset label
  8456. LatexCommand label
  8457. name "tab:wideband-quality"
  8458. \end_inset
  8459. \end_layout
  8460. \end_inset
  8461. \end_layout
  8462. \end_inset
  8463. \end_layout
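\begin_layout Standard
An application that has to stay within a fixed bandwidth can use the table above as a lookup.
 The following C sketch is a hypothetical helper, not a function of the Speex API; the names are made up for illustration and the bit-rates are copied from the table.
 It returns the highest wideband quality setting whose listed bit-rate fits a given budget.
\end_layout

```c
#include <assert.h>
#include <stddef.h>

/* Wideband bit-rates in bps, copied from the table, indexed by quality 0..10. */
static const int wb_quality_bps[] = {
    3950, 5750, 7750, 9800, 12800, 16800, 20600, 23800, 27800, 34200, 42200
};
#define WB_QUALITY_COUNT (sizeof wb_quality_bps / sizeof wb_quality_bps[0])

/* Bit-rate listed in the table for a given quality setting. */
int wb_bitrate_for_quality(int quality)
{
    assert(quality >= 0 && (size_t)quality < WB_QUALITY_COUNT);
    return wb_quality_bps[quality];
}

/* Highest quality whose bit-rate does not exceed max_bps,
   or -1 if even quality 0 does not fit. */
int wb_quality_for_budget(int max_bps)
{
    int best = -1;
    size_t q;
    for (q = 0; q < WB_QUALITY_COUNT; q++) {
        if (wb_quality_bps[q] <= max_bps)
            best = (int)q;
    }
    return best;
}
```

\begin_layout Standard
With a budget of 16,000 bps, for instance, the helper selects quality 4 (12,800 bps), because quality 5 needs 16,800 bps.
\end_layout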
  8464. \begin_layout Standard
  8465. \begin_inset ERT
  8466. status open
  8467. \begin_layout Plain Layout
  8468. \backslash
  8469. clearpage
  8470. \end_layout
  8471. \end_inset
  8472. \end_layout
  8482. \begin_layout Chapter
  8483. \start_of_appendix
  8484. Sample code
  8485. \begin_inset CommandInset label
  8486. LatexCommand label
  8487. name "sec:Sample-code"
  8488. \end_inset
  8489. \end_layout
  8490. \begin_layout Standard
  8491. This appendix shows sample code for encoding and decoding speech using the
  8492. Speex API.
  8493. The two programs can be piped together to encode and then decode a file:
  8494. \family typewriter
  8495. \begin_inset Newline newline
  8496. \end_inset
  8497. % sampleenc in_file.sw | sampledec out_file.sw
  8498. \family default
  8499. \begin_inset Newline newline
  8500. \end_inset
  8501. where both files are raw (no header) files encoded at 16 bits per sample
  8502. (in the machine's native endianness).
  8503. \end_layout
  8504. \begin_layout Section
  8505. sampleenc.c
  8506. \end_layout
  8507. \begin_layout Standard
  8508. sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex
  8509. stream to stdout.
  8510. Note that the packing used is
  8511. \series bold
  8512. not
  8513. \series default
  8514. compatible with that of speexenc/speexdec.
  8515. \end_layout
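\begin_layout Standard
The incompatibility comes from the way frames are delimited in the stream.
 As far as we can tell from the sampleenc listing, each encoded frame is written as a native int holding the byte count, followed by that many bytes of compressed data, whereas speexenc produces a proper container file.
 The C sketch below illustrates that kind of length-prefixed packing using only the standard library; the function names write_packet and read_packet are hypothetical and the exact layout should be checked against the listing.
\end_layout

```c
#include <assert.h>
#include <stdio.h>

/* Write one packet: a native int byte count, then the payload.
   Returns 0 on success, -1 on a write error. */
int write_packet(FILE *out, const char *data, int nbytes)
{
    assert(out != NULL && data != NULL && nbytes >= 0);
    if (fwrite(&nbytes, sizeof nbytes, 1, out) != 1)
        return -1;
    if (nbytes > 0 && fwrite(data, 1, (size_t)nbytes, out) != (size_t)nbytes)
        return -1;
    return 0;
}

/* Read one packet into buf, which can hold up to cap bytes.
   Returns the payload size, or -1 on end of file, read error,
   or a byte count that does not fit in buf. */
int read_packet(FILE *in, char *buf, int cap)
{
    int nbytes;
    assert(in != NULL && buf != NULL && cap >= 0);
    if (fread(&nbytes, sizeof nbytes, 1, in) != 1)
        return -1;
    if (nbytes < 0 || nbytes > cap)
        return -1;
    if (nbytes > 0 && fread(buf, 1, (size_t)nbytes, in) != (size_t)nbytes)
        return -1;
    return nbytes;
}
```

\begin_layout Standard
Because the byte count is stored as a native int, a stream packed this way is only portable between machines with the same int size and endianness, which is acceptable for a demonstration but not for an interchange format.
\end_layout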
  8516. \begin_layout Standard
  8517. \begin_inset CommandInset include
  8518. LatexCommand lstinputlisting
  8519. filename "sampleenc.c"
  8520. lstparams "caption={Source code for sampleenc},label={sampleenc-source-code},numbers=left,numberstyle={\\footnotesize}"
  8521. \end_inset
  8522. \end_layout
  8523. \begin_layout Section
  8524. sampledec.c
  8525. \end_layout
  8526. \begin_layout Standard
  8527. sampledec reads a Speex stream from stdin, decodes it and outputs it to
  8528. a raw 16 bits/sample file.
  8529. Note that the packing used is
  8530. \series bold
  8531. not
  8532. \series default
  8533. compatible with that of speexenc/speexdec.
  8534. \end_layout
  8535. \begin_layout Standard
  8536. \begin_inset CommandInset include
  8537. LatexCommand lstinputlisting
  8538. filename "sampledec.c"
  8539. lstparams "caption={Source code for sampledec},label={sampledec-source-code},numbers=left,numberstyle={\\footnotesize}"
  8540. \end_inset
  8541. \end_layout
  8542. \begin_layout Standard
  8543. \begin_inset Newpage newpage
  8544. \end_inset
  8545. \end_layout
  8546. \begin_layout Chapter
  8547. Jitter Buffer for Speex
  8548. \end_layout
  8549. \begin_layout Standard
  8550. \begin_inset CommandInset include
  8551. LatexCommand lstinputlisting
  8552. filename "../speexclient/speex_jitter_buffer.c"
  8553. lstparams "caption={Example of using the jitter buffer for Speex packets},label={example-speex-jitter},numbers=left,numberstyle={\\footnotesize}"
  8554. \end_inset
  8555. \end_layout
  8556. \begin_layout Standard
  8557. \begin_inset Newpage newpage
  8558. \end_inset
  8559. \end_layout
  8560. \begin_layout Chapter
  8561. IETF RTP Profile
  8562. \begin_inset CommandInset label
  8563. LatexCommand label
  8564. name "sec:IETF-draft"
  8565. \end_inset
  8566. \end_layout
  8567. \begin_layout Standard
  8568. \begin_inset CommandInset include
  8569. LatexCommand verbatiminput
  8570. filename "draft-ietf-avt-rtp-speex-05-tmp.txt"
  8571. \end_inset
  8572. \end_layout
  8573. \begin_layout Standard
  8574. \begin_inset Newpage newpage
  8575. \end_inset
  8576. \end_layout
  8577. \begin_layout Chapter
  8578. Speex License
  8579. \begin_inset CommandInset label
  8580. LatexCommand label
  8581. name "sec:Speex-License"
  8582. \end_inset
  8583. \end_layout
  8584. \begin_layout Standard
  8585. \begin_inset CommandInset include
  8586. LatexCommand verbatiminput
  8587. filename "../COPYING"
  8588. \end_inset
  8589. \end_layout
  8590. \begin_layout Standard
  8591. \begin_inset Newpage newpage
  8592. \end_inset
  8593. \end_layout
  8594. \begin_layout Chapter
  8595. GNU Free Documentation License
  8596. \end_layout
  8597. \begin_layout Standard
  8598. Version 1.1, March 2000
  8599. \end_layout
  8600. \begin_layout Standard
  8601. Copyright (C) 2000 Free Software Foundation, Inc.
  8602. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted
  8603. to copy and distribute verbatim copies of this license document, but changing
  8604. it is not allowed.
  8605. \end_layout
  8606. \begin_layout Section*
  8607. 0.
  8608. PREAMBLE
  8609. \end_layout
  8610. \begin_layout Standard
  8611. The purpose of this License is to make a manual, textbook, or other written
  8612. document "free" in the sense of freedom: to assure everyone the effective
  8613. freedom to copy and redistribute it, with or without modifying it, either
  8614. commercially or noncommercially.
  8615. Secondarily, this License preserves for the author and publisher a way
  8616. to get credit for their work, while not being considered responsible for
  8617. modifications made by others.
  8618. \end_layout
  8619. \begin_layout Standard
  8620. This License is a kind of "copyleft", which means that derivative works
  8621. of the document must themselves be free in the same sense.
  8622. It complements the GNU General Public License, which is a copyleft license
  8623. designed for free software.
  8624. \end_layout
  8625. \begin_layout Standard
  8626. We have designed this License in order to use it for manuals for free software,
  8627. because free software needs free documentation: a free program should come
  8628. with manuals providing the same freedoms that the software does.
  8629. But this License is not limited to software manuals; it can be used for
  8630. any textual work, regardless of subject matter or whether it is published
  8631. as a printed book.
  8632. We recommend this License principally for works whose purpose is instruction
  8633. or reference.
  8634. \end_layout
  8635. \begin_layout Section*
  8636. 1.
  8637. APPLICABILITY AND DEFINITIONS
  8638. \end_layout
  8639. \begin_layout Standard
  8640. This License applies to any manual or other work that contains a notice
  8641. placed by the copyright holder saying it can be distributed under the terms
  8642. of this License.
  8643. The "Document", below, refers to any such manual or work.
  8644. Any member of the public is a licensee, and is addressed as "you".
  8645. \end_layout
  8646. \begin_layout Standard
  8647. A "Modified Version" of the Document means any work containing the Document
  8648. or a portion of it, either copied verbatim, or with modifications and/or
  8649. translated into another language.
  8650. \end_layout
  8651. \begin_layout Standard
  8652. A "Secondary Section" is a named appendix or a front-matter section of the
  8653. Document that deals exclusively with the relationship of the publishers
  8654. or authors of the Document to the Document's overall subject (or to related
  8655. matters) and contains nothing that could fall directly within that overall
  8656. subject.
  8657. (For example, if the Document is in part a textbook of mathematics, a Secondary
  8658. Section may not explain any mathematics.) The relationship could be a matter
  8659. of historical connection with the subject or with related matters, or of
  8660. legal, commercial, philosophical, ethical or political position regarding
  8661. them.
  8662. \end_layout
  8663. \begin_layout Standard
  8664. The "Invariant Sections" are certain Secondary Sections whose titles are
  8665. designated, as being those of Invariant Sections, in the notice that says
  8666. that the Document is released under this License.
  8667. \end_layout
  8668. \begin_layout Standard
  8669. The "Cover Texts" are certain short passages of text that are listed, as
  8670. Front-Cover Texts or Back-Cover Texts, in the notice that says that the
  8671. Document is released under this License.
  8672. \end_layout
  8673. \begin_layout Standard
  8674. A "Transparent" copy of the Document means a machine-readable copy, represented
  8675. in a format whose specification is available to the general public, whose
  8676. contents can be viewed and edited directly and straightforwardly with generic
  8677. text editors or (for images composed of pixels) generic paint programs
  8678. or (for drawings) some widely available drawing editor, and that is suitable
  8679. for input to text formatters or for automatic translation to a variety
  8680. of formats suitable for input to text formatters.
  8681. A copy made in an otherwise Transparent file format whose markup has been
  8682. designed to thwart or discourage subsequent modification by readers is
  8683. not Transparent.
  8684. A copy that is not "Transparent" is called "Opaque".
  8685. \end_layout
  8686. \begin_layout Standard
  8687. Examples of suitable formats for Transparent copies include plain ASCII
  8688. without markup, Texinfo input format, LaTeX input format, SGML or XML using
  8689. a publicly available DTD, and standard-conforming simple HTML designed
  8690. for human modification.
  8691. Opaque formats include PostScript, PDF, proprietary formats that can be
  8692. read and edited only by proprietary word processors, SGML or XML for which
  8693. the DTD and/or processing tools are not generally available, and the
  8694. machine-generated HTML produced by some word processors for output purposes only.
  8695. \end_layout
  8696. \begin_layout Standard
  8697. The "Title Page" means, for a printed book, the title page itself, plus
  8698. such following pages as are needed to hold, legibly, the material this
  8699. License requires to appear in the title page.
  8700. For works in formats which do not have any title page as such, "Title Page"
  8701. means the text near the most prominent appearance of the work's title,
  8702. preceding the beginning of the body of the text.
  8703. \end_layout
  8704. \begin_layout Section*
  8705. 2.
  8706. VERBATIM COPYING
  8707. \end_layout
  8708. \begin_layout Standard
  8709. You may copy and distribute the Document in any medium, either commercially
  8710. or noncommercially, provided that this License, the copyright notices,
  8711. and the license notice saying this License applies to the Document are
  8712. reproduced in all copies, and that you add no other conditions whatsoever
  8713. to those of this License.
  8714. You may not use technical measures to obstruct or control the reading or
  8715. further copying of the copies you make or distribute.
  8716. However, you may accept compensation in exchange for copies.
  8717. If you distribute a large enough number of copies you must also follow
  8718. the conditions in section 3.
  8719. \end_layout
  8720. \begin_layout Standard
  8721. You may also lend copies, under the same conditions stated above, and you
  8722. may publicly display copies.
  8723. \end_layout
  8724. \begin_layout Section*
  8725. 3.
  8726. COPYING IN QUANTITY
  8727. \end_layout
  8728. \begin_layout Standard
  8729. If you publish printed copies of the Document numbering more than 100, and
  8730. the Document's license notice requires Cover Texts, you must enclose the
  8731. copies in covers that carry, clearly and legibly, all these Cover Texts:
  8732. Front-Cover Texts on the front cover, and Back-Cover Texts on the back
  8733. cover.
  8734. Both covers must also clearly and legibly identify you as the publisher
  8735. of these copies.
  8736. The front cover must present the full title with all words of the title
  8737. equally prominent and visible.
  8738. You may add other material on the covers in addition.
  8739. Copying with changes limited to the covers, as long as they preserve the
  8740. title of the Document and satisfy these conditions, can be treated as verbatim
  8741. copying in other respects.
  8742. \end_layout
  8743. \begin_layout Standard
  8744. If the required texts for either cover are too voluminous to fit legibly,
  8745. you should put the first ones listed (as many as fit reasonably) on the
  8746. actual cover, and continue the rest onto adjacent pages.
  8747. \end_layout
  8748. \begin_layout Standard
  8749. If you publish or distribute Opaque copies of the Document numbering more
  8750. than 100, you must either include a machine-readable Transparent copy along
  8751. with each Opaque copy, or state in or with each Opaque copy a
  8752. publicly-accessible computer-network location containing a complete Transparent copy of the
  8753. Document, free of added material, which the general network-using public
  8754. has access to download anonymously at no charge using public-standard network
  8755. protocols.
  8756. If you use the latter option, you must take reasonably prudent steps, when
  8757. you begin distribution of Opaque copies in quantity, to ensure that this
  8758. Transparent copy will remain thus accessible at the stated location until
  8759. at least one year after the last time you distribute an Opaque copy (directly
  8760. or through your agents or retailers) of that edition to the public.
  8761. \end_layout
  8762. \begin_layout Standard
  8763. It is requested, but not required, that you contact the authors of the Document
  8764. well before redistributing any large number of copies, to give them a chance
  8765. to provide you with an updated version of the Document.
  8766. \end_layout
  8767. \begin_layout Section*
  8768. 4.
  8769. MODIFICATIONS
  8770. \end_layout
  8771. \begin_layout Standard
  8772. You may copy and distribute a Modified Version of the Document under the
  8773. conditions of sections 2 and 3 above, provided that you release the Modified
  8774. Version under precisely this License, with the Modified Version filling
  8775. the role of the Document, thus licensing distribution and modification
  8776. of the Modified Version to whoever possesses a copy of it.
  8777. In addition, you must do these things in the Modified Version:
  8778. \end_layout
  8779. \begin_layout Itemize
  8780. A.
  8781. Use in the Title Page (and on the covers, if any) a title distinct from
  8782. that of the Document, and from those of previous versions (which should,
  8783. if there were any, be listed in the History section of the Document).
  8784. You may use the same title as a previous version if the original publisher
  8785. of that version gives permission.
  8786. \end_layout
  8787. \begin_layout Itemize
  8788. B.
  8789. List on the Title Page, as authors, one or more persons or entities responsible
  8790. for authorship of the modifications in the Modified Version, together with
  8791. at least five of the principal authors of the Document (all of its principal
  8792. authors, if it has less than five).
  8793. \end_layout
  8794. \begin_layout Itemize
  8795. C.
  8796. State on the Title page the name of the publisher of the Modified Version,
  8797. as the publisher.
  8798. \end_layout
  8799. \begin_layout Itemize
  8800. D.
  8801. Preserve all the copyright notices of the Document.
  8802. \end_layout
  8803. \begin_layout Itemize
  8804. E.
  8805. Add an appropriate copyright notice for your modifications adjacent to
  8806. the other copyright notices.
  8807. \end_layout
  8808. \begin_layout Itemize
  8809. F.
  8810. Include, immediately after the copyright notices, a license notice giving
  8811. the public permission to use the Modified Version under the terms of this
  8812. License, in the form shown in the Addendum below.
  8813. \end_layout
  8814. \begin_layout Itemize
  8815. G.
  8816. Preserve in that license notice the full lists of Invariant Sections and
  8817. required Cover Texts given in the Document's license notice.
  8818. \end_layout
  8819. \begin_layout Itemize
  8820. H.
  8821. Include an unaltered copy of this License.
  8822. \end_layout
  8823. \begin_layout Itemize
  8824. I.
  8825. Preserve the section entitled "History", and its title, and add to it an
  8826. item stating at least the title, year, new authors, and publisher of the
  8827. Modified Version as given on the Title Page.
  8828. If there is no section entitled "History" in the Document, create one stating
  8829. the title, year, authors, and publisher of the Document as given on its
  8830. Title Page, then add an item describing the Modified Version as stated
  8831. in the previous sentence.
  8832. \end_layout
  8833. \begin_layout Itemize
  8834. J.
  8835. Preserve the network location, if any, given in the Document for public
  8836. access to a Transparent copy of the Document, and likewise the network
  8837. locations given in the Document for previous versions it was based on.
  8838. These may be placed in the "History" section.
  8839. You may omit a network location for a work that was published at least
  8840. four years before the Document itself, or if the original publisher of
  8841. the version it refers to gives permission.
  8842. \end_layout
  8843. \begin_layout Itemize
  8844. K.
  8845. In any section entitled "Acknowledgements" or "Dedications", preserve the
  8846. section's title, and preserve in the section all the substance and tone
  8847. of each of the contributor acknowledgements and/or dedications given therein.
  8848. \end_layout
  8849. \begin_layout Itemize
  8850. L.
  8851. Preserve all the Invariant Sections of the Document, unaltered in their
  8852. text and in their titles.
  8853. Section numbers or the equivalent are not considered part of the section
  8854. titles.
  8855. \end_layout
  8856. \begin_layout Itemize
  8857. M.
  8858. Delete any section entitled "Endorsements".
  8859. Such a section may not be included in the Modified Version.
  8860. \end_layout
  8861. \begin_layout Itemize
  8862. N.
  8863. Do not retitle any existing section as "Endorsements" or to conflict in
  8864. title with any Invariant Section.
  8865. \end_layout
  8866. \begin_layout Standard
  8867. If the Modified Version includes new front-matter sections or appendices
  8868. that qualify as Secondary Sections and contain no material copied from
  8869. the Document, you may at your option designate some or all of these sections
  8870. as invariant.
  8871. To do this, add their titles to the list of Invariant Sections in the Modified
  8872. Version's license notice.
  8873. These titles must be distinct from any other section titles.
  8874. \end_layout
  8875. \begin_layout Standard
  8876. You may add a section entitled "Endorsements", provided it contains nothing
  8877. but endorsements of your Modified Version by various parties--for example,
  8878. statements of peer review or that the text has been approved by an organization
  8879. as the authoritative definition of a standard.
  8880. \end_layout
  8881. \begin_layout Standard
  8882. You may add a passage of up to five words as a Front-Cover Text, and a passage
  8883. of up to 25 words as a Back-Cover Text, to the end of the list of Cover
  8884. Texts in the Modified Version.
  8885. Only one passage of Front-Cover Text and one of Back-Cover Text may be
  8886. added by (or through arrangements made by) any one entity.
  8887. If the Document already includes a cover text for the same cover, previously
  8888. added by you or by arrangement made by the same entity you are acting on
  8889. behalf of, you may not add another; but you may replace the old one, on
  8890. explicit permission from the previous publisher that added the old one.
  8891. \end_layout
  8892. \begin_layout Standard
  8893. The author(s) and publisher(s) of the Document do not by this License give
  8894. permission to use their names for publicity for or to assert or imply
  8895. endorsement of any Modified Version.
  8896. \end_layout
  8897. \begin_layout Section*
  8898. 5.
  8899. COMBINING DOCUMENTS
  8900. \end_layout
  8901. \begin_layout Standard
  8902. You may combine the Document with other documents released under this License,
  8903. under the terms defined in section 4 above for modified versions, provided
  8904. that you include in the combination all of the Invariant Sections of all
  8905. of the original documents, unmodified, and list them all as Invariant Sections
  8906. of your combined work in its license notice.
  8907. \end_layout
  8908. \begin_layout Standard
  8909. The combined work need only contain one copy of this License, and multiple
  8910. identical Invariant Sections may be replaced with a single copy.
  8911. If there are multiple Invariant Sections with the same name but different
  8912. contents, make the title of each such section unique by adding at the end
  8913. of it, in parentheses, the name of the original author or publisher of
  8914. that section if known, or else a unique number.
  8915. Make the same adjustment to the section titles in the list of Invariant
  8916. Sections in the license notice of the combined work.
  8917. \end_layout
  8918. \begin_layout Standard
  8919. In the combination, you must combine any sections entitled "History" in
  8920. the various original documents, forming one section entitled "History";
  8921. likewise combine any sections entitled "Acknowledgements", and any sections
  8922. entitled "Dedications".
  8923. You must delete all sections entitled "Endorsements."
  8924. \end_layout
  8925. \begin_layout Section*
  8926. 6.
  8927. COLLECTIONS OF DOCUMENTS
  8928. \end_layout
  8929. \begin_layout Standard
  8930. You may make a collection consisting of the Document and other documents
  8931. released under this License, and replace the individual copies of this
  8932. License in the various documents with a single copy that is included in
  8933. the collection, provided that you follow the rules of this License for
  8934. verbatim copying of each of the documents in all other respects.
  8935. \end_layout
  8936. \begin_layout Standard
  8937. You may extract a single document from such a collection, and distribute
  8938. it individually under this License, provided you insert a copy of this
  8939. License into the extracted document, and follow this License in all other
  8940. respects regarding verbatim copying of that document.
  8941. \end_layout
  8942. \begin_layout Section*
  8943. 7.
  8944. AGGREGATION WITH INDEPENDENT WORKS
  8945. \end_layout
  8946. \begin_layout Standard
  8947. A compilation of the Document or its derivatives with other separate and
  8948. independent documents or works, in or on a volume of a storage or distribution
  8949. medium, does not as a whole count as a Modified Version of the Document,
  8950. provided no compilation copyright is claimed for the compilation.
  8951. Such a compilation is called an "aggregate", and this License does not
  8952. apply to the other self-contained works thus compiled with the Document,
  8953. on account of their being thus compiled, if they are not themselves derivative
  8954. works of the Document.
  8955. \end_layout
  8956. \begin_layout Standard
  8957. If the Cover Text requirement of section 3 is applicable to these copies
  8958. of the Document, then if the Document is less than one quarter of the entire
  8959. aggregate, the Document's Cover Texts may be placed on covers that surround
  8960. only the Document within the aggregate.
  8961. Otherwise they must appear on covers around the whole aggregate.
  8962. \end_layout
  8963. \begin_layout Section*
  8964. 8.
  8965. TRANSLATION
  8966. \end_layout
  8967. \begin_layout Standard
  8968. Translation is considered a kind of modification, so you may distribute
  8969. translations of the Document under the terms of section 4.
  8970. Replacing Invariant Sections with translations requires special permission
  8971. from their copyright holders, but you may include translations of some
  8972. or all Invariant Sections in addition to the original versions of these
  8973. Invariant Sections.
  8974. You may include a translation of this License provided that you also include
  8975. the original English version of this License.
  8976. In case of a disagreement between the translation and the original English
  8977. version of this License, the original English version will prevail.
  8978. \end_layout
  8979. \begin_layout Section*
  8980. 9.
  8981. TERMINATION
  8982. \end_layout
  8983. \begin_layout Standard
  8984. You may not copy, modify, sublicense, or distribute the Document except
  8985. as expressly provided for under this License.
  8986. Any other attempt to copy, modify, sublicense or distribute the Document
  8987. is void, and will automatically terminate your rights under this License.
  8988. However, parties who have received copies, or rights, from you under this
  8989. License will not have their licenses terminated so long as such parties
  8990. remain in full compliance.
  8991. \end_layout
  8992. \begin_layout Section*
  8993. 10.
  8994. FUTURE REVISIONS OF THIS LICENSE
  8995. \end_layout
  8996. \begin_layout Standard
  8997. The Free Software Foundation may publish new, revised versions of the GNU
  8998. Free Documentation License from time to time.
  8999. Such new versions will be similar in spirit to the present version, but
  9000. may differ in detail to address new problems or concerns.
  9001. See http://www.gnu.org/copyleft/.
  9002. \end_layout
  9003. \begin_layout Standard
  9004. Each version of the License is given a distinguishing version number.
  9005. If the Document specifies that a particular numbered version of this License
  9006. "or any later version" applies to it, you have the option of following
  9007. the terms and conditions either of that specified version or of any later
  9008. version that has been published (not as a draft) by the Free Software
  9009. Foundation.
  9010. If the Document does not specify a version number of this License, you
  9011. may choose any version ever published (not as a draft) by the Free Software
  9012. Foundation.
  9013. \end_layout
  9014. \begin_layout Standard
  9015. \begin_inset CommandInset index_print
  9016. LatexCommand printindex
  9017. \end_inset
  9018. \end_layout
  9019. \end_body
  9020. \end_document