|
- #LyX 1.6.1 created this file. For more info see http://www.lyx.org/
- \lyxformat 345
- \begin_document
- \begin_header
- \textclass scrbook
- \use_default_options true
- \language english
- \inputencoding auto
- \font_roman default
- \font_sans default
- \font_typewriter default
- \font_default_family default
- \font_sc false
- \font_osf false
- \font_sf_scale 100
- \font_tt_scale 100
-
- \graphics default
- \paperfontsize 10
- \spacing single
- \use_hyperref false
- \papersize letterpaper
- \use_geometry true
- \use_amsmath 2
- \use_esint 2
- \cite_engine basic
- \use_bibtopic false
- \paperorientation portrait
- \leftmargin 2cm
- \topmargin 2cm
- \rightmargin 2cm
- \bottommargin 2cm
- \secnumdepth 3
- \tocdepth 3
- \paragraph_separation indent
- \defskip medskip
- \quotes_language english
- \papercolumns 1
- \papersides 1
- \paperpagestyle headings
- \tracking_changes false
- \output_changes false
- \author ""
- \author ""
- \end_header
-
- \begin_body
-
- \begin_layout Title
- The Speex Manual
- \begin_inset Newline newline
- \end_inset
-
- Version 1.2
- \end_layout
-
- \begin_layout Author
- Jean-Marc Valin
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Copyright
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- copyright
- \end_layout
-
- \end_inset
-
- 2002-2008 Jean-Marc Valin/Xiph.org Foundation
- \end_layout
-
- \begin_layout Standard
- Permission is granted to copy, distribute and/or modify this document under
- the terms of the GNU Free Documentation License, Version 1.1 or any later
- version published by the Free Software Foundation; with no Invariant Sections,
- with no Front-Cover Texts, and with no Back-Cover Texts.
- A copy of the license is included in the section entitled "GNU Free Documentati
- on License".
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \begin_inset CommandInset toc
- LatexCommand tableofcontents
-
- \end_inset
-
-
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset FloatList table
-
- \end_inset
-
-
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Introduction to Speex
- \end_layout
-
- \begin_layout Standard
- The Speex codec (
- \family typewriter
- http://www.speex.org/
- \family default
- ) exists because there is a need for a speech codec that is open-source
- and free from software patent royalties.
- These are essential conditions for being usable in any open-source software.
- In essence, Speex is to speech what Vorbis is to audio/music.
- Unlike many other speech codecs, Speex is not designed for mobile phones
- but rather for packet networks and voice over IP (VoIP) applications.
- File-based compression is of course also supported.
-
- \end_layout
-
- \begin_layout Standard
- The Speex codec is designed to be very flexible and support a wide range
- of speech quality and bit-rate.
- Support for very good quality speech also means that Speex can encode wideband
- speech (16 kHz sampling rate) in addition to narrowband speech (telephone
- quality, 8 kHz sampling rate).
- \end_layout
-
- \begin_layout Standard
- Designing for VoIP instead of mobile phones means that Speex is robust to
- lost packets, but not to corrupted ones.
- This is based on the assumption that in VoIP, packets either arrive unaltered
- or don't arrive at all.
- Because Speex is targeted at a wide range of devices, it has modest (adjustable
- ) complexity and a small memory footprint.
- \end_layout
-
- \begin_layout Standard
- All the design goals led to the choice of CELP
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- CELP
- \end_layout
-
- \end_inset
-
- as the encoding technique.
- One of the main reasons is that CELP has long proved that it could work
- reliably and scale well to both low bit-rates (e.g.
- DoD CELP @ 4.8 kbps) and high bit-rates (e.g.
- G.728 @ 16 kbps).
-
- \end_layout
-
- \begin_layout Section
- Getting help
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Getting-help"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- As with many open-source projects, there are several ways to get help with Speex.
- These include:
- \end_layout
-
- \begin_layout Itemize
- This manual
- \end_layout
-
- \begin_layout Itemize
- Other documentation on the Speex website (http://www.speex.org/)
- \end_layout
-
- \begin_layout Itemize
- Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not
- just for developers)
- \end_layout
-
- \begin_layout Itemize
- IRC: The main channel is #speex on irc.freenode.net.
- Note that due to time differences, it may take a while to get someone,
- so please be patient.
- \end_layout
-
- \begin_layout Itemize
- Email the author privately at jean-marc.valin@usherbrooke.ca
- \series bold
- only
- \series default
- for private/delicate topics you do not wish to discuss publicly.
- \end_layout
-
- \begin_layout Standard
- Before asking for help (mailing list or IRC),
- \series bold
- it is important to first read this manual
- \series default
- (OK, so if you made it here it's already a good sign).
- It is generally considered rude to ask on a mailing list about topics that
- are clearly detailed in the documentation.
- On the other hand, it's perfectly OK (and encouraged) to ask for clarifications
- about something covered in the manual.
- This manual does not (yet) cover everything about Speex, so everyone is
- encouraged to ask questions, send comments, feature requests, or just let
- us know how Speex is being used.
-
- \end_layout
-
- \begin_layout Standard
- Here are some additional guidelines related to the mailing list.
- Before reporting bugs in Speex to the list, it is strongly recommended
- (if possible) to first test whether these bugs can be reproduced using
- the speexenc and speexdec (see Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Command-line-encoder/decoder"
-
- \end_inset
-
- ) command-line utilities.
- Bugs reported based on 3rd party code are both harder to find and far too
- often caused by errors that have nothing to do with Speex.
-
- \end_layout
-
- \begin_layout Section
- About this document
- \end_layout
-
- \begin_layout Standard
- This document is divided in the following way.
- Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Feature-description"
-
- \end_inset
-
- describes the different Speex features and defines many basic terms that
- are used throughout this manual.
- Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Command-line-encoder/decoder"
-
- \end_inset
-
- documents the standard command-line tools provided in the Speex distribution.
- Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Programming-with-Speex"
-
- \end_inset
-
- includes detailed instructions about programming using the libspeex
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- libspeex
- \end_layout
-
- \end_inset
-
- API.
- Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Formats-and-standards"
-
- \end_inset
-
- has some information related to Speex and standards.
-
- \end_layout
-
- \begin_layout Standard
- The last three sections describe the algorithms used in Speex.
- These sections require signal processing knowledge, but are not required
- for merely using Speex.
- They are intended for people who want to understand how Speex really works
- and/or want to do research based on Speex.
- Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Introduction-to-CELP"
-
- \end_inset
-
- explains the general idea behind CELP, while sections
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Speex-narrowband-mode"
-
- \end_inset
-
- and
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Speex-wideband-mode"
-
- \end_inset
-
- are specific to Speex.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Codec description
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Feature-description"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- This section describes Speex and its features in more detail.
- \end_layout
-
- \begin_layout Section
- Concepts
- \end_layout
-
- \begin_layout Standard
- Before introducing all the Speex features, here are some concepts in speech
- coding that make the rest of the manual easier to follow.
- Although some are general concepts in speech/audio processing, others are
- specific to Speex.
- \end_layout
-
- \begin_layout Subsection*
- Sampling rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- sampling rate
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The sampling rate, expressed in hertz (Hz), is the number of samples taken
- from a signal per second.
- For a sampling rate of
- \begin_inset Formula $F_{s}$
- \end_inset
-
- kHz, the highest frequency that can be represented is equal to
- \begin_inset Formula $F_{s}/2$
- \end_inset
-
- kHz (
- \begin_inset Formula $F_{s}/2$
- \end_inset
-
- is known as the Nyquist frequency).
- This is a fundamental property in signal processing and is described by
- the sampling theorem.
- Speex is mainly designed for three different sampling rates: 8 kHz, 16
- kHz, and 32 kHz.
- These are respectively referred to as narrowband
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- narrowband
- \end_layout
-
- \end_inset
-
- , wideband
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- wideband
- \end_layout
-
- \end_inset
-
- and ultra-wideband
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- ultra-wideband
- \end_layout
-
- \end_inset
-
- .
-
- \end_layout
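-
- \begin_layout Standard
- As a quick worked check of the Nyquist relation for the three rates above
- (an illustration only, not an additional specification), the highest representable
- frequencies are:
- \begin_inset Formula \[
- \frac{8\:\mathrm{kHz}}{2}=4\:\mathrm{kHz},\qquad\frac{16\:\mathrm{kHz}}{2}=8\:\mathrm{kHz},\qquad\frac{32\:\mathrm{kHz}}{2}=16\:\mathrm{kHz}\]
-
- \end_inset
-
-
- \end_layout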
-
- \begin_layout Subsection*
- Bit-rate
- \end_layout
-
- \begin_layout Standard
- When encoding a speech signal, the bit-rate is defined as the number of
- bits per unit of time required to encode the speech.
- It is measured in
- \emph on
- bits per second
- \emph default
- (bps), or generally
- \emph on
- kilobits per second
- \emph default
- .
- It is important to make the distinction between
- \emph on
- kilo
- \series bold
- bits
- \series default
- \emph default
-
- \emph on
- per second
- \emph default
- (k
- \series bold
- b
- \series default
- ps) and
- \emph on
- kilo
- \series bold
- bytes
- \series default
- \emph default
-
- \emph on
- per second
- \emph default
- (k
- \series bold
- B
- \series default
- ps).
- \end_layout
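-
- \begin_layout Standard
- To make the factor of eight between the two units concrete, consider a stream
- at 8 kbps (a rate picked here only for illustration):
- \begin_inset Formula \[
- 8\:\mathrm{kbps}=8000\:\mathrm{bits/s}=\frac{8000}{8}\:\mathrm{bytes/s}=1\:\mathrm{kBps}\]
-
- \end_inset
-
-
- \end_layout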
-
- \begin_layout Subsection*
- Quality
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- quality
- \end_layout
-
- \end_inset
-
- (variable)
- \end_layout
-
- \begin_layout Standard
- Speex is a lossy codec, which means that it achieves compression at the
- expense of fidelity of the input speech signal.
- Unlike some other speech codecs, it is possible to control the trade-off
- made between quality and bit-rate.
- The Speex encoding process is controlled most of the time by a quality
- parameter that ranges from 0 to 10.
- In constant bit-rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- constant bit-rate
- \end_layout
-
- \end_inset
-
- (CBR) operation, the quality parameter is an integer, while for variable
- bit-rate (VBR), the parameter is a float.
-
- \end_layout
-
- \begin_layout Subsection*
- Complexity
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- complexity
- \end_layout
-
- \end_inset
-
- (variable)
- \end_layout
-
- \begin_layout Standard
- With Speex, it is possible to vary the complexity allowed for the encoder.
- This is done by controlling how the search is performed with an integer
- ranging from 1 to 10 in a way that's similar to the -1 to -9 options to
-
- \emph on
- gzip
- \emph default
- and
- \emph on
- bzip2
- \emph default
- compression utilities.
- For normal use, the noise level at complexity 1 is between 1 and 2 dB higher
- than at complexity 10, but the CPU requirements for complexity 10 are about
- 5 times higher than for complexity 1.
- In practice, the best trade-off is between complexity 2 and 4, though higher
- settings are often useful when encoding non-speech sounds like DTMF
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- DTMF
- \end_layout
-
- \end_inset
-
- tones.
- \end_layout
-
- \begin_layout Subsection*
- Variable Bit-Rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- variable bit-rate
- \end_layout
-
- \end_inset
-
- (VBR)
- \end_layout
-
- \begin_layout Standard
- Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically
- to adapt to the
- \begin_inset Quotes eld
- \end_inset
-
- difficulty
- \begin_inset Quotes erd
- \end_inset
-
- of the audio being encoded.
- In the case of Speex, sounds like vowels and high-energy transients
- require a higher bit-rate to achieve good quality, while fricatives (e.g.
- s and f sounds) can be coded adequately with fewer bits.
- For this reason, VBR can achieve a lower bit-rate for the same quality, or
- better quality for a given bit-rate.
- Despite its advantages, VBR has two main drawbacks: first, by only specifying
- quality, there is no guarantee about the final average bit-rate.
- Second, for some real-time applications like voice over IP (VoIP), what
- counts is the maximum bit-rate, which must be low enough for the communication
- channel.
- \end_layout
-
- \begin_layout Subsection*
- Average Bit-Rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- average bit-rate
- \end_layout
-
- \end_inset
-
- (ABR)
- \end_layout
-
- \begin_layout Standard
- Average bit-rate solves one of the problems of VBR, as it dynamically adjusts
- VBR quality in order to meet a specific target bit-rate.
- Because the quality/bit-rate is adjusted in real-time (open-loop), the
- global quality will be slightly lower than that obtained by encoding in
- VBR with exactly the right quality setting to meet the target average bit-rate.
- \end_layout
-
- \begin_layout Subsection*
- Voice Activity Detection
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- voice activity detection
- \end_layout
-
- \end_inset
-
- (VAD)
- \end_layout
-
- \begin_layout Standard
- When enabled, voice activity detection detects whether the audio being encoded
- is speech or silence/background noise.
- VAD is always implicitly activated when encoding in VBR, so the option
- is only useful in non-VBR operation.
- In this case, Speex detects non-speech periods and encodes them with just
- enough bits to reproduce the background noise.
- This is called
- \begin_inset Quotes eld
- \end_inset
-
- comfort noise generation
- \begin_inset Quotes erd
- \end_inset
-
- (CNG).
- \end_layout
-
- \begin_layout Subsection*
- Discontinuous Transmission
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- discontinuous transmission
- \end_layout
-
- \end_inset
-
- (DTX)
- \end_layout
-
- \begin_layout Standard
- Discontinuous transmission is an addition to VAD/VBR operation that allows
- transmission to stop completely when the background noise is stationary.
- In file-based operation, since we cannot just stop writing to the file,
- only 5 bits are used for such frames (corresponding to 250 bps).
- \end_layout
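-
- \begin_layout Standard
- The 250 bps figure follows from the frame duration.
- Assuming 20 ms frames (the frame size implied by the delay figures given
- under Latency and algorithmic delay), 5 bits per frame corresponds to:
- \begin_inset Formula \[
- \frac{5\:\mathrm{bits}}{0.020\:\mathrm{s}}=250\:\mathrm{bps}\]
-
- \end_inset
-
-
- \end_layout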
-
- \begin_layout Subsection*
- Perceptual enhancement
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- perceptual enhancement
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Perceptual enhancement is a part of the decoder which, when turned on, attempts
- to reduce the perception of the noise/distortion produced by the encoding/decod
- ing process.
- In most cases, perceptual enhancement brings the sound further from the
- original
- \emph on
- objectively
- \emph default
- (e.g.
- considering only SNR), but in the end it still
- \emph on
- sounds
- \emph default
- better (subjective improvement).
- \end_layout
-
- \begin_layout Subsection*
- Latency and algorithmic delay
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- algorithmic delay
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Every speech codec introduces a delay in the transmission.
- For Speex, this delay is equal to the frame size, plus some amount of
- \begin_inset Quotes eld
- \end_inset
-
- look-ahead
- \begin_inset Quotes erd
- \end_inset
-
- required to process each frame.
- In narrowband operation (8 kHz), the look-ahead is 10 ms; in wideband operation
- (16 kHz), it is 13.9 ms; and in ultra-wideband operation (32
- kHz), it is 15.9 ms, resulting in algorithmic delays of 30 ms,
- 33.9 ms and 35.9 ms, respectively.
- These values don't account for the CPU time it takes to encode or decode
- the frames.
- \end_layout
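-
- \begin_layout Standard
- The three delay figures can be checked by adding the frame size to each look-ahea
- d.
- The frame size implied by the numbers above is 20 ms (30 ms minus the 10
- ms narrowband look-ahead); the symbols
- \begin_inset Formula $D$
- \end_inset
-
- ,
- \begin_inset Formula $T_{\mathrm{frame}}$
- \end_inset
-
- and
- \begin_inset Formula $T_{\mathrm{la}}$
- \end_inset
-
- are notation introduced here for illustration:
- \begin_inset Formula \[
- D=T_{\mathrm{frame}}+T_{\mathrm{la}}\]
-
- \end_inset
-
-
- \begin_inset Formula \[
- 20+10=30\:\mathrm{ms},\qquad20+13.9=33.9\:\mathrm{ms},\qquad20+15.9=35.9\:\mathrm{ms}\]
-
- \end_inset
-
-
- \end_layout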
-
- \begin_layout Section
- Codec
- \end_layout
-
- \begin_layout Standard
- The main characteristics of Speex can be summarized as follows:
- \end_layout
-
- \begin_layout Itemize
- Free software/open-source
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- open-source
- \end_layout
-
- \end_inset
-
- , patent
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- patent
- \end_layout
-
- \end_inset
-
- and royalty-free
- \end_layout
-
- \begin_layout Itemize
- Integration of narrowband
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- narrowband
- \end_layout
-
- \end_inset
-
- and wideband
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- wideband
- \end_layout
-
- \end_inset
-
- using an embedded bit-stream
- \end_layout
-
- \begin_layout Itemize
- Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
- \end_layout
-
- \begin_layout Itemize
- Dynamic bit-rate switching (AMR) and Variable Bit-Rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- variable bit-rate
- \end_layout
-
- \end_inset
-
- (VBR) operation
- \end_layout
-
- \begin_layout Itemize
- Voice Activity Detection
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- voice activity detection
- \end_layout
-
- \end_inset
-
- (VAD, integrated with VBR) and discontinuous transmission (DTX)
- \end_layout
-
- \begin_layout Itemize
- Variable complexity
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- complexity
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Itemize
- Embedded wideband structure (scalable sampling rate)
- \end_layout
-
- \begin_layout Itemize
- Ultra-wideband sampling rate at 32 kHz
- \end_layout
-
- \begin_layout Itemize
- Intensity stereo encoding option
- \end_layout
-
- \begin_layout Itemize
- Fixed-point implementation
- \end_layout
-
- \begin_layout Section
- Preprocessor
- \end_layout
-
- \begin_layout Standard
- This part refers to the preprocessor module introduced in the 1.1.x branch.
- The preprocessor is designed to be used on the audio
- \emph on
- before
- \emph default
- running the encoder.
- The preprocessor provides three main functionalities:
- \end_layout
-
- \begin_layout Itemize
- noise suppression
- \end_layout
-
- \begin_layout Itemize
- automatic gain control (AGC)
- \end_layout
-
- \begin_layout Itemize
- voice activity detection (VAD)
- \end_layout
-
- \begin_layout Standard
- The denoiser can be used to reduce the amount of background noise present
- in the input signal.
- This provides higher quality speech whether or not the denoised signal
- is encoded with Speex (or at all).
- However, when using the denoised signal with the codec, there is an additional
- benefit.
- Speech codecs in general (Speex included) tend to perform poorly on noisy
- input, which tends to amplify the noise.
- The denoiser greatly reduces this effect.
- \end_layout
-
- \begin_layout Standard
- Automatic gain control (AGC) is a feature that deals with the fact that
- the recording volume may vary by a large amount between different setups.
- The AGC provides a way to adjust a signal to a reference volume.
- This is useful for voice over IP because it removes the need for manual
- adjustment of the microphone gain.
- A secondary advantage is that by setting the microphone gain to a conservative
- (low) level, it is easier to avoid clipping.
- \end_layout
-
- \begin_layout Standard
- The voice activity detector (VAD) provided by the preprocessor is more advanced
- than the one directly provided in the codec.
-
- \end_layout
-
- \begin_layout Section
- Adaptive Jitter Buffer
- \end_layout
-
- \begin_layout Standard
- When transmitting voice (or any content for that matter) over UDP or RTP,
- packets may be lost, arrive with different delays, or even arrive out of order.
- The purpose of a jitter buffer is to reorder packets and buffer them long
- enough (but no longer than necessary) so they can be sent to be decoded.
-
- \end_layout
-
- \begin_layout Section
- Acoustic Echo Canceller
- \end_layout
-
- \begin_layout Standard
- In any hands-free communication system (Fig.
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "fig:Acoustic-echo-model"
-
- \end_inset
-
- ), speech from the remote end is played in the local loudspeaker, propagates
- in the room and is captured by the microphone.
- If the audio captured from the microphone is sent directly to the remote
- end, then the remote user hears an echo of their own voice.
- An acoustic echo canceller is designed to remove the acoustic echo before
- it is sent to the remote end.
- It is important to understand that the echo canceller is meant to improve
- the quality on the
- \series bold
- remote
- \series default
- end.
- For those who care a lot about mouth-to-ear delays, it should be noted that
- unlike the Speex codec, resampler and preprocessor, this Acoustic Echo Canceller
- does not introduce any latency.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float figure
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Graphics
- filename echo_path.eps
- width 10cm
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Acoustic echo model
- \begin_inset CommandInset label
- LatexCommand label
- name "fig:Acoustic-echo-model"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- Resampler
- \end_layout
-
- \begin_layout Standard
- In some cases, it may be useful to convert audio from one sampling rate
- to another.
- There are many reasons for that.
- It can be for mixing streams that have different sampling rates, for supporting
- sampling rates that the soundcard doesn't support, for transcoding, etc.
- That's why there is now a resampler that is part of the Speex project.
- This resampler can be used to convert between any two arbitrary rates (the
- ratio must only be a rational number) and there is control over the quality/com
- plexity tradeoff.
- Keep in mind that the resampler introduces some delay in the audio stream,
- the size of which depends on the resampler quality setting.
- Refer to the resampler API documentation to find out how to get exact delay values.
- \end_layout
-
- \begin_layout Section
- Integration
- \end_layout
-
- \begin_layout Standard
- Knowing
- \emph on
- how
- \emph default
- to use each of the components is not that useful unless we know
- \emph on
- where
- \emph default
- to use them.
- Figure
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "fig:Integration-VoIP"
-
- \end_inset
-
- shows where each of the components would be used in a typical VoIP client.
- Components in dotted lines are optional, though they may be very useful
- in some circumstances.
- There are several important things to note from this figure.
- The AEC must be placed as close as possible to the playback and capture.
- Only the resampling may be closer.
- Also, it is very important to use the same clock for both mic capture and
- speaker/headphones playback.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float figure
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Graphics
- filename components.eps
- width 80text%
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Integration of all the components in a VoIP client.
- \begin_inset CommandInset label
- LatexCommand label
- name "fig:Integration-VoIP"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Compiling and Porting
- \end_layout
-
- \begin_layout Standard
- Compiling Speex under UNIX/Linux or any other platform supported by autoconf
- (e.g.
- Win32/cygwin) is as easy as typing:
- \end_layout
-
- \begin_layout LyX-Code
- % ./configure [options]
- \end_layout
-
- \begin_layout LyX-Code
- % make
- \end_layout
-
- \begin_layout LyX-Code
- % make install
- \end_layout
-
- \begin_layout Standard
- The options supported by the Speex configure script are:
- \end_layout
-
- \begin_layout Description
- --prefix=<path> Specifies the base path for installing Speex (e.g.
- /usr)
- \end_layout
-
- \begin_layout Description
- --enable-shared/--disable-shared Whether to compile shared libraries
- \end_layout
-
- \begin_layout Description
- --enable-static/--disable-static Whether to compile static libraries
- \end_layout
-
- \begin_layout Description
- --disable-wideband Disable the wideband part of Speex (typically to save
- space)
- \end_layout
-
- \begin_layout Description
- --enable-valgrind Enable extra hints for valgrind for debugging purposes
- (do not use by default)
- \end_layout
-
- \begin_layout Description
- --enable-sse Enable use of SSE instructions (x86/float only)
- \end_layout
-
- \begin_layout Description
- --enable-fixed-point
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- fixed-point
- \end_layout
-
- \end_inset
-
- Compile Speex for a processor that does not have a floating point unit
- (FPU)
- \end_layout
-
- \begin_layout Description
- --enable-arm4-asm Enable assembly specific to the ARMv4 architecture (gcc
- only)
- \end_layout
-
- \begin_layout Description
- --enable-arm5e-asm Enable assembly specific to the ARMv5E architecture (gcc
- only)
- \end_layout
-
- \begin_layout Description
- --enable-fixed-point-debug Use only for debugging the fixed-point
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- fixed-point
- \end_layout
-
- \end_inset
-
- code (very slow)
- \end_layout
-
- \begin_layout Description
- --enable-ti-c55x Enable support for the TI C55x family
- \end_layout
-
- \begin_layout Description
- --enable-blackfin-asm Enable assembly specific to the Blackfin DSP architecture
- (gcc only)
- \end_layout
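-
- \begin_layout Standard
- As an illustration only, several of the options above can be combined on
- one command line.
- The following sketch assumes a narrowband-only, statically linked build
- installed under /usr; this particular combination is a hypothetical example,
- not a recommended configuration:
- \end_layout
-
- \begin_layout LyX-Code
- % ./configure --prefix=/usr --disable-shared --disable-wideband
- \end_layout
-
- \begin_layout LyX-Code
- % make
- \end_layout
-
- \begin_layout LyX-Code
- % make install
- \end_layout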
-
- \begin_layout Section
- Platforms
- \end_layout
-
- \begin_layout Standard
- Speex is known to compile and work on a large number of architectures, both
- floating-point and fixed-point.
- In general, any architecture that can natively compute the multiplication
- of two signed 16-bit numbers (32-bit result) and runs at a sufficient clock
- rate (architecture-dependent) is capable of running Speex.
- Architectures on which Speex is
- \series bold
- known
- \series default
- to work (it probably works on many others) are:
- \end_layout
-
- \begin_layout Itemize
- x86 & x86-64
- \end_layout
-
- \begin_layout Itemize
- Power
- \end_layout
-
- \begin_layout Itemize
- SPARC
- \end_layout
-
- \begin_layout Itemize
- ARM
- \end_layout
-
- \begin_layout Itemize
- Blackfin
- \end_layout
-
- \begin_layout Itemize
- Coldfire (68k family)
- \end_layout
-
- \begin_layout Itemize
- TI C54xx & C55xx
- \end_layout
-
- \begin_layout Itemize
- TI C6xxx
- \end_layout
-
- \begin_layout Itemize
- TriMedia (experimental)
- \end_layout
-
- \begin_layout Standard
- Operating systems on top of which Speex is known to work include (it probably
- works on many others):
- \end_layout
-
- \begin_layout Itemize
- Linux
- \end_layout
-
- \begin_layout Itemize
- \begin_inset Formula $\mu$
- \end_inset
-
- Clinux
- \end_layout
-
- \begin_layout Itemize
- MacOS X
- \end_layout
-
- \begin_layout Itemize
- BSD
- \end_layout
-
- \begin_layout Itemize
- Other UNIX/POSIX variants
- \end_layout
-
- \begin_layout Itemize
- Symbian
- \end_layout
-
- \begin_layout Standard
- The source code directory includes additional information for compiling on
- certain architectures or operating systems in README.xxx files.
- \end_layout
-
- \begin_layout Section
- Porting and Optimising
- \end_layout
-
- \begin_layout Standard
- Here are a few things to consider when porting or optimising Speex for a
- new platform or an existing one.
- \end_layout
-
- \begin_layout Subsection
- CPU optimisation
- \end_layout
-
- \begin_layout Standard
- The single factor that will affect the CPU usage of Speex the most is whether
- it is compiled for floating point or fixed-point.
- If your CPU/DSP does not have a floating-point unit (FPU), then compiling
- as fixed-point will be orders of magnitude faster.
- If there is an FPU present, then it is important to test which version
- is faster.
- On the x86 architecture, floating-point is
- \series bold
- generally
- \series default
- faster, but not always.
- To compile Speex as fixed-point, you need to pass --enable-fixed-point to the
- configure script or define the FIXED_POINT macro for the compiler.
- As of 1.2beta3, it is now possible to disable the floating-point compatibility
- API, which means that your code can link without a float emulation library.
- To do that, configure with --disable-float-api or define the DISABLE_FLOAT_API
- macro.
- Until the VBR feature is ported to fixed-point, you will also need to configure
- with --disable-vbr or define DISABLE_VBR.
- \end_layout
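-
- \begin_layout Standard
- Putting these together, a configure invocation for a target without an FPU
- might look like the following.
- This is a sketch that uses --enable-fixed-point as it appears in the list
- of configure options, together with the two options named above; whether
- all three are appropriate depends on the target and on the features you
- need:
- \end_layout
-
- \begin_layout LyX-Code
- % ./configure --enable-fixed-point --disable-float-api --disable-vbr
- \end_layout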
-
- \begin_layout Standard
- Other important things to check on some DSP architectures are:
- \end_layout
-
- \begin_layout Itemize
- Make sure the cache is set to write-back mode
- \end_layout
-
- \begin_layout Itemize
- If the chip has SRAM instead of cache, make sure as much code and data as
- possible are in SRAM, rather than in RAM
- \end_layout
-
- \begin_layout Standard
- If you are going to be writing assembly, then the following functions are
-
- \series bold
- usually
- \series default
- the first ones you should consider optimising:
- \end_layout
-
- \begin_layout Itemize
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- filter_mem16()
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Itemize
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- iir_mem16()
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Itemize
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- vq_nbest()
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Itemize
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- pitch_xcorr()
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Itemize
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- interp_pitch()
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The filtering functions
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- filter_mem16()
- \end_layout
-
- \end_inset
-
- and
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- iir_mem16()
- \end_layout
-
- \end_inset
-
- are implemented in the direct form II transposed (DF2T).
- However, for architectures based on multiply-accumulate (MAC), DF2T requires
- frequent reload of the accumulator, which can make the code very slow.
- For these architectures (e.g.
- Blackfin and Coldfire), a better approach is to implement those functions
- as direct form I (DF1), which is easier to express in terms of MAC.
- When doing that however,
- \series bold
- it is important to make sure that the DF1 implementation still behaves like
- the original DF2T implementation when it comes to memory values
- \series default
- .
- This is necessary because the filter is time-varying and must compute exactly
- the same value (not counting machine rounding) on any encoder or decoder.
- \end_layout
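-
- \begin_layout Standard
- As a rough sketch of the point above (not the Speex source), the following
- C fragment filters with a DF2T routine and with a DF1 routine, and shows how
- the DF2T memories relate to the DF1 input and output histories when the
- coefficients are held constant.
- The names iir_df2t, iir_df1, df2t_mem_from_history and the fixed order ORD
- are hypothetical, double precision stands in for the codec's fixed-point
- arithmetic, and a[0] is assumed to be 1.
- \end_layout

```c
#include <assert.h>
#include <stddef.h>

#define ORD 2   /* filter order for this sketch (hypothetical) */

/* Direct form II transposed. b[0..ORD] is the numerator, a[0..ORD] the
 * denominator with a[0] == 1, mem[0..ORD-1] the filter memory. */
static void iir_df2t(const double *b, const double *a, const double *x,
                     double *y, size_t n, double *mem)
{
    assert(a[0] == 1.0);
    for (size_t i = 0; i < n; i++) {
        double yi = b[0] * x[i] + mem[0];
        /* Each memory is rewritten from its neighbour for every sample. */
        for (int j = 0; j < ORD - 1; j++)
            mem[j] = mem[j + 1] + b[j + 1] * x[i] - a[j + 1] * yi;
        mem[ORD - 1] = b[ORD] * x[i] - a[ORD] * yi;
        y[i] = yi;
    }
}

/* Direct form I: a single multiply-accumulate chain per output sample.
 * xh[j] and yh[j] hold the input and output from j+1 samples ago. */
static void iir_df1(const double *b, const double *a, const double *x,
                    double *y, size_t n, double *xh, double *yh)
{
    assert(a[0] == 1.0);
    for (size_t i = 0; i < n; i++) {
        double acc = b[0] * x[i];
        for (int j = 0; j < ORD; j++)
            acc += b[j + 1] * xh[j] - a[j + 1] * yh[j];
        /* Shift the histories by one sample. */
        for (int j = ORD - 1; j > 0; j--) {
            xh[j] = xh[j - 1];
            yh[j] = yh[j - 1];
        }
        xh[0] = x[i];
        yh[0] = acc;
        y[i] = acc;
    }
}

/* DF2T memories implied by the DF1 histories (constant coefficients only). */
static void df2t_mem_from_history(const double *b, const double *a,
                                  const double *xh, const double *yh,
                                  double *mem)
{
    for (int k = 0; k < ORD; k++) {
        double m = 0.0;
        for (int j = k + 1; j <= ORD; j++)
            m += b[j] * xh[j - k - 1] - a[j] * yh[j - k - 1];
        mem[k] = m;
    }
}
```

- \begin_layout Standard
- With constant coefficients the two forms agree to within rounding.
- When the coefficients change between calls, the state carried over is the
- DF2T memory, so one way to read the requirement above is that a DF1 replacement
- should accept and return memories in that form rather than raw sample histories.
- \end_layout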
-
- \begin_layout Subsection
- Memory optimisation
- \end_layout
-
- \begin_layout Standard
- Memory optimisation is mainly something that should be considered for small
- embedded platforms.
- For PCs, Speex is already so tiny that it's just not worth doing any of
- the things suggested here.
- There are several ways to reduce the memory usage of Speex, both in terms
- of code size and data size.
- For optimising code size, the trick is to first remove features you do
- not need.
- Some examples of things that can easily be disabled
- \series bold
- if you don't need them
- \series default
- are:
- \end_layout
-
- \begin_layout Itemize
- Wideband support (--disable-wideband)
- \end_layout
-
- \begin_layout Itemize
- Support for stereo (removing stereo.c)
- \end_layout
-
- \begin_layout Itemize
- VBR support (--disable-vbr or DISABLE_VBR)
- \end_layout
-
- \begin_layout Itemize
- Static codebooks that are not needed for the bit-rates you are using (*_table.c
- files)
- \end_layout
-
- \begin_layout Standard
- Speex also has several methods for allocating temporary arrays.
- When using a compiler that supports C99 properly (as of 2007, Microsoft
- compilers don't, but gcc does), it is best to define VAR_ARRAYS.
- That makes use of the variable-size array feature of C99.
- The next best is to define USE_ALLOCA so that Speex can use alloca() to
- allocate the temporary arrays.
- Note that on many systems, alloca() is buggy so it may not work.
- If neither VAR_ARRAYS nor USE_ALLOCA is defined, then Speex falls back
- to allocating a large
- \begin_inset Quotes eld
- \end_inset
-
- scratch space
- \begin_inset Quotes erd
- \end_inset
-
- and doing its own internal allocation.
- The main disadvantage of this solution is that it is wasteful.
- It needs to allocate enough stack for the worst case scenario (worst bit-rate,
- highest complexity setting, ...) and by default, the memory isn't shared between
- multiple encoder/decoder states.
- Still, if the
- \begin_inset Quotes eld
- \end_inset
-
- manual
- \begin_inset Quotes erd
- \end_inset
-
- allocation is the only option left, there are a few things that can be
- improved.
- By overriding the speex_alloc_scratch() call in os_support.h, it is possible
- to always return the same memory area for all states
- \begin_inset Foot
- status collapsed
-
- \begin_layout Plain Layout
- In this case, one must be careful with threads
- \end_layout
-
- \end_inset
-
- .
- In addition, by redefining NB_ENC_STACK and NB_DEC_STACK (or the
- equivalent macros for wideband), it is possible to allocate only the memory
- needed for a scenario that is known in advance.
- In this case, it is important to measure the amount of memory required
- for the specific sampling rate, bit-rate and complexity level being used.
- \end_layout
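-
- \begin_layout Standard
- The idea of a single scratch area shared by all states can be sketched as
- a small stack allocator.
- This is only an illustration of the technique, not the allocator used inside
- Speex: scratch_stack, scratch_init, scratch_push, scratch_pop_to and the
- size SCRATCH_BYTES are all hypothetical names, and the size must be replaced
- by a value measured for your own configuration.
- As the footnote above warns, sharing one area like this is not safe across
- threads.
- \end_layout

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_BYTES 1024   /* hypothetical size, must be a multiple of 8 */

/* One static area handed to every codec state (not thread-safe).
 * The union forces an alignment suitable for double. */
static union {
    unsigned char bytes[SCRATCH_BYTES];
    double align;
} scratch_area;

typedef struct {
    unsigned char *base;  /* start of the shared scratch area */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes currently handed out */
} scratch_stack;

/* Every state gets a view of the same memory. */
static void scratch_init(scratch_stack *s)
{
    s->base = scratch_area.bytes;
    s->size = SCRATCH_BYTES;
    s->used = 0;
}

/* Hand out `bytes` bytes, rounded up to a multiple of 8.
 * Returns NULL when the area is exhausted. */
static void *scratch_push(scratch_stack *s, size_t bytes)
{
    if (bytes > s->size - s->used)
        return NULL;
    size_t rounded = (bytes + 7u) & ~(size_t)7u;
    void *p = s->base + s->used;
    s->used += rounded;
    return p;
}

/* Release everything allocated since `mark` (an earlier value of s->used). */
static void scratch_pop_to(scratch_stack *s, size_t mark)
{
    assert(mark <= s->used);
    s->used = mark;
}
```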
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Command-line encoder/decoder
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Command-line-encoder/decoder"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The base Speex distribution includes a command-line encoder (
- \emph on
- speexenc
- \emph default
- ) and decoder (
- \emph on
- speexdec
- \emph default
- ).
- These tools produce and read Speex files encapsulated in the Ogg container.
- Although it is possible to encapsulate Speex in any container, Ogg is the
- recommended container for files.
- This chapter describes how to use the command-line tools for Speex files
- in Ogg.
- \end_layout
-
- \begin_layout Section
-
- \emph on
- speexenc
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- speexenc
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The
- \emph on
- speexenc
- \emph default
- utility is used to create Speex files from raw PCM or wave files.
- It can be used by calling:
- \end_layout
-
- \begin_layout LyX-Code
- speexenc [options] input_file output_file
- \end_layout
-
- \begin_layout Standard
- The value '-' for input_file or output_file corresponds respectively to
- stdin and stdout.
- The valid options are:
- \end_layout
-
- \begin_layout Description
- --narrowband
- \begin_inset space ~
- \end_inset
-
- (-n) Tell Speex to treat the input as narrowband (8 kHz).
- This is the default
- \end_layout
-
- \begin_layout Description
- --wideband
- \begin_inset space ~
- \end_inset
-
- (-w) Tell Speex to treat the input as wideband (16 kHz)
- \end_layout
-
- \begin_layout Description
- --ultra-wideband
- \begin_inset space ~
- \end_inset
-
- (-u) Tell Speex to treat the input as
- \begin_inset Quotes eld
- \end_inset
-
- ultra-wideband
- \begin_inset Quotes erd
- \end_inset
-
- (32 kHz)
- \end_layout
-
- \begin_layout Description
- --quality
- \begin_inset space ~
- \end_inset
-
- n Set the encoding quality (0-10), default is 8
- \end_layout
-
- \begin_layout Description
- --bitrate
- \begin_inset space ~
- \end_inset
-
- n Encoding bit-rate (use bit-rate n or lower)
- \end_layout
-
- \begin_layout Description
- --vbr Enable VBR (Variable Bit-Rate), disabled by default
- \end_layout
-
- \begin_layout Description
- --abr
- \begin_inset space ~
- \end_inset
-
- n Enable ABR (Average Bit-Rate) at n kbps, disabled by default
- \end_layout
-
- \begin_layout Description
- --vad Enable VAD (Voice Activity Detection), disabled by default
- \end_layout
-
- \begin_layout Description
- --dtx Enable DTX (Discontinuous Transmission), disabled by default
- \end_layout
-
- \begin_layout Description
- --nframes
- \begin_inset space ~
- \end_inset
-
- n Pack n frames in each Ogg packet (this saves space at low bit-rates)
- \end_layout
-
- \begin_layout Description
- --comp
- \begin_inset space ~
- \end_inset
-
- n Set encoding speed/quality tradeoff.
- The higher the value of n, the slower the encoding (default is 3)
- \end_layout
-
- \begin_layout Description
- -V Verbose operation, print bit-rate currently in use
- \end_layout
-
- \begin_layout Description
- --help
- \begin_inset space ~
- \end_inset
-
- (-h) Print the help
- \end_layout
-
- \begin_layout Description
- --version
- \begin_inset space ~
- \end_inset
-
- (-v) Print version information
- \end_layout
-
- \begin_layout Subsection*
- Speex comments
- \end_layout
-
- \begin_layout Description
- --comment Add the given string as an extra comment.
- This may be used multiple times.
-
- \end_layout
-
- \begin_layout Description
- --author Author of this track.
-
- \end_layout
-
- \begin_layout Description
- --title Title for this track.
-
- \end_layout
-
- \begin_layout Subsection*
- Raw input options
- \end_layout
-
- \begin_layout Description
- --rate
- \begin_inset space ~
- \end_inset
-
- n Sampling rate for raw input
- \end_layout
-
- \begin_layout Description
- --stereo Consider raw input as stereo
- \end_layout
-
- \begin_layout Description
- --le Raw input is little-endian
- \end_layout
-
- \begin_layout Description
- --be Raw input is big-endian
- \end_layout
-
- \begin_layout Description
- --8bit Raw input is 8-bit unsigned
- \end_layout
-
- \begin_layout Description
- --16bit Raw input is 16-bit signed
- \end_layout
-
- \begin_layout Section
-
- \emph on
- speexdec
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- speexdec
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The
- \emph on
- speexdec
- \emph default
- utility is used to decode Speex files and can be used by calling:
- \end_layout
-
- \begin_layout LyX-Code
- speexdec [options] speex_file [output_file]
- \end_layout
-
- \begin_layout Standard
- The value '-' for speex_file or output_file corresponds respectively to
- stdin and stdout.
- Also, when no output_file is specified, the file is played to the soundcard.
- The valid options are:
- \end_layout
-
- \begin_layout Description
- --enh Enable post-filter (default)
- \end_layout
-
- \begin_layout Description
- --no-enh Disable post-filter
- \end_layout
-
- \begin_layout Description
- --force-nb Force decoding in narrowband
- \end_layout
-
- \begin_layout Description
- --force-wb Force decoding in wideband
- \end_layout
-
- \begin_layout Description
- --force-uwb Force decoding in ultra-wideband
- \end_layout
-
- \begin_layout Description
- --mono Force decoding in mono
- \end_layout
-
- \begin_layout Description
- --stereo Force decoding in stereo
- \end_layout
-
- \begin_layout Description
- --rate
- \begin_inset space ~
- \end_inset
-
- n Force decoding at n Hz sampling rate
- \end_layout
-
- \begin_layout Description
- --packet-loss
- \begin_inset space ~
- \end_inset
-
- n Simulate n % random packet loss
- \end_layout
-
- \begin_layout Description
- -V Verbose operation, print bit-rate currently in use
- \end_layout
-
- \begin_layout Description
- --help
- \begin_inset space ~
- \end_inset
-
- (-h) Print the help
- \end_layout
-
- \begin_layout Description
- --version
- \begin_inset space ~
- \end_inset
-
- (-v) Print version information
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Using the Speex Codec API (
- \emph on
- libspeex
- \emph default
-
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- libspeex
- \end_layout
-
- \end_inset
-
- )
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Programming-with-Speex"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The
- \emph on
- libspeex
- \emph default
- library contains all the functions for encoding and decoding speech with
- the Speex codec.
- When linking on a UNIX system, one must add
- \emph on
- -lspeex -lm
- \emph default
- to the compiler command line.
- One important thing to know is that
- \series bold
- libspeex calls are reentrant, but not thread-safe
- \series default
- .
- That means that it is fine to use calls from many threads, but
- \series bold
- calls using the same state from multiple threads must be protected by mutexes
- \series default
- .
- Examples of code can also be found in Appendix
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Sample-code"
-
- \end_inset
-
- and the complete API documentation is included in the Documentation section
- of the Speex website (http://www.speex.org/).
- \end_layout
-
- \begin_layout Section
- Encoding
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Encoding"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- In order to encode speech using Speex, one first needs to:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <speex/speex.h>
- \end_layout
-
- \end_inset
-
- Then in the code, a Speex bit-packing struct must be declared, along with
- a Speex encoder state:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- SpeexBits bits;
- \end_layout
-
- \begin_layout Plain Layout
-
- void *enc_state;
- \end_layout
-
- \end_inset
-
- The two are initialized by:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_bits_init(&bits);
- \end_layout
-
- \begin_layout Plain Layout
-
- enc_state = speex_encoder_init(&speex_nb_mode);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- For wideband coding,
- \emph on
- speex_nb_mode
- \emph default
- will be replaced by
- \emph on
- speex_wb_mode
- \emph default
- .
- In most cases, you will need to know the frame size used at the sampling
- rate you are using.
- You can get that value in the
- \emph on
- frame_size
- \emph default
- variable (expressed in
- \series bold
- samples
- \series default
- , not bytes) with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- In practice,
- \emph on
- frame_size
- \emph default
- will correspond to 20 ms when using a sampling rate of 8, 16, or 32 kHz.
- There are many parameters that can be set for the Speex encoder, but the
- most useful one is the quality parameter that controls the quality vs bit-rate
- tradeoff.
- This is set by:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_encoder_ctl(enc_state,SPEEX_SET_QUALITY,&quality);
- \end_layout
-
- \end_inset
-
- where
- \emph on
- quality
- \emph default
- is an integer value ranging from 0 to 10 (inclusive).
- The mapping between quality and bit-rate is described in Fig.
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:quality_vs_bps"
-
- \end_inset
-
- for narrowband.
- \end_layout
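-
- \begin_layout Standard
- As a quick check on the frame sizes discussed above, a 20 ms frame works
- out to 160, 320 and 640 samples at 8, 16 and 32 kHz, and each sample of 16-bit
- input takes two bytes.
- The helper names samples_per_frame and frame_bytes are hypothetical and
- the sketch does not call libspeex; in real code the value returned by
- SPEEX_GET_FRAME_SIZE should be used instead.
- \end_layout

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of samples in one 20 ms frame at the given sampling rate (Hz). */
static int samples_per_frame(int sampling_rate_hz)
{
    assert(sampling_rate_hz > 0);
    return sampling_rate_hz / 50;   /* 20 ms is 1/50 of a second */
}

/* Bytes needed to hold one frame of 16-bit input samples. */
static size_t frame_bytes(int sampling_rate_hz)
{
    return (size_t)samples_per_frame(sampling_rate_hz) * sizeof(int16_t);
}
```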
-
- \begin_layout Standard
- Once the initialization is done, for every input frame:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_bits_reset(&bits);
- \end_layout
-
- \begin_layout Plain Layout
-
- speex_encode_int(enc_state, input_frame, &bits);
- \end_layout
-
- \begin_layout Plain Layout
-
- nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- where
- \emph on
- input_frame
- \emph default
- is a
- \emph on
- (short *)
- \emph default
- pointing to the beginning of a speech frame,
- \emph on
- byte_ptr
- \emph default
- is a
- \emph on
- (char *)
- \emph default
- where the encoded frame will be written,
- \emph on
- MAX_NB_BYTES
- \emph default
- is the maximum number of bytes that can be written to
- \emph on
- byte_ptr
- \emph default
- without causing an overflow and
- \emph on
- nbBytes
- \emph default
- is the number of bytes actually written to
- \emph on
- byte_ptr
- \emph default
- (the encoded size in bytes).
- Before calling speex_bits_write, it is possible to find the number of bytes
- that will be written by calling
- \family typewriter
- speex_bits_nbytes(&bits)
- \family default
- .
- \end_layout
-
- \begin_layout Standard
- It is still possible to use the
- \emph on
- speex_encode()
- \emph default
- function, which takes a
- \emph on
- (float *)
- \emph default
- for the audio.
- However, this would make an eventual port to an FPU-less platform (like
- ARM) more complicated.
- Internally,
- \emph on
- speex_encode()
- \emph default
- and
- \emph on
- speex_encode_int()
- \emph default
- are processed in the same way.
- Whether the encoder uses the fixed-point version is only decided by the
- compile-time flags, not at the API level.
- \end_layout
-
- \begin_layout Standard
- After you're done with the encoding, free all resources with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_bits_destroy(&bits);
- \end_layout
-
- \begin_layout Plain Layout
-
- speex_encoder_destroy(enc_state);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- That's about it for the encoder.
-
- \end_layout
-
- \begin_layout Section
- Decoding
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Decoding"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- In order to decode speech using Speex, you first need to:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <speex/speex.h>
- \end_layout
-
- \end_inset
-
- You also need to declare a Speex bit-packing struct
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- SpeexBits bits;
- \end_layout
-
- \end_inset
-
- and a Speex decoder state
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- void *dec_state;
- \end_layout
-
- \end_inset
-
- The two are initialized by:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_bits_init(&bits);
- \end_layout
-
- \begin_layout Plain Layout
-
- dec_state = speex_decoder_init(&speex_nb_mode);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- For wideband decoding,
- \emph on
- speex_nb_mode
- \emph default
- will be replaced by
- \emph on
- speex_wb_mode
- \emph default
- .
- If you need to obtain the size of the frames that will be used by the decoder,
- you can get that value in the
- \emph on
- frame_size
- \emph default
- variable (expressed in
- \series bold
- samples
- \series default
- , not bytes) with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- There is also a parameter that can be set for the decoder: whether or not
- to use a perceptual enhancer.
- This can be set by:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- where
- \emph on
- enh
- \emph default
- is an int with value 0 to have the enhancer disabled and 1 to have it enabled.
- As of 1.2-beta1, the default is now to enable the enhancer.
- \end_layout
-
- \begin_layout Standard
- Again, once the decoder initialization is done, for every input frame:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_bits_read_from(&bits, input_bytes, nbBytes);
- \end_layout
-
- \begin_layout Plain Layout
-
- speex_decode_int(dec_state, &bits, output_frame);
- \end_layout
-
- \end_inset
-
- where input_bytes is a
- \emph on
- (char *)
- \emph default
- containing the bit-stream data received for a frame,
- \emph on
- nbBytes
- \emph default
- is the size (in bytes) of that bit-stream, and
- \emph on
- output_frame
- \emph default
- is a
- \emph on
- (short *)
- \emph default
- and points to the area where the decoded speech frame will be written.
- Passing NULL as the second argument (in place of &bits) indicates that the
- bits for the current frame are not available.
- When a frame is lost, the Speex decoder will do its best to "guess" the
- correct signal.
- \end_layout
-
- \begin_layout Standard
- As for the encoder, the
- \emph on
- speex_decode()
- \emph default
- function can still be used, with a
- \emph on
- (float *)
- \emph default
- as the output for the audio.
- After you're done with the decoding, free all resources with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_bits_destroy(&bits);
- \end_layout
-
- \begin_layout Plain Layout
-
- speex_decoder_destroy(dec_state);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- Codec Options (speex_*_ctl)
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Codec-Options"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Quote
- \align center
-
- \emph on
- Entities should not be multiplied beyond necessity -- William of Ockham.
- \end_layout
-
- \begin_layout Quote
- \align center
-
- \emph on
- Just because there's an option for it doesn't mean you have to turn it on
- -- me.
- \end_layout
-
- \begin_layout Standard
- The Speex encoder and decoder support many options and requests that can
- be accessed through the
- \emph on
- speex_encoder_ctl
- \emph default
- and
- \emph on
- speex_decoder_ctl
- \emph default
- functions.
- These functions are similar to the
- \emph on
- ioctl
- \emph default
- system call and their prototypes are:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- void speex_encoder_ctl(void *encoder, int request, void *ptr);
- \end_layout
-
- \begin_layout Plain Layout
-
- void speex_decoder_ctl(void *decoder, int request, void *ptr);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Despite those functions, the defaults are usually good for many applications
- and
- \series bold
- optional settings should only be used when one understands them and knows
- that they are needed
- \series default
- .
- A common error is to attempt to set many unnecessary settings.
-
- \end_layout
-
- \begin_layout Standard
- Here is a list of the values allowed for the requests.
- Some only apply to the encoder or the decoder.
- Because the last argument is of type
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- void *
- \end_layout
-
- \end_inset
-
- , the
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- _ctl()
- \end_layout
-
- \end_inset
-
- functions are
- \series bold
- not type safe
- \series default
- , and should thus be used with care.
- The type
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- is the same as the C99
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- int32_t
- \end_layout
-
- \end_inset
-
- type.
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_ENH
- \begin_inset Formula $\ddagger$
- \end_inset
-
- Set perceptual enhancer
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- perceptual enhancement
- \end_layout
-
- \end_inset
-
- to on (1) or off (0) (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- , default is on)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_ENH
- \begin_inset Formula $\ddagger$
- \end_inset
-
- Get perceptual enhancer status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current
- mode (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_QUALITY
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set the encoder speech quality (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- from 0 to 10, default is 8)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_QUALITY
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get the current encoder speech quality (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- from 0 to 10)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_MODE
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set the mode number, as specified in the RTP spec (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_MODE
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get the current mode number, as specified in the RTP spec (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_VBR
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set variable bit-rate (VBR) to on (1) or off (0) (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- , default is off)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_VBR
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get variable bit-rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- variable bit-rate
- \end_layout
-
- \end_inset
-
- (VBR) status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_VBR_QUALITY
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_VBR_QUALITY
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get the current encoder VBR speech quality (float 0 to 10)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_COMPLEXITY
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set the CPU resources allowed for the encoder (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- from 1 to 10, default is 2)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_COMPLEXITY
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get the CPU resources allowed for the encoder (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- from 1 to 10, default is 2)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_BITRATE
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set the bit-rate to use the closest value not exceeding the parameter (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in bits per second)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_BITRATE Get the current bit-rate in use (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in bits per second)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_SAMPLING_RATE Set real sampling rate (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in Hz)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_SAMPLING_RATE Get real sampling rate (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in Hz)
- \end_layout
-
- \begin_layout Description
- SPEEX_RESET_STATE Reset the encoder/decoder state to its original state,
- clearing all memories (no argument)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_VAD
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set voice activity detection
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- voice activity detection
- \end_layout
-
- \end_inset
-
- (VAD) to on (1) or off (0) (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- , default is off)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_VAD
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get voice activity detection (VAD) status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_DTX
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set discontinuous transmission
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- discontinuous transmission
- \end_layout
-
- \end_inset
-
- (DTX) to on (1) or off (0) (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- , default is off)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_DTX
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get discontinuous transmission (DTX) status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_ABR
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set average bit-rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- average bit-rate
- \end_layout
-
- \end_inset
-
- (ABR) to a value n (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in bits per second)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_ABR
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get average bit-rate (ABR) setting (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in bits per second)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_PLC_TUNING
- \begin_inset Formula $\dagger$
- \end_inset
-
- Tell the encoder to optimize encoding for a certain percentage of packet
- loss (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in percent)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_PLC_TUNING
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get the current tuning of the encoder for PLC (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in percent)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_LOOKAHEAD Get the lookahead used by Speex.
- The encoder and the decoder each report their own value; add the two to get
- the total codec lookahead.
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_VBR_MAX_BITRATE
- \begin_inset Formula $\dagger$
- \end_inset
-
- Set the maximum bit-rate allowed in VBR operation (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in bits per second)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_VBR_MAX_BITRATE
- \begin_inset Formula $\dagger$
- \end_inset
-
- Get the current maximum bit-rate allowed in VBR operation (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- in bits per second)
- \end_layout
-
- \begin_layout Description
- SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- , default is on)
- \end_layout
-
- \begin_layout Description
- SPEEX_GET_HIGHPASS Get the current high-pass filter status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- \begin_inset Formula $\dagger$
- \end_inset
-
- applies only to the encoder
- \end_layout
-
- \begin_layout Description
- \begin_inset Formula $\ddagger$
- \end_inset
-
- applies only to the decoder
- \end_layout
-
- \begin_layout Section
- Mode queries
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Mode-queries"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Speex modes have a query system similar to the speex_encoder_ctl and speex_decod
- er_ctl calls.
- Since modes are read-only, it is only possible to get information about
- a particular mode.
- The function used to do that is:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- void speex_mode_query(SpeexMode *mode, int request, void *ptr);
- \end_layout
-
- \end_inset
-
- The admissible values for request are (unless otherwise noted, the values
- are returned through
- \emph on
- ptr
- \emph default
- ):
- \end_layout
-
- \begin_layout Description
- SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
- \end_layout
-
- \begin_layout Description
- SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through
-
- \emph on
- ptr
- \emph default
- (integer in bps).
-
- \end_layout
-
- \begin_layout Section
- Packing and in-band signalling
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- in-band signalling
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Sometimes it is desirable to pack more than one frame per packet (or other
- basic unit of storage).
- The proper way to do it is to call speex_encode
- \begin_inset Formula $N$
- \end_inset
-
- times before writing the stream with speex_bits_write.
- In cases where the number of frames is not determined by an out-of-band
- mechanism, it is possible to include a terminator code.
- That terminator consists of the code 15 (decimal) encoded with 5 bits,
- as shown in Table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:quality_vs_bps"
-
- \end_inset
-
- .
- Note that as of version 1.0.2, calling speex_bits_write automatically inserts
- the terminator so as to fill the last byte.
- This doesn't involve any overhead and makes sure Speex can always detect
- when there are no more frames in a packet.
- \end_layout
-
- \begin_layout Standard
- It is also possible to send in-band
- \begin_inset Quotes eld
- \end_inset
-
- messages
- \begin_inset Quotes erd
- \end_inset
-
- to the other side.
- All these messages are encoded as
- \begin_inset Quotes eld
- \end_inset
-
- pseudo-frames
- \begin_inset Quotes erd
- \end_inset
-
- of mode 14 which contain a 4-bit message type code, followed by the message.
- Table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:In-band-signalling-codes"
-
- \end_inset
-
- lists the available codes, their meaning and the size of the message that
- follows.
- Most of these messages are requests that are sent to the encoder or decoder
- on the other end, which is free to comply or ignore them.
- By default, all in-band messages are ignored.
- \end_layout
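As a rough illustration of the pseudo-frame layout described above, the sketch below packs a "transmit character" message (code 8 in the table of in-band signalling codes). It assumes the mode identifier occupies 5 bits, like the terminator code, and that bits are written most significant bit first. BitWriter, bw_init, bw_put and pack_inband_char are hypothetical names made up for this example; they are not part of libspeex.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer; a stand-in for illustration only,
   not the libspeex SpeexBits implementation. */
typedef struct {
    uint8_t bytes[16];
    int nbits;            /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *bw)
{
    memset(bw, 0, sizeof(*bw));
}

/* Append the low `count` bits of `value`, most significant bit first.
   Returns -1 if the buffer would overflow. */
static int bw_put(BitWriter *bw, unsigned value, int count)
{
    if (bw->nbits + count > (int)sizeof(bw->bytes) * 8)
        return -1;
    for (int i = count - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            bw->bytes[bw->nbits / 8] |= (uint8_t)(0x80u >> (bw->nbits % 8));
        bw->nbits++;
    }
    return 0;
}

/* Pseudo-frame: mode 14 (assumed 5 bits), message code 8 (4 bits),
   then the 8-bit character itself. */
static int pack_inband_char(BitWriter *bw, unsigned char c)
{
    if (bw_put(bw, 14, 5) < 0) return -1;
    if (bw_put(bw, 8, 4) < 0) return -1;
    return bw_put(bw, c, 8);
}
```

Under these assumptions the pseudo-frame takes 17 bits in total, so the character does not end on a byte boundary.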
-
- \begin_layout Standard
- \begin_inset Float table
- placement htbp
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Tabular
- <lyxtabular version="3" rows="17" columns="3">
- <features>
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Code
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Size (bits)
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Content
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks decoder to set perceptual enhancement off (0) or on (1)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks (if 1) the encoder to be less
- \begin_inset Quotes eld
- \end_inset
-
- aggressive
- \begin_inset Quotes erd
- \end_inset
-
- due to high packet loss
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks encoder to switch to mode N
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks encoder to switch to mode N for low-band
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks encoder to switch to mode N for high-band
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks encoder to switch to quality N for VBR
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 6
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Request acknowledge (0=no, 1=all, 2=only for in-band data)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Asks encoder to set CBR (0), VAD(1), DTX(3), VBR(5), VBR+DTX(7)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Transmit (8-bit) character to the other end
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 9
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Intensity stereo information
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 16
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Announce maximum bit-rate acceptable (N in bytes/second)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 11
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 16
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 32
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Acknowledge receiving packet N
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 13
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 32
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 14
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 64
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 15
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 64
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- In-band signalling codes
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:In-band-signalling-codes"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
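A receiver that ignores an in-band message still has to know how many bits to skip. The sketch below transcribes the size column of the table above into a lookup array. The function names are hypothetical, and the 5-bit mode identifier is an assumption carried over from the terminator description.

```c
/* Payload sizes in bits, indexed by the 4-bit in-band message code,
   transcribed from the table of in-band signalling codes. */
static const int inband_payload_bits[16] = {
    1, 1, 4, 4, 4, 4, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64
};

/* Payload size in bits for a message code, or -1 if the code
   does not fit in 4 bits. */
static int inband_payload_size(unsigned code)
{
    return code < 16 ? inband_payload_bits[code] : -1;
}

/* Total pseudo-frame size: mode identifier (assumed 5 bits),
   4-bit message code, then the payload. */
static int inband_frame_bits(unsigned code)
{
    int payload = inband_payload_size(code);
    return payload < 0 ? -1 : 5 + 4 + payload;
}
```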
-
- \begin_layout Standard
- Finally, applications may define custom in-band messages using mode 13.
- The size of the message in bytes is encoded with 5 bits, so that the decoder
- can skip it if it doesn't know how to interpret it.
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
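The skip length for a custom message follows directly from its size field. This is a minimal sketch that takes the description above at face value: a mode identifier (assumed to be 5 bits), a 5-bit size in bytes, then the payload. user_message_bits is a hypothetical helper, not a libspeex function.

```c
/* Bits occupied by a mode-13 custom message, counted from the start of
   the pseudo-frame. Returns -1 if the size does not fit in 5 bits. */
static int user_message_bits(unsigned size_in_bytes)
{
    if (size_in_bytes > 31)
        return -1;
    /* mode id (assumed 5 bits) + 5-bit size field + payload */
    return 5 + 5 + 8 * (int)size_in_bytes;
}
```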
-
- \begin_layout Chapter
- Speech Processing API (
- \emph on
- libspeexdsp
- \emph default
- )
- \end_layout
-
- \begin_layout Standard
- As of version 1.2beta3, the non-codec parts of the Speex package are now
- in a separate library called
- \emph on
- libspeexdsp
- \emph default
- .
- This library includes the preprocessor, the acoustic echo canceller, the
- jitter buffer, and the resampler.
- In a UNIX environment, it can be linked into a program by adding
- \emph on
- -lspeexdsp -lm
- \emph default
- to the compiler command line.
- Just like for libspeex,
- \series bold
- libspeexdsp calls are reentrant, but not thread-safe
- \series default
- .
- That means that it is fine to use calls from many threads, but
- \series bold
- calls using the same state from multiple threads must be protected by mutexes
- \series default
- .
- \end_layout
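One possible way to follow the rule above is to pair each state with its own mutex and route every call on that state through a small wrapper. The sketch below uses POSIX threads. GuardedState, process_fn, the guarded_* functions and count_calls are hypothetical names for this example, and the state is held as an opaque pointer so the sketch does not depend on any particular libspeexdsp type.

```c
#include <pthread.h>

/* Hypothetical wrapper: one processing state, one mutex. */
typedef struct {
    pthread_mutex_t lock;
    void *state;          /* e.g. a preprocessor or echo canceller state */
} GuardedState;

/* Shape of the call being protected (stand-in for a per-frame call). */
typedef void (*process_fn)(void *state, short *frame);

static int guarded_init(GuardedState *g, void *state)
{
    g->state = state;
    return pthread_mutex_init(&g->lock, NULL);
}

/* Every thread that shares `g` goes through this function, so only one
   call on the underlying state runs at a time. */
static void guarded_process(GuardedState *g, process_fn process, short *frame)
{
    pthread_mutex_lock(&g->lock);
    process(g->state, frame);
    pthread_mutex_unlock(&g->lock);
}

static void guarded_destroy(GuardedState *g)
{
    pthread_mutex_destroy(&g->lock);
}

/* Stand-in for a real processing call: counts how often it ran. */
static void count_calls(void *state, short *frame)
{
    (void)frame;
    ++*(int *)state;
}
```

States that are never shared between threads do not need this wrapper, since the calls themselves are reentrant.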
-
- \begin_layout Section
- Preprocessor
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Preprocessor"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \noindent
- In order to use the Speex preprocessor
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- preprocessor
- \end_layout
-
- \end_inset
-
- , you first need to:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <speex/speex_preprocess.h>
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \noindent
- Then, a preprocessor state can be created as:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- SpeexPreprocessState *preprocess_state = speex_preprocess_state_init(frame_size,
- sampling_rate);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \noindent
- and it is recommended to use the same value for
- \family typewriter
- frame_size
- \family default
- as is used by the encoder (20
- \emph on
- ms
- \emph default
- ).
- \end_layout
-
- \begin_layout Standard
- For each input frame, you need to call:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_preprocess_run(preprocess_state, audio_frame);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \noindent
- where
- \family typewriter
- audio_frame
- \family default
- is used both as input and output.
- In cases where the output audio is not useful for a certain frame, it is
- possible to use instead:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_preprocess_estimate_update(preprocess_state, audio_frame);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \noindent
- This call will update all the preprocessor internal state variables without
- computing the output audio, thus saving some CPU cycles.
- \end_layout
-
- \begin_layout Standard
- The behaviour of the preprocessor can be changed using:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_preprocess_ctl(preprocess_state, request, ptr);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \noindent
- which is used in the same way as the encoder and decoder equivalent.
- Options are listed in Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sub:Preprocessor-options"
-
- \end_inset
-
- .
- \end_layout
-
- \begin_layout Standard
- The preprocessor state can be destroyed using:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_preprocess_state_destroy(preprocess_state);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Subsection
- Preprocessor options
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Preprocessor-options"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Like the codec, the preprocessor has options that can be controlled
- using an ioctl()-like call.
- The available options are:
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_DENOISE Turns denoising on (1) or off (0) (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_DENOISE Get denoising status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_AGC Turns automatic gain control (AGC) on (1) or off (0)
- (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_AGC Get AGC status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_VAD Turns voice activity detector (VAD) on (1) or off (0)
- (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_VAD Get VAD status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_AGC_LEVEL
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_AGC_LEVEL
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_DEREVERB Turns reverberation removal on (1) or off (0)
- (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_DEREVERB Get reverberation removal status (
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_DEREVERB_LEVEL Not working yet, do not use
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_DEREVERB_LEVEL Not working yet, do not use
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_DEREVERB_DECAY Not working yet, do not use
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_DEREVERB_DECAY Not working yet, do not use
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_PROB_START
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_PROB_START
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_PROB_CONTINUE
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_PROB_CONTINUE
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_NOISE_SUPPRESS Set maximum attenuation of the noise
- in dB (negative
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_NOISE_SUPPRESS Get maximum attenuation of the noise
- in dB (negative
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_ECHO_SUPPRESS Set maximum attenuation of the residual
- echo in dB (negative
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_ECHO_SUPPRESS Get maximum attenuation of the residual
- echo in dB (negative
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_ECHO_SUPPRESS_ACTIVE Set maximum attenuation of the
- echo in dB when near end is active (negative
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_ECHO_SUPPRESS_ACTIVE Get maximum attenuation of the
- echo in dB when near end is active (negative
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- spx_int32_t
- \end_layout
-
- \end_inset
-
- )
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_SET_ECHO_STATE Set the associated echo canceller for residual
- echo suppression (pointer or NULL for no residual echo suppression)
- \end_layout
-
- \begin_layout Description
- SPEEX_PREPROCESS_GET_ECHO_STATE Get the associated echo canceller (pointer)
- \end_layout
-
- \begin_layout Section
- Echo Cancellation
- \begin_inset CommandInset label
- LatexCommand label
- name "sub:Echo-Cancellation"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The Speex library now includes an echo cancellation
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- echo cancellation
- \end_layout
-
- \end_inset
-
- algorithm suitable for Acoustic Echo Cancellation
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- acoustic echo cancellation
- \end_layout
-
- \end_inset
-
- (AEC).
- In order to use the echo canceller, you first need to
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <speex/speex_echo.h>
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Then, an echo canceller state can be created by:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- SpeexEchoState *echo_state = speex_echo_state_init(frame_size, filter_length);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- where
- \family typewriter
- frame_size
- \family default
- is the amount of data (in samples) you want to process at once and
- \family typewriter
- filter_length
- \family default
- is the length (in samples) of the echo cancelling filter you want to use
- (also known as
- \shape italic
- tail length
- \shape default
-
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- tail length
- \end_layout
-
- \end_inset
-
- ).
- It is recommended to use a frame size on the order of 20 ms (or equal to
- the codec frame size) and make sure it is easy to perform an FFT of that
- size (powers of two are better than prime sizes).
- The recommended tail length is approximately one third of the room reverberation
- time.
- For example, in a small room, reverberation time is on the order of 300
- ms, so a tail length of 100 ms is a good choice (800 samples at 8000 Hz
- sampling rate).
- \end_layout
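The arithmetic behind these recommendations can be sketched as follows. ms_to_samples and tail_length_samples are hypothetical helpers written for this example. The one-third rule is the approximate guideline from the paragraph above, not an exact requirement.

```c
/* Convert a duration in milliseconds to a number of samples. */
static int ms_to_samples(int duration_ms, int sampling_rate_hz)
{
    return (int)((long)duration_ms * sampling_rate_hz / 1000);
}

/* Suggested tail length: roughly one third of the reverberation time. */
static int tail_length_samples(int reverb_ms, int sampling_rate_hz)
{
    return ms_to_samples(reverb_ms, sampling_rate_hz) / 3;
}
```

With a 300 ms reverberation time at 8000 Hz this gives a tail of 800 samples, and a 20 ms frame is 160 samples. These are the values that would be passed as filter_length and frame_size.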
-
- \begin_layout Standard
- Once the echo canceller state is created, audio can be processed by:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- where
- \family typewriter
- input_frame
- \family default
- is the audio as captured by the microphone,
- \family typewriter
- echo_frame
- \family default
- is the signal that was played in the speaker (and needs to be removed)
- and
- \family typewriter
- output_frame
- \family default
- is the signal with echo removed.
-
- \end_layout
-
- \begin_layout Standard
- One important thing to keep in mind is the relationship between
- \family typewriter
- input_frame
- \family default
- and
- \family typewriter
- echo_frame
- \family default
- .
- It is important that, at any time, any echo that is present in the input
- has already been sent to the echo canceller as
- \family typewriter
- echo_frame
- \family default
- .
- In other words, the echo canceller cannot remove a signal that it hasn't
- yet received.
- On the other hand, the delay between the input signal and the echo signal
- must be small, because otherwise part of the echo cancellation filter
- is wasted.
- In the ideal case, your code would look like:
- \begin_inset listings
- lstparams "breaklines=true"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- write_to_soundcard(echo_frame, frame_size);
- \end_layout
-
- \begin_layout Plain Layout
-
- read_from_soundcard(input_frame, frame_size);
- \end_layout
-
- \begin_layout Plain Layout
-
- speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- If you wish to further reduce the echo present in the signal, you can do
- so by associating the echo canceller to the preprocessor (see Section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sub:Preprocessor"
-
- \end_inset
-
- ).
- This is done by calling:
- \begin_inset listings
- lstparams "breaklines=true"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_preprocess_ctl(preprocess_state, SPEEX_PREPROCESS_SET_ECHO_STATE,echo_stat
- e);
- \end_layout
-
- \end_inset
-
- in the initialisation.
- \end_layout
-
- \begin_layout Standard
- As of version 1.2-beta2, there is an alternative, simpler API that can be
- used instead of
- \emph on
- speex_echo_cancellation()
- \emph default
- .
- When audio capture and playback are handled asynchronously (e.g.
- in different threads or using the
- \emph on
- poll()
- \emph default
- or
- \emph on
- select()
- \emph default
- system call), it can be difficult to keep track of which input_frame goes
- with which echo_frame.
- Instead, the playback context/thread can simply call:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_echo_playback(echo_state, echo_frame);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- every time an audio frame is played.
- Then, the capture context/thread calls:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_echo_capture(echo_state, input_frame, output_frame);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- for every frame captured.
- Internally,
- \emph on
- speex_echo_playback()
- \emph default
- simply buffers the playback frame so it can be used by
- \emph on
- speex_echo_capture()
- \emph default
- to call
- \emph on
- speex_echo_cancellation()
- \emph default
- .
- A side effect of using this alternate API is that the playback audio is
- delayed by two frames, which is the normal delay caused by the soundcard.
- When capture and playback are already synchronised,
- \emph on
- speex_echo_cancellation()
- \emph default
- is preferable since it gives better control over the exact input/echo timing.
- \end_layout
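The buffering idea can be pictured with a toy model. The sketch below assumes a FIFO that starts out holding two frames of silence, which is one simple way to obtain the two-frame delay mentioned above. It only illustrates the principle and is not the libspeexdsp implementation. PlaybackFifo, fifo_init, fifo_playback and fifo_capture are hypothetical names, and the frame size is kept tiny for readability.

```c
#include <string.h>

#define FRAME_SIZE   4   /* tiny frame, for illustration only */
#define DELAY_FRAMES 2   /* playback delay described in the text */

/* Hypothetical playback FIFO shared by the two contexts. */
typedef struct {
    short buf[(DELAY_FRAMES + 1) * FRAME_SIZE];
    int filled;          /* number of samples currently queued */
} PlaybackFifo;

/* Start with DELAY_FRAMES frames of silence already queued. */
static void fifo_init(PlaybackFifo *f)
{
    memset(f->buf, 0, sizeof(f->buf));
    f->filled = DELAY_FRAMES * FRAME_SIZE;
}

/* Playback side: queue one frame. Returns -1 (frame dropped) if full. */
static int fifo_playback(PlaybackFifo *f, const short *play)
{
    if (f->filled + FRAME_SIZE > (DELAY_FRAMES + 1) * FRAME_SIZE)
        return -1;
    memcpy(f->buf + f->filled, play, FRAME_SIZE * sizeof(short));
    f->filled += FRAME_SIZE;
    return 0;
}

/* Capture side: pop the oldest queued frame to use as the echo
   reference. Returns -1 if no complete frame is queued. */
static int fifo_capture(PlaybackFifo *f, short *echo_ref)
{
    if (f->filled < FRAME_SIZE)
        return -1;
    memcpy(echo_ref, f->buf, FRAME_SIZE * sizeof(short));
    f->filled -= FRAME_SIZE;
    memmove(f->buf, f->buf + FRAME_SIZE, f->filled * sizeof(short));
    return 0;
}
```

In this model the first two captured frames are paired with silence, and the third is paired with the first frame that was played.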
-
- \begin_layout Standard
- The echo cancellation state can be destroyed with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_echo_state_destroy(echo_state);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- It is also possible to reset the state of the echo canceller so it can be
- reused without the need to create another state with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- speex_echo_state_reset(echo_state);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Subsection
- Troubleshooting
- \end_layout
-
- \begin_layout Standard
- There are several things that may prevent the echo canceller from working
- properly.
- One of them is a bug (or something suboptimal) in the code, but there are
- many others you should consider first:
- \end_layout
-
- \begin_layout Itemize
- Using a different soundcard to do the capture and playback will
- \series bold
- not
- \series default
- work, regardless of what you may think.
- The only exception to that is if the two cards can be made to have their
- sampling clock
- \begin_inset Quotes eld
- \end_inset
-
- locked
- \begin_inset Quotes erd
- \end_inset
-
- on the same clock source.
- If not, the clocks will always have a small amount of drift, which will
- prevent the echo canceller from adapting.
- \end_layout
-
- \begin_layout Itemize
- The delay between the record and playback signals must be minimal.
- Any signal played has to
- \begin_inset Quotes eld
- \end_inset
-
- appear
- \begin_inset Quotes erd
- \end_inset
-
- on the playback (far end) signal slightly before the echo canceller
- \begin_inset Quotes eld
- \end_inset
-
- sees
- \begin_inset Quotes erd
- \end_inset
-
- it in the near end signal, but excessive delay means that part of the filter
- length is wasted.
- In the worst situations, the delay is such that it is longer than the filter
- length, in which case, no echo can be cancelled.
- \end_layout
-
- \begin_layout Itemize
- When it comes to echo tail length (filter length), longer is
- \series bold
- not
- \series default
- better.
- Actually, the longer the tail length, the longer it takes for the filter
- to adapt.
- Of course, a tail length that is too short will not cancel enough echo,
- but the most common problem seen is that people set a very long tail length
- and then wonder why no echo is being cancelled.
- \end_layout
-
- \begin_layout Itemize
- Non-linear distortion cannot (by definition) be modeled by the linear adaptive
- filter used in the echo canceller and thus cannot be cancelled.
- Use good audio gear and avoid saturation/clipping.
- \end_layout
-
- \begin_layout Standard
- Also useful is reading
- \emph on
- Echo Cancellation Demystified
- \emph default
- by Alexey Frunze
- \begin_inset Foot
- status collapsed
-
- \begin_layout Plain Layout
- http://www.embeddedstar.com/articles/2003/7/article20030720-1.html
- \end_layout
-
- \end_inset
-
- , which explains the fundamental principles of echo cancellation.
- The details of the algorithm described in the article are different, but
- the general ideas of echo cancellation through adaptive filters are the
- same.
- \end_layout
-
- \begin_layout Standard
- As of version 1.2beta2, a new
- \family typewriter
- echo_diagnostic.m
- \family default
- tool is included in the source distribution.
- The first step is to define DUMP_ECHO_CANCEL_DATA during the build.
- This causes the echo canceller to automatically save the near-end, far-end
- and output signals to files (aec_rec.sw, aec_play.sw and aec_out.sw).
- These are exactly what the AEC receives and outputs.
- From there, it is necessary to start Octave and type:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- lstparams "language=Matlab"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- echo_diagnostic('aec_rec.sw', 'aec_play.sw', 'aec_diagnostic.sw', 1024);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The value of 1024 is the filter length and can be changed.
- There will be some (hopefully) useful messages printed and echo cancelled
- audio will be saved to aec_diagnostic.sw .
- If even that output is bad (almost no cancellation), then there is probably
- a problem with the playback or recording process.
- \end_layout
-
- \begin_layout Section
- Jitter Buffer
- \end_layout
-
- \begin_layout Standard
- The jitter buffer can be enabled by including:
- \begin_inset listings
- lstparams "breaklines=true"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <speex/speex_jitter.h>
- \end_layout
-
- \end_inset
-
- and a new jitter buffer state can be initialised by:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- lstparams "breaklines=true"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- JitterBuffer *state = jitter_buffer_init(step);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- where the
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- step
- \end_layout
-
- \end_inset
-
- argument is the default time step (in timestamp units) used for adjusting
- the delay and doing concealment.
- A value of 1 is always correct, but higher values may be more convenient
- sometimes.
- For example, if you are only able to do concealment on 20ms frames, there
- is no point in the jitter buffer asking you to do it on one sample.
- Another example is that for video, it makes no sense to adjust the delay
- by less than a full frame.
- The value provided can always be changed at a later time.
- \end_layout
-
- \begin_layout Standard
- The jitter buffer API is based on the
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- JitterBufferPacket
- \end_layout
-
- \end_inset
-
- type, which is defined as:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- typedef struct {
- \end_layout
-
- \begin_layout Plain Layout
-
- char *data; /* Data bytes contained in the packet */
- \end_layout
-
- \begin_layout Plain Layout
-
- spx_uint32_t len; /* Length of the packet in bytes */
- \end_layout
-
- \begin_layout Plain Layout
-
- spx_uint32_t timestamp; /* Timestamp for the packet */
- \end_layout
-
- \begin_layout Plain Layout
-
- spx_uint32_t span; /* Time covered by the packet (timestamp units)
- */
- \end_layout
-
- \begin_layout Plain Layout
-
- } JitterBufferPacket;
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- As an example, for audio the timestamp field would be what is obtained from
- the RTP timestamp field and the span would be the number of samples that
- are encoded in the packet.
- For Speex narrowband, span would be 160 if only one frame is included in
- the packet.
-
- \end_layout
-
- \begin_layout Standard
- When a packet arrives, it needs to be inserted into the jitter buffer by:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- JitterBufferPacket packet;
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Fill in each field in the packet struct */
- \end_layout
-
- \begin_layout Plain Layout
-
- jitter_buffer_put(state, &packet);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- When the decoder is ready to decode a packet, the packet to be decoded can
- be obtained by:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- int start_offset;
- \end_layout
-
- \begin_layout Plain Layout
-
- err = jitter_buffer_get(state, &packet, desired_span, &start_offset);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- If
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- jitter_buffer_put()
- \end_layout
-
- \end_inset
-
- and
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- jitter_buffer_get()
- \end_layout
-
- \end_inset
-
- are called from different threads, then
- \series bold
- you need to protect the jitter buffer state with a mutex
- \series default
- .
-
- \end_layout
-
- \begin_layout Standard
- Because the jitter buffer is designed not to use an explicit timer, it needs
- to be told about the time explicitly.
- This is done by calling:
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- jitter_buffer_tick(state);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- This needs to be done periodically in the playing thread.
- This will be the last jitter buffer call before going to sleep (until more
- data is played back).
- In some cases, it may be preferable to use
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- jitter_buffer_remaining_span(state, remaining);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The second argument is used to specify that we are still holding data that
- has not been written to the playback device.
- For instance, if 256 samples were needed by the soundcard (specified by
-
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- desired_span
- \end_layout
-
- \end_inset
-
- ), but
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- jitter_buffer_get()
- \end_layout
-
- \end_inset
-
- returned 320 samples, we would have
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- remaining=64
- \end_layout
-
- \end_inset
-
- .
- \end_layout
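-
- \begin_layout Standard
- As a minimal sketch of the arithmetic in the example above, the leftover
- span could be computed as follows.
- The helper
- \family typewriter
- leftover_span()
- \family default
- is hypothetical and not part of the Speex API; it only assumes that spans
- are counted in timestamp units, as elsewhere in this section.
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- lstparams "language=C"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <stdint.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Hypothetical helper, not part of the Speex API. */
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Span still held by the application once the device took desired_span. */
- \end_layout
-
- \begin_layout Plain Layout
-
- static uint32_t leftover_span(uint32_t returned_span, uint32_t desired_span)
- \end_layout
-
- \begin_layout Plain Layout
-
- {
- \end_layout
-
- \begin_layout Plain Layout
-
-     if (returned_span > desired_span)
- \end_layout
-
- \begin_layout Plain Layout
-
-         return returned_span - desired_span;
- \end_layout
-
- \begin_layout Plain Layout
-
-     return 0; /* nothing is left over */
- \end_layout
-
- \begin_layout Plain Layout
-
- }
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- With the values from the example, leftover_span(320, 256) gives 64, which
- is the value that would be passed as the second argument.
- \end_layout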
-
- \begin_layout Section
- Resampler
- \end_layout
-
- \begin_layout Standard
- Speex includes a resampling module.
- To make use of the resampler, it is necessary to include its header file:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <speex/speex_resampler.h>
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- For each stream that is to be resampled, it is necessary to create a resampler
- state with:
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- SpeexResamplerState *resampler;
- \end_layout
-
- \begin_layout Plain Layout
-
- resampler = speex_resampler_init(nb_channels, input_rate, output_rate, quality,
- &err);
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- where
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- nb_channels
- \end_layout
-
- \end_inset
-
- is the number of channels that will be used (either interleaved or non-interlea
- ved),
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- input_rate
- \end_layout
-
- \end_inset
-
- is the sampling rate of the input stream,
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- output_rate
- \end_layout
-
- \end_inset
-
- is the sampling rate of the output stream and
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- quality
- \end_layout
-
- \end_inset
-
- is the requested quality setting (0 to 10).
- The quality parameter is useful for controlling the quality/complexity/latency
- tradeoff.
- Using a higher quality setting means less noise/aliasing, a higher complexity
- and a higher latency.
- Usually, a quality of 3 is acceptable for most desktop uses and quality
- 10 is mostly recommended for pro audio work.
- Quality 0 usually has a decent sound (certainly better than using linear
- interpolation resampling), but artifacts may be heard.
- \end_layout
-
- \begin_layout Standard
- The actual resampling is performed using
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- inline false
- status open
-
- \begin_layout Plain Layout
-
- err = speex_resampler_process_int(resampler, channelID, in, &in_length,
- out, &out_length);
- \end_layout
-
- \end_inset
-
- where
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- channelID
- \end_layout
-
- \end_inset
-
- is the ID of the channel to be processed.
- For a mono stream, use 0.
- The
- \emph on
- in
- \emph default
- pointer points to the first sample of the input buffer for the selected
- channel and
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- out
- \end_layout
-
- \end_inset
-
- points to the first sample of the output.
- The sizes of the input and output buffers are specified by
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- in_length
- \end_layout
-
- \end_inset
-
- and
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- out_length
- \end_layout
-
- \end_inset
-
- respectively.
- Upon completion, these values are replaced by the number of samples read
- and written by the resampler.
- Unless an error occurs, either all input samples will be read or all output
- samples will be written to (or both).
- For floating-point samples, the function
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_process_float()
- \end_layout
-
- \end_inset
-
- behaves similarly.
- \end_layout
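-
- \begin_layout Standard
- When choosing the size of the output buffer, a rough estimate of the number
- of samples produced for a given input length can help.
- The sketch below is only an illustration: the helper
- \family typewriter
- estimated_output_samples()
- \family default
- is hypothetical, not part of the Speex API, and simply rounds up the rate
- ratio.
- Since the resampler reports how many samples it actually read and wrote,
- an output buffer that turns out to be too small only leaves some input
- unread.
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- lstparams "language=C"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <stdint.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Hypothetical helper, not part of the Speex API. */
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Returns ceil(in_len * out_rate / in_rate); in_rate must be non-zero. */
- \end_layout
-
- \begin_layout Plain Layout
-
- static uint32_t estimated_output_samples(uint32_t in_len, uint32_t in_rate,
- \end_layout
-
- \begin_layout Plain Layout
-
-                                          uint32_t out_rate)
- \end_layout
-
- \begin_layout Plain Layout
-
- {
- \end_layout
-
- \begin_layout Plain Layout
-
-     /* 64-bit intermediate so the product cannot overflow */
- \end_layout
-
- \begin_layout Plain Layout
-
-     uint64_t num = (uint64_t)in_len * out_rate + in_rate - 1;
- \end_layout
-
- \begin_layout Plain Layout
-
-     return (uint32_t)(num / in_rate);
- \end_layout
-
- \begin_layout Plain Layout
-
- }
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- For instance, 160 samples at 8000 Hz resampled to 16000 Hz give an estimate
- of 320 samples.
- \end_layout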
-
- \begin_layout Standard
- It is also possible to process multiple channels at once.
- To do that, you can use speex_resampler_process_interleaved_int() or
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_process_interleaved_float()
- \end_layout
-
- \end_inset
-
- .
- The arguments are the same except that there is no
- \begin_inset listings
- inline true
- status collapsed
-
- \begin_layout Plain Layout
-
- channelID
- \end_layout
-
- \end_inset
-
- argument.
- Note that the
- \series bold
- length parameters are per-channel
- \series default
- .
- So if you have 1024 samples for each of 4 channels, you pass 1024 and not
- 4096.
- \end_layout
-
- \begin_layout Standard
- The resampler allows changing the quality and input/output sampling frequencies
- on the fly without glitches.
- This can be done with calls such as
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_set_quality()
- \end_layout
-
- \end_inset
-
- and
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_set_rate()
- \end_layout
-
- \end_inset
-
- .
- The only side effect is that the filter will have to be recomputed, consuming
- many CPU cycles.
-
- \end_layout
-
- \begin_layout Standard
- When resampling a file, it is often desirable to have the output file perfectly
- synchronised with the input.
- To do that, you need to call
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_skip_zeros()
- \end_layout
-
- \end_inset
-
-
- \series bold
- before
- \series default
- you start processing any samples.
- For real-time applications (e.g.
- VoIP), it is not recommended to do that as the first process frame will
- be shorter to compensate for the delay (the skipped zeros).
- Instead, in real-time applications you may want to know how much delay
- is introduced by the resampler.
- This can be done at run-time with
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_get_input_latency()
- \end_layout
-
- \end_inset
-
- and
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_get_output_latency()
- \end_layout
-
- \end_inset
-
- functions.
- The first function returns the delay measured in samples at the input sampling
- rate, while the second returns the delay measured in samples at the output
- sampling rate.
- \end_layout
-
- \begin_layout Standard
- To destroy a resampler state, just call
- \begin_inset listings
- inline true
- status open
-
- \begin_layout Plain Layout
-
- speex_resampler_destroy()
- \end_layout
-
- \end_inset
-
- .
- \end_layout
-
- \begin_layout Section
- Ring Buffer
- \end_layout
-
- \begin_layout Standard
- In some cases, it is necessary to interface components that use different
- block sizes.
- For example, it is possible that the soundcard does not support reading/writing
- in blocks of 20
- \begin_inset space ~
- \end_inset
-
- ms, or complicated resampling ratios may mean that the blocks don't
- always have the same duration.
- In those cases, it is often necessary to buffer a bit of audio using a
- ring buffer.
- \end_layout
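-
- \begin_layout Standard
- To illustrate the idea, here is a minimal sketch of a fixed-size ring buffer
- for 16-bit samples.
- It is not the Speex implementation: the type
- \family typewriter
- Ring
- \family default
- , the functions
- \family typewriter
- ring_write()
- \family default
- and
- \family typewriter
- ring_read()
- \family default
- and the capacity of 1024 samples are all assumptions made for the example.
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- lstparams "language=C"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <stddef.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- #include <stdint.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Hypothetical minimal ring buffer, not the Speex implementation. */
- \end_layout
-
- \begin_layout Plain Layout
-
- #define RING_CAPACITY 1024
- \end_layout
-
- \begin_layout Plain Layout
-
- typedef struct {
- \end_layout
-
- \begin_layout Plain Layout
-
-     int16_t data[RING_CAPACITY];
- \end_layout
-
- \begin_layout Plain Layout
-
-     size_t read_pos; /* index of the oldest stored sample */
- \end_layout
-
- \begin_layout Plain Layout
-
-     size_t count;    /* number of samples currently stored */
- \end_layout
-
- \begin_layout Plain Layout
-
- } Ring;
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Store up to n samples; returns how many were accepted. */
- \end_layout
-
- \begin_layout Plain Layout
-
- static size_t ring_write(Ring *r, const int16_t *in, size_t n)
- \end_layout
-
- \begin_layout Plain Layout
-
- {
- \end_layout
-
- \begin_layout Plain Layout
-
-     size_t i;
- \end_layout
-
- \begin_layout Plain Layout
-
-     if (n > RING_CAPACITY - r->count)
- \end_layout
-
- \begin_layout Plain Layout
-
-         n = RING_CAPACITY - r->count; /* drop what does not fit */
- \end_layout
-
- \begin_layout Plain Layout
-
-     for (i = 0; i < n; i++)
- \end_layout
-
- \begin_layout Plain Layout
-
-         r->data[(r->read_pos + r->count + i) % RING_CAPACITY] = in[i];
- \end_layout
-
- \begin_layout Plain Layout
-
-     r->count += n;
- \end_layout
-
- \begin_layout Plain Layout
-
-     return n;
- \end_layout
-
- \begin_layout Plain Layout
-
- }
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Fetch up to n samples in arrival order; returns how many were read. */
- \end_layout
-
- \begin_layout Plain Layout
-
- static size_t ring_read(Ring *r, int16_t *out, size_t n)
- \end_layout
-
- \begin_layout Plain Layout
-
- {
- \end_layout
-
- \begin_layout Plain Layout
-
-     size_t i;
- \end_layout
-
- \begin_layout Plain Layout
-
-     if (n > r->count)
- \end_layout
-
- \begin_layout Plain Layout
-
-         n = r->count; /* only return what is available */
- \end_layout
-
- \begin_layout Plain Layout
-
-     for (i = 0; i < n; i++)
- \end_layout
-
- \begin_layout Plain Layout
-
-         out[i] = r->data[(r->read_pos + i) % RING_CAPACITY];
- \end_layout
-
- \begin_layout Plain Layout
-
-     r->read_pos = (r->read_pos + n) % RING_CAPACITY;
- \end_layout
-
- \begin_layout Plain Layout
-
-     r->count -= n;
- \end_layout
-
- \begin_layout Plain Layout
-
-     return n;
- \end_layout
-
- \begin_layout Plain Layout
-
- }
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- A soundcard callback that delivers 256 samples at a time could call ring_write()
- , while the codec side calls ring_read() to pull 160-sample frames whenever
- enough data is available.
- If both sides run in different threads, the structure would need to be
- protected by a mutex.
- \end_layout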
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Formats and standards
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- standards
- \end_layout
-
- \end_inset
-
-
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Formats-and-standards"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Speex can encode speech in both narrowband and wideband and provides different
- bit-rates.
- However, not all features need to be supported by a certain implementation
- or device.
- In order to be called
- \begin_inset Quotes eld
- \end_inset
-
- Speex compatible
- \begin_inset Quotes erd
- \end_inset
-
- (whatever that means), an implementation must implement at least a basic
- set of features.
- \end_layout
-
- \begin_layout Standard
- At the minimum, all narrowband modes of operation MUST be supported at the
- decoder.
- This includes the decoding of a wideband bit-stream by the narrowband decoder
- \begin_inset Foot
- status collapsed
-
- \begin_layout Plain Layout
- The wideband bit-stream contains an embedded narrowband bit-stream which
- can be decoded alone
- \end_layout
-
- \end_inset
-
- .
- If present, a wideband decoder MUST be able to decode a narrowband stream,
- and MAY either be able to decode all wideband modes or be able to decode
- the embedded narrowband part of all modes (which includes ignoring the
- high-band bits).
- \end_layout
-
- \begin_layout Standard
- For encoders, at least one narrowband or wideband mode MUST be supported.
- The main reason why all encoding modes do not have to be supported is that
- some platforms may not be able to handle the complexity of encoding in
- some modes.
- \end_layout
-
- \begin_layout Section
- RTP
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- RTP
- \end_layout
-
- \end_inset
-
- Payload Format
- \end_layout
-
- \begin_layout Standard
- The RTP payload draft is included in appendix
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:IETF-draft"
-
- \end_inset
-
- and the latest version is available at
- \begin_inset Flex URL
- status collapsed
-
- \begin_layout Plain Layout
-
- http://www.speex.org/drafts/latest
- \end_layout
-
- \end_inset
-
- .
- This draft has been sent (2003/02/26) to the Internet Engineering Task
- Force (IETF) and will be discussed at the March 18th meeting in San Francisco.
-
- \end_layout
-
- \begin_layout Section
- MIME Type
- \end_layout
-
- \begin_layout Standard
- For now, you should use the MIME type audio/x-speex for Speex-in-Ogg.
- We will apply for type
- \family typewriter
- audio/speex
- \family default
- in the near future.
- \end_layout
-
- \begin_layout Section
- Ogg
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- Ogg
- \end_layout
-
- \end_inset
-
- file format
- \end_layout
-
- \begin_layout Standard
- Speex bit-streams can be stored in Ogg files.
- In this case, the first packet of the Ogg file contains the Speex header
- described in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:ogg_speex_header"
-
- \end_inset
-
- .
- All integer fields in the headers are stored as little-endian.
- The
- \family typewriter
- speex_string
- \family default
- field must contain the
- \begin_inset Quotes eld
- \end_inset
-
-
- \family typewriter
- Speex
- \family default
-
- \begin_inset space ~
- \end_inset
-
-
- \begin_inset space ~
- \end_inset
-
-
- \begin_inset space ~
- \end_inset
-
-
- \begin_inset Quotes erd
- \end_inset
-
- (with 3 trailing spaces), which identifies the bit-stream.
- The next field,
- \family typewriter
- speex_version
- \family default
- contains the version of Speex that encoded the file.
- For now, refer to speex_header.[ch] for more info.
- The
- \emph on
- beginning of stream
- \emph default
- (
- \family typewriter
- b_o_s
- \family default
- ) flag is set to 1 for the header.
- The header packet has
- \family typewriter
- packetno=0
- \family default
- and
- \family typewriter
- granulepos=0
- \family default
- .
- \end_layout
-
- \begin_layout Standard
- The second packet contains the Speex comment header.
- The format used is the Vorbis comment format described here: http://www.xiph.org/
- ogg/vorbis/doc/v-comment.html .
- This packet has
- \family typewriter
- packetno=1
- \family default
- and
- \family typewriter
- granulepos=0
- \family default
- .
- \end_layout
-
- \begin_layout Standard
- The third and subsequent packets each contain one or more (number found
- in header) Speex frames.
- These are identified with
- \family typewriter
- packetno
- \family default
- starting from 2 and the
- \family typewriter
- granulepos
- \family default
- is the number of the last sample encoded in that packet.
- The last of these packets has the
- \emph on
- end of stream
- \emph default
- (
- \family typewriter
- e_o_s
- \family default
- ) flag set to 1.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float table
- placement htbp
- wide true
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Tabular
- <lyxtabular version="3" rows="16" columns="3">
- <features>
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Field
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Type
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Size
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- speex_string
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- char[]
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- speex_version
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- char[]
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 20
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- speex_version_id
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- header_size
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- rate
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- mode
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- mode_bitstream_version
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- nb_channels
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- bitrate
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame_size
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- vbr
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frames_per_packet
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- extra_headers
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- int
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Ogg/Speex header packet
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:ogg_speex_header"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
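-
- \begin_layout Standard
- As an illustration of the layout in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:ogg_speex_header"
-
- \end_inset
-
- , the sketch below checks the identification string and reads the little-endian
-
- \family typewriter
- rate
- \family default
- field from a raw header packet.
- The helpers
- \family typewriter
- read_le32()
- \family default
- and
- \family typewriter
- speex_header_rate()
- \family default
- are hypothetical and not part of the Speex API; the offset of 36 bytes and
- the total size of 80 bytes are obtained by summing the sizes listed in
- the table.
- \end_layout
-
- \begin_layout Standard
- \begin_inset listings
- lstparams "language=C"
- inline false
- status open
-
- \begin_layout Plain Layout
-
- #include <stddef.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- #include <stdint.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- #include <string.h>
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Hypothetical helpers, not part of the Speex API. */
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Decode a 32-bit little-endian integer from four bytes. */
- \end_layout
-
- \begin_layout Plain Layout
-
- static uint32_t read_le32(const unsigned char *p)
- \end_layout
-
- \begin_layout Plain Layout
-
- {
- \end_layout
-
- \begin_layout Plain Layout
-
-     return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
- \end_layout
-
- \begin_layout Plain Layout
-
-            ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
- \end_layout
-
- \begin_layout Plain Layout
-
- }
- \end_layout
-
- \begin_layout Plain Layout
-
- /* Returns the rate field, or 0 if the packet is not a Speex header. */
- \end_layout
-
- \begin_layout Plain Layout
-
- static uint32_t speex_header_rate(const unsigned char *pkt, size_t len)
- \end_layout
-
- \begin_layout Plain Layout
-
- {
- \end_layout
-
- \begin_layout Plain Layout
-
-     /* speex_string: the word Speex followed by 3 trailing spaces */
- \end_layout
-
- \begin_layout Plain Layout
-
-     static const unsigned char magic[8] = {'S','p','e','e','x',' ',' ',' '};
- \end_layout
-
- \begin_layout Plain Layout
-
-     if (len < 80 || memcmp(pkt, magic, 8) != 0)
- \end_layout
-
- \begin_layout Plain Layout
-
-         return 0;
- \end_layout
-
- \begin_layout Plain Layout
-
-     /* 36 = speex_string (8) + speex_version (20) + two 4-byte ints */
- \end_layout
-
- \begin_layout Plain Layout
-
-     return read_le32(pkt + 36);
- \end_layout
-
- \begin_layout Plain Layout
-
- }
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Reading the fields byte by byte in this way keeps the code independent
- of the endianness of the host.
- \end_layout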
-
- \begin_layout Standard
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- clearpage
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Introduction to CELP Coding
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- CELP
- \end_layout
-
- \end_inset
-
-
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Introduction-to-CELP"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Quote
- \align center
-
- \emph on
- Do not meddle in the affairs of poles, for they are subtle and quick to
- leave the unit circle.
- \end_layout
-
- \begin_layout Standard
- Speex is based on CELP, which stands for Code Excited Linear Prediction.
- This section attempts to introduce the principles behind CELP, so if you
- are already familiar with CELP, you can safely skip to section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Speex-narrowband-mode"
-
- \end_inset
-
- .
- The CELP technique is based on three ideas:
- \end_layout
-
- \begin_layout Enumerate
- The use of a linear prediction (LP) model to model the vocal tract
- \end_layout
-
- \begin_layout Enumerate
- The use of (adaptive and fixed) codebook entries as input (excitation) of
- the LP model
- \end_layout
-
- \begin_layout Enumerate
- The search performed in closed-loop in a
- \begin_inset Quotes eld
- \end_inset
-
- perceptually weighted domain
- \begin_inset Quotes erd
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The rest of this chapter describes the basic ideas behind each of them.
- This is still a work in progress.
- \end_layout
-
- \begin_layout Section
- Source-Filter Model of Speech Production
- \end_layout
-
- \begin_layout Standard
- The source-filter model of speech production assumes that the vocal cords
- are the source of spectrally flat sound (the excitation signal), and that
- the vocal tract acts as a filter to spectrally shape the various sounds
- of speech.
- While still an approximation, the model is widely used in speech coding
- because of its simplicity.
- Its use is also the reason why most speech codecs (Speex included) perform
- badly on music signals.
- The different phonemes can be distinguished by their excitation (source)
- and spectral shape (filter).
- Voiced sounds (e.g.
- vowels) have an excitation signal that is periodic and that can be approximated
- by an impulse train in the time domain or by regularly-spaced harmonics
- in the frequency domain.
- On the other hand, fricatives (such as the "s", "sh" and "f" sounds) have
- an excitation signal that is similar to white Gaussian noise.
- So-called voiced fricatives (such as "z" and "v") have an excitation signal
- composed of a harmonic part and a noisy part.
- \end_layout
-
- \begin_layout Standard
- The source-filter model is usually tied to the use of linear prediction.
- The CELP model is based on the source-filter model, as can be seen from the
- CELP decoder illustrated in Figure 
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "fig:The-CELP-model"
-
- \end_inset
-
- .
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float figure
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Graphics
- filename celp_decoder.eps
- width 45page%
- keepAspectRatio
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- The CELP model of speech synthesis (decoder)
- \begin_inset CommandInset label
- LatexCommand label
- name "fig:The-CELP-model"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- Linear Prediction Coefficients (LPC)
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- linear prediction
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Linear prediction is at the base of many speech coding techniques, including
- CELP.
- The idea behind it is to predict the signal
- \begin_inset Formula $x[n]$
- \end_inset
-
- using a linear combination of its past samples:
- \end_layout
-
- \begin_layout Standard
- \begin_inset Formula \[
- y[n]=\sum_{i=1}^{N}a_{i}x[n-i]\]
-
- \end_inset
-
- where
- \begin_inset Formula $y[n]$
- \end_inset
-
- is the linear prediction of
- \begin_inset Formula $x[n]$
- \end_inset
-
- .
- The prediction error is thus given by:
- \begin_inset Formula \[
- e[n]=x[n]-y[n]=x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\]
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The goal of the LPC analysis is to find the best prediction coefficients
-
- \begin_inset Formula $a_{i}$
- \end_inset
-
- which minimize the quadratic error function:
- \begin_inset Formula \[
- E=\sum_{n=0}^{L-1}\left[e[n]\right]^{2}=\sum_{n=0}^{L-1}\left[x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\right]^{2}\]
-
- \end_inset
-
- That can be done by making all derivatives
- \begin_inset Formula $\frac{\partial E}{\partial a_{i}}$
- \end_inset
-
- equal to zero:
- \begin_inset Formula \[
- \frac{\partial E}{\partial a_{i}}=\frac{\partial}{\partial a_{i}}\sum_{n=0}^{L-1}\left[x[n]-\sum_{j=1}^{N}a_{j}x[n-j]\right]^{2}=0\]
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- For an order
- \begin_inset Formula $N$
- \end_inset
-
- filter, the filter coefficients
- \begin_inset Formula $a_{i}$
- \end_inset
-
- are found by solving the 
- \begin_inset Formula $N\times N$
- \end_inset
-
- linear system
- \begin_inset Formula $\mathbf{Ra}=\mathbf{r}$
- \end_inset
-
- , where
- \begin_inset Formula \[
- \mathbf{R}=\left[\begin{array}{cccc}
- R(0) & R(1) & \cdots & R(N-1)\\
- R(1) & R(0) & \cdots & R(N-2)\\
- \vdots & \vdots & \ddots & \vdots\\
- R(N-1) & R(N-2) & \cdots & R(0)\end{array}\right]\]
-
- \end_inset
-
-
- \begin_inset Formula \[
- \mathbf{r}=\left[\begin{array}{c}
- R(1)\\
- R(2)\\
- \vdots\\
- R(N)\end{array}\right]\]
-
- \end_inset
-
- with
- \begin_inset Formula $R(m)$
- \end_inset
-
- , the auto-correlation
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- auto-correlation
- \end_layout
-
- \end_inset
-
- of the signal
- \begin_inset Formula $x[n]$
- \end_inset
-
- , computed as:
- \end_layout
-
- \begin_layout Standard
- \begin_inset Formula \[
- R(m)=\sum_{i=0}^{L-1}x[i]x[i-m]\]
-
- \end_inset
-
-
- \end_layout
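-
- \begin_layout Standard
- As a sketch of the step that links the two results above, and assuming the
- usual auto-correlation method (an assumption made here, not something the
- text states), setting the derivative with respect to one coefficient 
- \begin_inset Formula $a_{i}$
- \end_inset
-
-  to zero and dividing by 
- \begin_inset Formula $-2$
- \end_inset
-
-  gives
- \begin_inset Formula \[
- \sum_{n=0}^{L-1}\left[x[n]-\sum_{j=1}^{N}a_{j}x[n-j]\right]x[n-i]=0\quad\Longrightarrow\quad\sum_{j=1}^{N}a_{j}\sum_{n=0}^{L-1}x[n-j]x[n-i]=\sum_{n=0}^{L-1}x[n]x[n-i]\]
-
- \end_inset
-
- If the frame is windowed so that the sums over 
- \begin_inset Formula $n$
- \end_inset
-
-  depend only on the lag between the two samples, this becomes
- \begin_inset Formula \[
- \sum_{j=1}^{N}a_{j}R(|i-j|)=R(i)\ ,\qquad i=1,\ldots,N\ ,\]
-
- \end_inset
-
- which is row 
- \begin_inset Formula $i$
- \end_inset
-
-  of the system 
- \begin_inset Formula $\mathbf{Ra}=\mathbf{r}$
- \end_inset
-
- .
- \end_layout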
-
- \begin_layout Standard
- Because
- \begin_inset Formula $\mathbf{R}$
- \end_inset
-
- is Hermitian Toeplitz, the Levinson-Durbin
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- Levinson-Durbin
- \end_layout
-
- \end_inset
-
- algorithm can be used, making the solution to the problem
- \begin_inset Formula $\mathcal{O}\left(N^{2}\right)$
- \end_inset
-
- instead of
- \begin_inset Formula $\mathcal{O}\left(N^{3}\right)$
- \end_inset
-
- .
- Also, it can be proven that all the roots of
- \begin_inset Formula $A(z)$
- \end_inset
-
- are within the unit circle, which means that
- \begin_inset Formula $1/A(z)$
- \end_inset
-
- is always stable.
- That holds in theory; in practice, because of finite precision, two techniques
- are commonly used to make sure the filter is stable.
- First, we multiply
- \begin_inset Formula $R(0)$
- \end_inset
-
- by a number slightly above one (such as 1.0001), which is equivalent to
- adding noise to the signal.
- Also, we can apply a window to the auto-correlation, which is equivalent
- to filtering in the frequency domain, reducing sharp resonances.
- \end_layout
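-
- \begin_layout Standard
- For reference, one common textbook form of the Levinson-Durbin recursion
- is sketched below.
- The superscript 
- \begin_inset Formula $(m)$
- \end_inset
-
-  (the order reached so far), the reflection coefficient 
- \begin_inset Formula $k_{m}$
- \end_inset
-
-  and the residual error energy 
- \begin_inset Formula $E_{m}$
- \end_inset
-
-  are notation introduced here for illustration, and this is not necessarily
- the exact form used in the Speex source.
- Starting from 
- \begin_inset Formula $E_{0}=R(0)$
- \end_inset
-
- , the following steps are repeated for 
- \begin_inset Formula $m=1,\ldots,N$
- \end_inset
-
- :
- \begin_inset Formula \[
- \begin{array}{rcl}
- k_{m} & = & \frac{1}{E_{m-1}}\left[R(m)-\sum_{i=1}^{m-1}a_{i}^{(m-1)}R(m-i)\right]\\
- a_{m}^{(m)} & = & k_{m}\\
- a_{i}^{(m)} & = & a_{i}^{(m-1)}-k_{m}a_{m-i}^{(m-1)}\ ,\quad1\leq i<m\\
- E_{m} & = & \left(1-k_{m}^{2}\right)E_{m-1}\end{array}\]
-
- \end_inset
-
- The final coefficients are 
- \begin_inset Formula $a_{i}=a_{i}^{(N)}$
- \end_inset
-
- .
- Step 
- \begin_inset Formula $m$
- \end_inset
-
-  costs a number of operations proportional to 
- \begin_inset Formula $m$
- \end_inset
-
- , which is where the overall 
- \begin_inset Formula $\mathcal{O}\left(N^{2}\right)$
- \end_inset
-
-  cost comes from.
- As a quick check, for 
- \begin_inset Formula $N=1$
- \end_inset
-
-  the recursion reduces to 
- \begin_inset Formula $a_{1}=R(1)/R(0)$
- \end_inset
-
- , which is the solution of the 
- \begin_inset Formula $1\times1$
- \end_inset
-
-  system.
- \end_layout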
-
- \begin_layout Section
- Pitch Prediction
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- pitch
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- During voiced segments, the speech signal is periodic, so it is possible
- to take advantage of that property by approximating the excitation signal
-
- \begin_inset Formula $e[n]$
- \end_inset
-
- by a gain times the past of the excitation:
- \end_layout
-
- \begin_layout Standard
- \begin_inset Formula \[
- e[n]\simeq p[n]=\beta e[n-T]\ ,\]
-
- \end_inset
-
- where
- \begin_inset Formula $T$
- \end_inset
-
- is the pitch period and 
- \begin_inset Formula $\beta$
- \end_inset
-
- is the pitch gain.
- We call that long-term prediction since the excitation is predicted from
-
- \begin_inset Formula $e[n-T]$
- \end_inset
-
- with
- \begin_inset Formula $T\gg N$
- \end_inset
-
- .
- \end_layout
-
- \begin_layout Section
- Innovation Codebook
- \end_layout
-
- \begin_layout Standard
- The final excitation
- \begin_inset Formula $e[n]$
- \end_inset
-
- will be the sum of the pitch prediction and an
- \emph on
- innovation
- \emph default
- signal
- \begin_inset Formula $c[n]$
- \end_inset
-
- taken from a fixed codebook, hence the name
- \emph on
- Code
- \emph default
- Excited Linear Prediction.
- The final excitation is given by
- \end_layout
-
- \begin_layout Standard
- \begin_inset Formula \[
- e[n]=p[n]+c[n]=\beta e[n-T]+c[n]\ .\]
-
- \end_inset
-
- The quantization of
- \begin_inset Formula $c[n]$
- \end_inset
-
- is where most of the bits in a CELP codec are allocated.
- It represents the information that couldn't be obtained either from linear
- prediction or pitch prediction.
- In the
- \emph on
- z
- \emph default
- -domain we can represent the final signal
- \begin_inset Formula $X(z)$
- \end_inset
-
- as
- \begin_inset Formula \[
- X(z)=\frac{C(z)}{A(z)\left(1-\beta z^{-T}\right)}\]
-
- \end_inset
-
-
- \end_layout
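-
- \begin_layout Standard
- As a short derivation sketch of the expression above, assume the prediction
- error filter is written 
- \begin_inset Formula $A(z)=1-\sum_{i=1}^{N}a_{i}z^{-i}$
- \end_inset
-
-  (a convention that matches the prediction error equation but is an assumption
- here, since the text does not define 
- \begin_inset Formula $A(z)$
- \end_inset
-
-  explicitly).
- Taking the 
- \emph on
- z
- \emph default
- -transform of the excitation equation gives
- \begin_inset Formula \[
- E(z)=\beta z^{-T}E(z)+C(z)\quad\Longrightarrow\quad E(z)=\frac{C(z)}{1-\beta z^{-T}}\ ,\]
-
- \end_inset
-
- and the prediction error equation gives 
- \begin_inset Formula $E(z)=A(z)X(z)$
- \end_inset
-
- , so dividing by 
- \begin_inset Formula $A(z)$
- \end_inset
-
-  yields the expression for 
- \begin_inset Formula $X(z)$
- \end_inset
-
- .
- \end_layout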
-
- \begin_layout Section
- Noise Weighting
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- error weighting
- \end_layout
-
- \end_inset
-
-
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- analysis-by-synthesis
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- Most (if not all) modern audio codecs attempt to
- \begin_inset Quotes eld
- \end_inset
-
- shape
- \begin_inset Quotes erd
- \end_inset
-
- the noise so that it appears mostly in the frequency regions where the
- ear cannot detect it.
- For example, the ear is more tolerant to noise in parts of the spectrum
- that are louder and
- \emph on
- vice versa
- \emph default
- .
- In order to maximize speech quality, CELP codecs minimize the mean square
- of the error (noise) in the perceptually weighted domain.
- This means that a perceptual noise weighting filter
- \begin_inset Formula $W(z)$
- \end_inset
-
- is applied to the error signal in the encoder.
- In most CELP codecs,
- \begin_inset Formula $W(z)$
- \end_inset
-
- is a pole-zero weighting filter derived from the linear prediction coefficients
- (LPC), generally using bandwidth expansion.
- With the spectral envelope represented by the synthesis filter 
- \begin_inset Formula $1/A(z)$
- \end_inset
-
- , CELP codecs typically derive the noise weighting filter as
- \begin_inset Formula \begin{equation}
- W(z)=\frac{A(z/\gamma_{1})}{A(z/\gamma_{2})}\ ,\label{eq:gamma-weighting}\end{equation}
-
- \end_inset
-
- where
- \begin_inset Formula $\gamma_{1}=0.9$
- \end_inset
-
- and
- \begin_inset Formula $\gamma_{2}=0.6$
- \end_inset
-
- in the Speex reference implementation.
- If a filter
- \begin_inset Formula $A(z)$
- \end_inset
-
- has (complex) poles at
- \begin_inset Formula $p_{i}$
- \end_inset
-
- in the
- \begin_inset Formula $z$
- \end_inset
-
- -plane, the filter
- \begin_inset Formula $A(z/\gamma)$
- \end_inset
-
- will have its poles at
- \begin_inset Formula $p'_{i}=\gamma p_{i}$
- \end_inset
-
- , making it a flatter version of
- \begin_inset Formula $A(z)$
- \end_inset
-
- .
- \end_layout
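-
- \begin_layout Standard
- To make the effect of bandwidth expansion concrete, assume again that 
- \begin_inset Formula $A(z)=1-\sum_{i=1}^{N}a_{i}z^{-i}$
- \end_inset
-
-  (an assumed convention, consistent with the prediction error equation).
- Substituting 
- \begin_inset Formula $z/\gamma$
- \end_inset
-
-  for 
- \begin_inset Formula $z$
- \end_inset
-
-  gives
- \begin_inset Formula \[
- A(z/\gamma)=1-\sum_{i=1}^{N}a_{i}\gamma^{i}z^{-i}\ ,\]
-
- \end_inset
-
- so each coefficient 
- \begin_inset Formula $a_{i}$
- \end_inset
-
-  is simply scaled by 
- \begin_inset Formula $\gamma^{i}$
- \end_inset
-
- .
- Under that assumption, the weighting filter of Eq.
-  
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "eq:gamma-weighting"
-
- \end_inset
-
-  can be written out as
- \begin_inset Formula \[
- W(z)=\frac{1-\sum_{i=1}^{N}a_{i}\gamma_{1}^{i}z^{-i}}{1-\sum_{i=1}^{N}a_{i}\gamma_{2}^{i}z^{-i}}\ .\]
-
- \end_inset
-
-
- \end_layout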
-
- \begin_layout Standard
- The weighting filter is applied to the error signal used to optimize the
- codebook search through analysis-by-synthesis (AbS).
- This results in a spectral shape of the noise that tends towards
- \begin_inset Formula $1/W(z)$
- \end_inset
-
- .
- While the simplicity of the model has been an important reason for the
- success of CELP, it remains that
- \begin_inset Formula $W(z)$
- \end_inset
-
- is a very rough approximation for the perceptually optimal noise weighting
- function.
- Figure
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:Standard-noise-shaping"
-
- \end_inset
-
- illustrates the noise shaping that results from Eq.
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "eq:gamma-weighting"
-
- \end_inset
-
- .
- Throughout this manual, we refer to 
- \begin_inset Formula $W(z)$
- \end_inset
-
- as the noise weighting filter and to
- \begin_inset Formula $1/W(z)$
- \end_inset
-
- as the noise shaping filter (or curve).
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float figure
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Graphics
- filename ref_shaping.eps
- width 45page%
- keepAspectRatio
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Standard noise shaping in CELP.
- Arbitrary y-axis offset.
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:Standard-noise-shaping"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- Analysis-by-Synthesis
- \end_layout
-
- \begin_layout Standard
- One of the main principles behind CELP is called Analysis-by-Synthesis (AbS),
- meaning that the encoding (analysis) is performed by perceptually optimising
- the decoded (synthesis) signal in a closed loop.
- In theory, the best CELP stream would be produced by trying all possible
- bit combinations and selecting the one that produces the best-sounding
- decoded signal.
- This is obviously not possible in practice for two reasons: the required
- complexity is beyond any currently available hardware and the
- \begin_inset Quotes eld
- \end_inset
-
- best sounding
- \begin_inset Quotes erd
- \end_inset
-
- selection criterion implies a human listener.
-
- \end_layout
-
- \begin_layout Standard
- In order to achieve real-time encoding using limited computing resources,
- the CELP optimisation is broken down into smaller, more manageable, sequential
- searches using the perceptual weighting function described earlier.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- The Speex Decoder Specification
- \end_layout
-
- \begin_layout Section
- Narrowband decoder
- \end_layout
-
- \begin_layout Standard
- <Insert decoder figure here>
- \end_layout
-
- \begin_layout Subsection
- Narrowband modes
- \end_layout
-
- \begin_layout Standard
- There are 7 different narrowband bit-rates defined for Speex, ranging from
- 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used
- for speech.
- The bit-allocation for each mode is detailed in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:bits-narrowband"
-
- \end_inset
-
- .
- Each frame starts with the mode ID encoded with 4 bits, which allows a range
- from 0 to 15, though only the values 0 to 8 listed in the table are used
- (the others are reserved).
- The parameters are listed in the table in the order they are packed in
- the bit-stream.
- All frame-based parameters are packed before sub-frame parameters.
- The parameters for a certain sub-frame are all packed before the following
- sub-frame is packed.
- The
- \begin_inset Quotes eld
- \end_inset
-
- OL
- \begin_inset Quotes erd
- \end_inset
-
- in the parameter description means that the parameter is an open loop estimatio
- n based on the whole frame.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float table
- placement h
- wide true
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Tabular
- <lyxtabular version="3" rows="12" columns="11">
- <features>
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Parameter
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Update rate
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 6
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Wideband bit
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Mode ID
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- LSP
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 18
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 18
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 18
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 18
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 30
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 30
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 30
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 18
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- OL pitch
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- OL pitch gain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- OL Exc gain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Fine pitch
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- sub-frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Pitch gain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- sub-frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Innovation gain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- sub-frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Innovation VQ
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- sub-frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 16
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 20
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 35
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 48
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 64
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 96
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Total
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 43
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 119
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 160
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 220
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 300
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 364
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 492
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 79
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Bit allocation for narrowband modes
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:bits-narrowband"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Subsection
- LSP decoding
- \end_layout
-
- \begin_layout Standard
- Depending on the mode, LSP parameters are encoded using either 18 bits or
- 30 bits.
- \end_layout
-
- \begin_layout Standard
- Interpolation
- \end_layout
-
- \begin_layout Standard
- Safe margin
- \end_layout
-
- \begin_layout Subsection
- Adaptive codebook
- \end_layout
-
- \begin_layout Standard
- For rates of 8 kbit/s and above, the pitch period is encoded for each subframe.
- The real period is
- \begin_inset Formula $T=p_{i}+17$
- \end_inset
-
- where
- \begin_inset Formula $p_{i}$
- \end_inset
-
- is a value encoded with 7 bits and 17 corresponds to the minimum pitch.
- The maximum period is 144.
- At 5.95 kbit/s (mode 2), the pitch period is similarly encoded, but only
- once for the frame.
- Each sub-frame then has a 2-bit offset that is added to the pitch value
- of the frame.
- In that case, the pitch for each sub-frame is equal to
- \begin_inset Formula $T-1+offset$
- \end_inset
-
- .
- For rates below 5.95 kbit/s, only the per-frame pitch is used and the pitch
- is constant for all sub-frames.
- \end_layout
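-
- \begin_layout Standard
- As a worked illustration of the mapping above (the numeric value of
- \begin_inset Formula $p_{i}$
- \end_inset
-
- below is chosen arbitrarily), a 7-bit index takes values from 0 to 127,
- which gives the stated pitch range:
- \begin_inset Formula \[
- T_{min}=0+17=17,\qquad T_{max}=127+17=144.\]
-
- \end_inset
-
- For instance,
- \begin_inset Formula $p_{i}=43$
- \end_inset
-
- gives
- \begin_inset Formula $T=60$
- \end_inset
-
- .
- At 5.95 kbit/s, and assuming the 2-bit offset is read as an unsigned value
- between 0 and 3, the same
- \begin_inset Formula $T$
- \end_inset
-
- would yield sub-frame pitch values
- \begin_inset Formula $T-1+offset\in\{59,60,61,62\}$
- \end_inset
-
- .
- \end_layout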
-
- \begin_layout Standard
- Speex uses a 3-tap predictor for rates of 5.95 kbit/s and above.
- The three gain values are obtained from a 5-bit or a 7-bit codebook, depending
- on the mode.
-
- \end_layout
-
- \begin_layout Subsection
- Innovation codebook
- \end_layout
-
- \begin_layout Standard
- Split codebook, size and entries depend on bit-rate
- \end_layout
-
- \begin_layout Standard
- A 5-bit gain is encoded on a per-frame basis
- \end_layout
-
- \begin_layout Standard
- Depending on the mode, higher resolution per sub-frame
- \end_layout
-
- \begin_layout Standard
- innovation sub-vectors concatenated, gain applied
- \end_layout
-
- \begin_layout Subsection
- Perceptual enhancement
- \end_layout
-
- \begin_layout Standard
- Optional, implementation-defined.
-
- \end_layout
-
- \begin_layout Subsection
- Bit-stream definition
- \end_layout
-
- \begin_layout Standard
- This section defines the bit-stream that is transmitted on the wire.
- One Speex packet consists of 1 frame header and 4 sub-frames:
- \end_layout
-
- \begin_layout Standard
- \begin_inset Tabular
- <lyxtabular version="3" rows="1" columns="5">
- <features>
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Frame Header
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Subframe 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Subframe 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Subframe 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Subframe 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The frame header is variable length, depending on decoding mode and submode.
- The narrowband frame header is defined as follows:
- \end_layout
-
- \begin_layout Standard
- \begin_inset Tabular
- <lyxtabular version="3" rows="1" columns="6">
- <features>
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- wb-bit
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- modeid
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- LSP
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- OL-pitch
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- OL-pitchgain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- OL-ExcGain
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- wb-bit: Wideband bit (1 bit) 0=narrowband, 1=wideband
- \end_layout
-
- \begin_layout Standard
- modeid: Mode identifier (4 bits)
- \end_layout
-
- \begin_layout Standard
- LSP: Line Spectral Pairs (0, 18 or 30 bits)
- \end_layout
-
- \begin_layout Standard
- OL-pitch: Open Loop Pitch (0 or 7 bits)
- \end_layout
-
- \begin_layout Standard
- OL-pitchgain: Open Loop Pitch Gain (0 or 4 bits)
- \end_layout
-
- \begin_layout Standard
- OL-ExcGain: Open Loop Excitation Gain (0 or 5 bits)
- \end_layout
-
- \begin_layout Standard
- ...
- \end_layout
-
- \begin_layout Standard
- Each subframe is defined as follows:
- \end_layout
-
- \begin_layout Standard
- \begin_inset Tabular
- <lyxtabular version="3" rows="1" columns="4">
- <features>
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <column alignment="center" valignment="top" width="0">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- FinePitch
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- PitchGain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- InnovationGain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Innovation VQ
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- FinePitch: (0 or 7 bits)
- \end_layout
-
- \begin_layout Standard
- PitchGain: (0, 5, or 7 bits)
- \end_layout
-
- \begin_layout Standard
- InnovationGain: (0, 1, or 3 bits)
- \end_layout
-
- \begin_layout Standard
- Innovation VQ: (0-96 bits)
- \end_layout
-
- \begin_layout Standard
- ...
- \end_layout
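-
- \begin_layout Standard
- As a consistency check, the field widths above can be compared with Table
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:bits-narrowband"
-
- \end_inset
-
- .
- Taking the mode whose total is 300 bits per frame, each sub-frame uses
- \begin_inset Formula \[
- 7+7+3+48=65\:\mathrm{bits},\qquad4\times65=260\:\mathrm{bits}.\]
-
- \end_inset
-
- The remaining
- \begin_inset Formula $300-260=40$
- \end_inset
-
- bits are the frame header.
- Assuming that this mode carries no open-loop pitch or pitch gain fields,
- the header widths listed above account for them exactly:
- \begin_inset Formula \[
- 1+4+30+5=40\:\mathrm{bits}\]
-
- \end_inset
-
- for wb-bit, modeid, LSP and OL-ExcGain respectively.
- \end_layout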
-
- \begin_layout Subsection
- Sample decoder
- \end_layout
-
- \begin_layout Standard
- This section contains sample source code showing how a basic Speex decoder
- can be implemented.
- The sample decoder supports narrowband submode 3 only and omits advanced
- features such as perceptual enhancement and VBR.
- \end_layout
-
- \begin_layout Standard
- ...
- \end_layout
-
- \begin_layout Subsection
- Lookup tables
- \end_layout
-
- \begin_layout Standard
- The Speex decoder includes a set of lookup tables and codebooks, which are
- used to convert between values of different domains.
- These include:
- \end_layout
-
- \begin_layout Standard
- - Excitation 10x16 (3200 bps)
- \end_layout
-
- \begin_layout Standard
- - Excitation 10x32 (4000 bps)
- \end_layout
-
- \begin_layout Standard
- - Excitation 20x32 (2000 bps)
- \end_layout
-
- \begin_layout Standard
- - Excitation 5x256 (12800 bps)
- \end_layout
-
- \begin_layout Standard
- - Excitation 5x64 (9600 bps)
- \end_layout
-
- \begin_layout Standard
- - Excitation 8x128 (7000 bps)
- \end_layout
-
- \begin_layout Standard
- - Codebook for 3-tap pitch prediction gain (Normal and Low Bitrate)
- \end_layout
-
- \begin_layout Standard
- - Codebook for LSPs in narrowband CELP mode
- \end_layout
-
- \begin_layout Standard
- ...
- \end_layout
-
- \begin_layout Standard
- The exact lookup tables are included here for reference.
- \end_layout
-
- \begin_layout Section
- Wideband embedded decoder
- \end_layout
-
- \begin_layout Standard
- QMF filter.
- Narrowband signal decoded using narrowband decoder
- \end_layout
-
- \begin_layout Standard
- For the high band, the decoder is similar to the narrowband decoder, with
- the main difference being that there is no adaptive codebook.
- \end_layout
-
- \begin_layout Standard
- Gain is per-subframe
- \end_layout
-
- \begin_layout Chapter
- Speex narrowband mode
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Speex-narrowband-mode"
-
- \end_inset
-
-
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- narrowband
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- This section looks at how Speex works for narrowband (
- \begin_inset Formula $8\:\mathrm{kHz}$
- \end_inset
-
- sampling rate) operation.
- The frame size for this mode is
- \begin_inset Formula $20\:\mathrm{ms}$
- \end_inset
-
- , corresponding to 160 samples.
- Each frame is also subdivided into 4 sub-frames of 40 samples each.
- \end_layout
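-
- \begin_layout Standard
- These figures follow directly from the sampling rate:
- \begin_inset Formula \[
- 8000\:\mathrm{Hz}\times0.020\:\mathrm{s}=160\:\mathrm{samples},\qquad\frac{160}{4}=40\:\mathrm{samples}=5\:\mathrm{ms}.\]
-
- \end_inset
-
- Since a frame lasts 20 ms, there are 50 frames per second, so a mode that
- uses
- \begin_inset Formula $B$
- \end_inset
-
- bits per frame (
- \begin_inset Formula $B$
- \end_inset
-
- is notation introduced here for illustration) has a bit-rate of
- \begin_inset Formula $50B$
- \end_inset
-
- bits per second.
- For example, the totals of 43 and 119 bits per frame in Table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:bits-narrowband"
-
- \end_inset
-
- correspond to
- \begin_inset Formula $50\times43=2150$
- \end_inset
-
- bps and
- \begin_inset Formula $50\times119=5950$
- \end_inset
-
- bps.
- \end_layout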
-
- \begin_layout Standard
- Many design decisions were based on the original goals and assumptions:
- \end_layout
-
- \begin_layout Itemize
- Minimizing the amount of information extracted from past frames (for robustness
- to packet loss)
- \end_layout
-
- \begin_layout Itemize
- Dynamically-selectable codebooks (LSP, pitch and innovation)
- \end_layout
-
- \begin_layout Itemize
- Sub-vector fixed (innovation) codebooks
- \end_layout
-
- \begin_layout Section
- Whole-Frame Analysis
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- linear prediction
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- In narrowband, Speex frames are 20 ms long (160 samples) and are subdivided
- into 4 sub-frames of 5 ms each (40 samples).
- For most narrowband bit-rates (8 kbps and above), the only parameters encoded
- at the frame level are the Line Spectral Pairs (LSP) and a global excitation
- gain
- \begin_inset Formula $g_{frame}$
- \end_inset
-
- , as shown in Fig.
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:Frame-open-loop-analysis"
-
- \end_inset
-
- .
- All other parameters are encoded at the sub-frame level.
- \end_layout
-
- \begin_layout Standard
- Linear prediction analysis is performed once per frame using an asymmetric
- Hamming window centered on the fourth sub-frame.
- Because linear prediction coefficients (LPC) are not robust to quantization,
- they are first converted to line spectral pairs (LSP)
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- line spectral pair
- \end_layout
-
- \end_inset
-
- .
- The LSPs are considered to be associated with the
- \begin_inset Formula $4^{th}$
- \end_inset
-
- sub-frame, and the LSPs associated with the first 3 sub-frames are linearly
- interpolated using the current and previous LSP coefficients.
- The LSP coefficients are then converted back to the LPC filter
- \begin_inset Formula $\hat{A}(z)$
- \end_inset
-
- .
- The non-quantized interpolated filter is denoted
- \begin_inset Formula $A(z)$
- \end_inset
-
- and can be used for the weighting filter
- \begin_inset Formula $W(z)$
- \end_inset
-
- because it does not need to be available to the decoder.
-
- \end_layout
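-
- \begin_layout Standard
- One way to write the linear interpolation described above is sketched below.
- The sub-frame index
- \begin_inset Formula $k$
- \end_inset
-
- , the names
- \begin_inset Formula $\mathrm{lsp}_{prev}$
- \end_inset
-
- and
- \begin_inset Formula $\mathrm{lsp}_{cur}$
- \end_inset
-
- , and the equally spaced weights are assumptions made for illustration; the
- text only states that the interpolation is linear:
- \begin_inset Formula \[
- \mathrm{lsp}_{k}=\left(1-\frac{k}{4}\right)\mathrm{lsp}_{prev}+\frac{k}{4}\,\mathrm{lsp}_{cur},\qquad k=1,\ldots,4.\]
-
- \end_inset
-
- With this choice,
- \begin_inset Formula $k=4$
- \end_inset
-
- returns the LSPs of the current frame unchanged, while
- \begin_inset Formula $k=1,2,3$
- \end_inset
-
- blend in the previous frame with weights 3/4, 1/2 and 1/4.
- \end_layout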
-
- \begin_layout Standard
- To make Speex more robust to packet loss, no prediction is applied on the
- LSP coefficients prior to quantization.
- The LSPs are encoded using vector quantization (VQ) with 30 bits for higher
- quality modes and 18 bits for lower quality.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float figure
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Graphics
- filename speex_analysis.eps
- width 35page%
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Frame open-loop analysis
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:Frame-open-loop-analysis"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- Sub-Frame Analysis-by-Synthesis
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float figure
- wide false
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Graphics
- filename speex_abs.eps
- lyxscale 75
- width 40page%
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Analysis-by-synthesis closed-loop optimization on a sub-frame.
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:Sub-frame-AbS"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- The analysis-by-synthesis (AbS) encoder loop is described in Fig.
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:Sub-frame-AbS"
-
- \end_inset
-
- .
- There are three main aspects where Speex significantly differs from most
- other CELP codecs.
- First, while most recent CELP codecs make use of fractional pitch estimation
- with a single gain, Speex uses an integer to encode the pitch period, but
- uses a 3-tap predictor (3 gains).
- The adaptive codebook contribution
- \begin_inset Formula $e_{a}[n]$
- \end_inset
-
- can thus be expressed as:
- \begin_inset Formula \begin{equation}
- e_{a}[n]=g_{0}e[n-T-1]+g_{1}e[n-T]+g_{2}e[n-T+1]\label{eq:adaptive-3tap}\end{equation}
-
- \end_inset
-
- where
- \begin_inset Formula $g_{0}$
- \end_inset
-
- ,
- \begin_inset Formula $g_{1}$
- \end_inset
-
- and
- \begin_inset Formula $g_{2}$
- \end_inset
-
- are the jointly quantized pitch gains and
- \begin_inset Formula $e[n]$
- \end_inset
-
- is the codec excitation memory.
- It is worth noting that when the pitch is smaller than the sub-frame size,
- we repeat the excitation at a period
- \begin_inset Formula $T$
- \end_inset
-
- .
- For example, when
- \begin_inset Formula $n-T+1\geq0$
- \end_inset
-
- , we use
- \begin_inset Formula $n-2T+1$
- \end_inset
-
- instead.
- In most modes, the pitch period is encoded with 7 bits in the
- \begin_inset Formula $\left[17,144\right]$
- \end_inset
-
- range and the
- \begin_inset Formula $g_{i}$
- \end_inset
-
- coefficients are vector-quantized using 7 bits at higher bit-rates (15
- kbps narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband
- and below).
- \end_layout
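-
- \begin_layout Standard
- The periodic extension can be illustrated with concrete numbers.
- This sketch assumes that
- \begin_inset Formula $n=0$
- \end_inset
-
- denotes the first sample of the current sub-frame (so that negative indices
- refer to past excitation) and that the rule is applied to each tap individually.
- With
- \begin_inset Formula $T=20$
- \end_inset
-
- and
- \begin_inset Formula $n=25$
- \end_inset
-
- , the three taps of Eq.
-
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "eq:adaptive-3tap"
-
- \end_inset
-
- would read
- \begin_inset Formula $e[4]$
- \end_inset
-
- ,
- \begin_inset Formula $e[5]$
- \end_inset
-
- and
- \begin_inset Formula $e[6]$
- \end_inset
-
- , none of which is available yet.
- Shifting each index back by one more period gives
- \begin_inset Formula \[
- e_{a}[25]=g_{0}e[-16]+g_{1}e[-15]+g_{2}e[-14].\]
-
- \end_inset
-
-
- \end_layout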
-
- \begin_layout Standard
- Many current CELP codecs use moving average (MA) prediction to encode the
- fixed codebook gain.
- This provides slightly better coding at the expense of introducing a dependency
- on previously encoded frames.
- A second difference is that Speex encodes the fixed codebook gain as the
- product of the global excitation gain
- \begin_inset Formula $g_{frame}$
- \end_inset
-
- with a sub-frame gain correction
- \begin_inset Formula $g_{subf}$
- \end_inset
-
- .
- This increases robustness to packet loss by eliminating the inter-frame
- dependency.
- The sub-frame gain correction is encoded before the fixed codebook is searched
- (not closed-loop optimized) and uses between 0 and 3 bits per sub-frame,
- depending on the bit-rate.
- \end_layout
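-
- \begin_layout Standard
- In symbols, the decomposition above can be sketched as follows, writing
- \begin_inset Formula $g_{fixed}$
- \end_inset
-
- for the resulting fixed codebook gain and
- \begin_inset Formula $c[n]$
- \end_inset
-
- for the codebook vector before scaling (both names are introduced here for
- illustration only):
- \begin_inset Formula \[
- g_{fixed}=g_{frame}\, g_{subf},\qquad e_{f}[n]=g_{fixed}\, c[n].\]
-
- \end_inset
-
- Both factors are transmitted in the current frame, so the decoder can
- rebuild
- \begin_inset Formula $g_{fixed}$
- \end_inset
-
- without reference to any previously decoded frame.
- \end_layout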
-
- \begin_layout Standard
- The third difference is that Speex uses sub-vector quantization of the
- innovation (fixed codebook) signal instead of an algebraic codebook.
- Each sub-frame is divided into sub-vectors of lengths ranging between 5
- and 20 samples.
- Each sub-vector is chosen from a bitrate-dependent codebook and all sub-vectors
- are concatenated to form a sub-frame.
- As an example, the 3.95 kbps mode uses a sub-vector size of 20 samples with
- 32 entries in the codebook (5 bits).
- This means that the innovation is encoded with 10 bits per sub-frame, or
- 2000 bps.
- On the other hand, the 18.2 kbps mode uses a sub-vector size of 5 samples
- with 256 entries in the codebook (8 bits), so the innovation uses 64 bits
- per sub-frame, or 12800 bps.
-
- \end_layout
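-
- \begin_layout Standard
- Both examples follow the same arithmetic.
- For a sub-vector length of
- \begin_inset Formula $L$
- \end_inset
-
- samples and a codebook of
- \begin_inset Formula $N$
- \end_inset
-
- entries (
- \begin_inset Formula $L$
- \end_inset
-
- ,
- \begin_inset Formula $N$
- \end_inset
-
- ,
- \begin_inset Formula $b_{subf}$
- \end_inset
-
- and
- \begin_inset Formula $R_{innov}$
- \end_inset
-
- are notation introduced here for illustration), a 40-sample sub-frame holds
-
- \begin_inset Formula $40/L$
- \end_inset
-
- sub-vectors of
- \begin_inset Formula $\log_{2}N$
- \end_inset
-
- bits each, and there are 200 sub-frames per second:
- \begin_inset Formula \[
- b_{subf}=\frac{40}{L}\log_{2}N,\qquad R_{innov}=200\, b_{subf}.\]
-
- \end_inset
-
- With
- \begin_inset Formula $L=20$
- \end_inset
-
- and
- \begin_inset Formula $N=32$
- \end_inset
-
- this gives
- \begin_inset Formula $2\times5=10$
- \end_inset
-
- bits and 2000 bps; with
- \begin_inset Formula $L=5$
- \end_inset
-
- and
- \begin_inset Formula $N=256$
- \end_inset
-
- it gives
- \begin_inset Formula $8\times8=64$
- \end_inset
-
- bits and 12800 bps.
- The same formula applied to
- \begin_inset Formula $L=8$
- \end_inset
-
- and
- \begin_inset Formula $N=128$
- \end_inset
-
- gives
- \begin_inset Formula $5\times7=35$
- \end_inset
-
- bits and 7000 bps.
- \end_layout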
-
- \begin_layout Section
- Bit-rates
- \end_layout
-
- \begin_layout Standard
- So far, no MOS (Mean Opinion Score
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- mean opinion score
- \end_layout
-
- \end_inset
-
- ) subjective evaluation has been performed for Speex.
- In order to give an idea of the quality achievable with it, table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:quality_vs_bps"
-
- \end_inset
-
- presents my own subjective opinion on it.
- Different people will perceive the quality differently, and the person who
- designed the codec often has a bias (one way or another) when it comes to
- subjective evaluation.
- Finally, for most codecs (including Speex), encoding quality sometimes
- varies depending on the input.
- Note that the complexity is only approximate (within 0.5 mflops and using
- the lowest complexity setting).
- Decoding requires approximately 0.5 mflops
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- complexity
- \end_layout
-
- \end_inset
-
- in most modes (1 mflops with perceptual enhancement).
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float table
- placement h
- wide true
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Tabular
- <lyxtabular version="3" rows="17" columns="5">
- <features>
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Mode
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Quality
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Bit-rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- bit-rate
- \end_layout
-
- \end_inset
-
- (bps)
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- mflops
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- complexity
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Quality/description
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 250
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- No transmission (DTX)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2,150
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 6
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Vocoder (mostly for comfort noise)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5,950
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 9
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Very noticeable artifacts/noise, good intelligibility
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3-4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8,000
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Artifacts/noise sometimes noticeable
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5-6
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 11,000
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 14
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Artifacts usually noticeable only with headphones
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7-8
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 15,000
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 11
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Need good headphones to tell the difference
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 6
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 9
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 18,200
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 17.5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Hard to tell the difference even with good headphones
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 24,600
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 14.5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Completely transparent for voice, good quality music
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3,950
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10.5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Very noticeable artifacts/noise, good intelligibility
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 9
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 11
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- reserved
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 13
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Application-defined, interpreted by callback or skipped
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 14
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Speex in-band signaling
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 15
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- -
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Terminator code
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Quality versus bit-rate
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:quality_vs_bps"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- Perceptual enhancement
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- perceptual enhancement
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
-
- \series bold
- This section is only valid for version 1.1.12 and earlier.
- It does not apply to version 1.2-beta1 and later, for which the new perceptual
- enhancement is not yet documented.
- \end_layout
-
- \begin_layout Standard
- This part of the codec applies only to the decoder and can be changed without
- affecting interoperability.
- For that reason, the implementation provided and described here should be
- considered only a reference implementation.
- The enhancement system is divided into two parts.
- First, the synthesis filter
- \begin_inset Formula $S(z)=1/A(z)$
- \end_inset
-
- is replaced by an enhanced filter:
- \begin_inset Formula \[
- S'(z)=\frac{A\left(z/a_{2}\right)A\left(z/a_{3}\right)}{A\left(z\right)A\left(z/a_{1}\right)}\]
-
- \end_inset
-
- where
- \begin_inset Formula $a_{1}$
- \end_inset
-
- and
- \begin_inset Formula $a_{2}$
- \end_inset
-
- depend on the mode in use and
- \begin_inset Formula $a_{3}=\frac{1}{r}\left(1-\frac{1-ra_{1}}{1-ra_{2}}\right)$
- \end_inset
-
- with
- \begin_inset Formula $r=0.9$
- \end_inset
-
- .
- The second part of the enhancement consists of using a comb filter to enhance
- the pitch in the excitation domain.
-
- \end_layout
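-
- \begin_layout Standard
- As an illustration, the definition of
- \begin_inset Formula $a_{3}$
- \end_inset
-
- can be rearranged as
- \begin_inset Formula \[
- \left(1-ra_{2}\right)\left(1-ra_{3}\right)=1-ra_{1}\]
-
- \end_inset
-
- so that, for a first-order
- \begin_inset Formula $A(z)=1-rz^{-1}$
- \end_inset
-
- , the enhanced filter has the same gain at
- \begin_inset Formula $z=1$
- \end_inset
-
- as the original synthesis filter.
- This reading is an interpretation of the formula, not a statement taken
- from the reference implementation.
- With the purely illustrative values
- \begin_inset Formula $a_{1}=0.7$
- \end_inset
-
- and
- \begin_inset Formula $a_{2}=0.5$
- \end_inset
-
- (assumed here, not taken from any particular mode) and
- \begin_inset Formula $r=0.9$
- \end_inset
-
- , the third coefficient evaluates to
- \begin_inset Formula \[
- a_{3}=\frac{1}{0.9}\left(1-\frac{1-0.63}{1-0.45}\right)=\frac{1}{0.9}\left(1-\frac{0.37}{0.55}\right)\approx0.364\]
-
- \end_inset
-
-
- \end_layout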
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Speex wideband mode (sub-band CELP)
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- wideband
- \end_layout
-
- \end_inset
-
-
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Speex-wideband-mode"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- For wideband, the Speex approach uses a
- \emph on
- q
- \emph default
- uadrature
- \emph on
- m
- \emph default
- irror
- \emph on
- f
- \emph default
- ilter
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- quadrature mirror filter
- \end_layout
-
- \end_inset
-
- (QMF) to split the band in two.
- The signal, sampled at 16 kHz, is thus divided into two signals sampled at
- 8 kHz each, one representing the low band (0-4 kHz), the other the high band
- (4-8 kHz).
- The low band is encoded with the narrowband mode described in section
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "sec:Speex-narrowband-mode"
-
- \end_inset
-
- in such a way that the resulting
- \begin_inset Quotes eld
- \end_inset
-
- embedded narrowband bit-stream
- \begin_inset Quotes erd
- \end_inset
-
- can also be decoded with the narrowband decoder.
- Since the low band encoding has already been described, only the high band
- encoding is described in this section.
- \end_layout
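-
- \begin_layout Standard
- In its textbook form (given here only as a sketch; the exact filters and
- delays used by the reference implementation are not specified in this section),
- a two-channel QMF analysis with a low-pass prototype
- \begin_inset Formula $h_{0}[k]$
- \end_inset
-
- produces the two half-rate signals as
- \begin_inset Formula \begin{eqnarray*}
- x_{L}[n] & = & \sum_{k}h_{0}[k]\, x[2n-k]\\
- x_{H}[n] & = & \sum_{k}h_{1}[k]\, x[2n-k],\qquad h_{1}[k]=(-1)^{k}h_{0}[k]\end{eqnarray*}
-
- \end_inset
-
- where
- \begin_inset Formula $x[n]$
- \end_inset
-
- is the 16 kHz input and the symbols
- \begin_inset Formula $x_{L}$
- \end_inset
-
- ,
- \begin_inset Formula $x_{H}$
- \end_inset
-
- and
- \begin_inset Formula $h_{1}$
- \end_inset
-
- are notation introduced for this illustration.
- Keeping only every second filtered sample is what halves the rate of each
- band from 16 kHz to 8 kHz.
- \end_layout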
-
- \begin_layout Section
- Linear Prediction
- \end_layout
-
- \begin_layout Standard
- The linear prediction part used for the high band is very similar to what
- is done for narrowband.
- The only difference is that only 12 bits are used to encode the high-band
- LSPs, using a multi-stage vector quantizer (MSVQ).
- The first stage quantizes the 10 coefficients with 6 bits, and the quantization
- error is then quantized with another 6 bits.
- \end_layout
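-
- \begin_layout Standard
- A two-stage quantizer of this kind can be sketched as follows, where
- \begin_inset Formula $\mathbf{x}$
- \end_inset
-
- is the vector of high-band LSPs and
- \begin_inset Formula $\mathbf{c}_{i}^{(1)}$
- \end_inset
-
- and
- \begin_inset Formula $\mathbf{c}_{j}^{(2)}$
- \end_inset
-
- are the entries of the two 64-entry codebooks.
- The notation and the plain squared-error criterion are assumptions made
- for illustration; the reference encoder may use a different distance measure.
- \begin_inset Formula \begin{eqnarray*}
- i^{*} & = & \arg\min_{0\leq i<64}\left\Vert \mathbf{x}-\mathbf{c}_{i}^{(1)}\right\Vert ^{2}\\
- j^{*} & = & \arg\min_{0\leq j<64}\left\Vert \mathbf{x}-\mathbf{c}_{i^{*}}^{(1)}-\mathbf{c}_{j}^{(2)}\right\Vert ^{2}\\
- \hat{\mathbf{x}} & = & \mathbf{c}_{i^{*}}^{(1)}+\mathbf{c}_{j^{*}}^{(2)}\end{eqnarray*}
-
- \end_inset
-
- Each index takes
- \begin_inset Formula $\log_{2}64=6$
- \end_inset
-
- bits, for a total of 12 bits.
- The two codebooks together hold
- \begin_inset Formula $64+64=128$
- \end_inset
-
- vectors, whereas a single-stage 12-bit quantizer would need
- \begin_inset Formula $2^{12}=4096$
- \end_inset
-
- .
- \end_layout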
-
- \begin_layout Section
- Pitch Prediction
- \end_layout
-
- \begin_layout Standard
- That part is easy: there is no pitch prediction for the high band.
- There are two reasons for that.
- First, there is usually little harmonic structure in this band (above 4
- kHz).
- Second, it would be very hard to implement since the QMF folds the 4-8
- kHz band into 4-0 kHz (reversing the frequency axis), which means that
- the harmonics are no longer located at multiples of the fundamental (pitch)
- frequency.
- \end_layout
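-
- \begin_layout Standard
- As an illustration of the second point, assume that the folding maps a
- frequency
- \begin_inset Formula $f$
- \end_inset
-
- of the 4-8 kHz band to
- \begin_inset Formula $f'=8000-f$
- \end_inset
-
- (in Hz), and take a purely illustrative fundamental of
- \begin_inset Formula $f_{0}=300$
- \end_inset
-
- Hz.
- The harmonics
- \begin_inset Formula $kf_{0}$
- \end_inset
-
- for
- \begin_inset Formula $k=14,15,16$
- \end_inset
-
- then end up at
- \begin_inset Formula \begin{eqnarray*}
- 8000-14\times300 & = & 3800\\
- 8000-15\times300 & = & 3500\\
- 8000-16\times300 & = & 3200\end{eqnarray*}
-
- \end_inset
-
- The folded components are still 300 Hz apart, but none of them is a multiple
- of 300 Hz, because 8000 is not a multiple of
- \begin_inset Formula $f_{0}$
- \end_inset
-
- .
- \end_layout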
-
- \begin_layout Section
- Excitation Quantization
- \end_layout
-
- \begin_layout Standard
- The high-band excitation is coded in the same way as for narrowband.
-
- \end_layout
-
- \begin_layout Section
- Bit allocation
- \end_layout
-
- \begin_layout Standard
- For the wideband mode, the entire narrowband frame is packed before the
- high-band is encoded.
- The narrowband part of the bit-stream is as defined in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:bits-narrowband"
-
- \end_inset
-
- .
- The high-band follows, as described in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:bits-wideband"
-
- \end_inset
-
- .
- For wideband, the mode ID is the same as the Speex quality setting and
- is defined in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "tab:wideband-quality"
-
- \end_inset
-
- .
- This also means that a wideband frame can be correctly decoded by a narrowband
- decoder.
- The only caveat is that if more than one frame is packed in the same packet,
- the decoder needs to skip the high-band parts in order to stay in sync with
- the bit-stream.
- \end_layout
-
- \begin_layout Standard
- \begin_inset Float table
- placement h
- wide true
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Tabular
- <lyxtabular version="3" rows="7" columns="7">
- <features>
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Parameter
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Update rate
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Wideband bit
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Mode ID
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- LSP
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Excitation gain
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- sub-frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Excitation VQ
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- sub-frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 20
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 40
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 80
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Total
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- frame
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 36
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 112
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 192
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 352
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Bit allocation for high-band in wideband mode
- \begin_inset CommandInset label
- LatexCommand label
- name "cap:bits-wideband"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
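-
- \begin_layout Standard
- The totals in table
- \begin_inset CommandInset ref
- LatexCommand ref
- reference "cap:bits-wideband"
-
- \end_inset
-
- are consistent with four sub-frames per frame: the per-frame fields are
- counted once and the per-sub-frame fields four times.
- Mode 0 carries only the wideband bit and the mode ID (4 bits); for the other
- modes the arithmetic works out as
- \begin_inset Formula \begin{eqnarray*}
- \textrm{mode 1:}\quad1+3+12+4\times(5+0) & = & 36\\
- \textrm{mode 2:}\quad1+3+12+4\times(4+20) & = & 112\\
- \textrm{mode 3:}\quad1+3+12+4\times(4+40) & = & 192\\
- \textrm{mode 4:}\quad1+3+12+4\times(4+80) & = & 352\end{eqnarray*}
-
- \end_inset
-
- Assuming 20 ms frames, that is 50 frames per second (an assumption that
- is not restated in this section), the high band would add
- \begin_inset Formula $4\times50=200$
- \end_inset
-
- ,
- \begin_inset Formula $36\times50=1800$
- \end_inset
-
- ,
- \begin_inset Formula $112\times50=5600$
- \end_inset
-
- ,
- \begin_inset Formula $192\times50=9600$
- \end_inset
-
- and
- \begin_inset Formula $352\times50=17600$
- \end_inset
-
- bps for modes 0 to 4 respectively, on top of the narrowband bit-rate.
- \end_layout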
-
- \begin_layout Standard
- \begin_inset Float table
- placement h
- wide true
- sideways false
- status open
-
- \begin_layout Plain Layout
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- begin{center}
- \end_layout
-
- \end_inset
-
-
- \begin_inset Tabular
- <lyxtabular version="3" rows="12" columns="3">
- <features>
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <column alignment="center" valignment="top" width="0pt">
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Mode/Quality
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Bit-rate
- \begin_inset Index
- status collapsed
-
- \begin_layout Plain Layout
- bit-rate
- \end_layout
-
- \end_inset
-
- (bps)
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Quality/description
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 0
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3,950
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Barely intelligible (mostly for comfort noise)
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 1
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5,750
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Very noticeable artifacts/noise, poor intelligibility
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 2
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7,750
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Very noticeable artifacts/noise, good intelligibility
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 3
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 9,800
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Artifacts/noise sometimes annoying
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 4
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 12,800
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Artifacts/noise usually noticeable
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 5
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 16,800
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Artifacts/noise sometimes noticeable
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 6
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 20,600
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Need good headphones to tell the difference
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 7
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 23,800
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Need good headphones to tell the difference
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 8
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 27,800
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Hard to tell the difference even with good headphones
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 9
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 34,200
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Hard to tell the difference even with good headphones
- \end_layout
-
- \end_inset
- </cell>
- </row>
- <row>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 10
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- 42,200
- \end_layout
-
- \end_inset
- </cell>
- <cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
- \begin_inset Text
-
- \begin_layout Plain Layout
- Completely transparent for voice, good quality music
- \end_layout
-
- \end_inset
- </cell>
- </row>
- </lyxtabular>
-
- \end_inset
-
-
- \begin_inset ERT
- status collapsed
-
- \begin_layout Plain Layout
-
-
- \backslash
- end{center}
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Plain Layout
- \begin_inset Caption
-
- \begin_layout Plain Layout
- Quality versus bit-rate for the wideband encoder
- \begin_inset CommandInset label
- LatexCommand label
- name "tab:wideband-quality"
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset ERT
- status open
-
- \begin_layout Plain Layout
-
-
- \backslash
- clearpage
- \end_layout
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- \start_of_appendix
- Sample code
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Sample-code"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- This appendix shows sample code for encoding and decoding speech using the
-  Speex API.
-  The two programs can be chained to encode and then decode a file by calling:
- \family typewriter
-
- \begin_inset Newline newline
- \end_inset
-
- % sampleenc in_file.sw | sampledec out_file.sw
- \family default
-
- \begin_inset Newline newline
- \end_inset
-
- where both files are raw (headerless) files with 16 bits per sample (in
-  the machine's native endianness).
- \end_layout
-
- \begin_layout Section
- sampleenc.c
- \end_layout
-
- \begin_layout Standard
- sampleenc reads a raw 16 bits/sample file, encodes it, and writes a Speex
-  stream to stdout.
- Note that the packing used is
- \series bold
- not
- \series default
- compatible with that of speexenc/speexdec.
- \end_layout
-
- \begin_layout Standard
- \begin_inset CommandInset include
- LatexCommand lstinputlisting
- filename "sampleenc.c"
- lstparams "caption={Source code for sampleenc},label={sampleenc-source-code},numbers=left,numberstyle={\\footnotesize}"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Section
- sampledec.c
- \end_layout
-
- \begin_layout Standard
- sampledec reads a Speex stream from stdin, decodes it, and writes the result
-  to a raw 16 bits/sample file.
- Note that the packing used is
- \series bold
- not
- \series default
- compatible with that of speexenc/speexdec.
- \end_layout
-
- \begin_layout Standard
- \begin_inset CommandInset include
- LatexCommand lstinputlisting
- filename "sampledec.c"
- lstparams "caption={Source code for sampledec},label={sampledec-source-code},numbers=left,numberstyle={\\footnotesize}"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Jitter Buffer for Speex
- \end_layout
-
- \begin_layout Standard
- \begin_inset CommandInset include
- LatexCommand lstinputlisting
- filename "../speexclient/speex_jitter_buffer.c"
- lstparams "caption={Example of using the jitter buffer for Speex packets},label={example-speex-jitter},numbers=left,numberstyle={\\footnotesize}"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- IETF RTP Profile
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:IETF-draft"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset CommandInset include
- LatexCommand verbatiminput
- filename "draft-ietf-avt-rtp-speex-05-tmp.txt"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- Speex License
- \begin_inset CommandInset label
- LatexCommand label
- name "sec:Speex-License"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset CommandInset include
- LatexCommand verbatiminput
- filename "../COPYING"
-
- \end_inset
-
-
- \end_layout
-
- \begin_layout Standard
- \begin_inset Newpage newpage
- \end_inset
-
-
- \end_layout
-
- \begin_layout Chapter
- GNU Free Documentation License
- \end_layout
-
- \begin_layout Standard
- Version 1.1, March 2000
- \end_layout
-
- \begin_layout Standard
- Copyright (C) 2000 Free Software Foundation, Inc.
-  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- \begin_inset Newline newline
- \end_inset
-
- Everyone is permitted to copy and distribute verbatim copies of this license
-  document, but changing it is not allowed.
-
- \end_layout
-
- \begin_layout Section*
- 0.
- PREAMBLE
- \end_layout
-
- \begin_layout Standard
- The purpose of this License is to make a manual, textbook, or other written
- document "free" in the sense of freedom: to assure everyone the effective
- freedom to copy and redistribute it, with or without modifying it, either
- commercially or noncommercially.
- Secondarily, this License preserves for the author and publisher a way
- to get credit for their work, while not being considered responsible for
- modifications made by others.
- \end_layout
-
- \begin_layout Standard
- This License is a kind of "copyleft", which means that derivative works
- of the document must themselves be free in the same sense.
- It complements the GNU General Public License, which is a copyleft license
- designed for free software.
- \end_layout
-
- \begin_layout Standard
- We have designed this License in order to use it for manuals for free software,
- because free software needs free documentation: a free program should come
- with manuals providing the same freedoms that the software does.
- But this License is not limited to software manuals; it can be used for
- any textual work, regardless of subject matter or whether it is published
- as a printed book.
- We recommend this License principally for works whose purpose is instruction
- or reference.
-
- \end_layout
-
- \begin_layout Section*
- 1.
- APPLICABILITY AND DEFINITIONS
- \end_layout
-
- \begin_layout Standard
- This License applies to any manual or other work that contains a notice
- placed by the copyright holder saying it can be distributed under the terms
- of this License.
- The "Document", below, refers to any such manual or work.
- Any member of the public is a licensee, and is addressed as "you".
- \end_layout
-
- \begin_layout Standard
- A "Modified Version" of the Document means any work containing the Document
- or a portion of it, either copied verbatim, or with modifications and/or
- translated into another language.
- \end_layout
-
- \begin_layout Standard
- A "Secondary Section" is a named appendix or a front-matter section of the
- Document that deals exclusively with the relationship of the publishers
- or authors of the Document to the Document's overall subject (or to related
- matters) and contains nothing that could fall directly within that overall
- subject.
- (For example, if the Document is in part a textbook of mathematics, a Secondary
- Section may not explain any mathematics.) The relationship could be a matter
- of historical connection with the subject or with related matters, or of
- legal, commercial, philosophical, ethical or political position regarding
- them.
- \end_layout
-
- \begin_layout Standard
- The "Invariant Sections" are certain Secondary Sections whose titles are
- designated, as being those of Invariant Sections, in the notice that says
- that the Document is released under this License.
- \end_layout
-
- \begin_layout Standard
- The "Cover Texts" are certain short passages of text that are listed, as
- Front-Cover Texts or Back-Cover Texts, in the notice that says that the
- Document is released under this License.
- \end_layout
-
- \begin_layout Standard
- A "Transparent" copy of the Document means a machine-readable copy, represented
- in a format whose specification is available to the general public, whose
- contents can be viewed and edited directly and straightforwardly with generic
- text editors or (for images composed of pixels) generic paint programs
- or (for drawings) some widely available drawing editor, and that is suitable
- for input to text formatters or for automatic translation to a variety
- of formats suitable for input to text formatters.
- A copy made in an otherwise Transparent file format whose markup has been
- designed to thwart or discourage subsequent modification by readers is
- not Transparent.
- A copy that is not "Transparent" is called "Opaque".
- \end_layout
-
- \begin_layout Standard
- Examples of suitable formats for Transparent copies include plain ASCII
- without markup, Texinfo input format, LaTeX input format, SGML or XML using
- a publicly available DTD, and standard-conforming simple HTML designed
- for human modification.
- Opaque formats include PostScript, PDF, proprietary formats that can be
- read and edited only by proprietary word processors, SGML or XML for which
- the DTD and/or processing tools are not generally available, and the machine-ge
- nerated HTML produced by some word processors for output purposes only.
- \end_layout
-
- \begin_layout Standard
- The "Title Page" means, for a printed book, the title page itself, plus
- such following pages as are needed to hold, legibly, the material this
- License requires to appear in the title page.
- For works in formats which do not have any title page as such, "Title Page"
- means the text near the most prominent appearance of the work's title,
- preceding the beginning of the body of the text.
- \end_layout
-
- \begin_layout Section*
- 2.
- VERBATIM COPYING
- \end_layout
-
- \begin_layout Standard
- You may copy and distribute the Document in any medium, either commercially
- or noncommercially, provided that this License, the copyright notices,
- and the license notice saying this License applies to the Document are
- reproduced in all copies, and that you add no other conditions whatsoever
- to those of this License.
- You may not use technical measures to obstruct or control the reading or
- further copying of the copies you make or distribute.
- However, you may accept compensation in exchange for copies.
- If you distribute a large enough number of copies you must also follow
- the conditions in section 3.
- \end_layout
-
- \begin_layout Standard
- You may also lend copies, under the same conditions stated above, and you
- may publicly display copies.
- \end_layout
-
- \begin_layout Section*
- 3.
- COPYING IN QUANTITY
- \end_layout
-
- \begin_layout Standard
- If you publish printed copies of the Document numbering more than 100, and
- the Document's license notice requires Cover Texts, you must enclose the
- copies in covers that carry, clearly and legibly, all these Cover Texts:
- Front-Cover Texts on the front cover, and Back-Cover Texts on the back
- cover.
- Both covers must also clearly and legibly identify you as the publisher
- of these copies.
- The front cover must present the full title with all words of the title
- equally prominent and visible.
- You may add other material on the covers in addition.
- Copying with changes limited to the covers, as long as they preserve the
- title of the Document and satisfy these conditions, can be treated as verbatim
- copying in other respects.
- \end_layout
-
- \begin_layout Standard
- If the required texts for either cover are too voluminous to fit legibly,
- you should put the first ones listed (as many as fit reasonably) on the
- actual cover, and continue the rest onto adjacent pages.
- \end_layout
-
- \begin_layout Standard
- If you publish or distribute Opaque copies of the Document numbering more
- than 100, you must either include a machine-readable Transparent copy along
- with each Opaque copy, or state in or with each Opaque copy a publicly-accessib
- le computer-network location containing a complete Transparent copy of the
- Document, free of added material, which the general network-using public
- has access to download anonymously at no charge using public-standard network
- protocols.
- If you use the latter option, you must take reasonably prudent steps, when
- you begin distribution of Opaque copies in quantity, to ensure that this
- Transparent copy will remain thus accessible at the stated location until
- at least one year after the last time you distribute an Opaque copy (directly
- or through your agents or retailers) of that edition to the public.
- \end_layout
-
- \begin_layout Standard
- It is requested, but not required, that you contact the authors of the Document
- well before redistributing any large number of copies, to give them a chance
- to provide you with an updated version of the Document.
-
- \end_layout
-
- \begin_layout Section*
- 4.
- MODIFICATIONS
- \end_layout
-
- \begin_layout Standard
- You may copy and distribute a Modified Version of the Document under the
- conditions of sections 2 and 3 above, provided that you release the Modified
- Version under precisely this License, with the Modified Version filling
- the role of the Document, thus licensing distribution and modification
- of the Modified Version to whoever possesses a copy of it.
- In addition, you must do these things in the Modified Version:
- \end_layout
-
- \begin_layout Itemize
- A.
- Use in the Title Page (and on the covers, if any) a title distinct from
- that of the Document, and from those of previous versions (which should,
- if there were any, be listed in the History section of the Document).
- You may use the same title as a previous version if the original publisher
- of that version gives permission.
- \end_layout
-
- \begin_layout Itemize
- B.
- List on the Title Page, as authors, one or more persons or entities responsible
- for authorship of the modifications in the Modified Version, together with
- at least five of the principal authors of the Document (all of its principal
- authors, if it has less than five).
- \end_layout
-
- \begin_layout Itemize
- C.
- State on the Title page the name of the publisher of the Modified Version,
- as the publisher.
- \end_layout
-
- \begin_layout Itemize
- D.
- Preserve all the copyright notices of the Document.
- \end_layout
-
- \begin_layout Itemize
- E.
- Add an appropriate copyright notice for your modifications adjacent to
- the other copyright notices.
- \end_layout
-
- \begin_layout Itemize
- F.
- Include, immediately after the copyright notices, a license notice giving
- the public permission to use the Modified Version under the terms of this
- License, in the form shown in the Addendum below.
- \end_layout
-
- \begin_layout Itemize
- G.
- Preserve in that license notice the full lists of Invariant Sections and
- required Cover Texts given in the Document's license notice.
- \end_layout
-
- \begin_layout Itemize
- H.
- Include an unaltered copy of this License.
- \end_layout
-
- \begin_layout Itemize
- I.
- Preserve the section entitled "History", and its title, and add to it an
- item stating at least the title, year, new authors, and publisher of the
- Modified Version as given on the Title Page.
- If there is no section entitled "History" in the Document, create one stating
- the title, year, authors, and publisher of the Document as given on its
- Title Page, then add an item describing the Modified Version as stated
- in the previous sentence.
- \end_layout
-
- \begin_layout Itemize
- J.
- Preserve the network location, if any, given in the Document for public
- access to a Transparent copy of the Document, and likewise the network
- locations given in the Document for previous versions it was based on.
- These may be placed in the "History" section.
- You may omit a network location for a work that was published at least
- four years before the Document itself, or if the original publisher of
- the version it refers to gives permission.
- \end_layout
-
- \begin_layout Itemize
- K.
- In any section entitled "Acknowledgements" or "Dedications", preserve the
- section's title, and preserve in the section all the substance and tone
- of each of the contributor acknowledgements and/or dedications given therein.
- \end_layout
-
- \begin_layout Itemize
- L.
- Preserve all the Invariant Sections of the Document, unaltered in their
- text and in their titles.
- Section numbers or the equivalent are not considered part of the section
- titles.
- \end_layout
-
- \begin_layout Itemize
- M.
- Delete any section entitled "Endorsements".
- Such a section may not be included in the Modified Version.
- \end_layout
-
- \begin_layout Itemize
- N.
- Do not retitle any existing section as "Endorsements" or to conflict in
- title with any Invariant Section.
-
- \end_layout
-
- \begin_layout Standard
- If the Modified Version includes new front-matter sections or appendices
- that qualify as Secondary Sections and contain no material copied from
- the Document, you may at your option designate some or all of these sections
- as invariant.
- To do this, add their titles to the list of Invariant Sections in the Modified
- Version's license notice.
- These titles must be distinct from any other section titles.
- \end_layout
-
- \begin_layout Standard
- You may add a section entitled "Endorsements", provided it contains nothing
- but endorsements of your Modified Version by various parties--for example,
- statements of peer review or that the text has been approved by an organization
- as the authoritative definition of a standard.
- \end_layout
-
- \begin_layout Standard
- You may add a passage of up to five words as a Front-Cover Text, and a passage
- of up to 25 words as a Back-Cover Text, to the end of the list of Cover
- Texts in the Modified Version.
- Only one passage of Front-Cover Text and one of Back-Cover Text may be
- added by (or through arrangements made by) any one entity.
- If the Document already includes a cover text for the same cover, previously
- added by you or by arrangement made by the same entity you are acting on
- behalf of, you may not add another; but you may replace the old one, on
- explicit permission from the previous publisher that added the old one.
- \end_layout
-
- \begin_layout Standard
- The author(s) and publisher(s) of the Document do not by this License give
- permission to use their names for publicity for or to assert or imply endorseme
- nt of any Modified Version.
-
- \end_layout
-
- \begin_layout Section*
- 5.
- COMBINING DOCUMENTS
- \end_layout
-
- \begin_layout Standard
- You may combine the Document with other documents released under this License,
- under the terms defined in section 4 above for modified versions, provided
- that you include in the combination all of the Invariant Sections of all
- of the original documents, unmodified, and list them all as Invariant Sections
- of your combined work in its license notice.
- \end_layout
-
- \begin_layout Standard
- The combined work need only contain one copy of this License, and multiple
- identical Invariant Sections may be replaced with a single copy.
- If there are multiple Invariant Sections with the same name but different
- contents, make the title of each such section unique by adding at the end
- of it, in parentheses, the name of the original author or publisher of
- that section if known, or else a unique number.
- Make the same adjustment to the section titles in the list of Invariant
- Sections in the license notice of the combined work.
- \end_layout
-
- \begin_layout Standard
- In the combination, you must combine any sections entitled "History" in
- the various original documents, forming one section entitled "History";
- likewise combine any sections entitled "Acknowledgements", and any sections
- entitled "Dedications".
- You must delete all sections entitled "Endorsements."
- \end_layout
-
- \begin_layout Section*
- 6.
- COLLECTIONS OF DOCUMENTS
- \end_layout
-
- \begin_layout Standard
- You may make a collection consisting of the Document and other documents
- released under this License, and replace the individual copies of this
- License in the various documents with a single copy that is included in
- the collection, provided that you follow the rules of this License for
- verbatim copying of each of the documents in all other respects.
- \end_layout
-
- \begin_layout Standard
- You may extract a single document from such a collection, and distribute
- it individually under this License, provided you insert a copy of this
- License into the extracted document, and follow this License in all other
- respects regarding verbatim copying of that document.
-
- \end_layout
-
- \begin_layout Section*
- 7.
- AGGREGATION WITH INDEPENDENT WORKS
- \end_layout
-
- \begin_layout Standard
- A compilation of the Document or its derivatives with other separate and
- independent documents or works, in or on a volume of a storage or distribution
- medium, does not as a whole count as a Modified Version of the Document,
- provided no compilation copyright is claimed for the compilation.
- Such a compilation is called an "aggregate", and this License does not
- apply to the other self-contained works thus compiled with the Document,
- on account of their being thus compiled, if they are not themselves derivative
- works of the Document.
- \end_layout
-
- \begin_layout Standard
- If the Cover Text requirement of section 3 is applicable to these copies
- of the Document, then if the Document is less than one quarter of the entire
- aggregate, the Document's Cover Texts may be placed on covers that surround
- only the Document within the aggregate.
- Otherwise they must appear on covers around the whole aggregate.
- \end_layout
-
- \begin_layout Section*
- 8.
- TRANSLATION
- \end_layout
-
- \begin_layout Standard
- Translation is considered a kind of modification, so you may distribute
- translations of the Document under the terms of section 4.
- Replacing Invariant Sections with translations requires special permission
- from their copyright holders, but you may include translations of some
- or all Invariant Sections in addition to the original versions of these
- Invariant Sections.
- You may include a translation of this License provided that you also include
- the original English version of this License.
- In case of a disagreement between the translation and the original English
- version of this License, the original English version will prevail.
- \end_layout
-
- \begin_layout Section*
- 9.
- TERMINATION
- \end_layout
-
- \begin_layout Standard
- You may not copy, modify, sublicense, or distribute the Document except
- as expressly provided for under this License.
- Any other attempt to copy, modify, sublicense or distribute the Document
- is void, and will automatically terminate your rights under this License.
- However, parties who have received copies, or rights, from you under this
- License will not have their licenses terminated so long as such parties
- remain in full compliance.
-
- \end_layout
-
- \begin_layout Section*
- 10.
- FUTURE REVISIONS OF THIS LICENSE
- \end_layout
-
- \begin_layout Standard
- The Free Software Foundation may publish new, revised versions of the GNU
- Free Documentation License from time to time.
- Such new versions will be similar in spirit to the present version, but
- may differ in detail to address new problems or concerns.
- See http://www.gnu.org/copyleft/.
- \end_layout
-
- \begin_layout Standard
- Each version of the License is given a distinguishing version number.
- If the Document specifies that a particular numbered version of this License
- "or any later version" applies to it, you have the option of following
- the terms and conditions either of that specified version or of any later
- version that has been published (not as a draft) by the Free Software Foundatio
- n.
- If the Document does not specify a version number of this License, you
- may choose any version ever published (not as a draft) by the Free Software
- Foundation.
- \end_layout
-
- \begin_layout Standard
- \begin_inset CommandInset index_print
- LatexCommand printindex
-
- \end_inset
-
-
- \end_layout
-
- \end_body
- \end_document
|