Sounds Logical

[M-Pack 1 index]

M-Pack 1 WAV file processing toolbox

Introduction

Supported WAV formats

PCM WAV file format essentials

Scaling laws

Dither

Sample rate conversion

Bugs and Limitations

References

Introduction


M-Pack 1 from Sounds Logical is a collection of MATLAB® (m- and mex-) functions designed to facilitate processing of PCM WAV files of arbitrary size (see index for full function listing). These significantly extend MATLAB's built-in WAV file handling functionality in the following key ways:

WAV file I/O

The M-Pack WAV file reading (wavin) and writing (wavout) functions incorporate the following extended features:

    1. support for arbitrary bit resolution between 2 and 32, including a distinction between valid bits (i.e. the actual bit resolution of the data) and container bits (the word size of the "container" used to store each sample)
    2. support for Microsoft's WAV_FORMAT_EXTENSIBLE file format for multichannel PCM WAV files, including the associated speaker layout feature and the explicit removal of ambiguities for bit resolutions in excess of 16
    3. a full range of dithering options when re-quantizing (e.g. additive, first-order high-pass noise-shaping, user-defined FIR noise-shaping)
    4. chunk-by-chunk WAV reading and writing, enabling easy construction of functions to perform "disk-based" editing of WAV files (e.g. splitting large WAV files into smaller ones, concatenating multiple WAV files together, etc.)

WAV file "disk-based" processing

The M-Pack includes a range of WAV file processing functions which take advantage of the chunk-by-chunk capabilities of the core WAV file I/O functions (wavin, wavout) to perform "disk-based" processing of arbitrarily large WAV files without incurring prohibitive RAM consumption in the MATLAB workspace. For example, the waveffect function presents a straightforward yet flexible interface for applying an arbitrary number of user-defined processing functions to the WAV file (again, in a chunk-by-chunk manner).
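
The chunk-by-chunk idea can be illustrated outside MATLAB with a minimal sketch. The following Python example is not M-Pack code and does not reproduce the waveffect interface; it assumes 16-bit PCM input, uses only the Python standard library, and the names process_in_chunks, effect and chunk_frames are hypothetical. It reads a fixed number of frames at a time, applies a user-supplied function to each chunk, and writes the result, so that memory use is bounded by the chunk size rather than by the file size.

```python
import array
import io
import sys
import wave


def process_in_chunks(src, dst, effect, chunk_frames=4096):
    """Apply effect(list of ints) -> list of ints to a 16-bit PCM WAV, chunk by chunk.

    src and dst may be file names or binary file-like objects.
    """
    with wave.open(src, "rb") as reader, wave.open(dst, "wb") as writer:
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        writer.setparams(reader.getparams())
        while True:
            raw = reader.readframes(chunk_frames)
            if not raw:
                break  # end of file reached
            samples = array.array("h")
            samples.frombytes(raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV data is little-endian
            processed = array.array("h", effect(list(samples)))
            if sys.byteorder == "big":
                processed.byteswap()
            writer.writeframes(processed.tobytes())


if __name__ == "__main__":
    # Build a small in-memory WAV, then halve its amplitude 64 frames at a time.
    source = io.BytesIO()
    with wave.open(source, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(array.array("h", [1000, -2000, 3000] * 100).tobytes())
    source.seek(0)
    target = io.BytesIO()
    process_in_chunks(source, target, lambda chunk: [s // 2 for s in chunk], 64)
    print(len(target.getvalue()), "bytes written")
```

For interleaved multichannel data each chunk holds the channels sample by sample, so an effect that treats channels differently would need to de-interleave the chunk first.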

WAV file sample rate conversion

The M-Pack includes the wavresample function, a pro-audio quality WAV file sample rate converter which supports all sample rates commonly used in digital audio. Again, this function operates chunk-by-chunk, enabling the conversion of arbitrarily long WAV files.

WAV file playback

The M-Pack includes the winplaywav function (and a convenient GUI wrap-around playwavgui function) which enables triggering of WAV file playback (via the Windows default soundcard) from within the MATLAB environment.

Refer to the individual function Help pages for a detailed description of each function in the M-Pack. The M-Pack also contains the following set of m-scripts containing example usage of the various functions:

    xmplaudiodither      demonstrates the use of audiodither
    xmplaudioresample    demonstrates the use of audioresample
    xmplwaveffect        demonstrates the use of waveffect
    xmplwavnormalize     demonstrates the use of wavnormalize
    xmplwavout           demonstrates the use of wavout (and wavin)
    xmplwavpeakfind      demonstrates the use of wavpeakfind
    xmplwavresample      demonstrates the use of wavresample

These make use of the example WAV files located in the WAVfiles subdirectory of the M-Pack root.

The M-Pack also includes the M-Utilities suite of utility functions (common to all M-Packs). Refer to the individual function Help pages for a detailed description of each utility function.

The following sections describe in detail the supported WAV formats and the techniques employed in the (re-)quantization, dithering, and sample rate conversion functions used throughout the toolbox.

Supported WAV formats


The generic WAV file structure (see refs [1], [2]) allows for a wide range of formats (identified by the FormatTag within the file header). Many of these formats are proprietary (to third parties), involving various compression and storage methods. However, the simplest and most popular format (and the one which is usually implied when using the term "WAV file") is the uncompressed PCM (Pulse Code Modulation) format, whereby the signal is quantized on to a number of discrete levels (set by the bit depth), linearly distributed over the amplitude range.

This M-Pack currently supports two such formats (i.e. uncompressed, linear, integer-storage PCM), denoted as follows:

    · WAV_FORMAT_PCM
    · WAV_FORMAT_EXTENSIBLE (with PCM)

WAV_FORMAT_PCM (from Microsoft) is the most widely used format, supported by many applications. WAV_FORMAT_EXTENSIBLE (also from Microsoft, see ref [3]) is a relatively new extension to this format, designed specifically for multichannel data with prescribed speaker arrangements (e.g. "5.1 Surround Sound" etc).

PCM WAV file format essentials


For a detailed overview of the WAV file format, refer to refs [1], [2]. You may also find it useful to inspect the source code for the wavin and wavout functions.

The following rules are specific to linear PCM WAV files, and are embodied in the wavin and wavout functions.

    1. each sample is always written "left justified" within its container (only relevant when ValidBits is less than ContainerBits). The values in the remaining "right-hand" bits are set to zero.

    2. for the case of an 8-bit container (i.e. ValidBits between 2 and 8 inclusive) each sample is written as an unsigned byte (8-bit uchar)

    3. for the case of a 24-bit container, each sample is written as a sequence of three bytes (two 8-bit unsigned chars followed by an 8-bit signed char). The bytes are ordered starting with the least significant, ending with the most significant. This is known as the "24-bit packed format".

    4. in the case of a 16-bit container, each sample is written as a signed short integer (16-bits).

    5. in the case of a 32-bit container, each sample is written as a signed long integer (32-bits).
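
The storage rules above can be illustrated with a short sketch. This Python example is not the wavout source; the function name pack_sample is hypothetical, and the order of operations for the 8-bit case (left-justify first, then add the 128 offset) is an assumption. It takes a signed integer sample of a given valid bit depth and returns the bytes that would be stored in the container, least significant byte first.

```python
def pack_sample(n, valid_bits, container_bits):
    """Return the container bytes for one signed integer sample.

    n              signed integer sample at the valid bit depth
    valid_bits     actual bit resolution of the data (2..container_bits)
    container_bits word size used for storage (8, 16, 24 or 32)
    """
    if container_bits not in (8, 16, 24, 32):
        raise ValueError("container size must be 8, 16, 24 or 32 bits")
    if not 2 <= valid_bits <= container_bits:
        raise ValueError("valid bits must lie between 2 and the container size")
    lo, hi = -(1 << (valid_bits - 1)), (1 << (valid_bits - 1)) - 1
    if not lo <= n <= hi:
        raise ValueError("sample outside the range of the valid bit depth")

    # Left-justify: the unused right-hand bits of the container are zero.
    shifted = n << (container_bits - valid_bits)

    if container_bits == 8:
        # 8-bit containers hold unsigned bytes (assumed: offset applied last).
        return bytes([shifted + 128])
    # 16-, 24- and 32-bit containers: signed, least significant byte first.
    return shifted.to_bytes(container_bits // 8, "little", signed=True)


if __name__ == "__main__":
    print(pack_sample(1, 20, 24).hex())    # 20 valid bits in a 24-bit container
    print(pack_sample(-1, 24, 24).hex())   # 24-bit packed format
    print(pack_sample(0, 8, 8).hex())      # unsigned 8-bit midpoint
```

Writing the three bytes of a 24-bit sample as one signed little-endian integer produces the same byte pattern as two unsigned low bytes followed by a signed high byte.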

Notes:

    1. In addition to its main purpose of incorporating the ability to prescribe the multichannel speaker allocation, the WAV_FORMAT_EXTENSIBLE format also clears up an ambiguity inherent to the WAV_FORMAT_PCM format when storing a sample with a given bit resolution in a "container" of a different (larger) size. For example, if the data is to be quantized using 20 bits, this will typically be stored in a 24-bit "container" (since WAV files always use multiples of 8 bits when storing the data). In the WAV_FORMAT_PCM format, there is an ambiguity associated with the BitsPerSample field. Some applications use the BitsPerSample field for its originally intended purpose, i.e. to represent the number of "valid" bits per sample, 20 in this example. The container size would then be inferred from the BlockAlign field. However, other applications assume that BitsPerSample represents the "container" size, i.e. 24 in this example. This ambiguity is cleared up in the WAV_FORMAT_EXTENSIBLE format, whereby a new field is introduced, namely the ValidBitsPerSample field, to represent the valid bits (and the BitsPerSample field is then exclusively reserved for the container size). Note that the ambiguity never arose when working exclusively with 8- or 16-bit data. It only emerged when the newer (larger) bit resolutions appeared.

    2. Even though the FormatTag is always the same for WAV_FORMAT_EXTENSIBLE, there is full flexibility in the actual data storage format. This is facilitated via the SubFormat field which enables any format (e.g. proprietary compressed formats) to be specified. Currently this toolbox supports only PCM data storage, specified by a SubFormat field value which corresponds to the Microsoft GUID for PCM data, i.e. KSDATAFORMAT_SUBTYPE_PCM (see ref [3]).

Scaling laws


Floating-point to integer conversion

When converting from floating-point to fixed-point representation (e.g. for transmission over a digital line, or for storage in a PCM format WAV file), the floating-point data is (re)-quantized on to the integer range of the given fixed-point representation. For linear PCM quantization, the original signal (considered continuous) is mapped on to a set of evenly-spaced discrete values, or "steps", as depicted in the following sketch:

The "height", Q, of each step (i.e. the quantization step size) is given by the total input range divided by the number of (vertical) steps:

    $$Q = \frac{R}{2^{N} - 1}$$

with the symbols defined as follows:

    Symbol   Definition
    N        quantization bit depth (i.e. "valid bits" used in the WAV file)
    Q        quantization step size, often called LSB since a change in input level of Q corresponds to a change in the LSB (Least Significant Bit) of binary coded output
    R        amplitude range of data sent to the quantizer

According to convention, digital audio samples are mapped over the floating-point range -1:+1. Hence, for a bit depth of N, the quantization step size becomes:

    $$Q = \frac{2}{2^{N} - 1}$$

For example, 16-bit quantization (of floating-point data over the range -1:+1), will have a quantization step size of 1/32767.5.

There are numerous approaches to performing the conversion from floating-point to integer (e.g. algorithms based on rounding, truncation, etc). However, the most natural method for creating a signed integer is to use the floor operation since this directly maps the data on to the required range:

    $$x_{\mathrm{int}} = \left\lfloor \frac{x}{Q} \right\rfloor$$

with the symbols defined as follows:

    Symbol   Definition
    x        (pre-quantized) floating-point sample (within nominal range -1:+1)
    x_int    signed integer representation of the (quantized) sample

whereby, for a bit depth of N, the nominal floating-point range maps on to the appropriate signed-integer range as follows:

    $$-1 \le x \le +1 \quad\longmapsto\quad -2^{N-1} \le x_{\mathrm{int}} \le 2^{N-1} - 1$$

For example, the range mapping for 16 bit quantization is given by:

    $$-1 \le x \le +1 \quad\longmapsto\quad -32768 \le x_{\mathrm{int}} \le +32767$$

The wavout function implements this floor mapping. By default, the scale option is set to `full' which implies that the data lies within the valid floating-point range (-1:+1). Any floating-point number outside of this range will be "clipped" (and, even worse, "folded over") when converting to fixed-point (and writing to the file). Hence it is important to ensure that the data is constrained to lie within -1:1 before calling the wavout function (with the default `full' setting for the scale option). The scale option can also be set to automatically normalize the input data to the required -1:+1 range before the conversion.
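
As an illustration of the floor mapping, the following Python sketch (not the wavout source; the name float_to_int is hypothetical) converts a floating-point sample in the nominal range -1:+1 to a signed integer of a given bit depth. It assumes the step size Q = 2/(2^N - 1) implied by the 16-bit example of 1/32767.5, and it saturates out-of-range input at the integer limits, which is a choice made for this sketch rather than a description of what wavout does.

```python
import math


def float_to_int(x, bits):
    """Quantize a float in -1:+1 to a signed integer using the floor operation."""
    inv_q = (2 ** bits - 1) / 2.0          # 1/Q for the range -1:+1
    lo = -(1 << (bits - 1))                # e.g. -32768 for 16 bits
    hi = (1 << (bits - 1)) - 1             # e.g. +32767 for 16 bits
    n = math.floor(x * inv_q)
    # Saturate rather than wrap if the input strays outside -1:+1 (assumption).
    return max(lo, min(hi, n))


if __name__ == "__main__":
    for value in (-1.0, -1e-9, 0.0, 0.5, 1.0, 2.0):
        print(value, float_to_int(value, 16))
```

Note that the floor operation sends any small negative input to -1 and any small positive input to 0, which is the midriser behaviour: there is no output level centred on zero.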

In the case where the container size is the same as the bit-depth, the signed integer values from the floor operation are written directly to the WAV file (except for the 8-bit case, where 128 is added to each value to produce an unsigned integer, as required by the standard; and the 24-bit case, where each value is converted to three unsigned 8-bit words). In the case where the container size is larger than the bit-depth, each integer sample, x_int, must be shifted to the left within its container, according to the left-justification WAV standard. This is achieved by multiplying each sample by the appropriate factor (for a bit depth of N):

    $$x_{c} = 2^{\,C - N}\, x_{\mathrm{int}}$$

with the symbols defined as follows:

    Symbol   Definition
    C        container size in bits (always an integer multiple of 8, equal to or greater than the quantization bit depth)
    x_c      left-justified signed integer representing the quantized sample within its container (where the container size is larger than the quantization bit depth)

The wavout function implements this left-justification whenever the user-specified container size is larger than the bit depth.

Integer to floating-point conversion

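The conversion going the other way, from signed integer to floating-point (nominal range -1:+1), is summarized as follows:

    $$y = Q \left( \frac{x_{c}}{2^{\,C - N}} + \frac{1}{2} \right)$$

where y represents the quantized data re-scaled as a floating-point number in the range -1:+1, x_c is the (left-justified) integer read from a container of C bits, and N is the bit depth.

The wavin function implements this mapping (with the default `full' setting for the scale option).

The conversion reduces to the following when the container size is equal to the bit depth:

    $$y = Q \left( x_{\mathrm{int}} + \frac{1}{2} \right)$$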

Simplified integer to floating-point conversion

The conversion presented above (as implemented in the wavin function) requires an addition as well as a multiplication, per sample. This is unavoidable if the signed integer range is to be mapped identically on to the -1:+1 output range. However, a common approximation (e.g. as implemented in the wavread function from The MathWorks) is to use only a multiplication:

    $$y = \frac{x_{\mathrm{int}}}{2^{N-1}}$$

where the floating-point range is slightly reduced, as follows:

    $$-1 \le y \le 1 - 2^{-(N-1)}$$
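
The difference between the two integer to floating-point conversions can be seen numerically. The following Python sketch is illustrative only: the function names are hypothetical, the container size is taken to equal the bit depth, and the multiplication-only form assumes a scale factor of 2^(N-1). The first function adds half a step before scaling, so the integer limits map exactly on to -1 and +1; the second uses a single multiplication and falls one step short of +1.

```python
def int_to_float_full(n, bits):
    """Map a signed integer on to -1:+1 exactly (one addition, one multiplication)."""
    q = 2.0 / (2 ** bits - 1)          # quantization step size for the range -1:+1
    return (n + 0.5) * q


def int_to_float_simple(n, bits):
    """Multiplication-only approximation; the top of the range falls short of +1."""
    return n / float(2 ** (bits - 1))


if __name__ == "__main__":
    lo, hi = -32768, 32767
    print("full  :", int_to_float_full(lo, 16), int_to_float_full(hi, 16))
    print("simple:", int_to_float_simple(lo, 16), int_to_float_simple(hi, 16))
```

With the full mapping the integer 0 is returned as Q/2 rather than exactly zero, which is the counterpart of the floor operation used in the forward conversion.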

Dither


In any operation where the bit-depth of the quantization is reduced (e.g. when re-saving a 24-bit WAV file as a 16-bit WAV file), then it is generally recommended to apply dither to reduce the audible non-linear amplitude distortion caused by the re-quantization. In essence, the application of dither amounts to adding low-level random noise to the signal. By selecting the noise amplitude comparable to the quantization step size, the effect of the dither is to linearize the input-output characteristic of the quantizer, thereby increasing the effective resolution, albeit with the addition of audible noise. The perceived additional noise can be minimized by use of an appropriately-designed noise-shaping filter incorporated within the dithering process.

Basic dither relations

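The input-output relationship for the quantizer, with input x, output y and step size Q, can be expressed as follows:

    $$y = Q \left( \left\lfloor \frac{x}{Q} \right\rfloor + \frac{1}{2} \right)$$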

(This is a combination of the floating-point-to-integer-to-floating-point conversions presented earlier.) The following figure illustrates the mapping for an 8-bit quantizer (the fact that the midpoint of the mapping is vertical rather than horizontal gives it the name "midriser"):

As evident from the plot, the floor function renders this relationship nonlinear, leading to audible distortion, particularly for signals of sufficiently low level to be comparable with the quantizer step size. The purpose of the dither is to reduce the effects of this nonlinearity. Note that although the quantizer performs a nonlinear deterministic action on the input, a convenient simplification when assessing the noise properties associated with the quantizer is to assume that, to first order, it has the effect of simply adding a random noise:

    $$y = x + q$$

where the output error, q, is considered to be uniformly-distributed over the range [-Q/2 : +Q/2] and thereby has a mean-square value of:

    $$\overline{q^{2}} = \frac{Q^{2}}{12}$$

Additive dither

The simplest dithering scheme, as sketched below, is to add a random noise, d, to the signal before entering the quantizer. To be effective, the dither must be statistically independent from the signal.


The most basic dither is typically generated from a pseudo-random sequence, uniformly distributed over the range [-Q/2 : +Q/2] i.e. with a peak-to-peak amplitude equal to Q (or to 1 LSB). This type of dither is usually called "rectangular" owing to its uniform probability distribution function (pdf).

To first order, the output error of the rectangular-dithered quantizer can be considered as a stochastic noise with a mean-square value of:

    $$\overline{\epsilon^{2}} = \frac{Q^{2}}{12} + \frac{Q^{2}}{12} = \frac{Q^{2}}{6}$$

due to the combination of the intrinsic quantizer error plus the dither.

The wavout (and the audiodither) function includes the option to apply this basic additive rectangular dither by setting the Bits.DitherMethod input argument field equal to 1. The dither amplitude can be arbitrarily adjusted via the Bits.DitherGain field (in units of LSB, default value of 1).

Triangular "highpass" additive dither

Another commonly used dither signal is that with a triangular pdf and a peak-to-peak amplitude of 2 LSB. Its use is motivated by the fact that it is theoretically optimal, and, moreover is simple to generate in the digital domain by summing (or differencing) two rectangular dither signals (each with a peak-to-peak amplitude of 1 LSB). In fact, the preferred method in audio applications is to create the triangular dither sequence by differencing successive values of a rectangular dither sequence. This results in an automatic highpass filtering of the dither signal, which, depending on the sample rate, can result in a reduction in the perceived additive noise without affecting the underlying performance of the dither on the quantizer.

To first order, the output error of the triangular-dithered quantizer can be considered as a stochastic noise with a mean-square value of:

    $$\overline{\epsilon^{2}} = \frac{Q^{2}}{12} + \frac{Q^{2}}{6} = \frac{Q^{2}}{4}$$

due to the combination of the intrinsic quantizer error plus the dither.

The wavout (and the audiodither) function includes the option to apply this highpass triangular additive dither by setting the Bits.DitherMethod input argument field equal to 2. The dither amplitude can be arbitrarily adjusted via the Bits.DitherGain field (in units of 2 LSB).
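
The two additive dither signals can be sketched as follows. This Python example is illustrative and is not taken from wavout or audiodither; the function names and the seed argument are hypothetical. Rectangular dither is drawn uniformly over [-Q/2 : +Q/2]; highpass triangular dither is formed by differencing successive values of a single rectangular sequence, giving a peak-to-peak amplitude of 2 LSB. The measured mean-square values should come out close to Q^2/12 and Q^2/6 respectively, the standard results for these two distributions.

```python
import random


def rectangular_dither(n, q, gain=1.0, seed=0):
    """Uniform dither with peak-to-peak amplitude gain * q (1 LSB for gain = 1)."""
    rng = random.Random(seed)
    return [gain * q * (rng.random() - 0.5) for _ in range(n)]


def highpass_triangular_dither(n, q, gain=1.0, seed=0):
    """Triangular dither (2 LSB peak-to-peak for gain = 1) formed by differencing
    successive values of one rectangular sequence, which highpass-filters it."""
    rng = random.Random(seed)
    previous = rng.random()
    out = []
    for _ in range(n):
        current = rng.random()
        out.append(gain * q * (current - previous))
        previous = current
    return out


def mean_square(values):
    return sum(v * v for v in values) / len(values)


if __name__ == "__main__":
    q = 2.0 / (2 ** 16 - 1)                       # 16-bit step size over -1:+1
    rect = rectangular_dither(20000, q)
    tri = highpass_triangular_dither(20000, q)
    print("rectangular:", mean_square(rect) / (q * q / 12.0))   # close to 1
    print("triangular :", mean_square(tri) / (q * q / 6.0))     # close to 1
```

Because each triangular sample shares one random value with its neighbour, successive dither values are negatively correlated, which is what tilts the dither spectrum towards high frequencies.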

Dither with noise-shaping

The perceived noise due to the dither can be reduced by employing an error-feedback loop around the quantizer, with an appropriately designed "noise-shaping" filter in the feedback path. The key to the technique is to take psychoacoustical advantage of the human hearing curve by designing the noise-shaping filter to be, in effect, the "inverse" of this curve, thereby "moving" the noise into less audible regions of the spectrum.

The general structure of the noise-shaping quantizer is sketched below where H(z) represents the noise-shaping filter (adapted from ref [6] ):

Note that the feedback path incorporates a single unit delay, irrespective of the filter design. This is to eliminate the possibility of an algebraic loop (which would render the network non-computable).

Assuming triangular (2 LSB) dither, then, to first order, the output error of the quantizer with noise-shaping feedback can be considered as a stochastic noise with a mean-square value of (adapted from ref [6]):

    $$\overline{\epsilon^{2}}(f) = \frac{Q^{2}}{4} \left| 1 - z^{-1} H(z) \right|^{2}$$

where the quantity

    $$\left| 1 - z^{-1} H(z) \right|^{2}$$

represents the frequency-dependent noise gain factor due to the feedback noise-shaping filter. The other symbols are defined as follows:

    Symbol   Definition
    f        frequency (in hertz)
    f_s      sample rate (in hertz)
    z        the complex variable, z = e^{j 2 \pi f / f_s}

Again, the total residual noise is due to the combination of the intrinsic quantizer error plus the dither, but this time scaled by the feedback filter response. Therefore the total noise depends on the design of the filter.

Recalling that the central purpose of the noise-shaping filter is to reduce the perceived noise, then the quantity of primary interest is the weighted noise power which is computed by taking into account the human hearing threshold curve, denoted W(f), i.e:

    $$\frac{2}{f_s} \int_{0}^{f_s/2} W(f)\, \frac{Q^{2}}{4} \left| 1 - z^{-1} H(z) \right|^{2} df$$

The goal of the design of the noise-shaping filter is to find a filter H(z) which minimizes the above integral. Note that generally speaking, any practical filter which lowers the weighted noise, will tend to increase the unweighted noise, so there is usually a tradeoff to be performed when choosing the filter.

Reference [6] presents a range of filters, both FIR (non-recursive) and IIR (recursive), designed to minimize the weighted noise (based on their modified E-weighting curves to represent the human audibility function).

The FIR (non-recursive) filters from ref [6] have been implemented here (since these were found to yield flatter noise spectra than the IIR filters). Specifically, the wavout (and the audiodither) function includes the feedback noise-shaping algorithm for two classes of filter design:

    1. Simple delay (i.e. H(z)=1). Selected by setting the Bits.DitherMethod input argument field equal to 3.

    2. Arbitrary FIR filter. Selected by setting the Bits.DitherMethod input argument field equal to 4. Any FIR filter may be specified via the Bits.NoiseShapeFIR field. The default filter is the following five-coefficient FIR filter (specified in ref [6] ): [2.033 -2.165 1.959 -1.590 0.6149]

In both cases, highpass triangular additive dither is used. The dither amplitude can be arbitrarily adjusted via the Bits.DitherGain field (in units of 2 LSB). The gain of the feedback loop can be arbitrarily adjusted via the Bits.NoiseShapeGain field (default value of 1).
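
The effect of the two filter choices on the noise spectrum can be explored numerically. The following Python sketch is an illustration, not toolbox code: it assumes that, with the single unit delay in the feedback path, the noise gain factor takes the form |1 - z^-1 H(z)|^2 with z = exp(j 2 pi f / fs), and the function name noise_gain is hypothetical. The coefficients are the default five-coefficient FIR filter quoted above.

```python
import cmath
import math

# Default five-coefficient FIR noise-shaping filter quoted in the text (ref [6]).
DEFAULT_FIR = [2.033, -2.165, 1.959, -1.590, 0.6149]


def noise_gain(fir, f, fs):
    """Assumed noise gain |1 - z^-1 H(z)|^2 at frequency f (Hz) for sample rate fs."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    h = sum(c * z_inv ** i for i, c in enumerate(fir))   # H(z) = sum of c_i z^-i
    return abs(1 - z_inv * h) ** 2


if __name__ == "__main__":
    fs = 44100.0
    for f in (0.0, 1000.0, 4000.0, 12000.0, fs / 2):
        simple = noise_gain([1.0], f, fs)        # simple delay, H(z) = 1
        shaped = noise_gain(DEFAULT_FIR, f, fs)
        print("%8.0f Hz  simple %8.4f  five-coefficient %8.4f" % (f, simple, shaped))
```

Under this assumption the simple delay gives zero noise gain at DC and a gain of 4 (about +6 dB) at half the sample rate, i.e. a first-order highpass characteristic.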

The actual MATLAB implementation of the noise-shaper can be read from the source m-code in the wavout (and the audiodither) function. However, for sake of clarification, the core of the algorithm can be summarized as follows:

    1. Initialize the feedback variable.

    2. For the current time step, k, do the following:

       a. Add dither to the current sample. The dither consists of the additive noise term minus the noise-shaping filtered feedback term. Note: evaluation of the feedback term involves the computation of a digital filter acting on the historical "error" stream (e).

       b. Convert to integer (and write to WAV file).

       c. Re-scale to a floating-point number within -1:+1.

       d. Use this to generate the feedback term for the next time step. Note: this uses the result of the filter computation from above (no need to compute it again).

    3. Advance to the next time step.
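
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation found in wavout or audiodither: the function name shaped_quantize and its arguments are hypothetical, the error fed back is assumed to be the re-scaled output minus the quantizer input before the dither is added, highpass triangular dither is generated by differencing a uniform sequence, and out-of-range values are simply saturated.

```python
import math
import random
from collections import deque


def shaped_quantize(x, bits, fir=(1.0,), dither_gain=1.0, seed=0):
    """Quantize floats in -1:+1 to signed integers with error-feedback noise shaping."""
    rng = random.Random(seed)
    inv_q = (2 ** bits - 1) / 2.0                 # 1/Q for the range -1:+1
    q = 1.0 / inv_q
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Error history e[k-1], e[k-2], ... initialized to zero (feedback variable).
    history = deque([0.0] * len(fir), maxlen=len(fir))
    previous = rng.random()
    out = []
    for sample in x:
        current = rng.random()
        dither = dither_gain * q * (current - previous)   # highpass triangular
        previous = current
        feedback = sum(h * e for h, e in zip(fir, history))
        shaped = sample - feedback                 # input minus filtered feedback
        dithered = min(1.0, max(-1.0, shaped + dither))   # post-dither clipping
        n = min(hi, max(lo, math.floor(dithered * inv_q)))  # convert to integer
        y = (n + 0.5) * q                          # re-scale to -1:+1
        history.appendleft(y - shaped)             # error for the next time step
        out.append(n)
    return out


if __name__ == "__main__":
    q = 2.0 / (2 ** 8 - 1)
    ints = shaped_quantize([0.25 * q] * 4000, 8)   # constant input below 1 LSB
    mean_out = sum((n + 0.5) * q for n in ints) / len(ints)
    print("input %.6f  mean output %.6f" % (0.25 * q, mean_out))
```

With the simple-delay filter the long-term average of the output tracks a constant input even when that input is a fraction of one LSB, because the shaped error has no DC component.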

Note that in all cases where dither is invoked, the wavout (and the audiodither) function incorporates post-dither clipping to ensure that the -1:+1 floating-point range is not inadvertently exceeded due to the addition of the dither.

Refer to the wavout usage examples (contained in the m-script file entitled xmplwavout.m) for demonstrations of the various types of dither in a practical audio application.

Refer to the audiodither usage examples (contained in the m-script file entitled xmplaudiodither.m) for a graphical illustration of the various types of dither.

Sample rate conversion


The wavresample function enables the sample rate of the WAV file to be converted between all sample rates commonly employed in digital audio. The sample rates currently supported are as follows:

Sample rates supported by the wavresample function

    8000 Hz
    11025 Hz
    16000 Hz
    22050 Hz
    32000 Hz
    44100 Hz
    48000 Hz
    96000 Hz
    192000 Hz

The converters are implemented using "hard-wired" FIR filters to provide the low-pass anti-aliasing (for downsampling) and anti-imaging (for upsampling) protection. The filters for all converters have been designed according to the following specification:

Design specifications for FIR filters used in all sample-rate converters

    Lowpass edge frequency    Nyquist frequency associated with the given conversion, i.e. half the output sample rate for downsampling, and half the input sample rate for upsampling
    Stopband attenuation      >= 120 dB
    Passband attenuation      <= 0.01 dB

Moreover, for computational efficiency, the filters are implemented in multi-stage polyphase form, whereby a given conversion ratio is achieved via a cascade of converters, within each of which the filtering is always carried out at the lower sample rate. The composite (overall) effect of the multiple stages achieves the desired conversion ratio according to the overall lowpass specification in the previous table. The detailed breakdown of the multi-stage design for each converter is summarized as follows:

Input (Hz) | Output (Hz) | Multi-stage conversion factors | FIR filter lengths (per stage)
8000 | 11025 | (2/1)*(3/1)*(3/1)*(7/1)*(7/640)=882/640 | 475, 33, 19, 43, 41
8000 | 16000 | 2/1 | 431
8000 | 22050 | (2/1)*(3/1)*(3/1)*(7/1)*(7/320)=882/320 | 475, 33, 19, 43, 41
8000 | 32000 | (2/1)*(2/1)=4/1 | 449, 19
8000 | 44100 | (2/1)*(3/1)*(3/1)*(7/1)*(7/160)=882/160 | 475, 33, 19, 43, 41
8000 | 48000 | (2/1)*(3/1)=6/1 | 449, 31
8000 | 96000 | (2/1)*(6/1)=12/1 | 449, 65
8000 | 192000 | (2/1)*(3/1)*(4/1)=24/1 | 461, 33, 25
11025 | 8000 | (640/7)*(1/7)*(1/3)*(1/3)*(1/2)=640/882 | 41, 43, 19, 33, 475
11025 | 16000 | (2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/441)=640/441 | 479, 21, 13, 9, 31, 47
11025 | 22050 | 2/1 | 431
11025 | 32000 | (2/1)*(2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/441)=1280/441 | 483, 21, 13, 9, 9, 29, 47
11025 | 44100 | (2/1)*(2/1)=4/1 | 449, 19
11025 | 48000 | (2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/147)=640/147 | 479, 21, 13, 9, 31, 47
11025 | 96000 | (2/1)*(2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/147)=1280/147 | 483, 21, 13, 9, 9, 29, 47
11025 | 192000 | (2/1)*(2/1)*(2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/147)=2560/147 | 487, 21, 13, 9, 9, 7, 29, 49
16000 | 8000 | 1/2 | 431
16000 | 11025 | (441/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)=441/640 | 47, 31, 9, 13, 21, 479
16000 | 22050 | (2/1)*(3/1)*(3/1)*(7/1)*(7/640)=882/640 | 475, 33, 19, 43, 41
16000 | 32000 | 2/1 | 431
16000 | 44100 | (2/1)*(3/1)*(3/1)*(7/1)*(7/320)=882/320 | 475, 33, 19, 43, 41
16000 | 48000 | (2/1)*(3/2)=6/2 | 449, 31
16000 | 96000 | (2/1)*(3/1)=6/1 | 449, 31
16000 | 192000 | (2/1)*(6/1)=12/1 | 449, 65
22050 | 8000 | (320/7)*(1/7)*(1/3)*(1/3)*(1/2)=320/882 | 41, 43, 19, 33, 475
22050 | 11025 | 1/2 | 431
22050 | 16000 | (640/7)*(1/7)*(1/3)*(1/3)*(1/2)=640/882 | 41, 43, 19, 33, 475
22050 | 32000 | (2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/441)=640/441 | 479, 21, 13, 9, 31, 47
22050 | 44100 | 2/1 | 431
22050 | 48000 | (2/1)*(2/1)*(2/1)*(5/1)*(8/147)=320/147 | 475, 21, 13, 33, 49
22050 | 96000 | (2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/147)=640/147 | 479, 21, 13, 9, 31, 47
22050 | 192000 | (2/1)*(2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/147)=1280/147 | 483, 21, 13, 9, 9, 29, 47
32000 | 8000 | (1/2)*(1/2)=1/4 | 19, 449
32000 | 11025 | (441/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)*(1/2)=441/1280 | 47, 29, 9, 9, 13, 21, 483
32000 | 16000 | 1/2 | 431
32000 | 22050 | (441/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)=441/640 | 47, 31, 9, 13, 21, 479
32000 | 44100 | (2/1)*(3/1)*(3/1)*(7/1)*(7/640)=882/640 | 475, 33, 19, 43, 41
32000 | 48000 | (2/1)*(3/4)=6/4 | 449, 31
32000 | 96000 | (2/1)*(3/2)=6/2 | 449, 31
32000 | 192000 | (2/1)*(3/1)=6/1 | 449, 31
44100 | 8000 | (160/7)*(1/7)*(1/3)*(1/3)*(1/2)=160/882 | 41, 43, 19, 33, 475
44100 | 11025 | (1/2)*(1/2)=1/4 | 19, 449
44100 | 16000 | (320/7)*(1/7)*(1/3)*(1/3)*(1/2)=320/882 | 41, 43, 19, 33, 475
44100 | 22050 | 1/2 | 431
44100 | 32000 | (640/7)*(1/7)*(1/3)*(1/3)*(1/2)=640/882 | 41, 43, 19, 33, 475
44100 | 48000 | (2/1)*(2/1)*(5/1)*(8/147)=160/147 | 469, 21, 37, 49
44100 | 96000 | (2/1)*(2/1)*(2/1)*(5/1)*(8/147)=320/147 | 475, 21, 13, 33, 49
44100 | 192000 | (2/1)*(2/1)*(2/1)*(2/1)*(5/1)*(8/147)=640/147 | 479, 21, 13, 9, 31, 47
48000 | 8000 | (1/3)*(1/2)=1/6 | 31, 449
48000 | 11025 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)=147/640 | 47, 31, 9, 13, 21, 479
48000 | 16000 | (2/3)*(1/2)=2/6 | 31, 449
48000 | 22050 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)=147/320 | 49, 33, 13, 21, 475
48000 | 32000 | (4/3)*(1/2)=4/6 | 31, 449
48000 | 44100 | (147/8)*(1/5)*(1/2)*(1/2)=147/160 | 49, 37, 21, 469
48000 | 96000 | 2/1 | 431
48000 | 192000 | (2/1)*(2/1)=4/1 | 449, 19
96000 | 8000 | (1/6)*(1/2)=1/12 | 65, 449
96000 | 11025 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)*(1/2)=147/1280 | 47, 29, 9, 9, 13, 21, 483
96000 | 16000 | (1/3)*(1/2)=1/6 | 31, 449
96000 | 22050 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)=147/640 | 47, 31, 9, 13, 21, 479
96000 | 32000 | (2/3)*(1/2)=2/6 | 31, 449
96000 | 44100 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)=147/320 | 49, 33, 13, 21, 475
96000 | 48000 | 1/2 | 431
96000 | 192000 | 2/1 | 431
192000 | 8000 | (1/4)*(1/3)*(1/2)=1/24 | 25, 33, 461
192000 | 11025 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)*(1/2)*(1/2)=147/2560 | 49, 29, 7, 9, 9, 13, 21, 487
192000 | 16000 | (1/6)*(1/2)=1/12 | 65, 449
192000 | 22050 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)*(1/2)=147/1280 | 47, 29, 9, 9, 13, 21, 483
192000 | 32000 | (1/3)*(1/2)=1/6 | 31, 449
192000 | 44100 | (147/8)*(1/5)*(1/2)*(1/2)*(1/2)*(1/2)=147/640 | 47, 31, 9, 13, 21, 479
192000 | 48000 | (1/2)*(1/2)=1/4 | 19, 449
192000 | 96000 | 1/2 | 431
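
The multi-stage factors listed for each converter can be checked against the overall conversion ratio with exact rational arithmetic. The following Python sketch is a consistency check only and does not perform any resampling; the names overall_ratio, CASCADES and check are hypothetical, and the stage factors are copied from a few rows of the table above.

```python
from fractions import Fraction
from math import prod


def overall_ratio(stages):
    """Multiply the (up, down) factors of a cascade into one exact ratio."""
    return prod((Fraction(up, down) for up, down in stages), start=Fraction(1))


# (input Hz, output Hz) -> list of per-stage (up, down) factors from the table.
CASCADES = {
    (8000, 11025): [(2, 1), (3, 1), (3, 1), (7, 1), (7, 640)],
    (44100, 48000): [(2, 1), (2, 1), (5, 1), (8, 147)],
    (48000, 44100): [(147, 8), (1, 5), (1, 2), (1, 2)],
    (192000, 8000): [(1, 4), (1, 3), (1, 2)],
}


def check(cascades):
    """Return, per converter, whether the stages multiply to output/input."""
    return {
        pair: overall_ratio(stages) == Fraction(pair[1], pair[0])
        for pair, stages in cascades.items()
    }


if __name__ == "__main__":
    for (fs_in, fs_out), ok in check(CASCADES).items():
        print(fs_in, "->", fs_out, "consistent" if ok else "MISMATCH")
```

Each downsampling cascade in the table is the corresponding upsampling cascade with the stages reversed and the factors inverted, which is why the filter lengths appear in reverse order.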

The converters are configured for chunk-by-chunk processing so that the resampling can be performed (by wavresample) on arbitrarily large WAV files without incurring excessive memory usage. An alternative function, audioresample, utilizes the same converters for single-chunk RAM-based conversion, suitable for converting a single chunk of data in the MATLAB workspace. This function is similar to the resample function in the Signal Processing Toolbox (from The MathWorks).

In both wavresample and audioresample, the delays introduced by the filtering are automatically removed so there is no need to separately compensate for the delays.

The wavresample function also includes all the re-quantization and dithering options available in the wavout function, so the WAV file can be completely re-sampled (in terms of sample rate and bit depth) with a single command.

Bugs and Limitations


Sounds Logical is aware of the following bugs and limitations with M-Pack 1.

    Bug or limitation: On some Windows systems, the wavresample function may have a memory leak in MATLAB when upsampling a WAV file, i.e. when increasing its sample rate (there is no leak when downsampling).

    Workaround: No known workaround within MATLAB. You may wish to consider using Sounds Logical's ReSample product for performing standalone WAV file sample-rate (and bit-depth) conversions outside of the MATLAB environment.

References


For further information on the WAV file structure (requires internet access):

[1] CCRMA WAV format description
[2] Sonic Spot WAV format description

For further information on the WAV_FORMAT_EXTENSIBLE format (requires internet access):

[3] "Enhanced Audio Formats For Multi-Channel Configurations And High Bit Resolution" Windows Multimedia Group Microsoft Corporation, 1999.

For further information on quantization, dithering, and noise-shaping:

[4] "Dither in Digital Audio", John Vanderkooy and Stanley P. Lipshitz, J. Audio Eng. Soc., Vol. 35, No. 12, December 1987.

[5] "Quantization and Dither: A Theoretical Survey", Stanley P. Lipshitz, Robert A. Wannamaker, and John Vanderkooy, J. Audio Eng. Soc., Vol. 40, No. 5, May 1992.

[6] "Minimally Audible Noise Shaping", Stanley P. Lipshitz, John Vanderkooy, and Robert A. Wannamaker, J. Audio Eng. Soc., Vol. 39, No. 11, November 1991.

For a complete treatment of multi-stage polyphase sample rate conversion architectures (as implemented in wavresample and audioresample):

[7] "Multirate Digital Signal Processing", Ronald E. Crochiere and Lawrence R. Rabiner, Prentice Hall, 1983.

For "educational" examples of implementation of the polyphase form for sample rate conversion (requires internet access):

[8] Sounds Logical WaveWarp example DrawingBoard: polyphase resampling example 1

[9] Sounds Logical WaveWarp example DrawingBoard: polyphase resampling example 2

© 2007 Sounds Logical. All rights reserved.