Sounds Logical
M-Pack 1: WAV file processing: MATLAB function reference

wavresample
Version 1.2 Requires MATLAB 6.0 (R12) or later

Performs sample rate and/or bit-depth conversion on the input WAV file and saves the results in the output WAV file. The computations are performed chunk-by-chunk to avoid the need for large RAM allocations when converting large files. The following sample rates commonly used in digital audio are supported (Hz): 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000, 192000.

Supports both WAVE_FORMAT_PCM and WAVE_FORMAT_EXTENSIBLE multichannel uncompressed formats and any bit resolution between 2 and 32 inclusive. As an option, dither may be applied before re-quantisation with a choice from a variety of standard dithering methods including noise-shaping with a user-specified custom FIR shaping filter.

File format: mex-file (dll)
Editable source code: no
Utilises non-editable functions: no
Platform: PC/Windows
Required MATLAB Toolboxes: none (except core MATLAB)
Demo version limitations: 30 second WAV file length limit (then silence)
Syntax:
  wavresample(InputWavFile,OutputWavFile,OutputFs)
  wavresample(InputWavFile,OutputWavFile,OutputFs,OutputFormat)
  wavresample(InputWavFile,OutputWavFile,OutputFs,OutputFormat,Scaling,ReadChunkSize)
  wavresample(InputWavFile,OutputWavFile,OutputFs,OutputFormat,Scaling,ReadChunkSize,WaitbarActivate)

Arguments:
Inputs:
InputWavFile

name of the input WAV file. The file must have one of the following supported sample rates (in Hz): 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000, 192000.

OutputWavFile

name of the output WAV file. Cannot be the same as the input filename if both reside in the same directory.

OutputFs

sample rate of output WAV file. Any of the following supported sample rates may be used (in Hz): 8000, 11025, 16000, 22050, 32000, 44100, 48000, 96000, 192000.

Defaults to sample rate of input WAV file if not specified.

OutputFormat

Optional argument. Must be a structure with fields described as follows (see also wavout):

  • Format field (i.e. OutputFormat.Format): a string with value 'pcm' (for WAVE_FORMAT_PCM) or 'ext' (for WAVE_FORMAT_EXTENSIBLE).
  • ChannelMask field (i.e. OutputFormat.ChannelMask): the "ChannelMask" variable (in decimal format), which prescribes the multichannel speaker allocation for WAVE_FORMAT_EXTENSIBLE. (See chnmsk2spkrlist to read more about the ChannelMask property.)
  • ValidBits field (i.e. OutputFormat.ValidBits): as described in Notes below.
  • ContainerBits field (i.e. OutputFormat.ContainerBits): as described in Notes below.
  • DitherMethod field (i.e. OutputFormat.DitherMethod): numerical value indicating which type of dithering method is applied before quantisation. Possible values are:
    • 0: no dither [default]
    • 1: rectangular PDF dither, with a peak-to-peak amplitude of 1*LSB
    • 2: triangular PDF dither, with a peak-to-peak amplitude of 2*LSB
    • 3: triangular PDF with first-order high-pass noise shaping
    • 4: triangular PDF with custom FIR noise shaping filter
  • DitherGain field (i.e. OutputFormat.DitherGain): gain applied to the amplitude of the dither [default value of 1], i.e. rectangular PDF dither has a peak-to-peak amplitude of LSB*DitherGain, and triangular PDF dither has a peak-to-peak amplitude of 2*LSB*DitherGain.
  • NoiseShapeGain field (i.e. OutputFormat.NoiseShapeGain): gain applied to the feedback path in the case where noise shaping is activated (i.e. for DitherMethod values greater than 2) [default value of 1].
  • NoiseShapeFIR field (i.e. OutputFormat.NoiseShapeFIR): vector of coefficients for the custom noise shaping filter (only valid for DitherMethod = 4). An arbitrary filter of any order may be specified. The default value is the following fifth-order filter: [2.033 -2.165 1.959 -1.590 0.6149], taken from ref [2] p 851 (note: designed for 44.1 kHz).
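
As a rough illustration of DitherMethod values 0 to 2 and of DitherGain, the following sketch generates rectangular and triangular PDF dither and adds it to a signal before rounding to the quantisation grid. It is written from the descriptions above and is not taken from the wavresample source. The variable names, the [-1, 1) full-scale convention and the rounding rule are assumptions made for the example; clipping and noise shaping are left out.

```matlab
% Illustrative sketch only: dither before re-quantisation (DitherMethod 0 to 2).
% Assumes a signal scaled to [-1, 1); all variable names are hypothetical.
validBits  = 8;                      % target resolution (ValidBits)
ditherGain = 1;                      % plays the role of DitherGain
lsb        = 2^(1 - validBits);      % step size for a [-1, 1) full-scale range

x = 0.5 * sin(2*pi*(0:999)/100);     % test signal, well below full scale

% Method 1: rectangular PDF, peak-to-peak amplitude of 1*LSB*DitherGain
dRect = (rand(size(x)) - 0.5) * lsb * ditherGain;

% Method 2: triangular PDF (difference of two independent rectangular
% sources), peak-to-peak amplitude of 2*LSB*DitherGain
dTri = (rand(size(x)) - rand(size(x))) * lsb * ditherGain;

% Quantise by rounding to the nearest multiple of the step size
quantise = @(v) lsb * round(v / lsb);

yNone = quantise(x);                 % method 0: no dither
yRect = quantise(x + dRect);         % method 1: rectangular PDF dither
yTri  = quantise(x + dTri);          % method 2: triangular PDF dither
```

With no dither the quantisation error depends on the signal itself; adding the dither before rounding trades this for a slightly higher but signal-independent noise floor.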

Notes: ValidBits refers to the number of bits used in the quantisation of each sample, and ContainerBits refers to the number of bits used to store each sample. ValidBits can have any value from 2 up to and including ContainerBits. Usually the value of ContainerBits is the nearest integer multiple of 8 at or above the value of ValidBits, but this does not need to be the case (e.g. a 2-bit quantised signal can be stored in a 32-bit container, though this would be highly wasteful in terms of disk space).
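
The usual relationship between ValidBits and ContainerBits can be sketched as below. This only illustrates the "nearest multiple of 8" convention mentioned in the Notes; whether wavresample applies the same rule when ContainerBits is left out is not stated here, so the calculation should be read as an assumption.

```matlab
% Illustrative sketch: smallest whole-byte container for a given ValidBits.
% This mirrors the usual choice described in the Notes; it is not a rule
% that wavresample is known to enforce.
validBits     = [2 8 12 16 20 24 32];
containerBits = 8 * ceil(validBits / 8);     % -> [8 8 16 16 24 24 32]

% Example: request 20-bit quantisation stored in a 24-bit container
OutputFormat.ValidBits     = 20;
OutputFormat.ContainerBits = 8 * ceil(OutputFormat.ValidBits / 8);   % -> 24
```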

If omitted, output format retains attributes of the input.

Scaling

Optional argument. Must be a structure with fields described as follows:

  • Input field (i.e. Scaling.Input): [default 'full']. Describes the scaling law applied when reading the input WAV file. Identical to the scale argument in wavin. See wavin for the full range of options.
  • Output field (i.e. Scaling.Output): [default 'full']. Describes the scaling law applied when writing the output WAV file. Identical to the scale argument in wavout. See wavout for the full range of options.
  • Gain field (i.e. Scaling.Gain): An overall gain applied to the signal. Useful for changing the overall signal strength from input to output. If Scaling.Gain is scalar, all channels are scaled by the same amount. If Scaling.Gain has at least as many elements as there are channels, each output channel is scaled by the corresponding gain value. If Scaling.Gain is non-scalar but has fewer elements than there are channels, the gain vector is padded (using its last value) to match the number of channels, and each output channel is then scaled by the corresponding gain value.

    NOTE: for mono signals, the gain is ineffective when the output scaling is selected as 'norm' or 'normc', since the normalization (associated with 'norm', 'normc') eliminates the effect of the gain. Likewise, for multichannel signals, the gain is ineffective when the output scaling is selected as 'normc' since the per-channel normalization eliminates the effect of the gain (though the gains are generally effective in a relative sense when output scaling is set to 'norm').
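
The three Scaling.Gain cases (scalar, short vector, long vector) can be sketched in a few lines. The helper expandGain below is hypothetical and is not part of M-Pack 1; it simply reproduces the padding rule described above for a given number of channels.

```matlab
% Illustrative sketch of the Scaling.Gain expansion rules described above.
% expandGain is a hypothetical helper, not an M-Pack 1 function.
expandGain = @(g, nCh) reshape(g(min(1:nCh, numel(g))), 1, nCh);

gScalar = expandGain(2, 4);               % scalar: same gain on every channel
gShort  = expandGain([1 0.5], 4);         % short vector: padded with its last value
gLong   = expandGain([1 2 3 4 5], 4);     % long vector: surplus values are unused

% Applying per-channel gains to data stored as samples-by-channels
x = ones(3, 4);                           % dummy 4-channel signal
y = bsxfun(@times, x, gShort);            % each column scaled by its own gain
```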

ReadChunkSize

Optional argument [default value 65536]. Represents the length (in samples per channel) of each successive chunk to be processed. Can have an arbitrary value, but if too small, computational efficiency will be compromised since the file I/O operations are relatively slow. If too large, memory requirements may be excessive. (Note: the value is overridden if the file length happens to be smaller!)
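
To make the trade-off concrete, the sketch below shows how a file of a given length might be divided into read chunks. The internal chunking logic of wavresample is not documented here, so the variable names and the exact boundary arithmetic are assumptions; only the default value of 65536 and the cap at the file length come from the description above.

```matlab
% Illustrative sketch: dividing a file into read chunks.
% The actual chunking inside wavresample is not documented; names are hypothetical.
nSamples      = 150000;                        % samples per channel in the file
readChunkSize = 65536;                         % default ReadChunkSize
chunkSize     = min(readChunkSize, nSamples);  % capped by the file length
nChunks       = ceil(nSamples / chunkSize);    % number of read operations

first = (0:nChunks-1) * chunkSize + 1;         % first sample of each chunk
last  = min(first + chunkSize - 1, nSamples);  % last sample of each chunk
```

A smaller ReadChunkSize increases nChunks (more file I/O calls), while a larger one increases the memory held per chunk.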

WaitbarActivate

Optional argument. If the argument is present (even if empty), the progress is displayed via a waitbar which can be customized via the following optional fields:

  • Handle field (i.e. WaitbarActivate.Handle): the handle of an externally generated waitbar figure. If omitted, a new locally generated waitbar is created (and then deleted at the end of the function).
  • Label field (i.e. WaitbarActivate.Label): string containing the label of the waitbar.

If this input argument is omitted, no waitbar will be used.

Note: any of the following input arguments: OutputFormat, Scaling, ReadChunkSize, and WaitbarActivate may be substituted by [ ] to force their respective default value(s).
 

wavresample uses hard-wired multi-stage FIR re-sampling (anti-aliasing / anti-imaging) filters, written directly in C/C++ and made available in MATLAB as a compiled mex function (dll). This mex function calls the wavin and wavout m-functions to perform the data I/O.

All multi-stage filters have been designed to meet the following low-pass composite specifications as applied to the input-output data:

  • passband attenuation: <= 0.01 dB
  • stopband attenuation: >= 120 dB

where the composite low-pass edge frequency is chosen in accordance with the given conversion ratio (i.e. at half the output sample rate for down-samplers to avoid "aliasing", and at half the input sample rate for up-samplers to avoid "imaging").
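
For a given pair of rates, the conversion ratio and the composite low-pass edge frequency follow directly from the rule above. The sketch below works them out for 44.1 kHz to 48 kHz. How wavresample splits the overall ratio into individual stages is not described here, so only the overall figures are computed, and the variable names are illustrative.

```matlab
% Illustrative sketch: overall conversion ratio and composite low-pass edge.
% The split into individual filter stages is not documented and is not shown.
fsIn  = 44100;                   % input sample rate (Hz)
fsOut = 48000;                   % output sample rate (Hz)

g = gcd(fsIn, fsOut);            % greatest common divisor of the two rates
L = fsOut / g;                   % overall interpolation factor -> 160
M = fsIn  / g;                   % overall decimation factor    -> 147

% Edge at half the lower of the two rates: half the output rate when
% down-sampling (anti-aliasing), half the input rate when up-sampling
% (anti-imaging)
fEdge = min(fsIn, fsOut) / 2;    % -> 22050 Hz
```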

For convenience, any delays introduced by the filtering are automatically removed.
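
One common way to remove filter delay, shown here purely as an illustration, applies to a symmetric (linear-phase) FIR filter of odd length N, whose delay is (N-1)/2 samples. Whether the wavresample filters are linear-phase, and how the compiled code compensates for their delay, is not stated here; the filter and signal below are made up for the example.

```matlab
% Illustrative sketch: removing the delay of a symmetric FIR filter.
% Assumes a linear-phase filter; the coefficients are for demonstration only.
h     = [1 2 3 2 1] / 9;              % symmetric FIR, length 5
delay = (numel(h) - 1) / 2;           % group delay in samples -> 2

x     = zeros(1, 21);
x(11) = 1;                            % unit impulse at sample 11

y = filter(h, 1, [x zeros(1, delay)]);   % append zeros so the tail is not lost
y = y(delay+1:end);                      % discard the first 'delay' samples

[~, iPeak] = max(y);                  % peak lines up with the input impulse (11)
```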

For efficiency, all filters in all stages have been implemented in polyphase form.
See ref [1] for more information on multi-stage sample rate conversion and polyphase representation.
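
The efficiency gain of the polyphase form can be seen in a small decimation example: the filter is split into M short sub-filters that each run at the output rate, yet the result matches filtering at the input rate and then discarding samples. The sketch below is a generic textbook construction using an arbitrary filter, not the filters or the code used inside wavresample.

```matlab
% Illustrative sketch: polyphase decimation by M gives the same output as
% filtering at the input rate and then keeping every M-th sample.
M = 3;                                    % decimation factor
h = [1 2 3 4 5 4 3 2 1] / 25;             % arbitrary FIR, for demonstration only
x = sin(2*pi*(0:29)/15) + 0.1*cos(2*pi*(0:29)/4);

% Direct form: filter at the input rate, then keep every M-th sample
yDirect = filter(h, 1, x);
yDirect = yDirect(1:M:end);
nOut    = numel(yDirect);

% Polyphase form: M short sub-filters, each running at the output rate
hPad  = [h, zeros(1, mod(-numel(h), M))]; % pad length to a multiple of M
xPad  = [zeros(1, M-1), x];               % leading zeros for the delayed branches
yPoly = zeros(1, nOut);
for p = 0:M-1
    ep = hPad(p+1 : M : end);             % p-th polyphase component of h
    xp = xPad(M-p : M : end);             % input delayed by p samples, then decimated
    yp = filter(ep, 1, xp);
    yPoly = yPoly + yp(1:nOut);           % sum the branch outputs
end
```

In the direct form, two out of every three filter outputs are computed and then thrown away; the polyphase form computes only the samples that are kept.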

The output WAV file will generally differ in length from the input WAV file, in accordance with the ratio of sample rates (and of bit resolutions, if a different bit resolution is selected).
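
As a worked example of this length change, the sketch below estimates the number of output samples and the size of the audio data for a 10 second mono file converted from 44.1 kHz to 8 kHz. The rounding rule actually used by wavresample is not documented here, so round() is an assumption, and header bytes are ignored.

```matlab
% Illustrative sketch: expected output length after sample rate conversion.
% The exact rounding used by wavresample is not documented; round() is assumed.
nIn           = 441000;          % input samples per channel (10 s at 44.1 kHz)
fsIn          = 44100;           % input sample rate (Hz)
fsOut         = 8000;            % output sample rate (Hz)
nCh           = 1;               % number of channels
containerBits = 16;              % bits used to store each output sample

nOut      = round(nIn * fsOut / fsIn);          % -> 80000 samples per channel
dataBytes = nOut * nCh * containerBits / 8;     % -> 160000 bytes of audio data
```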

See the M-Pack 1 overview for a detailed discussion of the sample rate conversion, WAV format, quantization, dithering, and noise-shaping techniques used in this function.

Note that wavresample uses wavin and wavout for the WAV file I/O and thereby does not preserve any peripheral header information (e.g. in the '.info' field) beyond the basic '.fmt' (audio format information).

Ref [1]: Ronald E. Crochiere and Lawrence R. Rabiner, "Multirate Digital Signal Processing", Prentice-Hall, 1983.

Ref [2]: Stanley P. Lipshitz, John Vanderkooy, and Robert A. Wannamaker, "Minimally Audible Noise Shaping", J. Audio Eng. Soc., Vol. 39, No. 11, November 1991.

 

See also:
  audioresample: resampling of data in the MATLAB workspace (similar to the resample function in the Signal Processing Toolbox from The MathWorks)

Examples:
The following examples are contained in the m-script file entitled xmplwavresample.m
 
Ex.1 Convert the sample rate of a 44.1 kHz WAV file ('wavewarp.wav', human voice recording) to 48 kHz, writing the result to mywav.wav (in directory ..\WAVFiles), keeping all other file attributes the same as the original:
 
  wavresample('wavewarp.wav','..\WAVfiles\mywav.wav',48000);
   
  Load the WAV files and plot the original data versus the resampled data (using the appropriately defined time vectors).
   
  [y44,fs44]=wavin('..\WAVfiles\wavewarp.wav');
  [y48,fs48]=wavin('..\WAVfiles\mywav.wav');
  t44=0:1/44100:0.5; %time vector for 44.1kHz sample rate
  t48=0:1/48000:0.5; %time vector for 48kHz sample rate
  plot(t44,y44(1:length(t44)),t48,y48(1:length(t48)),'r');
   
  In the plot, the resampled data is almost indistinguishable from the original. Likewise, they sound identical, as can be conveniently verified using the winplaywav command to play the WAV files directly from disk:
  winplaywav('..\WAVfiles\wavewarp.wav',2); %'sync' playback
  winplaywav('..\WAVfiles\mywav.wav');
 
Ex.2 Same as Ex 1 but invoking the progress-tracking waitbar by including the 7th argument (even if empty):
 
 

  wavresample('wavewarp.wav',...
      '..\WAVfiles\mywav.wav',48000,[],[],[],[]);
 
Ex.3 Now convert the original 44.1 kHz file to 8 kHz:

  wavresample('wavewarp.wav','..\WAVfiles\mywav.wav',8000);

  Load the resampled file and plot the results, this time zooming in to observe the detail using a staircase plot to emphasize the difference in sample rates:
  [y8,fs8]=wavin('..\WAVfiles\mywav.wav'); %8kHz data (y44 is still in the workspace from Ex.1)
  t44=0:1/44100:0.01; %time vector for 44.1kHz sample rate
  t8=0:1/8000:0.01; %time vector for 8kHz sample rate
  stairs(t44,y44(1:length(t44))); hold on; stairs(t8,y8(1:length(t8)),'r');
   
  In the plot, the reduced sample rate (i.e. 8 kHz versus 44.1 kHz) is clearly evident. Moreover, the 8 kHz data sounds "muffled" due to the effects of the anti-aliasing filter employed by wavresample. (This can be conveniently verified using the winplaywav command to play the WAV files directly from disk, as demonstrated in Ex.1.)
   
Ex.4 Same as in previous example but, in addition to the sample rate reduction, perform a simultaneous re-quantization from 16- to 8-bits with the application of simple rectangular dither (with a peak-to-peak amplitude of 1*LSB):
 
  OutputFormat.ValidBits=8;
  OutputFormat.DitherMethod=1;
  wavresample('wavewarp.wav',...
      '..\WAVfiles\mywav.wav',8000,OutputFormat);
   
 
Ex.5 Multichannel example. Convert the 44.1 kHz 4-channel WAV file ('..\WAVFiles\4channel.wav', human voice recording) to 192 kHz, writing the result to mywav.wav (in directory ..\WAVfiles), keeping all other file attributes the same as the original. Also, for the purposes of illustration, override the default read chunk size with a value of 8192, and invoke the progress-tracking waitbar (by specifying the last input argument as present though empty). The conversion will take considerably longer due to the smaller read chunk size:

  wavresample('..\WAVFiles\4channel.wav',...
      '..\WAVfiles\mywav.wav',192000,[],[],8192,[]);
 
Ex.6 As a "roundtrip" test, convert the 44.1 kHz voice track up to 192 kHz then back down to 44.1 kHz and compare with the original. The plot below shows the results (zoomed-in). As expected, the "roundtrip" resampled data is virtually indistinguishable from the original. This is because the original signal is bandlimited to 22.05 kHz (i.e. the Nyquist frequency corresponding to the sample rate of 44.1 kHz), so the process of upsampling to 192 kHz and back down again has no discernible effect.
 
  wavresample('wavewarp.wav','mywav.wav',192000);
  wavresample('mywav.wav','mywav2.wav',44100);
   
 

© 2007 Sounds Logical. All rights reserved.