Sounds Logical


M-Pack 1: WAV file processing: MATLAB function reference

wavout
Version 1.2. Requires MATLAB 6.0 (R12) or later.

Write data to a PCM WAV format audio file, either in its entirety or chunk-by-chunk. Supports both WAVE_FORMAT_PCM and WAVE_FORMAT_EXTENSIBLE multichannel uncompressed formats and any bit resolution between 2 and 32 inclusive. As an option, dither may be applied before quantisation with a choice from a variety of standard dithering methods including noise-shaping with a user-specified custom FIR shaping filter.

By default, the following settings are assumed: 'pcm' format, 16-bit quantisation, no dither.

File format: m-file
Editable source code: yes
Utilises non-editable functions: no
Platform: PC/Windows
Required MATLAB Toolboxes: none (except core MATLAB)
Demo version limitations: p-code only (non-editable), 30 second WAV file length limit (then silence)
Syntax:

  wavout(y,fs,filename);
  wavout(y,fs,filename,bits);
  wavout(y,fs,filename,bits,wavformat);
  wavout(y,fs,filename,bits,wavformat,scale);
  wavout(y,fs,filename,bits,wavformat,scale,oflag);
  wavout(y,fs,filename,bits,wavformat,scale,oflag,writeskip);
  fdx=wavout(y,fs,filename,...);

Arguments:
Inputs:
y

data matrix of dimension [samples x channels], i.e. each column represents a separate channel and each row a separate sample instance.

fs

sample rate (Hz)

filename

name of the output WAV file (with optional .WAV extension), or alternatively the fdx output from a previous call to wavout (e.g. for writing chunk-by-chunk via successive calls).

bits

if scalar, corresponds to the number of valid bits to be used in the quantisation [default value of 16]. The value is rounded up to the nearest integer multiple of 8 to yield the storage "container size" [default value of 16].

if a two-vector, the first element corresponds to the number of valid bits to be used in the quantisation, and the second element corresponds to the "container size" for storage (rounded up if not an integer multiple of 8). See notes below.

if a structure, may have the fields described as follows:

  • ValidBits field (i.e. bits.ValidBits): as described above [default value of 16]
  • ContainerBits field (i.e. bits.ContainerBits): as described above [default value of 16]
  • DitherMethod field (i.e. bits.DitherMethod): numerical value indicating which type of dithering method is applied before quantisation. Possible values are:
    • 0: no dither [default]
    • 1: rectangular PDF dither, with a peak-to-peak amplitude of 1*LSB
    • 2: triangular PDF dither, with a peak-to-peak amplitude of 2*LSB
    • 3: triangular PDF with first-order high-pass noise shaping
    • 4: triangular PDF with custom FIR noise shaping filter
  • DitherGain field (i.e. bits.DitherGain): gain applied to amplitude of dither [default value of 1] i.e. rectangular PDF dither has amplitude of LSB*DitherGain, and triangular PDF dither has amplitude of 2*LSB*DitherGain.
  • NoiseShapeGain field (i.e. bits.NoiseShapeGain): gain applied to the feedback path when noise shaping is activated (i.e. for DitherMethod values greater than 2) [default value of 1]
  • NoiseShapeFIR field (i.e. bits.NoiseShapeFIR): vector of coefficients for the custom noise shaping filter (only valid for DitherMethod = 4). An arbitrary filter of any order may be specified. The default value is the following fifth-order filter: [2.033 -2.165 1.959 -1.590 0.6149], taken from ref [2] p 851 (note: designed for 44.1 kHz).
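
The two basic dither options listed above can be pictured with a minimal sketch in plain MATLAB. It is not the wavout source: the step size convention (LSB = 2^(1-ValidBits) for a signal in the range -1:+1), the rounding rule, the use of two independent random sources for the triangular PDF, and all variable names are assumptions made for illustration.

```matlab
% Illustrative sketch only; not taken from wavout.
validBits  = 12;                 % number of bits used for quantisation
ditherGain = 1;                  % plays the role of bits.DitherGain
lsb = 2^(1-validBits);           % ASSUMED step size for a -1:+1 signal

x = 0.5*sin(2*pi*1000*(0:999)'/44100);   % test signal, one column = one channel

% DitherMethod 1: rectangular PDF, peak-to-peak amplitude 1*LSB*gain
dRect = ditherGain*lsb*(rand(size(x)) - 0.5);

% DitherMethod 2: triangular PDF, peak-to-peak amplitude 2*LSB*gain
% (here formed as the difference of two independent rectangular sources)
dTri = ditherGain*lsb*(rand(size(x)) - rand(size(x)));

% Add the dither, then quantise by rounding to the nearest step
yRect = lsb*round((x + dRect)/lsb);
yTri  = lsb*round((x + dTri)/lsb);
```

With DitherGain = 1 the total error of each output sample stays within 1 LSB for the rectangular case and 1.5 LSB for the triangular case (dither amplitude plus half a step of rounding).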

Notes: ValidBits refers to the number of bits used in the quantisation of each sample, and ContainerBits refers to number of bits used to store each sample. ValidBits can have any value from 2 up to and including ContainerBits. Usually the value of ContainerBits is the nearest integer multiple of 8 above the value of ValidBits, but this does not need to be the case (e.g. a 2-bit quantised signal can be stored in a 32-bit container, though this would be highly wasteful in terms of disk space!)
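
As a small worked example of the default rule described above (container size = ValidBits rounded up to the next integer multiple of 8), the following sketch tabulates the container size for a few bit depths. The variable names are illustrative only.

```matlab
% Default container size for a range of ValidBits values
validBits      = [2 7 8 12 16 18 24 32];
containerBits  = 8*ceil(validBits/8);    % round up to a multiple of 8
containerBytes = containerBits/8;        % bytes stored per sample value

% One row per case: [ValidBits ContainerBits ContainerBytes]
disp([validBits; containerBits; containerBytes]');
```

For instance, 12 valid bits are stored in 16-bit (2-byte) containers and 18 valid bits in 24-bit (3-byte) containers, matching Ex.2 and Ex.9 below.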

wavformat

if a string, possible values are 'pcm' [default] for WAVE_FORMAT_PCM or 'ext' for WAVE_FORMAT_EXTENSIBLE. The ChannelMask is automatically set to 0 (only relevant for WAVE_FORMAT_EXTENSIBLE).

if a structure, should have two fields described as follows:

  • Format field (i.e. wavformat.Format) should be a string with value 'pcm' or 'ext' as described above.
  • ChannelMask field (i.e. wavformat.ChannelMask) should contain the "ChannelMask" variable (in decimal format) which prescribes the multichannel speaker allocation for WAVE_FORMAT_EXTENSIBLE. Type 'help chnmsk2spkrlist' to read more about the ChannelMask property.
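
The decimal ChannelMask values used in the examples on this page (15 and 51) can be derived by summing speaker-position bit flags. The sketch below uses the flag values defined by Microsoft for WAVE_FORMAT_EXTENSIBLE; the variable names are illustrative and are not part of M-Pack 1.

```matlab
% Speaker-position flags (decimal) as defined for WAVE_FORMAT_EXTENSIBLE
FRONT_LEFT    = 1;
FRONT_RIGHT   = 2;
FRONT_CENTER  = 4;
LOW_FREQUENCY = 8;
BACK_LEFT     = 16;
BACK_RIGHT    = 32;

% "3.1 surround sound": left, right, centre, LFE
mask31   = FRONT_LEFT + FRONT_RIGHT + FRONT_CENTER + LOW_FREQUENCY;   % = 15

% "quadrophonic (4 corner)": front left/right, back left/right
maskQuad = FRONT_LEFT + FRONT_RIGHT + BACK_LEFT + BACK_RIGHT;         % = 51

% The number of bits set should equal the number of channels (columns of y)
nChannels31   = sum(bitget(mask31, 1:18));
nChannelsQuad = sum(bitget(maskQuad, 1:18));
```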
scale

string with value 'norm', 'normc', 'full' [default], or 'clip'. These determine how the data is scaled before writing, as follows:

  • 'norm': data is normalised by the max absolute value across all channels (i.e. ScaleFactor=1/(max(max(abs(y))))), thus ensuring that the data is within the range -1:+1 thereby avoiding digital clipping and foldover when quantised. WARNING: when used in chunk-by-chunk writing mode, each chunk will be individually scaled which will result in a non-uniform scaling across the entire file!
  • 'normc': data is normalised per channel by the max absolute value per channel, thus ensuring that the data is within the range -1:+1 thereby avoiding digital clipping and foldover when quantised. WARNING: when used in chunk-by-chunk writing mode, each chunk will be individually scaled which will result in a non-uniform scaling across the entire file!
  • 'full': assumes data is already scaled to within the range -1:+1 WARNING: if data lies outside the -1:+1 range, digital clipping and foldover will occur during the quantisation process!
  • 'clip': data is clipped to within the range -1:+1 thereby avoiding digital foldover when quantised. WARNING: even though foldover is avoided, the clipping will generally lead to audible distortion (albeit less extreme than with foldover), so it is recommended that the data is properly conditioned to fall within the range -1:+1 before being written to a WAV file.
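
The effect of the 'norm', 'normc', and 'clip' settings can be reproduced on a small matrix in plain MATLAB. This is a sketch of the arithmetic described above, not the wavout implementation, and the variable names are illustrative.

```matlab
% Two-channel test data: channel 1 is far out of range, channel 2 is in range
t = (0:99)'/100;
y = [1000*sin(2*pi*t), sin(2*pi*t)];

% 'norm': one scale factor for the whole matrix
yNorm = y / max(max(abs(y)));

% 'normc': one scale factor per channel (column)
peak   = max(abs(y), [], 1);                     % 1 x channels
yNormc = y ./ repmat(peak, size(y,1), 1);

% 'clip': hard limit to the range -1:+1
yClip = min(max(y, -1), 1);
```

After 'norm' the second channel peaks at about 0.001, whereas after 'normc' both channels peak at 1. A silent (all-zero) channel would make the 'normc' division ill-defined in this sketch, so real code would need to guard against it.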
oflag

flag for determining whether or not to leave the file open when exiting the function. Possible values are:

0: close on exit [default]

1: leave open (e.g. for chunk-by-chunk writing)

writeskip

Location (measured in number of samples per channel from the start of the data section) from where to start the current write (only relevant if fdx is specified instead of filename). Use the value -1 [default] to continue from the previous write.

 

Note: any of the following input arguments: bits, wavformat, scale, and oflag may be substituted by [ ] to force their respective default value(s).
Output:
fdx

(optional) row vector (1x10) containing the following elements:

  1. file id (obtained from the fopen command)
  2. current 'per channel' byte position in file (i.e. not the actual offset which takes into account all channels). Required for navigation when using chunk-by-chunk write mode.
  3. byte offset from start of file to start of actual data i.e. after all headers etc. Required for navigation when using chunk-by-chunk write mode.
  4. number of samples per channel in the entire file
  5. number of channels
  6. "container size" (in bytes) per data value. Note: the corresponding container size in bits (= ContainerBytes*8) is always an integer multiple of 8.
  7. number of valid bits of precision (may be less than the given container size)
  8. WAVE data format. Supported values are:
    • 1 -- for WAVE_FORMAT_PCM i.e. uncompressed PCM, "old style", fine for 8-bit and 16-bit, but ambiguous for higher bit-resolutions
    • 65534 -- for WAVE_FORMAT_EXTENSIBLE i.e. uncompressed PCM, "new style" with multichannel speaker-location options plus no ambiguity for high bit-resolutions (see M-Pack 1 overview and ref [1]).
    • No other formats are supported by this m-function
  9. sample frequency in Hz
  10. the "ChannelMask" variable (in decimal format) which prescribes the multichannel speaker allocation for WAVE_FORMAT_EXTENSIBLE. Type 'help chnmsk2spkrlist' to read more about the ChannelMask property
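
For readability, the ten elements of fdx can be unpacked into a structure with named fields. The sketch below uses a made-up fdx vector (the values are hypothetical, not output from a real call), and the field names are illustrative rather than part of M-Pack 1.

```matlab
% HYPOTHETICAL fdx vector, for illustration only
fdx = [3 0 68 44101 2 3 18 65534 48000 3];

info.FileId            = fdx(1);    % from fopen
info.BytePosPerChannel = fdx(2);    % current per-channel byte position
info.DataOffsetBytes   = fdx(3);    % start of the data section
info.SamplesPerChannel = fdx(4);
info.Channels          = fdx(5);
info.ContainerBits     = 8*fdx(6);  % element 6 is in bytes
info.ValidBits         = fdx(7);
info.SampleRate        = fdx(9);
info.ChannelMask       = fdx(10);

% Element 8 is the WAVE format code
if fdx(8) == 1
    info.Format = 'pcm';
elseif fdx(8) == 65534
    info.Format = 'ext';
else
    info.Format = 'unsupported';
end

% Duration implied by the sample count and sample rate
info.DurationSeconds = info.SamplesPerChannel/info.SampleRate;
```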

See the M-Pack 1 overview for a detailed discussion of the WAV format, quantization, dithering, and noise-shaping techniques used in this m-function.

Notes:

(i) wavout does not support wave-list data, nor does it write any peripheral header information (e.g. in the '.info' field) beyond the basic '.fmt' (audio format information).

(ii) for WAVE_FORMAT_EXTENSIBLE files, wavout only supports the PCM storage format (i.e. the SubFormat is automatically written corresponding to the Microsoft GUID KSDATAFORMAT_SUBTYPE_PCM.)

Ref[1]: "Enhanced Audio Formats For Multi-Channel Configurations And High Bit Resolution" Windows Multimedia Group Microsoft Corporation, 1999.

Ref[2]: "Minimally Audible Noise Shaping", Stanley P. Lipshitz, John Vanderkooy, and Robert A. Wannamaker, J. Audio Eng. Soc., Vol. 39, No. 11, November 1991.

 

See also:
  audiodither   dithering of MATLAB data (without WAV writing)
  wavin         WAV file reading
     
Examples:
The following examples are all contained in the m-script file entitled xmplwavout.m

First start by generating a single channel (column) of audio data samples within the amplitude range -1:+1:

 

  y=sin(2000*pi*(0:(1/44100):1)'); % 1 kHz sine at 44.1 kHz, amplitude range [-1:+1]

 
Ex.1 Write this data to a (default) 16-bit (undithered), mono WAV file (in default WAVE_FORMAT_PCM format) named mywav.wav in directory ..\WAVfiles, assigning a sample rate of 44.1 kHz:
 
  wavout(y,44100,'..\WAVfiles\mywav.wav');
 
Ex.2 Now save in 12-bits (undithered) within 16-bit containers, and assigning a sample rate of 32 kHz:
 
  bits=[12; 16];
  wavout(y,32000,'..\WAVfiles\mywav.wav',bits);
 
Ex.3 Same as in Ex.2 but with the application of simple rectangular dither with a peak-to-peak amplitude of 1*LSB:
 
 

  clear bits; % bits was a numeric vector in Ex.2; start again as a structure
  bits.ValidBits=12;
  bits.ContainerBits=16;
  bits.DitherMethod=1;
  wavout(y,32000,'..\WAVfiles\mywav.wav',bits);
   
Ex.4 This example demonstrates the effects of dither in a practical application. First read a 16-bit WAV file containing human speech sampled at 44.1 kHz.
 
   
 

  [voice, fs]=wavin('..\WAVfiles\wavewarp.wav');

  Listen to the voice (requires a Windows-compatible soundcard):
  sound(voice,fs);
   
  Now re-quantize by writing to a WAV file in 7-bit format (within 8-bit containers), without performing any dithering. (A reduction from 16 bits to 7 bits has been chosen because it clearly illustrates the point.)

  bits.ValidBits=7;
  bits.ContainerBits=8;
  bits.DitherMethod=0; % explicitly disable dither (Ex.3 left DitherMethod set to 1)
  wavout(voice,fs,'..\WAVfiles\dither0.wav',bits);
   
  Now listen to this file. You will observe that the non-linear modulation of the signal with the quantization noise is clearly evident. The purpose of the dither will be to suppress this distortion:
   
  [newvoice, fs]=wavin('..\WAVfiles\dither0.wav');
  sound(newvoice,fs);
   
  Now do the same again, but this time using simple additive rectangular dither of amplitude 1 LSB:
   
  bits.DitherMethod=1;
  wavout(voice,fs,'..\WAVfiles\dither1.wav',bits);
   
  Now listen to this file. You will observe that the non-linear modulation of the signal with the quantization noise has been suppressed by the dither, albeit with an increase in background noise level:
   
  [newvoice, fs]=wavin('..\WAVfiles\dither1.wav');
  sound(newvoice,fs);
   
  Now do the same again, but this time with simple additive triangular "highpass" dither of amplitude 2 LSB:
   
  bits.DitherMethod=2;
  wavout(voice,fs,'..\WAVfiles\dither2.wav',bits);
   
  Now listen to this file. Again, the non-linear modulation of the signal with the quantization noise has been suppressed by the dither. The additional background noise is still present but shifted to higher (less audible) frequencies by virtue of the "highpass" differencing used to create the triangular dither:
   
  [newvoice, fs]=wavin('..\WAVfiles\dither2.wav');
  sound(newvoice,fs);
   
  Now do the same again, but this time with simple (pure delay) feedback noise-shaping in combination with the triangular dither:
   
  bits.DitherMethod=3;
  wavout(voice,fs,'..\WAVfiles\dither3.wav',bits);
   
  Now listen to this file. Again, the non-linear modulation of the signal with the quantization noise has been suppressed by the dither. The additional background noise is still present but shifted to even higher (less audible) frequencies by virtue of the "highpass" effect of the negative feedback delay:
   
  [newvoice, fs]=wavin('..\WAVfiles\dither3.wav');
  sound(newvoice,fs);
   
  Now do the same again, but with a two-coefficient FIR noise-shaping filter (from ref [2] p 851, designed from psychoacoustical considerations):
   
  bits.DitherMethod=4;
  bits.NoiseShapeFIR=[1.537 -0.8367];
  wavout(voice,fs,'..\WAVfiles\dither4.wav',bits);
   
  Now listen to this file. Again, the non-linear modulation of the signal with the quantization noise has been suppressed by the dither. The additional background noise is still present but even less audible due to the psychoacoustical noise-shaping:
   
  [newvoice, fs]=wavin('..\WAVfiles\dither4.wav');
  sound(newvoice,fs);
   
  Finally, do the same again, but with (the default) five-coefficient FIR noise-shaping filter (from ref [2] p 851, designed from psychoacoustical considerations):
   
  bits.DitherMethod=4;
  bits.NoiseShapeFIR=[2.033 -2.165 1.959 -1.590 0.6149];
  wavout(voice,fs,'..\WAVfiles\dither5.wav',bits);
   
  Now listen to this file. Again, the non-linear modulation of the signal with the quantization noise has been suppressed by the dither. The additional background noise is still present but even less audible due to the more elaborate psychoacoustical noise-shaping:
   
  [newvoice, fs]=wavin('..\WAVfiles\dither5.wav');
  sound(newvoice,fs);
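
The noise-shaping steps of Ex.4 can be pictured with a generic error-feedback quantiser, sketched here in plain MATLAB. This is one common textbook structure, not the wavout source: the sign convention of the feedback, the step size (LSB = 2^(1-ValidBits)), and all variable names are assumptions. With the single coefficient c = 1 it corresponds to a simple (pure delay) feedback; a longer coefficient vector gives a custom FIR shaper, although whether wavout applies its NoiseShapeFIR coefficients with this same sign convention is not stated on this page.

```matlab
% Illustrative error-feedback noise shaper; not taken from wavout.
validBits = 7;
lsb = 2^(1-validBits);          % ASSUMED step size for a -1:+1 signal
c   = 1;                        % feedback FIR coefficients (1 = pure delay)

x = 0.5*sin(2*pi*1000*(0:4409)'/44100);   % 0.1 s test tone at 44.1 kHz
y = zeros(size(x));
e = zeros(length(c), 1);        % past errors, most recent first

for n = 1:length(x)
    v = x(n) - c(:)'*e;                 % subtract the filtered past errors
    d = lsb*(rand - rand);              % triangular PDF dither, 2*LSB p-p
    y(n) = lsb*round((v + d)/lsb);      % dither, then quantise
    e = [y(n) - v; e];                  % newest total error goes first
    e = e(1:length(c));                 % keep only as many as the filter needs
end
```

With c = 1 the output error is y(n) - x(n) = e(n) - e(n-1), a first difference of the raw error. The error therefore sums to almost zero over time, which is another way of saying that its low-frequency content has been pushed towards higher frequencies.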
   
The next example deals with scaling. Multiply the original test data by a large factor (e.g. 1000) so that the amplitude values lie well outside the range [-1:+1]:
 

  z = y*1000; % range [-1000:+1000]

If this were written to a WAV file using the (default) 'full' scale setting (as implicit in all previous examples), the data would be garbled due to digital foldover occurring for all amplitudes outside the expected range. Instead, as in the next example, use the 'norm' scale setting which has the effect of pre-scaling the data to the [-1:+1] range before quantization:
 
Ex.5 Normalize the data and save with same format as in Ex.1:
  wavout(z,44100,'..\WAVfiles\mywav.wav',[ ],[ ],'norm');
 
Now create a stereo data matrix from the original mono (by duplicating the data into a second column), then multiply the first channel by a large factor (e.g. 1000) so that the amplitude values for this channel lie well outside the range [-1:+1]:
 

z = [y*1000 y];

As in the previous example, this can be normalized using the 'norm' scale setting, which, for multichannel (in this case, stereo) data, has the effect of pre-scaling the data by the largest magnitude across all (both) channels before quantization. Consequently, in this example, the first channel would be normalized to unity amplitude, but the second channel would be scaled down to a very low amplitude (0.001). Alternatively, using the 'normc' scale setting as in Ex.6, the normalization can be applied on a per-channel basis, such that each channel is individually normalized to unity:
 
Ex.6 Normalize the data individually per channel, and save with same format as in Ex.1:
  wavout(z,44100,'..\WAVfiles\mywav.wav',[ ],[ ],'normc');
 
Now, to demonstrate the multichannel capabilities, first set up a matrix of audio data, say, four repeated channels of the original data set:
 

z = [y y y y]; %range [-1:+1]

   
Ex.7 Write this data to a (default) 16-bit (undithered), 4-channel WAV file (in default WAVE_FORMAT_PCM format), assigning a sample rate of 44.1 kHz:
 
  wavout(z,44100,'..\WAVfiles\mywav.wav');
   
Ex.8 Now save as before (default 16-bit, 44.1 kHz) but in WAVE_FORMAT_EXTENSIBLE format specifying a "3.1 surround sound" (4-channel) speaker layout (ChannelMask value of 15):
 
 
 

  wavformat.Format='ext';
  wavformat.ChannelMask=15;
  wavout(z,44100,'..\WAVfiles\mywav.wav',[ ],wavformat);
   
Ex.9 Now save in 18-bits (within 24-bit containers) using simple rectangular dither, assigning a sample rate of 48 kHz, and a "quadrophonic (4 corner)" speaker layout (ChannelMask value of 51):
 
 
  bits.ValidBits=18;
  bits.ContainerBits=24;
  bits.DitherMethod=1;
 

  wavformat.Format='ext';
  wavformat.ChannelMask=51;
  wavout(z,48000,'..\WAVfiles\mywav.wav',bits,wavformat);
   
Note that in the above examples, the sample rate written to the file was arbitrarily changed from example-to-example. Such arbitrary re-assigning of sample rate -- irrespective of the data -- has the effect of changing the frequency and duration of the signal when processed via a WAV player. For the sinusoidal test data (nominal frequency of 1 kHz for a sample rate of 44.1 kHz), this serves as an illustrative demonstration. However, for "real" audio data, it is generally unacceptable to play back the recorded signal at the "wrong" sample rate (since it leads to the classic "helium voice" distortion of time and pitch). Rather, when it is desired to change the sample rate of an audio file (e.g. to minimize the file size and storage requirements), it is necessary to resample the audio data in order to preserve the original pitch and duration when played at the new sample rate. The audioresample and wavresample functions are provided explicitly for this purpose.
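
The effect of re-assigning the sample rate without resampling can be quantified with simple arithmetic. The sketch below works it through for the test tone of Ex.2 (generated at 44.1 kHz, written with a sample rate of 32 kHz); the variable names are illustrative only.

```matlab
% Test tone as generated: 1 kHz at 44.1 kHz, 44101 samples (one second plus one sample)
fsOriginal = 44100;
f0         = 1000;
nSamples   = 44101;

% Sample rate written to the file header in Ex.2
fsAssigned = 32000;

% A player steps through the same samples at the assigned rate, so pitch and
% duration both change by the ratio of the two sample rates
playedFreq     = f0*fsAssigned/fsOriginal;   % approx. 725.6 Hz
playedDuration = nSamples/fsAssigned;        % approx. 1.378 s
```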
   
Ex.10 As a final example, we demonstrate how to use the wavout (and wavin) functions in a chunk-by-chunk mode whereby a WAV file can be read in and re-written chunk-by-chunk, thereby enabling WAV files of arbitrary length to be processed with a fixed memory allocation. See also waveffect for an example of how to use wavin and wavout to build a chunk-by-chunk effects processor. Refer to example 5 in the wavin help page for an example on reading arbitrary portions from arbitrary locations of a WAV file.
 
   
  % Set the chunk size for reading (can have an arbitrary value)
  Nchunk=1024;

  % Read first chunk of WAV file '..\WAVfiles\4channel.wav', leaving the input file
  % open, and returning the fdxi vector for use in the next read call instead of the file
  % name (i.e. for chunk-by-chunk reading):
  [y1,fs,fdxi]=wavin('..\WAVfiles\4channel.wav',[],1,Nchunk);
  % ... apply some process ...

  % Set the parameters of the output file as, for example, 24-bit, undithered, in
  % WAVE_FORMAT_EXTENSIBLE format with a "quadrophonic (4 corner)"
  % speaker layout (ChannelMask value of 51):
  bits=[24; 24];
  wavformat.Format='ext';
  wavformat.ChannelMask=51;

  % Now write the first chunk, preserving the input sample rate, leaving the output
  % file open (oflag=1), and returning the fdxo vector for use in the next write
  % call instead of the file name (i.e. for chunk-by-chunk writing)
  fdxo=wavout(y1,fs,'..\WAVfiles\mywav.wav',bits, ...
              wavformat,[],1);

  % Now prepare to loop through the remaining chunks...
  % Determine total length of the input file then set up chunk loop accordingly...
  InputFileInfo=wavinfo('..\WAVfiles\4channel.wav');
  Len=InputFileInfo.SamplesPerChannel;
  Ncycles=floor(Len/Nchunk)-1; % Chunks to go (after first)
  for i=1:Ncycles,
      % Read using fdx instead of filename, set skipsize=-1 to continue from
      % previous read, set oflag=1 to keep file open
      [ychunk,fs,fdxi]=wavin(fdxi,[],1,Nchunk,-1);
      % ... apply some process ...
      % Write using fdx instead of filename, set writeskip=-1 for continuation from
      % previous write, set oflag=1 to keep file open
      fdxo=wavout(ychunk,fs,fdxo,bits,wavformat,[],1,-1);
  end

  % Check if need to do tailend chunk
  Ntail=rem(Len,Nchunk);
  ychunk=[];
  if Ntail,
      % Read the tailend chunk and close the input file after reading (oflag=0)
      [ychunk,fs,fdxi]=wavin(fdxi,[],0,Ntail,-1);
      % ... apply some process ...
      % Write the tailend chunk and close the output file after writing (oflag=0)
      wavout(ychunk,fs,fdxo,bits,wavformat,[],0,-1);
  end;
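
The chunk bookkeeping used in Ex.10 (one first chunk, then floor(Len/Nchunk)-1 further full chunks, then a tail of rem(Len,Nchunk) samples) can be checked without any WAV files. The sketch below replays the same arithmetic on an in-memory vector; the vectors x and y merely stand in for the input and output files, and the sketch assumes Len is at least Nchunk.

```matlab
% Stand-ins for the input and output files
Len    = 10000;
Nchunk = 1024;
x   = (1:Len)';          % "input file" contents
y   = zeros(Len, 1);     % "output file" contents
pos = 0;                 % samples processed so far

% First chunk
idx = pos + (1:Nchunk);
y(idx) = x(idx);
pos = pos + Nchunk;

% Remaining full chunks
Ncycles = floor(Len/Nchunk) - 1;
for i = 1:Ncycles
    idx = pos + (1:Nchunk);
    y(idx) = x(idx);     % a real process would modify the chunk here
    pos = pos + Nchunk;
end

% Tail-end chunk, if the length is not a multiple of the chunk size
Ntail = rem(Len, Nchunk);
if Ntail
    idx = pos + (1:Ntail);
    y(idx) = x(idx);
    pos = pos + Ntail;
end
```

For these values there are 8 further full chunks and a 784-sample tail, and every sample is visited exactly once.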
 

© 2007 Sounds Logical. All rights reserved.