M-Pack 1: WAV file processing: MATLAB function reference
Version 1.2
Requires MATLAB 6.0 (R12) or later
Write data to a PCM WAV format audio file, either in its entirety or chunk-by-chunk. Supports both WAVE_FORMAT_PCM and WAVE_FORMAT_EXTENSIBLE multichannel uncompressed formats and any bit resolution between 2 and 32 inclusive. As an option, dither may be applied before quantisation, with a choice of standard dithering methods including noise-shaping with a user-specified custom FIR shaping filter.
By default, the following settings are assumed: 'pcm' format, 16-bit quantisation, no dither.
File format: m-file
Editable source code: yes
Utilises non-editable functions: no
Platform: PC/Windows
Required MATLAB Toolboxes: none (except core MATLAB)
Demo version limitations: p-code only (non-editable), 30 second WAV file length limit (then silence)
wavout(y,fs,filename);
wavout(y,fs,filename,bits);
wavout(y,fs,filename,bits,wavformat);
wavout(y,fs,filename,bits,wavformat,scale);
wavout(y,fs,filename,bits,wavformat,scale,oflag);
wavout(y,fs,filename,bits,wavformat,scale,oflag,writeskip);
fdx=wavout(y,fs,filename,...);
| Inputs: |
| y |
data matrix of dimension [samples*channels] i.e. each
column represents a separate channel, each row a separate
sample instance.
|
| fs |
sample rate (Hz)
|
| filename |
name of output WAV file (with optional .WAV extension)
or alternatively can be the fdx output from a previous
call to wavout
(e.g. for writing
chunk-by-chunk via successive calls).
|
| bits |
if scalar, corresponds to the number of valid bits to be used in the quantisation [default value of 16]. The value is rounded up to the nearest integer multiple of 8 to yield the storage "container size" [default value of 16].
if a two-vector, the first element corresponds to the number of valid bits to be used in the quantisation, and the second element corresponds to the "container size" for storage (rounded up if not an integer multiple of 8). See notes below.
if a structure, may have the fields described as follows:
-
ValidBits field (i.e. bits.ValidBits): as described
above [default value of 16]
- ContainerBits
field (i.e. bits.ContainerBits): as described above
[default value of 16]
- DitherMethod
field (i.e. bits.DitherMethod): numerical value
indicating which type of dithering method is applied
before quantisation. Possible values are:
- 0:
no dither [default]
- 1:
rectangular PDF dither, with a peak-to-peak
amplitude of 1*LSB
- 2:
triangular PDF dither, with a peak-to-peak amplitude
of 2*LSB
- 3:
triangular PDF with first-order high-pass noise
shaping
- 4:
triangular PDF with custom FIR noise shaping
filter
- DitherGain
field (i.e. bits.DitherGain): gain applied to amplitude
of dither [default value of 1] i.e. rectangular
PDF dither has amplitude of LSB*DitherGain, and
triangular PDF dither has amplitude of 2*LSB*DitherGain.
- NoiseShapeGain
field (i.e. bits.NoiseShapeGain): gain applied
to the feedback path in the case where noise
shaping is activated (i.e. for DitherMethod values
greater than 2) [default value of 1]
- NoiseShapeFIR
field: (i.e. bits.NoiseShapeFIR): vector of coefficients
for custom noise shaping filter (only valid for
DitherMethod = 4). An arbitrary filter of any order
may be specified. Default value is the following
fifth order filter: [2.033 -2.165 1.959 -1.590 0.6149]
taken from ref [2] p 851 (note:
designed for 44.1 kHz).
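The amplitudes quoted for DitherMethod 1 and 2 can be illustrated with a short sketch. It is not taken from the wavout source: the LSB size assumes a -1:+1 full-scale range, and the triangular dither is formed by differencing successive rectangular samples, following the "highpass" differencing mentioned in Ex.4. The variable names are illustrative only.

```matlab
% Illustrative sketch only; not the wavout implementation.
validBits  = 12;                 % number of valid bits (example value)
lsb        = 2^(1-validBits);    % ASSUMPTION: LSB size for a -1:+1 range
ditherGain = 1;                  % plays the role of bits.DitherGain
N          = 10000;              % number of dither samples

% Rectangular PDF dither: uniform on (-0.5,0.5)*LSB, i.e. 1*LSB peak-to-peak
rpdf = ditherGain*lsb*(rand(N,1)-0.5);

% Triangular PDF "highpass" dither: difference of successive rectangular
% samples, giving a 2*LSB peak-to-peak range
r    = ditherGain*lsb*(rand(N+1,1)-0.5);
tpdf = diff(r);

fprintf('RPDF peak-to-peak <= %g LSB\n', (max(rpdf)-min(rpdf))/lsb);
fprintf('TPDF peak-to-peak <= %g LSB\n', (max(tpdf)-min(tpdf))/lsb);
```

Setting ditherGain to a value other than 1 scales both ranges proportionally, which mirrors the description of the DitherGain field.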
Notes: ValidBits refers to the number of bits used
in the quantisation of each sample, and ContainerBits
refers to number of bits used to store each sample.
ValidBits can have any value from 2 up to and including
ContainerBits. Usually the value of ContainerBits
is the nearest integer multiple of 8 above the value
of ValidBits, but this does not need to be the case
(e.g. a 2-bit quantised signal can be stored in a
32-bit container, though this would be highly wasteful
in terms of disk space!)
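The rounding rule for the container size can be written in one line. The sketch below only reproduces the arithmetic described above (round up to the next integer multiple of 8); the variable names are illustrative and it does not call wavout.

```matlab
% Container size implied by a scalar "bits" value: round ValidBits up to
% the next integer multiple of 8.
validBits      = [2 7 8 12 16 18 24 32];
containerBits  = 8*ceil(validBits/8);
containerBytes = containerBits/8;

% Display one row per case: ValidBits, ContainerBits, bytes per sample
disp([validBits(:) containerBits(:) containerBytes(:)]);
```

For example, 12 valid bits map to a 16-bit container and 18 valid bits to a 24-bit container, matching the values used in Ex.2 and Ex.9.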
|
| wavformat |
if a string, possible values are 'pcm' [default] for WAVE_FORMAT_PCM or 'ext' for WAVE_FORMAT_EXTENSIBLE. The ChannelMask is automatically set to 0 (only relevant for WAVE_FORMAT_EXTENSIBLE).
if
a structure, should have two fields described as follows:
- Format
field (i.e. wavformat.Format) should be a string
with value 'pcm' or 'ext' as described above.
- ChannelMask
field (i.e. wavformat.ChannelMask) should contain
the "ChannelMask" variable (in decimal format) which
prescribes the multichannel speaker allocation for
WAVE_FORMAT_EXTENSIBLE. Type 'help chnmsk2spkrlist'
to read more about the ChannelMask property.
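As a rough illustration of how a decimal ChannelMask value is composed, the sketch below sums the standard WAVE_FORMAT_EXTENSIBLE speaker-position bits for the two layouts used in the examples on this page (15 and 51). The constant names are chosen here for readability and are not part of wavout; it does not use chnmsk2spkrlist.

```matlab
% Standard speaker-position bits (first six positions only)
FRONT_LEFT    = 1;
FRONT_RIGHT   = 2;
FRONT_CENTER  = 4;
LOW_FREQUENCY = 8;
BACK_LEFT     = 16;
BACK_RIGHT    = 32;

% "3.1 surround" layout: front left, front right, front centre, LFE
mask31   = FRONT_LEFT + FRONT_RIGHT + FRONT_CENTER + LOW_FREQUENCY;

% "quadrophonic (4 corner)" layout: front left/right, back left/right
maskQuad = FRONT_LEFT + FRONT_RIGHT + BACK_LEFT + BACK_RIGHT;

% Number of speakers encoded in a mask = number of set bits
nSpeakers = sum(bitget(maskQuad, 1:18));

fprintf('3.1 mask = %d, quad mask = %d (%d speakers)\n', ...
        mask31, maskQuad, nSpeakers);
```

The number of set bits would normally be expected to match the number of columns in y.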
|
| scale |
string
with value 'norm', 'normc', 'full' [default], or 'clip'.
These determine how the data is scaled before writing,
as follows:
-
'norm': data is normalised by the max absolute value
across all channels (i.e. ScaleFactor=1/(max(max(abs(y))))),
thus ensuring that the data is within the range
-1:+1 thereby avoiding digital clipping and foldover
when quantised. WARNING: when used in chunk-by-chunk
writing mode, each chunk will be individually scaled
which will result in a non-uniform scaling across
the entire file!
- 'normc':
data is normalised per channel by the max absolute
value per channel, thus ensuring that the data is
within the range -1:+1 thereby avoiding digital
clipping and foldover when quantised. WARNING: when
used in chunk-by-chunk writing mode, each chunk
will be individually scaled which will result in
a non-uniform scaling across the entire file!
- 'full':
assumes data is already scaled to within the range
-1:+1 WARNING: if data lies outside the -1:+1 range,
digital clipping and foldover will occur during
the quantisation process!
- 'clip':
data is clipped to within the range -1:+1 thereby
avoiding digital foldover when quantised. WARNING:
even though foldover is avoided, the clipping will
generally lead to audible distortion (albeit less
extreme than with foldover), so it is recommended
that the data is properly conditioned to fall within
the range -1:+1 before being written to a WAV file.
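The effect of the 'norm', 'normc' and 'clip' settings on the data can be sketched with plain MATLAB arithmetic. This follows the descriptions above and is not the wavout source; the test signal and variable names are made up for illustration.

```matlab
% Two-channel test data: channel 1 is 1000 times larger than channel 2
t = (0:7)'/8;
y = [1000*sin(2*pi*t), sin(2*pi*t)];

% 'norm': one scale factor taken across all channels
yNorm  = y / max(max(abs(y)));

% 'normc': one scale factor per channel (per column)
chanMax = max(abs(y), [], 1);
yNormc  = y ./ repmat(chanMax, size(y,1), 1);

% 'clip': hard-limit to the -1:+1 range without rescaling
yClip  = min(max(y, -1), 1);

disp(max(abs(yNorm),  [], 1));   % channel 2 ends up at about 0.001
disp(max(abs(yNormc), [], 1));   % both channels reach unity
disp(max(abs(yClip),  [], 1));   % both channels limited to 1
```

In chunk-by-chunk mode the maxima would be recomputed for every chunk, which is the source of the non-uniform scaling warned about above.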
|
| oflag |
flag
for determining whether or not to leave the file open
when exiting the function. Possible values are:
0: close on exit [default]
1:
leave open (e.g. for chunk-by-chunk writing)
|
| writeskip |
Location (measured in number of samples per channel from start of data section) from where to start current writing (only relevant if fdx is specified instead of filename). Use the value -1 [default] to continue from previous write.
|
| Note:
any of the following input arguments: bits, wavformat,
scale, and oflag may be substituted by [ ] to force
their respective default value(s). |
|
| Output: |
| fdx |
(optional) row vector (1X10) containing the following
elements:
- file
id (obtained from the fopen command)
- current
'per channel' byte position in file (i.e. not the
actual offset which takes into account all channels).
Required for navigation when using chunk-by-chunk
write mode.
- byte
offset from start of file to start of actual data
i.e. after all headers etc. Required for navigation
when using chunk-by-chunk write mode.
- number
of samples per channel in the entire file
- number
of channels
- "container
size" (in bytes) per data value. Note: the corresponding
container size in bits (= 8*ContainerBytes) is always
an integer multiple of 8.
- number
of valid bits of precision (may be less than the given
container size)
- WAVE
data format. Supported values are:
-
1 -- for WAVE_FORMAT_PCM i.e. uncompressed PCM,
"old style", fine for 8-bit and 16-bit, but
ambiguous for higher bit-resolutions
- 65534
-- for WAVE_FORMAT_EXTENSIBLE i.e. uncompressed
PCM, "new style" with multichannel speaker-location
options plus no ambiguity for high bit-resolutions
(see M-Pack 1 overview
and ref [1]).
- No
other formats are supported by this m-function
- sample
frequency in Hz
- the
"ChannelMask" variable (in decimal format) which
prescribes the multichannel speaker allocation for
WAVE_FORMAT_EXTENSIBLE. Type 'help chnmsk2spkrlist'
to read more about the ChannelMask property
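The sketch below unpacks a made-up fdx vector in the order listed above. The relation used to turn the 'per channel' byte position into an absolute file offset is an assumption inferred from the description (per-channel position multiplied by the number of channels, plus the data offset); it has not been checked against the wavout source.

```matlab
% Hypothetical fdx vector; all values are made up for illustration
fdx = [3 3072 68 44100 4 3 18 65534 48000 51];

fid            = fdx(1);    % file id from fopen
perChannelPos  = fdx(2);    % 'per channel' byte position
dataStart      = fdx(3);    % byte offset to start of data
nSamples       = fdx(4);    % samples per channel in the file
nChannels      = fdx(5);
containerBytes = fdx(6);
validBits      = fdx(7);

containerBits     = 8*containerBytes;              % always a multiple of 8
samplesPerChannel = perChannelPos/containerBytes;  % samples written so far

% ASSUMPTION: absolute offset = data start + per-channel position * channels
absolutePos = dataStart + perChannelPos*nChannels;

fprintf('%d samples/channel written, absolute offset %d bytes\n', ...
        samplesPerChannel, absolutePos);
```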
|
|
|
|
|
See
the M-Pack 1 overview
for a detailed discussion of the WAV format, quantization,
dithering, and noise-shaping techniques used in this
m-function.
Notes:
(i) wavout does not support wave-list data, nor does it write any peripheral header information (e.g. in the '.info' field) beyond the basic '.fmt' (audio format information).
(ii) for WAVE_FORMAT_EXTENSIBLE files, wavout only supports the PCM storage format (i.e. the SubFormat is automatically written corresponding to the Microsoft GUID KSDATAFORMAT_SUBTYPE_PCM).
Ref [1]: "Enhanced Audio Formats For Multi-Channel Configurations And High Bit Resolution", Windows Multimedia Group, Microsoft Corporation, 1999.
Ref [2]: "Minimally Audible Noise Shaping", Stanley P. Lipshitz, John Vanderkooy, and Robert A. Wannamaker, J. Audio Eng. Soc., Vol. 39, No. 11, November 1991.
audiodither: dithering of MATLAB data (without WAV writing)
wavin: WAV file reading
The following examples are all contained in the m-script file entitled xmplwavout.m.
First, generate a single channel (column) of audio data samples within the amplitude range -1:+1:
y=sin(2000*pi*(0:(1/44100):1)'); %amplitude [-1:+1]
| |
| Ex.1 |
Write
this data to a (default) 16-bit (undithered), mono WAV
file (in default WAVE_FORMAT_PCM format) named mywav.wav
in directory
..\WAVFiles,
assigning a sample rate of 44.1 kHz: |
| |
| |
wavout(y,44100,'..\WAVfiles\mywav.wav'); |
| |
| Ex.2 |
Now
save in 12-bits (undithered) within 16-bit containers,
and assigning a sample rate of 32 kHz: |
| |
| |
bits=[12; 16]; |
| |
wavout(y,32000,'..\WAVfiles\mywav.wav',bits); |
| |
| Ex.3 |
Same
as in Ex.2 but with the application of simple rectangular
dither with
a peak-to-peak amplitude of 1*LSB:
|
| |
| |
bits.ValidBits=12;
|
| |
bits.ContainerBits=16; |
| |
bits.DitherMethod=1; |
| |
wavout(y,32000,'..\WAVfiles\mywav.wav',bits); |
| |
|
| Ex.4 |
This
example demonstrates the effects of dither in a practical
application. First read a 16-bit WAV file containing
human speech sampled at 44.1 kHz. |
| |
| |
|
| |
[voice,fs]=wavin('..\WAVfiles\wavewarp.wav');
|
| |
|
| |
Listen
to the voice (requires a Windows-compatible soundcard)
|
| |
sound(voice,fs);
|
| |
|
| |
Now
re-quantize by writing to a WAV file in 7-bit format
(within 8-bit containers), without performing any dithering.
(A reduction from 16-bits to 7-bits has been chosen
because it clearly illustrates the point) |
| |
|
| |
bits.ValidBits=7;
|
| |
bits.ContainerBits=8;
|
| |
wavout(voice,fs,'..\WAVfiles\dither0.wav',bits);
|
| |
|
| |
Now
listen to this file. You will observe that the non-linear
modulation of the signal with the quantization noise
is clearly evident. The purpose of the dither will be
to suppress this distortion: |
| |
|
|
[newvoice,fs]=wavin('..\WAVfiles\dither0.wav'); |
|
sound(newvoice,fs);
|
| |
|
| |
Now
do the same again, but this time using simple additive
rectangular dither of amplitude 1 LSB: |
| |
|
| |
bits.DitherMethod=1; |
| |
wavout(voice,fs,'..\WAVfiles\dither1.wav',bits);
|
| |
|
| |
Now
listen to this file. You will observe that the non-linear
modulation of the signal with the quantization noise
has been suppressed by the dither, albeit with an increase
in background noise level: |
| |
|
| |
[newvoice,fs]=wavin('..\WAVfiles\dither1.wav'); |
| |
sound(newvoice,fs);
|
| |
|
| |
Now
do the same again, but this time with simple additive
triangular "highpass" dither of amplitude 2 LSB: |
| |
|
| |
bits.DitherMethod=2; |
| |
wavout(voice,fs,'..\WAVfiles\dither2.wav',bits);
|
| |
|
| |
Now
listen to this file. Again, the non-linear modulation
of the signal with the quantization noise has been suppressed
by the dither. The additional background noise is still
present but shifted to higher (less audible) frequencies
by virtue of the "highpass" differencing used to create
the triangular dither: |
| |
|
| |
[newvoice,fs]=wavin('..\WAVfiles\dither2.wav'); |
| |
sound(newvoice,fs);
|
| |
|
| |
Now
do the same again, but this time with simple (pure delay)
feedback noise-shaping in combination with the triangular
dither: |
| |
|
| |
bits.DitherMethod=3; |
| |
wavout(voice,fs,'..\WAVfiles\dither3.wav',bits);
|
| |
|
| |
Now
listen to this file. Again, the non-linear modulation
of the signal with the quantization noise has been suppressed
by the dither. The additional background noise is still
present but shifted to even higher (less audible) frequencies
by virtue of the "highpass" effect of the negative feedback
delay: |
| |
|
| |
[newvoice,fs]=wavin('..\WAVfiles\dither3.wav'); |
| |
sound(newvoice,fs);
|
| |
|
| |
Now
do the same again, but with a two-coefficient FIR noise-shaping
filter (from ref [2] p 851, designed
from psychoacoustical considerations): |
| |
|
| |
bits.DitherMethod=4; |
| |
bits.NoiseShapeFIR=[1.537 -0.8367]; |
| |
wavout(voice,fs,'..\WAVfiles\dither4.wav',bits);
|
| |
|
| |
Now
listen to this file. Again, the non-linear modulation
of the signal with the quantization noise has been suppressed
by the dither. The additional background noise is still
present but even less audible due to the psychoacoustical
noise-shaping: |
| |
|
| |
[newvoice,fs]=wavin('..\WAVfiles\dither4.wav'); |
| |
sound(newvoice,fs);
|
| |
|
| |
Finally,
do the same again, but with (the default) five-coefficient
FIR noise-shaping filter (from ref [2]
p 851, designed from psychoacoustical considerations):
|
| |
|
| |
bits.DitherMethod=4; |
| |
bits.NoiseShapeFIR=[2.033 -2.165 1.959 -1.590 0.6149]; |
| |
wavout(voice,fs,'..\WAVfiles\dither5.wav',bits);
|
| |
|
| |
Now
listen to this file. Again, the non-linear modulation
of the signal with the quantization noise has been suppressed
by the dither. The additional background noise is still
present but even less audible due to the more elaborate
psychoacoustical noise-shaping: |
| |
|
| |
[newvoice,fs]=wavin('..\WAVfiles\dither5.wav'); |
| |
sound(newvoice,fs);
|
| |
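Ex.4 describes DitherMethod=3 as pure-delay feedback noise shaping combined with triangular dither. The following is a minimal textbook error-feedback sketch of that idea, written under stated assumptions (LSB size for a -1:+1 range, dither added inside the loop, feedback gain of 1); it is not the wavout implementation and the variable names are illustrative.

```matlab
% First-order (pure delay) error-feedback quantiser with TPDF dither
validBits = 7;
lsb       = 2^(1-validBits);         % ASSUMPTION: LSB for a -1:+1 range
nsGain    = 1;                       % plays the role of NoiseShapeGain
x = 0.5*sin(2*pi*1000*(0:999)'/44100);   % 1 kHz test tone at 44.1 kHz

r = lsb*(rand(numel(x)+1,1)-0.5);    % rectangular PDF source
d = diff(r);                         % triangular PDF dither (2*LSB p-p)

q = zeros(size(x));                  % quantised output
e = 0;                               % previous quantisation error
for n = 1:numel(x)
    v    = x(n) - nsGain*e;              % subtract delayed error
    q(n) = lsb*round((v + d(n))/lsb);    % dither, then quantise
    e    = q(n) - v;                     % total error of this sample
end

% Output error is e(n)-e(n-1): a first difference, i.e. high-pass shaped
fprintf('max |q-x| = %.2f LSB\n', max(abs(q-x))/lsb);
```

Replacing the single delayed error term with an FIR-filtered error history would give the general form used with a custom NoiseShapeFIR vector.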
|
| The
next example deals with scaling. Multiply
the original test data by a large factor (e.g. 1000)
so that the amplitude values lie well outside the range
[-1:+1]: |
| |
z = y*1000; %range [-1000:+1000]
|
| If
this were written to a WAV file using the (default)
'full' scale setting (as implicit in all previous examples),
the data would be garbled due to digital foldover occurring
for all amplitudes outside the expected range. Instead,
as in the next example, use the 'norm' scale setting
which has the effect of pre-scaling the data to the
[-1:+1] range before quantization: |
| |
| Ex.5 |
Normalize
the data and save with same format as in Ex.1: |
| |
wavout(z,44100,'..\WAVfiles\mywav.wav',[ ],[ ],'norm'); |
| |
| Now
create a stereo data matrix from the original mono (by
duplicating the data into a second column), then multiply
the first channel by a large factor (e.g. 1000) so that
the amplitude values for this channel lie well outside
the range [-1:+1]: |
| |
z = [y*1000 y];
|
| As
in the previous example, this can be normalized using
the 'norm' scale setting, which, for multichannel (in
this case, stereo) data, has the effect of pre-scaling
the data by the largest magnitude across all (both)
channels before quantization. Consequently, in this
example, the first channel would be normalized to unity
amplitude, but the second channel would be scaled down
to a very low amplitude (0.001). Alternatively, using
the 'normc' scale setting as in Ex.6, the normalization
can be applied on a per-channel basis, such that each
channel is individually normalized to unity: |
| |
| Ex.6 |
Normalize
the data individually per channel, and save with same
format as in Ex.1: |
| |
wavout(z,44100,'..\WAVfiles\mywav.wav',[ ],[ ],'normc'); |
| |
| Now,
to demonstrate the multichannel capabilities, first
set up a matrix of audio data, say, four repeated channels
of the original data set: |
| |
z = [y y y y]; %range [-1:+1]
|
| |
|
| Ex.7 |
Write
this data to a (default) 16-bit (undithered), 4-channel
WAV file (in default WAVE_FORMAT_PCM format),
assigning a sample rate of 44.1 kHz: |
| |
| |
wavout(z,44100,'..\WAVfiles\mywav.wav'); |
| |
|
| Ex.8 |
Now
save as before (default 16-bit, 44.1 kHz) but in WAVE_FORMAT_EXTENSIBLE
format specifying a "3.1 surround sound" (4-channel)
speaker layout (ChannelMask value of 15): |
| |
| |
| |
wavformat.Format='ext';
|
| |
wavformat.ChannelMask=15; |
| |
wavout(z,44100,'..\WAVfiles\mywav.wav',[ ],wavformat); |
| |
|
| Ex.9 |
Now
save in 18-bits (within 24-bit containers) using simple
rectangular dither, assigning
a sample rate of 48 kHz, and a "quadrophonic
(4 corner)" speaker layout (ChannelMask value of
51): |
| |
| |
| |
bits.ValidBits=18;
|
| |
bits.ContainerBits=24; |
| |
bits.DitherMethod=1; |
| |
wavformat.Format='ext';
|
| |
wavformat.ChannelMask=51; |
| |
wavout(z,48000,'..\WAVfiles\mywav.wav',bits,wavformat); |
| |
|
| Note
that in the above examples, the sample rate written
to the file was arbitrarily changed from example-to-example.
Such arbitrary re-assigning of sample rate -- irrespective
of the data -- has the effect of changing the frequency
and duration of the signal when processed via a WAV
player. For the sinusoidal test data (nominal frequency
of 1 kHz for a sample rate of 44.1 kHz), this serves
as an illustrative demonstration. However, for "real"
audio data, it is generally unacceptable to play back
the recorded signal at the "wrong" sample
rate (since it leads to the classic "helium voice"
distortion of time and pitch). Rather, when it is desired
to change the sample rate of an audio file (e.g. to
minimize the file size and storage requirements), it
is necessary to resample the audio data in order
to preserve the original pitch and duration when played
at the new sample rate. The audioresample
and wavresample
functions are provided explicitly for this purpose. |
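The pitch and duration shift caused by re-assigning the sample rate is simple arithmetic. The sketch below works it through for the 1 kHz test tone when the 44.1 kHz data is written with a 32 kHz rate, as in Ex.2; the variable names are illustrative.

```matlab
fsOrig = 44100;              % rate at which the data was generated (Hz)
fsNew  = 32000;              % rate written to the WAV header (Hz)
f0     = 1000;               % nominal tone frequency at fsOrig (Hz)
N      = 44101;              % samples in y = sin(2000*pi*(0:(1/44100):1)')

fPlayed   = f0*fsNew/fsOrig; % frequency heard on playback
durOrig   = N/fsOrig;        % intended duration (s)
durPlayed = N/fsNew;         % duration on playback (s)

fprintf('Tone plays at %.1f Hz instead of %d Hz\n', fPlayed, f0);
fprintf('Duration is %.3f s instead of %.3f s\n', durPlayed, durOrig);
```

Frequency scales by fsNew/fsOrig and duration by the inverse ratio, which is why resampling is needed when the original pitch and length must be preserved.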
| |
|
| Ex.10 |
As a final example, we demonstrate how to use the wavout (and wavin) functions in chunk-by-chunk mode, whereby a WAV file can be read in and re-written chunk-by-chunk, thereby enabling WAV files of arbitrary length to be processed with a fixed memory allocation. See also waveffect for an example of how to use wavin and wavout to build a chunk-by-chunk effects processor. Refer to example 5 in the wavin help page for an example of reading arbitrary portions from arbitrary locations of a WAV file. |
| |
| |
% Set the chunk size for reading (can have arbitrary value)
Nchunk=1024;

% Read first chunk of WAV file '..\WAVfiles\4channel.wav', leaving the input file
% open, and returning the fdxi vector for use in the next read call instead of the file
% name (i.e. for chunk-by-chunk reading):
[y1,fs,fdxi]=wavin('..\WAVfiles\4channel.wav',[],1,Nchunk);

% ... apply some process ...

% Set the parameters of the output file as, for example, 24-bit, undithered, in
% WAVE_FORMAT_EXTENSIBLE format with a "quadrophonic (4 corner)"
% speaker layout (ChannelMask value of 51):
bits=[24; 24];
wavformat.Format='ext';
wavformat.ChannelMask=51;

% Now write the first chunk, preserving the input sample rate, leaving the output
% file open (oflag=1), and returning the fdxo vector for use in the next write
% call instead of the file name (i.e. for chunk-by-chunk writing)
fdxo=wavout(y1,fs,'..\WAVfiles\mywav.wav',bits,wavformat,[],1);

% Now prepare to loop through the remaining chunks...
% Determine total length of the input file then set up chunk loop accordingly...
InputFileInfo=wavinfo('..\WAVfiles\4channel.wav');
Len=InputFileInfo.SamplesPerChannel;
Ncycles=floor(Len/Nchunk)-1; % Chunks to go (after first)
for i=1:Ncycles,
    % Read using fdx instead of filename, set skipsize=-1 to continue from
    % previous read, set oflag=1 to keep file open
    [ychunk,fs,fdxi]=wavin(fdxi,[],1,Nchunk,-1);
    % ... apply some process ...
    % Write using fdx instead of filename, set writeskip=-1 for continuation from
    % previous write, set oflag=1 to keep file open
    fdxo=wavout(ychunk,fs,fdxo,bits,wavformat,[],1,-1);
end

% Check if need to do tailend chunk
Ntail=rem(Len,Nchunk);
ychunk=[];
if Ntail,
    % Read the tailend chunk and close the input file after reading (oflag=0)
    [ychunk,fs,fdxi]=wavin(fdxi,[],0,Ntail,-1);
    % ... apply some process ...
    % Write the tailend chunk and close the output file after writing (oflag=0)
    wavout(ychunk,fs,fdxo,bits,wavformat,[],0,-1);
end;
|
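The chunk bookkeeping in Ex.10 can be checked in isolation. The sketch below uses a made-up file length and repeats only the arithmetic (no wavin, wavinfo or wavout calls) to confirm that the first chunk, the looped chunks and the tail together cover every sample.

```matlab
Len    = 10000;                      % samples per channel (made-up value)
Nchunk = 1024;                       % chunk size, as in Ex.10

Ncycles = floor(Len/Nchunk) - 1;     % full chunks after the first one
Ntail   = rem(Len, Nchunk);          % leftover samples in the tail chunk

% First chunk + looped chunks + tail should account for the whole file
total = Nchunk + Ncycles*Nchunk + Ntail;

fprintf('%d looped chunks, tail of %d samples, total %d of %d\n', ...
        Ncycles, Ntail, total, Len);
```

This arithmetic assumes Len is at least Nchunk; for a shorter file Ncycles becomes -1 and the totals no longer match, so such files would need separate handling.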