minimumphase filter kernels and partitioned convolution
Previous pages described parametric
Fourier filters which can do
radical and precise filtering, but which suffer from considerable
latency, due to FFT length (25  50 ms typically). As it turns out, it
is possible to
reduce that latency to a few samples, using a combination of two
techniques, both invented decades ago. The first one, homomorphic
filtering, reorganizes the filterkernel, and the second, partitioned
convolution, breaks up fast convolution in small blocks.
zerophase filter kernel 
When a filter kernel (impulse response) is created by the inverse
Fourier transform of a magnitude spectrum, that filter kernel is
symmetric by definition: a zerophase filter. The above plot shows an
example. The anticausal coefficients introduce latency. The longer
filter, the more latency. Radical filters are long by definition.
It
is much harder to create a purely causal filter kernel from a given
magnitude
spectrum. The causal filter is not simply the (scaled) right
half of the equivalent zerophase
filter.
In the article 'Homomorphic
Prediction' by Alan Oppenheim &c. (1975), the following
description is given: signal
x[n] can
be considered the convolution of it's minimum and maximumphase
components. And in four simple equations, the
whole theory of 'homomorphic processing' is outlined. With help of this
text, I
developed a minimumphase FIR generator in Pure Data, step by step. The
convolution of minimum and maximumphase
components, making up the zerophase signal x[n], can be written like
this:
If you manage to separate these components of a filter kernel x[n],
you have an anticausal and causal filter. But how can you separate two
things which are convolved, if you don't at least know one of the
originals? Seems impossible.
Now, think of this: convolution in signal domain is complex
multiplication in frequency domain. It happens to be the case that
positive and negative frequencies live separated here. So, couldn't we
perform complex division to get
minphase and maxphase components? Ehhm.... that doesn't work.
The imaginary parts of the zerophase spectrum all have value zero. To
split a zerophase spectrum in minimum and maximumphase components,
is like splitting matter and antimatter out of the void.
Next step is to express the magnitude spectrum
independently
from the phase spectrum. On the zplane, where the spectrum is
initially expressed, this is not the case. The zplane has rectangular
coordinates, cosine and sine components, which can be recalculated into
complex exponentials, 'polar coordinates'.
Magnitude is
expressed as radius or amplitude, and the phase aspect as arc length on
the unit
circle. That is not orthogonal, as is illustrated below:
The logarithms of the complex exponentials must be taken, which transforms the zplane to an splane by unwinding the unit circle.
On the splane, the natural logarithm of the amplitude is on the xaxis, and the natural logarithm of the zplane's e^{iy}, being the phase, is on the yaxis. Since ln(0) = inf, the amplitudes' lower limit must be restricted to some small value, subliminal in the sense of audio perception. The number, regularly called 'epsilon', could have value 1/million for example. The natural logarithm of 1/million is about 13.81551. So radius lengths between 1/million and the unit circle would then translate to logarithmic magnitude values between 13.81551 and zero.
On the splane, minimum and maximumphase components are no longer
a product, but a
sum, because the complex logarithms of all data were taken. So the
phaseseparation problem is
simplified from deconvolution to division
to subtraction. But we are still in frequency domain, where the
opposite phase components have cancelled each other, with no way to
split them into the constituent parts.
amplitudes 
logarithms of amplitudes 
On to the next domain. The spectral data will now be transformed to
a
'cepstrum', by applying
inverse Fourier transform. The fantasy term 'cepstrum' is an indication
of the domain's abstractness. What data is on the x and iy axes there?
For the purpose under concern, I find it useful to think of the
cepstrum as a signal, reorganized according to the geometry of the
splane instead of the zplane. If the original signal was pure real,
the spectrum was symmetric, and the cepstral data is again pure real.
Also, the cepstral coefficients of a zerophase spectrum
concentrate symmetrically around index zero, like it is the case for
zerophase signals.
It seems to me that the cepstrum represents information about the
signal in a negated form: components not or hardly present in the
signal will have most energy in the cepstrum. After all, a white
spectrum, if normalized, has all magnitudes zero on the splane so
there would be nothing in the cepstrum. Hmm, weird idea.
In cepstral domain, we can finally split the now reorganized signal
into it's phase components. All the maximumphase coefficients are on
the left and all the minimumphase components are on the right. In the
center is a point where they overlap, corresponding to the identity
index x[0] in the zerophase signal. The value in this point belongs
half to the maximumphase and half to the minimumphase part. Here is
how the cepstrum
coefficients belonging to the earlier plotted FIR are separated:
sum of cepstrum coefficients 
maximumphase cepstrum coefficients 
minimumphase cepstrum coefficients 
In order to get a minimumphase IR with the same magnitude as
before, the minimumphase cepstrum coefficients are multiplied by two:
minimumphase cepstrum coefficients multiplied by 2 
The result of this action is a singlesided cepstrum. When this
singlesided cepstrum is Fouriertransformed back to the splane, the
splane coordinates will no longer be pure real. It is now a
minimumphase spectrum, with real and imaginary coefficients. The
sequence of imaginary coefficients is
orthogonal to the sequence of real coefficients. Much like an 'analytic
signal', which is the transform of a singlesided spectrum.
logarithms of amplitudes 
phases 
From the splane, it's now back to zplane and from there to signal
domain, where the IR appears as a minimumphase filter. It is a causal
sequence. For comparison, the zerophase and minimumphase equivalents
are shown together below. They both have the same magnitude spectrum.
zerophase filter kernel 
minimumphase filter kernel 
When FIR filter kernels are created directly from an arbitrary
magnitude spectrum, there is a risk that the spectrum sequence will
contain sharp edges. In a signal, sharp edges introduce alias
frequencies, they simply should not be there. In a spectrum, they
should not be there either because they introduce alias phases in the
signal. The easiest way to avoid them is to lopassfilter the
magnitude spectrum by default. This erases any possible sharp edge,
just to be sure. Alternatively,
the filter kernel
could be windowed in signal domain.
For the purpose of this discussion I'll use the Hann window as the
example. More sophisticated windows exist, but the Hann window is
attractive as
it's Fourier transform has only three coefficients. Multiplication with
the Hann window in one domain is a threepoint convolution in the other
domain.
Hann window 
Hann window Fourier transform 
Even if you would accidentally define a brickwall spectrum, the
lopass filtering will remove the sharp edges.

For a zerophase filter, it works straightforwardly. Sidelobes are
reduced to an acceptable level. Compare the two frequency response
plots below, for the case of an eight bin wide brickwall spectrum and
it's lopassfiltered version:
measured frequency response for brickwall spectrum 
measured frequency response for lopassfiltered brickwall spectrum 
Now comes the bad news. When the lopassfiltered brickwall is
recalculated into logarithms of amplitudes, the magnitude spectrum
appears on the splane with discontinuities again:
If this logarithmic spectrum is transformed to cepstrum rightaway,
there will still be alias phases in the cepstrum. In the sense of
lopass filtering, the zplane and splane data are simply
incompatible! And look how the 'lopassfiltered' minimumphase
brickwall performs in reality:
measured frequency response for lopassfiltered minimumphase brickwall spectrum 
In official methods, the problem with splane aliases is solved by
upsampling the
spectrum. Upsampling makes the spectrum smoother, and sharp edges
disappear without data loss. It is 'most efficiently' done by
zeropadding the filter kernel in time domain, then use very large
FFT's to go to spectrum and cepstrum. The FFT sizes needed for this are
discourageing. Is it possible to compute how many times the IR must be
upsampled, in order to erase sharp edges in an splane spectrum? It
depends on epsilon, for one thing. If epsilon is 1/million and it's
natural log is 13.81551, the neighbour of epsilon must be exp(0.75 *
13.31551). Oops, that is about 0.00005. In the lopassfiltered
brickwall spectrum, the neighbour of epsilon is 0.25. It is indeed not
uncommon to zeropad to a thousand times the IR length. For my 2048 pt
Fourier filters,
that would mean twomegapoint FFT's, to find a minimumphase filter!
Completely ridiculous and out of
the question, for a realtime performance application.
Instead, I decided to just window the
cepstrum. This is equivalent to lopass filtering of the logaritmic
magnitudes. Of course, this is utterly incorrect from a theoretical
perspective. After all the operations, the doublyfiltered
brickwall amplitudes reappear like this on the zplane:
But hey, it helps. When the real frequency response associated with
the resulting
spectrum coefficients is tested, it turns out to perform no better
than the zerophase unfiltered
brickwall spectrum, but much better than with unwindowed cepstrum. Here
it is:
measured frequency response for lopassfiltered minimumphase brickwall spectrum, cepstrum windowed 
For radical filter settings, sidebands are very well perceivable in
test conditions with a sine sweep. I'll have to put up with that, till
I invent some decent alternative for the excessive upsampling.
By coincidence, my parametric Fourier filter (as described on previous pages) is to a
large extent immune to splane discontinuities. Look at the generic
spectrum shape below. The filter definition is based on a logarithmic
sine sweep,
raised to a power and clipped at the top. The small numbers at the
'foot of the mountain' happen to translate well to a nonaliased
splane magnitude spectrum.
parametric Fourier filter lobe (linear scales) 
Just like there's two ways to make an analytic signal via frequency
domain, there's also two ways to make a minimumphase signal. In the
above description, the cepstrum was made singlesided, and an analytic
sequence resulted as a spectrum on the splane. Instead, Hilbert
Transform could be performed on the splane magnitudes, in order to
derive the phase spectrum as a pi/2 radian rotated sequence. The two
methods were illustrated for signals on the page 'complexify
a real signal'. Phase separation by Hilbert transform is described
by Niranjan DameraVenkata &c. in [1].
In order to rotate all phases of a sequence over pi/2 radian, it's
Fourier transform must be multiplied with i for the positive
frequencies and i for the negative frequencies. In the plot below, that
is done for the example cepstrum. The positive side is multiplied by
1, and rotation by i is simply accomplished by feeding the cepstrum in
the imaginary input of the FFT, back on the way to splane. The
identity point, 'DC', is eliminated, like the Nyquist point should.
They wouldn't survive the imaginary side of Fourier transform anyway.
cepstrum 
cepstrum made antisymmetric 
The phase spectrum now appears at the real output of the FFT, and it
must be combined with the original magnitude spectrum to compute the
zplane coordinates. The result is almost identical to the earlier
described 'homomorphic' method. However, for the 'brickwall cases' the
Hilbert transform method seems to be slightly more advantageous, as the
original amplitudes are kept intact, whatever windowing is viciously
done on the cepstrum.
measured frequency response for lopassfiltered minimumphase brickwall spectrum, cepstrum windowed, Hilbert transform method 
When the cepstral multiplications of both methods (homomorphic and
Hilbert transform) are compared, a striking similarity appears. In both
cases, it is a (scaled / flipped / horizontally shifted) unit step
function with exceptions for DC and Nyquist point. In addition, the
Hilbert transform method requires multiplication by i.
homomorphic cepstral multiplication 
Hilbert method cepstral multiplication 
A long zerophase FIR is in fact a weird filter solution. If you
think of a filter as a series of reflections, the zerophase filter
produces reflections which precede the direct response, preflections
one could say. Even though the filter reflections are not audible as
echo's, the preflections are unnatural. If you listen well, they can be
perceived as a slurping sound, like a reversed playback, but very
short. I only became aware of them when comparing my filters, while
feeding an impulse for latency testing with
radical settings. In real
world musical situations, preflections won't be perceived as such, but
they can make the sound muddy. In contrast, reflections which follow
the direct response are natural and do not compromise the character of
a sound.
zerophase impulse response 
minimumphase impulse response 
So, if you perceive a minimumphase response as more direct and
present, it is not only due to reduced latency but also to the more
natural reflection pattern.
The minimumphase IR can only reveal it's fast response when the input
sound is not processed in filtersize blocks. It could be processed as
plain timedomain convolution, but my favourite filters are up to a
couple thousand coefficients, way too CPUintensive for simple
convolution. With 'partitioned convolution', a balance between
efficiency and directness of response can be found. The technique uses
FFT to perform fast convolution in small blocks, called partitions.
minimumphase IR divided in partitions 
Each partition of the IR is zeropadded and separately transformed
to frequency domain. The input signal is partitioned, zeropadded and
transformed likewise. Zeropadding is necessary to avoid circular
convolution. Complex multiplications in frequency domain replace time
domain convolution as usual. Overlapadd and overlapsave algorithms
seem to be used both in partitioned convolution. See page 'fast convolution'
for illustrations.
partitions are processed and summed to the output 
A more detailed scheme of operations can be found in this article by
Angelo Farina: REALTIME
PARTITIONED CONVOLUTION FOR AMBIOPHONICS SURROUND SOUND. I still
have to write my own partitioned convolution to learn more about it.
For the moment I'm happy to use Benjamin Saylor's [partconv~] object in
Pure Data.
Remarkably, partitioned convolution works with unwindowed input and
output signal, in contrast with regular fast convolution. The window is
supposed to be implemented in the filter kernel. In itself, this is
advantageous as you need not waste CPU time on windowed overlapping.
But a welcome sideeffect of windowed output is now sorely missed: the
automatic gain crossover at the moments of filter kernel switch. The
sudden filter kernel replacement produces a nasty spike when an input
signal is present. Even when the signal is in the filter's stop band!
A windowed overlap of the same signal with old and new filter is
just not available. The fact that partitioned convolution does not work
with block size equal to the filter size disables that option by
definition. So what to do about it then? The output gain could be
temporarily shut off to avoid the spike, but a short gain modification
sounds like a spike as well.
I found a practical secondbest solution in a granular synthesis
equivalent. The output is shortly and smoothly reduced, and
filtersized granule of delayed sound is copied over the location. This
gives the wellknown amplitude modulation effect associated with cheap
pitch shifting (see illustrations on page 'pitch
shifting'), but it is at least much better than spikes.
The stuff described on this page is implemented in a Pure Data demo
patch. If you have Pdextended 0.42.5 installed you can use it
rightaway. Otherwise download and install Pdextended for your
platform, it's fairly easy (http://puredata.info). Get the zipped
demo patch from the link below the screenshot.
minimumphaseFIRdemo.pd.zip,
Pure Data patch for Pdextended, 8 KB 
reference [1]:
Design of Optimal MinimumPhase Digital FIR Filters Using Discrete
Hilbert Transform  Niranjan DameraVenkata, Brian L. Evans, and Shawn
R. McCaslin  IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 48, NO. 5,
MAY 2000