# Windowing & Frequency Domain Filtering

While the page FFT Window and Overlap illustrated some minute details of windowing in general, I now want to find the best windowing strategy for spectral filtering. My parametric Fourier filter routine has the following basic filter spectrum curve:

A logarithmic sweep, clipped at the top, with variable width and position. When shifted over the frequency range, the bandwidth is adjusted automatically so the Q factor is retained.

In a log sweep the left side slope is always the steepest, and it gets steeper automatically when the curve is shifted towards the lower frequencies. Steep flanks in a filter spectrum can cause trouble, and since I want to push it to the limit, I need to know where the limit is and what it looks like.

Allegedly, brickwall filtering exceeds the limit of what can be done. I have learned that at school, but I could never imagine what the precise effect is. Therefore I will now perform some simple spectrum-mutilation experiments and see what happens.

Below, a test function is plotted. It is a windowed cosine of periodicity 3, that is, three periods per FFT frame. One could just as well call it a modulated window function.

The spectrum coefficients of the modulated Hann window are, in magnitude:

X[2] = 0.125
X[3] = 0.25
X[4] = 0.125
and their conjugates
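These values can be checked with a few lines of Python. This is only a minimal sketch: the frame size `N = 32`, the naive `dft` helper with 1/N scaling, and the window convention `0.5 - 0.5*cos(2*pi*n/N)` are assumptions of mine, not taken from the filter routine itself. Under that convention X[2] and X[4] come out with the opposite sign of X[3], which is why the script prints magnitudes.

```python
import cmath
import math

N = 32  # frame size; an arbitrary choice for this sketch

def hann(n, size):
    """Hann window, zero at the frame start (assumed convention)."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / size)

def dft(x):
    """Naive DFT scaled by 1/N, so a unit cosine yields two coefficients of 0.5."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size)) / size
            for k in range(size)]

# Cosine of periodicity 3 (three periods per frame), multiplied by the window.
frame = [hann(n, N) * math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
X = dft(frame)

# Print the three coefficients and their conjugates.
for k in (2, 3, 4, N - 4, N - 3, N - 2):
    print(f"|X[{k}]| = {abs(X[k]):.3f}")
```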

Mutilating the wave's spectrum with surgical precision, I will now remove the X[4] coefficient and its conjugate from the spectrum, and revert to time domain:

Oops. Seeing the plot, it is hard to imagine that I could not imagine this. The window is partly undone. That is how I perceive it, because I would like to filter any frequency but not the window. But this is only one way to perceive the effect. Let us think a little longer about it. In fact, the original coefficients represented:

0.25 * cos((2*pi*x/N)*2)   second harmonic
0.5 *  cos((2*pi*x/N)*3)    third harmonic
0.25 * cos((2*pi*x/N)*4)   fourth harmonic

The third harmonic I consider the test input frequency, and the second and fourth harmonic are the sum and difference frequencies resulting from multiplication with the window. The combination appears as a windowed function within the FFT frame. Successive frames look like this:

With coefficient X[4] removed, the reconstruction is still periodic without discontinuities, and contains two frequencies of unequal amplitude:
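In place of a plot, here is a small sketch of the same mutilation. It assumes a frame size of `N = 32`, a Hann window written as `0.5 - 0.5*cos(2*pi*n/N)`, and a cosine that starts at the frame start; the helper names are mine. With those assumptions the side terms carry a minus sign, so what remains after removing X[4] is `0.5*cos(3θ) - 0.25*cos(2θ)`, and the frame no longer starts at zero.

```python
import cmath
import math

N = 32  # frame size; an arbitrary choice for this sketch

def hann(n):
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / N)

def dft(x):
    """Naive DFT scaled by 1/N."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def idft(X):
    """Inverse of dft(); returns the real part of the reconstruction."""
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real
            for n in range(N)]

theta = [2 * math.pi * n / N for n in range(N)]
frame = [hann(n) * math.cos(3 * theta[n]) for n in range(N)]

X = dft(frame)
X[4] = 0        # remove the fourth harmonic ...
X[N - 4] = 0    # ... and its conjugate
y = idft(X)

# What is left: the third and second harmonic, with unequal amplitudes.
expected = [0.5 * math.cos(3 * t) - 0.25 * math.cos(2 * t) for t in theta]

print("windowed frame starts at", round(frame[0], 6))
print("mutilated frame starts at", round(y[0], 6))
```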

But these are not the signals that we are going to hear. I have to find the result of overlapping frames. With two times overlap, each successive frame is time-shifted by N/2. The window function remains centered in the frame, but the input functions are phase-shifted, and for cosines of periodicity 1 and 3 this happens to be a sign inversion. The sum of overlapping frames, each still having all three harmonics, is exactly that single harmonic 3:

And here is the sum of overlapping frames where the X[4] coefficient was taken out:

Amazing! The original input is still restored. This must be sheer coincidence, or not? Was my example too simplistic to reveal the downside effects?
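The overlap experiment is easy to repeat numerically. The sketch below is a minimal version under my own assumptions: frame size `N = 32`, a Hann analysis window, no resynthesis window, two times overlap, and a test stream of six frame lengths. It removes X[4] and its conjugate in every frame and then compares the overlap-added output with the input, leaving out the first and last frame length where fewer frames overlap.

```python
import cmath
import math

N = 32          # frame size (assumption)
HOP = N // 2    # two times overlap
TOTAL = 6 * N   # length of the test stream

def hann(n):
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / N)

def dft(x):
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def idft(X):
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real
            for n in range(N)]

# Continuous cosine stream with three periods per frame length.
signal = [math.cos(2 * math.pi * 3 * t / N) for t in range(TOTAL)]

def overlap_add(drop_bin):
    """Window each frame, zero one bin and its conjugate, sum the frames."""
    out = [0.0] * TOTAL
    for start in range(0, TOTAL - N + 1, HOP):
        X = dft([hann(n) * signal[start + n] for n in range(N)])
        X[drop_bin] = 0
        X[N - drop_bin] = 0
        for n, value in enumerate(idft(X)):
            out[start + n] += value
    return out

restored = overlap_add(drop_bin=4)
interior = range(N, TOTAL - N)   # region covered by two frames everywhere
max_err = max(abs(restored[t] - signal[t]) for t in interior)
print("largest deviation from the input:", max_err)
```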

Yes, the example input frequency harmonised with the frame size, so it is an exceptional case. Still, it shows a mechanism that is also at work for non-harmonic frequencies. To understand what is happening here, we must temporarily adopt another perspective on windowing and overlap. The window is to be perceived as a set of two (or more) functions. One is the DC component, and its task is to preserve the original signal with a certain magnitude factor, like 0.5. The other function is a modulator, converting the original input frequencies into sum and difference frequencies, also with a magnitude around 0.5. In the Hann window, and some others, the modulator is just the FFT's fundamental cosine. That is an odd harmonic, and from this comes the special effect in overlapping FFT frames. Every half period, its sign is flipped. So in overlapping frames, though the window is always the same, its cosine component is flipped each time relative to the analysed signal. I hope the following figure can make this clear:

Notice that the window's DC component does not flip sign. Therefore, the original frequencies are analysed and restored as they are, with amplitude factor 0.5 + 0.5 = 1 for the overlapping frames. The sum and difference frequencies however, as generated by the sign-flipping cosine component, have different signs for the even- and odd-numbered FFT frames. They cancel themselves in the sum signal as it is reconstructed. The fattened main lobes in a spectrum are partly built of antimatter!
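The cancellation can be written out in two lines. This is a sketch under the assumptions that the Hann window is $w(n) = \tfrac12 - \tfrac12\cos(2\pi n/N)$, that frame $m$ starts at sample $mN/2$, and that the frames pass through the transform unmodified; the symbols $s(t)$ for the input stream and $w_m(t)$ for the window of frame $m$ in stream time are my own.

$$
w_m(t) = \tfrac12 - \tfrac12\cos\!\left(\frac{2\pi\,(t - mN/2)}{N}\right)
       = \tfrac12 - \tfrac12\,(-1)^m \cos\!\left(\frac{2\pi t}{N}\right)
$$

$$
w_m(t)\,s(t) + w_{m+1}(t)\,s(t)
  = \left(\tfrac12 + \tfrac12\right) s(t)
    - \tfrac12\left[(-1)^m + (-1)^{m+1}\right]\cos\!\left(\frac{2\pi t}{N}\right) s(t)
  = s(t)
$$

The DC halves add up to one, while the modulated halves, which carry the sum and difference frequencies, appear with opposite sign in neighbouring frames and drop out.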

Now I want to try a test wave inharmonic with the FFT size: periodicity 10.2. I picked coefficient 11 (and its conjugate) off the spectrum. To my surprise, the wave came out untouched. But then I zeroed coefficient 10. This really spoilt the wave. While it fades, it is heavily distorted, and be sure that it sounds rotten (that is, if you want things neatly processed). The visible kink produces a high frequency rattle, but there is also a low frequency product in the sound, possibly originating from the overlapping FFT periodicity.
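To put a number on the difference between the two cases, the deviation from the input can be measured after overlap-add. The sketch below is not the original experiment: frame size `N = 64`, two times overlap, a Hann analysis window without resynthesis window, and the RMS error as a yardstick are all assumptions of mine. It only compares how much damage is done by zeroing bin 10 versus bin 11 for a cosine of periodicity 10.2.

```python
import cmath
import math

N = 64          # frame size (assumption; small so the naive DFT stays fast)
HOP = N // 2    # two times overlap
TOTAL = 8 * N   # length of the test stream
FREQ = 10.2     # periods per frame length, inharmonic with N

# Precomputed roots of unity for the naive transforms.
TWIDDLE = [cmath.exp(-2j * math.pi * i / N) for i in range(N)]

def dft(x):
    return [sum(x[n] * TWIDDLE[(k * n) % N] for n in range(N)) / N for k in range(N)]

def idft(X):
    return [sum(X[k] * TWIDDLE[(-k * n) % N] for k in range(N)).real for n in range(N)]

def hann(n):
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / N)

signal = [math.cos(2 * math.pi * FREQ * t / N) for t in range(TOTAL)]

def filtered(drop_bin):
    """Hann-window each frame, zero one bin and its conjugate, overlap-add."""
    out = [0.0] * TOTAL
    for start in range(0, TOTAL - N + 1, HOP):
        X = dft([hann(n) * signal[start + n] for n in range(N)])
        X[drop_bin] = 0
        X[N - drop_bin] = 0
        for n, value in enumerate(idft(X)):
            out[start + n] += value
    return out

def rms_error(out):
    """RMS deviation from the input over the fully overlapped region."""
    interior = range(N, TOTAL - N)
    return math.sqrt(sum((out[t] - signal[t]) ** 2 for t in interior) / len(interior))

err10 = rms_error(filtered(10))
err11 = rms_error(filtered(11))
print("RMS error with bin 10 zeroed:", err10)
print("RMS error with bin 11 zeroed:", err11)
```

A plain RMS figure says nothing about how the damage sounds, so this is only a first check before listening.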

So - does the antimatter trick no longer work here? The case of an input frequency inharmonic with the FFT size is much more complicated. The discontinuities at the frame borders begin to play their role. Although a Hann window looks smooth, it is actually a sum of two functions which are both not smooth at all. The DC component in a window is an attenuated rectangle window, and the cosine term is chopped as well.

These chop cuts are responsible for extra frequencies, which we can see if the chopped functions are analysed within a wider frame, padded with a lot of zeros on the left and right. The previous page showed Dirichlet kernels representing spectra of chopped functions. Time-shifting the chopped fundamental cosine by N/2 will sign-flip its Dirichlet-like spectrum. This part of the window retains its function, no matter if the input frequency harmonizes with the FFT size or not. It is the rectangular part of the window that causes trouble. This part was intended to act as an identity, preserving the input frequency. But now it turns out that this window part produces sum and difference frequencies as well. And, since the rectangular part does not phase-shift over time, these products do not annihilate themselves in the reconstruction with overlapping frames. If we take out coefficient 10, the input frequency is attenuated, but the products caused by the rectangular part of the window will survive. In the spectrum, these products were phase-hidden by the cosine window products, but in the overlapping reconstruction they are neatly restored. What you see is not what you get! For such a case, the output result is very similar to non-windowed processing.

Using a Hann window, I have checked that a gradual attenuation in the spectrum, running over three intermediate coefficients, worked without generating audible artefacts:
x[9]  = 1.00
x[10] = 0.75
x[11] = 0.50
x[12] = 0.25
x[13] = 0.00
x[14] = 0.00 etc
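A gain curve like this one can be generated and applied as follows. This is a sketch, not the code of my filter: the function names `ramp_gains` and `apply_gains` and their parameters are made up for the illustration, and the spectrum is assumed to be a plain list of N complex coefficients in the usual FFT order.

```python
def ramp_gains(n_bins, last_full_bin, ramp_bins):
    """Gain per bin: 1.0 up to last_full_bin, then a linear ramp down to 0.0.

    ramp_bins is the number of intermediate bins between full gain and zero.
    """
    gains = []
    for k in range(n_bins):
        steps_past = k - last_full_bin
        gain = 1.0 - steps_past / (ramp_bins + 1)
        gains.append(min(1.0, max(0.0, gain)))
    return gains

def apply_gains(X, gains):
    """Scale bins 0..N/2 by gains and mirror the gains onto the conjugate bins."""
    N = len(X)
    Y = list(X)
    Y[0] = X[0] * gains[0]
    Y[N // 2] = X[N // 2] * gains[N // 2]
    for k in range(1, N // 2):
        Y[k] = X[k] * gains[k]
        Y[N - k] = X[N - k] * gains[k]
    return Y

# The curve from the text: full gain up to bin 9, three ramp bins, zero from bin 13.
gains = ramp_gains(n_bins=17, last_full_bin=9, ramp_bins=3)
print(gains[9:15])

# Applied to a flat dummy spectrum of 32 coefficients.
shaped = apply_gains([1.0 + 0.0j] * 32, gains)
```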

In my Fourier filter with its extreme filter options, spectrum flanks much steeper than this can happen. Is there any window type that can handle such abuse? Using the Max/MSP pfft~/fftin~/fftout~ objects, I definitely get smoother sound from the same Fourier filter process than with my own C code.

## The resynthesis window: brickwall me

Admittedly, I have been puzzled by this matter for more than a week, while the answer is soooo simple. It is embarrassing. I have been using the window before analysis all the time, while it is much more important to have one after resynthesis. Shall I now rewrite this whole page, and never mention how I bungled around with an analysis window? Hmmmm... the analysis experiments revealed some effects that must be at work in a resynthesis window as well, and I still want to understand why and how things work.

Let me first state that hard cut brickwalling with very low artefact level is possible, if only you use enough FFT overlap, and windows before and after transform. The wacky scope trace below can not be proof of that, but believe me, it sounds a hundred times better than before.

Is there a way to visualise what is actually happening? Let me produce a very simplistic test case: a cosine input of periodicity 1.5 relative to the FFT frame size, as inharmonic as can be. I do not window this cosine before analysis. In its spectrum, I eliminate coefficients 1, 2, and their conjugates N-1 and N-2. After IFFT, there is still quite a lot of output, as the figure on the right below shows:

The mutilated wave still has the periodicity of the input, but the frequency content is completely altered. At the frame borders, it has a spiky shape. The output largely represents the difference between a cosine of infinite length and a chopped cosine. So it represents the cuts, relative to the cosine that I had tried to remove. Like a deflated balloon, glued to the frame edges. Rather alarming, how much signal energy there is still left.
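The leftover can be inspected numerically as well. The following is a minimal sketch with a frame size of `N = 64` and naive transforms, both my own choices. It reports the value at the frame border, the value in the middle of the frame, and the fraction of the signal energy that survives the removal of coefficients 1 and 2.

```python
import cmath
import math

N = 64  # frame size (assumption)

def dft(x):
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def idft(X):
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real
            for n in range(N)]

# Unwindowed cosine with 1.5 periods in the frame.
x = [math.cos(2 * math.pi * 1.5 * n / N) for n in range(N)]

X = dft(x)
for k in (1, 2, N - 1, N - 2):   # the two lowest harmonics and their conjugates
    X[k] = 0
y = idft(X)

energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in y)
ratio = energy_out / energy_in

print("value at the frame border:", round(y[0], 4))
print("value at mid-frame:       ", round(y[N // 2], 4))
print("fraction of energy left:  ", round(ratio, 3))
```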

But now I am going to window this output signal:

There. The most offending element is already gone. But what will happen to the remainder? That is harder to grasp.

When we windowed the input signal, the overlapping frames were time-shifted snapshots of one and the same signal stream every time. Therefore, the sum and difference frequencies resulting from the window modulation, propagating through the transforms, were perfectly undone in the overlapping reconstruction of the signal. In contrast, the transform artefacts were not modulated, only summed to the total output.

The even- and odd-numbered IFFT output frames however (assuming two times FFT overlap here) do not represent one and the same signal. The original input components coincide, but any transform artefacts pop up alternately. If we window each IFFT output frame and then sum the frames, the overlap will therefore not automatically undo all modulation products. The elimination of boundary spikes is the most conspicuous aspect of this mechanism. Theoretically, every new frequency component that was generated as a side effect of the frame-by-frame processing is a candidate for elimination. The artefacts are however not identical in successive frames, so their elimination is not guaranteed. The more overlap, the better the elimination.

To summarize the decisive difference between analysis and resynthesis window: while the analysis window temporarily repositions the frame boundary effects, the resynthesis window boldly eliminates them in the output. It is best to use them both, but the resynthesis window does most of the work. By the way, using two windows in series means that they are multiplied, so with identical windows the effective window is the square of one. Therefore, you need at least four times FFT overlap to produce a constant window sum.
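The claim about the constant window sum is easy to verify. Below is a small sketch that adds up all overlapping copies of a squared Hann window for a given hop size; the frame size `N = 64` and the helper name `overlap_sum` are assumptions made for the illustration.

```python
import math

N = 64  # frame size (assumption)

hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
squared = [w * w for w in hann]   # analysis window times resynthesis window

def overlap_sum(window, hop):
    """Sum of all overlapping window copies, for one hop period of the output."""
    total = [0.0] * hop
    for n, w in enumerate(window):
        total[n % hop] += w
    return total

sum_2x = overlap_sum(squared, N // 2)   # two times overlap
sum_4x = overlap_sum(squared, N // 4)   # four times overlap

print("2x overlap: min %.3f, max %.3f" % (min(sum_2x), max(sum_2x)))
print("4x overlap: min %.3f, max %.3f" % (min(sum_4x), max(sum_4x)))
```

With two times overlap the sum ripples between 0.5 and 1.0, while with four times overlap it is a flat 1.5, which then serves as the output scaling factor.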

At last. I can now test some window types in practice. I compared Hann and Blackman, feeding a pure sinusoid of variable frequency through a 1024 point spectral filter that zeroes everything below or above a chosen brickwall cut-off point. Starting out with four times FFT overlap, both window types seem to produce a similar type of artefact: around the cut-off point, where the input fades, a faint undertone is produced. With the Hann window, it is very hard to perceive this tone, but with Blackman its level is higher.

Scaling up to eight times overlap, I can no longer hear any product frequency. There is only a smooth fade out of the input frequency. The window type does not matter. Hann and Blackman do this job equally well.

Now that brickwall filtering seems to be a realistic option, I want to know the specifications of the output. What is the filter slope? I checked the dB output for a 1024 point low-pass filter with a hard cut-off between FFT bins 10 and 11. Here are the attenuation values for the center frequencies of some bins:

bin 10: -1.5 dB
bin 11: -13.5 dB
bin 12: -45.5 dB
bin 13: -93 dB

In between these values, the decay is smooth. This was with Blackman windows. Using Hann windows, the -93 dB point was already at bin 12, so the slope is actually steeper than with Blackman. I repeated those measurements at other frequencies, resulting in equivalent bin attenuation values. So, in the output, the slope is not a brickwall. It is probably sinusoidal, albeit extremely narrow. The above mentioned values are for 8 times overlap. With 4 times overlap, the decay in the first bins is equivalent, but further away there seem to be higher ripples than with 8 times overlap.
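A measurement of this kind can be sketched in a few lines. Note the differences with the setup described above, all of them my own simplifications: Hann instead of Blackman windows, a 64 point frame instead of 1024 so that a naive DFT is fast enough, four times overlap, and test tones exactly on bin centers. The numbers it prints therefore need not match the table. The function `gain_at` and the constants are hypothetical names.

```python
import cmath
import math

N = 64          # frame size (assumption, far smaller than 1024)
HOP = N // 4    # four times overlap
TOTAL = 4 * N   # length of the test stream
CUTOFF = 10     # highest bin that is kept by the brickwall low-pass

TWIDDLE = [cmath.exp(-2j * math.pi * i / N) for i in range(N)]
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def dft(x):
    return [sum(x[n] * TWIDDLE[(k * n) % N] for n in range(N)) / N for k in range(N)]

def idft(X):
    return [sum(X[k] * TWIDDLE[(-k * n) % N] for k in range(N)).real for n in range(N)]

def gain_at(freq_bin):
    """Linear output gain for a unit cosine centered on freq_bin."""
    signal = [math.cos(2 * math.pi * freq_bin * t / N) for t in range(TOTAL)]
    out = [0.0] * TOTAL
    for start in range(0, TOTAL - N + 1, HOP):
        X = dft([WINDOW[n] * signal[start + n] for n in range(N)])   # analysis window
        for k in range(CUTOFF + 1, N - CUTOFF):                      # brickwall cut
            X[k] = 0
        y = idft(X)
        for n in range(N):
            out[start + n] += WINDOW[n] * y[n]                       # resynthesis window
    interior = range(N, TOTAL - N)   # region where four frames overlap
    rms = math.sqrt(sum(out[t] ** 2 for t in interior) / len(interior))
    # Squared Hann windows sum to 1.5 at four times overlap;
    # a unit cosine has an RMS value of 1/sqrt(2).
    return rms * math.sqrt(2) / 1.5

gains = {b: gain_at(b) for b in (10, 11, 12)}
for b, g in gains.items():
    print("bin %d: %.1f dB" % (b, 20 * math.log10(max(g, 1e-12))))
```

The floor of `1e-12` only prevents a logarithm of zero when the output vanishes completely.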

The measurements made me realise that specifications of brickwall filtering in frequency domain can not be expressed in dB per octave. Neither can they be expressed in dB per bin, since that has a non-linear relation. But you could define it in terms of decay over the first bin interval or decay over the first two bin intervals. And a bin interval is a single-harmonic interval. What that means in hertz depends on FFT frame size and sampling rate, and what it means in octaves depends on the position within the spectrum. At the low end of the spectrum, the bin 1 - 2 interval is one octave indeed. (The bin 0 - 1 interval is undefined in terms of octaves, it is infinitely many octaves). The bin 1 - 4 interval is two octaves, and you can easily produce -93 dB attenuation over that interval. At all higher positions in the spectrum, the figure is (much) better.
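The arithmetic behind these intervals is short enough to write down. In this sketch the function names are mine, and the 1024 point frame at 44.1 kHz is just the example configuration used on this page.

```python
import math

def bin_width_hz(sample_rate, frame_size):
    """Spacing between neighbouring FFT bins in hertz."""
    return sample_rate / frame_size

def octaves_between(bin_a, bin_b):
    """Musical interval between two bin center frequencies, in octaves."""
    return math.log2(bin_b / bin_a)

width = bin_width_hz(44100, 1024)
print("bin width: %.2f Hz" % width)
print("bins 1 to 2:     %.3f octaves" % octaves_between(1, 2))
print("bins 1 to 4:     %.3f octaves" % octaves_between(1, 4))
print("bins 100 to 103: %.3f octaves" % octaves_between(100, 103))
```

The same three-bin distance that spans two octaves at the bottom of the spectrum is only a small fraction of an octave around bin 100, which is why the slope in dB per octave gets better higher up.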

The one thing that you can not do with spectral filtering is to selectively eliminate or attenuate a frequency within a bin, because there really is only one frequency defined per bin. In the low frequency range, that can certainly be an issue. But the virtue of a brickwall-proof filter setup is that at least the sound can not be spoilt by an attempt to push things to the limit.

## Conclusion

After these experiments, my personal windowing preference for spectral processing would be: four times overlap, using Hann windows before analysis and after resynthesis. The artefact level is so low that it can not really be perceived other than in test conditions. The Blackman window leaves louder modulation frequencies, possibly because of its second harmonic cosine component, which has less optimal sign-flipping behaviour in overlapping frames.

I am now getting used to four times overlap as the minimum for FFT processing. It is not even that computationally intensive. On my 2 GHz MacBook, the complete filter process is responsible for slightly over 1% CPU load in realtime at 44.1 kHz sampling rate.