At the time of this writing, 2011, Pure Data still works with
single precision floating point format as its numeric datatype. With
double precision, some issues could be solved, such as indexing of large
audio buffers. But at what cost? Obviously, double
precision numbers consume twice the memory. But is double precision also
slower? Is it possible at all to have Pd work at double precision?
In order to test performance and evaluate pros and cons, a double
precision Pd must first be built. In the past, work has been done to
enable double precision builds of Pd. However,
there is still code in Pd's core which is known to fail with type
double. This page documents alternatives to such
bits of code, and shows test results of double precision Pd,
performance-wise and precision-wise.
So far, I have mainly concentrated on the phase-wrapping classes
which were not yet ready for double precision because they use a
type-specific optimization trick. These classes are: phasor~, cos~,
osc~, vcf~ and tabosc4~. Another known incompatibility is macro
PD_BIGORSMALL in the API
header m_pd.h. Further, I have fixed issues which popped up while
testing the rewritten classes mentioned above. At the bottom of the
page, you can
download .patch files to implement the code changes in vanilla Pd
0.43-0. A set of tests is included in the .zip. Mind that this is a
work in progress, and issues (hopefully only minor) are still to be
expected.
The precision of 32 bit floats is equivalent to 24 bit integers, with the added advantage of a huge range and the associated ease of programming. So what need is there to build a double precision Pd? In exceptional cases, 32 bit floats cannot properly express a calculation result, due to rounding. Depending on the calculation type, this may translate as aliasing in the audio or some other unexpected effect. There is always the option to write such calculations in a C external, using integers for indexing and doubles for running sums and interpolation. But for me, it is a delight to have double precision directly in a Pd patch. Pd is a fast dsp prototyping tool exactly because you do not need to write things in tedious low level syntax. Also, for educational purposes it is better to have calculations in a Pd patch than hidden in a C file. I collected some examples to illustrate precision bottlenecks in real world dsp situations.
The first example is where I wanted to produce a 20 seconds
exponential chirp for IR measurement. When this was done as a Pd patch
with regular math objects, my colleague in the IR project perceived a
weird modulation in the high frequency end. A print of time indexes,
expressed in unit seconds, soon pointed to the cause. Numbers were
rounded to their bones. In some math routines, a different order of
operations may help out, but not in the exponential chirp case. The
chirp generator was done as a C external with double precision inside.
The time index sequences below, from regular Pd and my experimental
double precision Pd respectively, illustrate how double precision
numbers have enough 'numerical headroom' for this particular case,
whereas the single precision numbers have not.
current Pd 0.43
Another, possibly familiar, example is where unit vectors
corresponding to low frequencies are computed.
Cosine tends to be 1., while sine tends to be identical to the angle in
radians. This is caused by rounding. Together these incorrect sine
and cosine still produce a vector with radius ~1. If you use them both,
for example to make an oscillator or resonator, you can go down to
rather low but undefined frequencies. In contrast, doubles have enough
bits to represent sine and cosine for LFO frequencies with sufficient
accuracy. Looking at these numbers, I guess that even doubles would not
be of help in calculating planetary orbits, with their extremely low
frequencies.
current Pd 0.43
You have probably noticed that single precision floats are
represented in Pd with a maximum of 6 significant digits. This is done
to hide inconsistencies like 33 * 0.3 = 9.900001, which happens when 7
decimal digits are shown. The limitation to 6 significant digits is
however very annoying when we want to type or see large integers, like
indexes of long audio buffers.
In calculations, Pd can use all integers from 0 to
2^24 = 16,777,216 - over 16 million, numbers with up to 8 decimal
digits. But in the GUI, 1 million is shown in scientific notation as
1e+06. This scientific notation also uses a maximum of 6 significant
digits. Therefore, the next representable number is 1.00001e+06, that
is 1 million and 10. See below how the result of 2^20 is rounded in the
GUI, as an example. The fact that numbers are also stored this way in
patch files makes things worse. At 44100 Hz samplerate, one million
samples is about 23 seconds - not that much at all. Longer audio
buffers are allowed, but
handling them in the GUI becomes tricky.
current Pd 0.43
Apart from the representation issue, 32 bit float resolution leaves little room for interpolation in the case of long audio buffers. The [soundfiler] object loads a maximum of 4 million samples into an array; longer soundfiles are truncated by default (this can be overruled). At 44100 Hz samplerate, this is about 1.5 minutes. The [tabread4~] object can still produce a decent sound with that length, provided you do not play at too low a speed.
The table below shows some fractional indexes in double precision
for speed 0.3. In single precision, these numbers can not all be
expressed, and aliasing is the audible result. Double precision,
instead, has enough bits to interpolate in buffers of any length your
RAM will permit. Therefore, the maximum buffer length for [soundfiler]
is extended when building Pd for doubles.
large indexes for fractional speed 0.3, in double precision
The above mentioned examples suggest that double precision is needed whenever you want to do something eccentric in Pd: very long buffers, very low frequencies. In such cases it can make a difference for sound quality indeed.
However, it is certainly not the case that double precision Pd
sounds more beautiful in general. If you listen to [osc~] at any
audible frequency, it is unlikely you'll perceive any difference
between the two precisions. With Fourier analysis, I could not detect
impurities above -100 dB in single precision [osc~]. The object reads
from a 512 point table, with linear interpolation, and this will only
suffer from imprecision at very low subsonic frequencies. The same
observation holds for other table-reading classes, and for the filters.
These classes are designed to function with sufficient precision under
normal conditions.
Therefore, it makes no sense to load an arbitrary patch in double
precision Pd and expect to hear better sound. Single precision Pd can
sound very beautiful as well. Problems with sound quality are more
often the result of poorly controlled dynamics than of precision issues or
other deficiencies in Pd objects.
While having limited effect on overall sound quality, double
precision can still be beneficial for the quality of Pd programming in
a different sense. Precision-robustness, combined with full numerical
control from within the GUI, will convert Pd from a pocket calculator
into a scientific calculator. Any user - artist or scientist - could
verify meticulous math stuff directly in Pd, rather than bc, Octave,
Grapher, C, or whatever inconvenient combination of these. As a
computer language, Pd could be used for math classes and assignments at
any level. Abstractions illustrating advanced math topics could be
made. A wider use of Pd as a serious math language could have an impact
on technical knowledge shared within Pd community. In my view, double
precision could affect sound quality in particular via this detour.
The rewritten phase-wrapping classes are optimized as much as possible. They were profiled individually to make sure none of them suffers performance loss. This work was done on an Intel Mac, with SSE instructions automatically generated by the gcc compiler. Later, I found that Pd compiled for Linux uses the FPU for all floating point operations when the CPU type is not explicitly defined. I took this into account to make Pd run smoothly in both cases.
In order to check overall performance, I searched for pure vanilla
patches, employing a representative mix of objects. The following works
seem suitable for benchmarking: Chaosmonster1 from Martin
Brinkmann, and Cave of Creation
initiated by Hamster on the Pd forum. Both have an emphasis on
vcf~, osc~ and delaylines. As such, they test the modified
phase-wrapping method, the PD_BIGORSMALL definition which is used for
every sample written to a delay line, and speed of memory access for
floats and doubles in general. Since the new code must also perform
in a single precision build, the comparison includes three Pd builds:
original = official vanilla Pd 0.43-0
single = vanilla Pd 0.43-0, double-ready modified, but built in single precision
double = vanilla Pd 0.43-0, double-ready modified, built in double precision
The benchmarks in table 1 were done in the Shark
profiler application on a MacBook with Intel core2duo 2 GHz.
Numbers represent CPU time as a percentage of total CPU time. Note that
Pd dsp can only use one core at a time. Table 2 represents a Panasonic
Toughbook coreduo 1.83 GHz with Debian Squeeze installed. The results
were read from the top command and are less precise. Top shows
percentages per core by default, and in this case, for lack of Symmetric
Multi-Processing support, that happens to equal the percentage of total
CPU time. Cross-comparisons with figures in table 1 are therefore
irrelevant. Builds with and without SSE instructions are shown in table
2.
table 1: percentage of total cpu time (+/- 0.1) on MacBook core2duo
2 GHz with OSX 10.5, for Cave of Creation (1 instance) and
Chaosmonster1 (10 instances)
table 2: percentage of total cpu time (+/- 1) on Panasonic Toughbook
coreduo 1.83 GHz with Debian Squeeze (no SMP support), for Cave of
Creation (1 instance) and Chaosmonster1 (10 instances)
From this (admittedly small) set of benchmark tests, I conclude that
single precision builds of double-ready Pd may be a tiny bit faster
than current Pd, with SSE more than with FPU instructions. Double
precision Pd is slower than single precision compiled from the same C
code, but differences are so small that optimizations in the
double-ready code make the net result comparable to current Pd.
More benchmark testing is needed to derive final conclusions. The
rewritten code parts were somewhat 'tuned' to the SSE instructions
during the development process, and modified again when I noticed
pathetic performance on the FPU. It remains an open question how the
code behaves on PPC and ARM.
Expressed in number of bytes, or bits for that matter, double precision requires two times more data processing than single precision. How come the performance differences are still so small? It seems that Pd does not profit from 'vectorization'. On Intel, simple operations on floating point numbers can be done in the 128 bit wide xmm registers. Instructions are available to process two doubles or four floats simultaneously. An example is MOVAPD - Move Aligned Packed Double-Precision Floating-Point Values. With proper optimization options set, the compiler will try to emit such vectorized instructions.
Conditions for compiler autovectorization are rather strict. Basically, data must be accessible as blocks with a size matching the corresponding instruction. For many of the SIMD (Single Instruction Multiple Data) instructions, 16 byte aligned data is required. Pd's signal block size is set in runtime, and in general the compiler knows little about data alignment in Pd. Other obstacles for autovectorization are: potential pointer aliasing, conditional statements, and loops with too complicated routines. All these obstacles are frequent in Pd.
When Pd is compiled with gcc flags -ftree-vectorize and
-ftree-vectorizer-verbose=5, the compiler comments indicate that not a
single performance-critical dsp loop in Pd can be autovectorized. Pure
Data operates largely unvectorized. Single precision Pd exploits ~1/4
of the processing capacity in the xmm registers, and double precision
Pd exploits ~1/2 of it. It would be a challenge to modify Pure Data in
a way to profit a bit more from autovectorization, provided this would
not affect functionality, flexibility and portability. But that could
be a major project in itself. The purpose of this digression on
vectorization was to understand the surprisingly small performance loss
of double precision Pd. In terms of CPU load, we pay too much for
single precision, and get double precision at the same price.
There seems to be no performance penalty for double precision Pd, but in terms of memory space the price is of course high. How much audio can be loaded in 1 gigabyte (1024^3 or 2^30 bytes) of memory? This would be 2^28 samples for 32 bit float format and 2^27 samples for 64 bit. Assuming 44100 samplerate, this translates to about 100 minutes of single precision audio and 50 minutes of double precision audio.
If Pd is built for double precision, there is no way to make an
exception for audio buffers. Considering the fact that most audio is
stored on disk as 16 bit integers, the sample format conversion to 64
bit floating point seems ridiculous precision overkill, good for
nothing. Of course, recent computers do have a lot of memory. But in
order to use it all, you may need 64 bit address space, which is yet
another complicated topic.
All this would be less problematic if Pd could load soundfiles in the
background with low priority. At the moment this is not the case. If
you want to play soundfiles in live performance, you need to load all
files into memory before the session, otherwise dropouts will ruin your
show. This annoyance, which is in itself unrelated to the precision
question, asks for a solution anyway.
It is possible to simultaneously run a single precision and a double precision build of Pd, but they cannot be mixed. Function symbols are identical for both precisions, and fatal symbol collisions can happen if externals of the wrong precision are in Pd's search path and loaded accidentally. The different type sizes will definitely lead to a segmentation fault.
Here's the git repository where you can browse and get the double-ready code:
Test patches for pd-double are in this downloadable .zip file:
pd-double-testpatches.zip, 52 KB
Double precision builds of vanilla Pd are available for OSX and Linux via this page:
Some partly successful double precision builds of Pd-extended 0.43.1
are available from the autobuilds:
Windows builds are not yet available because of
double-precision-related bugs in some parts of Windows-specific code.
I've uploaded a video clip with single/double precision demo here. The patch used in this video clip is included in the .zip file with test patches, so you can try the demo if you've built or installed pd-double.