Coherence is a common feature on many analyzers that enables us to distinguish signal from measurement contamination. It indicates whether you're measuring a loudspeaker or, e.g., the noise produced by a moving light.

Coherence is subject to change. One of the aspects involved, which we'll explore in depth, is the relationship between the direct sound of a loudspeaker in a room and the room's reverberation.

This article is an attempt at putting real-world observations into context. Many of the results have been obtained experimentally, and some concepts that don't directly further our understanding have been omitted deliberately.

Loosely defined, coherence is a lumped statistical measure which indicates contamination of the measurement. It's proportional to the ratio of coherent power (signal) to the sum of coherent and non-coherent power (contamination).

\begin{equation}Coherence\propto\frac{Coherent\ Power}{Coherent\ Power + Non\mbox{-}Coherent\ Power}\end{equation}
Where non-coherent power is all uncorrelated residual output signal power that is not linearly dependent on the system-under-test's input signal power, and comes in many flavors such as noise, reverberation, and distortion.
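As a back-of-the-envelope sketch of equation 1 (the power values below are illustrative, not from any real measurement), a minimal helper makes the proportionality tangible:

```python
# Sketch of equation 1: coherence as the ratio of coherent power
# to total measured power. Power values are illustrative only.
def coherence(coherent_power: float, non_coherent_power: float) -> float:
    """Return a value between 0 (all contamination) and 1 (all signal)."""
    return coherent_power / (coherent_power + non_coherent_power)

print(coherence(1.0, 0.0))  # clean measurement: 1.0
print(coherence(1.0, 1.0))  # contamination as strong as the signal: 0.5
```

The ratio is unitless; the proportionality sign in equation 1 is a reminder that real analyzers wrap this idea in additional estimation details.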

For a more in-depth explanation why this is, please watch the video at the bottom of this page.

Among others, coherence indicates Signal-to-Noise Ratio (SNR), illustrated by the spectra in figure 1, and by extension, speech intelligibility and such. In practice, it suffices to think of coherence as a data quality indicator.

High-coherent data is reliable and actionable data, informing us how to move forward with sound system calibration.

Figure 2 shows the behavior of coherence ($$\gamma$$) vs. SNR in, e.g., Smaart v8. From this chart we can conclude that 10 dB of SNR should suffice for approximately 95% coherence. In other words, as long as you measure 10 dB above the noise floor, you should be fine.

From there on, increasing the number of averages will allow you to improve coherence even further, without actually increasing SNR with brute force by raising the excitation signal level.
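A hedged sketch of this idea, using SciPy's Welch-based magnitude-squared coherence estimator (the sample rate, segment length, and the 10 dB noise level are assumptions for illustration, not Smaart's internals):

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
fs = 48_000
n = 1 << 16

signal = rng.standard_normal(n)                    # broadband excitation
noise = 10 ** (-10 / 20) * rng.standard_normal(n)  # noise ~10 dB below signal
measured = signal + noise

# Welch-style estimate: each segment is one "average"; more segments
# stabilize the coherence estimate without raising the excitation level.
f, Cxy = coherence(signal, measured, fs=fs, nperseg=1024)
print(f"mean coherence at ~10 dB SNR: {Cxy.mean():.2f}")
```

In this simple model, per-bin coherence tends toward SNR/(SNR+1), about 0.91 at 10 dB; the exact curve in figure 2 also depends on the analyzer's averaging scheme.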

Once there's no more visible improvement in coherence, increasing the number of averages further becomes pointless; no new information is gained.

However, all of this is based on the assumption there's actual signal left over...

Destructive interference

Destructive interference will destroy signal which in turn will be replaced by whatever is left over. Typically, residual ambient noises like HVAC, moving lights, generators, audience enthusiasm and such.

Figure 3 shows a comb filter spectrum. Comb filters manifest themselves as an alternating pattern of peaks and cancels in the frequency domain. They are caused by summing multiple copies of the same signal (produced by other loudspeakers or reflections) that arrive at different times.

Comb filters are inevitable whenever there's physical displacement between multiple sources reproducing the same signal, or surroundings which consist of specular, reflective boundaries. These phenomena are particularly noticeable when walking the room while listening to pink noise, and are typically described as phasing, flanging or chorusing. Sound familiar?

Comb filters appear to have audible pitch. The comb filter's peak frequencies constitute a harmonic series whose apparent pitch is equal to the first peak's frequency (the fundamental). Which pitch you perceive is uniquely defined by your listening position with respect to the sound system and surrounding boundaries, regardless of the program material itself. It is therefore a moving target that typically glides up and down in frequency (glissandi) as you walk the room, because you're dealing with a purely spatial problem.
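The two-copy comb filter is easy to reproduce numerically. The sketch below (the delay and relative gain are arbitrary assumptions) evaluates the summed magnitude response: peaks land at multiples of 1/τ, forming the harmonic series described above, and nulls fall midway between them.

```python
import numpy as np

tau = 0.001  # 1 ms delay between the two copies (illustrative)
g = 1.0      # equal-level second copy: deepest possible nulls

f = np.linspace(1, 20_000, 200_000)        # frequency axis in Hz
h = 1 + g * np.exp(-2j * np.pi * f * tau)  # sum of direct + delayed copy
mag_db = 20 * np.log10(np.abs(h))

# Peaks at n/tau (1 kHz, 2 kHz, ...); nulls at (n + 0.5)/tau (500 Hz, 1.5 kHz, ...).
print(f"max: {mag_db.max():+.1f} dB, min: {mag_db.min():.1f} dB")
```

Halving τ doubles the fundamental, which is exactly why the apparent pitch glides as your position, and with it the delay between arrivals, changes.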

Whenever you hear phenomena like these, realize that equalization is no longer a viable option unless the apparent pitch remains constant over space, which I have yet to encounter.

However, in such instances, every time the direct sound is cancelled for whatever reason, signal is replaced by residual ambient noise (at the nulls or cancels), and both SNR and, subsequently, coherence decrease.

I would like to emphasize that all of this typically happens once the sound is airborne, and signal processing has limited to no merit other than properly time-aligning multiple sources, which doesn't resolve room interaction!

Ripple

The difference between a comb filter's maxima and minima, expressed in dB, is known as ripple (figure 4), and it is arguably the most important metric when designing sound systems.

Ripple is determined by the relative level offset between two or more copies of the same signal (figure 5). An in-depth explanation of ripple is beyond the scope of this article. For more information please consult the chapter "Summation" which you'll find in all three editions of Bob McCarthy's book "Sound Systems: Design and Optimization".

Meanwhile, I encourage you to play with my phase calculator to gain insight in the balancing act between relative level and time offsets.
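In the meantime, here is a small sketch of the level-offset side of that balancing act. For two summed copies of a signal, the peak-to-dip span follows from the weaker copy's linear gain g as 20·log10((1+g)/(1−g)); the offsets below are arbitrary examples.

```python
import math

def ripple_db(level_offset_db: float) -> float:
    """Peak-to-dip span in dB for two summed copies of the same signal,
    given the level offset between them. A 0 dB offset yields infinite ripple."""
    g = 10 ** (-abs(level_offset_db) / 20)  # linear gain of the weaker copy
    return 20 * math.log10((1 + g) / (1 - g))

for offset in (10, 6, 3, 1):
    print(f"{offset:>2} dB offset -> {ripple_db(offset):4.1f} dB ripple")
```

Note how roughly 10 dB of isolation already keeps ripple under 6 dB, while near-equal levels make it explode.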

Figure 6 shows several transfer functions of the same comb filter with varying amounts of ripple while competing with different amounts of background noise. Notice how coherence (red trace) is greatly affected by both ripple and background noise.

In general, less ripple (a less degraded, more robust signal) results in overall improved coherence. Interference between multiple copies of the same signal is minimized when there's sufficient relative level offset between them. Simultaneously, lower background-noise levels translate into increased SNR, which also improves coherence.

So how does ripple typically evolve over distance indoors?

Critical distance

Ripple goes hand in hand with the Direct-to-Reverberant ratio (D/R). For frequencies with wavelengths much smaller than the dimensions of a given room, we can resort to statistics. This criterion needs to be met for all acoustics equations that follow from hereon. Under such circumstances, the direct sound drops by 6 dB per doubling of distance (inverse-square law) whereas reverberation tends to maintain its level regardless of distance (figure 7).

In the direct field, direct sound dominates over reverberation, with positive D/R values. In the reverberant field, it's the other way around, with negative D/R values. The distance where direct and reverberant sound see eye to eye, at the same level, with a D/R value of zero, is called critical distance. It's where the scale tips.

Ultimately, we listeners experience and measure the combined SPL of both direct plus reverberation, which implies that in the reverberant field, beyond critical distance, the inverse-square-law is typically no longer observed nor experienced.
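A minimal numeric sketch (the 1 m reference level and the constant reverberant level are assumed values): the direct level falls 6 dB per doubling of distance while the reverberant level stays put, and critical distance is where the two cross.

```python
import math

def direct_level_db(distance_m: float, level_at_1m_db: float = 100.0) -> float:
    # Inverse-square law: -20*log10(r) relative to the assumed 1 m reference.
    return level_at_1m_db - 20 * math.log10(distance_m)

reverb_db = 88.0  # assumed distance-independent reverberant level
for d in (1, 2, 4, 8, 16):
    dr = direct_level_db(d) - reverb_db
    print(f"{d:>2} m: direct {direct_level_db(d):6.1f} dB, D/R {dr:+5.1f} dB")
```

For these made-up numbers the D/R sign flips near 4 m, i.e., the critical distance.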

As long as the direct sound is dominating, there will be little or no ripple (6 dB or less) and high coherence, because any late-arriving energy that could possibly cause destructive interference is low in level by comparison. We are effectively isolated from the room and obtain "near-anechoic" data.

Conversely, if reverberant sound dominates, late‑arriving energy is so strong by comparison, that it wreaks havoc on the direct sound causing ripple in excess of 12 dB and poor coherence.

My favorite tool, for the sole purpose of explaining the underlying mechanism involved, is the Hopkins-Stryker equation. With this equation we can estimate critical distance at best: it will get you into the "ball park", the right order of magnitude, but its results should be treated with scrutiny.

Hopkins-Stryker equation

In its simplest form, without the additional modifiers $$Ma$$ and $$Me$$, it looks as follows:

\begin{equation}L_{p}=L_{W}+10\log\left(\frac{Q}{4\pi D_{x}^{2}}+\frac{4}{S\bar{a}}\right)+K\end{equation}

Figure 8 provides a detailed explanation of all the variables.

What makes this equation interesting is the part between parentheses. The first and second fractions determine how direct and reverberant levels evolve over distance, respectively, independent of the sound power level (SWL) or, simply put, source loudness (volume).

Notice that only the first fraction contains $$D_{x}^{2}$$ in the denominator. That's the $$1/r^2$$ dependency, or inverse-square law. Reverberation, on the other hand, relies solely on the venue's total surface area and the average absorption coefficient of that combined area.
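A sketch of equation 2 in code (all values are illustrative assumptions: the source's sound power level, a Q of 10, 4,000 m² of surface, an average absorption of 0.25, and the K constant set to zero):

```python
import math

def hopkins_stryker_db(Lw_db, Q, distance_m, surface_m2, a_bar, K=0.0):
    """Equation 2: Lp = Lw + 10*log10(Q/(4*pi*r^2) + 4/(S*a_bar)) + K."""
    direct = Q / (4 * math.pi * distance_m ** 2)  # inverse-square term
    reverberant = 4 / (surface_m2 * a_bar)        # distance-independent term
    return Lw_db + 10 * math.log10(direct + reverberant) + K

# Beyond critical distance the reverberant term dominates and Lp levels off.
for d in (2, 8, 32):
    print(f"{d:>2} m: {hopkins_stryker_db(120, 10, d, 4000, 0.25):.1f} dB")
```

Close in, the output tracks the inverse-square law; far out, it flattens toward the reverberant level, which is exactly the behavior described above.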

If we set $$D_{x}$$ to $$D_{c}$$, as in critical distance, and make direct and reverberation equally loud—the condition at critical distance—we obtain equation 3.

\begin{equation}\frac{Q}{4\pi D_{c}^{2}}=\frac{4}{S\bar{a}}\end{equation}

If we solve equation 3 for $$D_{c}$$ we get equation 4.

\begin{equation}D_{c}=0.141\sqrt{QS\bar{a}}\end{equation}

Which indicates that, in practice, critical distance depends primarily on $$Q$$ and $$\bar{a}$$, since surface area is a given unless you intend to bring a wrecking ball.
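Equation 4 as a one-liner (the room values below are assumptions for illustration). Note how a tenfold increase in Q stretches the critical distance by only √10, roughly 3.2 times:

```python
import math

def critical_distance_m(Q: float, surface_m2: float, a_bar: float) -> float:
    """Equation 4: Dc = sqrt(Q*S*a_bar / (16*pi)) ~= 0.141*sqrt(Q*S*a_bar)."""
    return math.sqrt(Q * surface_m2 * a_bar / (16 * math.pi))

# Assumed room: 4000 m^2 of surface, average absorption 0.25.
print(f"Q=1 (omni):  {critical_distance_m(1, 4000, 0.25):5.1f} m")
print(f"Q=10 (horn): {critical_distance_m(10, 4000, 0.25):5.1f} m")
```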

Let's start by looking at directivity factor first.

Directivity factor

The directivity factor, $$Q$$, is defined as the ratio of the intensity, $$I\,(W/m^2)$$, along a given axis and at a given distance from a sound radiator, compared to the intensity that would be produced at the same distance by an omni‑directional point source radiating the same power (figure 9).

The Directivity Index, $$DI$$, is exactly the same story but expressed on a logarithmic scale. Since $$Q$$ is about sound power, the $$10\times\log_{10}$$ rule is used to obtain equation 5.

\begin{equation}DI=10\log\left(Q\right)\end{equation}

Directivity data (figure 10) may be available from a manufacturer, but can also be derived from GLL files in, e.g., AFMG's EASE GLL Viewer.

In the absence of such information, $$Q$$ could be estimated, e.g., in the case of a rectangular horn by using Molloy's equation.

\begin{equation}Q=\frac{180^{\circ}}{\arcsin\left(\sin\frac{\alpha}{2}\cdot\sin\frac{\beta}{2}\right)}\end{equation}
Where $$\alpha$$ is the loudspeaker's nominal horizontal coverage angle and $$\beta$$ the nominal vertical coverage angle.

Molloy's equation should be used with care, because it assumes all sound power is confined within the loudspeaker's intended coverage angle. Whereas real loudspeakers suffer from spill or leakage outside their intended coverage angle.
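With those caveats in mind, here is a sketch of Molloy's estimate (the 90°×40° horn below is a made-up example):

```python
import math

def molloy_q(h_deg: float, v_deg: float) -> float:
    """Estimate Q from nominal coverage angles (equation 6):
    Q = 180 / arcsin(sin(h/2) * sin(v/2)), angles in degrees.
    Optimistic by design: assumes zero spill outside the coverage angles."""
    x = math.sin(math.radians(h_deg / 2)) * math.sin(math.radians(v_deg / 2))
    return 180 / math.degrees(math.asin(x))

q = molloy_q(90, 40)  # a common 90 x 40 degree horn
print(f"Q ~= {q:.1f}, DI ~= {10 * math.log10(q):.1f} dB  (equation 5)")
```

Narrowing either coverage angle raises the estimated Q, as expected.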

Ideally, the directivity factor is obtained from actual measurements which is best left to loudspeaker manufacturers.

At the end of the day, wide loudspeakers are low-$$Q$$ (an omni‑directional source having a $$Q$$ of 1) and narrow loudspeakers are high-$$Q$$.

More importantly, real loudspeakers aren't Constant Beamwidth Transducers (CBT); they don't exhibit the same coverage angle for all frequencies. Their $$Q$$ changes with frequency, from near omni-directional with a value of 1 in the low end to a typically constant-$$Q$$ value at high frequencies, as can be seen in figure 10.

Be sure to check out the beautiful know-how posters regarding this topic, and more, by Klippel in Dresden, Germany.

Absorption coefficient

The absorption coefficient, $$\bar{a}$$, of a material is a number between 0 and 1 which indicates the proportion of sound absorbed by a surface, as opposed to the proportion reflected back into the room.

A large, fully‑opened window (figure 11) does not reflect sound passing through it. This illustrates an absorption coefficient of 1.

Whereas a rigid concrete wall (window closed), effectively acts like an acoustic mirror and has an absorption coefficient very close to zero.

These are all the ingredients we need to further our understanding of critical distance.

Critical distance continued

Equation 4 implies that in practice, critical distance is a function of frequency. If we entertain ourselves with the notion of a venue whose $$RT_{60}$$ and $$\bar{a}$$ remain constant over frequency, then $$Q$$—for real arrays as well as loudspeakers—will still change with frequency, and critical distance becomes a "moving target".

Figure 12 shows the bigger picture, starting with $$Q$$ and $$DI$$ values for an imaginary loudspeaker in chart #1.

Chart #2 shows $$RT_{60}$$ and $$\bar{a}$$ values for a venue with a volume of 20,000 m³.

Chart #3 shows critical distance according to equation 4, based on the values in charts #1 and #2, and is clearly moving with frequency.

And finally, chart #4 shows D/R which can also be derived from the Hopkins-Stryker equation (equation 2).

This means that for real loudspeakers, at any given distance, for some frequencies you'll end up in the direct field with little to no ripple and high coherence. Whereas for other frequencies you might already find yourself in the reverberant field with lots of ripple and low coherence.

Ripple continued

Figure 13 shows the natural progression of ripple over distance in a room. Evidently, the ripple is only visible if you refrain from applying smoothing.

Close to the speaker, in the direct field (even at lower frequencies), most of the frequency response exhibits ripple of 6 dB or less, which implies isolation.

However, as we move away from the loudspeaker, the low end is the first to transition into the reverberant field, beyond critical distance, with ripple in excess of 12 dB.

Low end

At lower frequencies, single direct radiators or piston drivers are effectively omni-directional once they produce wavelengths that significantly exceed their own diameters. Their $$Q$$-value approaches 1, thereby decreasing critical distance. Therefore, any chance of expanding their direct field in this part of the spectrum, without raising directivity by committee (i.e., loudspeaker arrays), relies solely on absorption.

However, for low frequencies, absorption by means of friction (converting the kinetic energy of the propagation medium's moving particles into heat), using various materials and fabrics, is impractical. The material needs to be at least a quarter‑wavelength thick which leaves no more room for the audience (pun intended).

Also, achieving diaphragmatic action with membrane absorbers in this part of the spectrum is cumbersome. Certainly the rigid walls that hold up the roof won't flex in our favor, and neither will the solid floor.

In other words, the conditions at low frequencies typically tend to be both low $$Q$$ and low $$\bar{a}$$ values. Neither favors critical distance, so the low end is the first to lose, where both ripple and coherence rapidly become worse.

The bottom end of a loudspeaker is typically incapable of avoiding energizing the room. And upon impact, the room cannot absorb that energy. We woke the beast.

Notice that it is in this part of the spectrum where we observe room gain or LF buildup. The imaginary black-dashed lines over the peaks in figure 13, called the "eyeball" envelope (which is how we gauge tonality), clearly show an increase in LF buildup over distance. In the combing zone, where D/R values are close to zero, direct and indirect sound are effectively equally loud, and correlated late-arriving energy has a 66% chance of adding "gain". Hence room gain.

High end

Conversely, for high frequencies, the horn is very much capable of not waking the beast by simply steering the sound where it needs to go: towards the audience and not the plaster.

In this part of the spectrum, where the horn is sole custodian, $$Q$$ will be relatively high (typically 10 or more), and absorption is still a viable option. Once the "meat absorbers" enter the room, the audience is likely to improve the conditions, and so does HF attenuation by air.

High $$Q$$ and $$\bar{a}$$ values both favor critical distance, which means the direct field in this part of the spectrum lasts much longer before being overtaken by reverberation. Real-world measurements, like figure 13, confirm this.

The isolation zone, where D/R is 10 dB or more, is the last to lose market share before having to surrender to reverberation. Ultimately, even the horn, at greater distance, is no longer capable of avoiding the room.

This is why coherence is typically better, relatively speaking, at higher frequencies than at lower frequencies, even in the near field at close distance to the source.

Limited resources

It's in our interest to keep our audience in the direct field, if we intend to deliver a robust and coherent signal with high intelligibility. Extending critical distance as far as is realistically possible is therefore mandatory.

Looking at equation 4, this typically leaves us with two options. We either invest in: a) $$Q$$ (directivity), e.g., by clever arraying, increasing $$Q$$ by committee; or b) $$\bar{a}$$ (absorption), which is typically much more cumbersome and expensive.

You can spend the money only once so choose wisely. If neither option is possible, a distributed system is the way to go.

This leaves us with one very important aspect regarding coherence.

FFT & resolution

Dual-channel FFT analyzers transform waveforms (time domain) into spectra (frequency domain). The frequency resolution of those spectra is determined by the so-called FFT size.

The FFT size, expressed in samples, corresponds to a time record, window, or time constant, terms which I use interchangeably. To calculate the time record for a given FFT size, use equation 7.

\begin{equation}TR=\frac{FFT\,size}{SR}\end{equation}

Where TR is the time record in milliseconds and SR is the sample rate in kiloHertz.

For an FFT size of, e.g., 1K (notice the capital K), which is 1024 samples, at a sample rate of 48 kHz, the time record would be 21 ms.

To go from time record to frequency resolution use equation 8.

\begin{equation}FR=\frac{1000}{TR}\end{equation}

Where FR is the frequency resolution in Hertz.

For a time record of 21 ms, the frequency resolution would be 47 Hz.

To go straight from FFT size to frequency resolution use equation 9.

\begin{equation}FR=\frac{SR}{FFT\,size}\end{equation}

Where SR is the sample rate in Hertz, no longer kiloHertz.
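Equations 7 through 9 condense into two small helpers; the 1K/48 kHz example mirrors the worked numbers above:

```python
def time_record_ms(fft_size: int, sample_rate_khz: float) -> float:
    """Equation 7: time record in ms from FFT size and sample rate in kHz."""
    return fft_size / sample_rate_khz

def freq_resolution_hz(fft_size: int, sample_rate_hz: float) -> float:
    """Equation 9: frequency resolution in Hz from FFT size and rate in Hz."""
    return sample_rate_hz / fft_size

tr = time_record_ms(1024, 48)          # 1K FFT at 48 kHz
fr = freq_resolution_hz(1024, 48_000)
print(f"time record: {tr:.1f} ms, resolution: {fr:.3f} Hz")
```

Equation 8 is implied: 1000 divided by the time record in ms gives the same resolution.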

Quasi-logarithmic scale

The Fourier transform is linear and results in a linear frequency scale which is challenging to reconcile with our logarithmic hearing sense.

The upper transfer function in figure 14 is obtained using a fixed FFT size of 32K. It provides the desired amount of resolution at low frequencies but way too much resolution at high frequencies.

This is the issue with a single time record for the entire spectrum and the linear frequency scale that comes with it.

Therefore, analyzers like SIM and Smaart divide the entire spectrum into several pass-bands and assign each band its own optimum time record to produce a quasi-logarithmic FPPO (fixed points per octave) scale.

The lower transfer function in figure 14 shows the same system using multiple time records which results in virtually the same frequency resolution for the entire spectrum.

The criteria for determining the number of time records and their length, are outside the scope of this article. It's a matter of psychoacoustics, e.g., the perception of tonality, reverberation, and echoes while trying to preserve accuracy.

For more information please consult the chapters "Reception" and "Examination" which you'll find in all three editions of Bob McCarthy's book "Sound Systems: Design and Optimization". In the third edition the chapter "Reception" was renamed "Perception".

However, having multiple time records or windows comes with a second advantage: it allows us to "window out" or reject late-arriving reverberant energy, including echoes, if we desire to do so.

RMS or vector averaging

Most analyzers allow you to average measurement data over time, which can be done in two ways: RMS or polar averaging, and vector or complex averaging. Both types of averaging produce noticeably different outcomes.

RMS or polar averaging—which discards time—suffices for single‑ended applications like spectra when dealing with random/uncorrelated signals.

However, time-blind RMS or polar averaging, used with a dual-channel application like a transfer function, admits late-arriving energy, including echoes, into the measurement, unlike vector or complex averaging.

Vector or complex averaging is better for correlated signals with a causal relationship (cause and effect), like dual-channel applications.

The process of tuning a sound system is "coordinating interaction with a goal" (Jamie Anderson). Room interaction plays a big part in the perceived tonality and is a steady-state phenomenon as long as the loudspeakers remain stationary and nobody brings a wrecking ball.

The early decay (effect) invoked by a sound system (cause) that excites a room colorizes the sound. This aspect of reverberation should be captured and measured, whereas late-arriving energy, and potentially echoes, should be rejected. Vector or complex averaging rejects energy arriving outside the time records and discards it as noise.

Figure 15 shows the same system using both types of averaging. One instance with RMS or polar averaging (blue trace) and another instance with vector or complex averaging (green trace).

In both cases, coherence (red trace) will look identical, but RMS or polar averaging admitted much more energy from outside the time records into the measurement. How much is best appreciated by looking at the difference between these two traces, which is the orange trace at the bottom of figure 15.

Also notice that the blue trace, using RMS or polar averaging, suggests (residual) energy is very much present, even when coherence is low, the latter indicating low D/R. Without knowing the cause of coherence loss, this is convoluted information. Vector or complex averaging, on the other hand, confirms low D/R by the absence of direct sound (big nulls or cancels).
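The difference between the two averaging modes can be sketched numerically. Below, a stable direct path is contaminated by unit-level, random-phase late energy, a crude stand-in for reverberation outside the time record; all values are assumptions, not any analyzer's internals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64  # number of averaged transfer-function snapshots

direct = 1.0 + 0j                          # correlated, stable part
late = np.exp(2j * np.pi * rng.random(n))  # random phase per snapshot
snapshots = direct + late

rms_avg = np.sqrt(np.mean(np.abs(snapshots) ** 2))  # time-blind: keeps late energy
vector_avg = np.abs(np.mean(snapshots))             # complex: random phase cancels

print(f"RMS/polar avg:      {20 * np.log10(rms_avg):+.1f} dB")
print(f"vector/complex avg: {20 * np.log10(vector_avg):+.1f} dB")
```

In this toy model the RMS average sits near +3 dB (direct plus late power), while the vector average converges on the 0 dB direct path as the random-phase contamination averages toward zero.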

When we conduct response analysis we use internal delay to synchronize the two channels, thereby compensating for propagation delay (latency) and/or time-of-flight.

Once the delay has been set, measurement signals arriving exactly in time will have high coherence values whereas measurement signals arriving earlier or later will have proportionally lower coherence values. Signals that are early or late by 100% of the time record have a coherence value of zero percent (figure 16 - experimentally determined).

With vector or complex averaging, these signals, arriving out of time, will be attenuated in proportion to their coherence values (equation 10), shown by the blue line in figure 16.

\begin{equation}attenuation\propto\log(coherence)\end{equation}

Modern analyzers, with their multiple time records, provide the means to reject late arriving reverberant energy after 1 second for low frequencies, to as little as 5 ms for high frequencies. Which, as far as we know, is psycho‑acoustically justifiable.

Ultimately, both types of averaging—in concert—make for a very powerful combination. As long as one knows the cause for coherence loss.

These technicalities are not trivial. Understanding them increases confidence in the fidelity of the measurement platform.

Be sure to read this seminal AES paper (open access) by Floyd E. Toole which underscores the importance of acknowledging reverberation and its intricacies when tuning sound systems.

Also Meyer Sound's latest video about M-Noise contains a great section, among others, regarding coherence where you see many of the phenomena discussed in this article, in action.

The video below provides an in-depth explanation why coherence can be thought of as an indicator for SNR. 