Coherence is a common feature on many analyzers that enables us to distinguish signal from measurement contamination. It will indicate whether you're measuring a loudspeaker or, e.g., the noise produced by a moving light.

Coherence is subject to change. One of the aspects involved, which we’ll explore in depth, is the relationship between the direct sound of a loudspeaker in a room and the room’s reverberation.

This article is an attempt at putting real-world observations into context. Many of the results have been obtained experimentally, and some concepts that don't directly further our understanding have been deliberately omitted.

Loosely defined, coherence is a statistical measure which indicates contamination of the measurement. It's proportional to the ratio of coherent power (signal) to the sum of coherent power and non-coherent power (contamination).

\begin{equation}Coherence\propto\frac{Coherent\ Power}{Coherent\ Power + Non\mbox{-}Coherent\ Power}\end{equation}
Where non-coherent power is all uncorrelated residual output signal power that isn't caused exclusively by the input signal of the system under test. It comes in many flavors, such as noise, reverberation and distortion.
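As a minimal sketch of the ratio in equation 1, the snippet below converts an SNR in dB into the ratio of coherent power to total power. It assumes a proportionality constant of one, which is an assumption on my part, and the function name `coherence_ratio` is hypothetical. Analyzers may scale or display coherence differently, so these numbers are not expected to reproduce any particular product's readout.

```python
def coherence_ratio(snr_db: float) -> float:
    """Idealized ratio of coherent power to total (coherent + non-coherent) power."""
    snr_linear = 10 ** (snr_db / 10)  # SNR is a power ratio, hence 10*log10
    return snr_linear / (snr_linear + 1)


for snr_db in (-10, 0, 10, 20):
    print(f"SNR {snr_db:+d} dB -> ratio {coherence_ratio(snr_db):.3f}")
```

At 0 dB SNR, signal and contamination carry equal power and the ratio is exactly one half. Every additional 10 dB of SNR pushes the ratio closer to one with diminishing returns.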

For a more in-depth explanation of why this is, please watch the video at the bottom of this page.

In other words, coherence is, among other things, an indicator of Signal-to-Noise Ratio (SNR), illustrated by the spectra in figure 1, and by extension of speech intelligibility and such. In practice, it suffices to think of coherence as a data quality indicator.

Highly coherent data is reliable and actionable data, informing us how to move forward with the calibration of a sound system.

Figure 2 shows the behavior of coherence vs. SNR (experimentally determined) in, e.g., Smaart v8. From this chart we can conclude that 10 dB of SNR should suffice for approximately 95% coherence. In other words, as long as you measure 10 dB above the noise floor, you should be fine.

From there on, increasing the number of averages will allow you to improve coherence even further, without actually increasing SNR with brute force by raising the excitation signal level.

Once there's no more visible improvement in coherence, increasing the number of averages even more, without getting something in return, becomes pointless.

However, all of this is based on the assumption there's actual signal left over...

Destructive interference

Destructive interference destroys signal, which in turn is replaced by whatever is left over: typically residual noise from HVAC, moving lights, generators, audience enthusiasm and such.

Figure 3 shows a comb filter spectrum. Comb filters manifest themselves as an alternating pattern of peaks and cancels in the frequency domain. They are caused by summing multiple copies of the same signal (produced by other speakers or reflections) that arrive at different times.

Comb filters are inevitable whenever there's physical displacement between multiple sources reproducing the same signal, or when the surroundings consist of specular, reflective boundaries. These phenomena are particularly noticeable when walking the room while listening to pink noise, and are typically described as phasing, flanging or chorusing. Sound familiar?

Comb filters appear to have audible pitch. The comb filter's peak frequencies constitute a harmonic series and the apparent pitch is equal to the first peak's frequency (the fundamental). Which pitch you perceive is uniquely defined by your listening position with respect to the sound system and surrounding boundaries, regardless of the program material itself. It is therefore a moving target which typically goes up and down in frequency (glissandi) as you walk the room, because you're dealing with a purely spatial problem.
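To make the harmonic series concrete, here is a small sketch that lists peak and null frequencies for two copies of the same signal. It assumes equal polarity and a single arrival-time offset, and the function name `comb_filter_frequencies` is hypothetical. Under those assumptions the first peak above 0 Hz sits at one over the time offset, and the nulls fall halfway between the peaks.

```python
def comb_filter_frequencies(delay_ms: float, f_max_hz: float = 20000.0):
    """Return (peaks, nulls) in Hz for two same-polarity copies offset by delay_ms."""
    fundamental = 1.0 / (delay_ms / 1000.0)  # first peak above 0 Hz
    peaks, nulls = [], []
    n = 1
    while n * fundamental <= f_max_hz:
        peaks.append(n * fundamental)  # whole-cycle offsets add
        n += 1
    n = 0
    while (n + 0.5) * fundamental <= f_max_hz:
        nulls.append((n + 0.5) * fundamental)  # half-cycle offsets cancel
        n += 1
    return peaks, nulls


peaks, nulls = comb_filter_frequencies(delay_ms=1.0, f_max_hz=5000.0)
print("peaks (Hz):", peaks)
print("nulls (Hz):", nulls)
```

Changing the time offset, which is what happens when you walk the room, moves the whole series up or down. That is the glissando described above.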

Whenever you hear phenomena like these, you should realize that equalization is no longer a viable option unless the apparent pitch remains constant over space, which I have yet to encounter.

However, in such instances, every time the direct sound has been cancelled for whatever reason, signal has been replaced by noise (at the nulls or cancels) and both SNR and, inherently, coherence decrease.

I would like to emphasize that all of this typically happens after the sound has left the loudspeaker(s), and that signal processing typically has limited to no merit other than properly time aligning multiple sources, which doesn't resolve room interaction!

Ripple

The difference between a comb filter's maxima and minima, expressed in dB, is known as ripple (figure 4) and is arguably the most important metric when designing sound systems.

Ripple is determined by the relative level offset between two or more copies of the same signal (figure 5). An in-depth explanation of ripple is beyond the scope of this article. For more information please consult the chapter "Summation" which you'll find in all three editions of Bob McCarthy's book "Sound Systems: Design and Optimization".
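As a rough illustration of how level offset sets ripple, the sketch below treats the simplest case of exactly two copies. The peak is taken as the sum of the two amplitudes and the null as their difference. That two-copy model and the function name `ripple_db` are assumptions for illustration, not a substitute for the full treatment in the reference above.

```python
import math


def ripple_db(level_offset_db: float) -> float:
    """Peak-to-null ripple in dB for two copies differing in level by level_offset_db."""
    a = 10 ** (-abs(level_offset_db) / 20)  # amplitude of the weaker copy (stronger = 1)
    if a >= 1.0:
        return math.inf  # equal levels: complete cancellation at the nulls
    peak = 1 + a  # copies arrive in phase
    null = 1 - a  # copies arrive in opposite phase
    return 20 * math.log10(peak / null)


for offset in (0, 3, 6, 10, 20):
    print(f"level offset {offset:2d} dB -> ripple {ripple_db(offset):.1f} dB")
```

In this model, equal levels give unbounded ripple, and ripple shrinks quickly as the level offset grows.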

Meanwhile, I encourage you to play with my phase calculator to gain insight into the balancing act between relative level and time offsets.

Figure 6 shows several transfer functions of the same comb filter with varying amounts of ripple while competing with different amounts of background noise. Notice how coherence (red trace) is greatly affected by both ripple and background noise.

In general, less ripple (less degraded, more robust signal) results in overall improved coherence. Interference between multiple copies of the same signal is minimized when relative level offset comes to the rescue. Simultaneously, lower background noise levels translate into more SNR which also improves coherence.

So how does ripple typically evolve over distance indoors?

Critical distance

Ripple goes hand in hand with the Direct-to-Reverberant ratio (D/R). For frequencies with wavelengths much smaller than the dimensions of a given room, we can resort to statistics. This criterion needs to be met for all acoustics equations that follow from here on. Under such circumstances, the direct sound drops by 6 dB per doubling of distance (inverse-square-law) whereas reverberation tends to maintain its level regardless of distance (figure 7).

In the direct field, the direct sound dominates over the reverberation with positive D/R values. In the reverberant field, it's the other way around with negative D/R values. The distance where direct and reverberant see eye to eye, at the same level, with a D/R value of zero, is called critical distance. It's where the scale tips.

Ultimately, we listeners experience and measure the combined SPL of both direct plus reverberation, which implies that in the reverberant field, beyond critical distance, the inverse-square-law is typically no longer observed nor experienced.

As long as the direct sound is dominating, there will be little or no ripple (6 dB or less) and high coherence because any reflections that could possibly cause destructive interference are soft by comparison. We're effectively isolated from the room and obtain "near-anechoic" data.

Conversely, if the reverberant sound dominates, the reflections are so strong by comparison, that they wreak havoc on the direct sound causing ripple in excess of 12 dB and low coherence.

My favorite tool, for the sole purpose of explaining the underlying mechanism involved, is the Hopkins-Stryker equation. At best, this equation lets us estimate critical distance. Its accuracy will get you into the “ball park.” It will bring you into the right order of magnitude but should be treated with caution.

Hopkins-Stryker equation

In its simplest form, without the additional modifiers $$Ma$$ and $$Me$$, it looks as follows:

\begin{equation}L_{p}=L_{W}+10\log\left(\frac{Q}{4\pi D_{x}^{2}}+\frac{4}{S\bar{a}}\right)+K\end{equation}

Figure 8 provides a detailed explanation of all the variables.

What makes this equation interesting is the part between parentheses. The first and second fractions determine how direct and reverberant levels evolve over distance respectively, independent of the Sound Power Level (SWL), or simply put, source loudness (volume).

Notice, that only the first fraction contains a $$D_{x}^{2}$$ in the denominator. That's the $$1/r^2$$ dependency or inverse-square-law. Reverberation relies solely on the venue's total surface area and the average absorption coefficient of that combined area.
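The two fractions can be evaluated separately to see how D/R evolves over distance. The sketch below does that. The function names and the example values for $$Q$$, $$S$$ and $$\bar{a}$$ are hypothetical and chosen purely for illustration.

```python
import math


def direct_term(q: float, distance_m: float) -> float:
    """First fraction: Q / (4 * pi * Dx^2), falls with the square of distance."""
    return q / (4 * math.pi * distance_m ** 2)


def reverberant_term(surface_m2: float, avg_absorption: float) -> float:
    """Second fraction: 4 / (S * a), has no distance dependency."""
    return 4 / (surface_m2 * avg_absorption)


def d_to_r_db(q: float, distance_m: float, surface_m2: float, avg_absorption: float) -> float:
    """Direct-to-reverberant ratio in dB (power quantities, hence 10*log10)."""
    return 10 * math.log10(
        direct_term(q, distance_m) / reverberant_term(surface_m2, avg_absorption)
    )


# Illustrative values only: Q = 10, S = 5000 m^2, average absorption = 0.2
for d in (2, 4, 8, 16, 32):
    print(f"{d:2d} m -> D/R {d_to_r_db(10, d, 5000, 0.2):+.1f} dB")
```

Each doubling of distance lowers D/R by about 6 dB, because only the direct term changes while the reverberant term stays put.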

If we set $$D_{x}$$ to $$d_{c}$$, as in critical distance, and make direct and reverberation equally loud, the condition at critical distance, we obtain equation 3.

\begin{equation}\frac{Q}{4\pi d_{c}^{2}}=\frac{4}{S\bar{a}}\end{equation}

If we solve equation 3 for $$d_{c}$$ we get equation 4.

\begin{equation}d_{c}=0.141\sqrt{QS\bar{a}}\end{equation}

Which indicates that in practice, critical distance depends primarily on $$Q$$ and $$\bar{a}$$ since surface area is a given, unless you intend to bring a wrecking ball.
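As a quick numerical check of equation 4, the sketch below solves equation 3 for $$d_{c}$$ directly, which also shows where the constant comes from: $$1/(4\sqrt{\pi})\approx0.141$$. The function name and the example values are hypothetical.

```python
import math


def critical_distance_m(q: float, surface_m2: float, avg_absorption: float) -> float:
    """Distance at which Q / (4*pi*d^2) equals 4 / (S*a)."""
    return math.sqrt(q * surface_m2 * avg_absorption / (16 * math.pi))


print(f"constant: {1 / (4 * math.sqrt(math.pi)):.3f}")

# Illustrative values only: S = 5000 m^2, average absorption = 0.2
for q in (1, 5, 10, 20):
    print(f"Q = {q:2d} -> critical distance {critical_distance_m(q, 5000, 0.2):.1f} m")
```

With these example values, $$Q=10$$ gives a critical distance of roughly 14 m. Because of the square root, quadrupling $$Q$$ or $$\bar{a}$$ only doubles critical distance.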

Let's start by looking at directivity factor first.

Directivity factor

The directivity factor, $$Q$$, is defined as the ratio of the intensity, $$I\,(W/m^2)$$, along a given axis and at a given distance from a sound radiator, compared to the intensity that would be produced at the same distance by an omnidirectional point source radiating the same power (figure 9).

The Directivity Index, $$DI$$, is exactly the same story but expressed on a logarithmic scale. Since $$Q$$ is about sound power, the $$10\times\log_{10}$$ rule is used to obtain equation 5.

\begin{equation}DI=10\log\left(Q\right)\end{equation}

Some loudspeaker manufacturers publish directivity data (figure 10).

In the absence of such information, $$Q$$ could be estimated, e.g., in the case of a rectangular horn by using the equation of Molloy.

\begin{equation}Q=\frac{180^{\circ}}{\arcsin\left(\sin\frac{\alpha}{2}\cdot\sin\frac{\beta}{2}\right)}\end{equation}
Where $$\alpha$$ is the loudspeaker's nominal horizontal coverage angle and $$\beta$$ the nominal vertical coverage angle.

Molloy's equation should be used with care because it assumes all sound power is confined within the loudspeaker's intended coverage angle whereas real loudspeakers suffer from spill or leakage outside their intended coverage angle.
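For a feel of the numbers, here is a minimal sketch of Molloy's estimate together with the conversion to $$DI$$. It takes the sine of both half-angles, works in degrees, and inherits the idealization mentioned above that no power leaks outside the nominal coverage. The function names are hypothetical.

```python
import math


def molloy_q(horizontal_deg: float, vertical_deg: float) -> float:
    """Estimate Q from nominal horizontal and vertical coverage angles in degrees."""
    s = math.sin(math.radians(horizontal_deg / 2)) * math.sin(math.radians(vertical_deg / 2))
    return 180.0 / math.degrees(math.asin(s))


def directivity_index_db(q: float) -> float:
    """DI = 10 * log10(Q)."""
    return 10 * math.log10(q)


for h, v in ((180, 180), (90, 60), (90, 40), (60, 40)):
    q = molloy_q(h, v)
    print(f"{h} x {v} deg -> Q {q:.1f}, DI {directivity_index_db(q):.1f} dB")
```

A nominal 180 by 180 degree pattern (a hemisphere) comes out at a $$Q$$ of 2, which is a useful sanity check.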

Ideally, the directivity factor is obtained from actual measurements which is best left to loudspeaker manufacturers.

At the end of the day, wide loudspeakers are low-$$Q$$ (an omnidirectional source having a $$Q$$ of 1) and narrow loudspeakers are high-$$Q$$.

More importantly, real loudspeakers aren't Constant Beamwidth Transducers (CBT). They don't exhibit the same coverage angle for all frequencies. Their $$Q$$ changes with frequency from near omnidirectional with a value of 1 in the low end to typically a constant $$Q$$ value for the high frequencies as can be seen in figure 10.

Be sure to check out the beautiful know-how posters, regarding this topic and more, by Klippel in Dresden, Germany.

Absorption coefficient

The absorption coefficient, $$\bar{a}$$, of a material is a number between 0 and 1 which indicates the proportion of sound which is absorbed by the surface compared to the proportion which is reflected back into the room.

A large, fully open window (figure 11) would offer no reflection as any sound reaching it would pass straight out and no sound would be reflected. This would have an absorption coefficient of 1.

Conversely, a thick, smooth painted concrete ceiling would be the acoustic equivalent of a mirror and have an absorption coefficient very close to zero.

These are all the ingredients we need to further our understanding of critical distance.

Critical distance continued

Equation 4 implies that in practice, critical distance is a function of frequency. Even if we entertain the notion of a venue with a constant $$RT_{60}$$ and $$\bar{a}$$, which in reality it won't have, $$Q$$ for real loudspeakers, or even arrays for that matter, will still change with frequency and critical distance becomes a "moving target".

Figure 12 shows the bigger picture, starting with $$Q$$ and $$DI$$ values for a loudspeaker in chart #1.

Chart #2 shows $$RT_{60}$$ and $$\bar{a}$$ values for a venue with a volume of 20,000 m³.

Chart #3 shows critical distance according to equation 4 based on the values in charts #1 and #2 which is clearly moving with frequency.

And finally, chart #4 shows D/R which is also derived from the Hopkins-Stryker equation (equation 2).

This means that for real loudspeakers, at any given distance, for some frequencies you'll end up in the direct field with little to no ripple and high coherence whereas for other frequencies you might already find yourself in the reverberant field with lots of ripple and low coherence.

Ripple continued

Figure 13 shows the natural progression of ripple over distance in a room. Evidently, the ripple is only visible if you refrain from applying smoothing.

Close to the speaker, in the direct field (even at lower frequencies), most of the frequency response exhibits ripple of 6 dB or less, which implies isolation.

However, as we distance ourselves from the loudspeaker, the low end is first to exit the direct field and enter the reverberant field, beyond critical distance, with ripple in excess of 12 dB.

Low end

Direct radiators or piston drivers are effectively omnidirectional at lower frequencies, when they produce wavelengths that exceed their own diameters substantially. Their $$Q$$ value approaches 1, decreasing critical distance. Therefore, any chance of expanding the direct field in this part of the frequency spectrum relies solely on absorption.

However, for low frequencies, conventional absorption by friction (converting particle velocity into heat), using various materials and fabrics, is impractical. The material needs to be a quarter wavelength thick which leaves no more room for the audience (pun intended).

Also, achieving diaphragmatic action with membrane absorbers, in this part of the spectrum, is cumbersome. Certainly the rigid walls that hold up the roof won't flex in our favor and neither will the solid floor.

In other words, the conditions at low frequencies are low $$Q$$ and $$\bar{a}$$ values. Neither favors critical distance, so the low end is first to lose and both ripple and coherence will rapidly become worse.

The bottom end of a loudspeaker is typically incapable of avoiding energizing the room and upon impact, the room can't absorb that energy. We woke the beast.

Notice that it is in this part of the spectrum where we observe room gain or LF-buildup. The imaginary black dashed lines over the peaks in figure 13, called the "eyeball" envelope, which is how we gauge tonality, clearly show an increase in LF-buildup over distance. In the combing zone where D/R values are close to zero, direct and indirect sound are effectively equally loud and every reflection has a 66% chance of adding "gain". Hence room gain.
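The 66% figure can be checked with a small sketch. It assumes two copies at exactly equal level and that every relative phase is equally likely, both of which are simplifying assumptions. It then counts the fraction of phases at which the sum is louder than a single copy. The function name is hypothetical.

```python
import cmath
import math


def fraction_adding_gain(steps: int = 3600) -> float:
    """Fraction of relative phases where two equal-level copies sum louder than one alone."""
    count = 0
    for k in range(steps):
        phase = 2 * math.pi * (k + 0.5) / steps  # midpoints avoid the exact boundaries
        if abs(1 + cmath.exp(1j * phase)) > 1:
            count += 1
    return count / steps


print(f"{fraction_adding_gain():.1%} of relative phases add gain")
```

The sum exceeds a single copy whenever the relative phase is within 120 degrees of zero, which is 240 out of 360 degrees, or two thirds.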

High end

Conversely, the horn is very much capable of not waking the beast by simply steering the sound where it needs to go which is the audience and not to the walls.

In this part of the spectrum where the horn is sole custodian, $$Q$$ will be relatively high (typically 10 or more) and also absorption is still a viable option. Once the "meat-absorbers" come into the room, even the audience is likely to improve the conditions and so does HF-attenuation by air.

High $$Q$$ and $$\bar{a}$$ values both favor critical distance, which means that the direct field in this part of the spectrum lasts much longer before being overtaken by reverberation. Real-world measurements, like figure 13, confirm this.

The isolation zone where D/R exceeds 10 dB or more, is last to lose market share before having to surrender to reverberation. Ultimately, even the horn, at greater distance, is no longer capable of avoiding the room.

This is why coherence is typically better, relatively speaking, at higher frequencies than at lower frequencies, even in the near-field at close distance to the source.

Limited resources

It's in our interest to keep our audience in the direct field if we intend to deliver a robust and coherent signal with high intelligibility. Making critical distance as long as is realistically possible is therefore mandatory.

Looking at equation 4, this leaves us with two options. We either invest in $$Q$$ (directivity), e.g., by clever arraying, increasing $$Q$$ by committee or invest in $$\bar{a}$$ (absorption) which is typically much more involved and expensive.

You can spend the money only once so choose wisely. If neither option is possible, a distributed system is the way to go.

This leaves us with one very important aspect regarding coherence.

FFT & resolution

Dual-channel FFT analyzers make use of the Discrete Fourier Transform (DFT) to transform waveforms (time domain) into spectra (frequency domain). The frequency resolution of those spectra is determined by the so-called FFT size.

The FFT size, expressed in samples, represents a time record, window or constant, terms which I use interchangeably. To calculate the time record for a given FFT size use equation 7.

\begin{equation}TR=\frac{FFT\,size}{SR}\end{equation}

Where TR is the time record in milliseconds and SR is the sample rate in kilohertz.

For an FFT size of, e.g., 1K (notice the capital K), which is 1024 samples, at a sample rate of 48 kHz, the time record would be 21 ms.

To go from time record to frequency resolution use equation 8.

\begin{equation}FR=\frac{1000}{TR}\end{equation}

Where FR is the frequency resolution in Hertz.

For a time record of 21 ms, the frequency resolution would be 47 Hz.

To go straight from FFT size to frequency resolution use equation 9.

\begin{equation}FR=\frac{SR}{FFT\,size}\end{equation}

Where SR is the sample rate in hertz, no longer kilohertz.
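Equations 7 through 9 are easy to tabulate. The sketch below takes the sample rate in hertz throughout and converts to milliseconds where needed. The function names are hypothetical.

```python
def time_record_ms(fft_size: int, sample_rate_hz: float) -> float:
    """Equation 7, with the sample rate given in Hz and the result in ms."""
    return 1000.0 * fft_size / sample_rate_hz


def frequency_resolution_hz(fft_size: int, sample_rate_hz: float) -> float:
    """Equation 9: spacing between adjacent frequency bins."""
    return sample_rate_hz / fft_size


for size_k in (1, 4, 16, 32):
    n = size_k * 1024  # capital K means 1024 samples
    print(
        f"{size_k:2d}K -> time record {time_record_ms(n, 48000):7.1f} ms, "
        f"resolution {frequency_resolution_hz(n, 48000):6.2f} Hz"
    )
```

Doubling the FFT size doubles the time record and halves the bin spacing, which is the trade-off between time and frequency resolution.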

Quasi-logarithmic scale

The Fourier transform is linear which results in a linear frequency scale. This poses a challenge for an industry which requires a logarithmic scale preferably with a Fixed amount of Points per Octave (FPPO).

The upper transfer function in figure 14 is obtained using a fixed FFT size of 32K. It provides the desired amount of resolution at low frequencies but way too much resolution at high frequencies.

This is the issue with a single time record for the entire spectrum and the linear frequency scale that comes with it.
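To put numbers on this, the sketch below counts how many linearly spaced FFT bins land in a given octave for a single FFT size. A sample rate of 48 kHz is assumed here, since the sample rate used for the figure isn't stated, and the function name is hypothetical.

```python
def points_per_octave(fft_size: int, sample_rate_hz: float, octave_low_hz: float) -> float:
    """Approximate number of FFT bins between octave_low_hz and twice that frequency."""
    resolution_hz = sample_rate_hz / fft_size
    octave_width_hz = octave_low_hz  # an octave spans low..2*low, so it is `low` Hz wide
    return octave_width_hz / resolution_hz


fft_size = 32 * 1024  # 32K
for low in (31.25, 62.5, 125, 1000, 8000):
    ppo = points_per_octave(fft_size, 48000, low)
    print(f"{low:7.2f} Hz to {2 * low:8.2f} Hz: about {ppo:7.1f} points")
```

The point count doubles with every octave, so a size that is adequate in the low end yields thousands of points in the top octave.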

Therefore, analyzers like SIM and Smaart divide the entire spectrum into several bands and assign each band its own optimum time record to produce a quasi-logarithmic FPPO scale.

The lower transfer function in figure 14 shows the same system using multiple time records which results in virtually the same frequency resolution for the entire spectrum.

The criteria for determining the number of time records and their length, are outside the scope of this article. It's a matter of psychoacoustics, e.g., the perception of tonality, reverberation and echoes while trying to preserve accuracy.

For more information please consult the chapters "Reception" and "Examination" which you'll find in all three editions of Bob McCarthy's book "Sound Systems: Design and Optimization". In the third edition the chapter "Reception" was renamed "Perception".

However, having multiple time records or windows comes with a second advantage. It allows us to "window out" or reject late arriving reverberant energy including echoes if we desire to do so.

RMS or vector averaging

Most analyzers allow you to average measurement data over time, which can be done in two ways: RMS or polar averaging, and vector or complex averaging. Both types of averaging produce noticeably different outcomes.

RMS or polar averaging which discards time, suffices for single-channel applications like spectra when dealing with random/uncorrelated signals.

However, RMS or polar averaging used in conjunction with a dual-channel application, like a transfer function, admits late arriving energy, including echoes, into the measurement unlike vector or complex averaging.

Vector or complex averaging is better for correlated signals with a causal relationship (cause and effect) like dual-channel applications.

The process of tuning a sound system is "coordinating interaction with a goal" (Jamie Anderson). Room interaction plays a big part in the perceived tonality and is a steady-state phenomenon as long as the loudspeakers remain stationary and nobody brings a wrecking ball.

The reverberant energy (effect) which we induced with our sound system (cause) by exciting the room, colors the sound. This identity of the reverberation should be preserved and measured whereas late arriving energy and potentially echoes should be rejected. Vector or complex averaging rejects energy arriving outside of the time records and discards it as noise.
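The difference between the two averaging types can be illustrated with a toy example for a single frequency bin. Each frame below is a complex transfer-function value made of a stable component plus an uncorrelated component whose phase changes from frame to frame. The evenly stepped phase stands in for random phase, and all names and values are hypothetical. This is a sketch of the principle, not how any particular analyzer implements averaging.

```python
import cmath
import math


def rms_average(frames):
    """Average the power of each frame, discarding phase (RMS or polar averaging)."""
    return math.sqrt(sum(abs(h) ** 2 for h in frames) / len(frames))


def vector_average(frames):
    """Average the complex values, keeping phase (vector or complex averaging)."""
    return abs(sum(frames) / len(frames))


n_frames = 16
stable = 1.0 + 0j  # arrives in time, same magnitude and phase in every frame
frames = [
    stable + 0.5 * cmath.exp(2j * math.pi * k / n_frames)  # phase differs per frame
    for k in range(n_frames)
]

print(f"RMS average:    {rms_average(frames):.3f}")
print(f"vector average: {vector_average(frames):.3f}")
```

The uncorrelated component averages toward zero in the vector average, while its power remains in the RMS average. That is why the RMS trace sits higher.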

Figure 15 shows the same system using both types of averaging. One instance with RMS or polar averaging (blue trace) and another instance with vector or complex averaging (green trace).

In both cases, coherence (red trace) will look identical but RMS or polar averaging admitted much more reverberant energy into the measurement. How much, is best appreciated by looking at the difference between these two traces which is the orange trace in the bottom of figure 15.

Also notice that the blue trace, using RMS or polar averaging, suggests signal is very much present even when coherence is low, the latter indicating poor SNR. This is contradictory information, whereas vector or complex averaging confirms poor SNR by the absence of signal (big nulls or cancels).

In my opinion, RMS or polar averaging makes the room "look" prettier, less hostile and less volatile than your ears suggest. This doesn't qualify as "eye-to-ear" training. Vector or complex averaging on the other hand, provides a more realistic perspective.

When we conduct response analysis we use internal delay to synchronize the two channels, thereby compensating for propagation delay (latency) and/or time-of-flight.

Once the delay has been set, measurement signals arriving exactly in time will have high coherence values whereas measurement signals arriving earlier or later will have proportionally lower coherence values. Signals that are early or late by 100% of the time record have a coherence value of zero percent (figure 16 - experimentally determined).

With vector or complex averaging, these signals, arriving out of time, will be attenuated in proportion to their coherence values (equation 10), shown by the blue line in figure 16.

\begin{equation}attenuation\propto\log(coherence)\end{equation}
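As a rough sketch of equation 10, the snippet below combines it with the behavior described above. It assumes coherence falls linearly from 100% for signals arriving exactly in time to 0% at an offset of one full time record, and it assumes a proportionality factor of 20 as for amplitude ratios. Both the linear model and the factor are assumptions on my part. The article only states a proportionality, so treat the absolute dB values as illustrative.

```python
import math


def coherence_for_offset(offset_ms: float, time_record_ms: float) -> float:
    """Assumed linear falloff: 1.0 when in time, 0.0 at one full time record early or late."""
    fraction = min(abs(offset_ms) / time_record_ms, 1.0)
    return 1.0 - fraction


def attenuation_db(coherence: float, scale: float = 20.0) -> float:
    """attenuation = scale * log10(coherence); the scale factor is an assumption."""
    if coherence <= 0.0:
        return -math.inf  # fully rejected
    return scale * math.log10(coherence)


time_record = 21.3  # ms, roughly a 1K FFT at 48 kHz
for offset in (0.0, 5.0, 10.65, 16.0, 21.3):
    c = coherence_for_offset(offset, time_record)
    print(f"offset {offset:5.2f} ms -> coherence {c:4.0%}, attenuation {attenuation_db(c):.1f} dB")
```

Under these assumptions, a signal that is late by half the time record is attenuated by about 6 dB, and anything late by a full time record or more is rejected entirely.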

Modern analyzers, with their multiple time records, provide the means to reject late arriving reverberant energy after 1 second for low frequencies to as little as 5 ms for high frequencies which is, as far as we know, psychoacoustically justifiable.

Ultimately, which type of averaging to use is a matter of comfort. Vector or complex averaging is an acquired taste. Nonetheless, I genuinely recommend it.

These technicalities are not trivial. Understanding them increases confidence in the fidelity of the measurement platform.

Be sure to read this seminal AES paper (open access) by Floyd E. Toole which underscores the importance of acknowledging reverberation and its intricacies when tuning sound systems.

Also, Meyer Sound's latest video about M-Noise contains a great section regarding coherence, among others, where you can see many of the phenomena discussed in this article in action.

The video below provides an in-depth explanation of why coherence can be thought of as an indicator for SNR.