Maintaining Audio Quality in the Broadcast Facility



By
Robert Orban and Greg Ogonowski
Orban/CRL


Continued from System Considerations


In digital signal processing devices, the lowest number of bits per word necessary to achieve professional quality is 24 bits. This is because there are a number of common DSP operations (like infinite-impulse-response filtering) that substantially increase the digital noise floor, and 24 bits allows enough headroom to accommodate this without audibly losing quality. (This assumes that the designer is sophisticated enough to use appropriate measures to control noise when particularly difficult filters are used.) If floating-point arithmetic is used, the lowest acceptable word length for professional quality is 32 bits (24-bit mantissa and 8-bit exponent; sometimes called “single-precision”).

In digital distribution systems, 20-bit words (120 dB dynamic range) are usually adequate to represent the signal accurately. 20 bits can retain the full quality of a 16-bit source even after as much as 24 dB attenuation by a mixer, because each bit is worth about 6 dB and 24 dB therefore corresponds to the four extra bits. There are almost no A/D converters that can achieve more than 20 bits of real accuracy, and many “24-bit” converters have accuracy considerably below the 20-bit level. “Marketing bits” in A/D converters are outrageously abused to deceive customers, and, if these A/D converters were consumer products, the Federal Trade Commission would doubtless quickly forbid such bogus claims.

There is considerable disagreement about the audible benefits (if any) of raising the sample rate above 44.1 kHz. To the authors’ knowledge, as of 1999 there have been no rigorous tests of this that are double-blind and that adequately control for other variables, like the performance of the hardware as the sample rate is changed.
(Assuming perfect hardware, it can be shown that this debate comes down entirely to the audibility of a given anti-aliasing filter design, as is discussed below.) Nevertheless, in a marketing-driven push, the record industry is attempting to change the consumer standard from 44.1 kHz to 96 kHz. Regardless of whether scientifically accurate testing eventually proves that this is audibly beneficial, it has no benefit in FM stereo because the sampling rate of FM stereo is 38 kHz, so the signal must eventually be lowpass-filtered to 17 kHz or less to prevent aliasing. It is beneficial in DAR, which typically has 20 kHz audio bandwidth, but offers no benefit at all in AM, whose bandwidth is no greater than 10 kHz in any country and is often 4.5 kHz.

There has been at least one rigorous test comparing 48 kHz and 96 kHz sample rates [6]. This test concluded that there is no audible difference between these two sample rates if the 48 kHz rate’s anti-aliasing filter is appropriately designed.

Some A/D converters have built-in soft clippers that start to act when the input signal is 3 to 6 dB below full scale. While these can be useful in mastering work, they have no place in transferring previously mastered recordings (like commercial CDs). If the soft clipper in an A/D converter cannot be defeated, that A/D should not be used for transfer work.

Dither is random noise that is added to the signal at approximately the level of the least significant bit. It should be added to the analog signal before the A/D converter, and to any digital signal before its word length is shortened. Its purpose is to linearize the digital system by changing what is, in essence, “crossover distortion” into audibly innocuous random noise. Without dither, any signal falling below the level of the least significant bit will disappear altogether. Dither will randomly move this signal through the threshold of the LSB, rendering it audible (though noisy).
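
The effect of dither on a signal below the LSB threshold can be illustrated with a small simulation. This is only a sketch under stated assumptions: the quantizer is a plain rounding operation in units of one LSB, the dither is the sum of two uniform random values (giving a triangular probability function spanning plus or minus one LSB), and the helper names (`quantize`, `tpdf_dither`, `recovered_amplitude`) are invented for this illustration.

```python
import math
import random

def quantize(value):
    """Round to the nearest integer, i.e. to the nearest LSB step."""
    return math.floor(value + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB (sum of two uniform values)."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def recovered_amplitude(samples, reference):
    """Least-squares estimate of how much of `reference` survives in `samples`."""
    num = sum(s * r for s, r in zip(samples, reference))
    den = sum(r * r for r in reference)
    return num / den

rng = random.Random(1)          # fixed seed so the run is repeatable
n_samples = 20000
amplitude_lsb = 0.4             # assumed test level: peak below half an LSB

reference = [math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
signal = [amplitude_lsb * r for r in reference]

# Without dither every sample rounds to zero and the tone disappears.
undithered = [quantize(x) for x in signal]

# With dither the tone is preserved on average, buried in random noise.
dithered = [quantize(x + tpdf_dither(rng)) for x in signal]

print("tone level recovered without dither:",
      recovered_amplitude(undithered, reference))
print("tone level recovered with dither:   ",
      recovered_amplitude(dithered, reference))
```

In this sketch the undithered output contains nothing but zeros, while the dithered output still carries the tone at close to its original level, at the cost of added noise.
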
Whenever any DSP operation is performed on the signal (particularly decreasing gain), the resulting signal must be re-dithered before the word length is truncated back to the length of the input words. Ordinarily, correct dither is added in the A/D stage of any competent commercial product performing the conversion. However, some products allow the user to turn the dither on or off when truncating the length of a word in the digital domain. If the user chooses to omit adding dither, this should be because the signal in question already contained enough dither noise to make it unnecessary to add more.

In the absence of “noise shaping,” the spectrum of the usual “triangular-probability-function (TPF)” dither is white (that is, each arithmetic frequency increment contains the same energy). However, noise shaping can change this noise spectrum to concentrate most of the dither energy into the frequency range where the ear is least sensitive. In practice, this means reducing the energy around 4 kHz and raising it above 9 kHz. Doing this can increase the effective resolution of a 16-bit system to almost 19 bits in the crucial midrange area, and is standard in CD mastering. There are many proprietary curves used by various manufacturers for noise shaping, and each has a slightly different sound.

[6] Katz, Bob: Mastering Audio: the art and the science. Oxford, Focal Press, 2002, p. 223.

It has been shown that passing noise-shaped dither through most classes of signal processing and/or a D/A converter with non-monotonic behavior will destroy the advantages of the noise shaping by “filling in” the frequency areas where the original noise-shaped signal had little energy. The result is usually poorer than if no noise shaping had been used. For this reason, Orban has adopted a conservative approach to noise shaping, recommending so-called “first-order highpass” noise shaping and implementing this in Orban products that allow dither to be added to their digital output streams.
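
One common way to obtain a first-order highpass noise spectrum is error feedback: the total error made on one sample is subtracted from the next sample before it is requantized. The sketch below assumes that structure; it is not taken from any Orban product, and the names `requantize`, `noise_power`, and `low_band_noise_power` are invented for the example. It compares plain white TPF-dithered requantization with the first-order shaped version, measuring the total noise power and, using simple 16-sample block averages as a crude lowpass, the noise at low frequencies.

```python
import math
import random

def tpdf(rng):
    """Triangular-PDF dither spanning +/-1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def requantize(signal, rng, shaped):
    """Requantize to whole LSB steps with TPDF dither.

    With shaped=True the previous sample's total error is fed back
    (first-order error feedback), so the noise becomes e[n] - e[n-1].
    """
    out = []
    prev_error = 0.0
    for x in signal:
        target = x - prev_error if shaped else x
        y = math.floor(target + tpdf(rng) + 0.5)
        prev_error = y - target      # dither plus rounding error of this sample
        out.append(y)
    return out

def noise_power(output, signal):
    """Mean squared difference between requantized output and source."""
    return sum((y - x) ** 2 for y, x in zip(output, signal)) / len(signal)

def low_band_noise_power(output, signal, block=16):
    """Noise power after averaging over blocks (a crude lowpass filter)."""
    noise = [y - x for y, x in zip(output, signal)]
    means = [sum(noise[i:i + block]) / block
             for i in range(0, len(noise) - block + 1, block)]
    return sum(m * m for m in means) / len(means)

# Assumed test signal: a slow sine whose values fall between LSB steps.
signal = [100.3 * math.sin(2 * math.pi * 0.001 * n) for n in range(100000)]

plain = requantize(signal, random.Random(1), shaped=False)
shaped = requantize(signal, random.Random(2), shaped=True)

total_penalty_db = 10 * math.log10(noise_power(shaped, signal)
                                   / noise_power(plain, signal))
low_band_gain_db = 10 * math.log10(low_band_noise_power(plain, signal)
                                   / low_band_noise_power(shaped, signal))

print("total noise penalty of shaping (dB):", round(total_penalty_db, 2))
print("low-frequency noise reduction (dB): ", round(low_band_gain_db, 2))
```

Under these assumptions the shaped version trades a modest rise in total noise power for a clear reduction in low-frequency noise, which is the trade-off that motivates the conservative first-order choice.
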
First-order highpass noise shaping provides a substantial improvement in resolution over simple white TPF dither, but its total noise power is only 3 dB higher than white TPF dither. Therefore, if it is passed through additional signal processing and/or an imperfect D/A converter, there will be little noise penalty by comparison to more aggressive noise shaping schemes.

One of the great benefits of the digitization of the signal path in broadcasting is this: Once in digital form, the signal is far less subject to subtle degradation than it would be if it were in analog form. Short of becoming entirely un-decodable, the worst that can happen to the signal is deterioration of noise-shaped dither, and/or added jitter.

Jitter is a time-base error. The only jitter that cannot be removed from the signal is jitter that was added in the original analog-to-digital conversion process. All subsequent jitter can be completely removed in a sort of “time-base correction” operation, accurately recovering the original signal. The only limitation is the performance of the “time-base correction” circuitry, which requires sophisticated design to reduce added jitter below audibility. This “time-base correction” usually occurs in the digital input receiver, although further stages can be used downstream.

There are several pervasive myths regarding digital audio:

One myth is that long reconstruction filters smear the transient response of digital audio, and that there is therefore an advantage to using a reconstruction filter with a short impulse response, even if this means rolling off frequencies above 10 kHz. Several commercial high-end D-to-A converters operate on exactly this mistaken assumption. This is one area of digital audio where intuition is particularly deceptive. The sole purpose of a reconstruction filter is to fill in the missing pieces between the digital samples.
These days, symmetrical finite-impulse-response filters are used for this task because they have no phase distortion. The output of such a filter is a weighted sum of the digital samples symmetrically surrounding the point being reconstructed. The more samples that are used, the better and more accurate the result, even if this means that the filter is very long.

It’s easiest to justify this assertion in the frequency domain. Provided that the frequencies in the passband and the transition region of the original anti-aliasing filter are entirely within the passband of the reconstruction filter, then the reconstruction filter will act only as a delay line and will pass the audio without distortion. Of course, all practical reconstruction filters have slight frequency response ripples in their passbands, and these can affect the sound by making the amplitude response (but not the phase response) of the “delay line” slightly imperfect. But typically, these ripples are on the order of a few thousandths of a dB in high-quality equipment and are very unlikely to be audible.

The authors have proved this experimentally by simulating such a system and subtracting the output of the reconstruction filter from its input to determine what errors the reconstruction filter introduces. Of course, you have to add a time delay to the input to compensate for the reconstruction filter’s delay. The source signal was random noise, applied to a very sharp filter that band-limited the white noise so that its energy was entirely within the passband of the reconstruction filter. We used a very high-quality linear-phase FIR reconstruction filter and ran the simulation in double-precision floating-point arithmetic. The resulting error signal was a minimum of 125 dB below full scale on a sample-by-sample basis, which was comparable to the stopband depth in the experimental reconstruction filter.
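
The shape of this experiment can be reproduced in miniature. The sketch below is not the authors' simulation and makes several simplifying assumptions: the filter is a 255-tap Blackman-windowed sinc running at a single sample rate (far less refined than the filter described above), the band-limited source is a sum of five sine waves rather than filtered noise, and all function names are invented for the example. It filters the source, delays the unfiltered source by the filter's group delay, and reports the level of the difference.

```python
import math

def blackman_sinc_lowpass(num_taps, cutoff):
    """Symmetric (linear-phase) FIR lowpass; cutoff in cycles per sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = (0.42
                  - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    gain = sum(taps)                      # normalize for unity gain at DC
    return [h / gain for h in taps]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

num_taps = 255
delay = (num_taps - 1) // 2               # group delay of a symmetric FIR
taps = blackman_sinc_lowpass(num_taps, cutoff=0.4)

# Assumed source: tones (frequency in cycles/sample, phase in radians)
# that all lie well inside the filter passband.
tones = [(0.013, 0.0), (0.071, 1.0), (0.137, 2.0), (0.211, 0.5), (0.293, 1.7)]
source = [sum(math.sin(2 * math.pi * f * n + p) for f, p in tones) / len(tones)
          for n in range(2000)]

first = num_taps - 1                      # skip the filter's start-up transient
filtered = [sum(h * source[n - k] for k, h in enumerate(taps))
            for n in range(first, len(source))]
delayed = [source[n - delay] for n in range(first, len(source))]

error = [a - b for a, b in zip(filtered, delayed)]
error_db = 20 * math.log10(rms(error) / rms(delayed))
print("error relative to signal (dB):", round(error_db, 1))
```

The residual in this sketch is set by the passband ripple of the modest filter chosen here, so it should not be expected to approach the 125 dB figure quoted above; the point is only that, for in-band signals, the filter output differs from a delayed copy of the input by the ripple and nothing else.
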
We therefore have the paradoxical result that, in a properly designed digital audio system, the frequency response of the system and its sound is determined by the anti-aliasing filter and not by the reconstruction filter. Provided that they are realized with high-precision arithmetic, longer reconstruction filters are always better.

This means that a rigorous way to test the assumption that high sample rates sound better than low sample rates is to set up a high-sample-rate system. Then, without changing any other variable, introduce a filter in the digital domain with the same frequency response as the high-quality anti-aliasing filter that would be required for the lower sample rate. If you cannot detect the presence of this filter in a double-blind test, then you have just proved that the higher sample rate has no intrinsic audible advantage, because you can always make the reconstruction filter audibly transparent.

Another myth is that digital audio cannot resolve time differences smaller than one sample period, and therefore damages the stereo image. People who believe this like to imagine an analog step moving in time between two sample points. They argue that there will be no change in the output of the A/D converter until the step crosses one sample point and therefore the time resolution is limited to one sample.

The problem with this argument is that there is no such thing as an infinite-risetime step function in the digital domain. To be properly represented, such a function has to first be applied to an anti-aliasing filter. This filter turns the step into an exponential ramp, which typically has equal pre- and post-ringing. This ramp can be moved far less than one sample period in time and still cause the sample points to change value. In fact, assuming no jitter and correct dithering, the time resolution of a digital system is the same as an analog system having the same bandwidth and noise floor.
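
Sub-sample time resolution is easy to demonstrate numerically. The sketch below assumes a 1 kHz sine wave sampled at 48 kHz and quantized to whole LSB steps with TPF dither; the tone is delayed by a small fraction of one sample period, and the delay is then estimated from the quantized samples alone by measuring the phase of the tone. The constants and the function names `capture` and `estimate_delay` are choices made for this illustration.

```python
import math
import random

SAMPLE_RATE = 48000.0
TONE_HZ = 1000.0
NUM_SAMPLES = 4800            # exactly 100 cycles of the tone
AMPLITUDE = 16000.0           # in LSBs, roughly half of 16-bit full scale
OMEGA = 2 * math.pi * TONE_HZ / SAMPLE_RATE   # radians per sample

def capture(delay_in_samples, rng):
    """Sample a delayed sine and quantize it with TPDF dither."""
    out = []
    for n in range(NUM_SAMPLES):
        x = AMPLITUDE * math.sin(OMEGA * (n - delay_in_samples))
        dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        out.append(math.floor(x + dither + 0.5))
    return out

def estimate_delay(samples):
    """Estimate the delay (in samples) from the phase of the tone."""
    in_phase = sum(s * math.sin(OMEGA * n) for n, s in enumerate(samples))
    quadrature = sum(s * math.cos(OMEGA * n) for n, s in enumerate(samples))
    phase = math.atan2(quadrature, in_phase)   # equals -OMEGA * delay
    return -phase / OMEGA

rng = random.Random(7)
for true_delay in (0.1, 0.01):
    estimate = estimate_delay(capture(true_delay, rng))
    print("true delay %.3f samples, estimated %.5f samples"
          % (true_delay, estimate))
```

Delays of a tenth or a hundredth of a sample period are recovered from integer sample values, which is consistent with the argument that the limit on time resolution is set by noise and bandwidth and not by the spacing of the samples.
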
Ultimately, the time resolution is determined by the sampling frequency and by the noise floor of the system. As you try to get finer and finer resolution, the measurements will become more and more uncertain due to dither noise. Finally, you will get to the point where noise obscures the signal and your measurement cannot get any finer. However, this point is orders of magnitude smaller in time than one sample period and is the same as in an analog system.

A final myth is that upsampling digital audio to a higher sample frequency will increase audio quality or resolution. In fact, the original recording at the original sample rate contains all of the information obtainable from that recording. The only thing that raising the sample frequency does is to add ultrasonic images of the original audio around the new sample frequency. In any correctly designed sample rate converter, these are reduced (but never entirely eliminated) by a filter following the upsampler. People who claim to hear differences between “upsampled” audio and the original are either imagining things or hearing coloration caused by the added image frequencies or the frequency response of the upsampler’s filter. They are not hearing a more accurate reproduction of the original recording.

This also applies to the sample rate conversion that often occurs in a digital facility. It is quite possible to create a sample rate converter whose filters are poor enough to make images audible. One should test any sample rate converter, hardware or software, intended for use in professional audio by converting the highest-frequency sine wave in the passband of the audio being converted, which is typically about 0.45 times the sample frequency. Observe the output of the SRC on a spectrum analyzer or with software containing an FFT analyzer (like Adobe Audition or Cool Edit).
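
The same test can be scripted. The sketch below stands in for a real converter with a deliberately simple 2x upsampler (zero-stuffing followed by a 255-tap Blackman-windowed sinc lowpass), feeds it a sine wave at 0.45 times the 48 kHz input rate, and measures the level of the image at 26.4 kHz relative to the wanted 21.6 kHz tone by direct correlation instead of a full FFT. The upsampler, its filter, and every function name here are assumptions made for illustration; to test an actual SRC, replace `upsample_2x` with the converter's output.

```python
import math

def blackman_sinc_lowpass(num_taps, cutoff):
    """Symmetric FIR lowpass; cutoff in cycles per (output) sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = (0.42
                  - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]

def upsample_2x(signal, taps):
    """Insert a zero after every sample, then lowpass to suppress the image."""
    stuffed = []
    for x in signal:
        stuffed.extend((x, 0.0))
    first = len(taps) - 1                 # skip the start-up transient
    return [2.0 * sum(h * stuffed[n - k] for k, h in enumerate(taps))
            for n in range(first, len(stuffed))]

def tone_level(samples, freq):
    """Amplitude of the component at `freq` (cycles per sample)."""
    i = sum(s * math.cos(2 * math.pi * freq * k) for k, s in enumerate(samples))
    q = sum(s * math.sin(2 * math.pi * freq * k) for k, s in enumerate(samples))
    return 2 * math.hypot(i, q) / len(samples)

fs_in = 48000.0
fs_out = 2 * fs_in
test_hz = 0.45 * fs_in                    # 21.6 kHz test tone
image_hz = fs_in - test_hz                # first image, 26.4 kHz

source = [math.sin(2 * math.pi * test_hz / fs_in * n) for n in range(2200)]
taps = blackman_sinc_lowpass(255, cutoff=0.25)

# 4000 output samples hold a whole number of cycles of both frequencies,
# so the two measurements do not leak into each other.
converted = upsample_2x(source, taps)[:4000]

wanted = tone_level(converted, test_hz / fs_out)
image = tone_level(converted, image_hz / fs_out)
image_db = 20 * math.log10(image / wanted)
print("image level relative to test tone (dB):", round(image_db, 1))
```

The printed figure is what would be read off a spectrum analyzer as the height of the image relative to the test tone; compare it against whatever rejection the application requires.
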
In a professional-quality SRC, images will be at least 90 dB below the desired signal, and, in SRCs designed to accommodate long word lengths (like 24 bit), images will often be –120 dB or lower.

And finally, some truisms regarding loudness and quality:

Every radio is equipped with a volume control, and every listener knows how to use it. If the listener has access to the volume control, he or she will adjust it to his or her preferred loudness. After said listener does this, the only thing left distinguishing the “sound” of the radio station is its texture, which will be either clean or degraded, depending on the source quality and the audio processing.

Any Program Director who boasts of his station’s $20,000 worth of “enhancement” equipment should be first taken to a physician who can clean the wax from his ears, then forced to swear that he is not under the influence of any suspicious substances, and finally placed gently but firmly in front of a high-quality monitor system for a demonstration of the degradation that $20,000 worth of “enhancement” causes! Always remember that less is more.

Stereo Enhancement

In contemporary broadcast audio processing, high value is placed on the loudness and impact of a station compared to its competition. OPTIMOD already has made a major contribution to competitiveness. Orban’s 222A Stereo Spatial Enhancer augments your station’s spatial image to achieve a more dramatic and more listenable sound. Your stereo image will become magnified and intensified; your listeners will also perceive greater loudness, brightness, clarity, dynamics and depth. In use, the 222A detects and enhances the attack transients present in all stereo program material, while not processing other portions. Because the ear relies primarily on attack transients to determine the location of a sound source in the stereo image, this technique increases the apparent width of the stereo soundstage. Since only attack transients are affected, the average L–R energy is not significantly increased, so the 222A does not exacerbate multipath distortion. Several of Orban’s digital audio processors now incorporate the 222A algorithm in DSP.

Other Production Equipment
The preceding discussions of disk reproduction, tape, and electronic quality also apply to the production studio. Compact discs and DVD-Audio discs usually provide the highest quality. For cuts that must be taken from vinyl disk, it is preferable to use “high-end” consumer phono cartridges, arms, and turntables in production. Make sure that one person has responsibility for production quality and for preventing abuse of the record playing equipment. Having a single production director will also help achieve a consistent air sound, an important contribution to the “big-time” sound many stations want.

A new generation of low-cost all-digital mixers, made by companies like Soundcraft, Yamaha, Mackie, and Roland, provides the ability to automate mixes and to keep the signal in the digital domain throughout the production process. At the high end, Orban’s Audicy digital workstation is oriented towards fast radio production. It combines a dedicated mixing control surface with no-delay RAM-based editing and high-quality built-in digital effects.

Although some people still swear by certain “classic” vacuum-tube power amplifiers (notably those manufactured by Marantz and McIntosh), the best choice for a monitor amplifier is probably a medium-power (100 watts or so per channel) solid-state amplifier with a good record of reliability in professional applications. We do not recommend using an amplifier that employs a magnetic field power supply or other such unusual technology, because these amplifiers literally chop cycles of the AC power line and tend to cause RFI problems. Do not be tempted to dust off an old Gates or RCA power amplifier and place it in service because it saves you money. It is also usually unwise to use the monitor amplifiers built into most consoles.

Production Practices
The following represents our opinions on production practices. We are aware that some stations operate under substantially different philosophies. But we feel that the recommendations below are rational and offer a good guide to achieving consistently high quality.

1. Do not apply general audio processing to dubs from commercial recordings in the production studio.
OPTIMOD provides all the processing necessary, and does so with a remarkable lack of audible side effects. Further compression is not only undesirable but is likely to be very audible. If the production compressor has a slow attack time (and therefore produces overshoots that can activate gain reduction in OPTIMOD), it will probably “fight” with OPTIMOD, ultimately yielding a substantially worse air sound than one might expect given the individual sounds of the two units. If it proves impossible to train production personnel to record with the correct levels, we recommend using the Orban 8200ST (an integrated gated leveler, compressor, high-frequency limiter, and peak clipper that is the successor to Orban’s 464A “Co-Operator”) to protect the production recorder from overload. When used for leveling only, the 8200ST does not affect the short-term peak-to-average ratio of the audio, and so will not introduce unnatural artifacts into OPTIMOD processing. If production personnel control levels correctly, the 8200ST can be used as a safety limiter and high-frequency limiter by using only the 8200ST compressor function and adjusting its input gain so that broadband gain reduction never occurs when the console VU meters are peaking normally. With this set-up, only high frequencies will be controlled and high-frequency tape saturation will be prevented without adding unwanted broadband compression. (The 8200ST’s subtle broadband compressor will still prevent tape overload if the console output level is peaked too high.)

2. Avoid excessive bass and treble boost.
Sub-standard recordings can be sweetened with equalization to achieve a tonal balance typical of the best currently produced recordings. However, excessive treble boost (to achieve a certain sound signature for the station) must be avoided if a tape speed of 7.5 ips is used, because the tape is subject to high-frequency saturation due to the high-frequency boost applied by the recorder’s equalization network. If production is recorded and played on-air digitally, there is no effective limit to the amount of HF boost you can apply in production. However, be aware that large amounts of HF boost will stress your on-air AM or FM audio processor because it has to deal with pre-emphasis. We recommend using a modern CD typical of your program material as a reference for spectral balance. Very experienced engineers master major-label CDs using the best available processing and monitoring equipment, typically costing over $100,000 per room in a well-equipped mastering studio. The sound of major-label CDs represents an artful compromise between the demands of different types of playback systems and is designed to sound good on all of them. Mastering engineers do not make these compromises lightly. We believe it is very unwise for a radio station to significantly depart from the spectral balance typical of major-label CDs, because this almost certainly guarantees that there will be a class of receivers on which the station sounds terrible.

3. Pay particular attention to the maintenance of production studio equipment.
Even greater care than that employed in maintaining on-air equipment is necessary in the production studio, since quality loss here will appear on the air repeatedly. The production director should be acutely sensitized to audible quality degradation and should immediately inform the engineering staff of any problems detected by ear.

4. Minimize motor noise.
To prevent motor noise from leaking into the production microphone, tape machines with noisy motors and computers with noisy fans and hard drives should be installed in alcoves under soffits, and surrounded by acoustic treatment. In the real world of budget limitations this is sometimes not possible, although sound-deadening treatment of small spaces is so inexpensive that there is little excuse for not doing it. But even in an untreated room, it is possible to use a directional microphone (with figure-eight configuration, for example) with the noisy machine placed on the microphone’s “dead” axis. Choosing the frequency response of the microphone to avoid exaggerating low frequencies will help. In particularly difficult cases, a noise gate or expander can be used after the microphone preamp to shut off the microphone except during actual speech.

5. Consider processing the microphone signal.
Audio processing can be applied to the microphone channel to give the sound more punch. Suitable equalization may include gentle low- and high-frequency boosts to crispen sound, aid intelligibility, and add a “big-time” quality to the announcer. But be careful not to use too much bass boost, because it can degrade intelligibility. Effects like telephone and transistor radio can be achieved with equalization, too. The punch of production material can often be enhanced by tasteful application of compression to the microphone chain. However, avoid using an excessive amount of gain reduction and excessively fast release time. These cause room noise and announcer breath sounds to be exaggerated to grotesque levels (although this problem can be minimized if the compressor has a built-in expander or noise gate function). When adjusting the microphone processor, adjust the on-air audio processor for your desired sound on music first, and then adjust the microphone processor to complement the on-air processing you have selected. Close-micing, which is customary in the production studio, can exaggerate voice sibilance. In addition, many women’s voices are sibilant enough to cause unpleasant effects. High-frequency equalization and/or compression will further exaggerate sibilance. If you prefer an uncompressed sound for production work but still have a sibilance problem, then consider locating a dedicated de-esser after all other processing in the microphone chain.
