Ambisonics: Questions and Answers

By Richard Elen

Reprinted from Studio Sound, October 1982

Author's Notes: My copy of this article was so far removed from the original that OCR techniques produced pure garbage. As a result, I reconstructed it by reading it into IBM ViaVoice which, interestingly, did a very good job. As the program learned my voice (it does not initially do very well with "British-English" voices), it got more and more accurate. A useful technique if you have material you can only just read!

Obviously this article is wildly out of date. It's included largely for historical reasons -- and in fact it provides an accessible starting point for people who are new to the field, so hopefully it was worth all the hassle getting it here. Some tiny changes have been made, primarily to correct editorial errors (who was the Editor of Studio Sound at the time? Me.). In addition, some cuts for length have been restored.

It is worth noting, however, that although CD never delivered the theoretical promise of four audio channels, DVD will. The present DVD-Video spec allows for four channels of audio at up to 24-bit resolution and 96kHz sampling: enough for UHJ with height, provided the spec also includes the two flags proposed by the ARA to indicate hierarchical encoding and lossless compression. That would mean that an ordinary DVD-Video disc could include full with-height periphony. In addition, any surround system could be used to record the audio, which could then be transcoded into UHJ. At the other end, any surround system could be derived from the UHJ signal.

Although the NRDC Ambisonic surround-sound technology has been in existence for over a decade, it has been largely unknown to engineers and producers outside the UK. This article covers some of the primary aspects of the system and how it may be applied, in question-and-answer form.

Q What exactly is Ambisonics?

A Ambisonics is a system for capturing or creating a soundfield and reproducing it in the listening environment in such a way as to recreate the original placement of sounds and instruments. In its ultimate form, it can represent this soundfield in all three spatial dimensions. At a lower level, it can create a horizontal soundfield which does not include height information.

Q What is the difference between Ambisonics and "quad"?

A A great deal. "Quadraphony" failed because it was based on a false premise: that a soundfield could be represented by four separate sources of sound behaving, if you like, as four stereo signal pairs around the listener. This is simply not true. In addition, there was a profusion of different systems, none of which was technically satisfactory. Ambisonics provides one hierarchical encoding system which is truly compatible all the way from with-height "periphony" to mono. It also has the advantage that it works: Ambisonics recognizes the fact that speaker feeds must be interrelated to reproduce a coherent soundfield exactly like that which was recorded or created. The proponents of "quad" expended a lot of effort attempting to capture four separate channels of information and transmit them, either matrixed into two channels and subsequently recovered (which is impossible) or encoded with extra subcarriers which required a great deal of sophisticated equipment to extract. An ambisonic signal does not suffer these limitations.

Q What does an ambisonic signal comprise?

A A three-dimensional soundfield can be captured (for example on tape) on four channels: these correspond to a mono signal (usually termed "W"); front minus back ("X"); left minus right ("Y"); and up minus down ("Z"). Removing the "Z" channel of this "B-Format" signal leaves you with a horizontal surround-sound signal on only three channels. Removing the "X" channel as well returns you to sum-and-difference stereo information only. A signal of the B-format type can be generated in a number of ways, and it can be replayed through a suitable speaker system via a decoder which produces speaker feeds to recreate the surround information, in much the same way as a sum-and-difference matrix decoder creates left and right feeds from L+R and L-R in FM stereo radio.
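To make the channel hierarchy concrete, here is a minimal Python sketch. It assumes the traditional B-format convention in which W carries the mono signal 3 dB down (a factor of 1/sqrt(2)), and measures azimuth anticlockwise from straight ahead, so that a positive Y means "left". The function names are hypothetical. It pans a single sample into W, X, Y and Z, then shows the two reductions described above: dropping Z for horizontal surround, and dropping X as well so that W and Y can be decoded like an ordinary sum-and-difference pair.

```python
import math

def encode_bformat(sample, azimuth_deg, elevation_deg=0.0):
    """Pan one mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                  # mono, 3 dB down by convention
    x = sample * math.cos(az) * math.cos(el)     # front minus back
    y = sample * math.sin(az) * math.cos(el)     # left minus right
    z = sample * math.sin(el)                    # up minus down
    return w, x, y, z

def horizontal_only(bformat):
    """Drop Z: three-channel horizontal surround."""
    w, x, y, _z = bformat
    return w, x, y

def sum_and_difference_stereo(bformat):
    """Drop X (and Z): treat W as the sum and Y as the difference signal."""
    w, _x, y, _z = bformat
    total = w * math.sqrt(2.0)                    # restore W to full level
    return 0.5 * (total + y), 0.5 * (total - y)   # left, right

hard_left = encode_bformat(1.0, 90.0)
print(horizontal_only(hard_left))
print(sum_and_difference_stereo(hard_left))
```

With these conventions, a source hard left (90 degrees) gives all of its level to the left output of the sum-and-difference decode, while a source straight ahead, or straight behind, lands in the center.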

Q How can an ambisonic B-format signal be transmitted or cut on disc?

A This depends on the number of transmission channels available. The capturing of the surround information and the transmission of this information are quite different techniques. The B-Format signal, for instance, does not need to be directly compatible with stereo or mono, but for most applications the transmitted form must be available to listeners who do not have special decoding equipment: it must be compatible with traditional forms. The UHJ hierarchy of encoding and decoding methods enables the maximum amount of surround information to be transmitted on the available channels, yet also allows for compatibility with both mono and stereo. Thus, for example, the three B-Format channels of horizontal ambisonic surround sound may be encoded into two channels for cutting on conventional disc or for stereo radio broadcasting, producing horizontal surround for those who have decoders, stereo for those who have stereo, and mono for the portable radio or record-player listener. As the IBA [British Independent Broadcasting Authority] has demonstrated, horizontal B-Format can also be encoded in mono- and stereo-compatible form on three channels for FM transmission, with phase-quadrature modulation carrying the third channel at full or reduced bandwidth.
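The 2-channel UHJ encoding equations are not given in this article. The sketch below uses the coefficients as they are commonly published for 2-channel UHJ, and they should be checked against a primary source before serious use. It works with single-frequency phasors (complex amplitudes) rather than sampled audio, so that the wideband 90-degree phase shift a real encoder needs can be written simply as multiplication by 1j. W is assumed to carry the traditional 3 dB reduction, azimuth runs anticlockwise from front, and the function names are hypothetical.

```python
import math

def bformat_phasor(azimuth_deg, amplitude=1.0):
    """Horizontal B-format (W, X, Y) for a single source, as real phasors."""
    az = math.radians(azimuth_deg)
    return (amplitude / math.sqrt(2.0),    # W, traditional -3 dB scaling
            amplitude * math.cos(az),      # X: front minus back
            amplitude * math.sin(az))      # Y: left minus right

def uhj_encode(w, x, y):
    """Encode horizontal B-format phasors into 2-channel UHJ (L, R).

    Multiplying by 1j stands in for the wideband 90-degree phase
    shift that a real encoder must apply across the audio band.
    """
    s = 0.9396926 * w + 0.1855740 * x                            # mono-compatible sum
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y    # difference
    return 0.5 * (s + d), 0.5 * (s - d)

for azimuth in (0, 90, 180):
    left, right = uhj_encode(*bformat_phasor(azimuth))
    print(f"{azimuth:3d} deg: |L| {abs(left):.3f}, |R| {abs(right):.3f}, "
          f"mono {abs(left + right):.3f}")
```

The left and right outputs always sum to S, which contains no phase-shifted term: this is the mono-compatible signal that the portable radio listener hears.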

The coming of the digital audio disc opens up new possibilities, of course: the four-channel capability of the Philips-Sony Compact Disc makes periphonic (three-dimensional surround-sound) disc releases possible. In such cases, listeners equipped with suitable decoders could experience full surround sound, while those with horizontal-surround decoders could play the same disc and hear excellent reproduction without the height information. Yet listeners with stereo or even mono equipment would still hear all the music, albeit without the extra localization information that Ambisonics provides.

Q How does the listener decode the signal?

A The signal source, be it disc, radio transmission or tape, is fed into a decoder unit. This decodes the signal and produces one low-level output for each loudspeaker to be driven. For horizontal surround sound there are typically four or more speakers, while for periphony a minimum of six is required (and eight is preferable). These outputs can be fed to amplifiers and speakers in the normal way. A typical decoder also has controls to compensate for the distance and position of loudspeakers in the room, and often the ability to use part of the decoder circuitry to "spread" ordinary stereo around the room: so-called "super-stereo".
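As an illustration of what a decoder does, the following is a minimal sketch of a basic first-order decode to a regular horizontal array. It assumes equally spaced speakers at equal distances from the listener and the traditional -3 dB W; the function name is hypothetical. It is not the circuit of any particular decoder: practical decoders add frequency-dependent processing and the layout and distance controls mentioned above, all of which are omitted here. In this sketch the gains are chosen so that, summed over all speakers, the feeds reproduce the original pressure (W restored to full level) and the front/back and left/right components (X and Y).

```python
import math

def decode_regular_horizontal(w, x, y, speaker_azimuths_deg):
    """Basic first-order decode of horizontal B-format to a regular array.

    Assumes equally spaced, equidistant speakers (at least three) and
    the traditional -3 dB W. Returns one feed per speaker.
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        # Restore W to full level, add the component pointing at this speaker.
        feed = math.sqrt(2.0) * w + 2.0 * (x * math.cos(az) + y * math.sin(az))
        feeds.append(feed / n)
    return feeds

square = [45.0, 135.0, 225.0, 315.0]   # front-left, back-left, back-right, front-right
source = math.radians(45.0)
feeds = decode_regular_horizontal(1.0 / math.sqrt(2.0), math.cos(source),
                                  math.sin(source), square)
print([round(f, 3) for f in feeds])
```

For the square layout and a source at 45 degrees, the nearest speaker receives 0.75, the two adjacent speakers 0.25 each, and the opposite speaker a small anti-phase feed of -0.25; the feeds sum to unity.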

Q How stable is the surround-sound image? Is the sound the same all over the listening area?

A Unlike "quad", and much conventional stereo, ambisonic surround-sound does not rely solely on level to localize a sound. The "quad" systems which relied on level, and conventional "pan-potted" stereo, suffer the disadvantage that the perceived position of a sound is very much dependent on the listener's position in the listening area: small changes in listening position cause large changes in the sound image. In addition, there is no "depth" to the sound: with pan-potted stereo, sounds can only appear to come from a straight line between the speakers, and the area in which a good image is heard is very small. Ambisonics, however, uses phase information as well as level to localize the image. This means that the stability of the surround-sound picture is much less dependent on listening position, just as it is when listening to a coincident-pair stereo recording as opposed to a pan-potted one. You can go quite close to a speaker which is part of an ambisonic surround-sound replay system and the image will not shift appreciably: the relative levels may alter, but the image stays more or less constant over a wide area. Indeed, the soundfield image may be appreciated even from outside the speaker layout. At first sight, the equations that govern Ambisonics appear to indicate that the soundfield would only be recreated at a single point, resulting in a tiny "sweet spot". This is, however, a misunderstanding of Ambisonic theory and is not borne out by the results. In practical terms, Ambisonics has the largest listening area of any surround system.

Q Exactly how truly compatible is a UHJ-recorded surround signal?

A The simple answer is virtually completely. When a surround-sound signal is "collapsed" into stereo, there is in theory a slight phase-variation which could affect signals which are localized to the rear of the soundfield. However, this is seldom, if ever, noticed in practice. Indeed, listening to a UHJ signal in stereo on loudspeakers often produces an impression of the stereo sound-stage being wider than normal; on headphones, a UHJ signal replayed in stereo can produce an impression of "surround" -- not a true ambisonic soundfield of course but a distinct impression that some sounds are a little behind you and others a little in front, as if the human ear/brain combination was contributing a degree of "aural decoding" to the signal.

Q How can an ambisonic recording or broadcast be made?

A There are a number of methods available, some of which are still under development. There are three basic approaches: the soundfield microphone, the transcoder, and the ambisonic mixer. The soundfield microphone is most easily applicable to the concert environment, where the B-format signal may be produced from a single microphone in the auditorium. This signal may either be recorded on four channels of a tape recorder for subsequent UHJ encoding, or encoded directly into UHJ for two-channel recording or FM stereo broadcast. The B-format signal may also be encoded in other ways under the UHJ hierarchy, as already described.

The transcoder does not produce a B-format signal, but outputs a two-channel UHJ-encoded signal that may be broadcast, recorded or cut to disc or CD directly. The input to the transcoder is in the form of four signals representing a front and a rear stereo sound stage, corresponding roughly to the four group outputs of a conventional "quadraphonic" desk. There is also a B-format input for a soundfield mic, etc. Thus a multitrack tape recording may be mixed down through a "quad" desk and the result transcoded to produce an ambisonic mixdown. As the transcoder in fact uses two sound stages (180 degrees wide at the front and slightly less to the rear), there are certain limitations to the placement of sounds in the soundfield. In addition, because it does not produce a B-format output, the transcoder does not facilitate certain creative engineering effects such as rotating one soundfield within another. The transcoder is thus best used for simple mixdowns only, and for the conversion of existing "quad" four-channel masters to UHJ ambisonic form.

The ambisonic console may take two forms: either a complete mixing desk equipped with ambisonic "panpots" and effects controls, in addition to conventional EQ, recording and other facilities, or a "mixdown panel" which may be patched into the direct channel outputs of a conventional desk, offering ambisonic localization controls and effects returns, plus B-format inputs. In both cases, a multitrack tape, produced in the conventional way with perhaps the addition of the soundfield mic for capturing certain instruments (eg drums) and ambience, may be mixed down to B-format for subsequent encoding via a suitable UHJ encoder. Such equipment, which is currently under development, may offer a horizontal-surround-only or full periphonic mixdown capability, with a number of sophisticated effects controls including ambisonic reverberation facilities and soundfield alteration capability. There are no theoretical restrictions on sound source placement within a two- or three-dimensional soundfield, and the generation of a B-format master allows the material to be reissued subsequently in more advanced formats. For example, it may not be possible to release periphonic mixdowns in periphonic form today, but they may be released initially in horizontal-only UHJ. However, with the advent of multi-channel digital audio discs, the material may be made available in "full-surround" form.
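The ambisonic panpot, the mixing of B-format tracks, and the "rotating one soundfield within another" effect mentioned in the transcoder discussion can all be expressed as simple operations on the four B-format channels. The sketch below is a minimal illustration, with hypothetical function names, assuming the traditional -3 dB W and azimuth measured anticlockwise from front. It pans mono tracks into B-format, mixes them by summing channel by channel, and rotates a complete soundfield about the vertical axis; rotation leaves W and Z untouched and only mixes X and Y.

```python
import math

def pan(sample, azimuth_deg):
    """Ambisonic panpot: place a mono sample on the horizontal circle."""
    az = math.radians(azimuth_deg)
    return (sample / math.sqrt(2.0), sample * math.cos(az),
            sample * math.sin(az), 0.0)

def mix(*tracks):
    """Sum any number of B-format signals channel by channel."""
    return tuple(sum(channel) for channel in zip(*tracks))

def rotate(bformat, angle_deg):
    """Rotate a whole soundfield anticlockwise about the vertical axis."""
    w, x, y, z = bformat
    a = math.radians(angle_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)

# Two tracks panned front-left and back-right, then the mix turned by 90 degrees.
bus = mix(pan(0.8, 30.0), pan(0.5, 210.0))
print(rotate(bus, 90.0))
```

Because every track is reduced to the same four channels, processing applied to the B-format bus, such as rotation, acts on the whole mix at once.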

Q What is the soundfield mic?

A The soundfield mic is a special microphone designed to capture the entire ambient soundfield from a single location. An example of this type of mic is the well-known Calrec design, which may be used both for ambisonic recording or broadcast and for stereo purposes. In the latter case, the signal may still be recorded in B-format, the recording subsequently being replayed via the accompanying control unit, allowing the mic to be re-oriented (in terms of polar diagram, direction and effective physical position) after the recording has taken place. As the entire three-dimensional soundfield is captured, it may be modified after the event.
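Re-orienting the mic after the event amounts to forming a "virtual microphone" from the recorded B-format channels. The following is a minimal sketch, assuming the traditional -3 dB W and a first-order pattern control running from omni (1.0) through cardioid (0.5) to figure-of-eight (0.0). The function name and the `pattern` parameter are hypothetical, and the "effective physical position" control is not modeled.

```python
import math

def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Point a first-order virtual microphone into a B-format recording.

    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight.
    Assumes W carries the traditional -3 dB (1/sqrt(2)) scaling.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Component of the soundfield along the chosen pointing direction.
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    # Blend the restored omni signal with the directional component.
    return pattern * math.sqrt(2.0) * w + (1.0 - pattern) * directional

# A unit source straight ahead, in B-format.
front = (1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0)
print(virtual_mic(*front, azimuth_deg=0.0))     # cardioid aimed at the source
print(virtual_mic(*front, azimuth_deg=180.0))   # cardioid aimed away from it
```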

The mic itself consists of a tetrahedral array of capacitor-type capsules and appropriate preamplifiers. The control unit takes these capsule signals (A-format) and converts them to standard ambisonic B-format, allowing comprehensive control of effective mic orientation at the same time. This mic is thus useful in conventional stereo applications, with or without additional conventional mixdown, as well as being all that is needed for an ideal ambisonic recording or broadcast of a live event.
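The basic sum-and-difference matrix from A-format to B-format can be sketched as follows, assuming the usual capsule naming for a tetrahedral array: left-front-up (LFU), right-front-down (RFD), left-back-down (LBD) and right-back-up (RBU). This is a simplified illustration with a hypothetical function name: the 0.5 scaling is an arbitrary choice here, and a real control unit also applies frequency-dependent correction for the spacing of the capsules, which this sketch omits.

```python
def a_to_b(lfu, rfd, lbd, rbu):
    """Convert one sample of tetrahedral A-format to B-format (W, X, Y, Z).

    Capsule names: left-front-up, right-front-down, left-back-down,
    right-back-up. The 0.5 gain is illustrative; no capsule-spacing
    correction filters are applied.
    """
    w = 0.5 * (lfu + rfd + lbd + rbu)   # all capsules: omnidirectional sum
    x = 0.5 * (lfu + rfd - lbd - rbu)   # front capsules minus back capsules
    y = 0.5 * (lfu - rfd + lbd - rbu)   # left capsules minus right capsules
    z = 0.5 * (lfu - rfd - lbd + rbu)   # upper capsules minus lower capsules
    return w, x, y, z

# Sound arriving mainly at the right-back-up capsule should read as
# behind (X < 0), to the right (Y < 0) and above (Z > 0).
print(a_to_b(0.0, 0.0, 0.0, 1.0))
```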

Q What do I need to record: a) a live concert, or b) a band in the studio, ambisonically?

A In both cases, you'll need a professional decoder and speaker system for monitoring. If we consider the example of a horizontal-surround recording, you'll need a horizontal-surround decoder and at least four matched loudspeakers. The four should ideally be identical (different loudspeakers cause the same kind of loss of image localization as two different speakers would in stereo listening), and the monitoring environment and speaker placement should be optimized. In general, a studio control room equipped for "quad" monitoring will be sufficient for this.

In case (a), the live concert, it will normally be quite sufficient to place one soundfield mic in a suitable central position. The B-format output of the mic control unit should be routed to a four-channel recorder, or via a UHJ encoder to a two-channel machine, with the soundfield controls on the control unit set for the best results. In case (b), the band in the studio, normal multitrack techniques may be used, in which case no surround-sound monitoring will be required until mixdown unless a soundfield mic is being used for ambience or other recording during the session. In the latter case, the B-format signal should be recorded on four channels of the multitrack as and when it is required. Surround monitoring may be useful in this instance, at least for monitoring the signal going to tape to check for correct mic position. The ambisonic console or mixdown panel is then used for the mix, and the resultant B-format signal is recorded on four-track for subsequent encoding. Alternatively, the transcoder may be used, as described earlier.

Q What is required to transmit or cut such a recording?

A Taking the typical current examples of FM stereo radio or the conventional stereo analog disc, the B-format signal must first be encoded into 2-channel UHJ form, which is the current domestic format. The signal is then simply transmitted or cut to disc as normal.

Q Are there any special requirements to cut an ambisonic disc?

A Continuing with the example of cutting a conventional analog master lacquer from a UHJ-encoded analog source, the usual constraints should be observed: for example, loud out-of-phase signals may cause problems unless there's a strong bass signal located center-front (we may regard this as "North"). As Ambisonics uses phase as well as level to provide localization, exceptionally heavy bass signals to the rear of the soundfield, particularly from the "South", should be avoided in the mix unless there is a good "anchor" sound in that register from the "North".
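Why rear ("South") signals deserve care can be seen by looking at how a 2-channel UHJ signal divides between the lateral (L+R) and vertical (L-R) modulation of a 45/45 stereo groove. The sketch below uses the 2-channel UHJ coefficients as they are commonly published (an assumption; check them against a primary source) and single-frequency phasors, with a hypothetical function name. In this sketch, for a source straight ahead the difference signal is well below the sum, whereas for a source directly behind it is larger than the sum: rear sounds are cut with more vertical than lateral modulation.

```python
import math

def uhj_sum_and_difference(azimuth_deg):
    """Magnitudes of L+R and L-R for a unit horizontal source in 2-channel UHJ.

    Works with single-frequency phasors, so 1j stands in for the
    encoder's wideband 90-degree phase shift. Assumes traditional
    -3 dB W and azimuth measured anticlockwise from front.
    """
    az = math.radians(azimuth_deg)
    w, x, y = 1.0 / math.sqrt(2.0), math.cos(az), math.sin(az)
    s = 0.9396926 * w + 0.1855740 * x                            # L + R
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y    # L - R
    return abs(s), abs(d)

for azimuth in (0, 90, 180):
    lateral, vertical = uhj_sum_and_difference(azimuth)
    print(f"{azimuth:3d} deg: L+R {lateral:.3f}, L-R {vertical:.3f}")
```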

On the cut itself, localization may suffer if phase-correction equipment is used, just as phase-related stereo effects on conventional recordings may suffer from the same process. As far as possible, the mix should be optimized to avoid such problems, typically by hooking up a 2-channel UHJ encoder on the mix to encode the B-format signal and observing its output on a phase-meter. The phase-meter should also be watched during the encoding process itself, to check that the mix has been optimized in this respect. It is worth noting, however, that although these points may cause minor problems with unconventional material (especially heavy effects or rear reverberation, for example), such problems are very seldom encountered in normal ambisonic work.
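The article does not say which kind of phase-meter is meant. One common type is a correlation meter, which reads +1 for in-phase (mono-like) signals, 0 for unrelated signals and -1 for fully out-of-phase signals, the last being the condition that causes trouble on a disc cut. A minimal sketch of such a reading over a block of samples, with a hypothetical function name:

```python
import math

def phase_correlation(left, right):
    """Normalized correlation between two equal-length sample blocks.

    +1: in phase, 0: uncorrelated (or one channel silent), -1: out of phase.
    """
    num = sum(l * r for l, r in zip(left, right))
    energy = sum(l * l for l in left) * sum(r * r for r in right)
    return num / math.sqrt(energy) if energy > 0 else 0.0

# Ten full cycles of a test tone, 48 samples per cycle.
tone = [math.sin(2 * math.pi * k / 48) for k in range(480)]
print(phase_correlation(tone, tone))                  # identical channels
print(phase_correlation(tone, [-s for s in tone]))    # one channel inverted
```

A hardware meter would typically smooth this reading over time; the block-based version here simply shows what the needle represents.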

Many of these limitations do not apply in the case of mastering a digital audio disc.