By Richard G. Elen
This is an extended version of an article appearing
in the April 1998 edition of AudioMedia
Included with this article are two other items: an article on "G+2" - a proposed Ambisonic delivery system for DVD that is compatible with existing players and requires no decoder - and a series of notes to the main article below which may be accessed from footnotes in the text.
With the advent of home theater systems, there has been increased interest in surround sound. And with the allegedly imminent agreement of a specification for DVD-Audio, surround sound is continually in the public mind -- or at least in the mind of the well-heeled audiophile. Increasingly, surround sound is also a subject that concerns engineers and producers. But while in most minds there is only one system of interest (5.1), or perhaps two (AC-3 and DTS), for today's audio engineer there are in fact some additional options. In this article, Richard Elen discusses one surround sound system that, unlike those at the forefront of today's marketing hype, offers a remarkable set of capabilities -- including several features which are difficult, if not impossible, to realize with any other method. The system under discussion is Ambisonics, and Richard Elen, along with a growing number of enthusiasts, is hopeful that this British-designed surround sound technology is finally poised to become widely adopted in the new world of fully-digital transmission systems.
Nearly 30 years ago, the old quadraphonic systems were in their death throes. For a number of reasons, nearly every one of them had exhibited fundamental flaws and insurmountable problems. There were two main problem areas.(1)
First, there was the challenge of persuading an inherently 2-channel transmission medium, such as vinyl disk or FM radio, to transmit the four "discrete" channels of a quadraphonic recording via a matrixing arrangement in such a way as to make it possible to recover the original four channels unscathed at the other end. The fact that this is mathematically impossible did not seem to dissuade the designers of quadraphonic matrixing systems, who developed numerous so-called "4-2-4" schemes to attempt to achieve this goal. A more sophisticated -- and somewhat more successful -- approach employed high-frequency subcarriers on the vinyl disk, the left groove wall providing left-front plus left-rear conventionally, while also carrying the difference between the two in the subcarrier -- with a corresponding scheme for the right-hand side of the groove. The subcarrier systems, despite their inherent difficulties -- such as the need for special styli and cartridges to reproduce the high-frequency subcarriers -- were capable in some cases (notably Nippon Columbia's UD-4) of accurately reproducing the original four signals. This performance was in stark contrast to the so-called "4-2-4" approach, which significantly compromised the integrity of the original recording as far as sound-source localization was concerned.
The second source of problems for the quadraphonic techniques of the early seventies was more fundamental, and its very existence was unrecognized (or at least unaddressed) by the vast majority of people working in surround sound -- so much so that almost every subsequent surround sound implementation has suffered from the very same problem. To put this problem into perspective, we need to take a look at the principles and history of stereo.
Stereophonic sound recording and reproduction was successfully demonstrated at EMI in Britain by Alan Dower Blumlein as early as the 1930s, and experiments were also carried out by Bell Labs in the US at virtually the same time. Blumlein was single-handedly responsible for a vast number of audio-related patents up until his death while testing airborne radar equipment during the Second World War. Among his inventions were several different stereophonic coincident-mic techniques which produced exceptional results, including capturing a distinct feeling of "depth" and a smooth spread of sound from one side of the "sound stage" to the other. The Bell Labs experiments of the same period used spaced mics, which tended to exhibit a "hole in the middle" effect; we will see why shortly.
By the time that stereo recordings were generally available in the marketplace, in the late fifties, it had been established that there were certain limits to stereo replayed over loudspeakers. The speakers needed to have similar characteristics and be fed by matched amplifiers, for example. But most important, the speakers had to be about 60 degrees apart with respect to the listening position. If they were closer together, the width of the soundstage was narrowed. If the speakers were further apart than 60 degrees, a "hole in the middle" tended to be experienced. This effect is particularly noticeable when sound sources are localized using only relative level to define their position. This, of course, is the standard method of localizing sound sources in conventional multitrack mixing; common names for the technique include "pairwise mixing", "panpotted mono", "amplitude mixing" and "intensity stereophony". It is also the kind of localization utilized in spaced-mic recordings such as those Bell Labs experiments.
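As a rough illustration of level-only localization between one pair of speakers, the following Python sketch derives left and right gains from a single pan position. The sine/cosine (constant-power) law and the names `pan_gains` and `pan_mono` are assumptions made for this example; the article does not specify a particular pan law.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, +1.0].

    -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Assumption: a sine/cosine (constant-power) pan law.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie between -1.0 and +1.0")
    angle = (position + 1.0) * math.pi / 4.0  # maps position to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_mono(samples, position):
    """Split a mono signal into a left/right pair using level differences only."""
    left_gain, right_gain = pan_gains(position)
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right
```

With `position` set to 0.0 both speakers receive the same level, which is the equal-feed case that produces a center-front image for a centrally seated listener. Nothing in the signal other than relative level carries the position.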
If you are seated in the conventional stereo listening position, with speakers at +30 and -30 degrees either side of center front, and you take a mono sound source, split it into two equal-level feeds and send one feed to the left speaker and the other to the right, you will hear the sound localized center front. However, if you move to the left, you will now hear more sound from the left speaker, and as a result the source will appear to move with you.
"Discrete" 4-channel quad recordings ran headlong into the problem of stereo speaker position. Quad placed four speakers around the listener, typically in a square. The listener was faced with, if you like, four stereo pairs: front, left, rear and right -- but the speakers were 90 degrees apart. This spacing was more than sufficient to ensure not only that significant holes were experienced between the speakers, but that virtually no inter-speaker imaging was possible at all. For the system to stand even a chance of working effectively, the listener had to sit at the dead center of the array -- the so-called "sweet spot". Even so, a well-set-up "discrete" quad system was capable of results that were more impressive than traditional stereo.
But there are other problems with using level as the sole means of localization. The human ear/brain combination, in contrast, uses a number of different localization techniques at different, overlapping frequencies, which are at least as important. We use phase to localize sounds between 150 Hz and 1.5 kHz, while level is utilized between 300 Hz and 5 kHz. Above 2.5 kHz, other directional cues are used. Luckily, when two speakers in front of you are separated by no more than 60 degrees, both ears hear both speakers and LF amplitude differences are converted into phase differences between the ears. However, this effect works only poorly if the speakers are behind you, and not at all if the speakers are to the side. This means that, by definition, traditional quad and today's 5.1 systems, which are their descendants, cannot work. As Martin Leese puts it in his invaluable Ambisonic FAQ, "Any surround system that relies on pairwise mixing between adjacent speakers must fail."
Now try another experiment. Take the same setup of a mono signal being split and feeding a pair of speakers. But this time, insert a fixed delay of a few milliseconds in the left feed and a variable delay of up to twice the length in the right. As you vary the delay on the right channel and pass through the delay setting of the fixed delay, a strange thing will happen. The sound source will suddenly appear to pan wildly back and forth between the speakers, even seeming to go beyond them. This effect is commonly known as phase-shift panning, and it shows how important phase is as a localization technique. Most important of all, if you set the delay to localize the sound to a certain position and then move about in the room, you'll notice that instead of the source seeming to follow you as you go closer to one speaker or the other, it will stay in more or less the same position irrespective of where you are. It is evidently a powerful technique, yet it is seldom used in the studio because, traditionally, it has been hard to control.
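The signal routing for this experiment can be sketched in a few lines of Python. This is only a minimal illustration that assumes whole-sample delays and a 48 kHz sample rate; the function names `delayed_pair` and `ms_to_samples` are hypothetical, and the sketch produces the two feeds without modeling what the listener hears.

```python
def ms_to_samples(milliseconds, sample_rate=48000):
    """Convert a delay in milliseconds to a whole number of samples.

    Assumption: 48 kHz sample rate unless another rate is given.
    """
    return round(milliseconds * sample_rate / 1000.0)


def delayed_pair(mono, fixed_delay, variable_delay):
    """Return (left, right) feeds made from one mono signal.

    The left feed is delayed by fixed_delay samples and the right feed by
    variable_delay samples. Both outputs are zero-padded to equal length.
    """
    if fixed_delay < 0 or variable_delay < 0:
        raise ValueError("delays must be non-negative")
    length = len(mono) + max(fixed_delay, variable_delay)

    def delay(signal, count):
        padded = [0.0] * count + list(signal)
        return padded + [0.0] * (length - len(padded))

    return delay(mono, fixed_delay), delay(mono, variable_delay)
```

Sweeping `variable_delay` from zero up to twice `fixed_delay` moves the right feed from leading the left feed to lagging it, which corresponds to passing through the fixed delay setting as described above.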
The goal of quad was to make a recording of four different sources on four different channels, and preserve or recover those channels on replay. The important thing was the separation between the channels, and the efficiency of the quad systems was judged by their ability to maintain channel separation. For this reason, we can describe these systems as "multifontal" -- relying on multiple sources to represent the soundfield.
The inefficiency of all the two-channel matrix systems, and the difficulty of implementing subcarrier systems effectively, coupled with the lack of availability of four-channel tape recorders at the time -- along with a plethora of different systems -- largely put paid to quadraphonics by the late 1970s. It lived on, however, in the form of Dolby Surround, which owed a great deal to Sansui's QS two-channel matrix system -- the less popular of the two most common systems. In its theatrical implementation, the system provided a stereo front stage with a mono "surround" speaker placed around the back. Problems arose almost immediately: dialogue got lost (which of course was inevitable, given the front stage speaker positions). The solution was to introduce a center-front dialogue channel.
An indication of the shortcomings of the different quad systems was provided by the European electronics magazine Elektor in the early Seventies, which provided a diagram of the way in which each quad system reproduced a sound source panned through 360 degrees in the studio. Theoretically, the locus should be a perfect circle (as in the original, shown in black here); as this interpretation of the original diagram shows, this was hardly ever the case. The CBS SQ system, the most successful of the matrix systems, had a reasonable left/right separation (at around the maximum for a vinyl disk, of about 35 dB) but had a front/rear separation of only 5 dB in theory and more like 3 dB in practice (shown in red). QS, on the other hand (blue), demonstrated a cardioid-like locus in which a sound positioned center-rear in the control room would end up hitting you in the back of the head on replay. CD-4, the commonest subcarrier system, fared a little better, but still tended to exhibit a distortion of the soundfield around the speakers.
The diagram shows the way in which the distance of the sound source from the listener was distorted by the different quad systems, and two additional factors should be mentioned here. First, the original source is shown as being rotated around a circle passing through all the speakers -- i.e., around the edge of the array. A notable fact of quad systems is that while it is theoretically possible to bring a sound in from the edge of the speaker array (for example, supplying equal level to each speaker should put the sound at the listening position), this is virtually never achieved in practice with quad systems and their descendants. Second, the diagram does not show the way in which the direction of the sound source being panned varies from its originally-intended position. Due to the limitations of level-only localization, sound sources are pulled into the speakers. This means that if you were to pan a sound around the room at a constant speed, on replay the sound would appear to speed up when approaching a speaker and slow down when leaving it (it would largely skip from one speaker to the next).
In the 90s, with improved video duplication systems and the increased popularity of the VCR, "Home Theater" systems began to appear which attempted to decode the surround information present on consumer video movie releases. Immediately, they ran into all the old problems. So-called "logic decoding" was by this time commonly employed to improve the separation between the channels recovered from the "4-2-4" style matrix information on tape, but this only operates effectively when a small number of sound sources are present. Essentially, the technique involves turning the other speakers down when a sound is supposed to be coming from the direction of the remaining one -- which is fine only when there is a single source.
What everyone wanted was a discrete multi-channel audio distribution system that would keep all the channels separate, clean and unsullied from the production environment to the listener at home. The original Compact Disc specification actually allowed for 4-channel audio, but the play time was sufficiently reduced that nobody ever tried it commercially.
Today, for the first time, we have such a medium: a potential multi-channel fully-digital sound carrier in the form of DVD, in its several incarnations. This technology not only offers the possibility of carrying multichannel film soundtracks to a consumer audience: it also offers the potential for a multichannel high-quality audio medium.
Unfortunately, it is being generally assumed that a system designed to make sound effects sound impressive in a movie theater will be ideal for high quality music in surround. All that's needed, it is assumed, is to carry six audio channels -- four channels of discrete quad plus that dialogue channel center front, and the added bonus of a low frequency channel -- which we now refer to as "5.1", and everything will be hunky dory.
It will be apparent from the story so far that not everyone believes the accepted explanation: that instead of perfect surround-sound, we will get something little better than the old discrete quad tapes -- better than conventional stereo mixes, no doubt, but by no means as good as it could be, if everything we know about recording and about human hearing were used to create a practical, home-friendly surround-sound system that was capable of truly re-creating the original acoustic environment.
It may come as a surprise to learn that there is another way of doing surround sound that addresses all the problems of discrete quad and its descendant, 5.1 -- and largely solves every one of them. It may be even more surprising to learn that this technique has been around for almost as long as quad; that it has been in continuous use for music recording for over 20 years; and that, to date, more original album releases have been created using it than all other surround-sound systems put together. That system is Ambisonics.
Ambisonics was the brainchild of a group of British researchers, notably the late Michael A. Gerzon(2) at the Mathematical Institute in Oxford, and Professor Peter Fellgett of the Cybernetics department at Reading University. Beginning their research in the latter days of quad, Gerzon, Fellgett and their colleagues worked to develop a surround sound system that would enable a musical performance to be captured on tape or another medium, for transmission via available or future distribution media to the consumer, where it could be replayed in a conventional living room in which, as far as possible, the sound and acoustic environment of the original performance would be recreated. The system was christened Ambisonics -- an unassuming name that simply means "surround sound".
The primary method of capturing the original performance "Ambisonically" was to be a special new kind of microphone. Called the Soundfield Microphone(8), the device included a tetrahedral array of capsules in a single enclosure. The signals from the four capsules were processed together to correct them for true coincidence up to around 10 kHz, while further processing took the basic signals from the mic array and matrixed them to create four signals. These corresponded to a kind of three-dimensional version of one of Blumlein's original sum-and-difference coincident pair techniques: a mono omnidirectional signal (referred to as "W"); a left-facing figure-eight (left minus right, referred to as the "Y" signal); a front-facing figure-eight (front minus back, called "X") and finally an upward-facing figure-eight providing up minus down (the "Z" signal). The four-channel signal, WXYZ, termed "B-Format" to distinguish it from the "A-Format" capsule feeds, could be recorded on a 4-track machine. The modern Ambisonic logo shown here incorporates the omni and figure-eight mic polar diagrams in stylized form. Imagine the logo as a series of superimposed polar diagrams viewed from above, and you'll get the idea.
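To make the W, X, Y and Z signals concrete, here is a minimal Python sketch that places a single mono sample at a given direction using the polar patterns described above: an omni for W and figure-eights facing front, left and up for X, Y and Z. The angle conventions (azimuth measured counterclockwise from center front, so +90 degrees is hard left) and the 1/sqrt(2) scaling on W are assumptions of this sketch, following a commonly used B-Format convention that the article itself does not spell out. The function name `encode_b_format` is hypothetical.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into (W, X, Y, Z) for a given direction.

    Assumptions: azimuth is counterclockwise from center front
    (+90 degrees = hard left), elevation is upward from the horizontal,
    and W carries a 1/sqrt(2) gain.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                 # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)    # front minus back
    y = sample * math.sin(az) * math.cos(el)    # left minus right
    z = sample * math.sin(el)                   # up minus down
    return w, x, y, z
```

A source at center front lands almost entirely in W and X; moving it to hard left shifts the directional energy from X into Y, and raising it overhead shifts it into Z, while W is unaffected by direction.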
It is interesting to note that as early as the mid-Seventies, Ambisonics included the capability to record and reproduce height information, which even now is not a part of surround-sound practice, despite the fact that it adds almost as much to the realism of a surround system as rear speakers do.
While many of the original Ambisonic team were audiophiles interested in capturing live music performances and recreating them in the living room, some were also interested in other types of recording, or at least realized their importance. Michael Gerzon especially, interested as he was in many kinds of music including the most obscure German avant-garde rock bands, understood that to be successful, the new system not only required the ability to faithfully recreate a live performance, but also had to work with modern multitrack studio techniques. To this end, he developed the equations and basic designs for a wide range of controls for the Ambisonic multitrack studio, including surround panpots and reverb returns, along with effects controls offering possibilities that were inconceivable with multifontal systems, such as rotating one soundfield while panning it within another.
For replay, the B-Format signal was fed to a special decoder which derived a minimum of four loudspeaker feeds. With the exception of discrete quad recordings, all surround recordings of the time needed a decoder of some sort -- but this was where the similarity between an Ambisonic decoder and the others ended. Instead of striving to capture four discrete sources at one end and feed them individually to four speakers in a square at the other, the B-Format decoder derived a set of interrelated speaker feeds. Instead of four independent signals, the speakers in an Ambisonic replay system were fed with signals each of which contained all the elements of the recording, but with different relationships. The speakers worked together to recreate the acoustic and ambience of the original recording.(3)
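The idea of interrelated speaker feeds can be illustrated with a deliberately simplified horizontal decode, sketched below in Python. It is not the design of any actual Ambisonic decoder: it assumes a square layout with speakers at 45, 135, 225 and 315 degrees (counterclockwise from center front), a W signal scaled by 1/sqrt(2), and a single frequency-independent gain rule, and it ignores the frequency-dependent processing that practical decoders apply. The name `decode_horizontal` is hypothetical.

```python
import math

# Assumed square layout: left-front, left-back, right-back, right-front.
SQUARE_LAYOUT_DEG = (45.0, 135.0, 225.0, 315.0)


def decode_horizontal(w, x, y, speaker_azimuths_deg=SQUARE_LAYOUT_DEG):
    """Derive one feed per speaker from horizontal B-Format (W, X, Y).

    Each feed mixes the omni signal with the two figure-eight signals,
    weighted by the speaker's direction, so every speaker carries some
    of every source. Assumption: W was encoded with a 1/sqrt(2) gain.
    """
    feeds = []
    for azimuth_deg in speaker_azimuths_deg:
        az = math.radians(azimuth_deg)
        feed = 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))
        feeds.append(feed)
    return feeds
```

For a source at center front, all four feeds in this sketch are non-zero, with the front pair louder than the rear pair. The source is represented by the relationship between the feeds rather than by any one channel.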
Ambisonics attempts to use the available loudspeakers to recreate the original soundfield (as a result, small speakers are very effective as they work together in this task). But there is more to this than meets the eye. If all that was involved was to capture the wavefronts impinging upon a soundfield mic and recreate them so as to impinge on the listener during replay, the results would be disappointing: you could only hear the effect at a single point in the room, and the "sweet spot" would at best be the size of a football. That was the problem with traditional surround systems -- Ambisonics had to do better. It succeeded by using not only level to localize the sound sources, but phase and other directional cues in addition. The Ambisonics team based their surround scheme on careful research in psychoacoustics and human hearing mechanisms as well as the pure physics of the problem, taking advantage of different localization systems in different frequency ranges.
The system had a number of noticeable benefits. First of all, you could put the speakers more or less where you wanted them. In the very first Ambisonic decoders, you could place the four speakers in almost any rectangle, as long as the ratio between long and short sides was between 2:1 and 1:2. You simply placed the speakers in convenient places and set a layout control accordingly. Today's digital decoders can handle different numbers of speakers in many different layouts.
Second, the surround effect was pronounced and stable over a very wide listening area. You could even stand outside the speaker array and experience a kind of "sonic image" emanating from within the array. Only if you listened really close to a speaker did the effect diminish -- whereupon you would be surprised to notice that you could hear everything coming from that one speaker. The concept of "separation" is a meaningless one in Ambisonics: instead the operative word is "relationship". Unlike multifontal surround systems, all the sound did not seem to come from the speakers. Instead, the speakers hardly seemed to be there: sound sources appeared to come from any direction, whether a speaker was there or not -- in stark contrast to the usual experience. A famous British audio expert who was blind was played an Ambisonic recording made in the Albert Hall, and was convinced that he could hear one of the speakers. He pointed to the offending position, and there was no speaker there. It later turned out that he was hearing sound reflected from one of the acoustic "flying saucers" that were hung from the roof of the Albert Hall at the time to improve the sound -- which was faithfully reproduced by the system.
And finally, Ambisonics offered the promise of something that was beyond the capability of any practical consumer surround system then available: the reproduction of height information. This was christened "Periphony", from Greek roots meaning "sound around the edge". It required more speakers -- a minimum of six, though eight was better -- but it was extremely impressive.
The new system had some challenges, too. First of all, the "native" B-Format was a sum-and-difference recording system, like M-S stereo recording. As a result, you couldn't simply listen to B-Format on speakers: you had to decode it to listen to anything beyond the W (mono) channel.
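The M-S comparison is easy to show in code. The Python sketch below uses one common scaling (a factor of one half on encode, so that decoding is a plain sum and difference); the function names are hypothetical and the scaling is an assumption, since conventions vary.

```python
def ms_encode(left, right):
    """Convert left/right sample lists to mid (sum) and side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side: left = M + S, right = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

The mid signal on its own is a usable mono mix, much as W is in B-Format, but the side signal means nothing on a speaker until it is combined with mid. The same is true of X, Y and Z.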
With the exception of discrete quad, all the extant surround systems were compatible, to a greater or lesser extent, with conventional stereo and mono. If you played back an SQ or CD-4 disk without a decoder, for example, you would hear a respectable stereo mix. And if you listened in mono, you would hear a respectable sum of left and right. Ambisonics needed a stereo/mono-compatible matrix as well.
The solution came from a number of sources. The BBC was experimenting with surround sound at the same time and they adopted Matrix H (presumably the eighth one they came up with: it was allegedly based on QS principles) for experimental surround broadcasting. The Ambisonic team also developed a number of matrixing schemes, with varying names: the BBC's Matrix H and the Ambisonic team's 45J matrixing system were combined as Matrix HJ. Work done on the Nippon Columbia UD-4 ("Universal Discrete 4-channel") quad system was also brought into the mix, along with research by Duane Cooper at the University of Illinois, and the final result was referred to as "UHJ" ("Universal HJ"). This is sometimes referred to as "C-Format".(4)
UHJ is an example of what is referred to as a "hierarchical" surround encoding scheme, offering an increasing gamut of capabilities depending on the number of transmission channels available (see illustration) and on the decoder. In its fullest form, 4-channel UHJ carries all the information present in a 4-channel B-Format signal, including height information. If four channels are not available, the fourth, Q channel can be dropped, leaving a three-channel horizontal (planar) surround signal. If necessary, the third T channel can be bandwidth limited: this is referred to as a "2.5 channel" system, and was used in experimental Ambisonic radio broadcasting by the British Independent Broadcasting Authority, where the third channel was transmitted via phase-quadrature modulation. If only two channels are available, they can be used with a decoder to provide a very effective horizontal surround capability, although the accuracy of localization is not quite as high as in a 2.5- or three-channel version. Because most transmission and distribution systems at the time (such as FM radio, vinyl disk and later CD) were essentially 2-channel media, 2-channel UHJ became the most common member of the UHJ hierarchy employed for commercial releases, despite the fact that it does involve some compromises.(5)
If no decoder is available, a 2-channel UHJ recording can be listened to as if it were a stereo signal. In this case, the phase and level relationships in the signal, which are based on aspects of the way in which the brain localizes sound sources in nature, lead to a certain amount of "aural decoding" in which the brain tries to make sense of the signals by giving them surround positions and a "super stereo" effect that goes way beyond the speakers. The result is that 2-channel UHJ is a powerful "3D stereo" tool, at least as effective as more recent 2-channel surround techniques used today for computer and multimedia applications, and a good deal more impressive than most -- with the added benefit that with a decoder, horizontal multi-speaker surround is available.(6)
The new Ambisonic surround system quickly attracted the attention of Count Numa Lubinsky, founder of the fledgling audiophile label Nimbus Records. Based in a manor house at Wyastone Leys, on the English-Welsh border, with most of their staff living as well as working on-site, Nimbus had already experimented with stereo recording techniques and even released a small number of QS matrix recordings. Nimbus heard Ambisonics and fell in love with it. They carried out a number of recordings with the Soundfield microphone and subsequently resident genius Dr Halliday developed a horizontal-surround mic array consisting of an omnidirectional B&K mic and two Schoeps figure-eights. The mic signals were processed in a custom box to produce B-Format. Initial recordings were made in B-Format on a 4-track recorder and then encoded into 2-channel UHJ, but with the advent of stereo digital systems, Nimbus chose instead, for the highest audio quality, to go direct to UHJ and omit the B-Format stage. It is only relatively recently, with the arrival of affordable 4-channel digital recorders, that Nimbus have again returned to B-Format recording. Apart from the initial stereo and QS releases, every album ever released by Nimbus has been recorded Ambisonically.
While the initial thrust of Ambisonic research -- at least as far as the recording side was concerned -- focused on using the Soundfield mic to "capture" a live performance, work went on in various quarters to develop multitrack mixing systems. Alice Stancoil Ltd built a mixer for the IBA in the early Eighties, which was written up by Chris Daubney(9). The present author was part of a small group which designed and built Ambisonic panpots and other devices and hoped to obtain funding for an Ambisonic localization system that could be added to a conventional mixing console.
Ambisonic mixing finally became a reality in the early Eighties with the launch of a series of rack-mounted outboard processors by Audio + Design Recording in Reading. Known as the Ambisonic Mastering Package(10), the units were designed by Dr Geoff Barton and included a professional studio decoder, capable of decoding B-Format and 2- or 3-channel UHJ. The other units utilized a number of different approaches to the challenge of localizing signals in an Ambisonic soundfield, and were all designed to integrate with a conventional console as much as possible.
The Pan-Rotate unit offered eight mono inputs, the angle of each of which could be controlled by a continuously-rotating 360-degree knob which panned the signal around the room. A second "radius vector" control determined the distance from the center, and allowed a sound to be brought in from the edge of the array, through the center and out the other side -- a facility which is not possible with most other surround systems. It output a B-Format signal and included a rotate control which could be used to rotate the entire soundfield, and an additional B-Format input to be fed from another source such as a Soundfield mic.
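Rotating an entire soundfield is straightforward in B-Format because direction is carried by the X and Y signals together. The Python sketch below shows the general mathematics of a rotation about the vertical axis and makes no claim about how the Pan-Rotate unit was actually implemented. It assumes angles measured counterclockwise from center front, and the function name `rotate_b_format` is hypothetical.

```python
import math


def rotate_b_format(w, x, y, z, angle_deg):
    """Rotate a B-Format sample about the vertical axis.

    A positive angle turns the whole soundfield counterclockwise
    (toward the left) under the assumed angle convention.
    W (omni) and Z (up minus down) are unaffected.
    """
    angle = math.radians(angle_deg)
    x_rotated = x * math.cos(angle) - y * math.sin(angle)
    y_rotated = x * math.sin(angle) + y * math.cos(angle)
    return w, x_rotated, y_rotated, z
```

Applied sample by sample to all four signals, this moves every source in the mix by the same angle at once, including the contribution of a Soundfield mic feed mixed in at the B-Format input.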
The B-Format Converter took feeds from four console groups and an aux send, and used the console panpots to pan across each quadrant of the soundfield depending on the pair of groups selected. It, too, output a B-Format signal.
The final unit was the Transcoder. This versatile device accepted a B-Format signal and encoded it into UHJ, but it also had another function which was at least as useful. It had two stereo inputs, a front stage and a rear stage. The width of each stage could be controlled independently and the output was a 2-channel UHJ signal. The Transcoder therefore provided a "quick and dirty" method of producing an Ambisonic mix -- you simply fed two pairs of console groups to the unit -- as well as providing the final encoding stage for a B-Format mix created with the other units.
The first Ambisonically-mixed album was a production music album from the KPM Music Library: Contact, by Keith Mansfield, was released in 1983 and was cut and pressed by Nimbus Records. The first Ambisonically-mixed Compact Disc was released the following year and was a collaboration between Nimbus and KPM. Titled Surprise, Surprise, by Chin & Cang, it was only the third CD to be manufactured in the United Kingdom, following the opening of the first Nimbus CD pressing plant at Wyastone Leys.
Early Ambisonically-mixed albums contained a description deliberately reminiscent of that appearing on early stereo releases: "This UHJ/Ambisonic album will reproduce full surround-sound when replayed through an Ambisonic decoder; however, enhanced stereo and improved stereo/mono compatibility will be experienced when replayed through normal audio equipment."
Nimbus was the primary source for audiophile releases made with single-point Ambisonic mic technique, joined by smaller labels such as Music from York, Unicorn/Kanchana and IMF. KPM was initially the primary outlet for Ambisonically-mixed material.
There was discussion among Ambisonic adherents on how to evangelize the system. The approach taken by the quad manufacturers had been to encourage record companies to endorse one system or another and release all their material in that format -- an approach which, for whatever reasons, had failed. The approach preferred by the Ambisonic fraternity was to go direct to the people who made the records -- artists and especially engineers and producers -- and encourage them to use the system, primarily by demonstrating its superiority. Although there were few people engaged in this kind of promotion, they had a good deal of success, and within a few years Ambisonics had been used by a varied range of artists, including Alan Parsons (Stereotomy), Tina Turner (Break Every Rule), Steve Hackett (Till We Have Faces), Adrian Legg (Lost For Words), Ambisonically-mixed CDs on the Collins Classics label, and Paul McCartney's Liverpool Oratorio, to name but a few. In each case, the albums received rave reviews for their sound quality and the extent of the sound-stage, and in each case, the use of Ambisonics was the engineer's or producer's decision, just as the use of any other kind of outboard equipment might be.
Evidently, by this time, Ambisonics was highly thought of in the recording community; many favorable articles had been written on the subject (and not solely by the present author!); and plenty of albums had been released. It is entirely reasonable at this point to ask the question, "If Ambisonics is so amazing, why aren't we all using it?" The simple answer is money; the longer answer involves bureaucratic bungling of the kind only the English can manage.
The inventors of Ambisonics, public-spirited as they were, took their invention to a British quasi-non-governmental organization called the National Research Development Corporation (NRDC). The purpose of this organization was to take British inventions and license them to industry, and it had been very successful in some cases (and dreadfully unsuccessful in others, such as the Air-Cushion Vehicle, or Hovercraft). The NRDC technique was to take, say, a new chemical process invented in a British university by penniless students, and license it exclusively to a large manufacturer, taking a cut of the royalties. This was fine for a chemical process, where the intention was to find an exclusive licensee. Ambisonics, however, would only succeed if everyone licensed it, like the Philips Compact Cassette or Dolby B. And the NRDC simply couldn't muster the kind of marketing resources necessary to do that kind of a job. The existing dozen or two licensees met from time to time and the idea of a joint marketing strategy was floated, but it never got off the ground.
Every time something seemed about to happen, it was dealt a blow from an unexpected source. Thus, with Ambisonics poised for success fairly early on, the Thatcher government came along and decided that British inventions should be allowed to fail just like anyone else's, and withdrew NRDC's funding. The organization looked for an exclusive licensee -- something they were normally quite good at -- but of the three contenders at the time, they chose the least likely; when that failed, they went with the next least likely; and only when there were no other options left did they allow Nimbus to become the exclusive licensee, which it should have become several years earlier.
Nimbus, now part of Robert Maxwell's Mirror Group, took the responsibility of promoting Ambisonics seriously, under the guidance of Stuart Garman. Garman made trips to Japan and persuaded Onkyo and Mitsubishi to include Ambisonic decode capability in their home theater systems. Nimbus funded an industry seminar in Abbey Road's Studio 2, which despite some technical difficulties was very well received -- and then Maxwell jumped off a boat and the impetus collapsed in the resulting confusion.
Meanwhile, Dolby Laboratories had no such bad luck and no such lack of funding. Their QS-derived matrix became the movie theater standard -- and impressive it was too, although it was not capable of the localization accuracy needed for music-only program material. And with the advent of digital distribution techniques, the goal of the old discrete quad supporters was poised to be realized: a multi-channel digital distribution medium was on the horizon. Ambisonics took a back seat -- but it never went away.
Today, all the talk is of DVD -- the Digital Versatile Disc -- in its many forms. At the time of writing, the specifications have not been tied down yet. DVD as a video distribution format includes the possibility of four channels of 96 kHz or eight channels of 48 kHz PCM digital audio, or six channels of Dolby Digital sampled at 48 kHz. Does Ambisonics have a future in the world of DVD? The answer is a definite "Yes". In fact, its best days may be ahead.
Dolby's AC-3 compression system is a little long in the tooth, and the competing DTS system is not much more recent. Both have come under some criticism (especially from such groups as Acoustic Renaissance for Audio(12) -- an international body of some of the greatest minds in digital audio) as they are "lossy" compression systems -- in other words, you never get back what you put in. The systems lose data where it is theoretically less noticeable. The ARA points out that modern computer technology can cheaply provide a similar level of lossless compression, or "packing" -- where no data is lost -- and has called for the inclusion of a special data flag in the data stream to indicate that such a method has been used. So far this eminently sensible recommendation has not made it into the draft DVD-Audio spec. This discussion is peripheral to the place of Ambisonics on DVD: it is possible that a lossy compression scheme designed to handle amplitude-only surround systems (i.e. 5.1) might cause damage to some of the more subtle information in an Ambisonic signal, but this is unlikely. Like most people calling for higher quality audio discs and surround standards, Ambisonics proponents have tended to side with the ARA in favor of lossless compression -- but if it never came about, Ambisonics would probably be no more affected than other surround systems.
Another proposal from the ARA is of particular interest to Ambisonics enthusiasts. They propose a second flag that indicates that a hierarchical surround encoding scheme has been employed. Astute readers will come to the conclusion that this is a reference to Ambisonics - and this is indeed the case. In fact the hierarchy referred to by the ARA is based on work on sound for HDTV performed by Gerzon and Barton and presented at the AES 92nd Convention in Vienna in 1992 (Preprints 3339 and 3345). The proposal incorporates what we can refer to as the "G-Format Hierarchy" (we will refer to it here as "GH" for short): a development of the UHJ idea in which two of the six (5.1) channels provide left and right and the others are derived from a kind of enhanced B-Format ("B-Format+", or "BEF" - the additional two signals are called "E" and "F") that includes two additional difference channels, one providing additional "channel separation" for 5.1 compatibility and the other a tight, stable center channel for dialog. In addition, the system can handle irregular speaker layouts as well as regular rectangular ones. (For those acquainted with the minutiae of recent work in Ambisonics, "B-Format+" is not the same as "second-order" Ambisonics).
In the early days of DVD, it was expected that it would be necessary for a surround system to be stereo-compatible, hence the inclusion of L and R. This may still be true for DVD-Audio (or DVD-music as it is also known), but there is no agreed standard for this incarnation of DVD at the time of writing.(13) It may also, if the ARA is successful, be possible to use B-Format+ directly on DVD-Audio. In any event, G-Format offers the possibility of Ambisonics without decoders -- it is essentially similar to "pre-decoding" an Ambisonic signal for 5.1 -- if you are prepared to suffer a few compromises (see below).
UHJ - or something like it - is an excellent medium for digital transmission of surround information, whatever the surround system in use. Any surround system, including 5.1, can be encoded successfully into UHJ, and such a signal can be decoded 100% successfully into a 5.1 speaker array. UHJ also has the remarkable benefit of using significantly fewer channels. It takes only three channels to convey a high-definition horizontal surround signal with UHJ, while 5.1 requires six. And if you have four channels available (as is the case with DVD -- even the standard video version, let alone DVD-Audio) you can carry height information, which for some reason does not even make it into the discussions except in certain rarefied atmospheres(14).
If the ARA gets its hierarchical surround encoding flag, it would be entirely practical to make a recording in 5.1, transmit it via three channels of UHJ (or use GH or BEF, which both require five channels for horizontal surround), and decode it into 5.1 at the other end. Or make an Ambisonic recording and decode it to a 5.1 array. Or make a 5.1 recording and decode it with an Ambisonic decoder, with its unique and innately flexible speaker positioning capabilities. Or, of course, you could work entirely in Ambisonics. UHJ would provide a seamless integration of all conceivable surround systems -- and because of the fewer channels, they wouldn't even need compression.
There would be another benefit too. One thing that really scares today's engineers is the thought of having to do several mixes -- including stereo and surround versions -- of their work. And it's not surprising. While old-timers may point out that in the old days we used to do a mono mix, a stereo mix, a radio mix, an album mix and a single mix, and why can't today's engineers do some real work, the fact is that today's tracks are incomparably more complex than they were in the early days of stereo mixes and dual stereo/mono inventory. It really is quite a frightening and expensive prospect. There are some new technologies which go some way towards ameliorating the problem, such as Lexicon's Logic 7 system developed by Dave Griesinger, but even he admits that Logic 7 provides no more than a good starting point for multi-format mixes(15).
The answer, once again, is Ambisonics. We learned over 15 years ago that if you get your sources panned into position by listening in surround, and then monitor in the lowest-level format that your listeners will be likely to contend with -- stereo, for example -- as you carry out the final mix, you will end up not only with a good stereo mix but an excellent surround mix also. And with UHJ, you simply master the one mix and people hear a successful balance, however many channels they have access to, and whether they have a decoder or not.
Unfortunately there is one major drawback to promoting UHJ - or BEF for that matter - as the standard surround-sound encoding scheme for DVD -- actually, two. Theoretically, 5.1 surround carried by a 6-channel digital medium does not need a decoder of any kind. You put six channels in one end and get them out of the other. To get surround sound information out of a UHJ signal, however, you have to decode it, so there has to be a decoder somewhere, while in the 5.1 environment, you don't need one. This means that the transmission path is more expensive than the 5.1 path -- and in a commercial environment where you want to minimize your costs, it won't fly.
In fact, the situation is not as simple as that. Every DVD player in existence today actually has a decoder in it: a Dolby AC-3 decoder! So whatever soundtrack you want to put on your DVD, it is going to go through that decoder and come out 5.1. So on the face of it, it looks as though UHJ and Ambisonics, as the solution to all the problems of surround sound, have been thwarted at the starting gate. If the ARA is successful, you could use GH in future players as an optional encode/decode scheme, but it is already too late for either to be the official transmission scheme for DVD.
But not so fast. We may not be able to get UHJ into the equation except as an option, but there is still room for Ambisonics. If you have to have a 5.1 signal path -- well, give them a 5.1 signal path. If we make an Ambisonic recording in any of the conventional ways, for example recording the result in B-Format, we decode it into speaker feeds to listen to. Suppose we decoded the Ambisonic signal for a standard 5.1 speaker array -- a reasonable feature for a modern Ambisonic decoder: remember that Ambisonics can support virtually any regular speaker array -- and fed those speaker feeds on to DVD? The 5.1 signals would emerge at the other end and the listener would experience Ambisonics, decoded for 5.1.
This is an entirely reasonable course of action, and is likely to be the course taken by those who wish to take advantage of the benefits of Ambisonics as a surround-sound recording medium, yet wish to reach the largest audiences.
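As a rough illustration of what "pre-decoding for 5.1" might involve, the Python sketch below pans a mono source into horizontal B-Format (W, X, Y) and then derives five speaker feeds with a simple, frequency-independent projection decode. The speaker angles, the function names and the decode law itself are assumptions made purely for illustration: a production decoder for an irregular 5.1 layout would use a more carefully optimized design, and the LFE channel is ignored here.

```python
import math

# Assumed main-speaker azimuths in degrees, counter-clockwise from front.
# These are commonly quoted 5.1 positions, not values taken from the article.
SPEAKER_AZIMUTHS = {"L": 30.0, "C": 0.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0}


def encode_b_format(signal, azimuth_deg):
    """Pan one mono sample into horizontal B-Format (W, X, Y).

    Uses the traditional convention in which W carries the signal at -3 dB.
    """
    az = math.radians(azimuth_deg)
    return (signal / math.sqrt(2.0), signal * math.cos(az), signal * math.sin(az))


def decode_to_feeds(w, x, y, speakers=SPEAKER_AZIMUTHS):
    """Projection decode of (W, X, Y) to one feed per speaker.

    Each feed is the B-Format signal "sampled" in the speaker's direction,
    with no shelf filters, so the same gains apply at all frequencies.
    """
    n = len(speakers)
    feeds = {}
    for name, az_deg in speakers.items():
        az = math.radians(az_deg)
        feeds[name] = (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az)) / n
    return feeds


if __name__ == "__main__":
    # A source panned to the left-front speaker position.
    w, x, y = encode_b_format(1.0, 30.0)
    for name, gain in decode_to_feeds(w, x, y).items():
        print(f"{name}: {gain:.3f}")
```

With this decode law a source panned to 30 degrees produces its largest feed in the left-front speaker, with progressively smaller contributions from the others. Those five feeds are the kind of signal that would be written to the disc.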
The idea of 5.1 decoded Ambisonic speaker feeds is in fact very closely related to the G-Format Hierarchy discussed above. It differs a little from traditional Ambisonic decoding in that it is frequency-independent as no shelf filters are used (like an Ambisonic decoder designed for large listening spaces, e.g. live sound), so in fact the next-generation Ambisonic production system would differ a little from a straight consumer decoder. To avoid confusion, we'll call this format "GD" - D for "Decoded". GD can be encoded with AC-3 or DTS with little or no damage, and placed on to DVD. On replay, most of the benefits of the latest Ambisonic techniques will be evident to the listener, including the listener-independent imaging. The main feature of Ambisonics that will be lost in this configuration will be the ability to put the speakers where you like: as there is no adjustable decoder, the speakers will have to be in the right places -- the "official" places for good 5.1. Another missing feature is height, but there is a solution to this: mixing height information into the 5.1 feeds in a way in which it could be recovered with appropriate equipment, although the exact parameters for this are still being worked out.
And even this is not the end of the story. The Ambisonically-decoded 5.1 speaker feeds will have enough data in them to enable full horizontal B-Format or B-Format+ to be recovered, and optionally height as well (see above). Such a signal could then be decoded with a next-generation Ambisonic decoder, or even an existing one. GD will in the end offer almost as many possibilities as UHJ, with one exception: the disc would still be obliged by the DVD spec to carry multiple mixes. Luckily, as we have seen, these could be developed in the studio automatically during the Ambisonic mixing process. Ideally, the Ambisonic DVD disc could contain GD format in the 5.1 channels, and a simultaneous 2-channel UHJ downmix in the linear PCM channels. I call this format "G+2".
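One way to see why recovery of B-Format from the speaker feeds is plausible: if the five feeds are a known linear combination of W, X and Y, then the three original signals can be solved for again. The sketch below is only an illustration of that idea, assuming a simple projection decode, assumed speaker angles and feeds that reach the listener unaltered; the names are hypothetical and it is not a description of any actual G-Format decoder.

```python
import math

# Assumed azimuths for L, C, R, Ls, Rs in degrees (not taken from the article).
AZIMUTHS_DEG = [30.0, 0.0, -30.0, 110.0, -110.0]


def decode_matrix(azimuths_deg):
    """Rows of gains applied to (W, X, Y) to make each speaker feed."""
    n = len(azimuths_deg)
    rows = []
    for az_deg in azimuths_deg:
        az = math.radians(az_deg)
        rows.append([math.sqrt(2.0) / n, math.cos(az) / n, math.sin(az) / n])
    return rows


def decode(b_format, matrix):
    """Turn one (W, X, Y) sample into a list of speaker feeds."""
    return [sum(g * s for g, s in zip(row, b_format)) for row in matrix]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def recover_b_format(feeds, matrix):
    """Least-squares recovery of (W, X, Y) from the feeds (normal equations)."""
    ata = [[sum(row[i] * row[j] for row in matrix) for j in range(3)] for i in range(3)]
    atf = [sum(row[i] * f for row, f in zip(matrix, feeds)) for i in range(3)]
    return solve3(ata, atf)


if __name__ == "__main__":
    matrix = decode_matrix(AZIMUTHS_DEG)
    original = [0.3, -0.2, 0.5]           # an arbitrary (W, X, Y) sample
    feeds = decode(original, matrix)      # what would be stored on the disc
    print(recover_b_format(feeds, matrix))
```

Because five feeds carry only three independent signals in this sketch, the system is over-determined and the recovery is exact up to rounding. In practice the feeds may also have passed through a lossy codec, so real recovery would be approximate.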
We have seen how a powerful and effective surround sound system has co-existed alongside the hype given to first quad, then movie surround, and now DVD. Ambisonics has been used for more original record releases than all other surround formats put together, and it is the surround-sound system with the longest history of continuous use in the industry. We have seen how Ambisonics is capable of superior results such as listener position independence and carrying additional information such as height. And finally, we have seen how Ambisonic techniques and technology can be integrated into the very latest digital transmission media and can even be taken into the enemy camp via GD: Ambisonic signal feeds pre-decoded for 5.1 replay.
There is now a need for a new version of the old Ambisonic Mastering System. Although many hit records still use analog mixers, and the existing equipment designed for Audio + Design by Geoff Barton and now manufactured by Cepiar Ltd in the UK is completely compatible with such systems, today's Ambisonic outboard processor could benefit from additional facilities to take advantage of the possibilities of DVD. Digital processing would be an obvious choice, and in fact a new system could operate entirely in the digital domain.
The "Ambisonic 5.1 Toolkit" would take stereo front and rear stages, just like the old Transcoder (plus an optional center front input so it could handle 5.1 inputs directly), and generate an Ambisonic signal in a number of formats. Unlike the majority of 5.1 systems, it would be able to integrate the center front channel properly into the mix, and in fact could be used to generate "3-speaker stereo" signals offering tighter front-stage-only imaging for such applications as HDTV. It would also, of course, generate "G-Format" 5.1 outputs and 2-ch UHJ, either from B-Format+ or from transcoded console inputs, while a companion unit would recover B-Format+ from the replayed 5.1 signal and include a full Ambisonic decode capability to drive a number of different speaker configurations. It might even handle height, and it may be able to allow the input of individual mono signals for localization.
Now find out how to put Ambisonics on to a standard DVD Video disc -- with an automatically-generated stereo mix.
Richard Elen was the founder of two professional audio magazines and the editor of a third, Studio Sound, from 1980-84. He is a recording engineer and producer specializing in production music, has been involved in the recording industry since 1970, and has produced over 50 surround-sound recordings, almost all of them Ambisonic. He admits that on more than one occasion he has incorrectly predicted the universal acceptance of Ambisonics, but hopes that he will ultimately succeed in generating a successful self-fulfilling prophecy. Elen has had his own PR agency, and has been part of a major advertising agency specializing in professional audio. Today he is Vice President of Marketing at Apogee Electronics Corporation, the Santa Monica-based manufacturer of high-quality digital audio conversion systems, and as you can see, he still gets to write from time to time. He lives in his Southern California canyon home with his cat, Caitlín (evidently a pun) and hopes that all three of them haven't been swept away by El Niño by the time you read this. In what pitifully small free time he has, he is writing a novel as well as being involved with several environmental organizations and in Web site design, amongst other things.