The British surround-sound system Ambisonics has been in existence for almost thirty years. Despite lacking the funding of major corporations available to competing systems, Ambisonics has survived thanks to superior technical performance and the support of enthusiasts around the world. Today developments are underway which make Ambisonics more than adequate for emerging digital 5.1 delivery systems such as DVD-Video, DVD-Audio and SACD. Ambisonics is poised to be reborn as a studio production technology capable of transforming the performance of 5.1 surround, without the need for the listener to obtain any additional equipment. Richard Elen introduces G-Format: Ambisonics for the New Millennium.
For many years most material reached the listener over only two channels, whether on vinyl disc, cassette, FM broadcast or, later, the Compact Disc. Quadraphonic techniques attempted to extend sound into two dimensions, but the additional channels had to be squeezed into the two existing ones. The major disadvantage all the quad systems shared was this reliance on two channels: they all required stereo compatibility, and this compromised performance in one way or another. What everyone looked for was a way of carrying the four quad channels individually to the listener. This was only really possible with tape and, for many reasons, quad died out as far as the record buyer was concerned.
Only one surround technology persisted, primarily because of its superior performance even in a two-channel environment: the British surround system Ambisonics, developed in the late Sixties by Michael Gerzon of the Mathematical Institute in Oxford, Professor Peter Fellgett at the University of Reading, and others. Despite the rise of other systems, there are still a great many Ambisonic albums available today -- and the number is increasing all the time.
In the film world, however, quad continued -- and eventually flourished, thanks largely to the efforts of Dolby Laboratories. Problems with holes between the speakers in quad -- inevitable when the speakers are more than 60 degrees apart and level only is used to position sounds in the environment -- led to the introduction of a center-front dialog channel. A Low Frequency Effects (LFE) channel added extra impact to movie blockbusters. And eventually a second surround channel was introduced, providing the ability to localize sounds towards the rear of the listener.
The result was the 5.1 system, which is now moving into the home with the advent of home theater systems and particularly DVD, the Digital Versatile Disc, allowing at least five full-bandwidth digital channels to be supplied direct from the production environment to the listener at home. Today, DVD-Video offers 5.1 digital audio to home listeners, while the emerging DVD-Audio and Super-Audio CD systems offer high quality audio-only listening with both 5.1 and stereo layers or areas on the same disc -- in some cases with an additional Red Book CD-like layer for backwards compatibility. The dream of the Quad enthusiasts -- discrete, multiple channels -- is now a reality.
In traditional Ambisonics, either a special Soundfield microphone or similar array, or Ambisonic mixing equipment, is used to capture or generate a surround-sound experience which is generally recorded as a B-Format signal, containing full details of the entire audio environment. This signal, consisting of up to four interrelated tracks -- front minus back (X), left minus right (Y), and up minus down (Z), plus a mono sum of left, right, front, back, up and down (W) -- is unfortunately not compatible with most distribution systems (such as left-right stereo). As a result, an encoding scheme, UHJ, was developed in the Seventies to provide this compatibility. In addition to the work of Gerzon, Fellgett and others in the UK, UHJ owes a debt to Duane Cooper in the USA.
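The four B-Format components described above can be illustrated with a short sketch. It assumes the conventional first-order encoding in which W carries the mono signal reduced by 3 dB and azimuth is measured counter-clockwise from straight ahead, so that a source on the left gives a positive Y. The function name `encode_b_format` is hypothetical and exists only for this illustration.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-Format (W, X, Y, Z).

    Assumed convention: azimuth 0 = front, +90 = left; elevation +90 = up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                  # omnidirectional (mono) component
    x = sample * math.cos(az) * math.cos(el)     # front minus back
    y = sample * math.sin(az) * math.cos(el)     # left minus right
    z = sample * math.sin(el)                    # up minus down
    return w, x, y, z

# A unit-level source placed at front, hard left and directly overhead.
front = encode_b_format(1.0, 0.0)
left = encode_b_format(1.0, 90.0)
above = encode_b_format(1.0, 0.0, 90.0)
```

A source straight ahead appears only in W and X, one at hard left only in W and Y, and one overhead only in W and Z, which is why the horizontal information survives if Z is discarded.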
UHJ is a hierarchy, and in its fullest form uses four channels to represent the same data as is present in a 4-channel B-Format signal. However instead of the sum-and-difference channels of B-Format, UHJ utilizes up to four channels of which two are compatible with stereo. Remove the fourth channel, Q, and you remove the height information (until recently a unique feature of Ambisonics); remove the third channel, T, and you reduce the accuracy of the horizontal (planar) surround; listen to the remaining L and R signals with a decoder and you get good horizontal surround; listen to them without a decoder and you get "super-stereo", or sum them for superb mono compatibility.
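The stereo-compatible pair at the bottom of the hierarchy can be sketched numerically. The coefficients below are the commonly published 2-channel UHJ encoding values; they are not given in this article and should be treated as an assumption here. To keep the example short, signals are represented as complex phasors for a single steady tone, so that multiplying by `1j` stands in for the 90-degree phase shift that a real encoder would apply with wideband phase-shift networks. The helper names are hypothetical.

```python
import math

def uhj_two_channel(w, x, y):
    """Return (left, right) UHJ phasors from horizontal B-Format phasors.

    Coefficients are the commonly published UHJ values (an assumption here).
    """
    s = 0.9396926 * w + 0.1855740 * x                           # sum signal
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y   # difference signal
    return (s + d) / 2.0, (s - d) / 2.0

def source_phasors(azimuth_deg):
    """B-Format phasors (W, X, Y) for a unit tone at the given azimuth."""
    a = math.radians(azimuth_deg)
    return 1.0 / math.sqrt(2.0), math.cos(a), math.sin(a)

front_left, front_right = uhj_two_channel(*source_phasors(0.0))
side_left, side_right = uhj_two_channel(*source_phasors(90.0))
```

For a source straight ahead the two channels have equal level and differ only in phase, and their mono sum has no quadrature component; a source at hard left is much louder in the left channel, which is what makes the undecoded signal usable as ordinary stereo.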
Although 2-channel UHJ is compatible with mono and stereo, when decoded it does not provide as accurate a surround picture as one would like. In condensing the data for horizontal surround into two channels, the sound undergoes what we might call "spatial compression", reducing its accuracy -- although 2-channel UHJ still outperforms other two-channel matrix surround systems in the view of its adherents. In addition, a few people have difficulty listening to decoded or even undecoded UHJ due to its phase components (which provide the "beyond the speakers" super-stereo effects). These characteristics (which also applied to other 2-channel surround matrices) rendered 2-channel UHJ unsatisfactory for some people, and detractors seized on this as a means of putting down Ambisonics, although it only accounted for the "lowest common denominator" available for surround transmission, and still outperformed competing systems which had the benefit of larger marketing budgets.
It was this technical superiority that enabled Ambisonics to survive while quad disappeared from the record stores. Nimbus Records adopted Ambisonics early on, and continues to issue only Ambisonic recordings, made with their own version of the Soundfield microphone. Other labels, such as Collins Classics and Hyperion, regularly issue Ambisonically-mixed albums.
The big disadvantage, however, was that to get surround-sound from an Ambisonic album you required a special decoder. Consumer and professional Ambisonic decoders were -- and are -- available (today from such companies as Meridian, Cepiar and Cantara) to recover surround from a 2-channel UHJ recording.
Today's DVD systems often appear to require no decoder, although this is not in fact the case. Somewhere in the replay chain there is a decoder to recover the digital multi-channel AC-3, dts or other surround data from the disc. These, however, are really only ways of encoding the data to be fed to the channels. Unlike an Ambisonic signal, where there is a distinct relationship between the audio channels, most current 5.1 systems seek only to transfer the original 6-channel master on to the disc at the mastering stage and get it back again on replay. An Ambisonic decoder, like a Dolby Pro Logic decoder, actually extracts speaker feed information from the signal rather than simply extracting the channels and squirting them into their appropriate "discrete" output.
In a modern home theater system, the decoder is usually in the player, or in the preamp, receiver or surround controller: only occasionally is it found as a separate unit, and only occasionally do such decoders include Ambisonic capability (notable exceptions being the Meridian 500 and 800 series). Ambisonic decoders were purchased only by enthusiasts, or were acquired inadvertently by listeners if the manufacturer included 2-channel UHJ decode in addition to more common modes such as Dolby Pro Logic.
Quite soon after the introduction of DVD as a consumer video distribution medium, it became clear to audiophiles and manufacturers alike that the specifications of DVD-Video were insufficient to satisfy those who required a significant step up from the levels of quality experienced with Compact Disc. A big problem to many was the reliance on lossy compression to get all the audio on the disc. Lossy compression or "perceptual coding", unlike the kind of compression employed to compress computer programs, but like that used to stream sound files on the Internet, relies on psychoacoustic phenomena to reduce the data rate, losing data that is judged inaudible. This kind of approach is anathema to audiophiles.
DVD-Audio was the answer, with both 5.1 surround and stereo capability, and 24-bit digital audio sampled at 96 kHz (or even 192 kHz in stereo only). There is also a competing high-quality disc format, Super-Audio CD from Philips and Sony, which uses a completely different digital audio technology to the PCM technique used in DVD-Audio. Instead it uses a 1-bit "bitstream" based on Sony's Direct Stream Digital (DSD) format. But, like DVD-Audio, SACD can carry 5.1 surround as well as stereo, and promises comparable levels of quality, although much of the equipment required has yet to be developed.
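A little arithmetic shows what these PCM figures mean in terms of raw data rate. The sketch below treats 5.1 as six full-rate channels, which is a simplification, and the function name is hypothetical. The 9.6 Mbit/s figure used for comparison is the commonly quoted DVD-Audio ceiling; it is not stated in this article and is included only as an assumption.

```python
def pcm_rate_mbit_s(channels, bits_per_sample, sample_rate_hz):
    """Raw (uncompressed) PCM data rate in megabits per second."""
    return channels * bits_per_sample * sample_rate_hz / 1_000_000

cd_stereo = pcm_rate_mbit_s(2, 16, 44_100)        # Compact Disc
surround_96k = pcm_rate_mbit_s(6, 24, 96_000)     # six channels, 24-bit, 96 kHz
stereo_192k = pcm_rate_mbit_s(2, 24, 192_000)     # stereo, 24-bit, 192 kHz

# Assumed ceiling for DVD-Audio, not taken from this article.
ASSUMED_DVD_AUDIO_LIMIT_MBIT_S = 9.6
needs_packing = surround_96k > ASSUMED_DVD_AUDIO_LIMIT_MBIT_S
```

On these assumptions, six channels of 24-bit, 96 kHz audio come to roughly ten times the Compact Disc rate and exceed the assumed ceiling, which suggests why a lossless packing scheme is attractive for high-resolution surround.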
Meanwhile, work was going on in the Ambisonic camp. Just as UHJ is compatible with stereo, Ambisonics proponents reasoned, it should be possible to develop an Ambisonic transmission system that was compatible with modern 5.1 surround technologies. And perhaps, just perhaps, it could be done in such a way that the listener would need no additional gear to enjoy the superiority of Ambisonics.
The answer was proposed by Dr Geoff Barton, co-founder of Trifield Productions -- a London-based TV production company specializing in providing satellite TV services to Japan -- and a member of the original Ambisonic development team. Over 15 years ago, Barton developed the first Ambisonic studio mixing equipment for British pro audio manufacturer Audio & Design Recording; it is still available from Cepiar in the UK.
While working with Michael Gerzon on some of his last work -- to which Trifield holds the rights -- Barton suggested that a 5.1 system could be used simply to carry the loudspeaker feeds from an Ambisonic decoder in the studio to the listener at home. There were some compromises: like conventional 5.1 techniques, you would need to make sure that the listener had their loudspeakers in essentially similar positions to the ones in the studio (conventional Ambisonic decoders allow you to place the speakers where you like); and the studio decoder would be a little different from the kind you would use at home; but the idea seemed to hold water. The technique, now referred to as "G-Format", emerged as one of the possibilities offered in a paper authored by the two men and presented at the 1992 Audio Engineering Society Convention in Vienna. The paper also discussed the new kinds of decoders that would be needed to drive speakers in a 5.1 array, where, unlike quad and traditional Ambisonics, the speakers did not form a regular polygon. Such decoders are now referred to as "Vienna" decoders in the argot of Ambisonic surround sound. In addition, the paper introduced an "enhanced B-Format", also known as "BEF", in which two channels are optionally added to the existing four: one to enhance front-stage localization and another to increase front/rear separation.
The idea was adopted by Acoustic Renaissance for Audio, a group set up to advance the cause of a higher-quality audio disc, and chaired by Bob Stuart of Meridian (a manufacturer of high-end consumer audio equipment, including Ambisonic decoders). The group (which included Michael Gerzon and a host of other significant figures in international digital audio) called for the DVD-Audio spec to include a flag indicating that lossless compression had been used on a disc. They also suggested a flag that would tell the player that the disc contained G-Format.
The DVD-Audio specification is essentially complete at the time of writing (September 1998) with the recent announcement that Meridian Lossless Packing, a lossless compression algorithm developed by members of the ARA, is to be incorporated in the specification. MLP includes the ability to code all kinds of surround information, and not just for DVD. It could equally be used to make ordinary CDs carry Ambisonic B-Format surround, or stereo at 88.2 kHz sampling. In many senses, the ARA has got what it has lobbied for.
Meanwhile, G-Format received a mixed reception in the minds of Ambisonic supporters, at least as defined by subscribers to the "sursound" Internet mailing list. Some complained that G-Format was a poor compromise and would never surpass the benefits of UHJ or B-Format, despite the need for a decoder at home with these techniques. It was pointed out by supporters, however, that if designed correctly, G-Format could be used to recover B-format to drive a "traditional" Ambisonic decoder if desired.
"Reversible" G-Format removes filters that are present in conventional decoders and slightly changes the content of the speaker feeds. The difference is inaudible to the 5.1 listener, but the difference makes it possible for the concerned Ambisonic purist to recover the original B-format signal for local decoding. This provision satisfied most commentators: the vast majority of listeners would hear superior 5.1 surround with no additional decoder, while the purists among the Ambisonic elite could use a decoder to recover B-Format and/or drive other kinds of speaker arrays.
In what ways is G-format actually superior to conventional 5.1 surround? To answer that question we need to consider briefly the shortcomings of conventional approaches. Conventional surround (and stereo!) recordings localize sound sources simply by means of level. To place something mid-way between front and rear left, for example, you simply apply equal levels to these two loudspeakers. This is not very satisfactory, however, for several reasons. Human hearing relies on different combinations of level, phase and arrival time -- as well as other factors -- to localize sound sources, and not solely on level. Loudspeakers in a conventional stereo pair are generally placed 60 degrees apart with respect to the listener. In this configuration, the ears can hear sound coming from both speakers and differences in level are interpreted as phase differences and provide localization information. This effect begins to fail as the speakers are moved further apart, and by the time you get to a 90 degree front stage -- as was found in quad setups and is still recommended in some quarters today -- a significant "hole in the middle" has developed (it's one reason for the center front channel in 5.1).
Level-only localization is poor at best in the rear, and virtually nonexistent at the sides. As a result, conventional surround tends to suffer from poor inter-speaker imaging. Sounds are sucked into the speakers and it is hard to get them to appear from anywhere else -- as a result, some engineers and producers have taken to deliberately placing sounds only in the loudspeakers: a simple and effective way out, but one that is rather limiting in creative terms. Another problem is the fact that as only level is involved, moving about within the listening environment changes the positions of sounds in the replayed image. Move to the left and you hear more of the left speakers, so the balance appears to move to the left. Move to the rear and the image moves backwards. This is not satisfactory: not only is the image unstable, it is only "right" in a very small spot in the center of the array -- the "sweet spot".
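The level-only placement described above can be sketched as a simple pan between two adjacent loudspeakers. This assumes a constant-power (sine/cosine) pan law, which is one common choice rather than anything specified in this article, and the function name is hypothetical.

```python
import math

def pairwise_pan(position):
    """Level-only pan between two adjacent loudspeakers.

    position 0.0 puts the sound in the first speaker, 1.0 in the second.
    Returns (gain_first, gain_second) under an assumed constant-power law.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

# Mid-way between, say, front left and rear left: equal level in both.
gain_front, gain_rear = pairwise_pan(0.5)
```

Nothing but the two gains changes as the sound is panned, so a listener who moves closer to one loudspeaker hears the balance shift with them, which is the instability the text describes.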
Ambisonics, on the other hand, relies on phase as well as level and other cues to provide localization information. As a result, it is less sensitive to listener position. In fact, a favorite sales technique for Ambisonics is to set up the system driving near-field monitors in a studio control room and ask the client to come into the room during replay of an Ambisonic recording. A remarkably stable image inside the speaker array is generally experienced from outside the array, much to the client's delight. And the image changes little as you enter the replay area and sit down (unless, of course, you go right up to a loudspeaker). Contrary to belief in some circles -- largely on the part of people who think they know the equations on which Ambisonics is based, and therefore know what it would sound like if they ever listened to it properly -- the "sweet spot" for Ambisonics is very large.
Ambisonics also allows sounds to be positioned anywhere inside the listening area. A sound panned around the room at constant speed a constant distance from the listener should stay that way and sound the same all round -- it shouldn't jump from one speaker to the next, changing its timbre as it goes, and moving closer to and further from the listener, which is all too often the case with conventional 5.1. In Ambisonics, it behaves impeccably.
In essence, G-format moves the decoder out of the home and into the studio. The big advantage is that no decoder is needed at home -- in fact no special gear is needed whatsoever, above that required for regular 5.1 listening. All the processing occurs at the production end of the chain, and one can thus reasonably assume high-quality gear and a correct setup. As a result, any transmission medium which has "discrete" 5.1 capability can be used to carry a G-format signal. G-format does not have to wait for DVD-Audio: it can be used today, on a standard DVD-Video disc using Dolby Digital AC-3; on a movie soundtrack; or a DTS or MLP multi-channel Compact Disc. In the future, it will be available for DVD-Audio applications (where a flag in the data stream may be used to switch optional end-user Ambisonic decoders in and out) or for the Sony/Philips Super-Audio CD. None of the available audio compression and transmission schemes -- even lossy ones -- seem to have any deleterious effect on the surround quality of G-format Ambisonics.
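The idea of decoding in the studio and shipping loudspeaker feeds, while leaving B-Format recoverable at the far end, can be sketched for the simplest possible case. The example assumes a regular square of four loudspeakers and a plain frequency-independent decode with no shelf filters; it is not a "Vienna" decoder and does not model the irregular 5.1 layout that real G-Format targets. The speaker angles and function names are hypothetical.

```python
import math

# Hypothetical regular square array (degrees, counter-clockwise from front).
SPEAKER_AZIMUTHS_DEG = (45.0, 135.0, 225.0, 315.0)

def decode_to_feeds(w, x, y, azimuths_deg=SPEAKER_AZIMUTHS_DEG):
    """Project horizontal B-Format onto each loudspeaker direction."""
    n = len(azimuths_deg)
    feeds = []
    for deg in azimuths_deg:
        a = math.radians(deg)
        feed = (math.sqrt(2.0) * w + 2.0 * (x * math.cos(a) + y * math.sin(a))) / n
        feeds.append(feed)
    return feeds

def recover_b_format(feeds, azimuths_deg=SPEAKER_AZIMUTHS_DEG):
    """Invert decode_to_feeds; valid only for a regular array of 3+ speakers."""
    w = sum(feeds) / math.sqrt(2.0)
    x = sum(f * math.cos(math.radians(d)) for f, d in zip(feeds, azimuths_deg))
    y = sum(f * math.sin(math.radians(d)) for f, d in zip(feeds, azimuths_deg))
    return w, x, y

# A unit source 30 degrees to the left of front, encoded as W, X, Y.
az = math.radians(30.0)
original = (1.0 / math.sqrt(2.0), math.cos(az), math.sin(az))
speaker_feeds = decode_to_feeds(*original)
recovered = recover_b_format(speaker_feeds)
```

Because the decode here is a fixed matrix, the feeds can be played directly or turned back into W, X and Y for decoding to some other array, which is the property the "reversible" form of G-Format is meant to preserve.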
One proviso is the way in which stereo compatibility is handled. On an AC-3 DVD, for example, coefficients in the disc header specify how the surround information should "fold down" to stereo. Although it would be possible to devise a method of doing this with G-format, this is not the best solution. Luckily, a better answer already exists: on a standard DVD-Video disc, you can optionally provide the stereo version as a separate 2-channel linear PCM soundtrack. The present author has proposed that in fact the LPCM tracks could contain a 2-channel UHJ mix of the surround material, and calls the overall result "G+2". The presence of 2-channel UHJ in the stereo tracks of a DVD-Video disc, the Red Book layer of a Super-Audio CD, or the equivalent area of a DVD-Audio disc would give the listener additional options including being able to use an existing Ambisonic decoder for surround in the absence of a 5.1 replay system.
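The coefficient-based fold-down mentioned above can be sketched as follows. It assumes the usual left-only/right-only form in which the center and surround channels are scaled and added to the front pair, with the LFE left out; the default mix levels of about 0.707 are one of the options such schemes allow, chosen here purely for illustration, and the function name is hypothetical.

```python
import math

def fold_down_to_stereo(left, right, center, left_surround, right_surround,
                        center_mix=math.sqrt(0.5), surround_mix=math.sqrt(0.5)):
    """Fold five main channels down to stereo using two mix coefficients.

    The LFE channel is deliberately omitted (an assumption of this sketch).
    """
    left_out = left + center_mix * center + surround_mix * left_surround
    right_out = right + center_mix * center + surround_mix * right_surround
    return left_out, right_out

# A center-only signal lands equally in both outputs, about 3 dB down.
center_only = fold_down_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0)
left_only = fold_down_to_stereo(1.0, 0.0, 0.0, 0.0, 0.0)
```

A fold-down like this treats each channel as an independent feed, whereas G-format feeds are interrelated, which is one way of seeing why a separately prepared stereo track is the neater solution.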
One question that has to be addressed -- by the entire surround industry, not simply those working to introduce G-Format -- is that of where the loudspeakers are going to be for surround music mixing. Whatever the surround system in use, failure to bear the end-user's "target" speaker positions and configuration in mind may result in the listener at home hearing something significantly different to that intended in the studio.
Film sound has developed its own surround monitoring techniques, and they do not necessarily sit well with the requirements of a home audio system. THX, for example, recommends the use of front speakers at 45 degrees width -- significantly less than normal stereo and much less than quad -- to ensure that mixers are not too close to the screen. However this tends to make the listening area much longer than it is wide, which exacerbates the problems of side imaging. Various rear (surround) speaker arrays have been proposed, including dipolar speakers (useful in the days of Pro Logic when there was only one surround channel, but arguably of significantly less value if you want to localize sounds around the rear rather than simply imbue the proceedings with a warm ambient feeling) and multiple rear loudspeakers.
In determining the optimum layouts for recording environments, and making recommendations to end-users, it must be remembered that most home listeners (and music recording studios!) will have their systems set up to listen to stereo music as well as surround: 60 degree front speaker spacing is likely and most home listeners will only have one speaker per channel. If studio monitoring systems are more complex, production equipment for any kind of surround will need to make allowances for this likely fact.
In any case, the likely home speaker configuration and the studio monitoring configuration should correspond, via the use of compensation networks if the studio setup involves more speakers than the likely home configuration.
There is also the question of the Low Frequency Effects (LFE) channel. Many surround aficionados, especially in the Ambisonics field, are in favor of using the LFE to carry height information to an overhead speaker. (Height has been included in Ambisonics from the beginning, but is only now beginning to catch on in other surround environments.) However, some surround encoding schemes bandwidth-limit the LFE making it impossible to use for this purpose. Other techniques are possible, but they would all require more than a simple crossover to separate LF effects and height in the home. This should not be a problem with DVD-Audio, where an MLP data stream can have up to 64 channels and height information can take a channel all to itself.
We can envisage that G-format will find applications in all kinds of multi-channel production environments including film sound as well as music-only productions. A typical DVD-Audio disc, for example, might contain a G-format version encoded with Dolby Digital (AC-3) to make it playable on a regular DVD-Video player; a more sophisticated version with height in the MLP data stream; and a stereo mix that could optionally utilize 2-channel UHJ. Second-generation DVD-Audio players with a FireWire digital output would pass the MLP data stream to an external decoder if desired, which could even handle B-format decoding. A Super-Audio CD could include a G-format 5.1 mix and a Red Book-compatible 2-channel UHJ version. In addition, G-Format modes can be developed to handle higher numbers of transmission channels, such as 7.1 for SDDS and similar systems.
G-Format production equipment will offer a number of advanced surround processing and creation functions. Present designs in prototype form consist of powerful DSP systems configured with a serial port allowing them to be operated from a computer. A number of programs for different surround applications are stored on the PC: they are downloaded to the DSP and controlled via graphic user interfaces on the computer. At present, the applications being tested include:
Many of these functions parallel those of the original analog Ambisonic production systems released in the Eighties by Audio & Design in the UK, although current designs are entirely digital. B-format tools are included as B-format is still the most economical method of storing surround information, using as it does only four channels to store an entire three-dimensional soundfield, including height. G-format tools will also ultimately handle height information.
Although current prototypes are stand-alone boxes, it is likely that at least one version of the G-format production system will be in the form of a plug-in card for a popular multi-channel digital system, which may also include such features as monitor array control functions. In addition, several groups are working on software packages to offer the same kind of facilities, including plug-in suites for systems such as Pro Tools.
Tools like these will enable conventional consoles and DAWs to generate sophisticated 5.1 surround mixes which would noticeably outperform existing practice. Purely on the basis of performance, engineers and producers are likely to prefer G-format production equipment simply because it sounds better and provides a higher quality of surround experience. Many may neither know nor care that they are using the latest incarnation of the surround sound system that has been in the longest continuous use for music and has been employed in recording of a large and continually expanding body of work: Ambisonics.