A Compatible, Single-Mix DVD Format
for Ambisonic Distribution

There is no doubt that an effective format for the distribution of Ambisonic surround material would be 4-channel UHJ or an alternative hierarchical surround encoding scheme that degrades (to use Web design terminology) gracefully. In the case of 4-channel UHJ, for example, remove the Q channel and you lose the height information; remove the T channel and you are left with respectable 2-channel encoded surround; play back L and R without a decoder and you hear super-stereo; and sum L and R for highly-compatible mono. Engineers only need to do one mix as the mix is compatible all the way down the line.
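The way the hierarchy degrades can be sketched in a few lines of Python. This is only an illustration of the paragraph above, not part of any UHJ specification: the function names, the mode labels and the `has_decoder` parameter are hypothetical, and the mono fold-down is taken to be a plain sample-by-sample sum of L and R.

```python
def available_playback(channels, has_decoder):
    """Return the best playback mode for the UHJ channels that survive.

    `channels` is any iterable of the labels "L", "R", "T", "Q".
    The mode names are illustrative labels, not standard terms.
    """
    present = set(channels)
    if not {"L", "R"} <= present:
        raise ValueError("UHJ needs at least the L and R channels")
    if not has_decoder:
        # Undecoded L and R play back as enhanced-width stereo.
        return "super-stereo"
    if {"T", "Q"} <= present:
        return "surround with height"
    if "T" in present:
        return "horizontal surround"
    # Q without T is treated like plain 2-channel UHJ in this sketch.
    return "2-channel encoded surround"


def mono_sum(left, right):
    """Fold L and R down to mono by summing sample by sample."""
    return [l + r for l, r in zip(left, right)]
```

Calling `available_playback(["L", "R", "T"], has_decoder=True)` reports horizontal surround, and dropping `"T"` from the list falls back to 2-channel encoded surround without any change to the mix itself.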

The ARA (Acoustic Renaissance in Audio) group has proposed defining a flag in the DVD-Audio spec to indicate that a hierarchical scheme has been used in the recording. In fact they are thinking of a "G-Format Hierarchy", or GH, based on a more advanced form of B-Format, "B-Format+". It is not yet clear whether this flag will be adopted (they have also proposed a flag indicating lossless compression, which would also be a useful feature). However, material released in this way would not be available to owners of existing DVD Video players, and would only be applicable to future DVD-Audio disc systems - and only then if the flag is not only adopted but also implemented by manufacturers. This may simply not happen: right now, for example, DVD players in North America can utilize MPEG-2 multi-channel audio compression - but nobody has implemented it.

The problem with UHJ as a transmission medium for DVD is that unless you listen in stereo or mono, you need a decoder - unlike 5.1 surround systems. This may put people off. In addition, UHJ does not take advantage of some of the latest developments in Ambisonics (such as those noted in Gerzon and Barton's papers presented to the AES 92nd Convention in Vienna in 1992).

As a result, it has been suggested that an Ambisonic signal could be decoded into 5.1 before being mastered to DVD. Then, when the signal was replayed, no decoder would be required. This "pre-decoded" format is related to the "G-Format Hierarchy" mentioned above, but to avoid confusion we'll call it "GD" (G for "G-Format" and D for "decoded") here to distinguish it from the Ambisonic distribution format proposed for DVD-Audio.

The disadvantages of GD include the facts that you do not get height information out of the system, and that you are obliged to place your speakers in a standard 5.1 arrangement (normal decoded Ambisonics enables you to place any number of speakers anywhere you like). The big advantage, however, is that you make Ambisonics available to anyone with a DVD player of any kind.

In addition, enough information is carried by the GD signal to recover Ambisonic B-Format or B-Format+ (a development of B-Format described in the 1992 papers that provides a number of additional benefits, although it uses more channels) from it, which could then be re-decoded into an Ambisonic speaker array with any number of speakers in any positions. It would even be possible to encode the height information into the GD signal in such a way that it could be extracted as part of the B-Format recovery process, allowing a periphonic decode system that was only a little less efficient than 4-channel UHJ - although the listener in 5.1 without a decoder would not be aware of it.
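To see why recovery is possible in principle, here is a minimal Python sketch under loudly stated assumptions. It uses a naive, frequency-independent first-order decode of horizontal B-Format (W, X, Y) to five speakers at assumed azimuths of 0, +/-30 and +/-110 degrees. This is not the Gerzon and Barton decoder and not a real GD decoder: the gains, the speaker angles and all function names are illustrative only, and the LFE channel and height are ignored. Because the decode is a fixed matrix with three independent columns, the B-Format signals can be recovered from the five feeds by least squares.

```python
import math

# Assumed speaker azimuths in degrees, counter-clockwise positive.
SPEAKER_AZIMUTHS = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}


def decode_matrix():
    """One row of (W, X, Y) gains per speaker: a naive illustrative decode."""
    rows = []
    for azimuth in SPEAKER_AZIMUTHS.values():
        t = math.radians(azimuth)
        rows.append([math.sqrt(2.0), math.cos(t), math.sin(t)])
    return rows


def decode(b_format, matrix):
    """Turn one (W, X, Y) sample into five speaker feeds."""
    return [sum(g * s for g, s in zip(row, b_format)) for row in matrix]


def solve3(a, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    n = 3
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def recover(feeds, matrix):
    """Least-squares recovery of (W, X, Y) via the normal equations."""
    dtd = [[sum(row[i] * row[j] for row in matrix) for j in range(3)]
           for i in range(3)]
    dtf = [sum(row[i] * f for row, f in zip(matrix, feeds)) for i in range(3)]
    return solve3(dtd, dtf)


# Usage: a source at 45 degrees left of centre, W at the usual -3 dB.
source = [1.0 / math.sqrt(2.0), math.cos(math.radians(45.0)),
          math.sin(math.radians(45.0))]
feeds = decode(source, decode_matrix())
recovered = recover(feeds, decode_matrix())
```

In this idealised case `recovered` matches `source` to within rounding error. A practical decoder would be frequency-dependent, and the coding scheme used on the disc would have to preserve the inter-channel relationships for the recovery to hold.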

It may be determined that the Low Frequency Effects (LFE) channel (the "point-one" in a 5.1 system) is of limited usefulness in a music replay environment, but we will assume here that a system for generating GD includes provisions for handling the sixth channel transparently as well.

Because Ambisonics is very much a "relational" rather than a "multiple discrete source" surround system, it is important that the relationships (phase, for example) between the 5.1 channels are maintained throughout the production process, from their generation to their replay. In the absence of lossless compression, a system which definitely would maintain the inter-channel relationships, because it is designed to do so, is MPEG-2 Audio. This has the advantage that it is already a part of the DVD spec: it is the preferred system for 50Hz countries and an option for 60Hz countries. Unfortunately, this option is unlikely to be taken up in 60Hz countries, and even now none of the 100 or so DVD players sold in Europe can handle MPEG-2 either. So we may have to make do with AC-3, although DTS would also be an option. The scheme used to transfer the 5.1 channels from the studio to the listener at home is, in fact, largely immaterial as long as it doesn't damage the signals or their relationships.

An existing DVD Video disc could carry a GD signal in the 5.1 data stream today: it would replay correctly on an existing system with 5.1 capability, and provide the full Ambisonic recovery benefits outlined above. In addition, the default stereo downmix provided to users without surround capability might not be optimum, but it would not be meaningless.

There is still the question of multiple mixes to be addressed, but there is an elegant solution to this problem, which also deals more satisfactorily with the stereo listener.

Along with the multi-channel material for surround listeners, the DVD spec requires that a stereo mix be supplied for the listener without surround capability. The DVD specification allows this to be provided as a two-channel linear pulse-code modulation (LPCM) digital signal, completely separate from the surround material. Exactly this technique is to be used to provide the stereo version for listeners to DTS-encoded DVD discs, where the multichannel DTS-encoded audio is in fact completely inaudible without a DTS decoder.

In an Ambisonic production environment, working, for example in B-Format or "BEF" (B-Format+), we can not only decode the signal to 5.1 speaker feeds for the 5.1 data stream: we can simultaneously (and in sync) generate a 2-channel UHJ feed from the same B-Format mix. The UHJ feed can be carried via the two LPCM channels.
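As a rough illustration of the UHJ side of this, the sketch below encodes horizontal B-Format into the two UHJ channels. The coefficients are the commonly published 2-channel UHJ encoding equations; they do not come from this article and should be checked against a primary reference before use. The encoder needs a 90-degree phase shift (written j) across the whole audio band. To keep the sketch short, each signal is represented here as a complex phasor at a single frequency, so the phase shift becomes multiplication by `1j`. The function name is hypothetical, and W is assumed to carry the conventional -3 dB gain.

```python
def uhj_encode_2ch(w, x, y):
    """Encode B-Format phasors (W, X, Y) to 2-channel UHJ (left, right).

    Inputs are complex phasors at one frequency, so the wideband
    90-degree phase shift of a real encoder is modelled as `1j * ...`.
    Coefficients: commonly published UHJ values (an assumption here).
    """
    s = 0.9396926 * w + 0.1855740 * x                         # sum signal
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y  # difference
    left = (s + d) / 2
    right = (s - d) / 2
    return left, right


# Usage: a centre-front source (Y = 0) and a hard-left source (X = 0).
front_left, front_right = uhj_encode_2ch(2 ** -0.5, 1.0, 0.0)
side_left, side_right = uhj_encode_2ch(2 ** -0.5, 0.0, 1.0)
```

For the centre-front source the two channels have equal magnitude and differ only in phase, while the hard-left source is much louder in the left channel than in the right. In both cases L + R reduces to the sum signal, which is what the mono listener hears.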

This "G+2" system supplies a number of benefits. The Ambisonic GD data is supplied via an existing multichannel compression scheme - and as a result is delivered for standard 5.1 listening by an existing player without the need for additional flags or an Ambisonic decoder. At the same time, the two LPCM channels provide a high-quality "super-stereo" enhanced-width mix for the stereo listener that can also be decoded using an existing Ambisonic 2-channel UHJ decoder to provide respectable surround (though not as good as the GD).

In the studio, only one mix needs to be performed. Typically the mix elements are positioned while listening in Ambisonic surround (or optionally to the final 5.1); then, if desired, the final balance is made while listening to the undecoded 2-channel UHJ. Experience has shown that this produces not only a good stereo/UHJ balance but a good surround balance as well.

The studio equipment required for this type of production would be a standard B-Format Ambisonic production system (such as the Audio + Design/Cepiar units or a more developed digital implementation with similar functions, probably generating B-Format+; or a Soundfield mic or equivalent, or both) to generate a B-Format or B-Format+ mix; a 2-channel UHJ encoder, and a GD 5.1 decoder (the latter might be provided in the same box: this is also the main part of the system that does not currently exist). It might also be possible to create production equipment equivalent to the Transcoder, which would take pairwise mixes for front and rear stages, plus an optional center front channel, and output 2-channel UHJ plus the GD 5.1 feeds.

This equipment is currently in preparation.
