DVD-Audio and Ambisonics

Also available: an updated version of this article
(minus most of the Ambisonics content)
published under the title DVD Audio Reality
in the April 1999 edition of AudioMedia US (PDF, 480K)

DVD, Surround and Ambisonics

by Richard Elen, October 1998

Contents

Surround for DVD-Video
Stereo for DVD-Video
Ambisonics for DVD-Video

DVD-Video audio possibilities
DVD-Audio possibilities

MLP - the heart of DVD-Audio
The challenge of height
Speaker placement
Bass management

The advent of the Digital Versatile Disc (DVD) means that for the first time a digital multi-channel audio distribution system is available for the transmission of surround-sound material. Needless to say, Ambisonics has a role to play. In this article, Richard Elen examines the audio capabilities of DVD-Video and DVD-Audio systems and considers how to make best use of them.

Surround for DVD-Video

The primary audio encoding scheme for DVD-Video discs is Dolby Digital (AC-3), a perceptual coding ("lossy") system that allows for five full-bandwidth channels and one bandwidth-limited Low-Frequency Effects (LFE) or "sub-woofer" channel. While the latter is useful for crashing asteroids, dinosaur footfalls and other movie effects, it may not have an application in musical material. The system of five full-bandwidth channels plus LFE is generally referred to as a "5.1" system. A common master tape track order currently in use is left front, right front, left surround, right surround, center front, sub/LFE.

An alternative DVD-Video option is DTS encoding, another perceptual scheme, providing up to six full-bandwidth channels (although the sixth may be limited to low frequencies only by the decoder). DTS encoding can also be used on special CDs which can be played on a regular CD player, but require a DTS decoder to extract the surround information.

In Europe, players may use MPEG-2 Multi-Channel Audio for encoding up to six channels. This is another lossy system.

All DVD-Video players (in NTSC territories at least) will replay AC-3, and virtually all recent players will also handle DTS - especially now, as the latest decoder chips will handle both.

The digital signals used for surround audio on DVD-Video are generally 20-bit with 48 kHz sampling, although 24-bit, 96 kHz is also becoming more common.

Stereo for DVD-Video

In addition to the surround encoding, DVD-Video discs must have a means of outputting stereo for those who do not have surround replay capabilities, or for high-quality stereo audio. This can either be done by "folding down" or "downmixing" the 5.1 mix to stereo, determining the proportions of each channel that appear in the stereo output by means of coefficients in the disc header; or by including a linear PCM stereo track on the disc. Such an LPCM track may carry data sampled at up to 24 bits at 96 kHz.
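As a rough illustration of coefficient-based fold-down, the sketch below mixes one frame of six channel samples to stereo. The channel names, the `fold_down` function and the coefficient values are all illustrative assumptions; they are not taken from any disc header or specification.

```python
# One (left, right) coefficient pair per source channel.
# All values here are illustrative assumptions, not taken from any spec.
DOWNMIX = {
    "L":   (1.0,   0.0),
    "R":   (0.0,   1.0),
    "C":   (0.707, 0.707),
    "Ls":  (0.707, 0.0),
    "Rs":  (0.0,   0.707),
    "LFE": (0.0,   0.0),   # left out of this example fold-down (assumption)
}

def fold_down(frame, coeffs=DOWNMIX):
    """Mix one frame (dict of channel name -> sample) down to (left, right)."""
    left = sum(coeffs[ch][0] * sample for ch, sample in frame.items())
    right = sum(coeffs[ch][1] * sample for ch, sample in frame.items())
    return left, right
```

A producer unhappy with the result of such a fixed matrix can only change the coefficients, which is one reason a separately mixed stereo track may be preferred.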

Many music producers are not happy with the characteristics of an automatically-downmixed stereo program and as a result prefer to include a PCM stereo mix on music-based DVD-Video discs.

Ambisonics for DVD-Video

Ambisonics in the past was limited by requiring the listener to have a special decoder (in common with all the 2-channel surround systems) to extract the surround information. With a 5.1 path, however, this is no longer necessary. An Ambisonic recording can be decoded in the studio to speakers in standard 5.1 listening positions, the resulting decoded speaker feeds being placed on the disc. This is referred to as "G-Format". The decoding scheme is designed to be completely reversible, so that the listener at home, with an optional coder, can convert the G-format to B-format and then use a conventional Ambisonic decoder for listening, where speaker configurations differing from 5.1 layouts may be employed.

Traditional Ambisonic decoders required a symmetrical array of speakers, which is not the case with a 5.1 replay system. As a result the studio decoder should be based on the new technology introduced in the papers by Michael Gerzon and Geoff Barton at the Vienna AES in 1992. Such decoders, which can deal with irregular speaker arrays, are thus generally referred to as "Vienna" decoders.
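To show why a symmetrical array makes decoding simple, here is a minimal sketch of a basic horizontal decode to a regular speaker layout. It is not a Vienna decoder and not any particular commercial design: the function names, the gain scaling and the angle convention (degrees anticlockwise from front) are assumptions chosen for illustration, and real decoders add refinements such as shelf filters.

```python
import math

def encode_b_format(signal, azimuth_deg):
    """Encode a mono sample into horizontal B-format (W, X, Y)."""
    theta = math.radians(azimuth_deg)
    w = signal / math.sqrt(2.0)   # W carries the signal at -3 dB by convention
    x = signal * math.cos(theta)
    y = signal * math.sin(theta)
    return w, x, y

def decode_regular(w, x, y, speaker_azimuths_deg):
    """Basic decode for speakers spaced evenly around the listener.

    Every speaker uses the same formula with its own angle, which
    only works well because the array is assumed to be regular.
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        phi = math.radians(az)
        feed = (math.sqrt(2.0) * w + x * math.cos(phi) + y * math.sin(phi)) / n
        feeds.append(feed)
    return feeds
```

With a square array at 45, 135, 225 and 315 degrees, a source encoded at 45 degrees feeds its nearest speaker most strongly and the opposite speaker not at all. An irregular 5.1 layout breaks the assumption of equal spacing, which is where the Vienna approach comes in.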

A G-format signal can be placed on a DVD by any of the means listed above. It is not recommended to "fold down" the G-format signal for stereo output. Instead, a stereo LPCM track should be provided. In addition, the author has suggested that this stereo track could in fact be a 2-channel UHJ Ambisonic mix, derived from the original B-format recording at the same time as the G-format was generated. A single 8-channel master tape could carry time-coincident G-format and UHJ mixes, placing UHJ left and right on tracks 7 and 8 respectively, in addition to the 5.1 track layout. This could be mastered to produce a "G+2" disc.

DVD-Video audio possibilities

Interestingly, DVD-Video allows for four channels of PCM at 96 kHz sampling and eight channels at 48 kHz, the maximum bit-rate being 6.144 Mbps. Thus it would be possible to carry a 6-channel G-format Ambisonic mix on DVD-Video and allow an additional channel for height information (see later), or to provide a 6-channel surround mix plus a separate stereo or 2-channel UHJ mix. However, although these potentials exist in the DVD-Video spec, players would have to exist to replay the material, and this may not be the case.

Note also that while you can do four channels of 96kHz sampling, you can only do it if you are happy with 16-bit word lengths. (Only stereo at 24/96 is practical.) The whole scheme goes like this:

Theoretical PCM Options for DVD-Video
16/48 - up to 8 channels
20/48 - up to 6 channels
24/48 - up to 5 channels
16/96 - up to 4 channels
20/96 - up to 3 channels
24/96 - up to 2 channels
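The figures in this list follow from dividing the 6.144 Mbps ceiling by the raw bit-rate of one channel. The sketch below reproduces that arithmetic; the function name and the eight-channel cap parameter are assumptions for illustration, and any packing or header overhead is ignored.

```python
MAX_PCM_BITRATE = 6_144_000  # bits per second, the DVD-Video PCM ceiling quoted above

def max_channels(bits, sample_rate_hz, limit=MAX_PCM_BITRATE, cap=8):
    """Whole channels that fit under the limit, capped at eight."""
    per_channel = bits * sample_rate_hz   # raw bits per second for one channel
    return min(cap, limit // per_channel)

for rate in (48_000, 96_000):
    for bits in (16, 20, 24):
        print(f"{bits}/{rate // 1000} - up to {max_channels(bits, rate)} channels")
```

For example, one channel at 20/48 needs 960 kbps, so 6.4 channels would fit in theory and six in practice.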

Dolby Digital on DVD-Video may deliver a maximum bit-rate of 448 kbps, with six channels sampled at 48 kHz. The other alternatives are MPEG-1 - two channels of 48 kHz sampling for a bit-rate of 384 kbps - and MPEG-2 multichannel - eight channels of 48 kHz for a bit-rate of up to 912 kbps.

DVD-Audio possibilities

What we hope will be the final specification (1.0) for DVD-Audio has now been published. The maximum bit-rate is 9.6Mbps, and the intention is that a disc should be capable of delivering 74 minutes of material (LPCM or lossless compression). Players will support a hybrid CD/DVD-A disc, meaning that if you want to do a disc that will play on both a regular CD player and a DVD-Audio player (like a Super-Audio CD), you can. In addition, DVD-A includes the following capabilities:

Scalable linear PCM multi-channel audio

48/96kHz, 44.1/88.2 kHz and 16/20/24 bits; six channels maximum.

Super high quality linear PCM audio

192/176.4 kHz and 16/20/24bits. Two channels maximum.

System Managed Audio Resource Technique (SMART)

Downmixing for stereo presentation of multi-channel contents in 2-channel form. Each of the six channels may be mixed down to stereo by means of coefficients which can be set on a track-by-track basis. Coefficients determine level, panning and polarity.

Producer's choices

The DVD-Audio specification allows the content producer to determine a number of factors, such as:

There must be at least two channels of LPCM on a disc, so that the earliest DVD-Audio players will play any future disc and vice-versa. However, it seems unlikely that anyone with space available on the disc will not include an AC-3 (or possibly DTS) version of the material, as this will make the disc playable on a DVD-Video player. Dolby, who are the major player when it comes to perceptual coding (and are also the licensor of MLP - smart move there, Bob) of course encourage this.

However it would appear that while you can use MLP or LPCM, you must use one or the other. This makes a perceptually-coded audio track an option, and not mandatory - if you want to include a perceptually-coded track, then you must also use MLP as your primary encoding scheme (presumably there wouldn't be room for a perceptual track if you used LPCM for your primary program).

This seems sensible to most people except DTS, who are miffed that they get equal billing with AC-3, but not with MLP (remember that lossless compression was part of the music industry's International Steering Committee's requirements for a high quality audio disc, so it has to be mandatory).

As a result, DTS are suing Working Group 4, which may prove to be a disastrous strategic mistake. DTS also claim that their perceptual (lossy) coding scheme sounds better than PCM (or lossless compression, which amounts to the same thing). Ask yourself whether it is true that if you take something away from a signal, you are left with the same as if you took nothing away - I don't think so.

MLP - the heart of DVD-Audio

The real breakthrough offered by DVD-Audio is in the form of MLP - "Meridian Lossless Packing". Like ZIP or Stuffit on your computer, MLP takes a PCM data stream and "packs" it at one end of the chain, "unpacking" it at the other to provide a completely accurate replica of the original - this includes accuracy of sample-timing, too, by the way. The technique was chosen in a shoot-out between four competing lossless systems. Initially, the focus fell on the encoding side of the equation. Here, MLP is fairly comprehensive and it requires a respectable amount of processing power. Other systems required less in this area. However, on the decoding side, MLP is much simpler to process. This is because most systems treat the unpacking and the extraction of the PCM data as two separate operations - first you do one, then the other. MLP, on the other hand, treats both as one operation. The result: a simple-to-decode system that requires very little expense in the player - where it matters. And indeed, even the encode side is not too dreadful. If you have an authoring suite with an AC-3 encoder based around Dolby's Onyx chip-set, you'll be able to run the MLP encoder on it (I assume with a hefty memory upgrade, though).

MLP was developed by Bob Stuart of Meridian, Peter Craven, and the late Michael Gerzon. Bob is the chair of the Acoustic Renaissance in Audio group, who have been proposing a very sensible set of ideas for DVD-Audio for some time - check them out. Gerzon and Craven - who often co-operated on digital audio projects - were behind the ARA on the technical side from the beginning. For a fascinating interview with Bob Stuart and a sidebar on the license deal with Dolby, see Philip DeLancie's article in the December 1998 issue of Mix magazine. It's not on their Web site at present - lobby them!

With Michael Gerzon's involvement, you might expect that MLP would not be unfriendly to Ambisonics, and this is thankfully the case. In addition, Meridian have been including several Ambisonics-friendly features in their products for some time, including UHJ and Trifield decoding (the latter derives an L-C-R signal from stereo, for tighter imaging, and was developed by Gerzon and Barton. Trifield Productions is the name of Dr Barton's London-based company, now working on G-Format production equipment).

A preliminary data sheet on MLP is available (in PDF form and as a web document), and the following notes are taken from that document and from other sources.

MLP was initially presented to the public by Bob Stuart and Meridian at the Hi-Fi 98 show in Los Angeles earlier this year, and in a very interesting way. Stuart replayed some very unusual and impressive CDs with the system, including some encoded with MLP in horizontal B-Format (WXY). The official DVD-Audio spec discusses up to eight audio channels: however the data sheet lists "up to 64..." with "flags for speaker feed identification, [and] flags for hierarchical feeds (eg M&S, Ambisonic B-format and others)." The term "others" almost certainly includes G-Format.

The data sheet also suggests several applications besides DVD-Audio, including "3 or 4-channels on CD; 2-channels 20/24-bit on CD; [and] 88.2kHz 2-channel on CD".

Some of the qualitative features of the system are also given:

Exactly how the flags for hierarchical encoding are implemented is not given. One hopes that they are more like MIDI IDs and less like DVD-Video flags. The latter are essentially individual bits that are set on or off to indicate a function. You would run out of these fairly quickly. A MIDI ID-like flag system, however, would allocate an ID code bit-pattern to each registered system. If you came up with a new surround scheme and you wanted decoders to be aware of it in the future, you would ask for an ID to be allocated by the equivalent of the MIDI Manufacturers' Association. We might find this useful, for example, in incorporating height information into surround recordings.
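The difference in headroom between the two approaches is easy to quantify. The sketch below compares them for a hypothetical 8-bit field; the field width, the registry contents and the function names are invented for illustration and do not describe how MLP or DVD-Video actually allocates flags.

```python
FLAG_FIELD_BITS = 8  # hypothetical field width, chosen only for illustration

def bit_flag_capacity(bits=FLAG_FIELD_BITS):
    """One on/off bit per function: the field holds as many functions as bits."""
    return bits

def id_code_capacity(bits=FLAG_FIELD_BITS):
    """The whole field is one ID code: every bit pattern can name a system."""
    return 2 ** bits

# Hypothetical registry of allocated ID codes (assignments are invented).
REGISTRY = {
    0x01: "M&S",
    0x02: "Ambisonic B-format",
    0x03: "G-Format",
}

def describe(id_code):
    """Look up a registered scheme, as a decoder might on reading the field."""
    return REGISTRY.get(id_code, "unregistered")
```

With eight bits, individual flags cover eight functions while ID codes cover 256 registered systems. The trade-off is that individual bits can be combined freely, whereas an ID code names one scheme at a time.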

The challenge of height

Height has been part of the Ambisonic portfolio of capabilities since the beginning. The earliest Soundfield microphones were able to capture not only the horizontal surround of W, X and Y, but also the "periphonic" attribute of height, Z. Only recently, however, has height attracted the attention of the wider surround-sound field - and much to people's astonishment, they've found it makes nearly as much difference as rear speakers! The challenge is how to achieve it, as today's cinema-derived 5.1 systems allow for thundering bass effects but not for height. In musical applications, the emphasis needs to be the other way around.

A traditional Ambisonic decoder allowed for different speaker arrangements in order to replay height information. Typically, two crossed rectangles have been employed, as this makes it possible to retain speakers in the conventional stereo and horizontal-surround listening positions. Eight speakers at the corners of a cuboid also work, while the simplest arrangement of six speakers, one in the center of each face of a cuboid, is compatible with nearly nothing!

Of course, if MLP is being used to transmit a complete B-Format recording including height information on DVD-A, a periphonic decoder at the listening end will do fine.

However, we also have to consider 5.1 systems and G-Format. Although G-Format is reversible to allow the recovery of horizontal B-format, it does not include the provision of height information. The fundamental principle of G-Format is to allow the transmission of Ambisonic material via 5.1 systems in such a way as to avoid the need for a decoder at the listening end (although reversibility makes it an available option). Although it is certainly possible to envisage a G-Format studio decoder on the Vienna model which would generate "G-with-height", the primary consideration so far has been to provide height information to drive a simple overhead speaker. In either case a decoder of some sort, however primitive, will be required at the listening end, and attention has to be given to how to encode the height "channel" into the existing 5.1 stream.

Height in the LFE channel

One method is to add height information to the LFE channel. This is simple and obvious, and the only "decoding" required would be a high-pass filter (operating at speaker level if you wish) to drive the overhead speaker (all deep bass information, including such sounds from above, would issue from the sub-woofer: low bass sounds are - incorrectly I believe - regarded as not capable of localization).
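The text has in mind a simple filter, possibly a passive one at speaker level. Purely to illustrate the idea, the sketch below is a digital first-order high-pass that would strip deep bass from a combined LFE-plus-height channel before it reaches the overhead speaker. The function name, and the 120 Hz cutoff used in the note after it, are assumptions and not values from any specification.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) high-pass filter over a list of samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Pass changes in the input, let steady or slow content decay away.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

Called as `high_pass(channel, 120.0, 48_000)`, this leaves mid and high frequencies almost untouched while attenuating deep bass, which stays with the sub-woofer. A first-order slope is gentle, so a practical design would probably want something steeper.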

The only problem with this method is the fact that some systems do not permit the LFE channel to carry full bandwidth (eg AC-3). In this case, evidently, height in the LFE channel simply won't work. In the case of DTS, the sixth channel can be full bandwidth, but many decoders roll it off. It may therefore be the case that G-format recordings for encoding with the two most common perceptual coding methods simply will not be able to include height.

This is not the case with DVD-A, with up to six full-bandwidth LPCM channels, or MLP, where there are at least six full-bandwidth channels available. You could either use the LFE-encoding technique or (with MLP) simply allocate another channel to height only. As it remains to be seen how many MLP decoders will actually allow you to handle more than six channels, the former technique is safer.
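To see which six-channel LPCM formats fit within the 9.6 Mbps ceiling quoted earlier for DVD-Audio, the raw arithmetic can be sketched as follows. The function names are assumptions for illustration, and any stream overhead is ignored.

```python
DVD_A_MAX_BITRATE = 9_600_000  # bits per second, as quoted earlier for DVD-Audio

def lpcm_bitrate(channels, bits, sample_rate_hz):
    """Raw bit-rate of an unpacked LPCM stream."""
    return channels * bits * sample_rate_hz

def fits_unpacked(channels, bits, sample_rate_hz, limit=DVD_A_MAX_BITRATE):
    """True if the stream fits under the limit without lossless packing."""
    return lpcm_bitrate(channels, bits, sample_rate_hz) <= limit

for bits, rate in ((24, 48_000), (16, 96_000), (20, 96_000), (24, 96_000)):
    mbps = lpcm_bitrate(6, bits, rate) / 1_000_000
    verdict = "fits" if fits_unpacked(6, bits, rate) else "exceeds the limit"
    print(f"6 channels at {bits}/{rate // 1000}: {mbps:.3f} Mbps, {verdict}")
```

On these figures six channels fit unpacked at 24/48 and 16/96, but not at 20/96 or 24/96, so the higher-resolution six-channel formats would presumably rely on lossless packing to get under the ceiling.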

Height in the main channels

Another method of adding height information would work even if the LFE is bandwidth-limited. This involves encoding the height data into the other channels with some variant of a sum-and-difference technique. Technically, this is little, if any, more complex than height in the LFE, nor significantly more difficult to implement in terms of components. However it is conceptually and operationally more inconvenient: the user would have to plug a box between the existing player or decoder outputs and the amp feeds (or perhaps on the amp outputs), and ensure that two sets of 5.1 connections were wired up correctly. At least we can imagine that people with an interest in recovering height would also presumably have some knowledge about setting up a system.

Other height options

Another option would be to use the center front channel for height, and have a virtual center as in standard stereo listening. This is of course the easiest of all to do, but it would be necessary to be careful setting up a decoder to ensure that only the desired signal went there. Other possibilities also exist for incorporating height into 5.1 - please feel free to suggest some.

Speaker placement

A subject that seems to have gone unnoticed in the world of 5.1 is the question of where the speakers are. This is an obvious consideration when it comes to G-Format - where we are decoding Ambisonics, and of course every Ambisonic decoder wants to know where the speakers are! - but it is also important in ordinary 5.1 systems as well. And the problem is, there is no common standard.

The main difference is that between film and music environments. In the film world, front left and right speakers are generally 45 degrees apart, whereas for music they are more likely to be 60 degrees apart - as the studio will probably make stereo recordings too! Such front speaker separation is also a reasonable assumption for the home listening environment, which will be even less likely to see the shifting of speakers depending on the source chosen for listening.

Surround speakers are also different. In the film world - either larger mix rooms or in movie theaters - the tendency is to use an array of surround speakers, often down the sides of the listening environment and not at the rear at all - certainly very wide apart. In smaller mix environments, there may only be one surround speaker per channel, but very likely it will be a bipolar design, good for creating a warm, friendly surround ambience but not very helpful for clear rear localization. Luckily this latter tendency is less apparent in modern room designs, and also luckily, it has been shown that bipolar surround speakers don't really adversely affect surround localization after all.

In the music environment we tend to find more rectangular speaker arrangements with one direct-radiating enclosure per channel, and with surround speakers towards the rear, possibly in memory of quad. However these days we do also see wider rear angles than front, often around twice the front 60-degree separation.

Luckily, such arrangements are less problematical for modern Ambisonic decoders (such as the Vienna designs used for G-Format) than they would have been earlier. And even earlier decoder designs can compensate for likely listening environments thanks to the judicious application of delays (Meridian use this technique, for example). In fact it may be that non-symmetrical speaker layouts pose more of a problem to level-only 5.1 than they do for Ambisonic G-Format!

All this means that there is probably a higher degree of congruence between music studio control rooms and home listening environments than between film mixing rooms/movie theaters and the home environment. G-Format studio decoders allow for different intended destination speaker positions, even if it is as simple as a selectable "music/film" matrix selection. G-Format production equipment will also be capable of converting between the two - something that should be considered when preparing a movie for home video release in surround! Some G-Format gear will also allow the production of one kind of material in a room designed for the other, for example by including a THX matrix controller in the monitor path.

Less of a problem exists for the music producer in that the "music matrix" on a G-Format generation system will likely approximate to the home environment as well as the studio. There is a need, however, to define a reasonable standard for home listening speaker positions that will take into consideration the need to listen to stereo music recordings with the front speakers at 60 degrees. Such a standard would also apply to the music studio, and film sound content producers would be advised to transcode movie soundtracks for video release to conform with such a standard.

Bass management

Another source of continuing confusion is the purpose of the LFE channel in music production. It was designed, of course, for Low Frequency Effects - it says so! That means asteroids crashing into the Earth and the sound of T-Rex footfalls. It probably does not mean much in the context of music production. Even the low organ note at the end of Britten's Antiphon does not need the LFE, although it may well use the sub-woofer in a replay system that has one. Some engineers have used the LFE as a distinct alternative position for bass sources. This is probably not a good idea. All this means that we need to take a look at what happens to the bass end - where it comes from, and where it goes.

One problem is that there is bass handling at both the creation and playback sides of the equation. Reasonable parameters for the mixing environment - and the home listening environment - include:

The idea of bass management is simply to ensure that we hear all the bass that is present in the program material. The LFE is an additional source of bass-end information - we shouldn't think of it as the "subwoofer channel" - and the main channels also carry bass.

Evidently, stereo recordings can have deep bass without the need for an LFE channel. When considering the LFE channel, it's important to remember that bass sources usually have harmonics or overtones, and even if bass itself is not capable of serious localization (which I doubt), the overtones from, say, a bass guitar certainly are, and it would be helpful if everything came from the same place. Phase coherence is important, too.

LFE information can be generated from the main channels (but they will need a bass rolloff to avoid the bass appearing from both places) or it can derive from completely new signals such as effects.

If all the bass is in the main channels, with no LFE, the replay system will determine what signals are fed to a sub if present - which works fine. If the LFE is derived from the main channels and their rolloff frequency is higher than that in the replay system, there is the possibility of some low frequencies between the two crossover points being lost entirely. To avoid this, the LFE crossover frequency must be lower than the speaker crossover.

If the LFE contains new signals, they need to be rolled off and any higher frequencies (such as harmonics) fed back into the main channels.

The important point to remember is that a replay system needs to reproduce bass from both main and LFE channels - whether it includes a subwoofer, or has full-range speakers throughout and no sub at all. In the former case, the main speakers will be rolled off and the bass end from the main channels will be mixed with the LFE information if any, the result being fed to the sub. In the case where all the speakers are full range, the LFE information may be summed into the left and right front pair and other speakers. All sorts of combinations exist in between.
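The two extreme cases described above can be sketched as a simple routing function. This is only an illustration under stated assumptions: each main channel is taken to be already split into a low band and a high band by a crossover, the channel names are arbitrary, and in the no-sub case the LFE is shared equally between the front pair only, which is one of several possible choices.

```python
FRONT = ("L", "R")  # assumed names for the front left/right channels

def route_bass(mains, lfe, has_sub):
    """Return speaker feeds for one sample.

    mains: dict of channel name -> (low_band, high_band) sample values
    lfe:   LFE channel sample value
    """
    feeds = {}
    if has_sub:
        # Main speakers are rolled off: each gets only its high band.
        for ch, (low, high) in mains.items():
            feeds[ch] = high
        # Bass from every main channel joins the LFE signal in the sub.
        feeds["SUB"] = lfe + sum(low for low, _ in mains.values())
    else:
        # Full-range speakers carry their own bass.
        for ch, (low, high) in mains.items():
            feeds[ch] = low + high
        # LFE is shared equally between the front pair (assumed split).
        for ch in FRONT:
            feeds[ch] += lfe / len(FRONT)
    return feeds
```

In both cases the total of the feeds equals the total of the inputs, which is the point of bass management: nothing in the program material should go missing, whatever the speaker complement.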

With this in mind, we may conclude that the best - and simplest - way to handle the LFE is not to use it at all, instead allowing the monitoring system in the studio or in the listener's system to handle bass frequencies: passing them to the sub if the speakers aren't full-range, or relaying them faithfully if they are. It's as simple as that! As a result, of course, we can happily label the sixth channel of a DVD-Audio multichannel stream "height" and not worry about finding another channel for it...
