Whatever Happened to Ambisonics?

The following feedback was sent in to S2N, an on-line magazine edited by Paul D. Lehrman at the University of Massachusetts at Lowell, following their re-publication of this article.
First appears a letter from a reader; this is followed by a discussion between the author and noted electronic music composer/performer, Wendy Carlos. Reprinted by permission.

Date: Thu, 09 Oct 1997 08:49 EST
From: Charles Repka, CR Recordings

As someone who has been using a Soundfield mic (an ST-250) for the past few years to make classical music recordings, I read Richard Elen's article on Ambisonics with a great deal of interest. I was also involved with making Quadraphonic mixes using SQ, QS as well as discrete mixes for open reel tape, so I am well aware of the pitfalls involved in making "surround" recordings.

With the pending introduction of a DVD audio format (that's if we can get all the parties to agree, but that's another story) it will be possible to make surround recordings with little or no compromise. I would love to try making some Ambisonic recordings but alas, there are no Ambisonics (B Format) decoders available. How can I find the right placement for the Soundfield mic if I cannot listen to the entire Encode/Decode process?

The placement of a Soundfield mic for an Ambisonic recording is a controversial matter. Some engineers (like John Eargle) state that a proper Surround recording can't be made with a single point mic. That may be true, judging by some of the Nimbus recordings I have heard. But that may just be the result of the choice of the wrong acoustic and incorrect mic placement. Is it possible to combine spot mics with the output of a Soundfield mic and still get acceptable results? I don't know.

Comments anyone?

Richard Elen replies:

Luckily I can point to more than one manufacturer making Ambisonic equipment including converters with B-Format capability. Indeed, surround sound enthusiasts now give the system significantly renewed attention, with some major companies poised to enter the field (notably Waves, who worked closely with Michael Gerzon, one of the chief founders of Ambisonics, in the last years of his life).

Specifically concerning B-Format decoders, you can obtain them from Meridian in the UK; and Cepiar, also in the UK, produce the full line of former Audio & Design Ambisonic mixing equipment (including the decoder) plus some additional decoder options. Companies such as Lake DSP produce software products which can handle and manipulate B-format.

The Surround Sound mailing list provides a useful source of information on Ambisonics in general. To subscribe to the list, send an e-mail message to majordomo@lists.uoregon.edu with the following command in the body of your email message: subscribe sursound yourname [youremail@host.domain].

You'll receive a confirmatory message with instructions.

Several sources of Ambisonic information exist on the Web. I have a set of links on my Ambisonic Index page.

Microphone technique remains, of course, a matter of personal preference, so I would hesitate to say that "a proper surround sound recording cannot be made with a single point mic". However, I seldom use a Soundfield microphone as the sole audio source in a recording, even when I set up for a "purist" recording (such as the Frank Perry recording released by Celestial Harmonies, for which I used both the Soundfield and a coincident pair of cardioids).

I would certainly agree, though, that in a 4+ channel system which relies on level only for localization (eg 5.1), a single point microphone seems unlikely to recreate a satisfactory acoustic environment.

However, if the microphone can capture all the audio information arriving at a given position, including height information, as the Soundfield does, and you decode that information so as to utilize all its characteristics, such as in an Ambisonic decode setup, you can recreate highly convincing environments. In addition, with a full control unit and a B-format recording, you can manipulate the microphone position after you've recorded it, effectively moving the mic into different locations, pointing it in different directions, and changing the polar diagram. You can even derive a wide selection of stereo pairs with different characteristics.

Even so, I have heard some criticism of Nimbus recordings (they do not use a Soundfield microphone, but an array of Dr. Halliday's construction which does not deal with height information) when replayed without a decoder, and indeed I sometimes find undecoded Nimbus recordings a little over-ambient. When decoded, however, I enjoy Nimbus material a great deal.

My personal interest in Ambisonics originated, however, in a desire to perform multitrack-derived Ambisonic mixes rather than purist single-mic recordings. As a result I formed part of a group of independent researchers developing Ambisonic mixing equipment as outboard gear. This resulted eventually, via a tortuous route, in the Audio & Design Ambisonic Mastering System, designed largely by Dr. Geoff Barton. I recounted its use in my article on the subject in Studio Sound, available from The Ambisonic Index.

You can indeed use this system to combine B-Format material with other sources (mic or line) and you can even stay in B-Format without having to encode the result into 2-channel formats, if desired, although you do have to decide what format you would use to release the final result.

If the group Acoustic Renaissance in Audio have their way, the DVD Audio spec will include a flag which permits the automatic recognition of hierarchical surround encoding schemes (such as the Ambisonic "UHJ" hierarchy), so you could retain the degree of detail derived from B-Format and even recover true B-format on replay (or derive any other system including 5.1 for that matter). The use of a maximum of four channels in a hierarchical configuration such as UHJ would also allow lossless compression to be used (another ARA proposal requests a flag for this too), which would satisfy many people's requirements for a higher-quality disc.

Using the Cepiar/ADR Pan-Rotate unit, you can easily combine up to eight mono sources with up to two pre-existing B-Format signals to produce a B-Format result. Each of the eight mono inputs has a continuously rotating 360-degree panpot and a "radius vector" control that sets the distance of the source from the center (+ and -), panning across a full diameter of the soundfield from the point set by the main 360-degree panpot.

The unit includes a rotate control which rotates the entire soundfield produced by the unit. You can mix in additional B-format signals before and after the rotate control for maximum flexibility.
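
As a rough illustration of the panning and rotation just described, here is a minimal Python sketch. It is not a model of the Pan-Rotate unit's circuitry. It assumes the conventional horizontal B-Format encoding equations (W carries the source at 1/sqrt(2) gain, X points front, Y points left), and it treats the "radius vector" control as a simple scaling of X and Y, which is only a guess at how such a control might behave. All function names are hypothetical.

```python
import math

def encode_mono(sample, azimuth_deg, radius=1.0):
    """Encode one mono sample into horizontal B-Format (W, X, Y).

    Assumed convention: W = S/sqrt(2), X = S*cos(az), Y = S*sin(az),
    azimuth measured anticlockwise from front. `radius` runs from +1
    through 0 (centre) to -1 (opposite side) and is a hypothetical
    stand-in for the "radius vector" control.
    """
    az = math.radians(azimuth_deg)
    w = sample / math.sqrt(2.0)
    x = radius * sample * math.cos(az)
    y = radius * sample * math.sin(az)
    return (w, x, y)

def mix(*signals):
    """Sum any number of B-Format signals component by component."""
    return tuple(sum(component) for component in zip(*signals))

def rotate(bformat, angle_deg):
    """Rotate the whole soundfield about the vertical axis."""
    w, x, y = bformat
    a = math.radians(angle_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

# Two mono sources panned front-left and rear-right, then the whole
# field turned by 90 degrees.
field = mix(encode_mono(1.0, 45.0), encode_mono(0.5, -135.0))
print(rotate(field, 90.0))
```

Because rotation only touches X and Y, it can be applied to a mix of any number of sources at once, which is why a single rotate control can act on the entire soundfield.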

My Studio Sound article includes full details on Ambisonic mixing (the S2N piece only carries an overview).

Wendy Carlos and author Richard Elen talk about the article...

S2N Editor's note: During the October 1997 AES conference in New York, Wendy Carlos handed me a printout of Richard Elen's Ambisonics article, with a number of corrections and annotations pencilled in the margins. The two happened to meet at a party during the conference, and I asked Wendy to give the printout directly to Richard, who then wrote up his responses to her notes. Then Wendy got a chance to look at Richard's responses, and write up her responses. The result is a fascinating dialogue between two committed people who, among other things, happen to be brilliant engineers and engineering theorists. ---Paul D. Lehrman.

WC: Richard's discussion of phase-shift panning confuses time delay, which requires delays greater than 10 ms, with phase (delays much less than 10 ms).

RE: I agree with Wendy. The original manuscript for this article included a more detailed description in which I clearly distinguished these two effects. I also referred to such localization parameters as the "Makita direction". However AudioMedia (the magazine that originally commissioned the article --Ed.) asked me to simplify the discussion and as a result a little blurring occurred.

In the present version of the article, the DDLs don't need to be set to 100 ms maximum: 10 ms will do fine. You can divide all the millisecond values in that paragraph by 10 and it actually works better.

I wanted to draw a distinction between level and time domain localization mechanisms, which is why I didn't mind cutting the detail from the time-domain part of the discussion. We actually use three primary mechanisms: level, phase (well below 10 ms) and Haas effect (delays between the ears in excess of 10 ms). Wendy correctly draws the distinction between the last two, where I didn't.

WC: I'd assumed that some simplifications had been imposed "from above", as it was clear (and I now know from meeting him!) that Richard is a deft, sharp engineer, and he understands these distinctions (fer sure.) But the way it appeared in print (on screen?) caught my needling eye, and began this "debate" --it was not quite true. That's why I brought the whole thing up. S2N has no need to limit the accuracy of its articles as an original may once have been.

WC: In the section on Blumlein (M-S) coincident-pair stereo, Richard stated that you can obtain a similar effect to a Blumlein pair, without the need for a sum-and-difference "decoder", by using a pair of cardioid microphones with the capsules crossed at 90 degrees. No! They should be crossed at 180 degrees, not 90 degrees.

RE: I'm afraid I have to take issue with Wendy here. Possibly she considered techniques utilizing mics with a broader pattern, or with a vertical plate separating them, where 180-degree positioning would work. The use of 90-degree cardioid pairs seems one of the most fundamental stereo recording techniques available. The effect on headphones sounds nearly identical to Blumlein's, as I stated in the article. As you widen the angle between the mics, you begin to get a hole in the middle: by 120 degrees you will find this quite noticeable, on headphones, and even to a degree on speakers. It already sounds fundamentally unlike a Blumlein pair. In the article I did not concern myself with the actual mathematics of the crossed-cardioid pair. Instead I referred to the audible effect, and I stand by my comments on the subject.

WC: Call a sheep's tail a leg, but the sheep still has four legs. Cardioids crossed at 180 degrees can be "matrix-mathed" to be similar to the Blumlein pair, not so 90 degree cardioids. A Blumlein figure-8 pair at 90 degrees has no real "front" or "back". Any quadrant will do, it's symmetrical all around. So also is the 180 degree cardioid pair, but NOT the 90 degree cardioids, which have a definite "front" to them, whatever your ears may "think". Just try it: walk around a Blumlein pair while talking and you'll remain fairly constant in level. Same for 180 cardioids. But 90-degree cardioids will drop your volume when you're behind them, which is precisely to the point. For most recordings, definitely Richard's right: the 90 degree pair sounds better -- a different point completely!
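
The "walk around the pair" test can be tried on paper with idealized polar patterns. The sketch below is only that: it assumes textbook first-order patterns (cardioid = 0.5 * (1 + cos), figure-8 = cos), perfectly coincident capsules and no room, and the function names are hypothetical. It prints the total power picked up by each pair for a source in front, to the side and behind, and shows the sum and difference of a back-to-back cardioid pair.

```python
import math

def cardioid(source_deg, axis_deg):
    """Ideal cardioid gain for a source at source_deg, mic aimed at axis_deg."""
    return 0.5 * (1.0 + math.cos(math.radians(source_deg - axis_deg)))

def figure8(source_deg, axis_deg):
    """Ideal figure-8 gain (negative values mean inverted polarity)."""
    return math.cos(math.radians(source_deg - axis_deg))

def pair_level_db(pattern, half_angle_deg, source_deg):
    """Total power, in dB, from a coincident pair aimed at +/- half_angle_deg."""
    left = pattern(source_deg, +half_angle_deg)
    right = pattern(source_deg, -half_angle_deg)
    return 10.0 * math.log10(left ** 2 + right ** 2)

def sum_and_difference(pattern, half_angle_deg, source_deg):
    """Sum-and-difference matrix of the two mic signals."""
    left = pattern(source_deg, +half_angle_deg)
    right = pattern(source_deg, -half_angle_deg)
    return left + right, left - right

pairs = [
    ("figure-8s crossed at 90 (Blumlein)", figure8, 45.0),
    ("cardioids crossed at 180", cardioid, 90.0),
    ("cardioids crossed at 90", cardioid, 45.0),
]
for name, pattern, half_angle in pairs:
    levels = [round(pair_level_db(pattern, half_angle, s), 1) for s in (0, 90, 180)]
    print(name, "front/side/back dB:", levels)

# Back-to-back cardioids matrix to an omni (sum) and a sideways figure-8 (difference).
print(sum_and_difference(cardioid, 90.0, 30.0))
```

Under these idealized assumptions the figure-8 pair stays constant all the way round, the back-to-back cardioids vary by about 3 dB, and the 90-degree cardioids fall by roughly 15 dB from front to back.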

In the "three-dimensional stereo" section, regarding the Soundfield mic, Wendy notes:
WC: Not as "narrow" as panpotted sounds - the narrowest pattern is a hypercardioid, which is still much fatter than a shotgun, contact or direct mic. Really, it's more like four or more amiable cardioids, all time-coincident, no real Haas effect or even much phase...

RE: Unfortunately, I think Wendy has missed the point here. The Soundfield mic extends Blumlein's stereo technique into three dimensions. The actual microphone consists of four capsules in a tetrahedral array. These signals are combined to represent an omni (mono) mic to capture the pressure component, plus three figure-eights at right angles to capture the velocity components. We call the resulting signal "B-Format" and it contains complete information on the soundfield present at the microphone position. If you wish to do so, you can perform operations on this signal to derive any basic coincident stereo mic configuration (up to hypercardioid pairs at any angle - not higher order mics like shotguns) and steer them or even move them about in the acoustic space after the B-Format recording has been made. However, in the Ambisonic environment, the Soundfield mic does not behave as a steerable microphone. It captures information on all sound sources in the soundfield and encodes the entire field in such a way that it can be reproduced in an ordinary listening environment.
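
Deriving a steerable first-order microphone from B-Format amounts to a weighted sum of the four signals. The following Python sketch assumes the conventional B-Format scaling (W recorded at 1/sqrt(2) gain) and a single `pattern` parameter running from 1.0 (omni) through 0.5 (cardioid) and 0.25 (hypercardioid) to 0.0 (figure-8). The function names are hypothetical, and this is an illustration rather than the processing used in any actual Soundfield control unit.

```python
import math

def encode(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into B-Format (W, X, Y, Z), assumed convention."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (sample / math.sqrt(2.0),
            sample * math.cos(az) * math.cos(el),
            sample * math.sin(az) * math.cos(el),
            sample * math.sin(el))

def virtual_mic(bformat, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Output of a first-order virtual mic aimed at the given direction.

    pattern: 1.0 omni, 0.5 cardioid, 0.25 hypercardioid, 0.0 figure-8.
    """
    w, x, y, z = bformat
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    return pattern * math.sqrt(2.0) * w + (1.0 - pattern) * directional

def stereo_pair(bformat, half_angle_deg, pattern=0.5):
    """Coincident stereo pair aimed at +/- half_angle_deg in the horizontal plane."""
    return (virtual_mic(bformat, +half_angle_deg, pattern=pattern),
            virtual_mic(bformat, -half_angle_deg, pattern=pattern))

# One source 30 degrees left of front, heard through crossed hypercardioids.
recorded = encode(1.0, 30.0)
print(stereo_pair(recorded, 45.0, pattern=0.25))
```

Since the direction and pattern are just parameters of the sum, they can be changed after the recording has been made, but no choice of `pattern` yields anything narrower than a first-order pattern.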

WC: Richard and I are talking around in circles this time. He just said: "(up to hypercardioid pairs at any angle - not higher order mics like shotguns)", and that's all I meant, too. I do appreciate how the tetrahedral array picks up a "warm and friendly sound amicably from all around". In so doing it also picks up a lot of leakage, and this has its down side. Hypercardioid is not sharp enough when you're trying to isolate sources, but a shotgun or pan potted mono tracks will do nicely, while B-format can't cut it.
I admire the Soundfield mike greatly, but also wish to acknowledge its weaknesses. You must be skeptical of the over-selling given to the mike through the years. It's just a tool, as is UHJ. I wasn't able to outline all of the tradeoffs in my penciled margin notes given to Richard. As in a few other of his replies here, it may appear that I missed a point where I just jotted a mnemonic to an argument I never got the chance to raise.

WC: So why did the old 'Quad' systems need four channels to encode simple horizontal surround? Because each was more isolated, at least in discrete...

RE: People often find the question of isolation, or channel separation, problematical in relation to Ambisonics. If you listen to any speaker in an Ambisonic replay system, you will hear all the information, because each loudspeaker carries a carefully-derived signal in which the phase relationships between different mix components vary between the speakers (the exact relationship depends on the speaker positions, as you set the decoder according to where the speakers sit, rather than the other way around).
These phase relationships mirror the localization cues we experience in real life. And, just as in real life, where we do not experience individual sounds coming from discrete positions in the world around us -- instead we experience a "soundfield" -- so, in an Ambisonic replay system, the sounds from the speakers combine to re-create the soundfield originally created or recorded. "Separation" and "discreteness" we can regard as problems to be overcome, not solutions to the challenge of surround-sound reproduction.
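
To see why every speaker carries some of every source, consider a deliberately simplified decode of horizontal B-Format to a regular ring of speakers. This Python sketch is an assumption-laden illustration, not the design of any commercial decoder: it omits the frequency-dependent processing and layout compensation a real Ambisonic decoder applies, the weighting `k = 2` is chosen only so that the summed "velocity vector" of the speaker gains points at the source, and the function names are hypothetical.

```python
import math

def decode_regular(w, x, y, speaker_azimuths_deg, k=2.0):
    """Simplified decode of (W, X, Y) to a regular ring of speakers.

    Assumes W was encoded at 1/sqrt(2) gain. Returns one feed per speaker.
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        feed = (math.sqrt(2.0) * w + k * (x * math.cos(az) + y * math.sin(az))) / n
        feeds.append(feed)
    return feeds

def velocity_vector(feeds, speaker_azimuths_deg):
    """Gain-weighted mean speaker direction, normalised by the total gain."""
    total = sum(feeds)
    vx = sum(g * math.cos(math.radians(a)) for g, a in zip(feeds, speaker_azimuths_deg)) / total
    vy = sum(g * math.sin(math.radians(a)) for g, a in zip(feeds, speaker_azimuths_deg)) / total
    return vx, vy

# A unit source 20 degrees left of front, replayed over a square of speakers.
source_az = math.radians(20.0)
w, x, y = 1.0 / math.sqrt(2.0), math.cos(source_az), math.sin(source_az)
square = [45.0, 135.0, 225.0, 315.0]
feeds = decode_regular(w, x, y, square)
print("speaker feeds:", feeds)
print("velocity vector:", velocity_vector(feeds, square))
```

In this sketch all four speakers receive a non-zero feed, and the one roughly opposite the source is fed in inverted polarity. Listening to any single speaker therefore tells you little; the localization emerges only from the combination.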

WC: I'm sorry, Richard, the above is just a tad too unfocused for me. I think either of us could set up a fine surround recording and playback demo. But I like to know exactly where all the bodies are buried (and there always are some), and then try to steer to our strengths. For an effective surround environment a perfectly "natural" system may not be the most effective, have the most "impact", or even sound the most natural. As in the best motion picture soundtracks, often the most convincing effects are obtained by faking it, using the tricks of the trade that isolated tracks and narrow patterns allow. Blend in the natural sound (of a Soundfield?), sure, but don't exclusively depend on it. Give and take. Honest.

WC: In the first paragraph of the section "Two Quadraphonic Fallacies", the article misrepresents "quad" by setting up a "straw man".

RE: I indeed laced the paragraph with value judgments. I did it completely knowingly and deliberately! All the widely available surround systems today seem based on the first of these fallacies - level-only localization with speakers at 90 degrees. This seems dreadful to me. We can do much, much better.

WC: Agreed.

WC: I take issue with the diagram representing spatial inaccuracies of different quad systems. The diagram shows the locus of a signal panned in a circle around the listener, as reproduced by SQ, QS, CD-4 and UD-4. The diagram misrepresents CD-4, which was simply four discrete channels -- to be usable in many ways. That pattern here is 'cooked' and shows either a lousy quad panpot or that the speakers are 90 degrees apart. Ideal speakers are 0 degrees, 60 degrees, 120 degrees, 180 degrees -- you need more channels for the rear.

RE: Interestingly enough, I didn't create the original version of this diagram. It appeared in the early '70s in an electronics magazine in the UK called Elektor, which published a survey of the pros and cons of different quad systems. Most people agree on the shortcomings of SQ (poor front/back separation: 5dB in theory, 3dB in practice) and QS (in which "center rear" is in the back of your head). However certainly, in theory, CD-4 - a subcarrier-based disc system - should have successfully encoded the original discrete four channels and they should have decoded correctly to reconstitute the 4-channel original. Why didn't they?
It seems possible that discrete "quad" does not give you a circular locus, in which case the lumpy locus seems the result of an inherent failure of "discrete quad" rather than CD-4 per se. In my article, I do indeed suggest that this seems the case -- that level-only based localization with the speakers at 90 degrees will have holes between the speakers.

WC: Yes, I agree with you here, Richard. Thanks for pointing out that the diagram was used (often, dammit) before you. Four channels isn't enough for completely circular surround. Quad failed because it was used stupidly. And in some cases, as with the pseudo systems that dominated, it wasn't even quad! But you now get to one main crux of the problem, one that David Griesinger also discovered and wrote about in the JAES: where do you put the #$%&* speakers...

RE: Wendy notes that speakers at 90 degrees create problems. Indeed so! But in most cases, four speakers (in the past for discrete quad, and now for everything except the dialog channel) at 90 degrees represents all you can do. For this reason we have 5.1: four channels in the corners doesn't work - you get holes between the speakers, particularly between the front speakers, where the dialog originates in a movie soundtrack. So they added a special channel to "fill in" the hole.

WC: And they also removed a directly in the back surround channel, substituting speaker arrays to the SIDES of the theater on each side, for the 1st and 5th channels. This works, while the "in the corners" cliche was a blunder.

RE: Wendy's "hexaphonic" proposal, with speakers at 60 degrees, ameliorates, but does not solve, the problem. More channels means you can fill in the holes that exist between four speakers. But to maintain stereo compatibility you actually need speakers at plus and minus thirty degrees, and we find none present in Wendy's hexaphonic proposal. This represents a problem in 3-D Ambisonics too, incidentally, where the smallest number of speakers required for with-height decoding -- six -- lands you with a layout completely incompatible with regular stereo, so instead you have to use eight: two rectangles, one horizontal and one vertical.

WC: Speakers at plus and minus 30 degrees would be even better, sure. From most listening tests, it seems that 60 degrees is about the maximum separation that can "fuse" ghosted locations from many listening positions. Sad to say, Dolby removed the two screen speakers, left-center and right-center, used in Cinerama and Todd-AO for years. The removal was partially economic, partially pragmatic, as it was enough to expect theater owners to get three screen channels working well, not to expect five. So more tends to be better in this game (up to a point!), and the 5.1 system ought be a 7.1 system(!), five screen channels, two surrounds on the walls, and a subwoof.

RE: Consider just the front of the layout, as we might use for stereo. Level-only localization requires speakers a maximum of 60 degrees apart, and Wendy suggests this -- but with a speaker dead center (good for a dialog channel), and speakers 60 degrees left and right of that. Level-only localization would work if all three speakers were used -- "three-channel stereo". You would have to derive the center channel from the stereo feed - not too difficult, I imagine -- because the left and right speakers are 120 degrees apart! So now we have working level-only localization. Sounds good?
With level-only localization we can only achieve localization on a straight line between the speakers: any apparent depth in the "image" requires reverb or other techniques to produce the illusion. Simply "filling in the holes" with six speakers/channels seems insufficient to me, because it just gets level-only localization to work -- where the problem remains that, ultimately, level-only localization does not sound good enough to recreate an acoustic environment or the experience of one. This seems unsurprising, as we use more than level for localization in the world at large -- hence the discussion of phase-shift panning covered earlier.
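
For comparison, level-only panning between one pair of speakers can be sketched with the stereophonic tangent law, a common textbook model. The article does not name a particular pan law, so treating the tangent law as representative is an assumption, as are the function name and the choice to normalise for constant power.

```python
import math

def tangent_law_gains(source_deg, half_angle_deg=30.0):
    """Level-only gains for speakers at +half_angle (left) and -half_angle (right).

    Tangent law: tan(source) / tan(half_angle) = (gL - gR) / (gL + gR).
    Gains are normalised so that gL**2 + gR**2 == 1.
    """
    ratio = math.tan(math.radians(source_deg)) / math.tan(math.radians(half_angle_deg))
    g_left = 1.0 + ratio
    g_right = 1.0 - ratio
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm

# Pan a source across a pair of speakers 60 degrees apart.
for angle in (-30, -15, 0, 15, 30):
    print(angle, tangent_law_gains(angle))
```

Within the pair both gains stay in phase and differ only in level, so the image can only move along the line between the two speakers. Nothing in the model places a sound nearer or farther than the speakers themselves.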

WC: You're right again, Richard. Up until now it was difficult to gain deliberate control over the time and phase of recorded sound. You could "flip the phase", zero and 180, but that's slim pickin's, and by the middle 70's we had early time delay units, set up just for that -- delays. The trick is how to control it. Here I expect the DAW base to allow a good engineer to manipulate these parameters (as well as the relative levels), to produce better and more exciting stereoization, heard over more channels at home. Perhaps the change won't be as dramatic as when stereo first came in over mono, but it will be audibly real and on the path to the future of audio.

WC: The purpose of so-called "logic decoding" was to try to remove some crosstalk inherent in '4-2-4' (matrix quad) systems rather than to solve the problems of poor localization.

RE: I kind of agree with Wendy, in that both of us seem to say the same thing. If we forget the inherent shortcomings of level-only localization for a moment, doing 4-2-4 successfully seems mathematically impossible -- I called this the second fallacy of quad. You would get a compromise whatever you did: SQ and QS for example, where we called the problem crosstalk or lack of separation. Lack of separation of course resulted in poor localization. So I believe we agree.

WC: We do.

In the section on UHJ, "Multi-Channel Compatibility", referring to 2-channel UHJ:
WC: This 2-channel form is essentially the same as crossed cardioids (or bidirectional).

RE: Kind of. You can decode crossed cardioid recordings (for example with a Hafler technique with a rear difference channel) to extract the ambience and wrap it around the rear of a surround system, and it sounds quite good. We have already discussed the functional similarity of Blumlein M-S stereo and crossed cardioids. So similarly, you can decode a Blumlein pair into surround.
However a Blumlein pair only contains left-right and mono information. A 2-channel UHJ signal contains left-right and front-back as well as the mono. Thus you can hear significantly greater depth and surround effect with a decoded 2-channel UHJ signal than with an ambience-decoded M-S or crossed-cardioid recording. And while a crossed-cardioid recording has a certain depth when replayed on two stereo speakers with no decoder, a 2-channel UHJ recording has a much wider sound-stage and much more depth when played back into two speakers at 60 degrees with no decoder.

WC: The axes of freedom to position sounds require more than two channels of output. So it's sort of true that UHJ contains the additional direction over crossed cardioids, but at best in a "virtual" way, same as if other mikes in the rear were blended into the two-track crossed cardioid pair recording. You can squeeze quite a bit into two channels, I've done it all my career (because it's FUN to do!), but at some point we need more release channels, and even UHJ's two track version ain't gonna do it.

The "Ambisonic Decoder" section discusses typical practical monitoring setups for the studio, for example using four nearfield monitors for checking surround positioning.
WC: This speaker arrangement is still 90 degrees and poor. Stick with the sixty degrees you recommend: 0, 60, 120, 180 degrees.

RE: This would certainly sound true if Ambisonics used level-only localization (and see the discussion of hexaphonic reproduction above). But Ambisonics does not solely rely on level, so as a result you can put the speakers anywhere you like (if you have four, they should sit in a rough rectangle with a ratio of sides between 2:1 and 1:2) as long as you set the decoder's layout control accordingly. Six speakers certainly sound good, and you can get 6-speaker decoders, but four certainly sounds OK.

WC: I'm not sure if readers will appreciate the 0, 60, 120, 180 degree speaker setup until they try it. The "0" speaker is set to the far left, the 180 to the far right, the other two in between in front, and about as far from you as the side channels. Forget the rear for the moment (Richard's once again correct, that we'd need a hexaphonic system to tap that resource.) My skeptic's view is that the improvement would be quite subtle, as the ear can be fooled easily by the above four channels into thinking it hears rear sounds already. Danger: Psychoacoustics Ahead (and we know who you are)!
The essential detail to be stressed is NOT the numbers of channels we can achieve. It's only to realize that once you have two channels set up well, with the speakers in front and at around the 60 degree separation that works best yatta-yatta, the NEXT two speakers added ought go not to the rear, implying (as it did in the early '70's) one speaker per corner. Instead, put the two new channels to your far left and far right. These directions cannot be adequately simulated from any other stimulus location, and require their own dedicated channels. There is no better way to extend the current audio directionality than adding two dedicated additional channels there. (And after that, a front-center channel, as in 5.1, perhaps, hmm...?)

RE: I feel that Wendy made some wonderful comments on my comments on her comments on my article. We seem to agree almost 100%. I would draw attention to her last paragraph: the number of channels does seem less important than what you do with them. But I would say that we should finally divorce the following two parameters: "number of transmission channels" and "number of speakers". These do not need a one-to-one relationship. I would suggest that as B-Format (for example) encodes all the information that exists about a 3-D soundfield, that sounds like all you need for any kind of surround as far as transmission channels are concerned. How many speakers you use at the other end appears quite different. I like Wendy's proposed array a great deal, and you could derive it from B-Format with an existing Cepiar decoder. Indeed, I have enjoyed six and eight-speaker planar surround on many occasions.
I enjoyed David Griesinger's demo [of Logic 7] although I did not read or hear his paper. He has succeeded in overcoming a major problem with current multi-channel release formats in that you still need a stereo mix, and today's lazy engineers (who don't remember how we did the single in mono, the album version in stereo and goodness knows what other versions) seem scared witless of doing more than one mix "by hand". Logic 7 makes an excellent attempt at deriving a workable stereo mix from 5.1, or vice-versa. It reminds me of the "super-stereo" mode of a UHJ decoder.
However, I disagree with the good Doctor on one point: he maintains that when it comes to recreating the emotion and feel of a performance or piece of music, the re-creation of the acoustic remains more important than the localization of individual sources. I regret I feel that if you get the localization right, the acoustics will take care of themselves. He believes you just need to put the sound "in the nearest loudspeaker" and that this sounds good enough.
I feel most concerned that today's idea of surround sound seems no more than putting individual sounds in individual speakers, with no attempt to produce inter-speaker imaging, or localizations anywhere other than on a line between the speakers (ie nothing inside or outside the speaker array). In the case of Bob Margouleff's Boyz II Men mixes it sounds good, but I wouldn't want to do everything that way. I do not believe that level-only localization, however, can do any better. And that seems where we stand today: with holes between the speakers that we must fill with more speakers - and still seem to lack the depth that I and a surprisingly large number of others have routinely achieved in our own surround mixes for over 20 years - despite the limitations of a 2-channel surround encoding scheme.
