Richard Elen wonders about the future of high-quality audio distribution media.
This article is almost entirely unlike the original version published in the July 1998 edition of AudioMedia UK, which is available as a PDF. The present article is shorter and contains late-breaking news, but the original, while longer, has more laughs. Take your pick!
You may have thought that the format for the next High Quality Audio Disc – HQAD – was already settled. Surely, it’s DVD-Audio, isn't it? -- with 96kHz sampling and 24-bit word length; 5.1 surround and stereo audio; and Meridian Lossless Packing (MLP) for inaudible data compression (now officially endorsed by DVD’s WG-4) which also allows for innovative surround schemes such as Ambisonic B-Format. But as you may have heard relatively recently, there is now another contender in the shape of Sony and Philips’ Super-Audio CD (SACD for short).
DVD-Audio and SACD are fundamentally different. (We can discount the inclusion, for political reasons, of SACD-like technology as an option in the current draft of the DVD-Audio spec. If you make a standard too wide, manufacturers will only implement the bits that suit them – and in the case of DVD-Audio, this would probably be PCM.) So there are currently two separate proposals on the table.
It’s time we looked at what we actually need from an HQAD, as well as trying to guess what we are likely to get. Which of these two techniques gets closest to what we desire, and what are their pros and cons?
We rightfully want to make records with the highest quality gear available – preferably gear capable of higher quality than consumer equipment. This is why there is a quest for ever-higher sampling rates and longer word lengths. In my personal opinion, perceptible improvements in audio quality due to increased sampling rates for conventional PCM (ie not DSD) tail off above about 60kHz – but 88.2 and 96 are nice multiples of current practice, so why not. 24 bits is an improvement over 20, which is a significant step forward over 16-bit. Of course, few of us use all that dynamic range apart from during fades and in reverb tails.
We can probably hear the difference between 48 and 96kHz sampling in a quiet, modern studio, but it is difficult to say whether record buyers can. Even if they have home theater systems with surround capability, there are plenty of people around who will tell you that they’re happy with 20/48 (the capability of most DVD-Video players). This includes some noted producers, who feel that 20/48 DVDs with surround – and even surround-encoded CDs – are entirely adequate. 20/48, they believe, satisfies the quality requirements of the material, and the missing link is not higher digital resolution but surround-sound, which we can do with current systems (though I would argue not very well -- see Ambisonics in the Age of DVD, AudioMedia April 1998). Others actively dislike 96kHz, because on current converters they can hear problems like jitter. That doesn’t mean to say that the same people won’t use higher resolution systems in the studio, of course.
And are 24/96 converters real anyway? Yes, but they are difficult to do well. You may be able to get 24 or so bits to wiggle 96,000 times per second, but that doesn’t mean that the data itself carries any additional real information. Clock jitter is more difficult to deal with, for example, and noise levels in the analog stages -- more than the digital circuitry -- limit the dynamic range you can actually achieve to much less than the theoretical 144 dB. But compare converters and see what you think.
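For anyone wondering where the theoretical figure comes from, here is a minimal sketch. It assumes the usual rule of thumb that the dynamic range of an ideal converter is the ratio of full scale to one quantization step, about 6 dB per bit; the function name is purely illustrative.

```python
import math

def theoretical_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)

# Word lengths mentioned in this article.
for bits in (16, 20, 24):
    print(f"{bits}-bit: {theoretical_dynamic_range_db(bits):.1f} dB")
```

This is the ideal figure only; it says nothing about what a real converter with real analog stages achieves.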
But converters with higher sampling rates can still sound better. Placing the anti-imaging and anti-aliasing filters at a sufficiently high frequency that they can’t do any audible damage to the sound is one obvious reason, and the other I’ve never seen in print before, so it may be rubbish (but here goes anyway).
As is widely recognized, we can’t hear much above 18kHz, but that does not mean that there isn’t anything up there that we need to record – and here’s the second reason for higher sampling rates. Plenty of acoustic instruments produce usable output up to around the 30 kHz mark – something that would be picked up in some form by a decent 30 in/s half-inch analogue recording. A string section, for example, could well produce some significant ultrasonic energy. Arguably, the ultrasonic content of all those instruments blends together to produce audible beat frequencies which contribute to the overall timbre of the sound. If you record your string section at a safe distance with a Soundfield mic, for example, all those interactions will have taken place in the air before your microphones ever capture the sound. You can record such a signal at 44.1kHz sampling and never worry about losing anything – as long as your filters are decent and you have enough bits.
If, however, your idea of recording a string section is with a couple of 48-track digital machines, a mic on each desk feeding its own track so that you can mix it all later, you are doomed. Your close-mic technique does not pick up any interactions, so the only place they can happen is when you mix it – by which time the ultrasonic stuff has all been knocked off by your 48 kHz multitrack machines, so that will never happen. So if I were to be uncharitable, I could say that high sampling rates allow you to use bad mic technique with better results.
Having established that higher sampling rates are a good idea, there is a question as to what the sample rate should actually be in a studio environment. On the face of it, 96kHz takes care of capturing any audio that might ever happen, and 24 bits offer quite enough quantization steps. Is that enough?
Yes, in theory. But there are some potential problems, real or imaginary, to having a production environment that has the same resolution as the consumer distribution format. Think of it as a kind of "headroom". We need higher resolution in the studio than consumers so we can start with a higher level of quality in case some gets lost on the way, which might well happen. And what happens when you modify a digital signal in the digital domain, say by EQing it? You create more bits. You ought to have spare bits so you have room to work. You can always lose resolution: but you can’t easily get it back again.
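The point about creating more bits can be sketched with plain integer arithmetic. This is an illustration only: it assumes a single gain stage with the coefficient held as a 23-bit fixed-point fraction, and the sample value and all names are made up for the example. A real EQ involves many such multiplications.

```python
FRAC_BITS = 23  # assumed fixed-point format for the gain coefficient

def apply_gain(sample: int, coeff: int) -> int:
    """Exact integer product of a sample and a fixed-point coefficient."""
    return sample * coeff

sample = 5_000_001                      # an arbitrary value that fits in 24 bits
coeff = round(0.7 * 2 ** FRAC_BITS)     # a gain of roughly 0.7

exact = apply_gain(sample, coeff)       # the full-precision result
truncated = exact >> FRAC_BITS          # squeezed back onto the 24-bit scale
lost = exact - (truncated << FRAC_BITS) # what the truncation threw away

print("bits in exact product:", exact.bit_length())
print("bits after truncation:", truncated.bit_length())
print("discarded remainder:  ", lost)
```

The exact product needs far more than 24 bits; forcing it back into the original word length discards the low-order part, which is the resolution you can’t easily get back.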
There’s another way of looking at it, which will be familiar to engineers and producers who recall the way things were in the Seventies. With recording facilities of the time, you could make an album which sounded fine to you, and probably to most people. But if you ever heard an audiophile’s playback system sucking every last ounce out of the vinyl, you’d hear not only what you recorded, but what you didn’t know you’d recorded – guitar amps humming, someone tapping their foot, and weird breaths at a drop-in point.
With the arrival of Compact Disc, everyone suddenly had the equivalent of an audiophile system, even if it cost far, far less. Suddenly, everyone could hear all the things you had recorded but over which you had no control.
I was only partly joking when I remarked some years ago that, quite frankly, listeners at home should degrade the replay quality of their gear to match the industrial audio setting of the studio. That way, they could hear our records the way that we heard them when we all agreed that Take 146 was the master. Do you want the closest approach to the original sound or not? What do listeners think the "original sound" is, anyway? Does it include things you couldn’t hear?
This, it seems to me, is another reason to have a production environment that has a higher intrinsic resolution than the consumer distribution medium: listeners are then a whole lot less likely to get more information out of your recording than you knowingly put in. We simply can’t afford to have people recovering undefined sonic experiences from our albums, enjoying things that we had never known were there and would have removed if we had.
So even if the consumer format is based on 24/96 PCM, you may well feel that you need something even more exotic to make records with. Presumably this will include a sensibly-designed surround control-room where you can hear it all properly – but talking about that would be to broach a topic that would make Pandora’s Box seem as innocuous as a pack of Oreos. Let’s go there another time.
Unfortunately, today it isn’t even as simple as asking whether or not we should upgrade to 24/96 or even 24/192, because with Super Audio CD and Direct Stream Digital there is another, fundamentally different, option.
DSD, with its "1-bit" bitstream approach, also features lossless compression, an idea whose time has evidently, and thankfully, come. SACD is the distribution medium for this system: a disc made with DVD-like technology but containing a high-density layer with 5.1 surround and stereo areas, plus a Red-Book-compatible layer which can be read by a regular CD player.
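To see why lossless compression is welcome, it helps to tot up the raw, uncompressed data rates. The sketch below makes two simplifying assumptions that are mine rather than anything from the format specifications: 5.1 is counted as six full-rate channels, and the 1-bit stream is taken to run at 64 times the CD sampling rate.

```python
def raw_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

CD_FS = 44_100
formats = {
    "CD, 16/44.1 stereo": raw_bit_rate(CD_FS, 16, 2),
    "PCM 24/96 stereo": raw_bit_rate(96_000, 24, 2),
    "PCM 24/96, six channels": raw_bit_rate(96_000, 24, 6),
    "1-bit at 64 x Fs, stereo": raw_bit_rate(64 * CD_FS, 1, 2),       # assumed rate
    "1-bit at 64 x Fs, six channels": raw_bit_rate(64 * CD_FS, 1, 6), # assumed rate
}

for name, rate in formats.items():
    print(f"{name}: {rate / 1e6:.3f} Mbit/s")
```

On these assumptions, either high-resolution surround format carries roughly ten times the raw data of a CD or more before any packing is applied.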
At the recent Hi-Fi 98 show in Los Angeles, I had the opportunity to listen to demonstrations of Super-Audio CD, and to hear the originators talk about it. At a Sony/Philips demo, we heard excellent stereo recordings by Michael Bishop of Telarc, and surround recordings by Philips engineers. We also heard a real Super Audio CD played on a real SACD player – and on a boom-box, showing that the dual layer SACD/Red Book construction really does work.
I asked if there were plans for a higher sample rate than 2.8224 MHz for production applications – perhaps one that would divide down nicely to all the likely PCM sample rates we might need in the marketplace, bearing in mind that non-integral sample-rate conversion is not easy to do without it sounding nasty. The answer I got was that there had apparently been no decision on that point, but 7.056 MHz – Fs x 160 – "would be logical".
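The "divide down nicely" question is easy to check with a few lines of arithmetic. This sketch takes the base DSD rate as 64 x 44.1 kHz (my assumption) and the suggested production rate as 160 x 44.1 kHz, and asks which common PCM rates each can reach by whole-number division; the list of PCM rates is my own choice.

```python
from fractions import Fraction

CD_FS = 44_100
DSD_RATES = {"64 x Fs": 64 * CD_FS, "160 x Fs": 160 * CD_FS}
PCM_RATES = (44_100, 48_000, 88_200, 96_000)  # illustrative selection

def decimation_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Exact ratio between a bitstream rate and a PCM sample rate."""
    return Fraction(source_hz, target_hz)

for label, dsd_hz in DSD_RATES.items():
    for pcm_hz in PCM_RATES:
        ratio = decimation_ratio(dsd_hz, pcm_hz)
        kind = "integer" if ratio.denominator == 1 else "non-integer"
        print(f"{label} ({dsd_hz / 1e6:.4f} MHz) -> {pcm_hz} Hz: {ratio} ({kind})")
```

On these figures, 160 x Fs divides evenly to 44.1, 48 and 88.2 kHz, while 96 kHz still comes out at a ratio of 73.5.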
Even though I was never conscious of the 100kHz tweeters (part of the DSD technology, apparently), the sound was excellent. Gus Skinas of Sony kindly took me around the equipment room afterwards where Sony stereo hard disk recorders (interfaced with SDIF-II) and Philips multitrack HD recorders (interfaced with ST Optical cable) were arrayed.
In another room on the same floor, Marantz (owned by Philips) was demonstrating SACD versus 24/96. And whilst bearing in mind that the listening environment was a typically nasty hotel room, my colleagues, quite honestly, couldn’t hear the difference. Hmmm.
DSD is what you might call a "scalable" technology. The D/A is simply a low-pass filter, which you can implement as cleverly as you can afford. You can do simple, cheap, OK-sounding LPFs relatively easily (for an SACD ‘WalkPerson’ for example), while with a significantly higher degree of effort, truly high levels of quality are possible. The same recording – and the same physical disc – could satisfy the jogger, the in-car listener and the audiophile.
The sigma-delta style of A/D conversion used in the vast majority of PCM converters is still employed – you simply record the 1-bit stream directly instead of decimating it. And some existing A/D chips already have an output which can be used to derive a DSD stream.
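As a rough illustration of the idea, here is a toy first-order sigma-delta modulator, with a moving average standing in for the low-pass filter that acts as the D/A. It is a sketch under simple assumptions and not the DSD encoder: real converters use higher-order noise shaping and far better filters, and the function names are invented for the example.

```python
import math

def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> stream of +1.0 / -1.0."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback            # accumulate the error so far
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)                 # this is the recorded 1-bit stream
    return bits

def moving_average(bits, window):
    """Crude low-pass filter, standing in for the D/A stage."""
    out = []
    acc = 0.0
    for i, b in enumerate(bits):
        acc += b
        if i >= window:
            acc -= bits[i - window]           # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out

# Demo: one cycle of a half-scale sine, heavily oversampled.
n = 4096
signal = [0.5 * math.sin(2 * math.pi * i / n) for i in range(n)]
bits = sigma_delta_1bit(signal)
decoded = moving_average(bits, window=64)
worst = max(abs(d - s) for d, s in zip(decoded, signal))
print(f"{len(bits)} one-bit samples, worst-case decode error {worst:.3f}")
```

Swapping the moving average for a better filter changes the quality of the result without touching the recorded stream, which is the sense in which the technology scales.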
And assuming you don’t simply want to convert your recording to DSD at the mastering stage – which many might see as rather missing the point of using the technology at all – then you need to replace virtually every piece of digital equipment in your studio. Ooops.
Luckily, you don’t need a whole lot for high-quality classical recording. You need multichannel DSD converters to capture a surround signal, a recorder to store the output of the converters, and an editing system to put it all together; while you’re at it, put a couple of extra mics up and record a stereo version at 44.1 for the Red Book layer (or the CD version if you’re still doing them). All these products already exist, and generating sonically-decent 44.1 PCM from DSD at 2.8224 MHz is not too daunting.
When it comes to multitrack recording and mixing, however, more gear is involved, and DSD begins to get a bit scary. If PCM-based DVD-Audio becomes the standard, all we do is to upgrade our studio gear until we reach and possibly exceed (depending on the headroom we would like to have) 24/96 performance. It’s technology we know, and it’s an evolutionary strategy: comparatively safe.
An obvious problem, however, is that generating a 44.1 version will not be simply a matter of putting up a couple of mics for most people: DVD is based around 48 kHz and multiples, while Red Book CD is based around 44.1. Sample-rate conversion from 96 to 44.1 may not sound very nice – it’s not a simple divide-by-n -- so we will have to do two or even three mixes, stereo at 44.1 and 96, and 5.1 at 96. This may be too expensive for many record companies to consider. It may be why there are second thoughts about single-inventory, and why nobody is quite sure whether a DVD-Audio disc will have a Red Book layer.
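The awkwardness of 96 to 44.1 shows up as soon as you reduce the ratio to lowest terms. The sketch below assumes the textbook rational approach to sample-rate conversion (interpolate by one whole number, then decimate by another); the function name is illustrative, and actual converters may work quite differently.

```python
from fractions import Fraction

def resample_factors(source_hz: int, target_hz: int):
    """Return (interpolate_by, decimate_by) for an exact rational rate change."""
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator

print(resample_factors(96_000, 44_100))  # (147, 320)
print(resample_factors(88_200, 44_100))  # (1, 2)
```

Going from 88.2 to 44.1 is a straight divide-by-two, whereas 96 to 44.1 means a ratio of 147 to 320.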
DSD/SACD on the other hand is a revolutionary strategy, and thus more risky. If we end up using DSD for multitrack-based production, a whole load of gear is required, almost all of which is currently imaginary. Or it may be, after all, that we can use high-enough-quality PCM systems in the studio and convert to DSD at the end of the day. In this case, the classical recordists are the only people who need to invest in (relatively simple) systems relying on DSD throughout.
The pros and cons do not stop there. There are many who believe bitstream signal processing to be fraught with difficulties. There are even potential problems with bitstream technology as a whole, which may render it intrinsically inferior to PCM – see the ARA Web site at http://www.meridian-audio.com/ara/ for details – although I would not claim to know enough of the theory to back one side or the other.
My guess is that we will end up with a consumer audio distribution format based either on SACD, or upon DVD-Audio -- PCM discs recorded at 24/96 -- or both, which might be the worst of all possible worlds. I suspect both will have a Red Book CD-compatible layer, and 5.1 as well as stereo mixes in most cases.
I can imagine that SACD will find most adherents among the fans of "serious" music, while the backers of PCM will be more inclined towards "popular". But it is difficult to imagine that the classical field could support its very own HQAD format, whether at the release or the replay end of the chain. As Mike Batt once put it, there are no such things as "popular" and "serious" music – just "popular" and "unpopular": and this technology is sufficiently expensive that whatever the next HQAD may be, it will need to be popular.
We have all the signs of a format war on our hands that, as always, will be expensive and controversial however it falls out in the end. The worst-case scenario – both formats co-existing – could have the same result as that caused in the past by having two incompatible open-reel digital systems. This arguably held back the introduction of serious digital recorders into the studio for years, and resulted in both formats being eclipsed in many minds by the MDM. Such a war would not be good for our business, and I would join a growing number of industry bodies in calling for a single-format solution, as quickly as possible.
Richard Elen (relen@brideswell.com) has been a frequent writer on professional audio for over two decades. He is now VP of Marketing at Apogee Electronics Corporation in California.
This article reflects the personal views of its author, which may not be those of his employers.