From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Lazzaro
Date: Sat, 17 Jun 2000 18:33:03 +0000
Subject: Re: [linux-audio-dev] Re: Multimedia compression
Message-Id:
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-sound@vger.kernel.org

> OTOH, one way to achieve great compression with little distortion would be
> decomposing the audio stream in its elementary "instruments", like
> drumset, guitars, violins, bass etc, and then code a sort of MIDI-like file
> (very short) with the "samples" stored in the stream only once.
>
> I think MP4-SAOL is a step in that direction.

Basically, there are two interrelated research agendas, both of which
need to make it out of the lab for this sort of compression to make it
to the real world:

[1] Doing the decomposition. The three major ways people are thinking
    about solving this problem right now are

    [A] The mathematical approach -- in a nutshell: "Let's separate
        out N signals from one, under the assumption that the best
        separation makes each separated signal carry unique
        information." The best starting point for understanding how
        these folks think about the problem is:

        http://www.cnl.salk.edu/~tony/ica.html

        While many of these papers think in terms of "N microphones
        to listen to a performance of N instrumentalists" as the
        starting signal, this is just to make the math easier -- in
        practice the ideas can be extended to the more general case.

    [B] Auditory scene analysis -- in short, take the analogy from
        computer vision, where raw camera (or retina) output gets
        broken up into different "maps" coding motion, color, shape,
        etc., and apply it to audition. These maps should (in theory)
        make the process of doing separation a lot easier. A good
        initial resource for this approach is:

        http://sound.media.mit.edu/~dpwe/AUDITORY/

    [C] Get access to the 24-track master tapes (uh, I think I just
        dated myself :-), and avoid the whole decomposition problem.
        Technically easy, of course, but might not be practical.

[2] Once you've done the decomposition, create encoders specialized
    for specific types of sounds. For certain types of sounds, this
    field is very mature (e.g. speech). For other types of sounds,
    though, you're basically going to have to

    [A] Look at the best music synthesis algorithms for the sound
        type, and see how easily the algorithm can be "inverted".

    [B] Hope a speech codec works well on the sound (there are a few
        codecs specialized for singing voice that take this
        approach ...).

    [C] Do original research on the problem.

One big problem with a field like this is that it's hard to get people
motivated to work on the "little problems" described above, if it's
unclear how success on the little problem is going to solve any big
problem. The hope with Structured Audio is to solve this
"meta-problem" -- by having a fielded platform out there, ready to
support new approaches to sound encoding in pilot applications, SA
will help motivate research in the whole area.

In particular, the politics of MPEG are very different from the
politics of the IETF -- "joining the process" of creating a new coding
standard is a deep commitment of time and resources and money in
MPEG-land (note that I'm not an MPEG member, so I'm outside the
process, largely because of the commitment it takes to participate
meaningfully). But implementing a new codec _in_ Structured Audio is a
whole different ballgame -- you could get a team together under the
rubric of the IETF, or you could just take the traditional Free
Software approach and start a grass roots effort like LAD. You don't
need anyone's permission to write a SAOL program ...

						--jl

-------------------------------------------------------------------------
John Lazzaro -- Research Specialist -- CS Division -- EECS -- UC Berkeley
lazzaro [at] cs [dot] berkeley [dot] edu     www.cs.berkeley.edu/~lazzaro
-------------------------------------------------------------------------
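
P.S. For anyone who wants to see the "mathematical approach" from [1]
run, here is a toy sketch of the easy case mentioned there: two
"microphones" each hearing a different mix of two sources. It is not
taken from the papers at the URL given for that approach; it is a
minimal illustration under my own assumptions. It whitens the two
mixtures, then searches for the rotation that makes the outputs as
non-Gaussian as possible (measured by excess kurtosis, one of several
contrast functions used for ICA). All function names (whiten,
separate_two, make_demo_signals, ...) and the mixing weights are made
up for the example, and it uses only the Python standard library.

```python
import math

def center(x):
    """Subtract the mean so the signal is zero-mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean signal (0 for a Gaussian)."""
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0

def whiten(x1, x2):
    """Decorrelate two mixtures and scale them to unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    a = sum(u * u for u in x1) / n
    c = sum(v * v for v in x2) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    # Closed-form eigendecomposition of the 2x2 covariance [[a, b], [b, c]].
    phi = 0.5 * math.atan2(2.0 * b, a - c)
    root = math.hypot((a - c) / 2.0, b)
    lam1 = (a + c) / 2.0 + root
    lam2 = (a + c) / 2.0 - root
    cp, sp = math.cos(phi), math.sin(phi)
    z1 = [(cp * u + sp * v) / math.sqrt(lam1) for u, v in zip(x1, x2)]
    z2 = [(-sp * u + cp * v) / math.sqrt(lam2) for u, v in zip(x1, x2)]
    return z1, z2

def separate_two(x1, x2, steps=180):
    """Grid-search the rotation of the whitened mixtures that maximizes
    the summed absolute excess kurtosis of the two outputs."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        theta = (math.pi / 2.0) * k / steps
        ct, st = math.cos(theta), math.sin(theta)
        y1 = [ct * u + st * v for u, v in zip(z1, z2)]
        y2 = [-st * u + ct * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def correlation(a, b):
    """Pearson correlation, used here only to check the result."""
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    return num / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))

def make_demo_signals(n=2000):
    """Two made-up 'instruments': a 5 Hz sine and a 4 Hz sawtooth."""
    t = [i / n for i in range(n)]
    sine = [math.sin(2 * math.pi * 5 * v) for v in t]
    saw = [2.0 * ((4 * v) % 1.0) - 1.0 for v in t]
    return sine, saw

if __name__ == "__main__":
    sine, saw = make_demo_signals()
    # Hypothetical mixing weights standing in for two microphones.
    mic1 = [0.6 * a + 0.4 * b for a, b in zip(sine, saw)]
    mic2 = [0.3 * a + 0.7 * b for a, b in zip(sine, saw)]
    out1, out2 = separate_two(mic1, mic2)
    for name, out in (("out1", out1), ("out2", out2)):
        print(name, "corr with sine:", round(abs(correlation(out, sine)), 3),
              "corr with saw:", round(abs(correlation(out, saw)), 3))
```

Note what the sketch does not do: the recovered signals come back in
arbitrary order, sign and scale, and the N-from-one case that matters
for compressing a finished recording is exactly the part that still
needs research.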