* ALSA processor usage is too high
@ 2010-11-05 14:12 Adam Rosenberg
  2010-11-05 14:31 ` Clemens Ladisch
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Rosenberg @ 2010-11-05 14:12 UTC
  To: alsa-devel

I am trying to decode 8 MP3 files simultaneously.  Each file is 3
minutes 11 seconds long.  When I just decode the files and copy the
PCM data to memory it takes 2 minutes 36 seconds.  When I do the same
test after opening 8 ALSA PCM streams and while writing data to them
it takes 4 minutes 37 seconds.  This is much too slow for what should
be a simple copy operation.
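For reference, these timings can be expressed as a fraction of real time. Each file plays for 3:11 (191 s), so decoding all eight into memory in 2:36 (156 s) takes about 82% of the time the audio lasts, assuming the decode-only run is CPU-bound. A run that also plays audio is paced by the sample rate, so its wall-clock time cannot be read the same way. The helper names below are illustrative only.

```c
/* Convert a minutes:seconds duration to seconds. */
static double mmss_to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Processing time divided by the playing time of the audio processed.
 * A value above 1.0 means the work cannot keep up with playback. */
static double realtime_fraction(double processing_s, double audio_s)
{
    return processing_s / audio_s;
}
```

For example, realtime_fraction(mmss_to_seconds(2, 36), mmss_to_seconds(3, 11)) is about 0.82.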

I need to have 8 MP3 files decoding and playing on our custom board.
I have taken the driver as far as I can on my own.  I am seeking
expert help to get this working as soon as possible.  I am sorry if
this was not the appropriate place to post such a request.  Please
contact me directly if you are interested in working on this project.

The custom board is set up as follows:

- Analog Devices BF537 processor running uClinux from
  svn://blackfin.uclinux.org/uclinux-dist/trunk uclinux-dist
- alsa-lib-1.0.23 from http://www.alsa-project.org
- Two Cirrus Logic CS42448 CODECs connected to the BF537 SPORT0 primary
  and secondary data lines in multichannel mode
- CS42448 configured for TDM (32 bits per channel * 8 channels = 256
  bits per frame)
- ICS661 audio clock at 256 * 48000 Hz connected to BF537 SPORT0 and
  both CS42448 CODECs for bit clocking

You can find the driver here:
http://www.alcorn.com/ftp/swap/sound_cs42448.zip
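A quick check of the clocking described above, assuming one bit is shifted per clock cycle: 8 slots of 32 bits give a 256-bit TDM frame, and sending one frame per sample at 48 kHz needs 256 * 48000 = 12,288,000 Hz (12.288 MHz), which is what a 256 * 48000 Hz clock provides. The function names are illustrative.

```c
#include <stdint.h>

/* Bits in one TDM frame: every channel slot occupies slot_bits. */
static uint32_t tdm_frame_bits(uint32_t slot_bits, uint32_t channels)
{
    return slot_bits * channels;
}

/* Bit clock needed to send one TDM frame per sample period. */
static uint32_t tdm_bit_clock_hz(uint32_t frame_bits, uint32_t sample_rate_hz)
{
    return frame_bits * sample_rate_hz;
}
```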

Thank you,
Adam

Adam Rosenberg
Software Engineer

Alcorn McBride Inc.
3300 South Hiawassee
Building 105
Orlando, FL 32835

(407) 296 - 5800 ext. 5490

* Re: ALSA processor usage is too high
  2010-11-05 14:12 ALSA processor usage is too high Adam Rosenberg
@ 2010-11-05 14:31 ` Clemens Ladisch
  2010-11-05 15:19   ` Adam Rosenberg
  0 siblings, 1 reply; 10+ messages in thread
From: Clemens Ladisch @ 2010-11-05 14:31 UTC
  To: Adam Rosenberg; +Cc: alsa-devel

Adam Rosenberg wrote:
> I am trying to decode 8 MP3 files simultaneously.  Each file is 3
> minutes 11 seconds long.  When I just decode the files and copy the
> PCM data to memory it takes 2 minutes 36 seconds.  When I do the same
> test after opening 8 ALSA PCM streams and while writing data to them
> it takes 4 minutes 37 seconds.  This is much too slow for what should
> be a simple copy operation.

Are you using the "hw" device?  Otherwise, it's not a simple copy op.

How much CPU does "aplay -D hw -t raw -f dat /dev/zero" use?
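For scale: "-f dat" selects 16-bit little-endian stereo at 48000 Hz (the format aplay reports when it is run later in this thread), i.e. 4 bytes per frame and 192,000 bytes per second. If all eight streams used that same format, which is an assumption since the thread does not say, the copy into the hw devices would total about 1.5 MB per second. A minimal sketch of the arithmetic:

```c
#include <stddef.h>

/* Bytes per second for interleaved PCM: rate * channels * bytes/sample. */
static size_t pcm_bytes_per_second(unsigned rate_hz, unsigned channels,
                                   unsigned bytes_per_sample)
{
    return (size_t)rate_hz * channels * bytes_per_sample;
}
```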


Regards,
Clemens

* Re: ALSA processor usage is too high
  2010-11-05 14:31 ` Clemens Ladisch
@ 2010-11-05 15:19   ` Adam Rosenberg
  2010-11-05 17:58     ` Clemens Ladisch
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Rosenberg @ 2010-11-05 15:19 UTC
  To: Clemens Ladisch; +Cc: alsa-devel

On Fri, Nov 5, 2010 at 10:31 AM, Clemens Ladisch <clemens@ladisch.de> wrote:
> Adam Rosenberg wrote:
>> I am trying to decode 8 MP3 files simultaneously.  Each file is 3
>> minutes 11 seconds long.  When I just decode the files and copy the
>> PCM data to memory it takes 2 minutes 36 seconds.  When I do the same
>> test after opening 8 ALSA PCM streams and while writing data to them
>> it takes 4 minutes 37 seconds.  This is much too slow for what should
>> be a simple copy operation.
>
> Are you using the "hw" device?  Otherwise, it's not a simple copy op.

I am using the hw device for each stream.

>
> How much CPU does "aplay -D hw -t raw -f dat /dev/zero" use?
>

I do not know how to calculate CPU usage for a given process.

All of my calculations have just been done by running the application
I wrote (which has to do a number of other things while decoding mp3
data and playing audio).  I time the application from start to finish
to determine if it is handling all of the tasks in an acceptable
amount of time.

For the audio playback I am polling the streams using
snd_pcm_avail_update() and then writing the number of frames available
using snd_pcm_writei().  I am trying to squeeze this whole project into
2 MB of flash, so I will not be able to include aplay in the final OS
image.

I am able to add aplay for testing so I used your command to open the
8 streams, removed the audio processing from my application, and ran
the test again.  The result was 4 minutes and 35 seconds, which is
basically the same as the result from the test within my application
and much too slow.

Thanks,
Adam

* Re: ALSA processor usage is too high
  2010-11-05 15:19   ` Adam Rosenberg
@ 2010-11-05 17:58     ` Clemens Ladisch
  2010-11-05 18:45       ` Adam Rosenberg
  0 siblings, 1 reply; 10+ messages in thread
From: Clemens Ladisch @ 2010-11-05 17:58 UTC
  To: Adam Rosenberg; +Cc: alsa-devel

Adam Rosenberg wrote:
> On Fri, Nov 5, 2010 at 10:31 AM, Clemens Ladisch <clemens@ladisch.de> wrote:
> > How much CPU does "aplay -D hw -t raw -f dat /dev/zero" use?
> 
> I do not know how to calculate CPU usage for a given process.

The time utility (if you have it) measures both elapsed and actually
used CPU time.
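If the time utility is not available on the target, the same real/user/sys figures can be collected from inside a program. A minimal sketch using POSIX clock_gettime() and getrusage(); the function name measure_times is hypothetical. Note that getrusage() reports CPU totals since the process started, so the start timestamp should be taken at program start for the numbers to line up.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1e6;
}

/* real_s: wall-clock time since `start` (taken with CLOCK_MONOTONIC).
 * user_s, sys_s: CPU time spent in user and kernel mode by this process
 * since it started.  These correspond to time(1)'s real/user/sys. */
static int measure_times(const struct timespec *start,
                         double *real_s, double *user_s, double *sys_s)
{
    struct timespec now;
    struct rusage ru;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    *real_s = (now.tv_sec - start->tv_sec)
            + (now.tv_nsec - start->tv_nsec) / 1e9;
    *user_s = tv_seconds(ru.ru_utime);
    *sys_s  = tv_seconds(ru.ru_stime);
    return 0;
}
```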

> For the audio playback I am polling the streams using
> snd_pcm_avail_update() and then writing the number of frames available
> using snd_pcm_writei().

And what does your program do when avail_update returns 0 frames?

> I am able to add aplay for testing so I used your command to open the
> 8 streams, removed the audio processing from my application, and ran
> the test again.  The result was 4 minutes and 35 seconds, which is
> basically the same as the result from the test within my application
> and much too slow.

You cannot write data faster than it's playing; the audio ring buffer
has a finite size.

You would have a problem if the processing and/or the driver made
everything so slow that you couldn't write new data to the device
fast enough, which would result in a buffer underrun.
Does this actually happen?
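A toy model makes the point concrete. It is not ALSA code: the "hardware" drains a fixed number of frames per tick, and the writer fills whatever space is free (the quantity snd_pcm_avail_update() reports) as soon as it can. The buffer and tick sizes are arbitrary assumptions. However fast the writer is, playing 180 s of 48 kHz audio with 1 ms ticks takes 180,000 ticks, i.e. 180 s, whatever the buffer size.

```c
/* Toy playback ring buffer.  Assumes buffer_frames >= frames_per_tick > 0.
 * Each tick the writer instantly fills all free space, then the hardware
 * consumes frames_per_tick frames.  Returns the number of ticks until
 * total_frames have been played. */
static unsigned long ticks_to_play(unsigned long total_frames,
                                   unsigned long buffer_frames,
                                   unsigned long frames_per_tick)
{
    unsigned long written = 0;   /* frames handed to the buffer */
    unsigned long played = 0;    /* frames consumed by the hardware */
    unsigned long ticks = 0;

    while (played < total_frames) {
        /* Writer: free space is buffer size minus frames still queued. */
        unsigned long avail = buffer_frames - (written - played);
        unsigned long left = total_frames - written;
        written += avail < left ? avail : left;

        /* Hardware: plays at a fixed rate, never faster. */
        unsigned long queued = written - played;
        played += queued < frames_per_tick ? queued : frames_per_tick;
        ticks++;
    }
    return ticks;
}
```

With 48-frame ticks, ticks_to_play(48000UL * 180, 4096, 48) and ticks_to_play(48000UL * 180, 16384, 48) both come out at 180000: a bigger buffer only lets the writer get further ahead.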


Regards,
Clemens

* Re: ALSA processor usage is too high
  2010-11-05 17:58     ` Clemens Ladisch
@ 2010-11-05 18:45       ` Adam Rosenberg
  2010-11-05 18:51         ` Mark Brown
  2010-11-05 19:10         ` Clemens Ladisch
  0 siblings, 2 replies; 10+ messages in thread
From: Adam Rosenberg @ 2010-11-05 18:45 UTC
  To: Clemens Ladisch; +Cc: alsa-devel

On Fri, Nov 5, 2010 at 1:58 PM, Clemens Ladisch <clemens@ladisch.de> wrote:
> Adam Rosenberg wrote:
>> On Fri, Nov 5, 2010 at 10:31 AM, Clemens Ladisch <clemens@ladisch.de> wrote:
>> > How much CPU does "aplay -D hw -t raw -f dat /dev/zero" use?
>>
>> I do not know how to calculate CPU usage for a given process.
>
> The time utility (if you have it) measures both elapsed and actually
> used CPU time.

I am not sure how to interpret this, but I told aplay to play for 3
minutes from /dev/zero and here are the results:
root:/> time aplay -d 180 -D hw -t raw -f dat /dev/zero
Playing raw data '/dev/zero' : Signed 16 bit Little Endian, Rate 48000
Hz, Stereo
real    3m 0.10s
user    0m 0.22s
sys     0m 8.48s

>
>> For the audio playback I am polling the streams using
>> snd_pcm_avail_update() and then writing the number of frames available
>> using snd_pcm_writei().
>
> And what does your program do when avail_update returns 0 frames?

If it returns 0 then I do not write any frames.  I then check the next
stream.  This continues in an infinite loop.

> You cannot write data faster than it's playing; the audio ring buffer
> has a finite size.
>
> You would have a problem if the processing and/or the driver made
> everything so slow that you couldn't write new data to the device
> fast enough, which would result in a buffer underrun.
> Does this actually happen?

I am currently writing the decoded MP3 data to a buffer in RAM so that
the program is decoding the MP3 data as fast as it can.  I then run
the audio process separately and just play silence.  I am doing this
so that I can tell how much time is being spent decoding MP3 data and
processing audio data, and therefore how much time remains for other
tasks.  From the times I have calculated I can tell that a buffer
underrun would occur frequently if I were actually writing the decoded
MP3 data to the PCM streams.

-Adam

* Re: ALSA processor usage is too high
  2010-11-05 18:45       ` Adam Rosenberg
@ 2010-11-05 18:51         ` Mark Brown
  2010-11-05 19:10         ` Clemens Ladisch
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Brown @ 2010-11-05 18:51 UTC
  To: Adam Rosenberg; +Cc: alsa-devel, Clemens Ladisch

On Fri, Nov 05, 2010 at 02:45:41PM -0400, Adam Rosenberg wrote:

> I am not sure how to interpret this, but I told aplay to play for 3
> minutes from /dev/zero and here are the results:
> root:/> time aplay -d 180 -D hw -t raw -f dat /dev/zero
> Playing raw data '/dev/zero' : Signed 16 bit Little Endian, Rate 48000
> Hz, Stereo
> real    3m 0.10s

This means that the application ran for 3m 0.1s...

> user    0m 0.22s

...during this time it spent 220ms in the actual application

> sys     0m 8.48s

...and 8.48s in kernel mode.

* Re: ALSA processor usage is too high
  2010-11-05 18:45       ` Adam Rosenberg
  2010-11-05 18:51         ` Mark Brown
@ 2010-11-05 19:10         ` Clemens Ladisch
  2010-11-05 19:27           ` Adam Rosenberg
  1 sibling, 1 reply; 10+ messages in thread
From: Clemens Ladisch @ 2010-11-05 19:10 UTC
  To: Adam Rosenberg; +Cc: alsa-devel

Adam Rosenberg wrote:
> I am not sure how to interpret this, but I told aplay to play for 3
> minutes from /dev/zero and here are the results:
> root:/> time aplay -d 180 -D hw -t raw -f dat /dev/zero
> Playing raw data '/dev/zero' : Signed 16 bit Little Endian, Rate 48000
> Hz, Stereo
> real    3m 0.10s
> user    0m 0.22s
> sys     0m 8.48s

9s / 180s = 5%
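With the exact figures, (0.22 s + 8.48 s) / 180.10 s is about 4.8%, which rounds to the 5% above. As a helper (the name is illustrative):

```c
/* CPU share of a process from time(1) output: (user + sys) / real. */
static double cpu_percent(double real_s, double user_s, double sys_s)
{
    return 100.0 * (user_s + sys_s) / real_s;
}
```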

> > And what does your program do when avail_update returns 0 frames?
> 
> If it returns 0 then I do not write any frames.  I then check the next
> stream.  This continues in an infinite loop.

When the program is looping and waiting for some free space to become
available in any of the eight buffers, it doesn't actually process
audio data.  (And you should use poll() with all eight handles so that
you don't eat CPU while waiting.)

The aplay experiment above tells me that your program spent 95% of
its time calling snd_pcm_avail_update.  In that time, you could
decode instead.
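The cost of checking in a loop versus sleeping in poll() can be measured directly. The sketch below waits for a descriptor that never becomes ready, once by repeatedly checking it with a zero timeout (standing in for repeated snd_pcm_avail_update() calls) and once by blocking in poll(), and returns the CPU time each approach consumed. An ordinary pipe is assumed as the descriptor; with real PCM handles the descriptors would come from snd_pcm_poll_descriptors(). The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/resource.h>
#include <time.h>
#include <unistd.h>

/* User + system CPU seconds used by this process so far. */
static double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_stime.tv_sec
         + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Busy wait: keep asking "is fd ready?" until wait_s seconds pass. */
static double cpu_busy_wait(int fd, double wait_s)
{
    double cpu0 = cpu_seconds(), end = now_seconds() + wait_s;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    while (now_seconds() < end)
        if (poll(&pfd, 1, 0) > 0)   /* zero timeout: status check only */
            break;
    return cpu_seconds() - cpu0;
}

/* Same wait, but the kernel puts the process to sleep inside poll(). */
static double cpu_sleep_wait(int fd, double wait_s)
{
    double cpu0 = cpu_seconds();
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    poll(&pfd, 1, (int)(wait_s * 1000));
    return cpu_seconds() - cpu0;
}
```

On an otherwise idle system, the busy wait typically costs about as much CPU time as the wait lasts, while the blocking wait costs almost none.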

> > You cannot write data faster than it's playing; the audio ring buffer
> > has a finite size.
> >
> > You would have a problem if the processing and/or the driver made
> > everything so slow that you couldn't write new data to the device
> > fast enough, which would result in a buffer underrun.
> > Does this actually happen?
> 
> I am currently writing the decoded MP3 data to a buffer in RAM so that
> the program is decoding the MP3 data as fast as it can.  I then run
> the audio process separately and just play silence.  I am doing this
> so that I can tell how much time is being spent decoding MP3 data and
> processing audio data, and therefore how much time remains for other
> tasks.

Playing silence is not any faster than playing anything else, because
the sound card _cannot_ run faster than the configured sample rate.


Regards,
Clemens

* Re: ALSA processor usage is too high
  2010-11-05 19:10         ` Clemens Ladisch
@ 2010-11-05 19:27           ` Adam Rosenberg
  2010-11-06  5:30             ` Mark Brown
  2010-11-08 10:39             ` Clemens Ladisch
  0 siblings, 2 replies; 10+ messages in thread
From: Adam Rosenberg @ 2010-11-05 19:27 UTC
  To: Clemens Ladisch; +Cc: alsa-devel

On Fri, Nov 5, 2010 at 3:10 PM, Clemens Ladisch <clemens@ladisch.de> wrote:
> When the program is looping and waiting for some free space to become
> available in any of the eight buffers, it doesn't actually process
> audio data.  (And you should use poll() with all eight handles so that
> you don't eat CPU while waiting.)

I can't use poll because the application has to perform many other
tasks in a deterministic manner (meaning I can only use threads and
other processes to notify the main loop to perform some task).  I
tried using the async callback method so that I could set a flag when
it was time to copy more audio data to the stream but that didn't seem
to work well with multiple streams.  I found that polling using
avail_update was the only reliable method.  Could you provide an
alternative example that is known to work with multiple streams in the
same application?

>
> The aplay experiment above tells me that your program spent 95% of
> its time calling snd_pcm_avail_update.  In that time, you could
> decode instead.

Sorry for the confusion, the main loop of my application basically does this:
while(1)
{
  processNextAlsaStream();
  processMp3Decoder();
  processLCD();
  processInputs();
  processSerial();
}

so the processNextAlsaStream() function just calls avail_update for
the next stream in my list of 8 streams and then handles the result
before allowing the loop to process the next task.

>
> Playing silence is not any faster than playing anything else, because
> the sound card _cannot_ run faster than the configured sample rate.
>

I agree.  I only mentioned it was silence so that it was understood I
have a static buffer that I am copying audio frames from so there is
no other processing needed (no reading from a file, etc).

Thank you for your help, I am happy to be discussing this with you all
as it makes me feel as though I am not totally lost.  Please let me
know if you have an example of a program that can efficiently handle
multiple PCM streams.

Thanks!
Adam

* Re: ALSA processor usage is too high
  2010-11-05 19:27           ` Adam Rosenberg
@ 2010-11-06  5:30             ` Mark Brown
  2010-11-08 10:39             ` Clemens Ladisch
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Brown @ 2010-11-06  5:30 UTC
  To: Adam Rosenberg; +Cc: alsa-devel, Clemens Ladisch

On Fri, Nov 05, 2010 at 03:27:01PM -0400, Adam Rosenberg wrote:

> avail_update was the only reliable method.  Could you provide an
> alternative example that is known to work with multiple streams in the
> same application?

poll() is designed for this application.

> Sorry for the confusion, the main loop of my application basically does this:
> while(1)
> {
>   processNextAlsaStream();
>   processMp3Decoder();
>   processLCD();
>   processInputs();
>   processSerial();
> }

What you appear to be saying here is that your application, which
busy-waits, is consuming a lot of CPU.  This isn't entirely surprising:
as with many APIs in Linux, the ALSA APIs are designed to be event
driven.  If you really need to do this, I'd suggest converting all the
functions which can wait for input (at a guess, at least the ALSA,
input and serial ones) to wait for events on their fds using poll(),
epoll or whatever.  If you desperately need to busy-wait, do it by
calling poll() on an epoll fd with a timeout of zero.  This will reduce
the overhead you incur for busy-waiting.
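A minimal sketch of the zero-timeout epoll check suggested above, using the Linux epoll API. All descriptors are registered once, and each pass of the loop makes one cheap poll() on the epoll fd instead of one check per device. The function names are hypothetical, and a single events mask is assumed for all descriptors; with ALSA, each PCM's fds and their events would come from snd_pcm_poll_descriptors().

```c
#include <poll.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register every descriptor once with a new epoll instance. */
static int make_epoll(const int *fds, int nfds, uint32_t events)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = events, .data.fd = fds[i] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Non-blocking check for a busy loop: a zero-timeout poll() on the epoll
 * fd says whether anything is ready at all (a poll() error is treated as
 * "nothing ready"); only then is epoll asked which descriptors.  Returns
 * the number of events stored in `out`. */
static int ready_now(int epfd, struct epoll_event *out, int max)
{
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    if (poll(&pfd, 1, 0) <= 0)
        return 0;
    return epoll_wait(epfd, out, max, 0);
}
```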

* Re: ALSA processor usage is too high
  2010-11-05 19:27           ` Adam Rosenberg
  2010-11-06  5:30             ` Mark Brown
@ 2010-11-08 10:39             ` Clemens Ladisch
  1 sibling, 0 replies; 10+ messages in thread
From: Clemens Ladisch @ 2010-11-08 10:39 UTC
  To: Adam Rosenberg; +Cc: alsa-devel

Adam Rosenberg wrote:
> On Fri, Nov 5, 2010 at 3:10 PM, Clemens Ladisch <clemens@ladisch.de> wrote:
> > And you should use poll() with all eight handles so that you don't
> > eat CPU while waiting.)
> 
> I can't use poll because the application has to perform many other
> tasks in a deterministic manner (meaning I can only use threads and
> other processes to notify the main loop to perform some task).

As Mark wrote, this is what poll() was designed for.

> the main loop of my application basically does this:
> while(1)
> {
>   processNextAlsaStream();
>   processMp3Decoder();
>   processLCD();
>   processInputs();
>   processSerial();
> }

With poll(), it would look somewhat like this:

  struct pollfd pollfds[...];
  // fill pollfds with all handles
  while (1)
  {
     poll(...);
     for (i = 0; i < 8; i++)
       if (stream i ready for writing)
         processAlsaStream(i);
     if (input ready for reading)
       processInputs();
     if (serial ready for whatever)
       processSerial();
     processMp3Decoder();
  }
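The outline above can be turned into a compilable skeleton. In this sketch the dispatch is a callback and all names are hypothetical. With real PCM handles, the descriptors would come from snd_pcm_poll_descriptors(), and each PCM's revents should be passed through snd_pcm_poll_descriptors_revents() before deciding whether the stream is writable; that step is left out here.

```c
#include <poll.h>
#include <unistd.h>   /* pipe(), for trying the example */

/* Handler called for each descriptor that poll() reported as ready. */
typedef void (*ready_fn)(int index, short revents, void *ctx);

/* One pass of an event-driven main loop: sleep in poll() until at least
 * one descriptor is ready or timeout_ms expires, then dispatch only the
 * ready ones.  Returns how many were dispatched, 0 on timeout, or -1 on
 * error (errno set by poll()).  Work that is always pending, such as
 * decoding, can be done by the caller after each pass; the timeout then
 * bounds how long that work can be delayed. */
static int run_once(struct pollfd *fds, int nfds, int timeout_ms,
                    ready_fn on_ready, void *ctx)
{
    int handled = 0;
    int n = poll(fds, (nfds_t)nfds, timeout_ms);

    if (n <= 0)
        return n;
    for (int i = 0; i < nfds; i++) {
        if (fds[i].revents != 0) {
            on_ready(i, fds[i].revents, ctx);
            handled++;
        }
    }
    return handled;
}

/* Example handler: just counts ready descriptors. */
static void count_ready(int index, short revents, void *ctx)
{
    (void)index;
    (void)revents;
    ++*(int *)ctx;
}
```

For example, with a pollfd set watching a pipe's read end for POLLIN and its write end for POLLOUT, only the write end is dispatched, since an empty pipe is writable but not readable.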

If you set the PCM device to non-blocking mode, you do not need to call
avail_update before writing; just try to write as much as you currently
have.
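The "just try to write" pattern can be tried with any non-blocking descriptor. The sketch below uses a pipe and plain write(), which fails with EAGAIN once the descriptor is full. For a PCM opened with SND_PCM_NONBLOCK, snd_pcm_writei() behaves analogously but reports the condition by returning -EAGAIN rather than setting errno. The function names are illustrative.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write as much of buf as the descriptor accepts right now, without a
 * separate "how much space is there?" query.  Returns the number of bytes
 * written (0 if it was already full), or -1 on a real error. */
static ssize_t write_what_fits(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* full: try again after the next poll() */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```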

If you want to do something regularly, use the timeout of poll(), or
use a timerfd.

You mentioned threads; these are not directly supported with poll()
because they do not have a file handle, but if you want to wake up the
main loop, you can write to an eventfd or to a pipe created with pipe().
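A minimal sketch of waking the main loop from a thread with an eventfd (Linux-specific); the main loop adds the eventfd to its poll() set and reads it when it becomes readable. The worker and helper names are hypothetical, and a pipe could be used the same way.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Called from any thread: adds 1 to the eventfd counter, which makes the
 * fd readable for the main loop's poll(). */
static int notify_main_loop(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Called by the main loop when poll() reports efd readable: returns the
 * number of notifications since the last read (they accumulate) and
 * resets the counter to zero. */
static uint64_t consume_notifications(int efd)
{
    uint64_t n = 0;
    if (read(efd, &n, sizeof n) != (ssize_t)sizeof n)
        return 0;
    return n;
}

/* Hypothetical worker: does some job, then wakes the main loop. */
static void *worker(void *arg)
{
    int efd = *(int *)arg;
    /* ... work that should not run in the main loop ... */
    (void)notify_main_loop(efd);
    return NULL;
}
```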


Regards,
Clemens
