All of lore.kernel.org
 help / color / mirror / Atom feed
* rme9652: possible deadlock
@ 2003-04-14 18:47 Jeremy Hall
  2003-04-22 14:16 ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Jeremy Hall @ 2003-04-14 18:47 UTC (permalink / raw)
  To: paul; +Cc: alsa-devel

Hi,

Consider the following:

Two RME9652's are running together and on different interrupts.

The master, in interrupt context, acquires its runtime->lock and begins 
snd_pcm_update_hw_ptr_interrupt()

At the same time, the second card, the slave, is behind, still in play 
mode, and wants to XRUN.  To do that, it must stop and restart all the 
substreams connected to it.  To do that, it must acquire the runtime lock 
of each, but the capture substream is locked in another interrupt.

solution:

Is it acceptible if XRUN occurs in a pcm_multi environment to only restart 
substreams related to that physical card? or is it necessary to restart 
the whole device to maintain sample-sync?

I'm thinking you'd need to restart all devices.  Is this reasonable? as 
in, am I reading the code correctly?

_J


-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-14 18:47 rme9652: possible deadlock Jeremy Hall
@ 2003-04-22 14:16 ` Jaroslav Kysela
  2003-04-22 19:22   ` Jeremy Hall
  0 siblings, 1 reply; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-22 14:16 UTC (permalink / raw)
  To: Jeremy Hall; +Cc: paul@linuxaudiosystems.com, alsa-devel@lists.sourceforge.net

On Mon, 14 Apr 2003, Jeremy Hall wrote:

> Hi,
> 
> Consider the following:
> 
> Two RME9652's are running together and on different interrupts.
> 
> The master, in interrupt context, acquires its runtime->lock and begins 
> snd_pcm_update_hw_ptr_interrupt()
> 
> At the same time, the second card, the slave, is behind, still in play 
> mode, and wants to XRUN.  To do that, it must stop and restart all the 
> substreams connected to it.  To do that, it must acquire the runtime lock 
> of each, but the capture substream is locked in another interrupt.

Yes, there is possible deadlock. Fortunately, we can simply disable 
interrupts for local CPU for critical area (spin_lock -> 
spin_lock_irqsave). Fixed in CVS and thank you for the notice.

						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-22 14:16 ` Jaroslav Kysela
@ 2003-04-22 19:22   ` Jeremy Hall
  2003-04-22 19:42     ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Jeremy Hall @ 2003-04-22 19:22 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

I wrote up a fix in the rme9652 code that fixes this and allows both cards 
to run, preferrably one on one cpu and one on the other.  The only 
remaining corner case is if both cards try to run snd_pcm_stop() at the 
exact same time.  This still doesn't work if you disable interrupts on the 
local CPU and it is not acceptable for me to have to lock a spin_lock and 
serialize the cards, allowing only one to run at a time.  In theory if you 
have both cards running, you can process twice the data in the same amount 
of time.  This should allow you to run 96000 at 64 frames per cycle with 
the same joy that I currently run 48000 at 64 frames_per_cycles--you're 
still moving the same amount of bits, just in the 48 case you're dropping 
half the channels (you'd have to see my pcm_multi posted to this list 
before)

I can provide my rme9652 patch if you'd like.

_J

In the new year, Jaroslav Kysela wrote:
> On Mon, 14 Apr 2003, Jeremy Hall wrote:
> 
> > Hi,
> > 
> > Consider the following:
> > 
> > Two RME9652's are running together and on different interrupts.
> > 
> > The master, in interrupt context, acquires its runtime->lock and begins 
> > snd_pcm_update_hw_ptr_interrupt()
> > 
> > At the same time, the second card, the slave, is behind, still in play 
> > mode, and wants to XRUN.  To do that, it must stop and restart all the 
> > substreams connected to it.  To do that, it must acquire the runtime lock 
> > of each, but the capture substream is locked in another interrupt.
> 
> Yes, there is possible deadlock. Fortunately, we can simply disable 
> interrupts for local CPU for critical area (spin_lock -> 
> spin_lock_irqsave). Fixed in CVS and thank you for the notice.
> 
> 						Jaroslav
> 
> -----
> Jaroslav Kysela <perex@suse.cz>
> Linux Kernel Sound Maintainer
> ALSA Project, SuSE Labs
> 
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Alsa-devel mailing list
> Alsa-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/alsa-devel
> 



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-22 19:22   ` Jeremy Hall
@ 2003-04-22 19:42     ` Jaroslav Kysela
  2003-04-22 22:04       ` Jeremy Hall
  0 siblings, 1 reply; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-22 19:42 UTC (permalink / raw)
  To: Jeremy Hall; +Cc: paul@linuxaudiosystems.com, alsa-devel@lists.sourceforge.net

On Tue, 22 Apr 2003, Jeremy Hall wrote:

> I wrote up a fix in the rme9652 code that fixes this and allows both cards 
> to run, preferrably one on one cpu and one on the other.  The only 
> remaining corner case is if both cards try to run snd_pcm_stop() at the 
> exact same time.  This still doesn't work if you disable interrupts on the 
> local CPU and it is not acceptable for me to have to lock a spin_lock and 
> serialize the cards, allowing only one to run at a time.  In theory if you 

Sorry, but spinlocking of the arbiter code in interrupt has nothing to do 
with the sample DMA transfers thus stream throughput. The streaming is not 
stopped with disabling interrupts for local CPU.

						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-22 19:42     ` Jaroslav Kysela
@ 2003-04-22 22:04       ` Jeremy Hall
  2003-04-23  9:51         ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Jeremy Hall @ 2003-04-22 22:04 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

If both cards try to run snd_pcm_stop() at the same time, one on one CPU 
one on the other, how can disabling local interrupts help? The code takes 
great care to not try to acquire its own substream->runtime->lock because 
the calling process has already acquired this lock.  But if both are 
trying to run snd_pcm_stop at the same time, CPU0 will try to acquire 
CPU1's lock, and CPU1 will try to acquire CPU0's.  Since neither will give 
up his lock, deadlock will occur.

Since the only solution I can see at the moment is to only allow one card 
to process at a time, I find that repulsive and therefore would like to 
see a different solution.

Therefore I would like to see a mechanism brought down through the calling 
hierarchy so that alsa knows that all four substreams are linked and 
therefore should only be governed by a single lock and further more that 
if one card XRUNs and the other doesn't, the offensive card should put 
silence in his buffer to maintain sample-sync.

Both cards are indeed running from the same clock, but sometimes we can't 
get around to servicing an interrupt so we XRUN--things are going too fast 
for even the slightest gitter.

Consider the case where left channel data is on one card, right is on the 
other.  If sample-synch is broekn, the channels will drift out of phase, 
permanently, and the application has no idea it has happened.  I think a 
better solution than pot luck needs to happen, although it may indeed be 
happening--I just don't understand the alsa code very well.  In a 
professional environment, a XRUN isn't a good thing, but a partial XRUN is 
a desaster.

Please understand I am not upset at you, I am not upset at all, I simply 
want to make my views known and to let you know this is something I feel 
quite strongly about.

_J

In the new year, Jaroslav Kysela wrote:
> On Tue, 22 Apr 2003, Jeremy Hall wrote:
> 
> > I wrote up a fix in the rme9652 code that fixes this and allows both cards 
> > to run, preferrably one on one cpu and one on the other.  The only 
> > remaining corner case is if both cards try to run snd_pcm_stop() at the 
> > exact same time.  This still doesn't work if you disable interrupts on the 
> > local CPU and it is not acceptable for me to have to lock a spin_lock and 
> > serialize the cards, allowing only one to run at a time.  In theory if you 
> 
> Sorry, but spinlocking of the arbiter code in interrupt has nothing to do 
> with the sample DMA transfers thus stream throughput. The streaming is not 
> stopped with disabling interrupts for local CPU.
> 
> 						Jaroslav
> 
> -----
> Jaroslav Kysela <perex@suse.cz>
> Linux Kernel Sound Maintainer
> ALSA Project, SuSE Labs
> 



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-22 22:04       ` Jeremy Hall
@ 2003-04-23  9:51         ` Jaroslav Kysela
  2003-04-23 15:06           ` Jeremy Hall
  0 siblings, 1 reply; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-23  9:51 UTC (permalink / raw)
  To: Jeremy Hall; +Cc: paul@linuxaudiosystems.com, alsa-devel@lists.sourceforge.net

On Tue, 22 Apr 2003, Jeremy Hall wrote:

> If both cards try to run snd_pcm_stop() at the same time, one on one CPU 
> one on the other, how can disabling local interrupts help? The code takes 
> great care to not try to acquire its own substream->runtime->lock because 
> the calling process has already acquired this lock.  But if both are 
> trying to run snd_pcm_stop at the same time, CPU0 will try to acquire 
> CPU1's lock, and CPU1 will try to acquire CPU0's.  Since neither will give 
> up his lock, deadlock will occur.

It's not true. Spinlocks work in this way:

1) try acquire atomically lock
   - success -> continue
   - fail -> goto 1)

So, CPU0 will get lock and CPU1 will be in a loop until lock is released
in SMP. The deadlock is if one CPU wants to enter to a spinlocked code
area more than once. Because linked cards might be on different IRQ lines,
there is a possibility for a deadlock for one CPU. But if you disable the
interrupts for this CPU before you lock the spinlocked area, the deadlock
cannot occur, because this CPU doesn't accept interrupts at all, thus it
cannot try to acquire same lock again.

> Since the only solution I can see at the moment is to only allow one card 
> to process at a time, I find that repulsive and therefore would like to 
> see a different solution.
> 
> Therefore I would like to see a mechanism brought down through the calling 
> hierarchy so that alsa knows that all four substreams are linked and 
> therefore should only be governed by a single lock and further more that 
> if one card XRUNs and the other doesn't, the offensive card should put 
> silence in his buffer to maintain sample-sync.

All cards (better linked streams) XRUNs at same moment in current
implementation. The only drawback is that interrupts are processed for all
streams separately. It might be optimized for streams with hardware sync,
but it cannot be for streams without hardware sync. Also, the optimization
is negligible in my eyes, because we can reduce only calls to determine
the current position in the ring buffer, all other arbiter code must be
preserved.

> Both cards are indeed running from the same clock, but sometimes we can't 
> get around to servicing an interrupt so we XRUN--things are going too fast 
> for even the slightest gitter.
> 
> Consider the case where left channel data is on one card, right is on the 
> other.  If sample-synch is broekn, the channels will drift out of phase, 
> permanently, and the application has no idea it has happened.  I think a 
> better solution than pot luck needs to happen, although it may indeed be 
> happening--I just don't understand the alsa code very well.  In a 
> professional environment, a XRUN isn't a good thing, but a partial XRUN is 
> a desaster.

If we stop all linked streams at one time XRUN, application must restart 
all things and the driver will GUERANTEE that the restarted streams are 
in sample sync if hardware supports this feature. I don't see any problem 
in this area.

And yes, applications might use no-xrun mode. The sample sync is always 
guaranteed, but the transferred stream might be invalid.

Anyway, XRUNs are bad for "profi" audio. If you cannot avoid them with 
your hardware and software configuration, it's bad. I see large areas in 
hardware and software tuning for audio gurus.

						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs





-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-23  9:51         ` Jaroslav Kysela
@ 2003-04-23 15:06           ` Jeremy Hall
  2003-04-23 19:19             ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Jeremy Hall @ 2003-04-23 15:06 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

In the new year, Jaroslav Kysela wrote:
> On Tue, 22 Apr 2003, Jeremy Hall wrote:
> 
> > If both cards try to run snd_pcm_stop() at the same time, one on one CPU 
> > one on the other, how can disabling local interrupts help? The code takes 
> > great care to not try to acquire its own substream->runtime->lock because 
> > the calling process has already acquired this lock.  But if both are 
> > trying to run snd_pcm_stop at the same time, CPU0 will try to acquire 
> > CPU1's lock, and CPU1 will try to acquire CPU0's.  Since neither will give 
> > up his lock, deadlock will occur.
> 
> It's not true. Spinlocks work in this way:
> 
> 1) try acquire atomically lock
>    - success -> continue
>    - fail -> goto 1)
> 
> So, CPU0 will get lock and CPU1 will be in a loop until lock is released
> in SMP. The deadlock is if one CPU wants to enter to a spinlocked code
> area more than once. Because linked cards might be on different IRQ lines,
> there is a possibility for a deadlock for one CPU. But if you disable the
> interrupts for this CPU before you lock the spinlocked area, the deadlock
> cannot occur, because this CPU doesn't accept interrupts at all, thus it
> cannot try to acquire same lock again.
> 
ok, it might not be a deadlock, but here is how I see it as I read through 
the code: Let us start with _SND_PCM_ACTION()

in pcm_native.c: 606

res is set to 0

a read lock snd_pcm_link_lock is acquired on CPU0, caller's 
substream->runtime->lock is already acquired.  Because this is a macro, it 
is unclear to me fully how this plays out.

On CPU1, its substream->runtime->lock is acquired from its calling 
process.  If you are permitted to have multiple read locks, CPU1 might 
also acquire the read lock, if not, it will wait indefinitely here.

cheerfully cooking along, CPU0 begins a while loop, while s != substream, 
its given substream (itself), so in essence iterate through all substreams 
that aren't itself

{
if certain the iterator s is not equal to the substream the caller gave 
you, i.e. yourself, acquire the runtime lock.

CPU0 iterates through snd_pcm_pre_stop, correcting for errors etc for the 
substreams until it hits the one that CPU1 is processing.  Whether CPU1 is 
waiting on the read lock to be released or has iterated this far in the 
processing doesn't matter because neither will ever release their locks.  
Both CPUs will wait forever, or until they burn up, because each process 
is depending on a backdoor (if you will allow me to loosely use this term) 
to get the job done.

}

I do not intend to be patronizing, it just helps me to see things written 
out like this and helps expose misconceptions on my part. (I have a long 
way to go on learning kernel)

> > Since the only solution I can see at the moment is to only allow one card 
> > to process at a time, I find that repulsive and therefore would like to 
> > see a different solution.
> > 
> > Therefore I would like to see a mechanism brought down through the calling 
> > hierarchy so that alsa knows that all four substreams are linked and 
> > therefore should only be governed by a single lock and further more that 
> > if one card XRUNs and the other doesn't, the offensive card should put 
> > silence in his buffer to maintain sample-sync.
> 
> All cards (better linked streams) XRUNs at same moment in current
> implementation. The only drawback is that interrupts are processed for all
> streams separately. It might be optimized for streams with hardware sync,
> but it cannot be for streams without hardware sync. Also, the optimization
> is negligible in my eyes, because we can reduce only calls to determine
> the current position in the ring buffer, all other arbiter code must be
> preserved.
> 
They may XRUN at near the same time, but because interrupts are 
asynchronous, it is very possible for one card to process a different 
number of frames than the other (because of the XRUN) and because the 
application assumes that if a XRUN is reported it happened for the whole 
device.  If your buffer is 4096 frames per fragment, and for the 
simplistic case there are two fragments, somewhere in that lot one card 
XRUNs, the other doesn't.  Now one card is at time x, the other at (worst 
case) 4096 - x.  What does the application do? throw out its data for the 
whole fragment? (for both cards)

actually it could be a LONG ways off, it could be several periods off, in 
fact if they get out of synch, the application could read old data from 
the slow card's memory.  When does the application get woken? does alsa 
make sure that both cards are done before waking the application?

> > Both cards are indeed running from the same clock, but sometimes we can't 
> > get around to servicing an interrupt so we XRUN--things are going too fast 
> > for even the slightest gitter.
> > 
> > Consider the case where left channel data is on one card, right is on the 
> > other.  If sample-synch is broekn, the channels will drift out of phase, 
> > permanently, and the application has no idea it has happened.  I think a 
> > better solution than pot luck needs to happen, although it may indeed be 
> > happening--I just don't understand the alsa code very well.  In a 
> > professional environment, a XRUN isn't a good thing, but a partial XRUN is 
> > a desaster.
> 
> If we stop all linked streams at one time XRUN, application must restart 
> all things and the driver will GUERANTEE that the restarted streams are 
> in sample sync if hardware supports this feature. I don't see any problem 
> in this area.
> 
yes, if we stop them at one time, but we might not stop them at the "same" 
time.

> And yes, applications might use no-xrun mode. The sample sync is always 
> guaranteed, but the transferred stream might be invalid.
> 
What does jack use?

> Anyway, XRUNs are bad for "profi" audio. If you cannot avoid them with 
> your hardware and software configuration, it's bad. I see large areas in 
> hardware and software tuning for audio gurus.
> 
I'm working on them, one at a time. :) I'm having moderate success, 
however.

_J

> 						Jaroslav
> 
> -----
> Jaroslav Kysela <perex@suse.cz>
> Linux Kernel Sound Maintainer
> ALSA Project, SuSE Labs
> 
> 
> 



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-23 15:06           ` Jeremy Hall
@ 2003-04-23 19:19             ` Jaroslav Kysela
  2003-04-24 10:03               ` Abramo Bagnara
  2003-04-28  8:44               ` Jeremy Hall
  0 siblings, 2 replies; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-23 19:19 UTC (permalink / raw)
  To: Jeremy Hall; +Cc: paul@linuxaudiosystems.com, alsa-devel@lists.sourceforge.net

On Wed, 23 Apr 2003, Jeremy Hall wrote:

> In the new year, Jaroslav Kysela wrote:
> > On Tue, 22 Apr 2003, Jeremy Hall wrote:
> > 
> > > If both cards try to run snd_pcm_stop() at the same time, one on one CPU 
> > > one on the other, how can disabling local interrupts help? The code takes 
> > > great care to not try to acquire its own substream->runtime->lock because 
> > > the calling process has already acquired this lock.  But if both are 
> > > trying to run snd_pcm_stop at the same time, CPU0 will try to acquire 
> > > CPU1's lock, and CPU1 will try to acquire CPU0's.  Since neither will give 
> > > up his lock, deadlock will occur.
> > 
> > It's not true. Spinlocks work in this way:
> > 
> > 1) try acquire atomically lock
> >    - success -> continue
> >    - fail -> goto 1)
> > 
> > So, CPU0 will get lock and CPU1 will be in a loop until lock is released
> > in SMP. The deadlock is if one CPU wants to enter to a spinlocked code
> > area more than once. Because linked cards might be on different IRQ lines,
> > there is a possibility for a deadlock for one CPU. But if you disable the
> > interrupts for this CPU before you lock the spinlocked area, the deadlock
> > cannot occur, because this CPU doesn't accept interrupts at all, thus it
> > cannot try to acquire same lock again.
> > 
> ok, it might not be a deadlock, but here is how I see it as I read through 
> the code: Let us start with _SND_PCM_ACTION()
> 
> in pcm_native.c: 606
> 
> res is set to 0
> 
> a read lock snd_pcm_link_lock is acquired on CPU0, caller's 
> substream->runtime->lock is already acquired.  Because this is a macro, it 
> is unclear to me fully how this plays out.
> 
> On CPU1, its substream->runtime->lock is acquired from its calling 
> process.  If you are permitted to have multiple read locks, CPU1 might 
> also acquire the read lock, if not, it will wait indefinitely here.
> 
> cheerfully cooking along, CPU0 begins a while loop, while s != substream, 
> its given substream (itself), so in essence iterate through all substreams 
> that aren't itself
> 
> {
> if certain the iterator s is not equal to the substream the caller gave 
> you, i.e. yourself, acquire the runtime lock.
> 
> CPU0 iterates through snd_pcm_pre_stop, correcting for errors etc for the 
> substreams until it hits the one that CPU1 is processing.  Whether CPU1 is 
> waiting on the read lock to be released or has iterated this far in the 
> processing doesn't matter because neither will ever release their locks.  
> Both CPUs will wait forever, or until they burn up, because each process 
> is depending on a backdoor (if you will allow me to loosely use this term) 
> to get the job done.
> 
> }
> 
> I do not intend to be patronizing, it just helps me to see things written 
> out like this and helps expose misconceptions on my part. (I have a long 
> way to go on learning kernel)

Bingo, I got it. You mean this scenario (linked two streams):

CPU0: stream1 interrupt: stream1 locked -> snd_pcm_stop()
CPU1: stream2 interrupt: stream2 locked -> snd_pcm_stop()
CPU0: try to lock stream2 (in ACTION macro)
CPU1: try to lock stream1 (in ACTION macro)

Yes, this is next deadlock issue.

I have probably only one solution in my brain: Separate snd_pcm_stop() to 
two steps:

1) stop only the locked stream and unlock it
2) stop all other linked streams

> > > Since the only solution I can see at the moment is to only allow one card 
> > > to process at a time, I find that repulsive and therefore would like to 
> > > see a different solution.
> > > 
> > > Therefore I would like to see a mechanism brought down through the calling 
> > > hierarchy so that alsa knows that all four substreams are linked and 
> > > therefore should only be governed by a single lock and further more that 
> > > if one card XRUNs and the other doesn't, the offensive card should put 
> > > silence in his buffer to maintain sample-sync.
> > 
> > All cards (better linked streams) XRUNs at same moment in current
> > implementation. The only drawback is that interrupts are processed for all
> > streams separately. It might be optimized for streams with hardware sync,
> > but it cannot be for streams without hardware sync. Also, the optimization
> > is negligible in my eyes, because we can reduce only calls to determine
> > the current position in the ring buffer, all other arbiter code must be
> > preserved.
> > 
> They may XRUN at near the same time, but because interrupts are 
> asynchronous, it is very possible for one card to process a different 
> number of frames than the other (because of the XRUN) and because the 
> application assumes that if a XRUN is reported it happened for the whole 
> device.  If your buffer is 4096 frames per fragment, and for the 
> simplistic case there are two fragments, somewhere in that lot one card 
> XRUNs, the other doesn't.  Now one card is at time x, the other at (worst 
> case) 4096 - x.  What does the application do? throw out its data for the 
> whole fragment? (for both cards)
> 
> actually it could be a LONG ways off, it could be several periods off, in 
> fact if they get out of synch, the application could read old data from 
> the slow card's memory.  When does the application get woken? does alsa 
> make sure that both cards are done before waking the application?

No, if hardware doesn't do the exact sync, we don't care. We describe
hardware and it's definitely out of scope of kernel's code. It's up to
application (or maybe pcm_multi code - not implemented) to handle clock
drifts.

> > > Both cards are indeed running from the same clock, but sometimes we can't 
> > > get around to servicing an interrupt so we XRUN--things are going too fast 
> > > for even the slightest gitter.
> > > 
> > > Consider the case where left channel data is on one card, right is on the 
> > > other.  If sample-synch is broekn, the channels will drift out of phase, 
> > > permanently, and the application has no idea it has happened.  I think a 
> > > better solution than pot luck needs to happen, although it may indeed be 
> > > happening--I just don't understand the alsa code very well.  In a 
> > > professional environment, a XRUN isn't a good thing, but a partial XRUN is 
> > > a desaster.
> > 
> > If we stop all linked streams at one time XRUN, application must restart 
> > all things and the driver will GUERANTEE that the restarted streams are 
> > in sample sync if hardware supports this feature. I don't see any problem 
> > in this area.
> > 
> yes, if we stop them at one time, but we might not stop them at the "same" 
> time.

Note: We don't even stop the stream for once card "precisely" at period 
boundary (if interrupt arrived late) so it's academic question. Does it 
hurt something if streams are in different position after XRUN? They must 
be restarted anyway.

> > And yes, applications might use no-xrun mode. The sample sync is always 
> > guaranteed, but the transferred stream might be invalid.
> > 
> What does jack use?

Jack uses standard xrun mode and Paul doesn't probably work with multiple 
cards connected together. He uses RME hardware with 24 channels.

						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-23 19:19             ` Jaroslav Kysela
@ 2003-04-24 10:03               ` Abramo Bagnara
  2003-04-28  8:23                 ` Jeremy Hall
  2003-04-28  8:25                 ` Jeremy Hall
  2003-04-28  8:44               ` Jeremy Hall
  1 sibling, 2 replies; 19+ messages in thread
From: Abramo Bagnara @ 2003-04-24 10:03 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

Jaroslav Kysela ha scritto:
> 
> 
> Bingo, I got it. You mean this scenario (linked two streams):
> 
> CPU0: stream1 interrupt: stream1 locked -> snd_pcm_stop()
> CPU1: stream2 interrupt: stream2 locked -> snd_pcm_stop()
> CPU0: try to lock stream2 (in ACTION macro)
> CPU1: try to lock stream1 (in ACTION macro)
> 
> Yes, this is next deadlock issue.
> 
> I have probably only one solution in my brain: Separate snd_pcm_stop() to 
> two steps:
> 
> 1) stop only the locked stream and unlock it
> 2) stop all other linked streams

My proposal is to have a per "linked PCM group" lock and to use it for 
proper locking (with also some side benefits substituting some use of 
snd_pcm_link_lock).

This mean also we need to change the linked list struct from

struct _snd_pcm_substream {
	...
         /* -- linked substreams -- */
         snd_pcm_substream_t *link_next;
         snd_pcm_substream_t *link_prev;

to
	struct list_head link_list;
	snd_pcm_link_t *link;	/* 0 for non linked stream */

typedef struct _snd_pcm_link {
	spinlock_t lock;
	struct list_head substreams;
};

Of course creation and destruction of a snd_pcm_link_t need to be 
protected by snd_pcm_link_lock.


-- 
Abramo Bagnara                       mailto:abramo.bagnara@libero.it

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-24 10:03               ` Abramo Bagnara
@ 2003-04-28  8:23                 ` Jeremy Hall
  2003-04-28  8:58                   ` Abramo Bagnara
  2003-04-28  8:25                 ` Jeremy Hall
  1 sibling, 1 reply; 19+ messages in thread
From: Jeremy Hall @ 2003-04-28  8:23 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Jaroslav Kysela, Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

Will this fix allow both CPUs to run snd_pcm_period_elapsed() and friends 
concurrently and then if XRUN is encountered grab the group lock, or will 
one stream run at a time, in effect only allowing one card to process at a 
time?

Maybe I misunderstand this code, but my understanding is that this moves 
the pointer along in the ringbuffer.  Since two cards in a pcm_multi can 
use different areas, they should be able to process() at the same time, 
until XRUN is encountered, then the "right" thing should happen according 
to maintaining sample-synch.  My understanding is this can't happen 
currently with alsa and this is unfortunate.

_J

In the new year, Abramo Bagnara wrote:
> Jaroslav Kysela ha scritto:
> > 
> > 
> > Bingo, I got it. You mean this scenario (linked two streams):
> > 
> > CPU0: stream1 interrupt: stream1 locked -> snd_pcm_stop()
> > CPU1: stream2 interrupt: stream2 locked -> snd_pcm_stop()
> > CPU0: try to lock stream2 (in ACTION macro)
> > CPU1: try to lock stream1 (in ACTION macro)
> > 
> > Yes, this is next deadlock issue.
> > 
> > I have probably only one solution in my brain: Separate snd_pcm_stop() to 
> > two steps:
> > 
> > 1) stop only the locked stream and unlock it
> > 2) stop all other linked streams
> 
> My proposal is to have a per "linked PCM group" lock and to use it for 
> proper locking (with also some side benefits substituting some use of 
> snd_pcm_link_lock).
> 
> This mean also we need to change the linked list struct from
> 
> struct _snd_pcm_substream {
> 	...
>          /* -- linked substreams -- */
>          snd_pcm_substream_t *link_next;
>          snd_pcm_substream_t *link_prev;
> 
> to
> 	struct list_head link_list;
> 	snd_pcm_link_t *link;	/* 0 for non linked stream */
> 
> typedef struct _snd_pcm_link {
> 	spinlock_t lock;
> 	struct list_head substreams;
> };
> 
> Of course creation and destruction of a snd_pcm_link_t need to be 
> protected by snd_pcm_link_lock.
> 
> 
> -- 
> Abramo Bagnara                       mailto:abramo.bagnara@libero.it
> 
> Opera Unica                          Phone: +39.546.656023
> Via Emilia Interna, 140
> 48014 Castel Bolognese (RA) - Italy
> 
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Alsa-devel mailing list
> Alsa-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/alsa-devel
> 



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-24 10:03               ` Abramo Bagnara
  2003-04-28  8:23                 ` Jeremy Hall
@ 2003-04-28  8:25                 ` Jeremy Hall
  1 sibling, 0 replies; 19+ messages in thread
From: Jeremy Hall @ 2003-04-28  8:25 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Jaroslav Kysela, Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

now I see your fix is in CVS, but I currently compile alsa into the kernel 
to aid in gdb.  Is it possible to compile alsa-kernel directly into the 
kernel? Does it have the irqreturn_t stuff in CVS?

Is there some doc I have shamelessly not read that explains this answer?

_J

In the new year, Abramo Bagnara wrote:
> Jaroslav Kysela ha scritto:
> > 
> > 
> > Bingo, I got it. You mean this scenario (linked two streams):
> > 
> > CPU0: stream1 interrupt: stream1 locked -> snd_pcm_stop()
> > CPU1: stream2 interrupt: stream2 locked -> snd_pcm_stop()
> > CPU0: try to lock stream2 (in ACTION macro)
> > CPU1: try to lock stream1 (in ACTION macro)
> > 
> > Yes, this is next deadlock issue.
> > 
> > I have probably only one solution in my brain: Separate snd_pcm_stop() to 
> > two steps:
> > 
> > 1) stop only the locked stream and unlock it
> > 2) stop all other linked streams
> 
> My proposal is to have a per "linked PCM group" lock and to use it for 
> proper locking (with also some side benefits substituting some use of 
> snd_pcm_link_lock).
> 
> This mean also we need to change the linked list struct from
> 
> struct _snd_pcm_substream {
> 	...
>          /* -- linked substreams -- */
>          snd_pcm_substream_t *link_next;
>          snd_pcm_substream_t *link_prev;
> 
> to
> 	struct list_head link_list;
> 	snd_pcm_link_t *link;	/* 0 for non linked stream */
> 
> typedef struct _snd_pcm_link {
> 	spinlock_t lock;
> 	struct list_head substreams;
> };
> 
> Of course creation and destruction of a snd_pcm_link_t need to be 
> protected by snd_pcm_link_lock.
> 
> 
> -- 
> Abramo Bagnara                       mailto:abramo.bagnara@libero.it
> 
> Opera Unica                          Phone: +39.546.656023
> Via Emilia Interna, 140
> 48014 Castel Bolognese (RA) - Italy
> 



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-23 19:19             ` Jaroslav Kysela
  2003-04-24 10:03               ` Abramo Bagnara
@ 2003-04-28  8:44               ` Jeremy Hall
  1 sibling, 0 replies; 19+ messages in thread
From: Jeremy Hall @ 2003-04-28  8:44 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

oh I didn't see this bit before

In the new year, Jaroslav Kysela wrote:
> On Wed, 23 Apr 2003, Jeremy Hall wrote:
> 
> > In the new year, Jaroslav Kysela wrote:
> > > On Tue, 22 Apr 2003, Jeremy Hall wrote:
> > > 
> > > > If both cards try to ru> > > If we stop all linked streams at one time XRUN, application must restart 
> > > all things and the driver will GUERANTEE that the restarted streams are 
> > > in sample sync if hardware supports this feature. I don't see any problem 
> > > in this area.
> > > 
> > yes, if we stop them at one time, but we might not stop them at the "same" 
> > time.
> 
> Note: We don't even stop the stream for once card "precisely" at period 
> boundary (if interrupt arrived late) so it's academic question. Does it 
> hurt something if streams are in different position after XRUN? They must 
> be restarted anyway.
> 
If we know exactly where and on what channels the XRUN occurred, we might 
be able to save more audio and minimize the xrun to the actual xrun time 
rather than the nearest fragment boundary (if that is what we do) if our 
fragment size (if you will allow me to use a deprecated term) is big, say 
4096 frames, then you're telling me that if a xrun happens in that moment 
the whole 4096 frames will be lost? or 8192 or whatever you set your 
fragsize to

> > > And yes, applications might use no-xrun mode. The sample sync is always 
> > > guaranteed, but the transferred stream might be invalid.
> > > 
> > What does jack use?
> 
> Jack uses standard xrun mode and Paul doesn't probably work with multiple 
> cards connected together. He uses RME hardware with 24 channels.

I have two rme9652 connected together synced with wordclock and have 
chosen to cable them so that the 24 channels are spread across both cards 
in such a way that to use 24 channels at 96 all I need to do is press the 
DS button on my adi8DS's and don't need to reconfig the optical cables.  
So in theory I should have hardware synch, but in the case of XRUN it 
seems both cards lose.

I think we are speaking past one another.  You say that sample-synch is 
guaranteed, and that is because both cards have the same fragment size and 
are told to go get one fragment.  Paul has considered the possability that 
the RME9652 may not tell us the hw pointer position properly--maybe there 
is a firmware upgrade for this--maybe not, but is that why we must go on 
fragment synch?

but I am saying that if an XRUN occurs time is no longer a constant, and 
thus samples in the buffer are no longer relative.  Restarting the 
substreams might be similar to closing the barn door after the horse has 
already run away, closing an empty stall.

_J

> 
> 						Jaroslav
> 
> -----
> Jaroslav Kysela <perex@suse.cz>
> Linux Kernel Sound Maintainer
> ALSA Project, SuSE Labs
> 
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Alsa-devel mailing list
> Alsa-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/alsa-devel
> 



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28  8:23                 ` Jeremy Hall
@ 2003-04-28  8:58                   ` Abramo Bagnara
  2003-04-28  9:58                     ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Abramo Bagnara @ 2003-04-28  8:58 UTC (permalink / raw)
  To: Jeremy Hall
  Cc: Jaroslav Kysela, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

Jeremy Hall ha scritto:
> Will this fix allow both CPUs to run snd_pcm_period_elapsed() and friends 
> concurrently and then if XRUN is encountered grab the group lock, or will 
> one stream run at a time, in effect only allowing one card to process at a 
> time?

As I've pointed to Jaroslav, current CVS code does not permit to both 
CPUs to run the interrupt handler of two linked streams concurrently.

This is due to unconditional substitution of stream specific lock with a 
linked group lock.

This is of course deadlock safe, but I think it's not the right solution.

I draw my current proposal (my previous proposal to Jaroslav concerning 
that was flawed):

let's call stream specific locks S and the linked group lock G.

Whomever need to take stream specific lock take S.

When this need to be promoted to G, my proposal is to use code like this:

	if (!spin_trylock(G)) {
		spin_unlock(S);
		spin_lock(G);
		spin_lock(S);
	}

This should solve all deadlock issues while retaining minimal lock 
granularity.

-- 
Abramo Bagnara                       mailto:abramo.bagnara@libero.it

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28  8:58                   ` Abramo Bagnara
@ 2003-04-28  9:58                     ` Jaroslav Kysela
  2003-04-28 10:11                       ` Abramo Bagnara
  0 siblings, 1 reply; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-28  9:58 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

On Mon, 28 Apr 2003, Abramo Bagnara wrote:

> Jeremy Hall ha scritto:
> > Will this fix allow both CPUs to run snd_pcm_period_elapsed() and friends 
> > concurrently and then if XRUN is encountered grab the group lock, or will 
> > one stream run at a time, in effect only allowing one card to process at a 
> > time?
> 
> As I've pointed to Jaroslav, current CVS code does not permit to both 
> CPUs to run the interrupt handler of two linked streams concurrently.
> 
> This is due to unconditional substitution of stream specific lock with a 
> linked group lock.
> 
> This is of course deadlock safe, but I think it's not the right solution.

Sure, it was 1st version ;-)

> I draw my current proposal (my previous proposal to Jaroslav concerning 
> that was flawed):
> 
> let's call stream specific locks S and the linked group lock G.
> 
> Whomever need to take stream specific lock take S.
> 
> When this need to be promoted to G, my proposal is to use code like this:
> 
> 	if (!spin_trylock(G)) {
> 		spin_unlock(S);
> 		spin_lock(G);
> 		spin_lock(S);
> 	}
> 
> This should solve all deadlock issues while retaining minimal lock 
> granularity.

It seems a bit complicated to me. We can do easy this block:

read_lock(link_manager);
spin_lock(stream);
....
if (stream_in_group)
   spin_lock(group);
....
if (stream_in_group)
   spin_unlock(group);
....
spin_unlock(stream);
read_unlock(link_manager);

I cannot imagine any deadlock with this locking scheme, because stream 
locks are independent now.

Code is now in CVS for revision.

						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28  9:58                     ` Jaroslav Kysela
@ 2003-04-28 10:11                       ` Abramo Bagnara
  2003-04-28 12:27                         ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Abramo Bagnara @ 2003-04-28 10:11 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

Jaroslav Kysela ha scritto:
> On Mon, 28 Apr 2003, Abramo Bagnara wrote:
> 
> 
>>Jeremy Hall ha scritto:
>>
>>>Will this fix allow both CPUs to run snd_pcm_period_elapsed() and friends 
>>>concurrently and then if XRUN is encountered grab the group lock, or will 
>>>one stream run at a time, in effect only allowing one card to process at a 
>>>time?
>>
>>As I've pointed to Jaroslav, current CVS code does not permit to both 
>>CPUs to run the interrupt handler of two linked streams concurrently.
>>
>>This is due to unconditional substitution of stream specific lock with a 
>>linked group lock.
>>
>>This is of course deadlock safe, but I think it's not the right solution.
> 
> 
> Sure, it was 1st version ;-)

Your missed answer to my objections made me suspected that ;-)

> It seems a bit complicated to me. We can do easy this block:
> 
> read_lock(link_manager);
> spin_lock(stream);
> ....
> if (stream_in_group)
>    spin_lock(group);
> ....
> if (stream_in_group)
>    spin_unlock(group);
> ....
> spin_unlock(stream);
> read_unlock(link_manager);
> 
> I cannot imagine any deadlock with this locking scheme, because stream 
> locks are independent now.

CPU1:

read_lock(link_manager);	read_lock(link_manager);
spin_lock(s1);			spin_lock(s2);
if (stream_in_group)		if (stream_in_group)
	spin_lock(group);		spin_lock(group); /* Waiting */
spin_lock(s2); /* BOOM! */

I guess that 3rd version will be the one I propose ;-)

(but probably the right one will be the forthcoming 4th)

-- 
Abramo Bagnara                       mailto:abramo.bagnara@libero.it

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28 10:11                       ` Abramo Bagnara
@ 2003-04-28 12:27                         ` Jaroslav Kysela
  2003-04-28 12:47                           ` Abramo Bagnara
  0 siblings, 1 reply; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-28 12:27 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

On Mon, 28 Apr 2003, Abramo Bagnara wrote:

> > It seems a bit complicated to me. We can do easy this block:
> > 
> > read_lock(link_manager);
> > spin_lock(stream);
> > ....
> > if (stream_in_group)
> >    spin_lock(group);
> > ....
> > if (stream_in_group)
> >    spin_unlock(group);
> > ....
> > spin_unlock(stream);
> > read_unlock(link_manager);
> > 
> > I cannot imagine any deadlock with this locking scheme, because stream 
> > locks are independent now.
> 
> CPU1:
> 
> read_lock(link_manager);	read_lock(link_manager);
> spin_lock(s1);			spin_lock(s2);
> if (stream_in_group)		if (stream_in_group)
> 	spin_lock(group);		spin_lock(group); /* Waiting */
> spin_lock(s2); /* BOOM! */
> 
> I guess that 3rd version will be the one I propose ;-)

Yep, it is in CVS now ;-) Except I don't use trylock...

read_lock(link_manager);
spin_lock(stream);
....
if (stream_in_group) {
    spin_lock(group);
    spin_unlock(stream);
}
....
link_for_each(s) {	/* note that streams are locked in defined order now */
    if (stream_in_group)
        spin_lock(s);
}
....
list_for_each(s) {
    if (stream_in_group)
        spin_unlock(s);
}
....
if (stream_in_group) {
    spin_lock(stream);
    spin_unlock(group);
}
....
spin_unlock(stream);
read_unlock(link_manager);

						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs





-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28 12:27                         ` Jaroslav Kysela
@ 2003-04-28 12:47                           ` Abramo Bagnara
  2003-04-28 18:16                             ` Jaroslav Kysela
  0 siblings, 1 reply; 19+ messages in thread
From: Abramo Bagnara @ 2003-04-28 12:47 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

Jaroslav Kysela ha scritto:
>>CPU1:
>>
>>read_lock(link_manager);	read_lock(link_manager);
>>spin_lock(s1);			spin_lock(s2);
>>if (stream_in_group)		if (stream_in_group)
>>	spin_lock(group);		spin_lock(group); /* Waiting */
>>spin_lock(s2); /* BOOM! */
>>
>>I guess that 3rd version will be the one I propose ;-)
> 
> 
> Yep, it is in CVS now ;-) Except I don't use trylock...
> 
> read_lock(link_manager);
> spin_lock(stream);
> ....
> if (stream_in_group) {
>     spin_lock(group);
>     spin_unlock(stream);
> }
> ....
> link_for_each(s) {	/* note that streams are locked in defined order now */
>     if (stream_in_group)
>         spin_lock(s);
> }
> ....
> list_for_each(s) {
>     if (stream_in_group)
>         spin_unlock(s);
> }
> ....
> if (stream_in_group) {
>     spin_lock(stream);
>     spin_unlock(group);
> }
> ....
> spin_unlock(stream);
> read_unlock(link_manager);

CPU1:				CPU2:
read_lock(link_manager);	read_lock(link_manager);
spin_lock(s1);			spin_lock(s2);
spin_lock(group);		spin_lock(group); // waiting
spin_unlock(s1);
//inside list_for_each
spin_lock(s1);
spin_lock(s2); // BOOM!

I fear you cannot avoid trylock ;-)

-- 
Abramo Bagnara                       mailto:abramo.bagnara@libero.it

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28 12:47                           ` Abramo Bagnara
@ 2003-04-28 18:16                             ` Jaroslav Kysela
  2003-04-28 19:15                               ` Abramo Bagnara
  0 siblings, 1 reply; 19+ messages in thread
From: Jaroslav Kysela @ 2003-04-28 18:16 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

On Mon, 28 Apr 2003, Abramo Bagnara wrote:

> Jaroslav Kysela ha scritto:
> >>CPU1:
> >>
> >>read_lock(link_manager);	read_lock(link_manager);
> >>spin_lock(s1);			spin_lock(s2);
> >>if (stream_in_group)		if (stream_in_group)
> >>	spin_lock(group);		spin_lock(group); /* Waiting */
> >>spin_lock(s2); /* BOOM! */
> >>
> >>I guess that 3rd version will be the one I propose ;-)
> > 
> > 
> > Yep, it is in CVS now ;-) Except I don't use trylock...
> > 
> > read_lock(link_manager);
> > spin_lock(stream);
> > ....
> > if (stream_in_group) {
> >     spin_lock(group);
> >     spin_unlock(stream);
> > }
> > ....
> > link_for_each(s) {	/* note that streams are locked in defined order now */
> >     if (stream_in_group)
> >         spin_lock(s);
> > }
> > ....
> > list_for_each(s) {
> >     if (stream_in_group)
> >         spin_unlock(s);
> > }
> > ....
> > if (stream_in_group) {
> >     spin_lock(stream);
> >     spin_unlock(group);
> > }
> > ....
> > spin_unlock(stream);
> > read_unlock(link_manager);
> 
> CPU1:				CPU2:
> read_lock(link_manager);	read_lock(link_manager);
> spin_lock(s1);			spin_lock(s2);
> spin_lock(group);		spin_lock(group); // waiting
> spin_unlock(s1);
> //inside list_for_each
> spin_lock(s1);
> spin_lock(s2); // BOOM!
> 
> I fear you cannot avoid trylock ;-)

And this scenario:

read_lock(link_manager);
spin_lock(s1);
if (stream_in_lock) {
  spin_unlock(s1);
  spin_lock(group);
}
...


						Jaroslav

-----
Jaroslav Kysela <perex@suse.cz>
Linux Kernel Sound Maintainer
ALSA Project, SuSE Labs



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: rme9652: possible deadlock
  2003-04-28 18:16                             ` Jaroslav Kysela
@ 2003-04-28 19:15                               ` Abramo Bagnara
  0 siblings, 0 replies; 19+ messages in thread
From: Abramo Bagnara @ 2003-04-28 19:15 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Jeremy Hall, paul@linuxaudiosystems.com,
	alsa-devel@lists.sourceforge.net

Jaroslav Kysela ha scritto:
> On Mon, 28 Apr 2003, Abramo Bagnara wrote:
> 
> 
>>Jaroslav Kysela ha scritto:
>>
>>>>CPU1:
>>>>
>>>>read_lock(link_manager);	read_lock(link_manager);
>>>>spin_lock(s1);			spin_lock(s2);
>>>>if (stream_in_group)		if (stream_in_group)
>>>>	spin_lock(group);		spin_lock(group); /* Waiting */
>>>>spin_lock(s2); /* BOOM! */
>>>>
>>>>I guess that 3rd version will be the one I propose ;-)
>>>
>>>
>>>Yep, it is in CVS now ;-) Except I don't use trylock...
>>>
>>>read_lock(link_manager);
>>>spin_lock(stream);
>>>....
>>>if (stream_in_group) {
>>>    spin_lock(group);
>>>    spin_unlock(stream);
>>>}
>>>....
>>>link_for_each(s) {	/* note that streams are locked in defined order now */
>>>    if (stream_in_group)
>>>        spin_lock(s);
>>>}
>>>....
>>>list_for_each(s) {
>>>    if (stream_in_group)
>>>        spin_unlock(s);
>>>}
>>>....
>>>if (stream_in_group) {
>>>    spin_lock(stream);
>>>    spin_unlock(group);
>>>}
>>>....
>>>spin_unlock(stream);
>>>read_unlock(link_manager);
>>
>>CPU1:				CPU2:
>>read_lock(link_manager);	read_lock(link_manager);
>>spin_lock(s1);			spin_lock(s2);
>>spin_lock(group);		spin_lock(group); // waiting
>>spin_unlock(s1);
>>//inside list_for_each
>>spin_lock(s1);
>>spin_lock(s2); // BOOM!
>>
>>I fear you cannot avoid trylock ;-)
> 
> 
> And this scenario:
> 
> read_lock(link_manager);
> spin_lock(s1);
> if (stream_in_lock) {
>   spin_unlock(s1);
>   spin_lock(group);
> }

It's unsafe (just like my solution) but without a way to knows that 
stream state might be changed (unlike my solution).

An example: on enter of snd_pcm_update_hw_ptr_interrupt stream state is 
RUNNING but on exit it might be XRUN.

The only way to solve all that is to accurately invalid any implicitly 
cached info across call to functions that promoting a single stream lock 
to a group lock have waited on group lock.

Of course you might enlarge this criteria to "across call to functions 
that might promote a single stream lock to a group lock". In this case 
your solution is fine too.

As a side note I've noted that almost all calls to SND_PCM_ACTION take 
the single lock just before. This is definitely a non sense: taking the 
group lock at 1st is the right solution. This imply to have two 
SND_PCM_ACTION: one for already locked groups and another for lock 
promotion.

-- 
Abramo Bagnara                       mailto:abramo.bagnara@libero.it

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2003-04-28 19:15 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-04-14 18:47 rme9652: possible deadlock Jeremy Hall
2003-04-22 14:16 ` Jaroslav Kysela
2003-04-22 19:22   ` Jeremy Hall
2003-04-22 19:42     ` Jaroslav Kysela
2003-04-22 22:04       ` Jeremy Hall
2003-04-23  9:51         ` Jaroslav Kysela
2003-04-23 15:06           ` Jeremy Hall
2003-04-23 19:19             ` Jaroslav Kysela
2003-04-24 10:03               ` Abramo Bagnara
2003-04-28  8:23                 ` Jeremy Hall
2003-04-28  8:58                   ` Abramo Bagnara
2003-04-28  9:58                     ` Jaroslav Kysela
2003-04-28 10:11                       ` Abramo Bagnara
2003-04-28 12:27                         ` Jaroslav Kysela
2003-04-28 12:47                           ` Abramo Bagnara
2003-04-28 18:16                             ` Jaroslav Kysela
2003-04-28 19:15                               ` Abramo Bagnara
2003-04-28  8:25                 ` Jeremy Hall
2003-04-28  8:44               ` Jeremy Hall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.