another locking issue with nonatomic ops?

All of lore.kernel.org
 help / color / mirror / Atom feed

* another locking issue with nonatomic ops?
@ 2016-02-16 15:37 Vinod Koul
  2016-02-16 16:01 ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Vinod Koul @ 2016-02-16 15:37 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: patches.audio, alsa-devel


Hi Takashi,

We found another issue with non atomic ops.

This occurs when we have snd_pcm_do_prepare being invoked which acquires
the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1, and then again in
dpcm_dapm_stream_event. Normally this is fine, but if we have another stream
get closed in between two read accesses we get stuck.

First thread:

[  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1	0x20020004
[  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0 ffffffff8252ffd8
[  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff ffffffff8252ffd8
[  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987 ffffffffffffffff
[  419.683574] Call Trace:
[  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
[  419.692306]  [<ffffffff81ca6f6d>] rwsem_down_read_failed+0xdd/0x130
[  419.701050]  [<ffffffff814df454>] call_rwsem_down_read_failed+0x14/0x30
[  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
[  419.709805]  [<ffffffff81adfda9>] ? snd_pcm_stream_lock+0xa9/0x120
[  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120

< ============     2nd Read lock is acquired here, without releasing previous one.

[  419.723026]  [<ffffffff81adfe4c>] snd_pcm_stream_lock_irq+0x2c/0x30
[  419.731801]  [<ffffffff81b01f5e>] ? dpcm_dapm_stream_event+0xbe/0xd0
[  419.740565]  [<ffffffff81b0079c>] dpcm_set_fe_update_state+0x3c/0xb0
[  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
[  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
[  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
[  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
[  419.766962]  [<ffffffff81adf577>] ? snd_pcm_action_nonatomic+0x27/0x80
[  419.775565]  [<ffffffff81adf5c6>] snd_pcm_action_nonatomic+0x76/0x80

< ============     1st Read lock is acquired here

[  419.784419]  [<ffffffff81ae2c62>] snd_pcm_common_ioctl1+0x802/0xd30
[  419.784495]  [<ffffffff81ae332b>] snd_pcm_playback_ioctl1+0x19b/0x280
[  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
[  419.801943]  [<ffffffff81ae3dca>] snd_pcm_ioctl_compat+0x41a/0x770
[  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
[  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
[  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
[  419.819535]  [<ffffffff810eb696>] ? trace_hardirqs_on_caller+0x16/0x210
[  419.828306]  [<ffffffff814df4f8>] ? trace_hardirqs_on_thunk+0x17/0x19
[  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f


Second thread:
[  419.543062] mediaserver     D ffff880170957cd8     0  3690      1	0x20020004
[  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0 ffffffff8252ffd8
[  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0 ffffffff00000001
[  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987 ffffffff8252ffc0
[  419.569368] Call Trace:
[  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
[  419.578026]  [<ffffffff81ca71f5>] rwsem_down_write_failed+0x235/0x450
[  419.586884]  [<ffffffff81ca7005>] ? rwsem_down_write_failed+0x45/0x450
[  419.586941]  [<ffffffff814df483>] call_rwsem_down_write_failed+0x13/0x20
[  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80

< ============     Write lock acquired here. If this occurs between two read
lock's then it's hangs

[  419.604428]  [<ffffffff81aded58>] ? snd_pcm_unlink+0x28/0x120
[  419.604465]  [<ffffffff81aded58>] snd_pcm_unlink+0x28/0x120
[  419.613215]  [<ffffffff81adee5e>] pcm_release_private+0xe/0x10
[  419.622058]  [<ffffffff81ae0d57>] snd_pcm_release_substream.part.34+0x67/0x90
[  419.622143]  [<ffffffff81ae0e68>] snd_pcm_release+0xb8/0xc0
[  419.630889]  [<ffffffff8120f1aa>] __fput+0xea/0x200
[  419.639680]  [<ffffffff8120f30e>] ____fput+0xe/0x10
[  419.639744]  [<ffffffff810ba43c>] task_work_run+0xcc/0x100
[  419.648435]  [<ffffffff81002e41>] do_notify_resume+0x51/0x70
[  419.657205]  [<ffffffff81ca986c>] int_signal+0x12/0x17

Thanks
-- 
~Vinod

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 15:37 another locking issue with nonatomic ops? Vinod Koul
@ 2016-02-16 16:01 ` Takashi Iwai
  2016-02-16 16:10   ` Vinod Koul
  0 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2016-02-16 16:01 UTC (permalink / raw)
  To: Vinod Koul; +Cc: patches.audio, alsa-devel

On Tue, 16 Feb 2016 16:37:11 +0100,
Vinod Koul wrote:
> 
> 
> Hi Takashi,
> 
> We found another issue with non atomic ops.
> 
> This occurs when we have snd_pcm_do_prepare being invoked which acquires
> the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1, and then again in
> dpcm_dapm_stream_event. Normally this is fine, but if we have another stream
> get closed in between two read accesses we get stuck.
> 
> First thread:
> 
> [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1	0x20020004
> [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0 ffffffff8252ffd8
> [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff ffffffff8252ffd8
> [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987 ffffffffffffffff
> [  419.683574] Call Trace:
> [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> [  419.692306]  [<ffffffff81ca6f6d>] rwsem_down_read_failed+0xdd/0x130
> [  419.701050]  [<ffffffff814df454>] call_rwsem_down_read_failed+0x14/0x30
> [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> [  419.709805]  [<ffffffff81adfda9>] ? snd_pcm_stream_lock+0xa9/0x120
> [  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120
> 
> < ============     2nd Read lock is acquired here, without releasing previous one.
> 
> [  419.723026]  [<ffffffff81adfe4c>] snd_pcm_stream_lock_irq+0x2c/0x30
> [  419.731801]  [<ffffffff81b01f5e>] ? dpcm_dapm_stream_event+0xbe/0xd0
> [  419.740565]  [<ffffffff81b0079c>] dpcm_set_fe_update_state+0x3c/0xb0
> [  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
> [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> [  419.766962]  [<ffffffff81adf577>] ? snd_pcm_action_nonatomic+0x27/0x80
> [  419.775565]  [<ffffffff81adf5c6>] snd_pcm_action_nonatomic+0x76/0x80
> 
> < ============     1st Read lock is acquired here
> 
> [  419.784419]  [<ffffffff81ae2c62>] snd_pcm_common_ioctl1+0x802/0xd30
> [  419.784495]  [<ffffffff81ae332b>] snd_pcm_playback_ioctl1+0x19b/0x280
> [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> [  419.801943]  [<ffffffff81ae3dca>] snd_pcm_ioctl_compat+0x41a/0x770
> [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> [  419.819535]  [<ffffffff810eb696>] ? trace_hardirqs_on_caller+0x16/0x210
> [  419.828306]  [<ffffffff814df4f8>] ? trace_hardirqs_on_thunk+0x17/0x19
> [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> 
> 
> Second thread:
> [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1	0x20020004
> [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0 ffffffff8252ffd8
> [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0 ffffffff00000001
> [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987 ffffffff8252ffc0
> [  419.569368] Call Trace:
> [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> [  419.578026]  [<ffffffff81ca71f5>] rwsem_down_write_failed+0x235/0x450
> [  419.586884]  [<ffffffff81ca7005>] ? rwsem_down_write_failed+0x45/0x450
> [  419.586941]  [<ffffffff814df483>] call_rwsem_down_write_failed+0x13/0x20
> [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> 
> < ============     Write lock acquired here. If this occurs between two read
> lock's then it's hangs

So what's wrong?  The two read locks are still running, and
snd_pcm_release() waits until these two read locks are freed.
The read double-locks are fine, and it's intended behavior.

The problem is apparently not there, but somewhere else.
Any other thread or call involved?


Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:01 ` Takashi Iwai
@ 2016-02-16 16:10   ` Vinod Koul
  2016-02-16 16:27     ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Vinod Koul @ 2016-02-16 16:10 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: patches.audio, alsa-devel

On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> On Tue, 16 Feb 2016 16:37:11 +0100,
> Vinod Koul wrote:
> > 
> > 
> > Hi Takashi,
> > 
> > We found another issue with non atomic ops.
> > 
> > This occurs when we have snd_pcm_do_prepare being invoked which acquires
> > the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1, and then again in
> > dpcm_dapm_stream_event. Normally this is fine, but if we have another stream
> > get closed in between two read accesses we get stuck.
> > 
> > First thread:
> > 
> > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1	0x20020004
> > [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0 ffffffff8252ffd8
> > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff ffffffff8252ffd8
> > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987 ffffffffffffffff
> > [  419.683574] Call Trace:
> > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > [  419.692306]  [<ffffffff81ca6f6d>] rwsem_down_read_failed+0xdd/0x130
> > [  419.701050]  [<ffffffff814df454>] call_rwsem_down_read_failed+0x14/0x30
> > [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> > [  419.709805]  [<ffffffff81adfda9>] ? snd_pcm_stream_lock+0xa9/0x120
> > [  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120
> > 
> > < ============     2nd Read lock is acquired here, without releasing previous one.
> > 
> > [  419.723026]  [<ffffffff81adfe4c>] snd_pcm_stream_lock_irq+0x2c/0x30
> > [  419.731801]  [<ffffffff81b01f5e>] ? dpcm_dapm_stream_event+0xbe/0xd0
> > [  419.740565]  [<ffffffff81b0079c>] dpcm_set_fe_update_state+0x3c/0xb0
> > [  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
> > [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> > [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > [  419.766962]  [<ffffffff81adf577>] ? snd_pcm_action_nonatomic+0x27/0x80
> > [  419.775565]  [<ffffffff81adf5c6>] snd_pcm_action_nonatomic+0x76/0x80
> > 
> > < ============     1st Read lock is acquired here
> > 
> > [  419.784419]  [<ffffffff81ae2c62>] snd_pcm_common_ioctl1+0x802/0xd30
> > [  419.784495]  [<ffffffff81ae332b>] snd_pcm_playback_ioctl1+0x19b/0x280
> > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > [  419.801943]  [<ffffffff81ae3dca>] snd_pcm_ioctl_compat+0x41a/0x770
> > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > [  419.819535]  [<ffffffff810eb696>] ? trace_hardirqs_on_caller+0x16/0x210
> > [  419.828306]  [<ffffffff814df4f8>] ? trace_hardirqs_on_thunk+0x17/0x19
> > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > 
> > 
> > Second thread:
> > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1	0x20020004
> > [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0 ffffffff8252ffd8
> > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0 ffffffff00000001
> > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987 ffffffff8252ffc0
> > [  419.569368] Call Trace:
> > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > [  419.578026]  [<ffffffff81ca71f5>] rwsem_down_write_failed+0x235/0x450
> > [  419.586884]  [<ffffffff81ca7005>] ? rwsem_down_write_failed+0x45/0x450
> > [  419.586941]  [<ffffffff814df483>] call_rwsem_down_write_failed+0x13/0x20
> > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > 
> > < ============     Write lock acquired here. If this occurs between two read
> > lock's then it's hangs
> 
> So what's wrong?  The two read locks are still running, and
> snd_pcm_release() waits until these two read locks are freed.
> The read double-locks are fine, and it's intended behavior.

Yes that is true

> The problem is apparently not there, but somewhere else.
> Any other thread or call involved?

The second backtrace above is the 2nd thread which does down_write. That
gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
read is also stuck and we are in deadlock.

down_read (1st acquired)
					down_write (waiting due to down_read)
down_read( 2nd,,, stuck)

Thanks
-- 
~Vinod

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:10   ` Vinod Koul
@ 2016-02-16 16:27     ` Takashi Iwai
  2016-02-16 16:40       ` Vinod Koul
  2016-02-16 16:40       ` Babu, Ramesh
  0 siblings, 2 replies; 14+ messages in thread
From: Takashi Iwai @ 2016-02-16 16:27 UTC (permalink / raw)
  To: Vinod Koul; +Cc: patches.audio, alsa-devel

On Tue, 16 Feb 2016 17:10:43 +0100,
Vinod Koul wrote:
> 
> On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > On Tue, 16 Feb 2016 16:37:11 +0100,
> > Vinod Koul wrote:
> > > 
> > > 
> > > Hi Takashi,
> > > 
> > > We found another issue with non atomic ops.
> > > 
> > > This occurs when we have snd_pcm_do_prepare being invoked which acquires
> > > the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1, and then again in
> > > dpcm_dapm_stream_event. Normally this is fine, but if we have another stream
> > > get closed in between two read accesses we get stuck.
> > > 
> > > First thread:
> > > 
> > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1	0x20020004
> > > [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0 ffffffff8252ffd8
> > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff ffffffff8252ffd8
> > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987 ffffffffffffffff
> > > [  419.683574] Call Trace:
> > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > [  419.692306]  [<ffffffff81ca6f6d>] rwsem_down_read_failed+0xdd/0x130
> > > [  419.701050]  [<ffffffff814df454>] call_rwsem_down_read_failed+0x14/0x30
> > > [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> > > [  419.709805]  [<ffffffff81adfda9>] ? snd_pcm_stream_lock+0xa9/0x120
> > > [  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120
> > > 
> > > < ============     2nd Read lock is acquired here, without releasing previous one.
> > > 
> > > [  419.723026]  [<ffffffff81adfe4c>] snd_pcm_stream_lock_irq+0x2c/0x30
> > > [  419.731801]  [<ffffffff81b01f5e>] ? dpcm_dapm_stream_event+0xbe/0xd0
> > > [  419.740565]  [<ffffffff81b0079c>] dpcm_set_fe_update_state+0x3c/0xb0
> > > [  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
> > > [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> > > [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > [  419.766962]  [<ffffffff81adf577>] ? snd_pcm_action_nonatomic+0x27/0x80
> > > [  419.775565]  [<ffffffff81adf5c6>] snd_pcm_action_nonatomic+0x76/0x80
> > > 
> > > < ============     1st Read lock is acquired here
> > > 
> > > [  419.784419]  [<ffffffff81ae2c62>] snd_pcm_common_ioctl1+0x802/0xd30
> > > [  419.784495]  [<ffffffff81ae332b>] snd_pcm_playback_ioctl1+0x19b/0x280
> > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > [  419.801943]  [<ffffffff81ae3dca>] snd_pcm_ioctl_compat+0x41a/0x770
> > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > [  419.819535]  [<ffffffff810eb696>] ? trace_hardirqs_on_caller+0x16/0x210
> > > [  419.828306]  [<ffffffff814df4f8>] ? trace_hardirqs_on_thunk+0x17/0x19
> > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > 
> > > 
> > > Second thread:
> > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1	0x20020004
> > > [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0 ffffffff8252ffd8
> > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0 ffffffff00000001
> > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987 ffffffff8252ffc0
> > > [  419.569368] Call Trace:
> > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > [  419.578026]  [<ffffffff81ca71f5>] rwsem_down_write_failed+0x235/0x450
> > > [  419.586884]  [<ffffffff81ca7005>] ? rwsem_down_write_failed+0x45/0x450
> > > [  419.586941]  [<ffffffff814df483>] call_rwsem_down_write_failed+0x13/0x20
> > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > 
> > > < ============     Write lock acquired here. If this occurs between two read
> > > lock's then it's hangs
> > 
> > So what's wrong?  The two read locks are still running, and
> > snd_pcm_release() waits until these two read locks are freed.
> > The read double-locks are fine, and it's intended behavior.
> 
> Yes that is true
> 
> > The problem is apparently not there, but somewhere else.
> > Any other thread or call involved?
> 
> The second backtrace above is the 2nd thread which does down_write. That
> gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> read is also stuck and we are in deadlock.
> 
> down_read (1st acquired)
> 					down_write (waiting due to down_read)
> down_read( 2nd,,, stuck)

Hmm, so down_read() takes care of the pending down_write()?
Then it's a problem.  Maybe the limitation of the current rwsem
implementation.

Does the patch below work?  It's suboptimal, just a proof of concept.


Takashi

---
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index fadd3eb8e8bb..8b45edd218a4 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1813,7 +1813,8 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
 		res = -ENOMEM;
 		goto _nolock;
 	}
-	down_write(&snd_pcm_link_rwsem);
+	while (!down_write_trylock(&snd_pcm_link_rwsem))
+		cpu_relax();
 	write_lock_irq(&snd_pcm_link_rwlock);
 	if (substream->runtime->status->state == SNDRV_PCM_STATE_OPEN ||
 	    substream->runtime->status->state != substream1->runtime->status->state ||
@@ -1860,7 +1861,8 @@ static int snd_pcm_unlink(struct snd_pcm_substream *substream)
 	struct snd_pcm_substream *s;
 	int res = 0;
 
-	down_write(&snd_pcm_link_rwsem);
+	while (!down_write_trylock(&snd_pcm_link_rwsem))
+		cpu_relax();
 	write_lock_irq(&snd_pcm_link_rwlock);
 	if (!snd_pcm_stream_linked(substream)) {
 		res = -EALREADY;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:27     ` Takashi Iwai
@ 2016-02-16 16:40       ` Vinod Koul
  2016-02-16 17:17         ` Babu, Ramesh
  2016-02-16 16:40       ` Babu, Ramesh
  1 sibling, 1 reply; 14+ messages in thread
From: Vinod Koul @ 2016-02-16 16:40 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: patches.audio, alsa-devel

On Tue, Feb 16, 2016 at 05:27:44PM +0100, Takashi Iwai wrote:
> On Tue, 16 Feb 2016 17:10:43 +0100,
> Vinod Koul wrote:
> > 
> > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > Vinod Koul wrote:
> > > > 
> > > > 
> > > > Hi Takashi,
> > > > 
> > > > We found another issue with non atomic ops.
> > > > 
> > > > This occurs when we have snd_pcm_do_prepare being invoked which acquires
> > > > the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1, and then again in
> > > > dpcm_dapm_stream_event. Normally this is fine, but if we have another stream
> > > > get closed in between two read accesses we get stuck.
> > > > 
> > > > First thread:
> > > > 
> > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1	0x20020004
> > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0 ffffffff8252ffd8
> > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff ffffffff8252ffd8
> > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987 ffffffffffffffff
> > > > [  419.683574] Call Trace:
> > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > [  419.692306]  [<ffffffff81ca6f6d>] rwsem_down_read_failed+0xdd/0x130
> > > > [  419.701050]  [<ffffffff814df454>] call_rwsem_down_read_failed+0x14/0x30
> > > > [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> > > > [  419.709805]  [<ffffffff81adfda9>] ? snd_pcm_stream_lock+0xa9/0x120
> > > > [  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120
> > > > 
> > > > < ============     2nd Read lock is acquired here, without releasing previous one.
> > > > 
> > > > [  419.723026]  [<ffffffff81adfe4c>] snd_pcm_stream_lock_irq+0x2c/0x30
> > > > [  419.731801]  [<ffffffff81b01f5e>] ? dpcm_dapm_stream_event+0xbe/0xd0
> > > > [  419.740565]  [<ffffffff81b0079c>] dpcm_set_fe_update_state+0x3c/0xb0
> > > > [  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
> > > > [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> > > > [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > [  419.766962]  [<ffffffff81adf577>] ? snd_pcm_action_nonatomic+0x27/0x80
> > > > [  419.775565]  [<ffffffff81adf5c6>] snd_pcm_action_nonatomic+0x76/0x80
> > > > 
> > > > < ============     1st Read lock is acquired here
> > > > 
> > > > [  419.784419]  [<ffffffff81ae2c62>] snd_pcm_common_ioctl1+0x802/0xd30
> > > > [  419.784495]  [<ffffffff81ae332b>] snd_pcm_playback_ioctl1+0x19b/0x280
> > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > [  419.801943]  [<ffffffff81ae3dca>] snd_pcm_ioctl_compat+0x41a/0x770
> > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > [  419.819535]  [<ffffffff810eb696>] ? trace_hardirqs_on_caller+0x16/0x210
> > > > [  419.828306]  [<ffffffff814df4f8>] ? trace_hardirqs_on_thunk+0x17/0x19
> > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > > 
> > > > 
> > > > Second thread:
> > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1	0x20020004
> > > > [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0 ffffffff8252ffd8
> > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0 ffffffff00000001
> > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987 ffffffff8252ffc0
> > > > [  419.569368] Call Trace:
> > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > [  419.578026]  [<ffffffff81ca71f5>] rwsem_down_write_failed+0x235/0x450
> > > > [  419.586884]  [<ffffffff81ca7005>] ? rwsem_down_write_failed+0x45/0x450
> > > > [  419.586941]  [<ffffffff814df483>] call_rwsem_down_write_failed+0x13/0x20
> > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > > 
> > > > < ============     Write lock acquired here. If this occurs between two read
> > > > lock's then it's hangs
> > > 
> > > So what's wrong?  The two read locks are still running, and
> > > snd_pcm_release() waits until these two read locks are freed.
> > > The read double-locks are fine, and it's intended behavior.
> > 
> > Yes that is true
> > 
> > > The problem is apparently not there, but somewhere else.
> > > Any other thread or call involved?
> > 
> > The second backtrace above is the 2nd thread which does down_write. That
> > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > read is also stuck and we are in deadlock.
> > 
> > down_read (1st acquired)
> > 					down_write (waiting due to down_read)
> > down_read( 2nd,,, stuck)
> 
> Hmm, so down_read() takes care of the pending down_write()?
> Then it's a problem.  Maybe the limitation of the current rwsem
> implementation.

Yes that is how things seem right now.
> 
> Does the patch below work?  It's suboptimal, just a proof of concept.

Okay will try first thing tomorrow and let you know.

> ---
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index fadd3eb8e8bb..8b45edd218a4 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -1813,7 +1813,8 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
>  		res = -ENOMEM;
>  		goto _nolock;
>  	}
> -	down_write(&snd_pcm_link_rwsem);
> +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> +		cpu_relax();
>  	write_lock_irq(&snd_pcm_link_rwlock);
>  	if (substream->runtime->status->state == SNDRV_PCM_STATE_OPEN ||
>  	    substream->runtime->status->state != substream1->runtime->status->state ||
> @@ -1860,7 +1861,8 @@ static int snd_pcm_unlink(struct snd_pcm_substream *substream)
>  	struct snd_pcm_substream *s;
>  	int res = 0;
>  
> -	down_write(&snd_pcm_link_rwsem);
> +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> +		cpu_relax();
>  	write_lock_irq(&snd_pcm_link_rwlock);
>  	if (!snd_pcm_stream_linked(substream)) {
>  		res = -EALREADY;

-- 
~Vinod

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:27     ` Takashi Iwai
  2016-02-16 16:40       ` Vinod Koul
@ 2016-02-16 16:40       ` Babu, Ramesh
  2016-02-16 16:46         ` Takashi Iwai
  1 sibling, 1 reply; 14+ messages in thread
From: Babu, Ramesh @ 2016-02-16 16:40 UTC (permalink / raw)
  To: Takashi Iwai, Koul, Vinod; +Cc: Patches Audio, alsa-devel@alsa-project.org

> On Tue, 16 Feb 2016 17:10:43 +0100,
> Vinod Koul wrote:
> >
> > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > Vinod Koul wrote:
> > > >
> > > >
> > > > Hi Takashi,
> > > >
> > > > We found another issue with non atomic ops.
> > > >
> > > > This occurs when we have snd_pcm_do_prepare being invoked which
> acquires
> > > > the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1,
> and then again in
> > > > dpcm_dapm_stream_event. Normally this is fine, but if we have
> another stream
> > > > get closed in between two read accesses we get stuck.
> > > >
> > > > First thread:
> > > >
> > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
> 	0x20020004
> > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0
> ffffffff8252ffd8
> > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
> ffffffff8252ffd8
> > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
> ffffffffffffffff
> > > > [  419.683574] Call Trace:
> > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > [  419.692306]  [<ffffffff81ca6f6d>]
> rwsem_down_read_failed+0xdd/0x130
> > > > [  419.701050]  [<ffffffff814df454>]
> call_rwsem_down_read_failed+0x14/0x30
> > > > [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> > > > [  419.709805]  [<ffffffff81adfda9>] ?
> snd_pcm_stream_lock+0xa9/0x120
> > > > [  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120
> > > >
> > > > < ============     2nd Read lock is acquired here, without releasing
> previous one.
> > > >
> > > > [  419.723026]  [<ffffffff81adfe4c>]
> snd_pcm_stream_lock_irq+0x2c/0x30
> > > > [  419.731801]  [<ffffffff81b01f5e>] ?
> dpcm_dapm_stream_event+0xbe/0xd0
> > > > [  419.740565]  [<ffffffff81b0079c>]
> dpcm_set_fe_update_state+0x3c/0xb0
> > > > [  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
> > > > [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> > > > [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > [  419.766962]  [<ffffffff81adf577>] ?
> snd_pcm_action_nonatomic+0x27/0x80
> > > > [  419.775565]  [<ffffffff81adf5c6>]
> snd_pcm_action_nonatomic+0x76/0x80
> > > >
> > > > < ============     1st Read lock is acquired here
> > > >
> > > > [  419.784419]  [<ffffffff81ae2c62>]
> snd_pcm_common_ioctl1+0x802/0xd30
> > > > [  419.784495]  [<ffffffff81ae332b>]
> snd_pcm_playback_ioctl1+0x19b/0x280
> > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > [  419.801943]  [<ffffffff81ae3dca>]
> snd_pcm_ioctl_compat+0x41a/0x770
> > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > [  419.819535]  [<ffffffff810eb696>] ?
> trace_hardirqs_on_caller+0x16/0x210
> > > > [  419.828306]  [<ffffffff814df4f8>] ?
> trace_hardirqs_on_thunk+0x17/0x19
> > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > >
> > > >
> > > > Second thread:
> > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
> 	0x20020004
> > > > [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0
> ffffffff8252ffd8
> > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
> ffffffff00000001
> > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
> ffffffff8252ffc0
> > > > [  419.569368] Call Trace:
> > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > [  419.578026]  [<ffffffff81ca71f5>]
> rwsem_down_write_failed+0x235/0x450
> > > > [  419.586884]  [<ffffffff81ca7005>] ?
> rwsem_down_write_failed+0x45/0x450
> > > > [  419.586941]  [<ffffffff814df483>]
> call_rwsem_down_write_failed+0x13/0x20
> > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > >
> > > > < ============     Write lock acquired here. If this occurs between
> two read
> > > > lock's then it's hangs
> > >
> > > So what's wrong?  The two read locks are still running, and
> > > snd_pcm_release() waits until these two read locks are freed.
> > > The read double-locks are fine, and it's intended behavior.
> >
> > Yes that is true
> >
> > > The problem is apparently not there, but somewhere else.
> > > Any other thread or call involved?
> >
> > The second backtrace above is the 2nd thread which does down_write.
> That
> > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > read is also stuck and we are in deadlock.
> >
> > down_read (1st acquired)
> > 					down_write (waiting due to
> down_read)
> > down_read( 2nd,,, stuck)
> 
> Hmm, so down_read() takes care of the pending down_write()?
> Then it's a problem.  Maybe the limitation of the current rwsem
> implementation.
> 
down_write() will make sure no further read locks are allowed, till it get
executed.
So here the 2nd down_read() will get blocked, since down_write has higher
priority.

> Does the patch below work?  It's suboptimal, just a proof of concept.
> 
> 
> Takashi
> 
> ---
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index fadd3eb8e8bb..8b45edd218a4 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -1813,7 +1813,8 @@ static int snd_pcm_link(struct snd_pcm_substream
> *substream, int fd)
>  		res = -ENOMEM;
>  		goto _nolock;
>  	}
> -	down_write(&snd_pcm_link_rwsem);
> +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> +		cpu_relax();
>  	write_lock_irq(&snd_pcm_link_rwlock);
>  	if (substream->runtime->status->state ==
> SNDRV_PCM_STATE_OPEN ||
>  	    substream->runtime->status->state != substream1->runtime-
> >status->state ||
> @@ -1860,7 +1861,8 @@ static int snd_pcm_unlink(struct
> snd_pcm_substream *substream)
>  	struct snd_pcm_substream *s;
>  	int res = 0;
> 
> -	down_write(&snd_pcm_link_rwsem);
> +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> +		cpu_relax();
>  	write_lock_irq(&snd_pcm_link_rwlock);
>  	if (!snd_pcm_stream_linked(substream)) {
>  		res = -EALREADY;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:40       ` Babu, Ramesh
@ 2016-02-16 16:46         ` Takashi Iwai
  2016-02-16 17:15           ` Babu, Ramesh
  0 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2016-02-16 16:46 UTC (permalink / raw)
  To: Babu, Ramesh; +Cc: Koul, Vinod, Patches Audio, alsa-devel@alsa-project.org

On Tue, 16 Feb 2016 17:40:19 +0100,
Babu, Ramesh wrote:
> 
> > On Tue, 16 Feb 2016 17:10:43 +0100,
> > Vinod Koul wrote:
> > >
> > > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > > Vinod Koul wrote:
> > > > >
> > > > >
> > > > > Hi Takashi,
> > > > >
> > > > > We found another issue with non atomic ops.
> > > > >
> > > > > This occurs when we have snd_pcm_do_prepare being invoked which
> > acquires
> > > > > the lock snd_pcm_link_rwsem twice, first in snd_pcm_common_ioctl1,
> > and then again in
> > > > > dpcm_dapm_stream_event. Normally this is fine, but if we have
> > another stream
> > > > > get closed in between two read accesses we get stuck.
> > > > >
> > > > > First thread:
> > > > >
> > > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
> > 	0x20020004
> > > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0
> > ffffffff8252ffd8
> > > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
> > ffffffff8252ffd8
> > > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
> > ffffffffffffffff
> > > > > [  419.683574] Call Trace:
> > > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > [  419.692306]  [<ffffffff81ca6f6d>]
> > rwsem_down_read_failed+0xdd/0x130
> > > > > [  419.701050]  [<ffffffff814df454>]
> > call_rwsem_down_read_failed+0x14/0x30
> > > > > [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> > > > > [  419.709805]  [<ffffffff81adfda9>] ?
> > snd_pcm_stream_lock+0xa9/0x120
> > > > > [  419.723012]  [<ffffffff81adfda9>] snd_pcm_stream_lock+0xa9/0x120
> > > > >
> > > > > < ============     2nd Read lock is acquired here, without releasing
> > previous one.
> > > > >
> > > > > [  419.723026]  [<ffffffff81adfe4c>]
> > snd_pcm_stream_lock_irq+0x2c/0x30
> > > > > [  419.731801]  [<ffffffff81b01f5e>] ?
> > dpcm_dapm_stream_event+0xbe/0xd0
> > > > > [  419.740565]  [<ffffffff81b0079c>]
> > dpcm_set_fe_update_state+0x3c/0xb0
> > > > > [  419.749252]  [<ffffffff81b04514>] dpcm_fe_dai_prepare+0xc4/0x150
> > > > > [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> > > > > [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> > > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > > [  419.766962]  [<ffffffff81adf577>] ?
> > snd_pcm_action_nonatomic+0x27/0x80
> > > > > [  419.775565]  [<ffffffff81adf5c6>]
> > snd_pcm_action_nonatomic+0x76/0x80
> > > > >
> > > > > < ============     1st Read lock is acquired here
> > > > >
> > > > > [  419.784419]  [<ffffffff81ae2c62>]
> > snd_pcm_common_ioctl1+0x802/0xd30
> > > > > [  419.784495]  [<ffffffff81ae332b>]
> > snd_pcm_playback_ioctl1+0x19b/0x280
> > > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > [  419.801943]  [<ffffffff81ae3dca>]
> > snd_pcm_ioctl_compat+0x41a/0x770
> > > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > > [  419.819535]  [<ffffffff810eb696>] ?
> > trace_hardirqs_on_caller+0x16/0x210
> > > > > [  419.828306]  [<ffffffff814df4f8>] ?
> > trace_hardirqs_on_thunk+0x17/0x19
> > > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > > >
> > > > >
> > > > > Second thread:
> > > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
> > 	0x20020004
> > > > > [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0
> > ffffffff8252ffd8
> > > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
> > ffffffff00000001
> > > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
> > ffffffff8252ffc0
> > > > > [  419.569368] Call Trace:
> > > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > [  419.578026]  [<ffffffff81ca71f5>]
> > rwsem_down_write_failed+0x235/0x450
> > > > > [  419.586884]  [<ffffffff81ca7005>] ?
> > rwsem_down_write_failed+0x45/0x450
> > > > > [  419.586941]  [<ffffffff814df483>]
> > call_rwsem_down_write_failed+0x13/0x20
> > > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > > >
> > > > > < ============     Write lock acquired here. If this occurs between
> > two read
> > > > > lock's then it's hangs
> > > >
> > > > So what's wrong?  The two read locks are still running, and
> > > > snd_pcm_release() waits until these two read locks are freed.
> > > > The read double-locks are fine, and it's intended behavior.
> > >
> > > Yes that is true
> > >
> > > > The problem is apparently not there, but somewhere else.
> > > > Any other thread or call involved?
> > >
> > > The second backtrace above is the 2nd thread which does down_write.
> > That
> > > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > > read is also stuck and we are in deadlock.
> > >
> > > down_read (1st acquired)
> > > 					down_write (waiting due to
> > down_read)
> > > down_read( 2nd,,, stuck)
> > 
> > Hmm, so down_read() takes care of the pending down_write()?
> > Then it's a problem.  Maybe the limitation of the current rwsem
> > implementation.
> > 
> down_write() will make sure no further read locks are allowed, till it get
> executed.
> So here the 2nd down_read() will get blocked, since down_write has higher
> priority.

Well, the requirement of down_write() is to block further reads *once
when* it takes the lock.  It doesn't necessarily mean to block reads
*before* taking the write lock.


Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:46         ` Takashi Iwai
@ 2016-02-16 17:15           ` Babu, Ramesh
  2016-02-16 17:41             ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Babu, Ramesh @ 2016-02-16 17:15 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Koul, Vinod, Patches Audio, alsa-devel@alsa-project.org

> On Tue, 16 Feb 2016 17:40:19 +0100,
> Babu, Ramesh wrote:
> >
> > > On Tue, 16 Feb 2016 17:10:43 +0100,
> > > Vinod Koul wrote:
> > > >
> > > > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > > > Vinod Koul wrote:
> > > > > >
> > > > > >
> > > > > > Hi Takashi,
> > > > > >
> > > > > > We found another issue with non atomic ops.
> > > > > >
> > > > > > This occurs when we have snd_pcm_do_prepare being invoked
> which
> > > acquires
> > > > > > the lock snd_pcm_link_rwsem twice, first in
> snd_pcm_common_ioctl1,
> > > and then again in
> > > > > > dpcm_dapm_stream_event. Normally this is fine, but if we have
> > > another stream
> > > > > > get closed in between two read accesses we get stuck.
> > > > > >
> > > > > > First thread:
> > > > > >
> > > > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
> > > 	0x20020004
> > > > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60
> ffff88006d1fddd0
> > > ffffffff8252ffd8
> > > > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
> > > ffffffff8252ffd8
> > > > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
> > > ffffffffffffffff
> > > > > > [  419.683574] Call Trace:
> > > > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > > [  419.692306]  [<ffffffff81ca6f6d>]
> > > rwsem_down_read_failed+0xdd/0x130
> > > > > > [  419.701050]  [<ffffffff814df454>]
> > > call_rwsem_down_read_failed+0x14/0x30
> > > > > > [  419.709758]  [<ffffffff810e793b>] ?
> down_read_nested+0x5b/0x70
> > > > > > [  419.709805]  [<ffffffff81adfda9>] ?
> > > snd_pcm_stream_lock+0xa9/0x120
> > > > > > [  419.723012]  [<ffffffff81adfda9>]
> snd_pcm_stream_lock+0xa9/0x120
> > > > > >
> > > > > > < ============     2nd Read lock is acquired here, without
> releasing
> > > previous one.
> > > > > >
> > > > > > [  419.723026]  [<ffffffff81adfe4c>]
> > > snd_pcm_stream_lock_irq+0x2c/0x30
> > > > > > [  419.731801]  [<ffffffff81b01f5e>] ?
> > > dpcm_dapm_stream_event+0xbe/0xd0
> > > > > > [  419.740565]  [<ffffffff81b0079c>]
> > > dpcm_set_fe_update_state+0x3c/0xb0
> > > > > > [  419.749252]  [<ffffffff81b04514>]
> dpcm_fe_dai_prepare+0xc4/0x150
> > > > > > [  419.749301]  [<ffffffff81adf7eb>]
> snd_pcm_do_prepare+0x1b/0x30
> > > > > > [  419.758083]  [<ffffffff81adf3bf>]
> snd_pcm_action_single+0x2f/0x70
> > > > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > > > [  419.766962]  [<ffffffff81adf577>] ?
> > > snd_pcm_action_nonatomic+0x27/0x80
> > > > > > [  419.775565]  [<ffffffff81adf5c6>]
> > > snd_pcm_action_nonatomic+0x76/0x80
> > > > > >
> > > > > > < ============     1st Read lock is acquired here
> > > > > >
> > > > > > [  419.784419]  [<ffffffff81ae2c62>]
> > > snd_pcm_common_ioctl1+0x802/0xd30
> > > > > > [  419.784495]  [<ffffffff81ae332b>]
> > > snd_pcm_playback_ioctl1+0x19b/0x280
> > > > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > > [  419.801943]  [<ffffffff81ae3dca>]
> > > snd_pcm_ioctl_compat+0x41a/0x770
> > > > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > > > [  419.819535]  [<ffffffff810eb696>] ?
> > > trace_hardirqs_on_caller+0x16/0x210
> > > > > > [  419.828306]  [<ffffffff814df4f8>] ?
> > > trace_hardirqs_on_thunk+0x17/0x19
> > > > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > > > >
> > > > > >
> > > > > > Second thread:
> > > > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
> > > 	0x20020004
> > > > > > [  419.543191]  ffff880170957cd8 0000000070957cd8
> ffff88006de49ad0
> > > ffffffff8252ffd8
> > > > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
> > > ffffffff00000001
> > > > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
> > > ffffffff8252ffc0
> > > > > > [  419.569368] Call Trace:
> > > > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > > [  419.578026]  [<ffffffff81ca71f5>]
> > > rwsem_down_write_failed+0x235/0x450
> > > > > > [  419.586884]  [<ffffffff81ca7005>] ?
> > > rwsem_down_write_failed+0x45/0x450
> > > > > > [  419.586941]  [<ffffffff814df483>]
> > > call_rwsem_down_write_failed+0x13/0x20
> > > > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > > > >
> > > > > > < ============     Write lock acquired here. If this occurs between
> > > two read
> > > > > > lock's then it's hangs
> > > > >
> > > > > So what's wrong?  The two read locks are still running, and
> > > > > snd_pcm_release() waits until these two read locks are freed.
> > > > > The read double-locks are fine, and it's intended behavior.
> > > >
> > > > Yes that is true
> > > >
> > > > > The problem is apparently not there, but somewhere else.
> > > > > Any other thread or call involved?
> > > >
> > > > The second backtrace above is the 2nd thread which does down_write.
> > > That
> > > > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > > > read is also stuck and we are in deadlock.
> > > >
> > > > down_read (1st acquired)
> > > > 					down_write (waiting due to
> > > down_read)
> > > > down_read( 2nd,,, stuck)
> > >
> > > Hmm, so down_read() takes care of the pending down_write()?
> > > Then it's a problem.  Maybe the limitation of the current rwsem
> > > implementation.
> > >
> > down_write() will make sure no further read locks are allowed, till it get
> > executed.
> > So here the 2nd down_read() will get blocked, since down_write has
> higher
> > priority.
> 
> Well, the requirement of down_write() is to block further reads *once
> when* it takes the lock.  It doesn't necessarily mean to block reads
> *before* taking the write lock.

Below excerpts from LDD [1] says:
"Writers get priority; as soon as a writer tries to enter the critical section,
no readers will be allowed in until all writers have completed their work."

My understanding when writer *tries* to acquire lock, further read locks
will be blocked.  I think in my back trace, 2nd down_read() didn't get honored,
since the down_write() was attempting to enter critical section from different
thread.

[1] http://www.makelinux.net/ldd3/chp-5-sect-3

> 
> 
> Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 16:40       ` Vinod Koul
@ 2016-02-16 17:17         ` Babu, Ramesh
  2016-02-17 12:41           ` Babu, Ramesh
  0 siblings, 1 reply; 14+ messages in thread
From: Babu, Ramesh @ 2016-02-16 17:17 UTC (permalink / raw)
  To: Koul, Vinod, Takashi Iwai; +Cc: Patches Audio, alsa-devel@alsa-project.org

> On Tue, Feb 16, 2016 at 05:27:44PM +0100, Takashi Iwai wrote:
> > On Tue, 16 Feb 2016 17:10:43 +0100,
> > Vinod Koul wrote:
> > >
> > > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > > Vinod Koul wrote:
> > > > >
> > > > >
> > > > > Hi Takashi,
> > > > >
> > > > > We found another issue with non atomic ops.
> > > > >
> > > > > This occurs when we have snd_pcm_do_prepare being invoked
> which acquires
> > > > > the lock snd_pcm_link_rwsem twice, first in
> snd_pcm_common_ioctl1, and then again in
> > > > > dpcm_dapm_stream_event. Normally this is fine, but if we have
> another stream
> > > > > get closed in between two read accesses we get stuck.
> > > > >
> > > > > First thread:
> > > > >
> > > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
> 	0x20020004
> > > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60 ffff88006d1fddd0
> ffffffff8252ffd8
> > > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
> ffffffff8252ffd8
> > > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
> ffffffffffffffff
> > > > > [  419.683574] Call Trace:
> > > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > [  419.692306]  [<ffffffff81ca6f6d>]
> rwsem_down_read_failed+0xdd/0x130
> > > > > [  419.701050]  [<ffffffff814df454>]
> call_rwsem_down_read_failed+0x14/0x30
> > > > > [  419.709758]  [<ffffffff810e793b>] ? down_read_nested+0x5b/0x70
> > > > > [  419.709805]  [<ffffffff81adfda9>] ?
> snd_pcm_stream_lock+0xa9/0x120
> > > > > [  419.723012]  [<ffffffff81adfda9>]
> snd_pcm_stream_lock+0xa9/0x120
> > > > >
> > > > > < ============     2nd Read lock is acquired here, without releasing
> previous one.
> > > > >
> > > > > [  419.723026]  [<ffffffff81adfe4c>]
> snd_pcm_stream_lock_irq+0x2c/0x30
> > > > > [  419.731801]  [<ffffffff81b01f5e>] ?
> dpcm_dapm_stream_event+0xbe/0xd0
> > > > > [  419.740565]  [<ffffffff81b0079c>]
> dpcm_set_fe_update_state+0x3c/0xb0
> > > > > [  419.749252]  [<ffffffff81b04514>]
> dpcm_fe_dai_prepare+0xc4/0x150
> > > > > [  419.749301]  [<ffffffff81adf7eb>] snd_pcm_do_prepare+0x1b/0x30
> > > > > [  419.758083]  [<ffffffff81adf3bf>] snd_pcm_action_single+0x2f/0x70
> > > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > > [  419.766962]  [<ffffffff81adf577>] ?
> snd_pcm_action_nonatomic+0x27/0x80
> > > > > [  419.775565]  [<ffffffff81adf5c6>]
> snd_pcm_action_nonatomic+0x76/0x80
> > > > >
> > > > > < ============     1st Read lock is acquired here
> > > > >
> > > > > [  419.784419]  [<ffffffff81ae2c62>]
> snd_pcm_common_ioctl1+0x802/0xd30
> > > > > [  419.784495]  [<ffffffff81ae332b>]
> snd_pcm_playback_ioctl1+0x19b/0x280
> > > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > [  419.801943]  [<ffffffff81ae3dca>]
> snd_pcm_ioctl_compat+0x41a/0x770
> > > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > > [  419.819535]  [<ffffffff810eb696>] ?
> trace_hardirqs_on_caller+0x16/0x210
> > > > > [  419.828306]  [<ffffffff814df4f8>] ?
> trace_hardirqs_on_thunk+0x17/0x19
> > > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > > >
> > > > >
> > > > > Second thread:
> > > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
> 	0x20020004
> > > > > [  419.543191]  ffff880170957cd8 0000000070957cd8 ffff88006de49ad0
> ffffffff8252ffd8
> > > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
> ffffffff00000001
> > > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
> ffffffff8252ffc0
> > > > > [  419.569368] Call Trace:
> > > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > [  419.578026]  [<ffffffff81ca71f5>]
> rwsem_down_write_failed+0x235/0x450
> > > > > [  419.586884]  [<ffffffff81ca7005>] ?
> rwsem_down_write_failed+0x45/0x450
> > > > > [  419.586941]  [<ffffffff814df483>]
> call_rwsem_down_write_failed+0x13/0x20
> > > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > > >
> > > > > < ============     Write lock acquired here. If this occurs between
> two read
> > > > > lock's then it's hangs
> > > >
> > > > So what's wrong?  The two read locks are still running, and
> > > > snd_pcm_release() waits until these two read locks are freed.
> > > > The read double-locks are fine, and it's intended behavior.
> > >
> > > Yes that is true
> > >
> > > > The problem is apparently not there, but somewhere else.
> > > > Any other thread or call involved?
> > >
> > > The second backtrace above is the 2nd thread which does down_write.
> That
> > > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > > read is also stuck and we are in deadlock.
> > >
> > > down_read (1st acquired)
> > > 					down_write (waiting due to
> down_read)
> > > down_read( 2nd,,, stuck)
> >
> > Hmm, so down_read() takes care of the pending down_write()?
> > Then it's a problem.  Maybe the limitation of the current rwsem
> > implementation.
> 
> Yes that is how things seem right now.
> >
> > Does the patch below work?  It's suboptimal, just a proof of concept.
> 
> Okay will try first thing tomorrow and let you know.
> 
I tried the below changes and it seem to be fixing the issue.
I will do more stress testing tomorrow and let you know.
Thanks :)

> > ---
> > diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> > index fadd3eb8e8bb..8b45edd218a4 100644
> > --- a/sound/core/pcm_native.c
> > +++ b/sound/core/pcm_native.c
> > @@ -1813,7 +1813,8 @@ static int snd_pcm_link(struct
> snd_pcm_substream *substream, int fd)
> >  		res = -ENOMEM;
> >  		goto _nolock;
> >  	}
> > -	down_write(&snd_pcm_link_rwsem);
> > +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> > +		cpu_relax();
> >  	write_lock_irq(&snd_pcm_link_rwlock);
> >  	if (substream->runtime->status->state ==
> SNDRV_PCM_STATE_OPEN ||
> >  	    substream->runtime->status->state != substream1->runtime-
> >status->state ||
> > @@ -1860,7 +1861,8 @@ static int snd_pcm_unlink(struct
> snd_pcm_substream *substream)
> >  	struct snd_pcm_substream *s;
> >  	int res = 0;
> >
> > -	down_write(&snd_pcm_link_rwsem);
> > +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> > +		cpu_relax();
> >  	write_lock_irq(&snd_pcm_link_rwlock);
> >  	if (!snd_pcm_stream_linked(substream)) {
> >  		res = -EALREADY;
> 
> --
> ~Vinod

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 17:15           ` Babu, Ramesh
@ 2016-02-16 17:41             ` Takashi Iwai
  0 siblings, 0 replies; 14+ messages in thread
From: Takashi Iwai @ 2016-02-16 17:41 UTC (permalink / raw)
  To: Babu, Ramesh; +Cc: Koul, Vinod, Patches Audio, alsa-devel@alsa-project.org

On Tue, 16 Feb 2016 18:15:12 +0100,
Babu, Ramesh wrote:
> 
> > On Tue, 16 Feb 2016 17:40:19 +0100,
> > Babu, Ramesh wrote:
> > >
> > > > On Tue, 16 Feb 2016 17:10:43 +0100,
> > > > Vinod Koul wrote:
> > > > >
> > > > > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > > > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > > > > Vinod Koul wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hi Takashi,
> > > > > > >
> > > > > > > We found another issue with non atomic ops.
> > > > > > >
> > > > > > > This occurs when we have snd_pcm_do_prepare being invoked
> > which
> > > > acquires
> > > > > > > the lock snd_pcm_link_rwsem twice, first in
> > snd_pcm_common_ioctl1,
> > > > and then again in
> > > > > > > dpcm_dapm_stream_event. Normally this is fine, but if we have
> > > > another stream
> > > > > > > get closed in between two read accesses we get stuck.
> > > > > > >
> > > > > > > First thread:
> > > > > > >
> > > > > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
> > > > 	0x20020004
> > > > > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60
> > ffff88006d1fddd0
> > > > ffffffff8252ffd8
> > > > > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
> > > > ffffffff8252ffd8
> > > > > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
> > > > ffffffffffffffff
> > > > > > > [  419.683574] Call Trace:
> > > > > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > > > [  419.692306]  [<ffffffff81ca6f6d>]
> > > > rwsem_down_read_failed+0xdd/0x130
> > > > > > > [  419.701050]  [<ffffffff814df454>]
> > > > call_rwsem_down_read_failed+0x14/0x30
> > > > > > > [  419.709758]  [<ffffffff810e793b>] ?
> > down_read_nested+0x5b/0x70
> > > > > > > [  419.709805]  [<ffffffff81adfda9>] ?
> > > > snd_pcm_stream_lock+0xa9/0x120
> > > > > > > [  419.723012]  [<ffffffff81adfda9>]
> > snd_pcm_stream_lock+0xa9/0x120
> > > > > > >
> > > > > > > < ============     2nd Read lock is acquired here, without
> > releasing
> > > > previous one.
> > > > > > >
> > > > > > > [  419.723026]  [<ffffffff81adfe4c>]
> > > > snd_pcm_stream_lock_irq+0x2c/0x30
> > > > > > > [  419.731801]  [<ffffffff81b01f5e>] ?
> > > > dpcm_dapm_stream_event+0xbe/0xd0
> > > > > > > [  419.740565]  [<ffffffff81b0079c>]
> > > > dpcm_set_fe_update_state+0x3c/0xb0
> > > > > > > [  419.749252]  [<ffffffff81b04514>]
> > dpcm_fe_dai_prepare+0xc4/0x150
> > > > > > > [  419.749301]  [<ffffffff81adf7eb>]
> > snd_pcm_do_prepare+0x1b/0x30
> > > > > > > [  419.758083]  [<ffffffff81adf3bf>]
> > snd_pcm_action_single+0x2f/0x70
> > > > > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > > > > [  419.766962]  [<ffffffff81adf577>] ?
> > > > snd_pcm_action_nonatomic+0x27/0x80
> > > > > > > [  419.775565]  [<ffffffff81adf5c6>]
> > > > snd_pcm_action_nonatomic+0x76/0x80
> > > > > > >
> > > > > > > < ============     1st Read lock is acquired here
> > > > > > >
> > > > > > > [  419.784419]  [<ffffffff81ae2c62>]
> > > > snd_pcm_common_ioctl1+0x802/0xd30
> > > > > > > [  419.784495]  [<ffffffff81ae332b>]
> > > > snd_pcm_playback_ioctl1+0x19b/0x280
> > > > > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > > > [  419.801943]  [<ffffffff81ae3dca>]
> > > > snd_pcm_ioctl_compat+0x41a/0x770
> > > > > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > > > > [  419.819535]  [<ffffffff810eb696>] ?
> > > > trace_hardirqs_on_caller+0x16/0x210
> > > > > > > [  419.828306]  [<ffffffff814df4f8>] ?
> > > > trace_hardirqs_on_thunk+0x17/0x19
> > > > > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > > > > >
> > > > > > >
> > > > > > > Second thread:
> > > > > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
> > > > 	0x20020004
> > > > > > > [  419.543191]  ffff880170957cd8 0000000070957cd8
> > ffff88006de49ad0
> > > > ffffffff8252ffd8
> > > > > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
> > > > ffffffff00000001
> > > > > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
> > > > ffffffff8252ffc0
> > > > > > > [  419.569368] Call Trace:
> > > > > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > > > [  419.578026]  [<ffffffff81ca71f5>]
> > > > rwsem_down_write_failed+0x235/0x450
> > > > > > > [  419.586884]  [<ffffffff81ca7005>] ?
> > > > rwsem_down_write_failed+0x45/0x450
> > > > > > > [  419.586941]  [<ffffffff814df483>]
> > > > call_rwsem_down_write_failed+0x13/0x20
> > > > > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > > > > >
> > > > > > > < ============     Write lock acquired here. If this occurs between
> > > > two read
> > > > > > > lock's then it's hangs
> > > > > >
> > > > > > So what's wrong?  The two read locks are still running, and
> > > > > > snd_pcm_release() waits until these two read locks are freed.
> > > > > > The read double-locks are fine, and it's intended behavior.
> > > > >
> > > > > Yes that is true
> > > > >
> > > > > > The problem is apparently not there, but somewhere else.
> > > > > > Any other thread or call involved?
> > > > >
> > > > > The second backtrace above is the 2nd thread which does down_write.
> > > > That
> > > > > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > > > > read is also stuck and we are in deadlock.
> > > > >
> > > > > down_read (1st acquired)
> > > > > 					down_write (waiting due to
> > > > down_read)
> > > > > down_read( 2nd,,, stuck)
> > > >
> > > > Hmm, so down_read() takes care of the pending down_write()?
> > > > Then it's a problem.  Maybe the limitation of the current rwsem
> > > > implementation.
> > > >
> > > down_write() will make sure no further read locks are allowed, till it get
> > > executed.
> > > So here the 2nd down_read() will get blocked, since down_write has
> > higher
> > > priority.
> > 
> > Well, the requirement of down_write() is to block further reads *once
> > when* it takes the lock.  It doesn't necessarily mean to block reads
> > *before* taking the write lock.
> 
> Below excerpts from LDD [1] says:
> "Writers get priority; as soon as a writer tries to enter the critical section,
> no readers will be allowed in until all writers have completed their work."

In your case, writer doesn't enter the critical section yet.  It's
trying to enter but blocked.

> My understanding when writer *tries* to acquire lock, further read locks
> will be blocked.  I think in my back trace, 2nd down_read() didn't get honored,
> since the down_write() was attempting to enter critical section from different
> thread.

Actually, this is pretty much depending on the implementation.  A
proper RT-wise rwsem may handle it differently, but the Linux standard
rwsem just queues a la FIFO, thus the writer blocks the rest.


Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-16 17:17         ` Babu, Ramesh
@ 2016-02-17 12:41           ` Babu, Ramesh
  2016-02-17 13:48             ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Babu, Ramesh @ 2016-02-17 12:41 UTC (permalink / raw)
  To: Koul, Vinod, 'Takashi Iwai'
  Cc: Patches Audio, 'alsa-devel@alsa-project.org'

Hi Takashi,

> > On Tue, Feb 16, 2016 at 05:27:44PM +0100, Takashi Iwai wrote:
> > > On Tue, 16 Feb 2016 17:10:43 +0100,
> > > Vinod Koul wrote:
> > > >
> > > > On Tue, Feb 16, 2016 at 05:01:36PM +0100, Takashi Iwai wrote:
> > > > > On Tue, 16 Feb 2016 16:37:11 +0100,
> > > > > Vinod Koul wrote:
> > > > > >
> > > > > >
> > > > > > Hi Takashi,
> > > > > >
> > > > > > We found another issue with non atomic ops.
> > > > > >
> > > > > > This occurs when we have snd_pcm_do_prepare being invoked
> > which acquires
> > > > > > the lock snd_pcm_link_rwsem twice, first in
> > snd_pcm_common_ioctl1, and then again in
> > > > > > dpcm_dapm_stream_event. Normally this is fine, but if we have
> > another stream
> > > > > > get closed in between two read accesses we get stuck.
> > > > > >
> > > > > > First thread:
> > > > > >
> > > > > > [  419.657259] AudioOut_2      D ffff8800704a3a60     0  3691      1
> > 	0x20020004
> > > > > > [  419.665946]  ffff8800704a3a60 00000000704a3a60
> ffff88006d1fddd0
> > ffffffff8252ffd8
> > > > > > [  419.674678]  ffff8800704a4000 ffffffff8252ffc0 ffffffffffffffff
> > ffffffff8252ffd8
> > > > > > [  419.683513]  0000000000000000 ffff8800704a3a80 ffffffff81ca2987
> > ffffffffffffffff
> > > > > > [  419.683574] Call Trace:
> > > > > > [  419.692290]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > > [  419.692306]  [<ffffffff81ca6f6d>]
> > rwsem_down_read_failed+0xdd/0x130
> > > > > > [  419.701050]  [<ffffffff814df454>]
> > call_rwsem_down_read_failed+0x14/0x30
> > > > > > [  419.709758]  [<ffffffff810e793b>] ?
> down_read_nested+0x5b/0x70
> > > > > > [  419.709805]  [<ffffffff81adfda9>] ?
> > snd_pcm_stream_lock+0xa9/0x120
> > > > > > [  419.723012]  [<ffffffff81adfda9>]
> > snd_pcm_stream_lock+0xa9/0x120
> > > > > >
> > > > > > < ============     2nd Read lock is acquired here, without
> releasing
> > previous one.
> > > > > >
> > > > > > [  419.723026]  [<ffffffff81adfe4c>]
> > snd_pcm_stream_lock_irq+0x2c/0x30
> > > > > > [  419.731801]  [<ffffffff81b01f5e>] ?
> > dpcm_dapm_stream_event+0xbe/0xd0
> > > > > > [  419.740565]  [<ffffffff81b0079c>]
> > dpcm_set_fe_update_state+0x3c/0xb0
> > > > > > [  419.749252]  [<ffffffff81b04514>]
> > dpcm_fe_dai_prepare+0xc4/0x150
> > > > > > [  419.749301]  [<ffffffff81adf7eb>]
> snd_pcm_do_prepare+0x1b/0x30
> > > > > > [  419.758083]  [<ffffffff81adf3bf>]
> snd_pcm_action_single+0x2f/0x70
> > > > > > [  419.766897]  [<ffffffff81ca6827>] ? down_read+0x47/0x60
> > > > > > [  419.766962]  [<ffffffff81adf577>] ?
> > snd_pcm_action_nonatomic+0x27/0x80
> > > > > > [  419.775565]  [<ffffffff81adf5c6>]
> > snd_pcm_action_nonatomic+0x76/0x80
> > > > > >
> > > > > > < ============     1st Read lock is acquired here
> > > > > >
> > > > > > [  419.784419]  [<ffffffff81ae2c62>]
> > snd_pcm_common_ioctl1+0x802/0xd30
> > > > > > [  419.784495]  [<ffffffff81ae332b>]
> > snd_pcm_playback_ioctl1+0x19b/0x280
> > > > > > [  419.793106]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > > [  419.801943]  [<ffffffff81ae3dca>]
> > snd_pcm_ioctl_compat+0x41a/0x770
> > > > > > [  419.801959]  [<ffffffff81230915>] ? __fget+0x5/0x210
> > > > > > [  419.810793]  [<ffffffff8126d670>] compat_SyS_ioctl+0xd0/0x13b0
> > > > > > [  419.819491]  [<ffffffff81ca7bc4>] ? do_nanosleep+0x94/0x190
> > > > > > [  419.819535]  [<ffffffff810eb696>] ?
> > trace_hardirqs_on_caller+0x16/0x210
> > > > > > [  419.828306]  [<ffffffff814df4f8>] ?
> > trace_hardirqs_on_thunk+0x17/0x19
> > > > > > [  419.837104]  [<ffffffff81cabf0e>] sysenter_dispatch+0x7/0x1f
> > > > > >
> > > > > >
> > > > > > Second thread:
> > > > > > [  419.543062] mediaserver     D ffff880170957cd8     0  3690      1
> > 	0x20020004
> > > > > > [  419.543191]  ffff880170957cd8 0000000070957cd8
> ffff88006de49ad0
> > ffffffff8252ffd8
> > > > > > [  419.551920]  ffff880170958000 ffff88006de49ad0 ffffffff8252ffc0
> > ffffffff00000001
> > > > > > [  419.560555]  ffffffff8252ffd8 ffff880170957cf8 ffffffff81ca2987
> > ffffffff8252ffc0
> > > > > > [  419.569368] Call Trace:
> > > > > > [  419.569392]  [<ffffffff81ca2987>] schedule+0x37/0x90
> > > > > > [  419.578026]  [<ffffffff81ca71f5>]
> > rwsem_down_write_failed+0x235/0x450
> > > > > > [  419.586884]  [<ffffffff81ca7005>] ?
> > rwsem_down_write_failed+0x45/0x450
> > > > > > [  419.586941]  [<ffffffff814df483>]
> > call_rwsem_down_write_failed+0x13/0x20
> > > > > > [  419.595620]  [<ffffffff81ca689f>] ? down_write+0x5f/0x80
> > > > > >
> > > > > > < ============     Write lock acquired here. If this occurs between
> > two read
> > > > > > lock's then it's hangs
> > > > >
> > > > > So what's wrong?  The two read locks are still running, and
> > > > > snd_pcm_release() waits until these two read locks are freed.
> > > > > The read double-locks are fine, and it's intended behavior.
> > > >
> > > > Yes that is true
> > > >
> > > > > The problem is apparently not there, but somewhere else.
> > > > > Any other thread or call involved?
> > > >
> > > > The second backtrace above is the 2nd thread which does down_write.
> > That
> > > > gets hung as 1st read lock is acquired. Since write gets higer priorty, 2nd
> > > > read is also stuck and we are in deadlock.
> > > >
> > > > down_read (1st acquired)
> > > > 					down_write (waiting due to
> > down_read)
> > > > down_read( 2nd,,, stuck)
> > >
> > > Hmm, so down_read() takes care of the pending down_write()?
> > > Then it's a problem.  Maybe the limitation of the current rwsem
> > > implementation.
> >
> > Yes that is how things seem right now.
> > >
> > > Does the patch below work?  It's suboptimal, just a proof of concept.
> >
> > Okay will try first thing tomorrow and let you know.
> >
> I tried the below changes and it seem to be fixing the issue.
> I will do more stress testing tomorrow and let you know.
> Thanks :)
>
I verified the fix and I confirm that it resolved the race 
condition issue.
> > > ---
> > > diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> > > index fadd3eb8e8bb..8b45edd218a4 100644
> > > --- a/sound/core/pcm_native.c
> > > +++ b/sound/core/pcm_native.c
> > > @@ -1813,7 +1813,8 @@ static int snd_pcm_link(struct
> > snd_pcm_substream *substream, int fd)
> > >  		res = -ENOMEM;
> > >  		goto _nolock;
> > >  	}
> > > -	down_write(&snd_pcm_link_rwsem);
> > > +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> > > +		cpu_relax();
> > >  	write_lock_irq(&snd_pcm_link_rwlock);
> > >  	if (substream->runtime->status->state ==
> > SNDRV_PCM_STATE_OPEN ||
> > >  	    substream->runtime->status->state != substream1->runtime-
> > >status->state ||
> > > @@ -1860,7 +1861,8 @@ static int snd_pcm_unlink(struct
> > snd_pcm_substream *substream)
> > >  	struct snd_pcm_substream *s;
> > >  	int res = 0;
> > >
> > > -	down_write(&snd_pcm_link_rwsem);
> > > +	while (!down_write_trylock(&snd_pcm_link_rwsem))
> > > +		cpu_relax();
> > >  	write_lock_irq(&snd_pcm_link_rwlock);
> > >  	if (!snd_pcm_stream_linked(substream)) {
> > >  		res = -EALREADY;
> >
> > --
> > ~Vinod

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-17 12:41           ` Babu, Ramesh
@ 2016-02-17 13:48             ` Takashi Iwai
  2016-02-18 10:28               ` Ramesh Babu
  0 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2016-02-17 13:48 UTC (permalink / raw)
  To: Babu, Ramesh
  Cc: Koul, Vinod, Patches Audio, 'alsa-devel@alsa-project.org'

On Wed, 17 Feb 2016 13:41:10 +0100,
Babu, Ramesh wrote:
> 
> I verified the fix and I confirm that it resolved the race 
> condition issue.

Good to hear.  Could you try the revised patch below?
If it still works, give your tested-by tag, so that I can merge to
upstream.


Takashi

---
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream

A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice
in the same code path, e.g. one in snd_pcm_action_nonatomic() and
another in snd_pcm_stream_lock().  Usually this is OK, but when a
write lock is issued between these two read locks, the problem
happens: the write lock is blocked due to the first reade lock, and
the second read lock is also blocked by the write lock.  This
eventually deadlocks.

The reason is the way rwsem manages waiters; it's queued like FIFO, so
even if the writer itself doesn't take the lock yet, it blocks all the
waiters (including reads) queued after it.

As a workaround, in this patch, we replace the standard down_write()
with an spinning loop.  This is far from optimal, but it's good
enough, as the spinning time is supposed to be relatively short for
normal PCM operations, and the code paths requiring the write lock
aren't called so often.

Reported-by: Vinod Koul <vinod.koul@intel.com>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/core/pcm_native.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index fadd3eb8e8bb..9106d8e2300e 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -74,6 +74,18 @@ static int snd_pcm_open(struct file *file, struct snd_pcm *pcm, int stream);
 static DEFINE_RWLOCK(snd_pcm_link_rwlock);
 static DECLARE_RWSEM(snd_pcm_link_rwsem);
 
+/* Writer in rwsem may block readers even during its waiting in queue,
+ * and this may lead to a deadlock when the code path takes read sem
+ * twice (e.g. one in snd_pcm_action_nonatomic() and another in
+ * snd_pcm_stream_lock()).  As a (suboptimal) workaround, let writer to
+ * spin until it gets the lock.
+ */
+static inline void down_write_nonblock(struct rw_semaphore *lock)
+{
+	while (!down_write_trylock(lock))
+		cond_resched();
+}
+
 /**
  * snd_pcm_stream_lock - Lock the PCM stream
  * @substream: PCM substream
@@ -1813,7 +1825,7 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
 		res = -ENOMEM;
 		goto _nolock;
 	}
-	down_write(&snd_pcm_link_rwsem);
+	down_write_nonblock(&snd_pcm_link_rwsem);
 	write_lock_irq(&snd_pcm_link_rwlock);
 	if (substream->runtime->status->state == SNDRV_PCM_STATE_OPEN ||
 	    substream->runtime->status->state != substream1->runtime->status->state ||
@@ -1860,7 +1872,7 @@ static int snd_pcm_unlink(struct snd_pcm_substream *substream)
 	struct snd_pcm_substream *s;
 	int res = 0;
 
-	down_write(&snd_pcm_link_rwsem);
+	down_write_nonblock(&snd_pcm_link_rwsem);
 	write_lock_irq(&snd_pcm_link_rwlock);
 	if (!snd_pcm_stream_linked(substream)) {
 		res = -EALREADY;
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-17 13:48             ` Takashi Iwai
@ 2016-02-18 10:28               ` Ramesh Babu
  2016-02-18 10:28                 ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Ramesh Babu @ 2016-02-18 10:28 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Koul, Vinod, Patches Audio, 'alsa-devel@alsa-project.org'

On Wed, Feb 17, 2016 at 02:48:39PM +0100, Takashi Iwai wrote:
> On Wed, 17 Feb 2016 13:41:10 +0100,
> Babu, Ramesh wrote:
> > 
> > I verified the fix and I confirm that it resolved the race 
> > condition issue.
> 
> Good to hear.  Could you try the revised patch below?
> If it still works, give your tested-by tag, so that I can merge to
> upstream.

Yes, patch works well.
Added Tested-by tag in below patch.

> 
> 
> Takashi
> 
> ---
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH] ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream
> 
> A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice
> in the same code path, e.g. one in snd_pcm_action_nonatomic() and
> another in snd_pcm_stream_lock().  Usually this is OK, but when a
> write lock is issued between these two read locks, the problem
> happens: the write lock is blocked due to the first reade lock, and
> the second read lock is also blocked by the write lock.  This
> eventually deadlocks.
> 
> The reason is the way rwsem manages waiters; it's queued like FIFO, so
> even if the writer itself doesn't take the lock yet, it blocks all the
> waiters (including reads) queued after it.
> 
> As a workaround, in this patch, we replace the standard down_write()
> with an spinning loop.  This is far from optimal, but it's good
> enough, as the spinning time is supposed to be relatively short for
> normal PCM operations, and the code paths requiring the write lock
> aren't called so often.
> 
> Reported-by: Vinod Koul <vinod.koul@intel.com>
Tested-by: Ramesh Babu <ramesh.babu@intel.com>
> Cc: <stable@vger.kernel.org> # v3.18+
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/core/pcm_native.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index fadd3eb8e8bb..9106d8e2300e 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -74,6 +74,18 @@ static int snd_pcm_open(struct file *file, struct snd_pcm *pcm, int stream);
>  static DEFINE_RWLOCK(snd_pcm_link_rwlock);
>  static DECLARE_RWSEM(snd_pcm_link_rwsem);
>  
> +/* Writer in rwsem may block readers even during its waiting in queue,
> + * and this may lead to a deadlock when the code path takes read sem
> + * twice (e.g. one in snd_pcm_action_nonatomic() and another in
> + * snd_pcm_stream_lock()).  As a (suboptimal) workaround, let writer to
> + * spin until it gets the lock.
> + */
> +static inline void down_write_nonblock(struct rw_semaphore *lock)
> +{
> +	while (!down_write_trylock(lock))
> +		cond_resched();
> +}
> +
>  /**
>   * snd_pcm_stream_lock - Lock the PCM stream
>   * @substream: PCM substream
> @@ -1813,7 +1825,7 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
>  		res = -ENOMEM;
>  		goto _nolock;
>  	}
> -	down_write(&snd_pcm_link_rwsem);
> +	down_write_nonblock(&snd_pcm_link_rwsem);
>  	write_lock_irq(&snd_pcm_link_rwlock);
>  	if (substream->runtime->status->state == SNDRV_PCM_STATE_OPEN ||
>  	    substream->runtime->status->state != substream1->runtime->status->state ||
> @@ -1860,7 +1872,7 @@ static int snd_pcm_unlink(struct snd_pcm_substream *substream)
>  	struct snd_pcm_substream *s;
>  	int res = 0;
>  
> -	down_write(&snd_pcm_link_rwsem);
> +	down_write_nonblock(&snd_pcm_link_rwsem);
>  	write_lock_irq(&snd_pcm_link_rwlock);
>  	if (!snd_pcm_stream_linked(substream)) {
>  		res = -EALREADY;
> -- 
> 2.7.1
> 

-- 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: another locking issue with nonatomic ops?
  2016-02-18 10:28               ` Ramesh Babu
@ 2016-02-18 10:28                 ` Takashi Iwai
  0 siblings, 0 replies; 14+ messages in thread
From: Takashi Iwai @ 2016-02-18 10:28 UTC (permalink / raw)
  To: Ramesh Babu
  Cc: Koul, Vinod, Patches Audio, 'alsa-devel@alsa-project.org'

On Thu, 18 Feb 2016 11:28:22 +0100,
Ramesh Babu wrote:
> 
> On Wed, Feb 17, 2016 at 02:48:39PM +0100, Takashi Iwai wrote:
> > On Wed, 17 Feb 2016 13:41:10 +0100,
> > Babu, Ramesh wrote:
> > > 
> > > I verified the fix and I confirm that it resolved the race 
> > > condition issue.
> > 
> > Good to hear.  Could you try the revised patch below?
> > If it still works, give your tested-by tag, so that I can merge to
> > upstream.
> 
> Yes, patch works well.
> Added Tested-by tag in below patch.

OK, now merged and pushed out.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-02-18 10:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-16 15:37 another locking issue with nonatomic ops? Vinod Koul
2016-02-16 16:01 ` Takashi Iwai
2016-02-16 16:10   ` Vinod Koul
2016-02-16 16:27     ` Takashi Iwai
2016-02-16 16:40       ` Vinod Koul
2016-02-16 17:17         ` Babu, Ramesh
2016-02-17 12:41           ` Babu, Ramesh
2016-02-17 13:48             ` Takashi Iwai
2016-02-18 10:28               ` Ramesh Babu
2016-02-18 10:28                 ` Takashi Iwai
2016-02-16 16:40       ` Babu, Ramesh
2016-02-16 16:46         ` Takashi Iwai
2016-02-16 17:15           ` Babu, Ramesh
2016-02-16 17:41             ` Takashi Iwai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.