public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-28 12:26 Edmund Raile
  2024-07-28 12:26 ` [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context Edmund Raile
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Edmund Raile @ 2024-07-28 12:26 UTC (permalink / raw)
  To: o-takashi, clemens
  Cc: tiwai, alsa-devel, linux-sound, linux-kernel, edmund.raile,
	stable

This patchset serves to prevent a deadlock between
process context and softIRQ context:

A. In the process context
    * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
               snd_pcm_status64()
    * (lock B) Then attempt to enter tasklet

B. In the softIRQ context
    * (lock B) Enter tasklet
    * (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
               snd_pcm_period_elapsed()

? tasklet_unlock_spin_wait
 </NMI>
 <TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm

? native_queued_spin_lock_slowpath
 </NMI>
 <IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci

The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")

Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.

Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.

ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.

to reproduce the issue:
direct ALSA playback to the device:
  mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
  - high CPU load
    stress --cpu $(nproc)
  - switching between applications via workspaces
    tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.

Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164

Edmund Raile (2):
  ALSA: firewire-lib: restore workqueue for process context
  ALSA: firewire-lib: prevent deadlock between process and softIRQ
    context

 sound/firewire/amdtp-stream.c | 36 ++++++++++++++++++++++-------------
 sound/firewire/amdtp-stream.h |  1 +
 2 files changed, 24 insertions(+), 13 deletions(-)

-- 
2.45.2



^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-28 12:30 Edmund Raile
  0 siblings, 0 replies; 7+ messages in thread
From: Edmund Raile @ 2024-07-28 12:30 UTC (permalink / raw)
  To: o-takashi, clemens
  Cc: tiwai, alsa-devel, linux-sound, linux-kernel, edmund.raile,
	stable

This patchset serves to prevent a deadlock between
process context and softIRQ context:

A. In the process context
    * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
               snd_pcm_status64()
    * (lock B) Then attempt to enter tasklet

B. In the softIRQ context
    * (lock B) Enter tasklet
    * (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
               snd_pcm_period_elapsed()

? tasklet_unlock_spin_wait
 </NMI>
 <TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm

? native_queued_spin_lock_slowpath
 </NMI>
 <IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci

The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")

Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.

Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.

ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.

to reproduce the issue:
direct ALSA playback to the device:
  mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
  - high CPU load
    stress --cpu $(nproc)
  - switching between applications via workspaces
    tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.

Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164

Edmund Raile (2):
  ALSA: firewire-lib: restore workqueue for process context
  ALSA: firewire-lib: prevent deadlock between process and softIRQ
    context

 sound/firewire/amdtp-stream.c | 36 ++++++++++++++++++++++-------------
 sound/firewire/amdtp-stream.h |  1 +
 2 files changed, 24 insertions(+), 13 deletions(-)

-- 
2.45.2



^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-29 10:16 edmund.raile
  2024-07-29 13:33 ` Takashi Sakamoto
  0 siblings, 1 reply; 7+ messages in thread
From: edmund.raile @ 2024-07-29 10:16 UTC (permalink / raw)
  To: Takashi Sakamoto
  Cc: linux-sound, stable, tiwai, clemens, alsa-devel, linux-kernel

> Thank you for your sending the revised patches, it looks better than the
> previous one. However, I have an additional request.
Allright, patch v3 it is.

> [1] https://git-scm.com/docs/git-revert
Should have known git has something like that, how handy!

> $ git revert -s b5b519965c4c
Yes, 5b5 can be removed via revert, but what is the difference in
effect? Just time saving?
> $ git revert -s 7ba5ca32fe6e
This one I'd like to ask you about:
The original inline comment in amdtp-stream.c
amdtp_domain_stream_pcm_pointer()
```
// This function is called in software IRQ context of
// period_work or process context.
//
// When the software IRQ context was scheduled by software IRQ
// context of IT contexts, queued packets were already handled.
// Therefore, no need to flush the queue in buffer furthermore.
//
// When the process context reach here, some packets will be
// already queued in the buffer. These packets should be handled
// immediately to keep better granularity of PCM pointer.
//
// Later, the process context will sometimes schedules software
// IRQ context of the period_work. Then, no need to flush the
// queue by the same reason as described in the above
```
(let's call the above v1) was replaced with
```
// In software IRQ context, the call causes dead-lock to disable the tasklet
// synchronously.
```
on occasion of 7ba5ca32fe6e (let's call this v2).

I sought to replace it with
```
// use wq to prevent deadlock between process context spin_lock
// of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
// softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
// in snd_pcm_period_elapsed()
```
to prevent this issue from occurring again (let's call this v3).

Should I include v1, v3 or a combination of v1 and v3 in my next patch?

> Just for safe, it is preferable to execute 'scripts/checkpatch.pl' in
> kernel tree to check the patchset generated by send-email subcommand[3].
Absolutely should have done so, sorry.

Thank you for your patience and guidance,
Edmund Raile.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-07-29 13:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-28 12:26 [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-28 12:26 ` [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context Edmund Raile
2024-07-28 12:26 ` [PATCH v2 2/2] ALSA: firewire-lib: prevent deadlock between process and softIRQ context Edmund Raile
2024-07-29  0:27 ` [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
  -- strict thread matches above, loose matches on Subject: below --
2024-07-28 12:30 Edmund Raile
2024-07-29 10:16 edmund.raile
2024-07-29 13:33 ` Takashi Sakamoto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox