* [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-28 12:26 Edmund Raile
2024-07-28 12:26 ` [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context Edmund Raile
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Edmund Raile @ 2024-07-28 12:26 UTC (permalink / raw)
To: o-takashi, clemens
Cc: tiwai, alsa-devel, linux-sound, linux-kernel, edmund.raile,
stable
This patchset serves to prevent a deadlock between
process context and softIRQ context:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed()
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (2):
ALSA: firewire-lib: restore workqueue for process context
ALSA: firewire-lib: prevent deadlock between process and softIRQ
context
sound/firewire/amdtp-stream.c | 36 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 24 insertions(+), 13 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context
2024-07-28 12:26 [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
@ 2024-07-28 12:26 ` Edmund Raile
2024-07-28 12:26 ` [PATCH v2 2/2] ALSA: firewire-lib: prevent deadlock between process and softIRQ context Edmund Raile
2024-07-29 0:27 ` [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
2 siblings, 0 replies; 7+ messages in thread
From: Edmund Raile @ 2024-07-28 12:26 UTC (permalink / raw)
To: o-takashi, clemens
Cc: tiwai, alsa-devel, linux-sound, linux-kernel, edmund.raile,
stable
prepare resolution of deadlock between process context and softIRQ
context:
restore workqueue previously used for process context:
revert commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue
for period update")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/
Reported-by: edmund.raile <edmund.raile@proton.me>
Signed-off-by: Edmund Raile <edmund.raile@protonmail.com>
---
sound/firewire/amdtp-stream.c | 15 +++++++++++++++
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index d35d0a420ee0..31201d506a21 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -77,6 +77,8 @@
// overrun. Actual device can skip more, then this module stops the packet streaming.
#define IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES 5
+static void pcm_period_work(struct work_struct *work);
+
/**
* amdtp_stream_init - initialize an AMDTP stream structure
* @s: the AMDTP stream to initialize
@@ -105,6 +107,7 @@ int amdtp_stream_init(struct amdtp_stream *s, struct fw_unit *unit,
s->flags = flags;
s->context = ERR_PTR(-1);
mutex_init(&s->mutex);
+ INIT_WORK(&s->period_work, pcm_period_work);
s->packet_index = 0;
init_waitqueue_head(&s->ready_wait);
@@ -347,6 +350,7 @@ EXPORT_SYMBOL(amdtp_stream_get_max_payload);
*/
void amdtp_stream_pcm_prepare(struct amdtp_stream *s)
{
+ cancel_work_sync(&s->period_work);
s->pcm_buffer_pointer = 0;
s->pcm_period_pointer = 0;
}
@@ -624,6 +628,16 @@ static void update_pcm_pointers(struct amdtp_stream *s,
}
}
+static void pcm_period_work(struct work_struct *work)
+{
+ struct amdtp_stream *s = container_of(work, struct amdtp_stream,
+ period_work);
+ struct snd_pcm_substream *pcm = READ_ONCE(s->pcm);
+
+ if (pcm)
+ snd_pcm_period_elapsed(pcm);
+}
+
static int queue_packet(struct amdtp_stream *s, struct fw_iso_packet *params,
bool sched_irq)
{
@@ -1910,6 +1924,7 @@ static void amdtp_stream_stop(struct amdtp_stream *s)
return;
}
+ cancel_work_sync(&s->period_work);
fw_iso_context_stop(s->context);
fw_iso_context_destroy(s->context);
s->context = ERR_PTR(-1);
diff --git a/sound/firewire/amdtp-stream.h b/sound/firewire/amdtp-stream.h
index a1ed2e80f91a..775db3fc4959 100644
--- a/sound/firewire/amdtp-stream.h
+++ b/sound/firewire/amdtp-stream.h
@@ -191,5 +191,6 @@ struct amdtp_stream {
/* For a PCM substream processing. */
struct snd_pcm_substream *pcm;
+ struct work_struct period_work;
snd_pcm_uframes_t pcm_buffer_pointer;
unsigned int pcm_period_pointer;
--
2.45.2
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH v2 2/2] ALSA: firewire-lib: prevent deadlock between process and softIRQ context
2024-07-28 12:26 [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-28 12:26 ` [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context Edmund Raile
@ 2024-07-28 12:26 ` Edmund Raile
2024-07-29 0:27 ` [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
2 siblings, 0 replies; 7+ messages in thread
From: Edmund Raile @ 2024-07-28 12:26 UTC (permalink / raw)
To: o-takashi, clemens
Cc: tiwai, alsa-devel, linux-sound, linux-kernel, edmund.raile,
stable
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since Kernels 5.14.0,
causing a deadlock with eventual system freeze under ALSA operation:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by
snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
between ALSA substream process context spin_lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and OHCI 1394 IT softIRQ context spin_lock of
snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
Cc: stable@vger.kernel.org
Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/
Reported-by: edmund.raile <edmund.raile@proton.me>
Signed-off-by: Edmund Raile <edmund.raile@protonmail.com>
---
sound/firewire/amdtp-stream.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 31201d506a21..c037d7de1fdc 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -615,16 +615,9 @@ static void update_pcm_pointers(struct amdtp_stream *s,
// The program in user process should periodically check the status of intermediate
// buffer associated to PCM substream to process PCM frames in the buffer, instead
// of receiving notification of period elapsed by poll wait.
- if (!pcm->runtime->no_period_wakeup) {
- if (in_softirq()) {
- // In software IRQ context for 1394 OHCI.
- snd_pcm_period_elapsed(pcm);
- } else {
- // In process context of ALSA PCM application under acquired lock of
- // PCM substream.
- snd_pcm_period_elapsed_under_stream_lock(pcm);
- }
- }
+ if (!pcm->runtime->no_period_wakeup)
+ // wq used with amdtp_domain_stream_pcm_pointer() to prevent deadlock
+ queue_work(system_highpri_wq, &s->period_work);
}
}
@@ -1866,9 +1859,11 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d,
// Process isochronous packets queued till recent isochronous cycle to handle PCM frames.
if (irq_target && amdtp_stream_running(irq_target)) {
- // In software IRQ context, the call causes dead-lock to disable the tasklet
- // synchronously.
- if (!in_softirq())
+ // use wq to prevent deadlock between process context spin_lock
+ // of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
+ // softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
+ // in snd_pcm_period_elapsed()
+ if ((!in_softirq()) && (current_work() != &s->period_work))
fw_iso_context_flush_completions(irq_target->context);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
2024-07-28 12:26 [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-28 12:26 ` [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context Edmund Raile
2024-07-28 12:26 ` [PATCH v2 2/2] ALSA: firewire-lib: prevent deadlock between process and softIRQ context Edmund Raile
@ 2024-07-29 0:27 ` Takashi Sakamoto
2 siblings, 0 replies; 7+ messages in thread
From: Takashi Sakamoto @ 2024-07-29 0:27 UTC (permalink / raw)
To: Edmund Raile; +Cc: tiwai, linux-sound, stable
Hi,
On Sun, Jul 28, 2024 at 12:26:21PM +0000, Edmund Raile wrote:
> This patchset serves to prevent a deadlock between
> process context and softIRQ context:
(snip)
> Edmund Raile (2):
> ALSA: firewire-lib: restore workqueue for process context
> ALSA: firewire-lib: prevent deadlock between process and softIRQ
> context
>
> sound/firewire/amdtp-stream.c | 36 ++++++++++++++++++++++-------------
> sound/firewire/amdtp-stream.h | 1 +
> 2 files changed, 24 insertions(+), 13 deletions(-)
Thank you for your sending the revised patches, it looks better than the
previous one. However, I have an additional request.
In this case, it is enough to execute 'revert' subcommand[1] of git(1),
like:
$ git revert -s b5b519965c4c
$ git revert -s 7ba5ca32fe6e
It is permitted to add postscript to the commit comment generated by the
above command. You see my recent post as an example[2].
Just for safe, it is preferable to execute 'scripts/checkpatch.pl' in
kernel tree to check the patchset generated by send-email subcommand[3].
[1] https://git-scm.com/docs/git-revert
[2] https://lore.kernel.org/lkml/20240725161648.130404-1-o-takashi@sakamocchi.jp/
[3] https://git-scm.com/docs/git-send-email
Thanks
Takashi Sakamoto
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-28 12:30 Edmund Raile
0 siblings, 0 replies; 7+ messages in thread
From: Edmund Raile @ 2024-07-28 12:30 UTC (permalink / raw)
To: o-takashi, clemens
Cc: tiwai, alsa-devel, linux-sound, linux-kernel, edmund.raile,
stable
This patchset serves to prevent a deadlock between
process context and softIRQ context:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed()
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (2):
ALSA: firewire-lib: restore workqueue for process context
ALSA: firewire-lib: prevent deadlock between process and softIRQ
context
sound/firewire/amdtp-stream.c | 36 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 24 insertions(+), 13 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-29 10:16 edmund.raile
2024-07-29 13:33 ` Takashi Sakamoto
0 siblings, 1 reply; 7+ messages in thread
From: edmund.raile @ 2024-07-29 10:16 UTC (permalink / raw)
To: Takashi Sakamoto
Cc: linux-sound, stable, tiwai, clemens, alsa-devel, linux-kernel
> Thank you for your sending the revised patches, it looks better than the
> previous one. However, I have an additional request.
Allright, patch v3 it is.
> [1] https://git-scm.com/docs/git-revert
Should have known git has something like that, how handy!
> $ git revert -s b5b519965c4c
Yes, 5b5 can be removed via revert, but what is the difference in
effect? Just time saving?
> $ git revert -s 7ba5ca32fe6e
This one I'd like to ask you about:
The original inline comment in amdtp-stream.c
amdtp_domain_stream_pcm_pointer()
```
// This function is called in software IRQ context of
// period_work or process context.
//
// When the software IRQ context was scheduled by software IRQ
// context of IT contexts, queued packets were already handled.
// Therefore, no need to flush the queue in buffer furthermore.
//
// When the process context reach here, some packets will be
// already queued in the buffer. These packets should be handled
// immediately to keep better granularity of PCM pointer.
//
// Later, the process context will sometimes schedules software
// IRQ context of the period_work. Then, no need to flush the
// queue by the same reason as described in the above
```
(let's call the above v1) was replaced with
```
// In software IRQ context, the call causes dead-lock to disable the tasklet
// synchronously.
```
on occasion of 7ba5ca32fe6e (let's call this v2).
I sought to replace it with
```
// use wq to prevent deadlock between process context spin_lock
// of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
// softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
// in snd_pcm_period_elapsed()
```
to prevent this issue from occurring again (let's call this v3).
Should I include v1, v3 or a combination of v1 and v3 in my next patch?
> Just for safe, it is preferable to execute 'scripts/checkpatch.pl' in
> kernel tree to check the patchset generated by send-email subcommand[3].
Absolutely should have done so, sorry.
Thank you for your patience and guidance,
Edmund Raile.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
2024-07-29 10:16 edmund.raile
@ 2024-07-29 13:33 ` Takashi Sakamoto
0 siblings, 0 replies; 7+ messages in thread
From: Takashi Sakamoto @ 2024-07-29 13:33 UTC (permalink / raw)
To: edmund.raile
Cc: linux-sound, stable, tiwai, clemens, alsa-devel, linux-kernel
On Mon, Jul 29, 2024 at 10:16:04AM +0000, edmund.raile wrote:
> > $ git revert -s b5b519965c4c
> Yes, 5b5 can be removed via revert, but what is the difference in
> effect? Just time saving?
The title of commit generated by git-revert could be helpful when reading
history of changes later. The code change is itself important, while the
history of changes often assists developers to work between several
trees.
> > $ git revert -s 7ba5ca32fe6e
> This one I'd like to ask you about:
> The original inline comment in amdtp-stream.c
> amdtp_domain_stream_pcm_pointer()
> ```
> // This function is called in software IRQ context of
> // period_work or process context.
> //
> // When the software IRQ context was scheduled by software IRQ
> // context of IT contexts, queued packets were already handled.
> // Therefore, no need to flush the queue in buffer furthermore.
> //
> // When the process context reach here, some packets will be
> // already queued in the buffer. These packets should be handled
> // immediately to keep better granularity of PCM pointer.
> //
> // Later, the process context will sometimes schedules software
> // IRQ context of the period_work. Then, no need to flush the
> // queue by the same reason as described in the above
> ```
> (let's call the above v1) was replaced with
> ```
> // In software IRQ context, the call causes dead-lock to disable the tasklet
> // synchronously.
> ```
> on occasion of 7ba5ca32fe6e (let's call this v2).
>
> I sought to replace it with
> ```
> // use wq to prevent deadlock between process context spin_lock
> // of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
> // softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
> // in snd_pcm_period_elapsed()
> ```
> to prevent this issue from occurring again (let's call this v3).
>
> Should I include v1, v3 or a combination of v1 and v3 in my next patch?
I think we pay enough attention to regenerate the deadlock, thus v3 is a
good choice. But the cause of deadlock in the above description is a bit
wrong. It is typical AB/BA deadlock, like:
Thread 0:
1. Acquire stream lock by calling 'snd_pcm_stream_lock_irq()' or so
2. Wait until running tasklet finishes by calling 'tasklet_disable_in_atomic()'
(actually at tasklet_unlock_spin_wait())
Thread 1:
1. Launch tasklet
2. Wait until the stream lock released
The softIRQ context does not related to any type of lock in sound
subsystem essentially.
Thanks
Takashi Sakamoto
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-07-29 13:33 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-28 12:26 [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-28 12:26 ` [PATCH v2 1/2] ALSA: firewire-lib: restore workqueue for process context Edmund Raile
2024-07-28 12:26 ` [PATCH v2 2/2] ALSA: firewire-lib: prevent deadlock between process and softIRQ context Edmund Raile
2024-07-29 0:27 ` [PATCH v2 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
-- strict thread matches above, loose matches on Subject: below --
2024-07-28 12:30 Edmund Raile
2024-07-29 10:16 edmund.raile
2024-07-29 13:33 ` Takashi Sakamoto
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox