* [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
@ 2024-07-30 19:53 Edmund Raile
2024-07-30 19:53 ` [PATCH v4 1/2] Revert "ALSA: firewire-lib: obsolete workqueue for period update" Edmund Raile
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Edmund Raile @ 2024-07-30 19:53 UTC (permalink / raw)
To: o-takashi, clemens; +Cc: tiwai, alsa-devel, linux-sound, linux-kernel, stable
This patchset serves to prevent an AB/BA deadlock:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock
with Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Cc: stable@vger.kernel.org
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (2):
Revert "ALSA: firewire-lib: obsolete workqueue for period update"
Revert "ALSA: firewire-lib: operate for period elapse event in process
context"
sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 25 insertions(+), 14 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 1/2] Revert "ALSA: firewire-lib: obsolete workqueue for period update"
2024-07-30 19:53 [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
@ 2024-07-30 19:53 ` Edmund Raile
2024-07-30 19:53 ` [PATCH v4 2/2] Revert "ALSA: firewire-lib: operate for period elapse event in process context" Edmund Raile
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Edmund Raile @ 2024-07-30 19:53 UTC (permalink / raw)
To: o-takashi, clemens; +Cc: tiwai, alsa-devel, linux-sound, linux-kernel, stable
prepare resolution of AB/BA deadlock competition for substream lock:
restore workqueue previously used for process context:
revert commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue
for period update")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/
Signed-off-by: Edmund Raile <edmund.raile@protonmail.com>
---
sound/firewire/amdtp-stream.c | 15 +++++++++++++++
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index d35d0a420ee0..31201d506a21 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -77,6 +77,8 @@
// overrun. Actual device can skip more, then this module stops the packet streaming.
#define IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES 5
+static void pcm_period_work(struct work_struct *work);
+
/**
* amdtp_stream_init - initialize an AMDTP stream structure
* @s: the AMDTP stream to initialize
@@ -105,6 +107,7 @@ int amdtp_stream_init(struct amdtp_stream *s, struct fw_unit *unit,
s->flags = flags;
s->context = ERR_PTR(-1);
mutex_init(&s->mutex);
+ INIT_WORK(&s->period_work, pcm_period_work);
s->packet_index = 0;
init_waitqueue_head(&s->ready_wait);
@@ -347,6 +350,7 @@ EXPORT_SYMBOL(amdtp_stream_get_max_payload);
*/
void amdtp_stream_pcm_prepare(struct amdtp_stream *s)
{
+ cancel_work_sync(&s->period_work);
s->pcm_buffer_pointer = 0;
s->pcm_period_pointer = 0;
}
@@ -624,6 +628,16 @@ static void update_pcm_pointers(struct amdtp_stream *s,
}
}
+static void pcm_period_work(struct work_struct *work)
+{
+ struct amdtp_stream *s = container_of(work, struct amdtp_stream,
+ period_work);
+ struct snd_pcm_substream *pcm = READ_ONCE(s->pcm);
+
+ if (pcm)
+ snd_pcm_period_elapsed(pcm);
+}
+
static int queue_packet(struct amdtp_stream *s, struct fw_iso_packet *params,
bool sched_irq)
{
@@ -1910,6 +1924,7 @@ static void amdtp_stream_stop(struct amdtp_stream *s)
return;
}
+ cancel_work_sync(&s->period_work);
fw_iso_context_stop(s->context);
fw_iso_context_destroy(s->context);
s->context = ERR_PTR(-1);
diff --git a/sound/firewire/amdtp-stream.h b/sound/firewire/amdtp-stream.h
index a1ed2e80f91a..775db3fc4959 100644
--- a/sound/firewire/amdtp-stream.h
+++ b/sound/firewire/amdtp-stream.h
@@ -191,5 +191,6 @@ struct amdtp_stream {
/* For a PCM substream processing. */
struct snd_pcm_substream *pcm;
+ struct work_struct period_work;
snd_pcm_uframes_t pcm_buffer_pointer;
unsigned int pcm_period_pointer;
--
2.45.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v4 2/2] Revert "ALSA: firewire-lib: operate for period elapse event in process context"
2024-07-30 19:53 [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-30 19:53 ` [PATCH v4 1/2] Revert "ALSA: firewire-lib: obsolete workqueue for period update" Edmund Raile
@ 2024-07-30 19:53 ` Edmund Raile
2024-07-31 8:55 ` [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
2024-07-31 9:29 ` Takashi Iwai
3 siblings, 0 replies; 5+ messages in thread
From: Edmund Raile @ 2024-07-30 19:53 UTC (permalink / raw)
To: o-takashi, clemens; +Cc: tiwai, alsa-devel, linux-sound, linux-kernel, stable
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since
Kernels 5.14.0, causing an AB/BA deadlock competition for the
substream lock with eventual system freeze under ALSA operation:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
AB/BA deadlock competition for ALSA substream lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
revert commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period
elapse event in process context")
Replace inline description to prevent future deadlock.
Cc: stable@vger.kernel.org
Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Reported-by: edmund.raile <edmund.raile@proton.me>
Closes: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/
Signed-off-by: Edmund Raile <edmund.raile@protonmail.com>
---
sound/firewire/amdtp-stream.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 31201d506a21..7438999e0510 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -615,16 +615,8 @@ static void update_pcm_pointers(struct amdtp_stream *s,
// The program in user process should periodically check the status of intermediate
// buffer associated to PCM substream to process PCM frames in the buffer, instead
// of receiving notification of period elapsed by poll wait.
- if (!pcm->runtime->no_period_wakeup) {
- if (in_softirq()) {
- // In software IRQ context for 1394 OHCI.
- snd_pcm_period_elapsed(pcm);
- } else {
- // In process context of ALSA PCM application under acquired lock of
- // PCM substream.
- snd_pcm_period_elapsed_under_stream_lock(pcm);
- }
- }
+ if (!pcm->runtime->no_period_wakeup)
+ queue_work(system_highpri_wq, &s->period_work);
}
}
@@ -1864,11 +1856,14 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d,
{
struct amdtp_stream *irq_target = d->irq_target;
- // Process isochronous packets queued till recent isochronous cycle to handle PCM frames.
if (irq_target && amdtp_stream_running(irq_target)) {
- // In software IRQ context, the call causes dead-lock to disable the tasklet
- // synchronously.
- if (!in_softirq())
+ // use wq to prevent AB/BA deadlock competition for
+ // substream lock:
+ // fw_iso_context_flush_completions() acquires
+ // lock by ohci_flush_iso_completions(),
+ // amdtp-stream process_rx_packets() attempts to
+ // acquire same lock by snd_pcm_elapsed()
+ if (current_work() != &s->period_work)
fw_iso_context_flush_completions(irq_target->context);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
2024-07-30 19:53 [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-30 19:53 ` [PATCH v4 1/2] Revert "ALSA: firewire-lib: obsolete workqueue for period update" Edmund Raile
2024-07-30 19:53 ` [PATCH v4 2/2] Revert "ALSA: firewire-lib: operate for period elapse event in process context" Edmund Raile
@ 2024-07-31 8:55 ` Takashi Sakamoto
2024-07-31 9:29 ` Takashi Iwai
3 siblings, 0 replies; 5+ messages in thread
From: Takashi Sakamoto @ 2024-07-31 8:55 UTC (permalink / raw)
To: Edmund Raile; +Cc: tiwai, alsa-devel, linux-sound, stable
Hi,
On Tue, Jul 30, 2024 at 07:53:23PM +0000, Edmund Raile wrote:
> This patchset serves to prevent an AB/BA deadlock:
>
> thread 0:
> * (lock A) acquire substream lock by
> snd_pcm_stream_lock_irq() in
> snd_pcm_status64()
> * (lock B) wait for tasklet to finish by calling
> tasklet_unlock_spin_wait() in
> tasklet_disable_in_atomic() in
> ohci_flush_iso_completions() of ohci.c
>
> thread 1:
> * (lock B) enter tasklet
> * (lock A) attempt to acquire substream lock,
> waiting for it to be released:
> snd_pcm_stream_lock_irqsave() in
> snd_pcm_period_elapsed() in
> update_pcm_pointers() in
> process_ctx_payloads() in
> process_rx_packets() of amdtp-stream.c
>
> ? tasklet_unlock_spin_wait
> </NMI>
> <TASK>
> ohci_flush_iso_completions firewire_ohci
> amdtp_domain_stream_pcm_pointer snd_firewire_lib
> snd_pcm_update_hw_ptr0 snd_pcm
> snd_pcm_status64 snd_pcm
>
> ? native_queued_spin_lock_slowpath
> </NMI>
> <IRQ>
> _raw_spin_lock_irqsave
> snd_pcm_period_elapsed snd_pcm
> process_rx_packets snd_firewire_lib
> irq_target_callback snd_firewire_lib
> handle_it_packet firewire_ohci
> context_tasklet firewire_ohci
>
> The issue has been reported as a regression of kernel 5.14:
> Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
> ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock
> with Fireface 800")
>
> Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
> in process context") removed the process context workqueue from
> amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
> its overhead.
> Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
> update") belongs to the same patch series and removed
> the now-unused workqueue entirely.
>
> Though being observed on RME Fireface 800, this issue would affect all
> Firewire audio interfaces using ohci amdtp + pcm streaming.
>
> ALSA streaming, especially under intensive CPU load will reveal this issue
> the soonest due to issuing more hardIRQs, with time to occurrence ranging
> from 2 secons to 30 minutes after starting playback.
>
> to reproduce the issue:
> direct ALSA playback to the device:
> mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
> Time to occurrence: 2s to 30m
> Likelihood increased by:
> - high CPU load
> stress --cpu $(nproc)
> - switching between applications via workspaces
> tested with i915 in Xfce
> PulsaAudio / PipeWire conceal the issue as they run PCM substream
> without period wakeup mode, issuing less hardIRQs.
>
> Cc: stable@vger.kernel.org
> Backport note:
> Also applies to and fixes on (tested):
> 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
>
> Edmund Raile (2):
> Revert "ALSA: firewire-lib: obsolete workqueue for period update"
> Revert "ALSA: firewire-lib: operate for period elapse event in process
> context"
>
> sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
> sound/firewire/amdtp-stream.h | 1 +
> 2 files changed, 25 insertions(+), 14 deletions(-)
They look good to me.
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
I appreciate your long effort to solve the issue.
Thanks
Takashi Sakamoto
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock
2024-07-30 19:53 [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
` (2 preceding siblings ...)
2024-07-31 8:55 ` [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
@ 2024-07-31 9:29 ` Takashi Iwai
3 siblings, 0 replies; 5+ messages in thread
From: Takashi Iwai @ 2024-07-31 9:29 UTC (permalink / raw)
To: Edmund Raile
Cc: o-takashi, clemens, tiwai, alsa-devel, linux-sound, linux-kernel,
stable
On Tue, 30 Jul 2024 21:53:23 +0200,
Edmund Raile wrote:
>
> This patchset serves to prevent an AB/BA deadlock:
>
> thread 0:
> * (lock A) acquire substream lock by
> snd_pcm_stream_lock_irq() in
> snd_pcm_status64()
> * (lock B) wait for tasklet to finish by calling
> tasklet_unlock_spin_wait() in
> tasklet_disable_in_atomic() in
> ohci_flush_iso_completions() of ohci.c
>
> thread 1:
> * (lock B) enter tasklet
> * (lock A) attempt to acquire substream lock,
> waiting for it to be released:
> snd_pcm_stream_lock_irqsave() in
> snd_pcm_period_elapsed() in
> update_pcm_pointers() in
> process_ctx_payloads() in
> process_rx_packets() of amdtp-stream.c
>
> ? tasklet_unlock_spin_wait
> </NMI>
> <TASK>
> ohci_flush_iso_completions firewire_ohci
> amdtp_domain_stream_pcm_pointer snd_firewire_lib
> snd_pcm_update_hw_ptr0 snd_pcm
> snd_pcm_status64 snd_pcm
>
> ? native_queued_spin_lock_slowpath
> </NMI>
> <IRQ>
> _raw_spin_lock_irqsave
> snd_pcm_period_elapsed snd_pcm
> process_rx_packets snd_firewire_lib
> irq_target_callback snd_firewire_lib
> handle_it_packet firewire_ohci
> context_tasklet firewire_ohci
>
> The issue has been reported as a regression of kernel 5.14:
> Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
> ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock
> with Fireface 800")
>
> Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
> in process context") removed the process context workqueue from
> amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
> its overhead.
> Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
> update") belongs to the same patch series and removed
> the now-unused workqueue entirely.
>
> Though being observed on RME Fireface 800, this issue would affect all
> Firewire audio interfaces using ohci amdtp + pcm streaming.
>
> ALSA streaming, especially under intensive CPU load will reveal this issue
> the soonest due to issuing more hardIRQs, with time to occurrence ranging
> from 2 secons to 30 minutes after starting playback.
>
> to reproduce the issue:
> direct ALSA playback to the device:
> mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
> Time to occurrence: 2s to 30m
> Likelihood increased by:
> - high CPU load
> stress --cpu $(nproc)
> - switching between applications via workspaces
> tested with i915 in Xfce
> PulsaAudio / PipeWire conceal the issue as they run PCM substream
> without period wakeup mode, issuing less hardIRQs.
>
> Cc: stable@vger.kernel.org
> Backport note:
> Also applies to and fixes on (tested):
> 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
>
> Edmund Raile (2):
> Revert "ALSA: firewire-lib: obsolete workqueue for period update"
> Revert "ALSA: firewire-lib: operate for period elapse event in process
> context"
Applied both patches now. Thanks.
Takashi
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-07-31 9:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 19:53 [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Edmund Raile
2024-07-30 19:53 ` [PATCH v4 1/2] Revert "ALSA: firewire-lib: obsolete workqueue for period update" Edmund Raile
2024-07-30 19:53 ` [PATCH v4 2/2] Revert "ALSA: firewire-lib: operate for period elapse event in process context" Edmund Raile
2024-07-31 8:55 ` [PATCH v4 0/2] ALSA: firewire-lib: restore process context workqueue to prevent deadlock Takashi Sakamoto
2024-07-31 9:29 ` Takashi Iwai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox