public inbox for linux-sound@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init
@ 2026-02-14  6:40 Cole Leavitt
  2026-02-14  6:40 ` [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry Cole Leavitt
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14  6:40 UTC (permalink / raw)
  To: Peter Ujfalusi, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel,
	Cole Leavitt

Two fixes for SOF IPC4 reliability issues observed on Lenovo ThinkPad
P16 Gen 3 (Arrow Lake-S, CS42L43 + CS35L56 over SoundWire):

1. Replace the broken delayed_ipc_tx_msg mechanism with a bounded retry
   loop. The old deferred dispatch silently drops messages during D0i3
   transitions, causing 500ms+ hangs per IPC chunk.

2. Add a platform ops callback (dai_link_hw_ready) so Intel HDA
   platforms can wait for SoundWire slave initialization before ALH
   copier setup. Without this, the DSP enters an unrecoverable wedged
   state when userspace opens a PCM before slaves finish re-enumerating
   after resume.

Tested on ThinkPad P16 Gen 3 with repeated suspend/resume cycles
and concurrent audio playback.

Cole Leavitt (2):
  ASoC: SOF: Replace IPC TX busy deferral with bounded retry
  ASoC: SOF: Add platform ops callback for DAI link hardware readiness

 sound/soc/sof/intel/cnl.c            | 17 ++---------
 sound/soc/sof/intel/hda-common-ops.c |  1 +
 sound/soc/sof/intel/hda-ipc.c        | 17 ++---------
 sound/soc/sof/intel/hda.c            | 44 ++++++++++++++++++++++++++++
 sound/soc/sof/intel/hda.h            | 14 ++++-----
 sound/soc/sof/intel/mtl.c            | 17 ++---------
 sound/soc/sof/ipc4-topology.c        |  8 +++++
 sound/soc/sof/ipc4.c                 | 17 +++++++++--
 sound/soc/sof/sof-priv.h             |  3 ++
 9 files changed, 83 insertions(+), 55 deletions(-)


base-commit: 2687c848e57820651b9f69d30c4710f4219f7dbf
-- 
2.52.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry
  2026-02-14  6:40 [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Cole Leavitt
@ 2026-02-14  6:40 ` Cole Leavitt
  2026-02-16 12:39   ` Péter Ujfalusi
  2026-02-14  6:40 ` [PATCH 2/2] ASoC: SOF: Add platform ops callback for DAI link hardware readiness Cole Leavitt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14  6:40 UTC (permalink / raw)
  To: Peter Ujfalusi, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel,
	Cole Leavitt

The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
deferred message was supposed to be dispatched from the IRQ handler
when the DSP acknowledged the previous message.

This mechanism silently drops messages during D0i3 power transitions
because the IRQ handler never fires while the DSP is in a low-power
state. The caller then hangs in wait_event_timeout() for up to 500ms
per IPC chunk, causing multi-second audio stalls under CPU load.

Fix this by making the platform send_msg functions return -EBUSY
immediately when the TX register is busy (safe since they execute
under spin_lock_irq in sof_ipc_send_msg), and adding a bounded retry
loop with usleep_range() in ipc4_tx_msg_unlocked() which only holds
the tx_mutex (a sleepable context). The retry loop attempts up to 50
iterations with 100-200us delays, bounding the maximum busy-wait to
approximately 10ms instead of the previous 500ms timeout.

Also remove the now-dead delayed_ipc_tx_msg field from
sof_intel_hda_dev, the dispatch code, and the ack_received tracking
variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
mtl_ipc_irq_thread, cnl_ipc4_irq_thread).

Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
 sound/soc/sof/intel/cnl.c     | 17 ++---------------
 sound/soc/sof/intel/hda-ipc.c | 17 ++---------------
 sound/soc/sof/intel/hda.h     |  8 --------
 sound/soc/sof/intel/mtl.c     | 17 ++---------------
 sound/soc/sof/ipc4.c          | 17 +++++++++++++++--
 5 files changed, 21 insertions(+), 55 deletions(-)

diff --git a/sound/soc/sof/intel/cnl.c b/sound/soc/sof/intel/cnl.c
index 0cc5725515e7..a2c6c7894a0f 100644
--- a/sound/soc/sof/intel/cnl.c
+++ b/sound/soc/sof/intel/cnl.c
@@ -37,7 +37,6 @@ irqreturn_t cnl_ipc4_irq_thread(int irq, void *context)
 {
 	struct sof_ipc4_msg notification_data = {{ 0 }};
 	struct snd_sof_dev *sdev = context;
-	bool ack_received = false;
 	bool ipc_irq = false;
 	u32 hipcida, hipctdr;
 
@@ -51,7 +50,6 @@ irqreturn_t cnl_ipc4_irq_thread(int irq, void *context)
 		cnl_ipc_dsp_done(sdev);
 
 		ipc_irq = true;
-		ack_received = true;
 	}
 
 	if (hipctdr & CNL_DSP_REG_HIPCTDR_BUSY) {
@@ -101,13 +99,6 @@ irqreturn_t cnl_ipc4_irq_thread(int irq, void *context)
 		/* This interrupt is not shared so no need to return IRQ_NONE. */
 		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
 
-	if (ack_received) {
-		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
-
-		if (hdev->delayed_ipc_tx_msg)
-			cnl_ipc4_send_msg(sdev, hdev->delayed_ipc_tx_msg);
-	}
-
 	return IRQ_HANDLED;
 }
 EXPORT_SYMBOL_NS(cnl_ipc4_irq_thread, "SND_SOC_SOF_INTEL_CNL");
@@ -266,12 +257,8 @@ int cnl_ipc4_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *msg)
 	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
 	struct sof_ipc4_msg *msg_data = msg->msg_data;
 
-	if (hda_ipc4_tx_is_busy(sdev)) {
-		hdev->delayed_ipc_tx_msg = msg;
-		return 0;
-	}
-
-	hdev->delayed_ipc_tx_msg = NULL;
+	if (hda_ipc4_tx_is_busy(sdev))
+		return -EBUSY;
 
 	/* send the message via mailbox */
 	if (msg_data->data_size)
diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
index 94425c510861..78449452041c 100644
--- a/sound/soc/sof/intel/hda-ipc.c
+++ b/sound/soc/sof/intel/hda-ipc.c
@@ -106,12 +106,8 @@ int hda_dsp_ipc4_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *msg)
 	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
 	struct sof_ipc4_msg *msg_data = msg->msg_data;
 
-	if (hda_ipc4_tx_is_busy(sdev)) {
-		hdev->delayed_ipc_tx_msg = msg;
-		return 0;
-	}
-
-	hdev->delayed_ipc_tx_msg = NULL;
+	if (hda_ipc4_tx_is_busy(sdev))
+		return -EBUSY;
 
 	/* send the message via mailbox */
 	if (msg_data->data_size)
@@ -168,7 +164,6 @@ irqreturn_t hda_dsp_ipc4_irq_thread(int irq, void *context)
 {
 	struct sof_ipc4_msg notification_data = {{ 0 }};
 	struct snd_sof_dev *sdev = context;
-	bool ack_received = false;
 	bool ipc_irq = false;
 	u32 hipcie, hipct;
 
@@ -182,7 +177,6 @@ irqreturn_t hda_dsp_ipc4_irq_thread(int irq, void *context)
 		hda_dsp_ipc_dsp_done(sdev);
 
 		ipc_irq = true;
-		ack_received = true;
 	}
 
 	if (hipct & HDA_DSP_REG_HIPCT_BUSY) {
@@ -236,13 +230,6 @@ irqreturn_t hda_dsp_ipc4_irq_thread(int irq, void *context)
 		/* This interrupt is not shared so no need to return IRQ_NONE. */
 		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
 
-	if (ack_received) {
-		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
-
-		if (hdev->delayed_ipc_tx_msg)
-			hda_dsp_ipc4_send_msg(sdev, hdev->delayed_ipc_tx_msg);
-	}
-
 	return IRQ_HANDLED;
 }
 EXPORT_SYMBOL_NS(hda_dsp_ipc4_irq_thread, "SND_SOC_SOF_INTEL_HDA_COMMON");
diff --git a/sound/soc/sof/intel/hda.h b/sound/soc/sof/intel/hda.h
index 562fe8be79c1..ac9f76a5ef97 100644
--- a/sound/soc/sof/intel/hda.h
+++ b/sound/soc/sof/intel/hda.h
@@ -549,14 +549,6 @@ struct sof_intel_hda_dev {
 
 	/* work queue for mic privacy state change notification sending */
 	struct sof_ace3_mic_privacy mic_privacy;
-
-	/*
-	 * Pointing to the IPC message if immediate sending was not possible
-	 * because the downlink communication channel was BUSY at the time.
-	 * The message will be re-tried when the channel becomes free (the ACK
-	 * is received from the DSP for the previous message)
-	 */
-	struct snd_sof_ipc_msg *delayed_ipc_tx_msg;
 };
 
 static inline struct hdac_bus *sof_to_bus(struct snd_sof_dev *s)
diff --git a/sound/soc/sof/intel/mtl.c b/sound/soc/sof/intel/mtl.c
index 095dcf1a18e4..24dec128f589 100644
--- a/sound/soc/sof/intel/mtl.c
+++ b/sound/soc/sof/intel/mtl.c
@@ -101,12 +101,8 @@ static int mtl_ipc_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *ms
 	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
 	struct sof_ipc4_msg *msg_data = msg->msg_data;
 
-	if (hda_ipc4_tx_is_busy(sdev)) {
-		hdev->delayed_ipc_tx_msg = msg;
-		return 0;
-	}
-
-	hdev->delayed_ipc_tx_msg = NULL;
+	if (hda_ipc4_tx_is_busy(sdev))
+		return -EBUSY;
 
 	/* send the message via mailbox */
 	if (msg_data->data_size)
@@ -559,7 +555,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
 {
 	struct sof_ipc4_msg notification_data = {{ 0 }};
 	struct snd_sof_dev *sdev = context;
-	bool ack_received = false;
 	bool ipc_irq = false;
 	u32 hipcida;
 	u32 hipctdr;
@@ -576,7 +571,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
 		mtl_ipc_dsp_done(sdev);
 
 		ipc_irq = true;
-		ack_received = true;
 	}
 
 	if (hipctdr & MTL_DSP_REG_HFIPCXTDR_BUSY) {
@@ -628,13 +622,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
 		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
 	}
 
-	if (ack_received) {
-		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
-
-		if (hdev->delayed_ipc_tx_msg)
-			mtl_ipc_send_msg(sdev, hdev->delayed_ipc_tx_msg);
-	}
-
 	return IRQ_HANDLED;
 }
 
diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c
index a4a090e6724a..2e24308ef9cc 100644
--- a/sound/soc/sof/ipc4.c
+++ b/sound/soc/sof/ipc4.c
@@ -365,20 +365,33 @@ static int ipc4_wait_tx_done(struct snd_sof_ipc *ipc, void *reply_data)
 	return ret;
 }
 
+#define SOF_IPC4_TX_BUSY_RETRIES	50
+#define SOF_IPC4_TX_BUSY_DELAY_US	100
+#define SOF_IPC4_TX_BUSY_DELAY_MAX_US	200
+
 static int ipc4_tx_msg_unlocked(struct snd_sof_ipc *ipc,
 				void *msg_data, size_t msg_bytes,
 				void *reply_data, size_t reply_bytes)
 {
 	struct sof_ipc4_msg *ipc4_msg = msg_data;
 	struct snd_sof_dev *sdev = ipc->sdev;
-	int ret;
+	int ret, i;
 
 	if (msg_bytes > ipc->max_payload_size || reply_bytes > ipc->max_payload_size)
 		return -EINVAL;
 
 	sof_ipc4_log_header(sdev->dev, "ipc tx      ", msg_data, true);
 
-	ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
+	for (i = 0; i < SOF_IPC4_TX_BUSY_RETRIES; i++) {
+		ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
+		if (ret != -EBUSY)
+			break;
+		usleep_range(SOF_IPC4_TX_BUSY_DELAY_US,
+			     SOF_IPC4_TX_BUSY_DELAY_MAX_US);
+	}
+	if (i == SOF_IPC4_TX_BUSY_RETRIES)
+		dev_dbg(sdev->dev, "%s: TX still busy after %d retries\n",
+			__func__, i);
 	if (ret) {
 		dev_err_ratelimited(sdev->dev,
 				    "%s: ipc message send for %#x|%#x failed: %d\n",
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] ASoC: SOF: Add platform ops callback for DAI link hardware readiness
  2026-02-14  6:40 [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Cole Leavitt
  2026-02-14  6:40 ` [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry Cole Leavitt
@ 2026-02-14  6:40 ` Cole Leavitt
  2026-02-17  8:08   ` Pierre-Louis Bossart
  2026-02-16 10:52 ` [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Péter Ujfalusi
  2026-02-16 16:41 ` Péter Ujfalusi
  3 siblings, 1 reply; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14  6:40 UTC (permalink / raw)
  To: Peter Ujfalusi, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel,
	Cole Leavitt

After suspend/resume (D3->D0), the SOF firmware is reloaded fresh and
pipelines are recreated lazily when userspace opens a PCM. However,
SoundWire slave re-enumeration runs asynchronously via a 100ms delayed
work item (SDW_INTEL_DELAYED_ENUMERATION_MS). If userspace attempts to
play audio before SoundWire slaves finish re-enumerating, the firmware
returns error 9 (resource not found) when creating ALH copier modules,
leaving the DSP in an unrecoverable wedged state requiring reboot.

Add a new optional dai_link_hw_ready callback to struct snd_sof_dsp_ops
that allows platform-specific code to wait for DAI link hardware to
become ready before pipeline setup. The generic ipc4-topology.c calls
this callback (when set) in sof_ipc4_prepare_copier_module() before
configuring DAI copiers, maintaining SOF's platform abstraction.

The Intel HDA implementation (hda_sdw_dai_hw_ready) waits for all
attached SoundWire slaves to complete initialization using
wait_for_completion_interruptible_timeout() with a 2-second timeout.
This is safe for multiple waiters since the SoundWire subsystem uses
complete_all() for initialization_complete. Unattached slaves (declared
in ACPI but not physically present) are skipped to avoid false timeouts.

The function returns -ETIMEDOUT on timeout (instead of warn-and-continue)
to prevent the DSP from entering a wedged state. On non-resume paths the
completions are already done, so the wait returns immediately.

Link: https://github.com/thesofproject/sof/issues/8662
Link: https://github.com/thesofproject/sof/issues/9308
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
 sound/soc/sof/intel/hda-common-ops.c |  1 +
 sound/soc/sof/intel/hda.c            | 44 ++++++++++++++++++++++++++++
 sound/soc/sof/intel/hda.h            |  6 ++++
 sound/soc/sof/ipc4-topology.c        |  8 +++++
 sound/soc/sof/sof-priv.h             |  3 ++
 5 files changed, 62 insertions(+)

diff --git a/sound/soc/sof/intel/hda-common-ops.c b/sound/soc/sof/intel/hda-common-ops.c
index 746b426b1329..315cb61426da 100644
--- a/sound/soc/sof/intel/hda-common-ops.c
+++ b/sound/soc/sof/intel/hda-common-ops.c
@@ -84,6 +84,7 @@ const struct snd_sof_dsp_ops sof_hda_common_ops = {
 	.unregister_ipc_clients = hda_unregister_clients,
 
 	/* DAI drivers */
+	.dai_link_hw_ready = hda_sdw_dai_hw_ready,
 	.drv		= skl_dai,
 	.num_drv	= SOF_SKL_NUM_DAIS,
 	.is_chain_dma_supported	= hda_is_chain_dma_supported,
diff --git a/sound/soc/sof/intel/hda.c b/sound/soc/sof/intel/hda.c
index 686ecc040867..956106dc0e02 100644
--- a/sound/soc/sof/intel/hda.c
+++ b/sound/soc/sof/intel/hda.c
@@ -378,6 +378,50 @@ static void hda_dsp_sdw_process_mic_privacy(struct snd_sof_dev *sdev)
 		chip->process_mic_privacy(sdev, true, AZX_REG_ML_LEPTR_ID_SDW);
 }
 
+int hda_sdw_dai_hw_ready(struct snd_sof_dev *sdev, int dai_type)
+{
+	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
+	struct sdw_peripherals *sdw_p;
+	long ret;
+	int idx;
+
+	if (dai_type != SOF_DAI_INTEL_ALH)
+		return 0;
+
+	if (!hdev || !hdev->sdw || !hdev->sdw->peripherals)
+		return 0;
+
+	sdw_p = hdev->sdw->peripherals;
+
+	for (idx = 0; idx < sdw_p->num_peripherals; idx++) {
+		struct sdw_slave *slave = sdw_p->array[idx];
+
+		if (!slave)
+			continue;
+
+		if (slave->status != SDW_SLAVE_ATTACHED)
+			continue;
+
+		ret = wait_for_completion_interruptible_timeout(
+				&slave->initialization_complete,
+				msecs_to_jiffies(2000));
+		if (ret == 0) {
+			dev_err(sdev->dev,
+				"timeout waiting for SoundWire slave %s initialization\n",
+				dev_name(&slave->dev));
+			return -ETIMEDOUT;
+		}
+		if (ret < 0) {
+			dev_dbg(sdev->dev,
+				"interrupted waiting for SoundWire slave %s initialization: %ld\n",
+				dev_name(&slave->dev), ret);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 #else /* IS_ENABLED(CONFIG_SND_SOC_SOF_INTEL_SOUNDWIRE) */
 static inline int hda_sdw_acpi_scan(struct snd_sof_dev *sdev)
 {
diff --git a/sound/soc/sof/intel/hda.h b/sound/soc/sof/intel/hda.h
index ac9f76a5ef97..9bd8fe82ae9e 100644
--- a/sound/soc/sof/intel/hda.h
+++ b/sound/soc/sof/intel/hda.h
@@ -830,6 +830,7 @@ bool hda_sdw_check_wakeen_irq_common(struct snd_sof_dev *sdev);
 void hda_sdw_process_wakeen_common(struct snd_sof_dev *sdev);
 void hda_sdw_process_wakeen(struct snd_sof_dev *sdev);
 bool hda_common_check_sdw_irq(struct snd_sof_dev *sdev);
+int hda_sdw_dai_hw_ready(struct snd_sof_dev *sdev, int dai_type);
 
 #else
 
@@ -879,6 +880,11 @@ static inline bool hda_common_check_sdw_irq(struct snd_sof_dev *sdev)
 	return false;
 }
 
+static inline int hda_sdw_dai_hw_ready(struct snd_sof_dev *sdev, int dai_type)
+{
+	return 0;
+}
+
 #endif
 
 int sdw_hda_dai_hw_params(struct snd_pcm_substream *substream,
diff --git a/sound/soc/sof/ipc4-topology.c b/sound/soc/sof/ipc4-topology.c
index d621e7914a73..a8b107d7e786 100644
--- a/sound/soc/sof/ipc4-topology.c
+++ b/sound/soc/sof/ipc4-topology.c
@@ -2256,6 +2256,14 @@ sof_ipc4_prepare_copier_module(struct snd_sof_widget *swidget,
 	case snd_soc_dapm_dai_in:
 	case snd_soc_dapm_dai_out:
 	{
+		/* Wait for DAI link hardware (e.g. SoundWire slaves) to be ready */
+		if (sdev->pdata->desc->ops->dai_link_hw_ready) {
+			ret = sdev->pdata->desc->ops->dai_link_hw_ready(
+					sdev, ipc4_copier->dai_type);
+			if (ret)
+				return ret;
+		}
+
 		/*
 		 * Only SOF_DAI_INTEL_ALH needs copier_data to set blob.
 		 * That's why only ALH dai's blob is set after sof_ipc4_init_input_audio_fmt
diff --git a/sound/soc/sof/sof-priv.h b/sound/soc/sof/sof-priv.h
index 0f624d8cde20..346b5c34c6c8 100644
--- a/sound/soc/sof/sof-priv.h
+++ b/sound/soc/sof/sof-priv.h
@@ -346,6 +346,9 @@ struct snd_sof_dsp_ops {
 	int (*register_ipc_clients)(struct snd_sof_dev *sdev); /* optional */
 	void (*unregister_ipc_clients)(struct snd_sof_dev *sdev); /* optional */
 
+	/* optional: wait for DAI link hardware readiness (e.g. SoundWire slave init) */
+	int (*dai_link_hw_ready)(struct snd_sof_dev *sdev, int dai_type); /* optional */
+
 	/* DAI ops */
 	struct snd_soc_dai_driver *drv;
 	int num_drv;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init
  2026-02-14  6:40 [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Cole Leavitt
  2026-02-14  6:40 ` [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry Cole Leavitt
  2026-02-14  6:40 ` [PATCH 2/2] ASoC: SOF: Add platform ops callback for DAI link hardware readiness Cole Leavitt
@ 2026-02-16 10:52 ` Péter Ujfalusi
  2026-02-16 16:41 ` Péter Ujfalusi
  3 siblings, 0 replies; 9+ messages in thread
From: Péter Ujfalusi @ 2026-02-16 10:52 UTC (permalink / raw)
  To: Cole Leavitt, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel

Hi Cole,

On 14/02/2026 08:40, Cole Leavitt wrote:
> Two fixes for SOF IPC4 reliability issues observed on Lenovo ThinkPad
> P16 Gen 3 (Arrow Lake-S, CS42L43 + CS35L56 over SoundWire):
> 
> 1. Replace the broken delayed_ipc_tx_msg mechanism with a bounded retry
>    loop. The old deferred dispatch silently drops messages during D0i3
>    transitions, causing 500ms+ hangs per IPC chunk.
> 
> 2. Add a platform ops callback (dai_link_hw_ready) so Intel HDA
>    platforms can wait for SoundWire slave initialization before ALH
>    copier setup. Without this, the DSP enters an unrecoverable wedged
>    state when userspace opens a PCM before slaves finish re-enumerating
>    after resume.
> 
> Tested on ThinkPad P16 Gen 3 with repeated suspend/resume cycles
> and concurrent audio playback.

Thank you for the patch, I have sent them to our CI for testing:
https://github.com/thesofproject/linux/pull/5671

With a brief look the first patch is really nice, better than the
admitably handicapped delayed message workaround.

The second patch is a bit more of a layering violation at first glance,
I'm not sure if I would like code dealing with soundwire nuances in SOF
code itself.
This would need additional thoughts.

> 
> Cole Leavitt (2):
>   ASoC: SOF: Replace IPC TX busy deferral with bounded retry
>   ASoC: SOF: Add platform ops callback for DAI link hardware readiness
> 
>  sound/soc/sof/intel/cnl.c            | 17 ++---------
>  sound/soc/sof/intel/hda-common-ops.c |  1 +
>  sound/soc/sof/intel/hda-ipc.c        | 17 ++---------
>  sound/soc/sof/intel/hda.c            | 44 ++++++++++++++++++++++++++++
>  sound/soc/sof/intel/hda.h            | 14 ++++-----
>  sound/soc/sof/intel/mtl.c            | 17 ++---------
>  sound/soc/sof/ipc4-topology.c        |  8 +++++
>  sound/soc/sof/ipc4.c                 | 17 +++++++++--
>  sound/soc/sof/sof-priv.h             |  3 ++
>  9 files changed, 83 insertions(+), 55 deletions(-)
> 
> 
> base-commit: 2687c848e57820651b9f69d30c4710f4219f7dbf

-- 
Péter


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry
  2026-02-14  6:40 ` [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry Cole Leavitt
@ 2026-02-16 12:39   ` Péter Ujfalusi
  2026-02-17 21:49     ` [PATCH v2] " Cole Leavitt
  0 siblings, 1 reply; 9+ messages in thread
From: Péter Ujfalusi @ 2026-02-16 12:39 UTC (permalink / raw)
  To: Cole Leavitt, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel



On 14/02/2026 08:40, Cole Leavitt wrote:
> The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
> mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
> delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
> deferred message was supposed to be dispatched from the IRQ handler
> when the DSP acknowledged the previous message.
> 
> This mechanism silently drops messages during D0i3 power transitions
> because the IRQ handler never fires while the DSP is in a low-power
> state. The caller then hangs in wait_event_timeout() for up to 500ms
> per IPC chunk, causing multi-second audio stalls under CPU load.

I do wonder how this can happen as we only send IPC messages when the fw
has booted up and thus the fw should be replying to messages.
The delayed message handling meant to handle the case when the firmware
sent the reply already, but the TX doorbell is not cleared, FW is not
yet ready to receive a new meesage (or if we send it might be lost) or
we will never send such message (and send it after the next firmware boot?).

see:
47772f905cd8 ("ASoC: SOF: Intel: ipc4: Wait for channel to be free
before sending a message")

I agree that rapid IPC sending attempt would drop messages and will only
send the last one. This would be visible with:
https://github.com/thesofproject/linux/pull/5521

> Fix this by making the platform send_msg functions return -EBUSY
> immediately when the TX register is busy (safe since they execute
> under spin_lock_irq in sof_ipc_send_msg), and adding a bounded retry
> loop with usleep_range() in ipc4_tx_msg_unlocked() which only holds
> the tx_mutex (a sleepable context). The retry loop attempts up to 50
> iterations with 100-200us delays, bounding the maximum busy-wait to
> approximately 10ms instead of the previous 500ms timeout.
> 
> Also remove the now-dead delayed_ipc_tx_msg field from
> sof_intel_hda_dev, the dispatch code, and the ack_received tracking
> variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
> mtl_ipc_irq_thread, cnl_ipc4_irq_thread).
> 
> Signed-off-by: Cole Leavitt <cole@unwrap.rs>
> ---

...

diff --git a/sound/soc/sof/intel/mtl.c b/sound/soc/sof/intel/mtl.c
> index 095dcf1a18e4..24dec128f589 100644
> --- a/sound/soc/sof/intel/mtl.c
> +++ b/sound/soc/sof/intel/mtl.c
> @@ -101,12 +101,8 @@ static int mtl_ipc_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *ms
>  	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
>  	struct sof_ipc4_msg *msg_data = msg->msg_data;
>  
> -	if (hda_ipc4_tx_is_busy(sdev)) {
> -		hdev->delayed_ipc_tx_msg = msg;
> -		return 0;
> -	}
> -
> -	hdev->delayed_ipc_tx_msg = NULL;
> +	if (hda_ipc4_tx_is_busy(sdev))
> +		return -EBUSY;
>  
>  	/* send the message via mailbox */
>  	if (msg_data->data_size)
> @@ -559,7 +555,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
>  {
>  	struct sof_ipc4_msg notification_data = {{ 0 }};
>  	struct snd_sof_dev *sdev = context;
> -	bool ack_received = false;
>  	bool ipc_irq = false;
>  	u32 hipcida;
>  	u32 hipctdr;
> @@ -576,7 +571,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
>  		mtl_ipc_dsp_done(sdev);
>  
>  		ipc_irq = true;
> -		ack_received = true;
>  	}
>  
>  	if (hipctdr & MTL_DSP_REG_HFIPCXTDR_BUSY) {
> @@ -628,13 +622,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
>  		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
>  	}
>  
> -	if (ack_received) {
> -		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
> -
> -		if (hdev->delayed_ipc_tx_msg)
> -			mtl_ipc_send_msg(sdev, hdev->delayed_ipc_tx_msg);
> -	}
> -
>  	return IRQ_HANDLED;
>  }
>  
> diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c
> index a4a090e6724a..2e24308ef9cc 100644
> --- a/sound/soc/sof/ipc4.c
> +++ b/sound/soc/sof/ipc4.c
> @@ -365,20 +365,33 @@ static int ipc4_wait_tx_done(struct snd_sof_ipc *ipc, void *reply_data)
>  	return ret;
>  }
>  
> +#define SOF_IPC4_TX_BUSY_RETRIES	50
> +#define SOF_IPC4_TX_BUSY_DELAY_US	100
> +#define SOF_IPC4_TX_BUSY_DELAY_MAX_US	200
> +
>  static int ipc4_tx_msg_unlocked(struct snd_sof_ipc *ipc,
>  				void *msg_data, size_t msg_bytes,
>  				void *reply_data, size_t reply_bytes)
>  {
>  	struct sof_ipc4_msg *ipc4_msg = msg_data;
>  	struct snd_sof_dev *sdev = ipc->sdev;
> -	int ret;
> +	int ret, i;
>  
>  	if (msg_bytes > ipc->max_payload_size || reply_bytes > ipc->max_payload_size)
>  		return -EINVAL;
>  
>  	sof_ipc4_log_header(sdev->dev, "ipc tx      ", msg_data, true);
>  
> -	ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> +	for (i = 0; i < SOF_IPC4_TX_BUSY_RETRIES; i++) {
> +		ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> +		if (ret != -EBUSY)
> +			break;
> +		usleep_range(SOF_IPC4_TX_BUSY_DELAY_US,
> +			     SOF_IPC4_TX_BUSY_DELAY_MAX_US);
> +	}

The reason why I ended up with the dead simple delay msg handling which
sends the message when the ack is received rigth away is to avoid delays
without a need of a busy loop.
Before that I had similar approach, but dropped it due to the delay it
introduced on the message sending.

The delayed message combined with IPC timeouts are an interesting
problem to tackle, the PR I mentioned:
https://github.com/thesofproject/linux/pull/5521
plus this patch
https://github.com/ujfalusi/sof-linux/commit/e6c6ca613e60c477dcc025207f9732c8ae4a1b33

worked locally for most of the time, but I give that I have not faced
with the issue that I have lost IPCs.

I'm not sure if the two approces can be somehow combined.
Receving the reply to the message does not mean that the FW can receive
a new message, but from the kernel pow the IPC sequence was done (when
we receive out of sync ACK, the reply data might not be valid anymore).
We cannot make the IPC completion based on the ACK as I have seen in
debug logs all permutations: ACK then REPLY, ACK and REPLAY at the same
time, REPLY followed by ACK.

Having said that, I think this is still a bit safer to not loose
messages, 200us is close to polling.

> +	if (i == SOF_IPC4_TX_BUSY_RETRIES)
> +		dev_dbg(sdev->dev, "%s: TX still busy after %d retries\n",
> +			__func__, i);

no need to print the __func_ it is added by the infra.

Can you add
dev_dbg(sdev->dev, "message sending delayed by %d loops for %#x|%#x\n",
	i, ipc4_msg->primary, ipc4_msg->extension);

when the message got delayed due to EBUSY?

>  	if (ret) {
>  		dev_err_ratelimited(sdev->dev,
>  				    "%s: ipc message send for %#x|%#x failed: %d\n",

-- 
Péter


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init
  2026-02-14  6:40 [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Cole Leavitt
                   ` (2 preceding siblings ...)
  2026-02-16 10:52 ` [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Péter Ujfalusi
@ 2026-02-16 16:41 ` Péter Ujfalusi
  3 siblings, 0 replies; 9+ messages in thread
From: Péter Ujfalusi @ 2026-02-16 16:41 UTC (permalink / raw)
  To: Cole Leavitt, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel

Hi Cole,

On 14/02/2026 08:40, Cole Leavitt wrote:
> Two fixes for SOF IPC4 reliability issues observed on Lenovo ThinkPad
> P16 Gen 3 (Arrow Lake-S, CS42L43 + CS35L56 over SoundWire):
> 
> 1. Replace the broken delayed_ipc_tx_msg mechanism with a bounded retry
>    loop. The old deferred dispatch silently drops messages during D0i3
>    transitions, causing 500ms+ hangs per IPC chunk.
> 
> 2. Add a platform ops callback (dai_link_hw_ready) so Intel HDA
>    platforms can wait for SoundWire slave initialization before ALH
>    copier setup. Without this, the DSP enters an unrecoverable wedged
>    state when userspace opens a PCM before slaves finish re-enumerating
>    after resume.
> 
> Tested on ThinkPad P16 Gen 3 with repeated suspend/resume cycles
> and concurrent audio playback.

This issue is a new one for us and we would like to understand what is
going on, can you create an issue at
https://github.com/thesofproject/linux/issues

and include the full kernel log with sof-dyndbg.conf in place like asked
in this comment:
https://github.com/thesofproject/linux/issues/5517#issuecomment-3233283263

with kernel (Most likely 6.19 based on the base-commit), sof-firmware
and alsa-info.sh output version as well?

In the second patch you are pointing to fw boot timeout issues, which
makes me wonder what is going on.
It is possible that the codec drivers prevent the DSP power off and on
next boot we obviously fail to receive the FW_READY since the DSP and
firmware is in booted state already.

thank you for the help!

> Cole Leavitt (2):
>   ASoC: SOF: Replace IPC TX busy deferral with bounded retry
>   ASoC: SOF: Add platform ops callback for DAI link hardware readiness
> 
>  sound/soc/sof/intel/cnl.c            | 17 ++---------
>  sound/soc/sof/intel/hda-common-ops.c |  1 +
>  sound/soc/sof/intel/hda-ipc.c        | 17 ++---------
>  sound/soc/sof/intel/hda.c            | 44 ++++++++++++++++++++++++++++
>  sound/soc/sof/intel/hda.h            | 14 ++++-----
>  sound/soc/sof/intel/mtl.c            | 17 ++---------
>  sound/soc/sof/ipc4-topology.c        |  8 +++++
>  sound/soc/sof/ipc4.c                 | 17 +++++++++--
>  sound/soc/sof/sof-priv.h             |  3 ++
>  9 files changed, 83 insertions(+), 55 deletions(-)
> 
> 
> base-commit: 2687c848e57820651b9f69d30c4710f4219f7dbf

-- 
Péter


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ASoC: SOF: Add platform ops callback for DAI link hardware readiness
  2026-02-14  6:40 ` [PATCH 2/2] ASoC: SOF: Add platform ops callback for DAI link hardware readiness Cole Leavitt
@ 2026-02-17  8:08   ` Pierre-Louis Bossart
  0 siblings, 0 replies; 9+ messages in thread
From: Pierre-Louis Bossart @ 2026-02-17  8:08 UTC (permalink / raw)
  To: Cole Leavitt, Peter Ujfalusi, Bard Liao, Ranjani Sridharan,
	Liam Girdwood, Daniel Baluta
  Cc: Kai Vehmanen, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	sound-open-firmware, linux-sound, linux-kernel

On 2/14/26 07:40, Cole Leavitt wrote:
> After suspend/resume (D3->D0), the SOF firmware is reloaded fresh and

Is this really correct? I vaguely remember that with the IMR stuff started in MTL, the firmware saves/restore its context. No need to reload firmware.

> pipelines are recreated lazily when userspace opens a PCM. However,
> SoundWire slave re-enumeration runs asynchronously via a 100ms delayed
> work item (SDW_INTEL_DELAYED_ENUMERATION_MS). If userspace attempts to
> play audio before SoundWire slaves finish re-enumerating, the firmware
> returns error 9 (resource not found) when creating ALH copier modules,
> leaving the DSP in an unrecoverable wedged state requiring reboot.

The ALH copier stuff has absolutely nothing to do with SoundWire enumeration.
That's a wrong assumption, in fact the SOF driver should know absolutely nothing about peripherals.
Exhibit A is the fake codec mode, where we can stream data across the bus without any peripheral present on the bus.

If you can share a test case that exposes the issue, that would help.

> Add a new optional dai_link_hw_ready callback to struct snd_sof_dsp_ops
> that allows platform-specific code to wait for DAI link hardware to
> become ready before pipeline setup. The generic ipc4-topology.c calls
> this callback (when set) in sof_ipc4_prepare_copier_module() before
> configuring DAI copiers, maintaining SOF's platform abstraction.
> 
> The Intel HDA implementation (hda_sdw_dai_hw_ready) waits for all
> attached SoundWire slaves to complete initialization using
> wait_for_completion_interruptible_timeout() with a 2-second timeout.
> This is safe for multiple waiters since the SoundWire subsystem uses
> complete_all() for initialization_complete. Unattached slaves (declared
> in ACPI but not physically present) are skipped to avoid false timeouts.
> 
> The function returns -ETIMEDOUT on timeout (instead of warn-and-continue)
> to prevent the DSP from entering a wedged state. On non-resume paths the
> completions are already done, so the wait returns immediately.
> 
> Link: https://github.com/thesofproject/sof/issues/8662

come on, it's a 2023 issue on an early test device in 'nocodec' mode, which means NOT SoundWire...

> Link: https://github.com/thesofproject/sof/issues/9308

Again nocodec mode, nothing to do with SoundWire.

please file a real issue...

> Signed-off-by: Cole Leavitt <cole@unwrap.rs>
> ---
>  sound/soc/sof/intel/hda-common-ops.c |  1 +
>  sound/soc/sof/intel/hda.c            | 44 ++++++++++++++++++++++++++++
>  sound/soc/sof/intel/hda.h            |  6 ++++
>  sound/soc/sof/ipc4-topology.c        |  8 +++++
>  sound/soc/sof/sof-priv.h             |  3 ++
>  5 files changed, 62 insertions(+)
> 
> diff --git a/sound/soc/sof/intel/hda-common-ops.c b/sound/soc/sof/intel/hda-common-ops.c
> index 746b426b1329..315cb61426da 100644
> --- a/sound/soc/sof/intel/hda-common-ops.c
> +++ b/sound/soc/sof/intel/hda-common-ops.c
> @@ -84,6 +84,7 @@ const struct snd_sof_dsp_ops sof_hda_common_ops = {
>  	.unregister_ipc_clients = hda_unregister_clients,
>  
>  	/* DAI drivers */
> +	.dai_link_hw_ready = hda_sdw_dai_hw_ready,
>  	.drv		= skl_dai,
>  	.num_drv	= SOF_SKL_NUM_DAIS,
>  	.is_chain_dma_supported	= hda_is_chain_dma_supported,
> diff --git a/sound/soc/sof/intel/hda.c b/sound/soc/sof/intel/hda.c
> index 686ecc040867..956106dc0e02 100644
> --- a/sound/soc/sof/intel/hda.c
> +++ b/sound/soc/sof/intel/hda.c
> @@ -378,6 +378,50 @@ static void hda_dsp_sdw_process_mic_privacy(struct snd_sof_dev *sdev)
>  		chip->process_mic_privacy(sdev, true, AZX_REG_ML_LEPTR_ID_SDW);
>  }
>  
> +int hda_sdw_dai_hw_ready(struct snd_sof_dev *sdev, int dai_type)
> +{
> +	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
> +	struct sdw_peripherals *sdw_p;
> +	long ret;
> +	int idx;
> +
> +	if (dai_type != SOF_DAI_INTEL_ALH)
> +		return 0;
> +
> +	if (!hdev || !hdev->sdw || !hdev->sdw->peripherals)
> +		return 0;
> +
> +	sdw_p = hdev->sdw->peripherals;
> +
> +	for (idx = 0; idx < sdw_p->num_peripherals; idx++) {
> +		struct sdw_slave *slave = sdw_p->array[idx];
> +
> +		if (!slave)
> +			continue;
> +
> +		if (slave->status != SDW_SLAVE_ATTACHED)
> +			continue;
> +
> +		ret = wait_for_completion_interruptible_timeout(
> +				&slave->initialization_complete,
> +				msecs_to_jiffies(2000));
> +		if (ret == 0) {
> +			dev_err(sdev->dev,
> +				"timeout waiting for SoundWire slave %s initialization\n",
> +				dev_name(&slave->dev));
> +			return -ETIMEDOUT;
> +		}
> +		if (ret < 0) {
> +			dev_dbg(sdev->dev,
> +				"interrupted waiting for SoundWire slave %s initialization: %ld\n",
> +				dev_name(&slave->dev), ret);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  #else /* IS_ENABLED(CONFIG_SND_SOC_SOF_INTEL_SOUNDWIRE) */
>  static inline int hda_sdw_acpi_scan(struct snd_sof_dev *sdev)
>  {
> diff --git a/sound/soc/sof/intel/hda.h b/sound/soc/sof/intel/hda.h
> index ac9f76a5ef97..9bd8fe82ae9e 100644
> --- a/sound/soc/sof/intel/hda.h
> +++ b/sound/soc/sof/intel/hda.h
> @@ -830,6 +830,7 @@ bool hda_sdw_check_wakeen_irq_common(struct snd_sof_dev *sdev);
>  void hda_sdw_process_wakeen_common(struct snd_sof_dev *sdev);
>  void hda_sdw_process_wakeen(struct snd_sof_dev *sdev);
>  bool hda_common_check_sdw_irq(struct snd_sof_dev *sdev);
> +int hda_sdw_dai_hw_ready(struct snd_sof_dev *sdev, int dai_type);
>  
>  #else
>  
> @@ -879,6 +880,11 @@ static inline bool hda_common_check_sdw_irq(struct snd_sof_dev *sdev)
>  	return false;
>  }
>  
> +static inline int hda_sdw_dai_hw_ready(struct snd_sof_dev *sdev, int dai_type)
> +{
> +	return 0;
> +}
> +
>  #endif
>  
>  int sdw_hda_dai_hw_params(struct snd_pcm_substream *substream,
> diff --git a/sound/soc/sof/ipc4-topology.c b/sound/soc/sof/ipc4-topology.c
> index d621e7914a73..a8b107d7e786 100644
> --- a/sound/soc/sof/ipc4-topology.c
> +++ b/sound/soc/sof/ipc4-topology.c
> @@ -2256,6 +2256,14 @@ sof_ipc4_prepare_copier_module(struct snd_sof_widget *swidget,
>  	case snd_soc_dapm_dai_in:
>  	case snd_soc_dapm_dai_out:
>  	{
> +		/* Wait for DAI link hardware (e.g. SoundWire slaves) to be ready */
> +		if (sdev->pdata->desc->ops->dai_link_hw_ready) {
> +			ret = sdev->pdata->desc->ops->dai_link_hw_ready(
> +					sdev, ipc4_copier->dai_type);

that one is clearly wrong, the IPC copier stuff has nothing to do with link management. The wait may be needed but it's in the wrong location in this patch.

> +			if (ret)
> +				return ret;
> +		}
> +
>  		/*
>  		 * Only SOF_DAI_INTEL_ALH needs copier_data to set blob.
>  		 * That's why only ALH dai's blob is set after sof_ipc4_init_input_audio_fmt
> diff --git a/sound/soc/sof/sof-priv.h b/sound/soc/sof/sof-priv.h
> index 0f624d8cde20..346b5c34c6c8 100644
> --- a/sound/soc/sof/sof-priv.h
> +++ b/sound/soc/sof/sof-priv.h
> @@ -346,6 +346,9 @@ struct snd_sof_dsp_ops {
>  	int (*register_ipc_clients)(struct snd_sof_dev *sdev); /* optional */
>  	void (*unregister_ipc_clients)(struct snd_sof_dev *sdev); /* optional */
>  
> +	/* optional: wait for DAI link hardware readiness (e.g. SoundWire slave init) */
> +	int (*dai_link_hw_ready)(struct snd_sof_dev *sdev, int dai_type); /* optional */
> +
>  	/* DAI ops */
>  	struct snd_soc_dai_driver *drv;
>  	int num_drv;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry
  2026-02-16 12:39   ` Péter Ujfalusi
@ 2026-02-17 21:49     ` Cole Leavitt
  2026-02-19  7:11       ` Péter Ujfalusi
  0 siblings, 1 reply; 9+ messages in thread
From: Cole Leavitt @ 2026-02-17 21:49 UTC (permalink / raw)
  To: Peter Ujfalusi, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel,
	Cole Leavitt

The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
deferred message was supposed to be dispatched from the IRQ handler
when the DSP acknowledged the previous message.

This mechanism silently drops messages during D0i3 power transitions
because the IRQ handler never fires while the DSP is in a low-power
state. The caller then hangs in wait_event_timeout() for up to 500ms
per IPC chunk, causing multi-second audio stalls under CPU load.

Fix this by making the platform send_msg functions return -EBUSY
immediately when the TX register is busy (safe since they execute
under spin_lock_irq in sof_ipc_send_msg), and adding a bounded retry
loop with usleep_range() in ipc4_tx_msg_unlocked() which only holds
the tx_mutex (a sleepable context). The retry loop attempts up to 50
iterations with 100-200us delays, bounding the maximum busy-wait to
approximately 10ms instead of the previous 500ms timeout.

Also remove the now-dead delayed_ipc_tx_msg field from
sof_intel_hda_dev, the dispatch code, and the ack_received tracking
variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
mtl_ipc_irq_thread, cnl_ipc4_irq_thread).

Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
Changes in v2:
  - Removed __func__ from debug prints (dyndbg adds it automatically)
  - Added dev_dbg() when message sending is delayed due to EBUSY
  - Dropped patch 2/2 (dai_link_hw_ready) per Pierre's feedback

 sound/soc/sof/intel/cnl.c     | 17 ++---------------
 sound/soc/sof/intel/hda-ipc.c | 17 ++---------------
 sound/soc/sof/intel/hda.h     |  8 --------
 sound/soc/sof/intel/mtl.c     | 17 ++---------------
 sound/soc/sof/ipc4.c          | 20 ++++++++++++++++++--
 5 files changed, 24 insertions(+), 55 deletions(-)

diff --git a/sound/soc/sof/intel/cnl.c b/sound/soc/sof/intel/cnl.c
index 0cc5725515e7..a2c6c7894a0f 100644
--- a/sound/soc/sof/intel/cnl.c
+++ b/sound/soc/sof/intel/cnl.c
@@ -37,7 +37,6 @@ irqreturn_t cnl_ipc4_irq_thread(int irq, void *context)
 {
 	struct sof_ipc4_msg notification_data = {{ 0 }};
 	struct snd_sof_dev *sdev = context;
-	bool ack_received = false;
 	bool ipc_irq = false;
 	u32 hipcida, hipctdr;
 
@@ -51,7 +50,6 @@ irqreturn_t cnl_ipc4_irq_thread(int irq, void *context)
 		cnl_ipc_dsp_done(sdev);
 
 		ipc_irq = true;
-		ack_received = true;
 	}
 
 	if (hipctdr & CNL_DSP_REG_HIPCTDR_BUSY) {
@@ -101,13 +99,6 @@ irqreturn_t cnl_ipc4_irq_thread(int irq, void *context)
 		/* This interrupt is not shared so no need to return IRQ_NONE. */
 		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
 
-	if (ack_received) {
-		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
-
-		if (hdev->delayed_ipc_tx_msg)
-			cnl_ipc4_send_msg(sdev, hdev->delayed_ipc_tx_msg);
-	}
-
 	return IRQ_HANDLED;
 }
 EXPORT_SYMBOL_NS(cnl_ipc4_irq_thread, "SND_SOC_SOF_INTEL_CNL");
@@ -266,12 +257,8 @@ int cnl_ipc4_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *msg)
 	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
 	struct sof_ipc4_msg *msg_data = msg->msg_data;
 
-	if (hda_ipc4_tx_is_busy(sdev)) {
-		hdev->delayed_ipc_tx_msg = msg;
-		return 0;
-	}
-
-	hdev->delayed_ipc_tx_msg = NULL;
+	if (hda_ipc4_tx_is_busy(sdev))
+		return -EBUSY;
 
 	/* send the message via mailbox */
 	if (msg_data->data_size)
diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c
index 94425c510861..78449452041c 100644
--- a/sound/soc/sof/intel/hda-ipc.c
+++ b/sound/soc/sof/intel/hda-ipc.c
@@ -106,12 +106,8 @@ int hda_dsp_ipc4_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *msg)
 	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
 	struct sof_ipc4_msg *msg_data = msg->msg_data;
 
-	if (hda_ipc4_tx_is_busy(sdev)) {
-		hdev->delayed_ipc_tx_msg = msg;
-		return 0;
-	}
-
-	hdev->delayed_ipc_tx_msg = NULL;
+	if (hda_ipc4_tx_is_busy(sdev))
+		return -EBUSY;
 
 	/* send the message via mailbox */
 	if (msg_data->data_size)
@@ -168,7 +164,6 @@ irqreturn_t hda_dsp_ipc4_irq_thread(int irq, void *context)
 {
 	struct sof_ipc4_msg notification_data = {{ 0 }};
 	struct snd_sof_dev *sdev = context;
-	bool ack_received = false;
 	bool ipc_irq = false;
 	u32 hipcie, hipct;
 
@@ -182,7 +177,6 @@ irqreturn_t hda_dsp_ipc4_irq_thread(int irq, void *context)
 		hda_dsp_ipc_dsp_done(sdev);
 
 		ipc_irq = true;
-		ack_received = true;
 	}
 
 	if (hipct & HDA_DSP_REG_HIPCT_BUSY) {
@@ -236,13 +230,6 @@ irqreturn_t hda_dsp_ipc4_irq_thread(int irq, void *context)
 		/* This interrupt is not shared so no need to return IRQ_NONE. */
 		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
 
-	if (ack_received) {
-		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
-
-		if (hdev->delayed_ipc_tx_msg)
-			hda_dsp_ipc4_send_msg(sdev, hdev->delayed_ipc_tx_msg);
-	}
-
 	return IRQ_HANDLED;
 }
 EXPORT_SYMBOL_NS(hda_dsp_ipc4_irq_thread, "SND_SOC_SOF_INTEL_HDA_COMMON");
diff --git a/sound/soc/sof/intel/hda.h b/sound/soc/sof/intel/hda.h
index 562fe8be79c1..ac9f76a5ef97 100644
--- a/sound/soc/sof/intel/hda.h
+++ b/sound/soc/sof/intel/hda.h
@@ -549,14 +549,6 @@ struct sof_intel_hda_dev {
 
 	/* work queue for mic privacy state change notification sending */
 	struct sof_ace3_mic_privacy mic_privacy;
-
-	/*
-	 * Pointing to the IPC message if immediate sending was not possible
-	 * because the downlink communication channel was BUSY at the time.
-	 * The message will be re-tried when the channel becomes free (the ACK
-	 * is received from the DSP for the previous message)
-	 */
-	struct snd_sof_ipc_msg *delayed_ipc_tx_msg;
 };
 
 static inline struct hdac_bus *sof_to_bus(struct snd_sof_dev *s)
diff --git a/sound/soc/sof/intel/mtl.c b/sound/soc/sof/intel/mtl.c
index 095dcf1a18e4..24dec128f589 100644
--- a/sound/soc/sof/intel/mtl.c
+++ b/sound/soc/sof/intel/mtl.c
@@ -101,12 +101,8 @@ static int mtl_ipc_send_msg(struct snd_sof_dev *sdev, struct snd_sof_ipc_msg *ms
 	struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
 	struct sof_ipc4_msg *msg_data = msg->msg_data;
 
-	if (hda_ipc4_tx_is_busy(sdev)) {
-		hdev->delayed_ipc_tx_msg = msg;
-		return 0;
-	}
-
-	hdev->delayed_ipc_tx_msg = NULL;
+	if (hda_ipc4_tx_is_busy(sdev))
+		return -EBUSY;
 
 	/* send the message via mailbox */
 	if (msg_data->data_size)
@@ -559,7 +555,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
 {
 	struct sof_ipc4_msg notification_data = {{ 0 }};
 	struct snd_sof_dev *sdev = context;
-	bool ack_received = false;
 	bool ipc_irq = false;
 	u32 hipcida;
 	u32 hipctdr;
@@ -576,7 +571,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
 		mtl_ipc_dsp_done(sdev);
 
 		ipc_irq = true;
-		ack_received = true;
 	}
 
 	if (hipctdr & MTL_DSP_REG_HFIPCXTDR_BUSY) {
@@ -628,13 +622,6 @@ static irqreturn_t mtl_ipc_irq_thread(int irq, void *context)
 		dev_dbg_ratelimited(sdev->dev, "nothing to do in IPC IRQ thread\n");
 	}
 
-	if (ack_received) {
-		struct sof_intel_hda_dev *hdev = sdev->pdata->hw_pdata;
-
-		if (hdev->delayed_ipc_tx_msg)
-			mtl_ipc_send_msg(sdev, hdev->delayed_ipc_tx_msg);
-	}
-
 	return IRQ_HANDLED;
 }
 
diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c
index a4a090e6724a..ad99e2e07b66 100644
--- a/sound/soc/sof/ipc4.c
+++ b/sound/soc/sof/ipc4.c
@@ -365,20 +365,36 @@ static int ipc4_wait_tx_done(struct snd_sof_ipc *ipc, void *reply_data)
 	return ret;
 }
 
+#define SOF_IPC4_TX_BUSY_RETRIES	50
+#define SOF_IPC4_TX_BUSY_DELAY_US	100
+#define SOF_IPC4_TX_BUSY_DELAY_MAX_US	200
+
 static int ipc4_tx_msg_unlocked(struct snd_sof_ipc *ipc,
 				void *msg_data, size_t msg_bytes,
 				void *reply_data, size_t reply_bytes)
 {
 	struct sof_ipc4_msg *ipc4_msg = msg_data;
 	struct snd_sof_dev *sdev = ipc->sdev;
-	int ret;
+	int ret, i;
 
 	if (msg_bytes > ipc->max_payload_size || reply_bytes > ipc->max_payload_size)
 		return -EINVAL;
 
 	sof_ipc4_log_header(sdev->dev, "ipc tx      ", msg_data, true);
 
-	ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
+	for (i = 0; i < SOF_IPC4_TX_BUSY_RETRIES; i++) {
+		ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
+		if (ret != -EBUSY)
+			break;
+		usleep_range(SOF_IPC4_TX_BUSY_DELAY_US,
+			     SOF_IPC4_TX_BUSY_DELAY_MAX_US);
+	}
+	if (i == SOF_IPC4_TX_BUSY_RETRIES) {
+		dev_dbg(sdev->dev, "ipc tx failed: TX busy after %d retries\n", i);
+	} else if (i > 0) {
+		dev_dbg(sdev->dev, "ipc tx delayed by %d loops for %#x|%#x\n",
+			i, ipc4_msg->primary, ipc4_msg->extension);
+	}
 	if (ret) {
 		dev_err_ratelimited(sdev->dev,
 				    "%s: ipc message send for %#x|%#x failed: %d\n",
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry
  2026-02-17 21:49     ` [PATCH v2] " Cole Leavitt
@ 2026-02-19  7:11       ` Péter Ujfalusi
  0 siblings, 0 replies; 9+ messages in thread
From: Péter Ujfalusi @ 2026-02-19  7:11 UTC (permalink / raw)
  To: Cole Leavitt, Bard Liao, Ranjani Sridharan, Liam Girdwood,
	Daniel Baluta
  Cc: Pierre-Louis Bossart, Kai Vehmanen, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, sound-open-firmware, linux-sound, linux-kernel



On 17/02/2026 23:49, Cole Leavitt wrote:
> The SOF IPC4 platform send_msg functions (hda_dsp_ipc4_send_msg,
> mtl_ipc_send_msg, cnl_ipc4_send_msg) previously stored the message in
> delayed_ipc_tx_msg and returned 0 when the TX register was busy. The
> deferred message was supposed to be dispatched from the IRQ handler
> when the DSP acknowledged the previous message.
> 
> This mechanism silently drops messages during D0i3 power transitions
> because the IRQ handler never fires while the DSP is in a low-power
> state. The caller then hangs in wait_event_timeout() for up to 500ms
> per IPC chunk, causing multi-second audio stalls under CPU load.

I think the agent get this a bit wrong and there is a cause effect mixup.

> Fix this by making the platform send_msg functions return -EBUSY
> immediately when the TX register is busy (safe since they execute
> under spin_lock_irq in sof_ipc_send_msg), and adding a bounded retry
> loop with usleep_range() in ipc4_tx_msg_unlocked() which only holds
> the tx_mutex (a sleepable context). The retry loop attempts up to 50
> iterations with 100-200us delays, bounding the maximum busy-wait to
> approximately 10ms instead of the previous 500ms timeout.
> 
> Also remove the now-dead delayed_ipc_tx_msg field from
> sof_intel_hda_dev, the dispatch code, and the ack_received tracking
> variable from all three IRQ thread handlers (hda_dsp_ipc4_irq_thread,
> mtl_ipc_irq_thread, cnl_ipc4_irq_thread).

No messages were dropped, but if the firmware locks up during suspend
then we might enter low power while an IPC is delayed and send_msg is
wating for a reply (or timeout).
Yes, irq will not come, but it won't came even of the system would not
be on it's way to suspend.
The delayed handling as it is now is OK, it never looses messages,
everything is linear, it just takes a long time to go through several
messages when each of them times out because the fw is locked up.

In essence this patch reduces the 500ms default IPC timeout to 5-10ms
after an IPC timeout, levaing the FW less time to recover and not wating
for a reply.
It can also introduce a new race: if the FW clears the BUSY first and
then sends the reply and we were 'spinning' to send the next message we
might do so before receiving the reply to the previous message.

Which is fair, I think, but the commit message should be clear on this.

Please can you file the issue for sof/linux as I have asked with more
information? We had similar issues 2-3 years ago, but they were root
caused and fixed.

I'll need to think about this a bit more...

one commnet for ipc4.c

> 
> Signed-off-by: Cole Leavitt <cole@unwrap.rs>
> ---
> Changes in v2:
>   - Removed __func__ from debug prints (dyndbg adds it automatically)
>   - Added dev_dbg() when message sending is delayed due to EBUSY
>   - Dropped patch 2/2 (dai_link_hw_ready) per Pierre's feedback
> 
> diff --git a/sound/soc/sof/ipc4.c b/sound/soc/sof/ipc4.c
> index a4a090e6724a..ad99e2e07b66 100644
> --- a/sound/soc/sof/ipc4.c
> +++ b/sound/soc/sof/ipc4.c
> @@ -365,20 +365,36 @@ static int ipc4_wait_tx_done(struct snd_sof_ipc *ipc, void *reply_data)
>  	return ret;
>  }
>  
> +#define SOF_IPC4_TX_BUSY_RETRIES	50
> +#define SOF_IPC4_TX_BUSY_DELAY_US	100
> +#define SOF_IPC4_TX_BUSY_DELAY_MAX_US	200
> +
>  static int ipc4_tx_msg_unlocked(struct snd_sof_ipc *ipc,
>  				void *msg_data, size_t msg_bytes,
>  				void *reply_data, size_t reply_bytes)
>  {
>  	struct sof_ipc4_msg *ipc4_msg = msg_data;
>  	struct snd_sof_dev *sdev = ipc->sdev;
> -	int ret;
> +	int ret, i;
>  
>  	if (msg_bytes > ipc->max_payload_size || reply_bytes > ipc->max_payload_size)
>  		return -EINVAL;
>  
>  	sof_ipc4_log_header(sdev->dev, "ipc tx      ", msg_data, true);
>  
> -	ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> +	for (i = 0; i < SOF_IPC4_TX_BUSY_RETRIES; i++) {
> +		ret = sof_ipc_send_msg(sdev, msg_data, msg_bytes, reply_bytes);
> +		if (ret != -EBUSY)
> +			break;
> +		usleep_range(SOF_IPC4_TX_BUSY_DELAY_US,
> +			     SOF_IPC4_TX_BUSY_DELAY_MAX_US);
> +	}
> +	if (i == SOF_IPC4_TX_BUSY_RETRIES) {
> +		dev_dbg(sdev->dev, "ipc tx failed: TX busy after %d retries\n", i);

this needs special treatment with unique error that can be used for
debugging purposes, something like:

	dev_err(sdev->dev, "IPC busy, msg %#x|%#x cannot be sent\n",
		ipc4_msg->primary, ipc4_msg->extension);
	snd_sof_handle_fw_exception(ipc->sdev, "IPC busy");

	return ret;


> +	} else if (i > 0) {
> +		dev_dbg(sdev->dev, "ipc tx delayed by %d loops for %#x|%#x\n",
> +			i, ipc4_msg->primary, ipc4_msg->extension);
> +	}
>  	if (ret) {
>  		dev_err_ratelimited(sdev->dev,
>  				    "%s: ipc message send for %#x|%#x failed: %d\n",

-- 
Péter


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-02-19  7:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-14  6:40 [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Cole Leavitt
2026-02-14  6:40 ` [PATCH 1/2] ASoC: SOF: Replace IPC TX busy deferral with bounded retry Cole Leavitt
2026-02-16 12:39   ` Péter Ujfalusi
2026-02-17 21:49     ` [PATCH v2] " Cole Leavitt
2026-02-19  7:11       ` Péter Ujfalusi
2026-02-14  6:40 ` [PATCH 2/2] ASoC: SOF: Add platform ops callback for DAI link hardware readiness Cole Leavitt
2026-02-17  8:08   ` Pierre-Louis Bossart
2026-02-16 10:52 ` [PATCH 0/2] ASoC: SOF: Fix IPC reliability and post-resume SoundWire init Péter Ujfalusi
2026-02-16 16:41 ` Péter Ujfalusi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox