* [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
@ 2026-05-13 8:37 Tim JH Chen(陳仁鴻)
2026-05-15 0:19 ` Jakub Kicinski
0 siblings, 1 reply; 2+ messages in thread
From: Tim JH Chen(陳仁鴻) @ 2026-05-13 8:37 UTC
To: netdev@vger.kernel.org
Cc: chandrashekar.devegowda@intel.com, haijun.liu@mediatek.com,
ricardo.martinez@linux.intel.com, loic.poulain@oss.qualcomm.com,
ryazanov.s.a@gmail.com, davem@davemloft.net, kuba@kernel.org,
linux-kernel@vger.kernel.org
[-- Attachment #1.1: Type: text/plain, Size: 4191 bytes --]
Date: Wed, 13 May 2026 09:21:40 +0800
Subject: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM
suspend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When system suspend is triggered while the DPMAIF TX kthread
(t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur,
leading to a CPU soft lockup.

The root cause is twofold:

1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop(), which only stops
   the TX work-queue items (by clearing txq->que_started and waiting on
   txq->tx_processing). It does NOT signal the kthread and does NOT
   update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.

2. The kthread's state guard ("if ... state != DPMAIF_STATE_PWRON")
   is only checked at the top of each loop iteration. If the thread
   has already passed this guard, it proceeds unconditionally to call
   pm_runtime_resume_and_get(), which tries to acquire the PM spinlock
   that is also held (or contended) by the system PM suspend path.
The result is a spinlock deadlock observed as:

  watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
  RIP: _raw_spin_unlock_irqrestore
  Call Trace:
    __pm_runtime_resume+0x5b/0x80
    t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]

Triggering it requires ASPM L1 to be enabled on the endpoint (which extends
the time pm_runtime_resume_and_get() holds the PM lock during L1.2
link retraining) and hundreds of repeated suspend/resume cycles to
reproduce reliably.
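
For readers following the ordering, here is a minimal userspace sketch of
the window described above. It is not driver code: the model_* names, the
"suspending" flag and the counter are hypothetical, introduced only for
illustration, and the kthread, the wait queue and the PM core are reduced
to plain function calls replayed in a fixed order.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the unpatched ordering, not driver code. */
enum model_state { MODEL_PWROFF, MODEL_PWRON };

struct model_ctrl {
	enum model_state state;       /* stands in for dpmaif_ctrl->state */
	bool suspending;              /* set once the modelled suspend has started */
	int pm_calls_during_suspend;  /* PM calls issued after suspend started */
};

/* Guard evaluated once at the top of a loop iteration. */
bool model_guard_passes(const struct model_ctrl *c, bool have_work)
{
	return have_work && c->state == MODEL_PWRON;
}

/* Modelled unpatched suspend: work is stopped, but state is left untouched. */
void model_suspend_unpatched(struct model_ctrl *c)
{
	c->suspending = true;
}

/* Stand-in for the pm_runtime_resume_and_get() call site. */
void model_pm_call(struct model_ctrl *c)
{
	if (c->suspending)
		c->pm_calls_during_suspend++;
}

/* Replays the interleaving: guard passes, suspend starts, PM call follows. */
int model_replay_unpatched(void)
{
	struct model_ctrl c = { .state = MODEL_PWRON };

	if (model_guard_passes(&c, true)) {
		model_suspend_unpatched(&c);  /* suspend lands inside the window */
		model_pm_call(&c);            /* nothing stops the thread here */
	}
	return c.pm_calls_during_suspend;
}
```

In this model the replay reports one PM call made after suspend has
started, because nothing between the guard and the call site observes the
suspend.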
Fix this with three coordinated changes:

- In t7xx_dpmaif_suspend(): set state to DPMAIF_STATE_PWROFF
  immediately after stopping the TX queue, then call wake_up() so any
  sleeping thread re-evaluates the wait_event condition and stops.
- In t7xx_dpmaif_resume(): restore state to DPMAIF_STATE_PWRON before
  re-enabling the TX queues, symmetric with the suspend change.
  Without this, the kthread would never wake up after resume.
- In t7xx_dpmaif_tx_hw_push_thread(): add a second state check
  immediately before pm_runtime_resume_and_get() to close the TOCTOU
  window between the wait_event guard and the PM call.
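
The three changes can be illustrated with a small userspace sketch. It
assumes a single-threaded replay in which a suspend can be injected
between the guard and the second check. All sketch_* names are
hypothetical, and the wake_up() call and the real PM core are left out.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the patched ordering, not driver code. */
enum sketch_state { SKETCH_PWROFF, SKETCH_PWRON };

struct sketch_ctrl {
	enum sketch_state state;  /* stands in for dpmaif_ctrl->state */
	int pm_calls;             /* counts modelled PM resume calls */
};

/* Modelled suspend: flips the state (the patch also calls wake_up()). */
void sketch_suspend(struct sketch_ctrl *c)
{
	c->state = SKETCH_PWROFF;
}

/* Modelled resume: restores the state before queues are restarted. */
void sketch_resume(struct sketch_ctrl *c)
{
	c->state = SKETCH_PWRON;
}

/*
 * One loop iteration. When suspend_in_window is true, a suspend is
 * injected after the top-of-loop guard. Returns true if the PM call
 * was made.
 */
bool sketch_iteration(struct sketch_ctrl *c, bool suspend_in_window)
{
	if (c->state != SKETCH_PWRON)
		return false;         /* top-of-loop guard */

	if (suspend_in_window)
		sketch_suspend(c);    /* suspend lands after the guard */

	if (c->state != SKETCH_PWRON)
		return false;         /* second check added before the PM call */

	c->pm_calls++;                /* stand-in for pm_runtime_resume_and_get() */
	return true;
}
```

With a suspend injected after the guard, the iteration returns without
making the PM call, and later iterations keep returning early until
sketch_resume() restores the state. The sketch only models the ordering;
it says nothing about memory ordering or locking in the real driver.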
Tested: no soft lockup observed over 500+ suspend/resume cycles with
SIM registered and ASPM L1 enabled (previously triggered in < 300).
Fixes: 05f7e89ab ("Linux 6.19")
Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>
---
drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c | 3 +++
drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6..315a77e24 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -412,6 +412,8 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
 	struct dpmaif_ctrl *dpmaif_ctrl = param;

 	t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+	dpmaif_ctrl->state = DPMAIF_STATE_PWROFF;
+	wake_up(&dpmaif_ctrl->tx_wq);
 	t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,6 +453,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
 	if (!dpmaif_ctrl)
 		return 0;

+	dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
 	t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
 	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
 	t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf..d5a5befec 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -460,6 +460,9 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 				break;
 		}

+		if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
+			continue;
+
 		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
 		if (ret < 0 && ret != -EACCES)
 			return ret;
--
2.25.1
* Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
From: Jakub Kicinski @ 2026-05-15 0:19 UTC
To: Tim JH Chen(陳仁鴻)
Cc: netdev@vger.kernel.org, chandrashekar.devegowda@intel.com,
haijun.liu@mediatek.com, ricardo.martinez@linux.intel.com,
loic.poulain@oss.qualcomm.com, ryazanov.s.a@gmail.com,
davem@davemloft.net, linux-kernel@vger.kernel.org
On Wed, 13 May 2026 08:37:48 +0000 Tim JH Chen(陳仁鴻) wrote:
> Date: Wed, 13 May 2026 09:21:40 +0800
> Subject: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM
> suspend
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
Something has corrupted this patch (either your email client or server).
Please try to fix your setup and resend (maybe use b4 gateway).