Netdev List
 help / color / mirror / Atom feed
From: Simon Horman <horms@kernel.org>
To: tim770802@gmail.com
Cc: Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, pabeni@redhat.com,
	haijun.liu@mediatek.com, chandrashekar.devegowda@intel.com,
	ricardo.martinez@linux.intel.com, loic.poulain@oss.qualcomm.com,
	ryazanov.s.a@gmail.com, johannes@sipsolutions.net,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, linux-kernel@vger.kernel.org,
	tim.jh.chen@wnc.com.tw, Chih.Hung.Huang@wnc.com.tw
Subject: Re: [PATCH net v4] net: wwan: t7xx: fix race between TX path and system PM suspend
Date: Thu, 11 Jun 2026 20:04:35 +0100	[thread overview]
Message-ID: <20260611190433.634941-3-horms@kernel.org> (raw)
In-Reply-To: <20260610061014.597533-1-tim770802@gmail.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev

Simon says:

Hi Tim,

While I think the issues that are flagged as pre-existing below
can be considered in the context of possible follow-up, the other
two issues - the first and last one - do look like they effect this patch.

---
net: wwan: t7xx: fix race between TX path and system PM suspend

This commit addresses a CPU soft lockup that occurs during system suspend
when ASPM L1 is enabled. It aims to quiesce the DPMAIF TX contexts by
using the PM freezer, marking the TX push kthread as freezable and the
TX-done workqueue with WQ_FREEZABLE. This prevents them from touching
hardware while the suspend callback tears it down.

> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> index 236d632cf591b..804bd730c40fa 100644
> --- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> +++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c

[ ... ]

> @@ -447,19 +448,28 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
>  	struct dpmaif_ctrl *dpmaif_ctrl = arg;
>  	int ret;
>  
> +	set_freezable();
> +
>  	while (!kthread_should_stop()) {
>  		if (t7xx_tx_lists_are_all_empty(dpmaif_ctrl) ||
>  		    dpmaif_ctrl->state != DPMAIF_STATE_PWRON) {
> -			if (wait_event_interruptible(dpmaif_ctrl->tx_wq,
> -						     (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) &&
> -						     dpmaif_ctrl->state == DPMAIF_STATE_PWRON) ||
> -						     kthread_should_stop()))
> +			if (wait_event_freezable(dpmaif_ctrl->tx_wq,
> +						 (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) &&
> +						  dpmaif_ctrl->state == DPMAIF_STATE_PWRON) ||
> +						 kthread_should_stop()))
>  				continue;
>  
>  			if (kthread_should_stop())
>  				break;
>  		}
>  
> +		/* Freeze here, outside the runtime-PM and MMIO section below, so
> +		 * the system suspend freezer parks this thread before the device
> +		 * suspend callbacks tear the DPMAIF hardware down.
> +		 */
> +		if (try_to_freeze())
> +			continue;
> +

[Severity: High]
Does this setup cause a freeze deadlock?

Since the tx worker queue is now freezable, if the worker freezes first, it
stops processing TX completions. The DRB ring stays full and no space is freed.

Meanwhile, if there are pending skbs, t7xx_do_tx_hw_push() repeatedly calls
t7xx_txq_burst_send_skb(), which returns 0 when the ring is full:

t7xx_do_tx_hw_push() {
    ...
    drb_send_cnt = t7xx_txq_burst_send_skb(txq);
    if (drb_send_cnt <= 0) {
        usleep_range(10, 20);
        cond_resched();
        continue;
    }
    ...
}

Because this do-while loop inside t7xx_do_tx_hw_push() never exits when the
DRB ring is full, the kthread never loops back to reach this try_to_freeze()
call. Will this cause the freezer to time out and abort the system suspend?

>  		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
>  		if (ret < 0 && ret != -EACCES)
>  			return ret;

[Severity: High]
This is a pre-existing issue, but does this lead to a runtime PM usage
counter underflow when pm_runtime_resume_and_get() returns -EACCES?

pm_runtime_resume_and_get() internally decrements the usage counter back to
its original value if it returns an error. By ignoring -EACCES here, the
code proceeds and unconditionally executes pm_runtime_put_autosuspend() at
the end of the loop, dropping the usage count a second time.

A similar issue exists in t7xx_dpmaif_tx_done() where the TX done work item
also ignores -EACCES and continues. Could this lead to a negative reference
count when runtime PM is disabled?

[Severity: High]
This is a pre-existing issue, but does returning early here cause a
use-after-free later?

If pm_runtime_resume_and_get() returns an error other than -EACCES, the
function returns early. This terminates the kthread and invokes do_exit(),
which frees the task_struct since kthread_run() does not hold an extra
reference.

Later, when the device is unbound, t7xx_dpmaif_tx_thread_rel() calls
kthread_stop(dpmaif_ctrl->tx_thread) on the freed task pointer:

t7xx_dpmaif_tx_thread_rel() {
    ...
    if (dpmaif_ctrl->tx_thread)
        kthread_stop(dpmaif_ctrl->tx_thread);
    ...
}

Will this cause a panic?

[ ... ]

> @@ -617,7 +627,7 @@ int t7xx_dpmaif_txq_init(struct dpmaif_tx_queue *txq)
>  	}
>  
>  	txq->worker = alloc_ordered_workqueue("md_dpmaif_tx%d_worker",
> -				WQ_MEM_RECLAIM | (txq->index ? 0 : WQ_HIGHPRI),
> +				WQ_MEM_RECLAIM | WQ_FREEZABLE | (txq->index ? 0 : WQ_HIGHPRI),
>  				txq->index);
>  	if (!txq->worker)
>  		return -ENOMEM;

[Severity: High]
This is a pre-existing issue, but are there other hardware-accessing workqueues
that also need to be marked as freezable to prevent this system suspend race?

For example, the bat_release_wq is allocated without WQ_FREEZABLE:

t7xx_dpmaif_bat_rel_wq_alloc() {
    ...
    dpmaif_ctrl->bat_release_wq = alloc_workqueue("dpmaif_bat_release_work_queue",
                                                  WQ_MEM_RECLAIM | WQ_PERCPU, 1);
    ...
}

And its worker ignores -EACCES from runtime PM and accesses the hardware
directly:

t7xx_dpmaif_bat_release_work() {
    ...
    if (ret < 0 && ret != -EACCES)
        return;

    t7xx_pci_disable_sleep(dpmaif_ctrl->t7xx_dev);
    ...
}

Similarly, the CLDMA TX and RX workqueues are allocated without WQ_FREEZABLE
in t7xx_cldma_init(). Could these unfrozen workers execute pending work
against a suspended or torn-down device and cause the same soft lockups?

      reply	other threads:[~2026-06-11 19:06 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-10  6:10 [PATCH net v4] net: wwan: t7xx: fix race between TX path and system PM suspend Tim JH Chen
2026-06-11 19:04 ` Simon Horman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260611190433.634941-3-horms@kernel.org \
    --to=horms@kernel.org \
    --cc=Chih.Hung.Huang@wnc.com.tw \
    --cc=andrew+netdev@lunn.ch \
    --cc=chandrashekar.devegowda@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=haijun.liu@mediatek.com \
    --cc=johannes@sipsolutions.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=loic.poulain@oss.qualcomm.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=ricardo.martinez@linux.intel.com \
    --cc=ryazanov.s.a@gmail.com \
    --cc=tim.jh.chen@wnc.com.tw \
    --cc=tim770802@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox