Re: [PATCH v6 4/5] dmaengine: fsl-edma: add runtime suspend/resume support

From: sashiko-bot@kernel.org
To: joy.zou@oss.nxp.com
Cc: Frank.Li@kernel.org, imx@lists.linux.dev, vkoul@kernel.org,
	dmaengine@vger.kernel.org
Subject: Re: [PATCH v6 4/5] dmaengine: fsl-edma: add runtime suspend/resume support
Date: Wed, 01 Jul 2026 09:43:12 +0000
Message-ID: <20260701094312.B0EE61F000E9@smtp.kernel.org>
In-Reply-To: <20260701-b4-edma-runtime-opt-v6-4-354ff4229c00@oss.nxp.com>

Thank you for your contribution! Sashiko AI review found 5 potential issues to consider:
- [High] Skipping hardware interrupt acknowledgment for inactive PM channels leads to an unrecoverable infinite IRQ storm.
- [High] The channel runtime suspend callback calls `synchronize_irq()` with a negative error code and fails to synchronize actual per-channel shared IRQs.
- [High] Accessing a freed `device_link` in `fsl_edma3_detach_pd()` due to devres teardown order inversion.
- [Medium] The DMA engine and its channels are exposed to clients before their IRQs and runtime PM domains are fully initialized.
- [Medium] Runtime PM usage counters for the engine device and channel power domains are permanently leaked on multiple error paths during probe.
--

commit a5800ff547600d938eb443a21684d6291b4658d1
Author: Joy Zou <joy.zou@nxp.com>

dmaengine: fsl-edma: add runtime suspend/resume support

Introduce runtime suspend and resume support for FSL eDMA. Enable
per-channel power domain management to facilitate runtime suspend and
resume operations.

> diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
> index fe02b68d75fd3..3518dfb4292d2 100644
> --- a/drivers/dma/fsl-edma-main.c
> +++ b/drivers/dma/fsl-edma-main.c
> @@ -37,15 +37,27 @@ static irqreturn_t fsl_edma_tx_handler(int irq, void *dev_id)
>  	unsigned int intr, ch;
>  	struct edma_regs *regs = &fsl_edma->regs;
>  
> +	if (pm_runtime_get_if_active(fsl_edma->dma_dev.dev) <= 0)
> +		return IRQ_NONE;
> +
>  	intr = edma_readl(fsl_edma, regs->intl);
> -	if (!intr)
> +	if (!intr) {
> +		pm_runtime_put(fsl_edma->dma_dev.dev);
>  		return IRQ_NONE;
> +	}
>  
> +	pm_runtime_put(fsl_edma->dma_dev.dev);
>  	for (ch = 0; ch < fsl_edma->n_chans; ch++) {
> +		struct fsl_edma_chan *fsl_chan = &fsl_edma->chans[ch];
> +
> +		if (pm_runtime_get_if_active(&fsl_chan->vchan.chan.dev->device) <= 0)
> +			continue;
> +
>  		if (intr & (0x1 << ch)) {
>  			edma_writeb(fsl_edma, EDMA_CINT_CINT(ch), regs->cint);
>  			fsl_edma_tx_chan_handler(&fsl_edma->chans[ch]);
>  		}
> +		pm_runtime_put(&fsl_chan->vchan.chan.dev->device);
>  	}
>  	return IRQ_HANDLED;
>  }

[Severity: High]
If a channel is runtime suspended while its interrupt bit is set (e.g., due
to a hardware race or a spurious interrupt), the bit is never cleared: the
loop sees pm_runtime_get_if_active() <= 0 and continues to the next
iteration before reaching the EDMA_CINT_CINT() write.

However, the function still returns IRQ_HANDLED unconditionally.
Could returning IRQ_HANDLED without clearing the hardware interrupt flag
leave a level-triggered line asserted while also hiding the unserviced
interrupt from the kernel's spurious-IRQ detection, potentially leading to
an infinite IRQ storm?

This same pattern appears in fsl_edma_err_handler() and 
fsl_edma3_or_err_handler().
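
A minimal userspace model of this concern, not driver code. Every name
below (edma_sim, the two handler variants, run_until_quiet()) is
hypothetical, and a level-triggered line is simulated by re-invoking the
handler while any pending bit remains. The second variant also assumes the
clear register stays writable for a suspended channel, which may not hold
on real hardware.

```c
#include <assert.h>
#include <stdint.h>

enum irq_ret_sim { IRQ_NONE_SIM, IRQ_HANDLED_SIM };

/* Hypothetical stand-in for the engine state; not the driver's struct. */
struct edma_sim {
	uint32_t pending;      /* models the INTL bits, one per channel */
	uint32_t active;       /* channels whose runtime PM state is active */
	unsigned int n_chans;
	unsigned int serviced; /* completions actually processed */
};

/* Same shape as the quoted loop: inactive channels are skipped before the ack. */
static enum irq_ret_sim handler_skip_inactive(struct edma_sim *e)
{
	uint32_t intr = e->pending;
	unsigned int ch;

	assert(e->n_chans <= 32);
	if (!intr)
		return IRQ_NONE_SIM;

	for (ch = 0; ch < e->n_chans; ch++) {
		uint32_t bit = UINT32_C(1) << ch;

		if (!(e->active & bit))
			continue;
		if (intr & bit) {
			e->pending &= ~bit;
			e->serviced++;
		}
	}
	return IRQ_HANDLED_SIM;
}

/* Alternative: ack every pending bit, but process only active channels. */
static enum irq_ret_sim handler_ack_all(struct edma_sim *e)
{
	uint32_t intr = e->pending;
	unsigned int ch;

	assert(e->n_chans <= 32);
	if (!intr)
		return IRQ_NONE_SIM;

	for (ch = 0; ch < e->n_chans; ch++) {
		uint32_t bit = UINT32_C(1) << ch;

		if (!(intr & bit))
			continue;
		e->pending &= ~bit;
		if (e->active & bit)
			e->serviced++;
	}
	return IRQ_HANDLED_SIM;
}

/* Re-invoke the handler while the simulated line stays asserted. */
static unsigned int run_until_quiet(struct edma_sim *e,
				    enum irq_ret_sim (*handler)(struct edma_sim *),
				    unsigned int limit)
{
	unsigned int calls = 0;

	while (e->pending && calls < limit) {
		handler(e);
		calls++;
	}
	return calls;
}
```

With channel 2 pending but inactive, the first variant keeps being invoked
until the call limit is reached, while the second goes quiet after one
call.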

[ ... ]

> @@ -691,6 +711,36 @@ static int fsl_edma3_attach_pd(struct platform_device *pdev, struct fsl_edma_eng
>  	return -EINVAL;
>  }
>  
> +/* Per channel dma power domain */
> +static int fsl_edma_chan_runtime_suspend(struct device *dev)
> +{
> +	struct fsl_edma_chan *fsl_chan = dev_get_drvdata(dev);
> +	struct fsl_edma_engine *fsl_edma = fsl_chan->edma;
> +
> +	if (fsl_edma->txirq)
> +		synchronize_irq(fsl_edma->txirq);
> +
> +	if (fsl_edma->errirq)
> +		synchronize_irq(fsl_edma->errirq);
> +
> +	clk_disable_unprepare(fsl_chan->clk);
> +
> +	return 0;
> +}

[Severity: High]
Are we synchronizing the correct IRQs before disabling the channel clock?

During probe, global IRQs such as fsl_edma->txirq are initialized to -EINVAL
for SoCs using per-channel IRQs. The if (fsl_edma->txirq) check still
evaluates to true because -EINVAL is non-zero, so a negative error code is
passed to synchronize_irq().

Additionally, this code doesn't synchronize the channel-specific shared 
IRQ (fsl_chan->txirq). Could this allow a shared IRQ handler to execute 
and access hardware after the clock is disabled?
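
A small userspace sketch of the difference between a truthiness check and
a "positive number" check. The structs and helpers are hypothetical; only
the txirq/errirq field names are taken from the patch, and the value 42 in
the usage note is an arbitrary example IRQ number.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the driver's structures. */
struct engine_irq_sim { int txirq; int errirq; };
struct chan_irq_sim { int txirq; };

/* The check as written in the quoted hunk: any non-zero value passes. */
static bool irq_check_as_written(int irq)
{
	return irq != 0;
}

/* Stricter check: treat only positive values as usable IRQ numbers. */
static bool irq_is_valid(int irq)
{
	return irq > 0;
}

/*
 * Collect the IRQ numbers a suspend path would wait on, covering both the
 * engine-wide lines and the per-channel line. Returns the number stored.
 */
static size_t irqs_to_sync(const struct engine_irq_sim *e,
			   const struct chan_irq_sim *c,
			   int *out, size_t max)
{
	const int candidates[] = { e->txirq, e->errirq, c->txirq };
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		if (irq_is_valid(candidates[i]) && n < max)
			out[n++] = candidates[i];
	}
	return n;
}
```

With both engine-wide IRQs set to -EINVAL and a channel IRQ of 42, the
truthiness check accepts -EINVAL, whereas irqs_to_sync() yields only 42.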

[ ... ]

> @@ -891,6 +917,64 @@ static int fsl_edma_probe(struct platform_device *pdev)
>  		return dev_err_probe(&pdev->dev, ret,
>  				     "Can't register Freescale eDMA engine.\n");
>  
> +	ret = devm_pm_runtime_enable(&pdev->dev);
> +	if (ret)
> +		return dev_err_probe(&pdev->dev, ret, "Can't enable eDMA engine PM runtime!\n");
> +
> +	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	if (ret < 0)
> +		return dev_err_probe(&pdev->dev, ret, "Failed to resume eDMA engine!\n");
> +
> +	ret = fsl_edma->drvdata->setup_irq(pdev, fsl_edma);
> +	if (ret)
> +		return ret;

[Severity: Medium]
By calling dmaenginem_async_device_register() just prior to enabling 
runtime PM and setting up IRQs, do we expose the channels to clients 
prematurely?

If a client allocates a channel concurrently before runtime PM is enabled,
fsl_edma_alloc_chan_resources() will invoke pm_runtime_resume_and_get(),
which could fail (typically with -EACCES while runtime PM is still
disabled).

[Severity: Medium]
If fsl_edma->drvdata->setup_irq() fails here, we return the error code 
directly. Does this leave the pdev->dev usage counter permanently 
incremented from the pm_runtime_resume_and_get() call just above?

> +	for (i = 0; i < fsl_edma->n_chans; i++) {
> +		struct fsl_edma_chan *fsl_chan = &fsl_edma->chans[i];
> +		struct device *chan_dev;
> +
> +		if (fsl_edma->chan_masked & BIT(i))
> +			continue;
> +
> +		chan_dev = &fsl_chan->vchan.chan.dev->device;
> +		dev_set_drvdata(chan_dev, fsl_chan);
> +		dev_pm_domain_set(chan_dev, &fsl_edma_chan_pm_domain);
> +
> +		if (fsl_chan->pd_dev) {
> +			fsl_chan->pd_dev_link = device_link_add(chan_dev, fsl_chan->pd_dev,
> +								DL_FLAG_STATELESS |
> +								DL_FLAG_PM_RUNTIME);
> +			if (!fsl_chan->pd_dev_link) {
> +				return dev_err_probe(&pdev->dev, -EINVAL,
> +						     "Failed to add device_link to %d!\n", i);
> +			}
> +		}

[Severity: High]
Could the teardown order of these stateless device links result in a 
use-after-free on probe failure or driver unbind?

Earlier in probe, devm_fsl_edma3_detach_pd is registered as a devres 
cleanup action before the DMA engine is registered. 

However, the stateless device link is added here with the channel device 
(chan_dev) as the consumer. On unbind or probe failure, devres actions 
execute in reverse order:

1. The DMA engine unregisters, destroying chan_dev.
2. The driver core automatically purges and frees all device links attached 
   to the deleted device.
3. devm_fsl_edma3_detach_pd executes and calls device_link_del() on the 
   now-freed fsl_chan->pd_dev_link.
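
The three steps above can be modelled in userspace as a reverse-order
action stack. This is only a sketch: all names ending in _sim are
hypothetical, memory is represented by flags rather than real
allocations, and del_link_early_sim() is one possible reordering rather
than a tested fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 8

struct link_sim {
	bool present;      /* the driver still holds a pointer to the link */
	bool freed;        /* the core has released the link */
	bool stale_access; /* set when a freed link is touched */
};

typedef void (*action_fn)(struct link_sim *);

struct devres_sim {
	action_fn actions[MAX_ACTIONS];
	size_t n;
};

static void devres_add_sim(struct devres_sim *d, action_fn fn)
{
	assert(fn != NULL && d->n < MAX_ACTIONS);
	d->actions[d->n++] = fn;
}

/* Release actions run in reverse registration order. */
static void devres_release_all_sim(struct devres_sim *d, struct link_sim *l)
{
	while (d->n > 0)
		d->actions[--d->n](l);
}

/* Engine unregister: the channel device goes away and its links are purged. */
static void unregister_engine_sim(struct link_sim *l)
{
	l->freed = true;
}

/* Stands in for the detach action deleting the stored link pointer. */
static void detach_pd_sim(struct link_sim *l)
{
	if (!l->present)
		return;
	if (l->freed)
		l->stale_access = true;
	l->present = false;
}

/* Hypothetical extra action: drop the link while the channel device is alive. */
static void del_link_early_sim(struct link_sim *l)
{
	if (l->present && !l->freed)
		l->present = false;
}
```

Registering detach_pd_sim before unregister_engine_sim reproduces the
stale access. Adding del_link_early_sim as the last registered action
makes it run first on release, so the detach action finds no pointer left
to delete.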

> +		ret = devm_pm_runtime_enable(chan_dev);
> +		if (ret)
> +			return dev_err_probe(&pdev->dev, ret,
> +					     "Can't enable eDMA channel PM runtime!\n");
> +
> +		if (fsl_chan->pd_dev) {
> +			ret = pm_runtime_resume_and_get(fsl_chan->pd_dev);
> +			if (ret)
> +				return dev_err_probe(&pdev->dev, ret,
> +						     "Failed to power on eDMA channel %d!\n",
> +						     fsl_chan->vchan.chan.chan_id);
> +		}
> +
> +		ret = pm_runtime_resume_and_get(chan_dev);
> +		if (ret < 0) {
> +			return dev_err_probe(&pdev->dev, ret,
> +					     "Failed to resume eDMA channel %d!\n",
> +					     fsl_chan->vchan.chan.chan_id);
> +		}

[Severity: Medium]
If pm_runtime_resume_and_get(chan_dev) fails here, we return directly.
Does this leak both the engine's usage counter (pdev->dev) and the channel 
power domain's usage counter (fsl_chan->pd_dev) acquired earlier in this 
function?
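
One conventional way to keep the counters balanced is a goto-based unwind.
The sketch below is a userspace model with hypothetical names (dev_sim,
probe_sim() and the fail_step parameter exist only for illustration). It
assumes the helper drops the count again when the resume itself fails.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a device's runtime PM usage counter. */
struct dev_sim { int usage; };

/* Modelled after a resume-and-get helper: a failed resume leaves the count unchanged. */
static int resume_and_get_sim(struct dev_sim *d, int fail)
{
	d->usage++;
	if (fail) {
		d->usage--;
		return -EIO;
	}
	return 0;
}

static void put_sim(struct dev_sim *d)
{
	assert(d->usage > 0);
	d->usage--;
}

/* fail_step: 0 = no failure, 1..4 = which step of the sequence fails. */
static int probe_sim(struct dev_sim *engine, struct dev_sim *pd,
		     struct dev_sim *chan, int fail_step)
{
	int ret;

	ret = resume_and_get_sim(engine, fail_step == 1);
	if (ret)
		return ret;

	ret = (fail_step == 2) ? -EINVAL : 0; /* stands in for setup_irq() */
	if (ret)
		goto err_put_engine;

	ret = resume_and_get_sim(pd, fail_step == 3);
	if (ret)
		goto err_put_engine;

	ret = resume_and_get_sim(chan, fail_step == 4);
	if (ret)
		goto err_put_pd;

	return 0;

err_put_pd:
	put_sim(pd);
err_put_engine:
	put_sim(engine);
	return ret;
}
```

Whichever step fails, every counter taken so far is dropped before the
error is returned, so all three counters end at zero on failure.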

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701-b4-edma-runtime-opt-v6-0-354ff4229c00@oss.nxp.com?part=4
