Re: [PATCH v3 04/10] mailbox: imx: use devm_of_platform

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* Re: [PATCH v3 04/10] mailbox: imx: use devm_of_platform_populate()
       [not found] ` <20260617-imx_mbox_rproc-v3-4-77948112defc@linutronix.de>
@ 2026-06-17  7:04   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: imx, linux-rt-devel, linux-kernel, Frank.Li

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The return value of devm_of_platform_populate() is ignored, leading to silent probe success even if child devices fail to populate, and breaking the intended cleanup on module removal.
--

commit 583b7f6c5d004040c54817c853b20be8858487ce
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: use devm_of_platform_populate()

This commit transitions the driver to use the managed function
devm_of_platform_populate(). This is intended to ensure that populated
child devices are properly removed when the parent device is removed,
fixing an issue where module reloads could lead to "double devices".

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c
> @@ -984,7 +984,7 @@ static int imx_mu_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto err_out;
>  
> -	of_platform_populate(dev->of_node, NULL, NULL, dev);
> +	devm_of_platform_populate(dev);

[Severity: High]
Should this code check the return value of devm_of_platform_populate()?

If the function fails and returns an error code, the devres cleanup action
is not registered, but imx_mu_probe() will still return 0. This could
lead to a situation where child devices silently fail to populate, and any
partially populated devices would not be cleaned up on module removal.

For example:

	ret = devm_of_platform_populate(dev);
	if (ret)
		goto err_out;

>  
>  	return 0;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 02/10] mailbox: imx: Add a channel shutdown field
       [not found] ` <20260617-imx_mbox_rproc-v3-2-77948112defc@linutronix.de>
@ 2026-06-17  7:08   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Frank.Li, linux-rt-devel, imx, linux-kernel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] The patch introduces `cp->shutdown` to fix a teardown race in the RX path but fails to apply the same fix to the TX path. If a channel is freed while TX messages are queued, the ISR can unconditionally unmask the TX interrupt after `shutdown()` has masked it, leading to a 'nobody cared' interrupt storm.
- [High] Polling jiffies with local IRQs disabled leads to potential deadlock in `imx_mu_tx_waiting_write()`.
- [Medium] Missing `pm_runtime_put_sync()` on the error path of `imx_mu_startup()` causes a permanent power management reference leak.
--

commit ceab4c0d8020e02fd68ce7762cacc2496963937c
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: Add a channel shutdown field

sashiko complained about possible teardown problem. The scenario

 CPU 0                              CPU 1
  imx_mu_isr()                   imx_mu_shutdown()
                                   imx_mu_xcr_rmw(priv, IMX_MU_RCR, 0, IMX_MU_xCR_RIEn(priv->dcfg->type, cp->idx));
    imx_mu_specific_rx()
      imx_mu_xcr_rmw(priv, IMX_MU_RCR, IMX_MU_xCR_RIEn(priv->dcfg->type, 0), 0);
                                   free_irq()

The RX event remains enabled because in this short window the RX event
was disabled in ->shutdown() while the interrupt was active and then
enabled again by the ISR while ->shutdown waited in free_irq().

This race requires timing and if happens can be problematic on shared
handlers if the "removed" channel triggers an interrupt. In this case
the irq-core will shutdown the interrupt with the "nobody cared"
message.

Introduce imx_mu_con_priv::shutdown to signal that the channel is
shutting down. This flag is set with the lock held (by
imx_mu_xcr_clr_shut()). The unmask side uses imx_mu_xcr_set_act() which
only enables the event if the channel has not been shutdown and
serialises on the same lock.

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> index 0028073be4a71..e261b382d5c90 100644
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c

[ ... ]

> @@ -605,6 +636,7 @@ static int imx_mu_startup(struct mbox_chan *chan)
>  		return ret;
>  	}
>  
> +	cp->shutdown = false;

[Severity: Medium]
This is a pre-existing issue and was not introduced by the patch under
review, but does this error path leak the PM runtime reference?

In imx_mu_startup(), pm_runtime_get_sync() is called to increment the PM
usage counter. If request_threaded_irq() fails, the function returns
directly without calling pm_runtime_put_sync():

drivers/mailbox/imx-mailbox.c:imx_mu_startup() {
    ...
    pm_runtime_get_sync(priv->dev);
    ...
    ret = request_threaded_irq(...);
    if (ret) {
        dev_err(...);
        return ret;
    }
    ...
}

The mailbox framework handles this failure via mbox_clean_and_put_channel(),
but does not invoke the shutdown() callback when startup() fails.
Consequently, can the device's PM runtime reference be permanently leaked,
preventing the hardware from entering a suspended state?

[ ... ]

> @@ -639,13 +671,13 @@ static void imx_mu_shutdown(struct mbox_chan *chan)
>  
>  	switch (cp->type) {
>  	case IMX_MU_TYPE_TX:
> -		imx_mu_xcr_rmw(priv, IMX_MU_TCR, 0, IMX_MU_xCR_TIEn(priv->dcfg->type, cp->idx));
> +		imx_mu_xcr_clr_shut(priv, cp, IMX_MU_TCR, IMX_MU_xCR_TIEn(priv->dcfg->type, cp->idx));
>  		break;

[Severity: High]
This is a pre-existing issue and was not introduced by the patch under
review, but does this patch leave the TX path vulnerable to a similar
teardown race?

The patch introduces cp->shutdown to fix the teardown race in the RX path,
but if a TX channel is freed while messages are queued, can the ISR
unconditionally unmask the TX interrupt after shutdown() has masked it?

When mbox_free_channel() calls imx_mu_shutdown(), it masks the TX interrupt
via imx_mu_xcr_clr_shut(), sets cp->shutdown = true, and waits in free_irq().

If a TX interrupt concurrently fires, imx_mu_isr() masks it and wakes the
threaded handler imx_mu_isr_th(). This handler calls mbox_chan_txdone() and
eventually triggers msg_submit().

Because the queue isn't purged yet, it dequeues the next message and
synchronously calls the driver's TX callback. However, imx_mu_generic_tx()
and imx_mu_specific_tx() unconditionally re-enable the TX interrupt:

drivers/mailbox/imx-mailbox.c:imx_mu_generic_tx() {
    ...
    case IMX_MU_TYPE_TX:
        imx_mu_write(priv, *arg, priv->dcfg->xTR + cp->idx * 4);
        imx_mu_xcr_rmw(priv, IMX_MU_TCR, IMX_MU_xCR_TIEn(priv->dcfg->type, cp->idx), 0);
        break;
    ...
}

Once the threaded ISR finishes, free_irq() unblocks, leaving the hardware TX
interrupt enabled but without a registered handler. Will this result in a
"nobody cared" kernel warning?

[ ... ]

[Severity: High]
This is a pre-existing issue and was not introduced by the patch under
review, but does polling jiffies with local IRQs disabled lead to a potential
deadlock here?

The imx_mu_tx_waiting_write() function uses an open-coded timeout loop based
on jiffies:

drivers/mailbox/imx-mailbox.c:imx_mu_tx_waiting_write() {
    ...
    do {
        status = imx_mu_read(priv, priv->dcfg->xSR[IMX_MU_TSR]);
        can_write = status & IMX_MU_xSR_TEn(priv->dcfg->type, idx % 4);
    } while (!can_write && time_is_after_jiffies64(timeout_time));
    ...
}

This function is called from the send_data callback (imx_mu_seco_tx), which
the mailbox core invokes with the channel's spinlock (chan->lock) held. This
spinlock unconditionally disables local interrupts.

Because local interrupts are disabled, the timer tick cannot fire on the
local CPU to update jiffies. On a single-core system, or if the local CPU is
responsible for updating the tick, jiffies will never increment.

If the hardware does not promptly set the status bit, will the loop spin
infinitely and cause a hard lockup?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 01/10] mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx()
       [not found] ` <20260617-imx_mbox_rproc-v3-1-77948112defc@linutronix.de>
@ 2026-06-17  7:08   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, imx, Frank.Li, linux-rt-devel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Returning a negative error code (-ETIMEDOUT) from the mailbox send_data callback wedges the TX queue permanently.

Pre-existing issues:
- [High] Using readl_poll_timeout() with a timeout inside an atomic section creates up to 100ms of IRQ-disabled latency and is unsafe.
--

commit 00bd1a49fae92272b2f76397f788a118394fe7ae
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx()

This commit modifies imx_mu_generic_tx() to return the actual error code
(such as a timeout) instead of silently dropping it and returning 0. This
was intended to properly notify the caller of failures during the busy-wait
polling loop for TXDB_V2 channels.

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> index 246a9a9e39520..0028073be4a71 100644
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c
> @@ -227,6 +227,7 @@ static int imx_mu_generic_tx(struct imx_mu_priv *priv,
>  	u32 val;
>  	int ret, count;
>  
> +	ret = 0;
>  	switch (cp->type) {

[ ... ]

> @@ -259,7 +260,7 @@ static int imx_mu_generic_tx(struct imx_mu_priv *priv,
>  		return -EINVAL;
>  	}
>  
> -	return 0;
> +	return ret;

[Severity: High]
Does returning a negative error code here stall the TX queue?

The mailbox framework does not propagate errors from the send_data
callback back to the caller. Instead, it treats non-zero return values
as a sign the controller is temporarily busy:

    drivers/mailbox/mailbox.c:msg_submit() {
        ...
        err = chan->mbox->ops->send_data(chan, data);
        if (!err) {
            chan->active_req = data;
            chan->msg_count--;
        }
        ...
    }

Because this now returns -ETIMEDOUT, the framework skips dequeuing the
message (chan->msg_count is not decremented) and does not set
chan->active_req. Because err is non-zero, no retry timer is started.

The failed message remains at the head of the TX queue, blocking all
future messages. Subsequent calls to mbox_send_message() will repeatedly
attempt to send the exact same message.

[Severity: High]
This is a pre-existing issue, but does the busy-wait loop in this function
cause high latency with interrupts disabled?

When handling IMX_MU_TYPE_TXDB_V2, imx_mu_generic_tx() calls
readl_poll_timeout() with a 10ms timeout and retries up to 10 times.
However, the caller holds a spinlock with interrupts disabled:

    drivers/mailbox/mailbox.c:msg_submit() {
        scoped_guard(spinlock_irqsave, &chan->lock) {
            ...
            err = chan->mbox->ops->send_data(chan, data);
            ...
        }
    }

This can result in up to 100ms of busy-waiting with interrupts
disabled, which can trigger watchdogs.

Furthermore, readl_poll_timeout() relies on ktime_get(), which the kernel
documentation explicitly forbids using in atomic contexts if a timeout
is specified.

>  }
>  
>  static int imx_mu_generic_rx(struct imx_mu_priv *priv,

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 06/10] mailbox: imx: Start splitting the IRQ handler in primary and threaded handler
       [not found] ` <20260617-imx_mbox_rproc-v3-6-77948112defc@linutronix.de>
@ 2026-06-17  7:08   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: imx, linux-rt-devel, Frank.Li, linux-kernel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Channel shutdown race condition leads to an unhandled IRQ storm on shared interrupt line.
--

commit 7ed2b361db99c9ca608465a64760164124f3d0d2
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: Start splitting the IRQ handler in primary and threaded handler

This commit splits the mailbox IRQ handling into primary and threaded
handlers. The primary handler masks the event, and the threaded handler
runs the mailbox core functions like mbox_chan_txdone() in a preemptible
context.

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> index 408cd083c64ee..87acc43cb99c4 100644
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c
> @@ -540,11 +540,31 @@ static void imx_mu_txdb_work(struct work_struct *t)
>  	mbox_chan_txdone(cp->chan, 0);
>  }
>  
> +static irqreturn_t imx_mu_isr_th(int irq, void *p)
> +{
> +	struct mbox_chan *chan = p;
> +	struct imx_mu_priv *priv = to_imx_mu_priv(chan->mbox);
> +	struct imx_mu_con_priv *cp = chan->con_priv;
> +
> +	switch (cp->type) {
> +	case IMX_MU_TYPE_TX:
> +		mbox_chan_txdone(chan, 0);

[Severity: High]
This is a pre-existing issue, but does moving this into a threaded handler
expose a race condition during channel shutdown?

When mbox_free_channel() is called, it invokes the driver's imx_mu_shutdown()
callback, which masks the TX interrupt and calls free_irq().

Because free_irq() waits for any executing threaded handlers to finish,
it will block if imx_mu_isr_th() is currently executing mbox_chan_txdone().

This in turn triggers the mailbox core to submit the next previously queued
message via msg_submit() -> chan->mbox->ops->send_data().

The send callback (such as imx_mu_generic_tx() or imx_mu_specific_tx()) then
unconditionally unmasks the TX interrupt by calling imx_mu_xcr_rmw(),
completely bypassing any shutdown state.

Once the threaded handler finishes, free_irq() completes and unregisters
the interrupt action, but the hardware interrupt remains unmasked.

Since IRQF_SHARED is used, could the unhandled interrupt fire continuously,
resulting in a storm of IRQ_NONE that eventually causes the kernel to disable
the shared IRQ line and break IPC globally?

> +		break;
> +
> +	default:
> +		dev_warn_ratelimited(priv->dev, "Unhandled channel type %d\n",
> +				     cp->type);
> +		return IRQ_NONE;
> +	}
> +	return IRQ_HANDLED;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=6

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 09/10] mailbox: imx: Don't force-thread the primary handler
       [not found] ` <20260617-imx_mbox_rproc-v3-9-77948112defc@linutronix.de>
@ 2026-06-17  7:08   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Frank.Li, linux-rt-devel, linux-kernel, imx

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The return value of `pm_runtime_get_sync()` is ignored in `imx_mu_startup()`, risking unclocked register accesses and PM reference leaks.
--

commit 14f99570339a0f5e4729690bf8d39e0fdeb47ac3
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: Don't force-thread the primary handler

This commit optimizes the i.MX mailbox driver by converting the
xcr_lock to a raw_spinlock_t and marking the primary interrupt handler
with IRQF_NO_THREAD. This prevents the primary handler from being
force-threaded, allowing it to run in hardirq context even on PREEMPT_RT.

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c

[ ... ]

> @@ -640,7 +635,7 @@ static int imx_mu_startup(struct mbox_chan *chan)
>  {
>  	struct imx_mu_priv *priv = to_imx_mu_priv(chan->mbox);
>  	struct imx_mu_con_priv *cp = chan->con_priv;
> -	unsigned long irq_flag = 0;
> +	unsigned long irq_flag = IRQF_NO_THREAD;
>  	int ret;
>  
>  	pm_runtime_get_sync(priv->dev);

[Severity: High]
This isn't a bug introduced by this patch, but should the return value of
pm_runtime_get_sync() be checked here?

If the device fails to resume (e.g., the PM domain fails to power on),
pm_runtime_get_sync() returns a negative error code but leaves the PM
usage counter incremented. Because the return value goes unchecked, the
driver might incorrectly assume the hardware is ready and proceed to call
hardware access functions like imx_mu_xcr_rmw() during startup.

Accessing unclocked or unpowered registers on i.MX devices can cause an
asynchronous external abort (bus fault), leading to a system crash. Could
this be updated to use pm_runtime_resume_and_get() instead, allowing the
driver to safely abort the startup on failure?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=9

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 03/10] mailbox: imx: Use devm_pm_runtime_enable()
       [not found] ` <20260617-imx_mbox_rproc-v3-3-77948112defc@linutronix.de>
@ 2026-06-17  7:08   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: imx, linux-rt-devel, linux-kernel, Frank.Li

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] Missing error check on `pm_runtime_get_sync()` in `imx_mu_startup()` allows unclocked hardware access.
- [High] Checking `ret < 0` for `pm_runtime_put_sync()` causes spurious probe failures and resource leaks on concurrent PM access.
- [High] Missing `pm_runtime_put_sync()` when `request_irq()` fails in `imx_mu_startup()`.
--

commit 023389d6153d2d912a21b34dd94fa1a45a63f27a
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: Use devm_pm_runtime_enable()

This patch updates the imx mailbox driver to use devm_pm_runtime_enable()
and reorders the probe function to delay mailbox controller registration
until after PM runtime initialization. This ensures proper cleanup order
during device removal or probe failure, resolving issues with early device
usage.

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> index e261b382d5c90..516a05b64daa1 100644
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c
[ ... ]
> @@ -966,38 +966,36 @@ static int imx_mu_probe(struct platform_device *pdev)
[ ... ]
>  	ret = pm_runtime_resume_and_get(dev);
>  	if (ret < 0)
> -		goto disable_runtime_pm;
> +		goto disable_clk;
>  
>  	ret = pm_runtime_put_sync(dev);
>  	if (ret < 0)
> -		goto disable_runtime_pm;
> +		goto disable_clk;

[Severity: High]
This isn't a bug introduced by this patch, but should the return value
of pm_runtime_put_sync() be treated as a fatal error here?

pm_runtime_put_sync() can return -EAGAIN or -EBUSY if the device cannot be
suspended immediately, such as when another thread concurrently accesses the
device's PM runtime via sysfs.

If this happens, the valid non-error return value causes the driver to
spuriously abort the probe process. It also leaves the device in RPM_ACTIVE,
which leaks the clock enabled by the earlier pm_runtime_resume_and_get() call.

[ ... ]

While reviewing the PM runtime usage, I also noticed two other pre-existing
issues in imx_mu_startup().

[Severity: High]
This is a pre-existing issue, but does this code leave the clock disabled
if the PM runtime resume fails?

drivers/mailbox/imx-mailbox.c:imx_mu_startup() {
	...
	pm_runtime_get_sync(priv->dev);

	spin_lock_irqsave(&priv->xcr_lock, flags);
	imx_mu_xcr_rmw(priv, 0, IMX_MU_xCR_GIEn(cp->type, cp->idx));
	...
}

If pm_runtime_get_sync() fails (for example, if the clock fails to enable),
the return value is ignored and the device remains suspended. The function
then immediately proceeds to access hardware registers via imx_mu_xcr_rmw().
Can accessing unclocked peripheral registers cause a synchronous external
abort and lead to a system crash?

[Severity: High]
This is a pre-existing issue, but does this error path leak the PM runtime
usage counter?

drivers/mailbox/imx-mailbox.c:imx_mu_startup() {
	...
	pm_runtime_get_sync(priv->dev);
	...
	ret = request_irq(priv->irq[cp->type], imx_mu_isr, irq_flag,
			  cp->irq_desc, chan);
	if (ret) {
		dev_err(priv->dev, "Unable to acquire IRQ %d\n",
			priv->irq[cp->type]);
		return ret;
	}
	...
}

If request_irq() fails, the function returns the error directly without
calling pm_runtime_put_sync(). This would permanently leak the PM runtime
usage counter, preventing the device from ever suspending and keeping the
hardware clock continuously enabled.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 05/10] mailbox: imx: Use channel index instead of zero in imx_mu_specific_rx()
       [not found] ` <20260617-imx_mbox_rproc-v3-5-77948112defc@linutronix.de>
@ 2026-06-17  7:13   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:13 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: imx, Frank.Li, linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [High] Unsafe hardware polling inside atomic context leads to deadlocks.
- [High] Error paths in `imx_mu_specific_rx()` leave the RX interrupt permanently masked.
- [High] Error paths in `imx_mu_seco_rxdb()` skip clearing GIP and leave the RXDB interrupt permanently masked.
- [High] PM runtime usage counter leak in `imx_mu_startup()` error path.
--

commit ab09c59a56c0e6ea31b7f95dd9b506c460b17d31
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mailbox: imx: Use channel index instead of zero in imx_mu_specific_rx()

imx_mu_specific_rx() masks channel 0 and unmasks it again at the end of
the function. Given that at startup the channel index got unmasked it
should do the right job.

> diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
> index a8d4e970fb786..408cd083c64ee 100644
> --- a/drivers/mailbox/imx-mailbox.c
> +++ b/drivers/mailbox/imx-mailbox.c
> @@ -381,7 +381,7 @@ static int imx_mu_specific_rx(struct imx_mu_priv *priv, struct imx_mu_con_priv *
>  
>  	data = (u32 *)priv->msg;
>  
> -	imx_mu_xcr_rmw(priv, IMX_MU_RCR, 0, IMX_MU_xCR_RIEn(priv->dcfg->type, 0));
> +	imx_mu_xcr_rmw(priv, IMX_MU_RCR, 0, IMX_MU_xCR_RIEn(priv->dcfg->type, cp->idx));
>  	*data++ = imx_mu_read(priv, priv->dcfg->xRR);
>  
>  	if (priv->dcfg->type & IMX_MU_V2_S4) {

[Severity: High]
This is a pre-existing issue, but the error paths in imx_mu_specific_rx() skip
the corresponding unmask operation.

If size exceeds max_size / 4 or if readl_poll_timeout() times out:

drivers/mailbox/imx-mailbox.c:imx_mu_specific_rx() {
...
	if (size > max_size / 4) {
		dev_err(priv->dev, "Maximal message size (%u bytes) exceeded on RX; got: %i bytes\n", max_size, size << 2);
		return -EINVAL;
	}
...
	for (i = 1; i < size; i++) {
		ret = readl_poll_timeout(...);
		if (ret) {
			dev_err(priv->dev, "timeout read idx %d\n", i);
			return ret;
		}
...
}

Does returning an error here skip unmasking the RIEn interrupt?

In the threaded IRQ handler imx_mu_isr_th(), the unmask is only performed if
this function returns 0. Could this leave the RX interrupt permanently masked
and cause a denial of service for the channel?

[ ... ]

> @@ -408,7 +408,7 @@ static int imx_mu_specific_rx(struct imx_mu_priv *priv, struct imx_mu_con_priv *
>  		*data++ = imx_mu_read(priv, priv->dcfg->xRR + (i % num_rr) * 4);
>  	}
>  
> -	imx_mu_xcr_set_act(priv, cp, IMX_MU_RCR, IMX_MU_xCR_RIEn(priv->dcfg->type, 0));
> +	imx_mu_xcr_set_act(priv, cp, IMX_MU_RCR, IMX_MU_xCR_RIEn(priv->dcfg->type, cp->idx));
>  	mbox_chan_received_data(cp->chan, (void *)priv->msg);
>  
>  	return 0;

[Severity: High]
This isn't a bug introduced by this patch, but are there unsafe hardware polling
loops inside atomic contexts elsewhere in this file?

The imx_mu_send_data() callback executes in atomic context (with chan->lock held
and interrupts disabled), but the underlying TX functions wait using mechanisms
that are unsafe in atomic context.

For example, imx_mu_tx_waiting_write():

drivers/mailbox/imx-mailbox.c:imx_mu_tx_waiting_write() {
...
	} while (!can_write && time_is_after_jiffies64(timeout_time));
...
}

There are also readl_poll_timeout() calls in imx_mu_generic_tx() and
imx_mu_specific_tx():

drivers/mailbox/imx-mailbox.c:imx_mu_specific_tx() {
...
		ret = readl_poll_timeout(priv->base + priv->dcfg->xSR[IMX_MU_TSR], xsr, xsr & IMX_MU_xSR_TEn(priv->dcfg->type, i % num_tr), 0, 5 * USEC_PER_SEC);
...
}

Can polling jiffies or using readl_poll_timeout() inside a spinlock with IRQs
disabled cause hard deadlocks, especially if the local CPU is responsible for
advancing the timer tick? Should readl_poll_timeout_atomic() be used here
instead?

[Severity: High]
This is a pre-existing issue, but do error paths in imx_mu_seco_rxdb() skip
clearing the GIP interrupt status?

drivers/mailbox/imx-mailbox.c:imx_mu_seco_rxdb() {
...
	if (byte_size > sizeof(*msg)) {
...
		err = -EINVAL;
		goto error;
	}
...
	for (i = 1; i < msg->hdr.size; i++) {
		err = imx_mu_rx_waiting_read(priv, arg++, i);
		if (err) {
			dev_err(priv->dev, "timeout read idx %d\n", i);
			goto error;
		}
	}
...
error:
	return err;
}

Because these paths jump directly to the error label, they skip the GSR
register write that clears the GIPn interrupt status.

Furthermore, returning an error skips the GIEn unmasking in imx_mu_isr_th().
Could this leave the RXDB interrupt permanently masked and cause spurious
interrupts if the channel is later closed and reopened?

[Severity: High]
This is a pre-existing issue, but does imx_mu_startup() leak the PM runtime
usage counter on error?

drivers/mailbox/imx-mailbox.c:imx_mu_startup() {
...
	ret = pm_runtime_get_sync(priv->dev);
...
	ret = request_threaded_irq(priv->irq[cp->type],
				   imx_mu_isr,
				   imx_mu_isr_th,
				   IRQF_SHARED,
				   cp->irq_desc, cp);
	if (ret) {
		dev_err(priv->dev, "Unable to acquire IRQ %d\n", priv->irq[cp->type]);
		return ret;
	}
...
}

If request_threaded_irq() fails, the function returns immediately without
calling pm_runtime_put_sync().

Since the mailbox core does not call shutdown() if startup() fails, does this
permanently prevent the device and its power domain from suspending?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 10/10] remoteproc: imx_rproc: Invoke the callback directly
       [not found] ` <20260617-imx_mbox_rproc-v3-10-77948112defc@linutronix.de>
@ 2026-06-17  7:19   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-17  7:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Frank.Li, linux-rt-devel, imx, linux-kernel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Unsafe lockless iteration of `rproc->notifyids` in preemptible context risks Use-After-Free of IDR nodes and `rvring` structures during concurrent device teardown.
- [High] If requesting the 'rx' mailbox channel fails, `priv->tx_ch` is freed but its pointer is not cleared, causing a subsequent start to mistakenly treat initialization as successful.
--

commit 6f600102625eeb04d37188290cefc55632750bc2
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

remoteproc: imx_rproc: Invoke the callback directly

The imx-mailbox driver moved the callback invocation into the threaded
IRQ handler. This means the callback is invoked in preemptible context,
so there is no need to schedule a kworker. This patch removes the workqueue
handling and invokes the callback directly.

> diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
> index 0dd80e688b0ea..c97bc1c401655 100644
> --- a/drivers/remoteproc/imx_rproc.c
> +++ b/drivers/remoteproc/imx_rproc.c

[ ... ]

> @@ -835,21 +832,11 @@ static int imx_rproc_notified_idr_cb(int id, void *ptr, void *data)
>  	return 0;
>  }
>  
> -static void imx_rproc_vq_work(struct work_struct *work)
> -{
> -	struct imx_rproc *priv = container_of(work, struct imx_rproc,
> -					      rproc_work);
> -	struct rproc *rproc = priv->rproc;
> -
> -	idr_for_each(&rproc->notifyids, imx_rproc_notified_idr_cb, rproc);
> -}
> -
>  static void imx_rproc_rx_callback(struct mbox_client *cl, void *msg)
>  {
>  	struct rproc *rproc = dev_get_drvdata(cl->dev);
> -	struct imx_rproc *priv = rproc->priv;
>  
> -	queue_work(priv->workqueue, &priv->rproc_work);
> +	idr_for_each(&rproc->notifyids, imx_rproc_notified_idr_cb, rproc);

[Severity: High]
This is a pre-existing issue, but since the iteration now runs directly
in the preemptible threaded IRQ handler, does the lack of RCU protection
risk a use-after-free?

During rproc_shutdown(), imx_rproc_stop() frees the mailbox channel, but
without disabling the IRQ, the callback can run concurrently with
rproc_resource_cleanup().

If the threaded IRQ handler is preempted here, could the RCU grace period
elapse while rproc_resource_cleanup() unregisters the device and removes
entries from the notifyids IDR?

rproc_vq_interrupt() also performs a lockless idr_find() lookup on the same
notifyids IDR:

drivers/remoteproc/remoteproc_virtio.c:rproc_vq_interrupt() {
    ...
    rvring = idr_find(&rproc->notifyids, notifyid);
    ...
}

Would wrapping the idr_for_each() call with rcu_read_lock() and
rcu_read_unlock() be necessary to safely iterate the IDR nodes and prevent
use-after-free of the rvring structures?

>  }
>  
>  static int imx_rproc_xtr_mbox_init(struct rproc *rproc, bool tx_block)

[Severity: High]
This is a pre-existing issue, but I noticed a potential problem in the error
handling within imx_rproc_xtr_mbox_init().

If requesting the rx mailbox channel fails, the code frees the tx channel but
does not clear the pointer:

drivers/remoteproc/imx_rproc.c:imx_rproc_xtr_mbox_init() {
    ...
    priv->rx_ch = mbox_request_channel_byname(cl, "rx");
    if (IS_ERR(priv->rx_ch)) {
        mbox_free_channel(priv->tx_ch);
        return dev_err_probe(...);
    }
    ...
}

Because priv->tx_ch is not set to NULL, and priv->rx_ch contains an ERR_PTR,
the early exit check at the top of the function evaluates to true on any
subsequent call:

    if (priv->tx_ch && priv->rx_ch)
        return 0;

If a user retries starting the remoteproc via sysfs, this will incorrectly
return 0 (success) with a dangling priv->tx_ch pointer. When the remoteproc
later sends a kick, imx_rproc_kick() will use this dangling pointer, which
can cause a use-after-free crash.

Could we set priv->tx_ch to NULL after freeing it to prevent this sequence?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de?part=10

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-06-17  7:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de>
     [not found] ` <20260617-imx_mbox_rproc-v3-4-77948112defc@linutronix.de>
2026-06-17  7:04   ` [PATCH v3 04/10] mailbox: imx: use devm_of_platform_populate() sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-2-77948112defc@linutronix.de>
2026-06-17  7:08   ` [PATCH v3 02/10] mailbox: imx: Add a channel shutdown field sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-1-77948112defc@linutronix.de>
2026-06-17  7:08   ` [PATCH v3 01/10] mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx() sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-6-77948112defc@linutronix.de>
2026-06-17  7:08   ` [PATCH v3 06/10] mailbox: imx: Start splitting the IRQ handler in primary and threaded handler sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-9-77948112defc@linutronix.de>
2026-06-17  7:08   ` [PATCH v3 09/10] mailbox: imx: Don't force-thread the primary handler sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-3-77948112defc@linutronix.de>
2026-06-17  7:08   ` [PATCH v3 03/10] mailbox: imx: Use devm_pm_runtime_enable() sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-5-77948112defc@linutronix.de>
2026-06-17  7:13   ` [PATCH v3 05/10] mailbox: imx: Use channel index instead of zero in imx_mu_specific_rx() sashiko-bot
     [not found] ` <20260617-imx_mbox_rproc-v3-10-77948112defc@linutronix.de>
2026-06-17  7:19   ` [PATCH v3 10/10] remoteproc: imx_rproc: Invoke the callback directly sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox