From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CBF8A26059D; Wed, 17 Jun 2026 07:08:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781680092; cv=none; b=PfYrgepJ6dEe/D47MRN9JLkLzbZtiOyFRSOHRXNsN9tRX4Wfc0ziEMhVluQMjIJsJ4qwifavgjUq/9Q1pFihF0gF//VrWrG1hLfESFdNH5vvNMzfa4hMDRdcM6NFymMkGQAc75oDmARk4R8oXeOJNTagYiMoyEMHahSXjKw7xto= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781680092; c=relaxed/simple; bh=l1c8tLje5i48X1tvUl4/V+vO9khJKr+oBb/pYNqKpKI=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=LPFZG6D3MGF8I4ckGBbI4IsTJo8v+M81I5bP87me460dOsnpFG7UOTDFaHcwkG8wT+pE+h3RwCr4wk7VgQPCDxSiSk/b0+niGAmB4poCYUkwWZ+Id6SSU9SiSYvZJRGoCnf828PGUd3/w6VV0FNoUDVEGLFqQGxBggC5VpMUX4U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JK3y0e+f; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JK3y0e+f" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 286031F000E9; Wed, 17 Jun 2026 07:08:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1781680091; bh=sH/KK53c9P2v2U90v7O5x3mUWpaPocI1za1GXULsH6U=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=JK3y0e+fAWKlisZNdJH0Ws/727HYjtCmpS7orsi4cdXbI4nyBt/CKIHtdVPejniEB DdIqOgZuF6LnAid1MTQ1ydszHK8KmKxTf6c/2R5hxKc+84VLbSKeLeCbCCu5LImuNB LZPEDnq/XTWVNr++D6ErZ3VNw1wspG59srMCsUztvc3YWEaEqb1Os/9JLhCgTB2DEB TM3bR3fprNPajuTshS5Wmfxgcf9yVLhSwBf8fmxLWk0wT/Nren15zZ4AHVb7Ul/KAQ 77KGYqkgvuGOp464PWUxhMYRi0X2lH6H4fu6q+ZTjr4sbGj8mc+wy/HpYw0HHVSFKX iYZ4R4Q2tZh7A== From: sashiko-bot@kernel.org Subject: Re: [PATCH v3 02/10] mailbox: imx: Add a channel shutdown field Reply-To: sashiko-reviews@lists.linux.dev To: "Sebastian Andrzej Siewior" Cc: Frank.Li@kernel.org, linux-rt-devel@lists.linux.dev, imx@lists.linux.dev, linux-kernel@vger.kernel.org In-Reply-To: <20260617-imx_mbox_rproc-v3-2-77948112defc@linutronix.de> References: <20260617-imx_mbox_rproc-v3-0-77948112defc@linutronix.de> <20260617-imx_mbox_rproc-v3-2-77948112defc@linutronix.de> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 17 Jun 2026 07:08:10 +0000 Message-Id: <20260617070811.286031F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-rt-devel@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 3 potential issue(= s) to consider: Pre-existing issues: - [High] The patch introduces `cp->shutdown` to fix a teardown race in the = RX path but fails to apply the same fix to the TX path. If a channel is fre= ed while TX messages are queued, the ISR can unconditionally unmask the TX = interrupt after `shutdown()` has masked it, leading to a 'nobody cared' int= errupt storm. - [High] Polling jiffies with local IRQs disabled leads to potential deadlo= ck in `imx_mu_tx_waiting_write()`. - [Medium] Missing `pm_runtime_put_sync()` on the error path of `imx_mu_sta= rtup()` causes a permanent power management reference leak. -- commit ceab4c0d8020e02fd68ce7762cacc2496963937c Author: Sebastian Andrzej Siewior mailbox: imx: Add a channel shutdown field sashiko complained about possible teardown problem. The scenario CPU 0 CPU 1 imx_mu_isr() imx_mu_shutdown() imx_mu_xcr_rmw(priv, IMX_MU_RCR, 0, IMX_= MU_xCR_RIEn(priv->dcfg->type, cp->idx)); imx_mu_specific_rx() imx_mu_xcr_rmw(priv, IMX_MU_RCR, IMX_MU_xCR_RIEn(priv->dcfg->type, 0)= , 0); free_irq() The RX event remains enabled because in this short window the RX event was disabled in ->shutdown() while the interrupt was active and then enabled again by the ISR while ->shutdown waited in free_irq(). This race requires timing and if happens can be problematic on shared handlers if the "removed" channel triggers an interrupt. In this case the irq-core will shutdown the interrupt with the "nobody cared" message. Introduce imx_mu_con_priv::shutdown to signal that the channel is shutting down. This flag is set with the lock held (by imx_mu_xcr_clr_shut()). The unmask side uses imx_mu_xcr_set_act() which only enables the event if the channel has not been shutdown and serialises on the same lock. > diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c > index 0028073be4a71..e261b382d5c90 100644 > --- a/drivers/mailbox/imx-mailbox.c > +++ b/drivers/mailbox/imx-mailbox.c [ ... ] > @@ -605,6 +636,7 @@ static int imx_mu_startup(struct mbox_chan *chan) > return ret; > } > =20 > + cp->shutdown =3D false; [Severity: Medium] This is a pre-existing issue and was not introduced by the patch under review, but does this error path leak the PM runtime reference? In imx_mu_startup(), pm_runtime_get_sync() is called to increment the PM usage counter. If request_threaded_irq() fails, the function returns directly without calling pm_runtime_put_sync(): drivers/mailbox/imx-mailbox.c:imx_mu_startup() { ... pm_runtime_get_sync(priv->dev); ... ret =3D request_threaded_irq(...); if (ret) { dev_err(...); return ret; } ... } The mailbox framework handles this failure via mbox_clean_and_put_channel(), but does not invoke the shutdown() callback when startup() fails. Consequently, can the device's PM runtime reference be permanently leaked, preventing the hardware from entering a suspended state? [ ... ] > @@ -639,13 +671,13 @@ static void imx_mu_shutdown(struct mbox_chan *chan) > =20 > switch (cp->type) { > case IMX_MU_TYPE_TX: > - imx_mu_xcr_rmw(priv, IMX_MU_TCR, 0, IMX_MU_xCR_TIEn(priv->dcfg->type, = cp->idx)); > + imx_mu_xcr_clr_shut(priv, cp, IMX_MU_TCR, IMX_MU_xCR_TIEn(priv->dcfg->= type, cp->idx)); > break; [Severity: High] This is a pre-existing issue and was not introduced by the patch under review, but does this patch leave the TX path vulnerable to a similar teardown race? The patch introduces cp->shutdown to fix the teardown race in the RX path, but if a TX channel is freed while messages are queued, can the ISR unconditionally unmask the TX interrupt after shutdown() has masked it? When mbox_free_channel() calls imx_mu_shutdown(), it masks the TX interrupt via imx_mu_xcr_clr_shut(), sets cp->shutdown =3D true, and waits in free_ir= q(). If a TX interrupt concurrently fires, imx_mu_isr() masks it and wakes the threaded handler imx_mu_isr_th(). This handler calls mbox_chan_txdone() and eventually triggers msg_submit(). Because the queue isn't purged yet, it dequeues the next message and synchronously calls the driver's TX callback. However, imx_mu_generic_tx() and imx_mu_specific_tx() unconditionally re-enable the TX interrupt: drivers/mailbox/imx-mailbox.c:imx_mu_generic_tx() { ... case IMX_MU_TYPE_TX: imx_mu_write(priv, *arg, priv->dcfg->xTR + cp->idx * 4); imx_mu_xcr_rmw(priv, IMX_MU_TCR, IMX_MU_xCR_TIEn(priv->dcfg->type, = cp->idx), 0); break; ... } Once the threaded ISR finishes, free_irq() unblocks, leaving the hardware TX interrupt enabled but without a registered handler. Will this result in a "nobody cared" kernel warning? [ ... ] [Severity: High] This is a pre-existing issue and was not introduced by the patch under review, but does polling jiffies with local IRQs disabled lead to a potenti= al deadlock here? The imx_mu_tx_waiting_write() function uses an open-coded timeout loop based on jiffies: drivers/mailbox/imx-mailbox.c:imx_mu_tx_waiting_write() { ... do { status =3D imx_mu_read(priv, priv->dcfg->xSR[IMX_MU_TSR]); can_write =3D status & IMX_MU_xSR_TEn(priv->dcfg->type, idx % 4); } while (!can_write && time_is_after_jiffies64(timeout_time)); ... } This function is called from the send_data callback (imx_mu_seco_tx), which the mailbox core invokes with the channel's spinlock (chan->lock) held. This spinlock unconditionally disables local interrupts. Because local interrupts are disabled, the timer tick cannot fire on the local CPU to update jiffies. On a single-core system, or if the local CPU is responsible for updating the tick, jiffies will never increment. If the hardware does not promptly set the status bit, will the loop spin infinitely and cause a hard lockup? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260617-imx_mbox_r= proc-v3-0-77948112defc@linutronix.de?part=3D2