public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mailbox: Fix NULL message support in mbox_send_message()
@ 2026-03-22 17:17 jassisinghbrar
  2026-03-23  5:13 ` Joonwon Kang
  0 siblings, 1 reply; 4+ messages in thread
From: jassisinghbrar @ 2026-03-22 17:17 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: dianders, shawn.guo, maz, stable, andersson, tglx, joonwonkang,
	Jassi Brar

From: Jassi Brar <jassisinghbrar@gmail.com>

The active_req field serves double duty as both the "is a TX in
flight" flag (NULL means idle) and the storage for the in-flight
message pointer. When a client sends NULL via mbox_send_message(),
active_req is set to NULL, which the framework misinterprets as
"no active request". This breaks the TX state machine by:

 - tx_tick() short-circuits on (!mssg), skipping the tx_done
   callback and the tx_complete completion
 - txdone_hrtimer() skips the channel entirely since active_req
   is NULL, so poll-based TX-done detection never fires.

Fix this by introducing a MBOX_NO_MSG sentinel value that means
"no active request," freeing NULL to be valid message data. The
sentinel is defined in the subsystem-internal mailbox.h so that
controller drivers within drivers/mailbox/ can reference it, but
it is not exposed to clients outside the subsystem.

Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm,
Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were
audited for regression:

 - Most already work around the bug via knows_txdone=true with a
   manual mbox_client_txdone() call, making the framework's
   tracking irrelevant. These are unaffected.

 - Poll-based callers (Xilinx zynqmp/r5) are strictly better off:
   the poll timer now correctly detects NULL-active channels
   instead of silently skipping them.

 - irq-qcom-mpm.c was a pre-existing bug -- the only Qualcomm
   caller that omitted the knows_txdone + mbox_client_txdone()
   pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix
   missing mailbox TX done acknowledgment").

 - No caller sets both a tx_done callback and sends NULL, nor
   combines tx_block=true with NULL sends, so the newly reachable
   callback/completion paths are never exercised.

Also update tegra-hsp's flush callback, which directly inspects
active_req to wait for the channel to drain: the old "!= NULL"
check becomes "!= MBOX_NO_MSG", otherwise flush spins until
timeout since the sentinel is non-NULL.

The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message
by clients.

Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
---
 drivers/mailbox/mailbox.c   | 13 +++++++------
 drivers/mailbox/mailbox.h   |  3 +++
 drivers/mailbox/tegra-hsp.c |  2 +-
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 617ba505691d..2a7fc7395144 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -52,7 +52,7 @@ static void msg_submit(struct mbox_chan *chan)
 	int err = -EBUSY;
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
-		if (!chan->msg_count || chan->active_req)
+		if (!chan->msg_count || chan->active_req != MBOX_NO_MSG)
 			break;
 
 		count = chan->msg_count;
@@ -87,13 +87,13 @@ static void tx_tick(struct mbox_chan *chan, int r)
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		mssg = chan->active_req;
-		chan->active_req = NULL;
+		chan->active_req = MBOX_NO_MSG;
 	}
 
 	/* Submit next message */
 	msg_submit(chan);
 
-	if (!mssg)
+	if (mssg == MBOX_NO_MSG)
 		return;
 
 	/* Notify the client */
@@ -114,7 +114,7 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
 	for (i = 0; i < mbox->num_chans; i++) {
 		struct mbox_chan *chan = &mbox->chans[i];
 
-		if (chan->active_req && chan->cl) {
+		if (chan->active_req != MBOX_NO_MSG && chan->cl) {
 			txdone = chan->mbox->ops->last_tx_done(chan);
 			if (txdone)
 				tx_tick(chan, 0);
@@ -319,7 +319,7 @@ static int __mbox_bind_client(struct mbox_chan *chan, struct mbox_client *cl)
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		chan->msg_free = 0;
 		chan->msg_count = 0;
-		chan->active_req = NULL;
+		chan->active_req = MBOX_NO_MSG;
 		chan->cl = cl;
 		init_completion(&chan->tx_complete);
 
@@ -477,7 +477,7 @@ void mbox_free_channel(struct mbox_chan *chan)
 	/* The queued TX requests are simply aborted, no callbacks are made */
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		chan->cl = NULL;
-		chan->active_req = NULL;
+		chan->active_req = MBOX_NO_MSG;
 		if (chan->txdone_method == TXDONE_BY_ACK)
 			chan->txdone_method = TXDONE_BY_POLL;
 	}
@@ -532,6 +532,7 @@ int mbox_controller_register(struct mbox_controller *mbox)
 
 		chan->cl = NULL;
 		chan->mbox = mbox;
+		chan->active_req = MBOX_NO_MSG;
 		chan->txdone_method = txdone;
 		spin_lock_init(&chan->lock);
 	}
diff --git a/drivers/mailbox/mailbox.h b/drivers/mailbox/mailbox.h
index e1ec4efab693..c77dd6fc5b8a 100644
--- a/drivers/mailbox/mailbox.h
+++ b/drivers/mailbox/mailbox.h
@@ -5,6 +5,9 @@
 
 #include <linux/bits.h>
 
+/* Sentinel value distinguishing "no active request" from "NULL message data" */
+#define MBOX_NO_MSG	((void *)-1)
+
 #define TXDONE_BY_IRQ	BIT(0) /* controller has remote RTR irq */
 #define TXDONE_BY_POLL	BIT(1) /* controller can read status of last TX */
 #define TXDONE_BY_ACK	BIT(2) /* S/W ACK received by Client ticks the TX */
diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c
index ed9a0bb2bcd8..7991e8dba579 100644
--- a/drivers/mailbox/tegra-hsp.c
+++ b/drivers/mailbox/tegra-hsp.c
@@ -497,7 +497,7 @@ static int tegra_hsp_mailbox_flush(struct mbox_chan *chan,
 			mbox_chan_txdone(chan, 0);
 
 			/* Wait until channel is empty */
-			if (chan->active_req != NULL)
+			if (chan->active_req != MBOX_NO_MSG)
 				continue;
 
 			return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailbox: Fix NULL message support in mbox_send_message()
  2026-03-22 17:17 [PATCH] mailbox: Fix NULL message support in mbox_send_message() jassisinghbrar
@ 2026-03-23  5:13 ` Joonwon Kang
  2026-03-23 14:57   ` Jassi Brar
  0 siblings, 1 reply; 4+ messages in thread
From: Joonwon Kang @ 2026-03-23  5:13 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: andersson, dianders, joonwonkang, linux-arm-kernel, linux-kernel,
	maz, shawn.guo, stable, tglx, akpm

> The active_req field serves double duty as both the "is a TX in
> flight" flag (NULL means idle) and the storage for the in-flight
> message pointer. When a client sends NULL via mbox_send_message(),
> active_req is set to NULL, which the framework misinterprets as
> "no active request". This breaks the TX state machine by:
> 
>  - tx_tick() short-circuits on (!mssg), skipping the tx_done
>    callback and the tx_complete completion
>  - txdone_hrtimer() skips the channel entirely since active_req
>    is NULL, so poll-based TX-done detection never fires.
> 
> Fix this by introducing a MBOX_NO_MSG sentinel value that means
> "no active request," freeing NULL to be valid message data. The
> sentinel is defined in the subsystem-internal mailbox.h so that
> controller drivers within drivers/mailbox/ can reference it, but
> it is not exposed to clients outside the subsystem.

It sounds that it allows future controller drivers also to refer to the
new sentinel pointer value.

> 
> Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm,
> Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were
> audited for regression:
> 
>  - Most already work around the bug via knows_txdone=true with a
>    manual mbox_client_txdone() call, making the framework's
>    tracking irrelevant. These are unaffected.
> 
>  - Poll-based callers (Xilinx zynqmp/r5) are strictly better off:
>    the poll timer now correctly detects NULL-active channels
>    instead of silently skipping them.
> 
>  - irq-qcom-mpm.c was a pre-existing bug -- the only Qualcomm
>    caller that omitted the knows_txdone + mbox_client_txdone()
>    pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix
>    missing mailbox TX done acknowledgment").
> 
>  - No caller sets both a tx_done callback and sends NULL, nor
>    combines tx_block=true with NULL sends, so the newly reachable
>    callback/completion paths are never exercised.
> 
> Also update tegra-hsp's flush callback, which directly inspects
> active_req to wait for the channel to drain: the old "!= NULL"
> check becomes "!= MBOX_NO_MSG", otherwise flush spins until
> timeout since the sentinel is non-NULL.
> 
> The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message
> by clients.

The other, but I guess more important, tradeoff is that future controller
driver developers should now know that the pointer value of `->active->req`
could be -1(== MBOX_NO_MSG) other than conventional pointer value(memory
address, NULL, or error-encoded pointer value). Although I am still not
sure if this is better than changing the type from pointer to integer
index to make the intention clearer, could you add API doc for
`->active_req` that it may become MBOX_NO_MSG when no active request
exists? Otherwise, it is likely that the future driver developers just use
NULL to check the channel emptiness and also are not aware that this
pointer could be assigned the unconventional value -1. Just like this
mis-use case of `->active_req` was not caught in tegra-hsp.c in the first
attempt of this patch, it could be hard the same way for controller driver
developers to avoid/prevent the mis-use without proper API doc.

Thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailbox: Fix NULL message support in mbox_send_message()
  2026-03-23  5:13 ` Joonwon Kang
@ 2026-03-23 14:57   ` Jassi Brar
  2026-03-23 15:43     ` Joonwon Kang
  0 siblings, 1 reply; 4+ messages in thread
From: Jassi Brar @ 2026-03-23 14:57 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: andersson, dianders, linux-arm-kernel, linux-kernel, maz,
	shawn.guo, stable, tglx, akpm

On Mon, Mar 23, 2026 at 12:14 AM Joonwon Kang <joonwonkang@google.com> wrote:
>
> > The active_req field serves double duty as both the "is a TX in
> > flight" flag (NULL means idle) and the storage for the in-flight
> > message pointer. When a client sends NULL via mbox_send_message(),
> > active_req is set to NULL, which the framework misinterprets as
> > "no active request". This breaks the TX state machine by:
> >
> >  - tx_tick() short-circuits on (!mssg), skipping the tx_done
> >    callback and the tx_complete completion
> >  - txdone_hrtimer() skips the channel entirely since active_req
> >    is NULL, so poll-based TX-done detection never fires.
> >
> > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > "no active request," freeing NULL to be valid message data. The
> > sentinel is defined in the subsystem-internal mailbox.h so that
> > controller drivers within drivers/mailbox/ can reference it, but
> > it is not exposed to clients outside the subsystem.
>
> It sounds that it allows future controller drivers also to refer to the
> new sentinel pointer value.
>
Sentinel value is not the problem, active_req should have been hidden
from controllers. Which is actually respected by all controllers
except the tegra-hsp.c

> >
> > Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm,
> > Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were
> > audited for regression:
> >
> >  - Most already work around the bug via knows_txdone=true with a
> >    manual mbox_client_txdone() call, making the framework's
> >    tracking irrelevant. These are unaffected.
> >
> >  - Poll-based callers (Xilinx zynqmp/r5) are strictly better off:
> >    the poll timer now correctly detects NULL-active channels
> >    instead of silently skipping them.
> >
> >  - irq-qcom-mpm.c was a pre-existing bug -- the only Qualcomm
> >    caller that omitted the knows_txdone + mbox_client_txdone()
> >    pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix
> >    missing mailbox TX done acknowledgment").
> >
> >  - No caller sets both a tx_done callback and sends NULL, nor
> >    combines tx_block=true with NULL sends, so the newly reachable
> >    callback/completion paths are never exercised.
> >
> > Also update tegra-hsp's flush callback, which directly inspects
> > active_req to wait for the channel to drain: the old "!= NULL"
> > check becomes "!= MBOX_NO_MSG", otherwise flush spins until
> > timeout since the sentinel is non-NULL.
> >
> > The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message
> > by clients.
>
> The other, but I guess more important, tradeoff is that future controller
> driver developers should now know that the pointer value of `->active->req`
> could be -1(== MBOX_NO_MSG) other than conventional pointer value(memory
> address, NULL, or error-encoded pointer value).
>
That should not be a concern. Controller drivers shouldn't peek into
mailbox internals and if they do they will know the sentinel value
being used.
For example, of the ~40 drivers, only tegra-hsp.c chose to (not had
to) use active_req and it relied on the sentinel value, which will now
be MBOX_NO_MSG.

Thanks
Jassi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailbox: Fix NULL message support in mbox_send_message()
  2026-03-23 14:57   ` Jassi Brar
@ 2026-03-23 15:43     ` Joonwon Kang
  0 siblings, 0 replies; 4+ messages in thread
From: Joonwon Kang @ 2026-03-23 15:43 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: akpm, andersson, dianders, joonwonkang, linux-arm-kernel,
	linux-kernel, maz, shawn.guo, stable, tglx

> On Mon, Mar 23, 2026 at 12:14 AM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > > The active_req field serves double duty as both the "is a TX in
> > > flight" flag (NULL means idle) and the storage for the in-flight
> > > message pointer. When a client sends NULL via mbox_send_message(),
> > > active_req is set to NULL, which the framework misinterprets as
> > > "no active request". This breaks the TX state machine by:
> > >
> > >  - tx_tick() short-circuits on (!mssg), skipping the tx_done
> > >    callback and the tx_complete completion
> > >  - txdone_hrtimer() skips the channel entirely since active_req
> > >    is NULL, so poll-based TX-done detection never fires.
> > >
> > > Fix this by introducing a MBOX_NO_MSG sentinel value that means
> > > "no active request," freeing NULL to be valid message data. The
> > > sentinel is defined in the subsystem-internal mailbox.h so that
> > > controller drivers within drivers/mailbox/ can reference it, but
> > > it is not exposed to clients outside the subsystem.
> >
> > It sounds that it allows future controller drivers also to refer to the
> > new sentinel pointer value.
> >
> Sentinel value is not the problem, active_req should have been hidden
> from controllers. Which is actually respected by all controllers
> except the tegra-hsp.c
> 
> > >
> > > Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm,
> > > Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were
> > > audited for regression:
> > >
> > >  - Most already work around the bug via knows_txdone=true with a
> > >    manual mbox_client_txdone() call, making the framework's
> > >    tracking irrelevant. These are unaffected.
> > >
> > >  - Poll-based callers (Xilinx zynqmp/r5) are strictly better off:
> > >    the poll timer now correctly detects NULL-active channels
> > >    instead of silently skipping them.
> > >
> > >  - irq-qcom-mpm.c was a pre-existing bug -- the only Qualcomm
> > >    caller that omitted the knows_txdone + mbox_client_txdone()
> > >    pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix
> > >    missing mailbox TX done acknowledgment").
> > >
> > >  - No caller sets both a tx_done callback and sends NULL, nor
> > >    combines tx_block=true with NULL sends, so the newly reachable
> > >    callback/completion paths are never exercised.
> > >
> > > Also update tegra-hsp's flush callback, which directly inspects
> > > active_req to wait for the channel to drain: the old "!= NULL"
> > > check becomes "!= MBOX_NO_MSG", otherwise flush spins until
> > > timeout since the sentinel is non-NULL.
> > >
> > > The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message
> > > by clients.
> >
> > The other, but I guess more important, tradeoff is that future controller
> > driver developers should now know that the pointer value of `->active->req`
> > could be -1(== MBOX_NO_MSG) other than conventional pointer value(memory
> > address, NULL, or error-encoded pointer value).
> >
> That should not be a concern. Controller drivers shouldn't peek into
> mailbox internals

Thanks for this clarification on your intention. This resolves the afore-
mentioned concerns.

> and if they do they will know the sentinel value
> being used.
> For example, of the ~40 drivers, only tegra-hsp.c chose to (not had
> to) use active_req and it relied on the sentinel value, which will now
> be MBOX_NO_MSG.
> 
> Thanks
> Jassi

Thanks,
Joonwon Kang

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-03-23 15:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-22 17:17 [PATCH] mailbox: Fix NULL message support in mbox_send_message() jassisinghbrar
2026-03-23  5:13 ` Joonwon Kang
2026-03-23 14:57   ` Jassi Brar
2026-03-23 15:43     ` Joonwon Kang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox