public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] mailbox: Fix wrong completion order and improper send result in the blocking mode send API
@ 2026-04-02 17:06 Joonwon Kang
  2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang
  2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang
  0 siblings, 2 replies; 12+ messages in thread
From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw)
  To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno,
	thierry.reding, jonathanh
  Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra,
	Joonwon Kang

Hi team,

This patch series fixes two major issues in the blocking mode send API.

1) Wrong completion order in the send API as described in [1]:

        Thread#1(T1)               Thread#2(T2)
     mbox_send_message           mbox_send_message
            |                           |
            V                           |
        add_to_rbuf(M1)                 V
            |                     add_to_rbuf(M2)
            |                           |
            |                           V
            V                      msg_submit(picks M1)
        msg_submit                      |
            |                           V
            V                   wait_for_completion(on M2)
     wait_for_completion(on M1)         |  (1st in waitQ)
            |   (2nd in waitQ)          V
            V                   wake_up(on completion of M1)<--incorrect

2) Send API does not return the actual send result.

This patch series contains two patches, one for each issue:
  0001-mailbox-Use-per-thread-completion-to-fix-wrong-co.patch
  0002-mailbox-Make-mbox_send_message-return-error-code-.patch

The first issue has to do with multi-thread support. Given the
discussion in [1] with the mailbox framework maintainer, it has long
been understood that the mailbox framework is designed to support
multiple threads, although the completion order issue was missed at its
introduction. The first patch of this series fixes it.

Alternatively, we could instead declare that the mailbox API does not
support multiple threads [2]. However, that would be an abrupt, major
change for mailbox users after the long-standing implication that
multiple threads are supported. It would also create a disparity with
the non-blocking mode, which already supports multiple threads, and
could confuse users ("non-blocking mode supports multiple threads
whereas blocking mode doesn't"). For this reason, the first patch in
this series does not take this alternative.

This patch series does not address the case where tx_tick() is called
twice or more for a sent message on the same channel. In theory, that
could happen when a timeout occurs: for example, one tx_tick() by the
mailbox core due to the timeout and another tx_tick() by the mailbox
controller or client, by accident or for any other reason. If it
happens, the internal mailbox state could become inconsistent even with
a single thread. Thus, this issue should be handled in an orthogonal
effort later on.

The second issue is that users are forced to register a tx done
callback to get the actual send result even though they are using the
blocking mode send API. This behavior differs from typical blocking
send APIs, which return the actual send result directly, and is
therefore confusing to users. Without knowing this additional
requirement of the API, one is prone to missing the send result check
entirely. The second patch fixes this by making the blocking mode send
API return the actual send result.

Change log of the first patch:
 - v3: Rebase on the latest for-next branch.
 - v2: Consider the case where a timeout occurs, so that tx_tick() is
   called for a channel by one thread while another thread has an
   active request on the same channel. In that case, we mark the
   inactive request as canceled and do not send it to the controller.
 - v1: The previous solution in v0 tried to use per-message completions:
   `tx_cmpl[MBOX_TX_QUEUE_LEN]`, where each completion belonged to one
   slot of the message queue: `msg_data[i]`. Those completions take up
   additional memory even when they are not used. Instead, this patch
   uses per-"thread" completions: each completion belongs to a sender
   thread, and each slot of the message queue holds a pointer to that
   completion. `struct mbox_message` has the pointer field
   `struct completion *tx_complete`, which points to a completion
   created on the sender's stack, instead of owning the completion as
   `struct completion tx_complete`. This avoids additional memory use,
   since a completion is allocated only when necessary. More
   importantly, it closes the window in which the same completion could
   be reused by different sender threads, which the previous solution
   still had.
 - v0: This first attempt tried to use per-message completions: [1].

Change log of the second patch:
 - No major change from v1.

References:
 - [1]: https://lore.kernel.org/all/1490809381-28869-1-git-send-email-jaswinder.singh@linaro.org
 - [2]: https://lore.kernel.org/all/CABb+yY39rhTZbtA21MecYk-R9fh7VQQr5kZUgCw4z92mWhZ1Rg@mail.gmail.com/


Joonwon Kang (2):
  mailbox: Use per-thread completion to fix wrong completion order
  mailbox: Make mbox_send_message() return error code when tx fails

 drivers/mailbox/mailbox.c          | 98 ++++++++++++++++++++----------
 drivers/mailbox/mtk-vcp-mailbox.c  |  2 +-
 drivers/mailbox/tegra-hsp.c        |  2 +-
 include/linux/mailbox_controller.h | 22 +++++--
 4 files changed, 85 insertions(+), 39 deletions(-)


Thanks,
Joonwon Kang
-- 
2.53.0.1185.g05d4b7b318-goog



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-02 17:06 [PATCH v3 0/2] mailbox: Fix wrong completion order and improper send result in the blocking mode send API Joonwon Kang
@ 2026-04-02 17:06 ` Joonwon Kang
  2026-04-02 17:59   ` Jassi Brar
  2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang
  1 sibling, 1 reply; 12+ messages in thread
From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw)
  To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno,
	thierry.reding, jonathanh
  Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra,
	Joonwon Kang, stable

Previously, a sender thread in mbox_send_message() could be woken up at
the wrong time in blocking mode. This is because there was only a single
completion per channel, whereas messages from multiple threads could be
sent in any order; since the shared completion could be signalled in any
order, it could wake up the wrong sender thread.

This commit resolves the false wake-up issue with the following changes:
- As many completions are created as there are concurrent sender
  threads
- Each completion is created on its sender thread's stack
- Each slot of the message queue, i.e. `msg_data`, holds a pointer to
  its target completion
- tx_tick() signals the completion of the currently active slot of the
  message queue

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/1490809381-28869-1-git-send-email-jaswinder.singh@linaro.org
Signed-off-by: Joonwon Kang <joonwonkang@google.com>
---
 drivers/mailbox/mailbox.c          | 86 +++++++++++++++++++-----------
 drivers/mailbox/mtk-vcp-mailbox.c  |  2 +-
 drivers/mailbox/tegra-hsp.c        |  2 +-
 include/linux/mailbox_controller.h | 20 ++++---
 4 files changed, 72 insertions(+), 38 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 138ffbcd4fde..d63386468982 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -21,7 +21,7 @@
 static LIST_HEAD(mbox_cons);
 static DEFINE_MUTEX(con_mutex);
 
-static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
+static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
 {
 	int idx;
 
@@ -32,7 +32,8 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
 		return -ENOBUFS;
 
 	idx = chan->msg_free;
-	chan->msg_data[idx] = mssg;
+	chan->msg_data[idx].data = mssg;
+	chan->msg_data[idx].tx_complete = tx_complete;
 	chan->msg_count++;
 
 	if (idx == MBOX_TX_QUEUE_LEN - 1)
@@ -50,24 +51,33 @@ static void msg_submit(struct mbox_chan *chan)
 	int err = -EBUSY;
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
-		if (!chan->msg_count || chan->active_req != MBOX_NO_MSG)
+		if (chan->active_req >= 0)
 			break;
 
-		count = chan->msg_count;
-		idx = chan->msg_free;
-		if (idx >= count)
-			idx -= count;
-		else
-			idx += MBOX_TX_QUEUE_LEN - count;
+		while (chan->msg_count > 0) {
+			count = chan->msg_count;
+			idx = chan->msg_free;
+			if (idx >= count)
+				idx -= count;
+			else
+				idx += MBOX_TX_QUEUE_LEN - count;
 
-		data = chan->msg_data[idx];
+			data = chan->msg_data[idx].data;
+			if (data != MBOX_NO_MSG)
+				break;
+
+			chan->msg_count--;
+		}
+
+		if (!chan->msg_count)
+			break;
 
 		if (chan->cl->tx_prepare)
 			chan->cl->tx_prepare(chan->cl, data);
 		/* Try to submit a message to the MBOX controller */
 		err = chan->mbox->ops->send_data(chan, data);
 		if (!err) {
-			chan->active_req = data;
+			chan->active_req = idx;
 			chan->msg_count--;
 		}
 	}
@@ -79,27 +89,35 @@ static void msg_submit(struct mbox_chan *chan)
 	}
 }
 
-static void tx_tick(struct mbox_chan *chan, int r)
+static void tx_tick(struct mbox_chan *chan, int r, int idx)
 {
-	void *mssg;
+	struct mbox_message mssg = {MBOX_NO_MSG, NULL};
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
-		mssg = chan->active_req;
-		chan->active_req = MBOX_NO_MSG;
+		if (idx >= 0 && idx != chan->active_req) {
+			chan->msg_data[idx].data = MBOX_NO_MSG;
+			chan->msg_data[idx].tx_complete = NULL;
+			return;
+		}
+
+		if (chan->active_req >= 0) {
+			mssg = chan->msg_data[chan->active_req];
+			chan->active_req = -1;
+		}
 	}
 
 	/* Submit next message */
 	msg_submit(chan);
 
-	if (mssg == MBOX_NO_MSG)
+	if (mssg.data == MBOX_NO_MSG)
 		return;
 
 	/* Notify the client */
 	if (chan->cl->tx_done)
-		chan->cl->tx_done(chan->cl, mssg, r);
+		chan->cl->tx_done(chan->cl, mssg.data, r);
 
 	if (r != -ETIME && chan->cl->tx_block)
-		complete(&chan->tx_complete);
+		complete(mssg.tx_complete);
 }
 
 static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
@@ -112,10 +130,10 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
 	for (i = 0; i < mbox->num_chans; i++) {
 		struct mbox_chan *chan = &mbox->chans[i];
 
-		if (chan->active_req != MBOX_NO_MSG && chan->cl) {
+		if (chan->active_req >= 0 && chan->cl) {
 			txdone = chan->mbox->ops->last_tx_done(chan);
 			if (txdone)
-				tx_tick(chan, 0);
+				tx_tick(chan, 0, -1);
 			else
 				resched = true;
 		}
@@ -168,7 +186,7 @@ void mbox_chan_txdone(struct mbox_chan *chan, int r)
 		return;
 	}
 
-	tx_tick(chan, r);
+	tx_tick(chan, r, -1);
 }
 EXPORT_SYMBOL_GPL(mbox_chan_txdone);
 
@@ -188,7 +206,7 @@ void mbox_client_txdone(struct mbox_chan *chan, int r)
 		return;
 	}
 
-	tx_tick(chan, r);
+	tx_tick(chan, r, -1);
 }
 EXPORT_SYMBOL_GPL(mbox_client_txdone);
 
@@ -266,11 +284,19 @@ EXPORT_SYMBOL_GPL(mbox_chan_tx_slots_available);
 int mbox_send_message(struct mbox_chan *chan, void *mssg)
 {
 	int t;
+	int idx;
+	struct completion tx_complete;
 
 	if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
 		return -EINVAL;
 
-	t = add_to_rbuf(chan, mssg);
+	if (chan->cl->tx_block) {
+		init_completion(&tx_complete);
+		t = add_to_rbuf(chan, mssg, &tx_complete);
+	} else {
+		t = add_to_rbuf(chan, mssg, NULL);
+	}
+
 	if (t < 0) {
 		dev_err(chan->mbox->dev, "Try increasing MBOX_TX_QUEUE_LEN\n");
 		return t;
@@ -287,10 +313,11 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 		else
 			wait = msecs_to_jiffies(chan->cl->tx_tout);
 
-		ret = wait_for_completion_timeout(&chan->tx_complete, wait);
+		ret = wait_for_completion_timeout(&tx_complete, wait);
 		if (ret == 0) {
+			idx = t;
 			t = -ETIME;
-			tx_tick(chan, t);
+			tx_tick(chan, t, idx);
 		}
 	}
 
@@ -321,7 +348,7 @@ int mbox_flush(struct mbox_chan *chan, unsigned long timeout)
 
 	ret = chan->mbox->ops->flush(chan, timeout);
 	if (ret < 0)
-		tx_tick(chan, ret);
+		tx_tick(chan, ret, -1);
 
 	return ret;
 }
@@ -340,9 +367,8 @@ static int __mbox_bind_client(struct mbox_chan *chan, struct mbox_client *cl)
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		chan->msg_free = 0;
 		chan->msg_count = 0;
-		chan->active_req = MBOX_NO_MSG;
+		chan->active_req = -1;
 		chan->cl = cl;
-		init_completion(&chan->tx_complete);
 
 		if (chan->txdone_method	== TXDONE_BY_POLL && cl->knows_txdone)
 			chan->txdone_method = TXDONE_BY_ACK;
@@ -498,7 +524,7 @@ void mbox_free_channel(struct mbox_chan *chan)
 	/* The queued TX requests are simply aborted, no callbacks are made */
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		chan->cl = NULL;
-		chan->active_req = MBOX_NO_MSG;
+		chan->active_req = -1;
 		if (chan->txdone_method == TXDONE_BY_ACK)
 			chan->txdone_method = TXDONE_BY_POLL;
 	}
@@ -553,7 +579,7 @@ int mbox_controller_register(struct mbox_controller *mbox)
 
 		chan->cl = NULL;
 		chan->mbox = mbox;
-		chan->active_req = MBOX_NO_MSG;
+		chan->active_req = -1;
 		chan->txdone_method = txdone;
 		spin_lock_init(&chan->lock);
 	}
diff --git a/drivers/mailbox/mtk-vcp-mailbox.c b/drivers/mailbox/mtk-vcp-mailbox.c
index 1b291b8ea15a..a7bab06ac686 100644
--- a/drivers/mailbox/mtk-vcp-mailbox.c
+++ b/drivers/mailbox/mtk-vcp-mailbox.c
@@ -84,7 +84,7 @@ static int mtk_vcp_mbox_send_data(struct mbox_chan *chan, void *data)
 
 static bool mtk_vcp_mbox_last_tx_done(struct mbox_chan *chan)
 {
-	struct mtk_ipi_info *ipi_info = chan->active_req;
+	struct mtk_ipi_info *ipi_info = chan->msg_data[chan->active_req].data;
 	struct mtk_vcp_mbox *priv = chan->con_priv;
 
 	return !(readl(priv->base + priv->cfg->set_in) & BIT(ipi_info->index));
diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c
index 7b1e1b83ea29..efe0033cb5c5 100644
--- a/drivers/mailbox/tegra-hsp.c
+++ b/drivers/mailbox/tegra-hsp.c
@@ -495,7 +495,7 @@ static int tegra_hsp_mailbox_flush(struct mbox_chan *chan,
 			mbox_chan_txdone(chan, 0);
 
 			/* Wait until channel is empty */
-			if (chan->active_req != MBOX_NO_MSG)
+			if (chan->active_req >= 0)
 				continue;
 
 			return 0;
diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
index e3896b08f22e..912499ad08ed 100644
--- a/include/linux/mailbox_controller.h
+++ b/include/linux/mailbox_controller.h
@@ -113,16 +113,25 @@ struct mbox_controller {
  */
 #define MBOX_TX_QUEUE_LEN	20
 
+/**
+ * struct mbox_message - Internal representation of a mailbox message
+ * @data:		Data packet
+ * @tx_complete:	Pointer to the transmission completion
+ */
+struct mbox_message {
+	void *data;
+	struct completion *tx_complete;
+};
+
 /**
  * struct mbox_chan - s/w representation of a communication chan
  * @mbox:		Pointer to the parent/provider of this channel
  * @txdone_method:	Way to detect TXDone chosen by the API
  * @cl:			Pointer to the current owner of this channel
- * @tx_complete:	Transmission completion
- * @active_req:		Currently active request hook
+ * @active_req:		Index of the currently active slot in the queue
  * @msg_count:		No. of mssg currently queued
  * @msg_free:		Index of next available mssg slot
- * @msg_data:		Hook for data packet
+ * @msg_data:		Queue of data packets
  * @lock:		Serialise access to the channel
  * @con_priv:		Hook for controller driver to attach private data
  */
@@ -130,10 +139,9 @@ struct mbox_chan {
 	struct mbox_controller *mbox;
 	unsigned txdone_method;
 	struct mbox_client *cl;
-	struct completion tx_complete;
-	void *active_req;
+	int active_req;
 	unsigned msg_count, msg_free;
-	void *msg_data[MBOX_TX_QUEUE_LEN];
+	struct mbox_message msg_data[MBOX_TX_QUEUE_LEN];
 	spinlock_t lock; /* Serialise access to the channel */
 	void *con_priv;
 };
-- 
2.53.0.1185.g05d4b7b318-goog



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-02 17:06 [PATCH v3 0/2] mailbox: Fix wrong completion order and improper send result in the blocking mode send API Joonwon Kang
  2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang
@ 2026-04-02 17:06 ` Joonwon Kang
  2026-04-02 18:03   ` Jassi Brar
  1 sibling, 1 reply; 12+ messages in thread
From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw)
  To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno,
	thierry.reding, jonathanh
  Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra,
	Joonwon Kang, stable

When the mailbox controller failed to transmit a message, the error
code was only passed to the client's tx done handler and not to
mbox_send_message(). For this reason, the function could return a false
success. This commit resolves the issue by introducing a tx status and
checking it before mbox_send_message() returns.

Cc: stable@vger.kernel.org
Signed-off-by: Joonwon Kang <joonwonkang@google.com>
---
 drivers/mailbox/mailbox.c          | 20 +++++++++++++++-----
 include/linux/mailbox_controller.h |  2 ++
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index d63386468982..ea9aec9dc947 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -21,7 +21,10 @@
 static LIST_HEAD(mbox_cons);
 static DEFINE_MUTEX(con_mutex);
 
-static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
+static int add_to_rbuf(struct mbox_chan *chan,
+		       void *mssg,
+		       struct completion *tx_complete,
+		       int *tx_status)
 {
 	int idx;
 
@@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx
 	idx = chan->msg_free;
 	chan->msg_data[idx].data = mssg;
 	chan->msg_data[idx].tx_complete = tx_complete;
+	chan->msg_data[idx].tx_status = tx_status;
 	chan->msg_count++;
 
 	if (idx == MBOX_TX_QUEUE_LEN - 1)
@@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan)
 
 static void tx_tick(struct mbox_chan *chan, int r, int idx)
 {
-	struct mbox_message mssg = {MBOX_NO_MSG, NULL};
+	struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL};
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		if (idx >= 0 && idx != chan->active_req) {
 			chan->msg_data[idx].data = MBOX_NO_MSG;
 			chan->msg_data[idx].tx_complete = NULL;
+			chan->msg_data[idx].tx_status = NULL;
 			return;
 		}
 
@@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx)
 	if (chan->cl->tx_done)
 		chan->cl->tx_done(chan->cl, mssg.data, r);
 
-	if (r != -ETIME && chan->cl->tx_block)
+	if (r != -ETIME && chan->cl->tx_block) {
+		*mssg.tx_status = r;
 		complete(mssg.tx_complete);
+	}
 }
 
 static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
@@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 	int t;
 	int idx;
 	struct completion tx_complete;
+	int tx_status = 0;
 
 	if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
 		return -EINVAL;
 
 	if (chan->cl->tx_block) {
 		init_completion(&tx_complete);
-		t = add_to_rbuf(chan, mssg, &tx_complete);
+		t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status);
 	} else {
-		t = add_to_rbuf(chan, mssg, NULL);
+		t = add_to_rbuf(chan, mssg, NULL, NULL);
 	}
 
 	if (t < 0) {
@@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 			idx = t;
 			t = -ETIME;
 			tx_tick(chan, t, idx);
+		} else if (tx_status < 0) {
+			t = tx_status;
 		}
 	}
 
diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
index 912499ad08ed..890da97bcb50 100644
--- a/include/linux/mailbox_controller.h
+++ b/include/linux/mailbox_controller.h
@@ -117,10 +117,12 @@ struct mbox_controller {
  * struct mbox_message - Internal representation of a mailbox message
  * @data:		Data packet
  * @tx_complete:	Pointer to the transmission completion
+ * @tx_status:		Pointer to the transmission status
  */
 struct mbox_message {
 	void *data;
 	struct completion *tx_complete;
+	int *tx_status;
 };
 
 /**
-- 
2.53.0.1185.g05d4b7b318-goog



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang
@ 2026-04-02 17:59   ` Jassi Brar
  2026-04-03 14:51     ` Joonwon Kang
  0 siblings, 1 reply; 12+ messages in thread
From: Jassi Brar @ 2026-04-02 17:59 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: matthias.bgg, angelogioacchino.delregno, thierry.reding,
	jonathanh, linux-kernel, linux-arm-kernel, linux-mediatek,
	linux-tegra, stable

On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
>
> Previously, a sender thread in mbox_send_message() could be woken up at
> the wrong time in blocking mode. This is because there was only a single
> completion per channel, whereas messages from multiple threads could be
> sent in any order; since the shared completion could be signalled in any
> order, it could wake up the wrong sender thread.
>
> This commit resolves the false wake-up issue with the following changes:
> - As many completions are created as there are concurrent sender
>   threads
> - Each completion is created on its sender thread's stack
> - Each slot of the message queue, i.e. `msg_data`, holds a pointer to
>   its target completion
> - tx_tick() signals the completion of the currently active slot of the
>   message queue
>
I think I reviewed this already, or is this happening with
one-channel-one-client usage? Because the mailbox API does not support
channels shared among multiple clients.

Thanks
Jassi


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang
@ 2026-04-02 18:03   ` Jassi Brar
  2026-04-03 15:19     ` Joonwon Kang
  0 siblings, 1 reply; 12+ messages in thread
From: Jassi Brar @ 2026-04-02 18:03 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: matthias.bgg, angelogioacchino.delregno, thierry.reding,
	jonathanh, linux-kernel, linux-arm-kernel, linux-mediatek,
	linux-tegra, stable

On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
>
> When the mailbox controller failed to transmit a message, the error
> code was only passed to the client's tx done handler and not to
> mbox_send_message(). For this reason, the function could return a false
> success. This commit resolves the issue by introducing a tx status and
> checking it before mbox_send_message() returns.
>
Can you please share the scenario where this becomes necessary? This
can potentially change the ground underneath some clients, so we have
to be sure it is really useful.

Thanks
Jassi


> Cc: stable@vger.kernel.org
> Signed-off-by: Joonwon Kang <joonwonkang@google.com>
> ---
>  drivers/mailbox/mailbox.c          | 20 +++++++++++++++-----
>  include/linux/mailbox_controller.h |  2 ++
>  2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index d63386468982..ea9aec9dc947 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -21,7 +21,10 @@
>  static LIST_HEAD(mbox_cons);
>  static DEFINE_MUTEX(con_mutex);
>
> -static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
> +static int add_to_rbuf(struct mbox_chan *chan,
> +                      void *mssg,
> +                      struct completion *tx_complete,
> +                      int *tx_status)
>  {
>         int idx;
>
> @@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx
>         idx = chan->msg_free;
>         chan->msg_data[idx].data = mssg;
>         chan->msg_data[idx].tx_complete = tx_complete;
> +       chan->msg_data[idx].tx_status = tx_status;
>         chan->msg_count++;
>
>         if (idx == MBOX_TX_QUEUE_LEN - 1)
> @@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan)
>
>  static void tx_tick(struct mbox_chan *chan, int r, int idx)
>  {
> -       struct mbox_message mssg = {MBOX_NO_MSG, NULL};
> +       struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL};
>
>         scoped_guard(spinlock_irqsave, &chan->lock) {
>                 if (idx >= 0 && idx != chan->active_req) {
>                         chan->msg_data[idx].data = MBOX_NO_MSG;
>                         chan->msg_data[idx].tx_complete = NULL;
> +                       chan->msg_data[idx].tx_status = NULL;
>                         return;
>                 }
>
> @@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx)
>         if (chan->cl->tx_done)
>                 chan->cl->tx_done(chan->cl, mssg.data, r);
>
> -       if (r != -ETIME && chan->cl->tx_block)
> +       if (r != -ETIME && chan->cl->tx_block) {
> +               *mssg.tx_status = r;
>                 complete(mssg.tx_complete);
> +       }
>  }
>
>  static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
> @@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
>         int t;
>         int idx;
>         struct completion tx_complete;
> +       int tx_status = 0;
>
>         if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
>                 return -EINVAL;
>
>         if (chan->cl->tx_block) {
>                 init_completion(&tx_complete);
> -               t = add_to_rbuf(chan, mssg, &tx_complete);
> +               t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status);
>         } else {
> -               t = add_to_rbuf(chan, mssg, NULL);
> +               t = add_to_rbuf(chan, mssg, NULL, NULL);
>         }
>
>         if (t < 0) {
> @@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
>                         idx = t;
>                         t = -ETIME;
>                         tx_tick(chan, t, idx);
> +               } else if (tx_status < 0) {
> +                       t = tx_status;
>                 }
>         }
>
> diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
> index 912499ad08ed..890da97bcb50 100644
> --- a/include/linux/mailbox_controller.h
> +++ b/include/linux/mailbox_controller.h
> @@ -117,10 +117,12 @@ struct mbox_controller {
>   * struct mbox_message - Internal representation of a mailbox message
>   * @data:              Data packet
>   * @tx_complete:       Pointer to the transmission completion
> + * @tx_status:         Pointer to the transmission status
>   */
>  struct mbox_message {
>         void *data;
>         struct completion *tx_complete;
> +       int *tx_status;
>  };
>
>  /**
> --
> 2.53.0.1185.g05d4b7b318-goog
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-02 17:59   ` Jassi Brar
@ 2026-04-03 14:51     ` Joonwon Kang
  2026-04-03 16:19       ` Jassi Brar
  0 siblings, 1 reply; 12+ messages in thread
From: Joonwon Kang @ 2026-04-03 14:51 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: angelogioacchino.delregno, jonathanh, joonwonkang,
	linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra,
	matthias.bgg, stable, thierry.reding

> On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > Previously, a sender thread in mbox_send_message() could be woken up at
> > the wrong time in blocking mode. This is because there was only a single
> > completion per channel, whereas messages from multiple threads could be
> > sent in any order; since the shared completion could be signalled in any
> > order, it could wake up the wrong sender thread.
> >
> > This commit resolves the false wake-up issue with the following changes:
> > - As many completions are created as there are concurrent sender
> >   threads
> > - Each completion is created on its sender thread's stack
> > - Each slot of the message queue, i.e. `msg_data`, holds a pointer to
> >   its target completion
> > - tx_tick() signals the completion of the currently active slot of the
> >   message queue
> >
> I think I reviewed it already or is this happening on
> one-channel-one-client usage? Because mailbox api does not support
> channels shared among multiple clients.

Yes, this patch handles the one-channel-one-client usage, but where
that single channel is shared between multiple threads. From my
understanding, the discussion back then ended with how to circumvent
the issue, rather than settling whether it should eventually be solved
in the mailbox framework, and if so, how, or if not, why. I think it
should still be resolved in the framework, for the reasons in the cover
letter. Could you give it a second review with regard to those aspects?

Thanks,
Joonwon Kang


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-02 18:03   ` Jassi Brar
@ 2026-04-03 15:19     ` Joonwon Kang
  2026-04-03 16:36       ` Jassi Brar
  0 siblings, 1 reply; 12+ messages in thread
From: Joonwon Kang @ 2026-04-03 15:19 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: angelogioacchino.delregno, jonathanh, joonwonkang,
	linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra,
	matthias.bgg, stable, thierry.reding, akpm

> On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > When the mailbox controller failed to transmit a message, the error
> > code was only passed to the client's tx done handler and not to
> > mbox_send_message(). For this reason, the function could return a false
> > success. This commit resolves the issue by introducing a tx status and
> > checking it before mbox_send_message() returns.
> >
> Can you please share the scenario when this becomes necessary? This
> can potentially change the ground underneath some clients, so we have
> to be sure this is really useful.

I would say the problem here is generic enough to apply to all cases
where the send result needs to be checked. Since the return value of
the send API is not the real send result, any user who believes this
blocking send API returns the real send result could be caught out; for
example, users may think a send was successful even though it actually
was not. I believe it is uncommon for users to have to register a
callback solely to get the send result when they are already using the
blocking send API. Also, I see no special reason why only the mailbox
send API should work this way among typical blocking send APIs. For
these reasons, this patch makes the send API return the real send
result. This way, users will not need to register a redundant callback,
and I think the return value will match their common expectation.

Regarding the change in the ground for some clients, could you clarify
what change you expect would surprise the clients?

Thanks,
Joonwon Kang

> 
> Thanks
> Jassi
> 
> 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joonwon Kang <joonwonkang@google.com>
> > ---
> >  drivers/mailbox/mailbox.c          | 20 +++++++++++++++-----
> >  include/linux/mailbox_controller.h |  2 ++
> >  2 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> > index d63386468982..ea9aec9dc947 100644
> > --- a/drivers/mailbox/mailbox.c
> > +++ b/drivers/mailbox/mailbox.c
> > @@ -21,7 +21,10 @@
> >  static LIST_HEAD(mbox_cons);
> >  static DEFINE_MUTEX(con_mutex);
> >
> > -static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
> > +static int add_to_rbuf(struct mbox_chan *chan,
> > +                      void *mssg,
> > +                      struct completion *tx_complete,
> > +                      int *tx_status)
> >  {
> >         int idx;
> >
> > @@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx
> >         idx = chan->msg_free;
> >         chan->msg_data[idx].data = mssg;
> >         chan->msg_data[idx].tx_complete = tx_complete;
> > +       chan->msg_data[idx].tx_status = tx_status;
> >         chan->msg_count++;
> >
> >         if (idx == MBOX_TX_QUEUE_LEN - 1)
> > @@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan)
> >
> >  static void tx_tick(struct mbox_chan *chan, int r, int idx)
> >  {
> > -       struct mbox_message mssg = {MBOX_NO_MSG, NULL};
> > +       struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL};
> >
> >         scoped_guard(spinlock_irqsave, &chan->lock) {
> >                 if (idx >= 0 && idx != chan->active_req) {
> >                         chan->msg_data[idx].data = MBOX_NO_MSG;
> >                         chan->msg_data[idx].tx_complete = NULL;
> > +                       chan->msg_data[idx].tx_status = NULL;
> >                         return;
> >                 }
> >
> > @@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx)
> >         if (chan->cl->tx_done)
> >                 chan->cl->tx_done(chan->cl, mssg.data, r);
> >
> > -       if (r != -ETIME && chan->cl->tx_block)
> > +       if (r != -ETIME && chan->cl->tx_block) {
> > +               *mssg.tx_status = r;
> >                 complete(mssg.tx_complete);
> > +       }
> >  }
> >
> >  static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
> > @@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
> >         int t;
> >         int idx;
> >         struct completion tx_complete;
> > +       int tx_status = 0;
> >
> >         if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
> >                 return -EINVAL;
> >
> >         if (chan->cl->tx_block) {
> >                 init_completion(&tx_complete);
> > -               t = add_to_rbuf(chan, mssg, &tx_complete);
> > +               t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status);
> >         } else {
> > -               t = add_to_rbuf(chan, mssg, NULL);
> > +               t = add_to_rbuf(chan, mssg, NULL, NULL);
> >         }
> >
> >         if (t < 0) {
> > @@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
> >                         idx = t;
> >                         t = -ETIME;
> >                         tx_tick(chan, t, idx);
> > +               } else if (tx_status < 0) {
> > +                       t = tx_status;
> >                 }
> >         }
> >
> > diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
> > index 912499ad08ed..890da97bcb50 100644
> > --- a/include/linux/mailbox_controller.h
> > +++ b/include/linux/mailbox_controller.h
> > @@ -117,10 +117,12 @@ struct mbox_controller {
> >   * struct mbox_message - Internal representation of a mailbox message
> >   * @data:              Data packet
> >   * @tx_complete:       Pointer to the transmission completion
> > + * @tx_status:         Pointer to the transmission status
> >   */
> >  struct mbox_message {
> >         void *data;
> >         struct completion *tx_complete;
> > +       int *tx_status;
> >  };
> >
> >  /**
> > --
> > 2.53.0.1185.g05d4b7b318-goog
> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-03 14:51     ` Joonwon Kang
@ 2026-04-03 16:19       ` Jassi Brar
  2026-04-04 12:44         ` Joonwon Kang
  0 siblings, 1 reply; 12+ messages in thread
From: Jassi Brar @ 2026-04-03 16:19 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: angelogioacchino.delregno, jonathanh, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable,
	thierry.reding

On Fri, Apr 3, 2026 at 9:51 AM Joonwon Kang <joonwonkang@google.com> wrote:
>
> > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> > >
> > > Previously, a sender thread in mbox_send_message() could be woken up at
> > > a wrong time in blocking mode. It is because there was only a single
> > > completion for a channel whereas messages from multiple threads could be
> > > sent in any order; since the shared completion could be signalled in any
> > > order, it could wake up a wrong sender thread.
> > >
> > > This commit resolves the false wake-up issue with the following changes:
> > > - Completions are created just as many as the number of concurrent sender
> > >   threads
> > > - A completion is created on a sender thread's stack
> > > - Each slot of the message queue, i.e. `msg_data`, contains a pointer to
> > >   its target completion
> > > - tx_tick() signals the completion of the currently active slot of the
> > >   message queue
> > >
> > I think I reviewed it already or is this happening on
> > one-channel-one-client usage? Because mailbox api does not support
> > channels shared among multiple clients.
>
> Yes, this patch is handling the one-channel-one-client usage but when that
> single channel is shared between multiple threads.

hmm.... how is this not single-channel-multiple-clients ?
A channel is returned as an opaque token to the clients, if that
client shares that with other threads - they will race.
It is the job of the original client to serialize its threads' access
to the channel.

> From my understanding, the
> discussion back then ended with how to circumvent the issue rather than whether
> we will eventually solve this in the mailbox framework or not, and if yes, how
> we will, and if not, why.

It will be interesting to see how many current clients actually need
to share channels. If there are enough, it makes sense to implement
some helper api
on top of existing code, instead of changing its nature totally.

Thanks
Jassi


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-03 15:19     ` Joonwon Kang
@ 2026-04-03 16:36       ` Jassi Brar
  2026-04-04 11:47         ` Joonwon Kang
  0 siblings, 1 reply; 12+ messages in thread
From: Jassi Brar @ 2026-04-03 16:36 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: angelogioacchino.delregno, jonathanh, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable,
	thierry.reding, akpm

On Fri, Apr 3, 2026 at 10:19 AM Joonwon Kang <joonwonkang@google.com> wrote:
>
> > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> > >
> > > When the mailbox controller failed transmitting message, the error code
> > > was only passed to the client's tx done handler and not to
> > > mbox_send_message(). For this reason, the function could return a false
> > > success. This commit resolves the issue by introducing the tx status and
> > > checking it before mbox_send_message() returns.
> > >
> > Can you please share the scenario when this becomes necessary? This
> > can potentially change the ground underneath some clients, so we have
> > to be sure this is really useful.
>
> I would say the problem here is generic enough to apply to every case where
> the send result needs to be checked. Since the return value of the send API is
> not the real send result, any user who believes this blocking send API
> returns the real send result can be misled. For example, users may think a
> send was successful even though it actually was not. I believe it is unusual
> for users to have to register a callback solely to get the send result when
> they are already using the blocking send API. Also, I see no special reason
> why only the mailbox send API should work this way among other typical
> blocking send APIs. For these reasons, this patch makes the send API return
> the real send result. That way, users will not need to register the redundant
> callback, and I think the return value will align with their common
> expectation.
>
Clients submit a message into the Mailbox subsystem to be sent out to
the remote side which can happen immediately or later.
If submission fails, clients get immediately notified. If transmission
fails (which is now internal to the subsystem) it is reported to the
client by a callback.
If the API were named mbox_submit_message (which is what it actually is)
instead of mbox_send_message, there would be no confusion.
We can argue how good/bad the current implementation is, but the fact
is that it is here. And I am reluctant to cause churn without good
reason.
Again, as I said, any _legal_ setup scenario will help me overcome my
reluctance.

Thanks
Jassi


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-03 16:36       ` Jassi Brar
@ 2026-04-04 11:47         ` Joonwon Kang
  0 siblings, 0 replies; 12+ messages in thread
From: Joonwon Kang @ 2026-04-04 11:47 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: akpm, angelogioacchino.delregno, jonathanh, joonwonkang,
	linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra,
	matthias.bgg, stable, thierry.reding, lee

> On Fri, Apr 3, 2026 at 10:19 AM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> > > >
> > > > When the mailbox controller failed transmitting message, the error code
> > > > was only passed to the client's tx done handler and not to
> > > > mbox_send_message(). For this reason, the function could return a false
> > > > success. This commit resolves the issue by introducing the tx status and
> > > > checking it before mbox_send_message() returns.
> > > >
> > > Can you please share the scenario when this becomes necessary? This
> > > can potentially change the ground underneath some clients, so we have
> > > to be sure this is really useful.
> >
> > I would say the problem here is generic enough to apply to every case where
> > the send result needs to be checked. Since the return value of the send API is
> > not the real send result, any user who believes this blocking send API
> > returns the real send result can be misled. For example, users may think a
> > send was successful even though it actually was not. I believe it is unusual
> > for users to have to register a callback solely to get the send result when
> > they are already using the blocking send API. Also, I see no special reason
> > why only the mailbox send API should work this way among other typical
> > blocking send APIs. For these reasons, this patch makes the send API return
> > the real send result. That way, users will not need to register the redundant
> > callback, and I think the return value will align with their common
> > expectation.
> >
> Clients submit a message into the Mailbox subsystem to be sent out to
> the remote side which can happen immediately or later.
> If submission fails, clients get immediately notified. If transmission
> fails (which is now internal to the subsystem) it is reported to the
> client by a callback.
> If the API was called mbox_submit_message (which it actually is)
> instead of mbox_send_message, there would be no confusion.
> We can argue how good/bad the current implementation is, but the fact
> is that it is here. And I am reluctant to cause churn without good
> reason.
> Again, as I said, any, _legal_, setup scenario will help me come over
> my reluctance.

mbox_send_message() in blocking mode is not only for submitting the message in
the first place if we read through the API docs.

From the API doc for `mbox_send_message()`:
```
 * If the client had set 'tx_block', the call will return
 * either when the remote receives the data or when 'tx_tout' millisecs
 * run out.
```

From the API doc for `struct mbox_client`:
```
 * @tx_block:		If the mbox_send_message should block until data is
 *			transmitted.
```

Given these docs, I think it is apparent that the API covers "transmission" of
the message, not only its submission. If submitting were the sole purpose of
the API, what would it block for in the first place? We would not need the
blocking mode at all.

The current return value of the API in the failure cases is as follows:

 - When submission fails, it returns failure.
 - When submission succeeds but a timeout occurs during transmission, it
   returns failure, i.e. -ETIME.
 - When submission succeeds but transmission fails without a timeout, it
   returns success.

The third case looks problematic, and it is what this patch focuses on. There
is also a disparity in how failures are handled between the timeout case (the
second) and the non-timeout case (the third). Why does the API not return
failure when a non-timeout error occurs during transmission, whereas it does
when a timeout occurs? And if the API is solely for submission, why does it
return failure instead of success in the second case?

In the third case, the controller (or the client) informs the mailbox core of
the transmission failure. Then why not return that failure from the API, given
that the core already has this information?

An alternative to fixing this issue would be to extend the API documentation
to say something like:

 - In blocking mode, the send API returns -ETIME when a timeout occurs during
   transmission, but it returns success rather than failure when other errors
   occur during transmission (since submission itself succeeded before
   transmission). Users have to register a callback to collect the error code
   when a non-timeout error occurs.

But I think we can do away with this unnecessary confusion by fixing the API
to return the error code whenever an error occurs, timeout or not. Then we
could simply say:

 - In blocking mode, the send API returns failure when an error occurs.

Since this patch points out this anomaly in the send API's behavior, I am not
sure what further scenario is needed. In other words, the current behavior is
more surprising to users than the fixed version, as it was to me when I first
discovered this issue.

Do you think this change would cause any problem on the client side beyond
fixing the existing issue? If not, I wonder why we should not fix the issue
with this patch.

Thanks,
Joonwon Kang


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-03 16:19       ` Jassi Brar
@ 2026-04-04 12:44         ` Joonwon Kang
  2026-04-05  0:58           ` zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Joonwon Kang @ 2026-04-04 12:44 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: angelogioacchino.delregno, jonathanh, joonwonkang,
	linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra,
	matthias.bgg, stable, thierry.reding

> On Fri, Apr 3, 2026 at 9:51 AM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> > > >
> > > > Previously, a sender thread in mbox_send_message() could be woken up at
> > > > a wrong time in blocking mode. It is because there was only a single
> > > > completion for a channel whereas messages from multiple threads could be
> > > > sent in any order; since the shared completion could be signalled in any
> > > > order, it could wake up a wrong sender thread.
> > > >
> > > > This commit resolves the false wake-up issue with the following changes:
> > > > - Completions are created just as many as the number of concurrent sender
> > > >   threads
> > > > - A completion is created on a sender thread's stack
> > > > - Each slot of the message queue, i.e. `msg_data`, contains a pointer to
> > > >   its target completion
> > > > - tx_tick() signals the completion of the currently active slot of the
> > > >   message queue
> > > >
> > > I think I reviewed it already or is this happening on
> > > one-channel-one-client usage? Because mailbox api does not support
> > > channels shared among multiple clients.
> >
> > Yes, this patch is handling the one-channel-one-client usage but when that
> > single channel is shared between multiple threads.
> 
> hmm.... how is this not single-channel-multiple-clients ?
> A channel is returned as an opaque token to the clients, if that
> client shares that with other threads - they will race.

They race because of the current blocking mode implementation. With this
patch, they should not, as it handles the known racing point. So I think it is
important to decide whether or not to support multiple threads in blocking
mode.

> It is the job of the original client to serialize its threads' access
> to the channel.

I can see a disparity with the non-blocking mode here. Currently, the client
does not need to serialize its threads' access to the channel in non-blocking
mode, whereas it does in blocking mode. It would be nice if the client did not
need to in either mode, but as you said, that may depend on actual necessity.

> > From my understanding, the
> > discussion back then ended with how to circumvent the issue rather than whether
> > we will eventually solve this in the mailbox framework or not, and if yes, how
> > we will, and if not, why.
> 
> It will be interesting to see how many current clients actually need
> to share channels. If there are enough, it makes sense to implement
> some helper api
> on top of existing code, instead of changing its nature totally.

I agree that we may need research on the current uses of channels and the
necessity of shared channels. However, that may require a non-trivial amount
of time, since it requires a thorough understanding of every client driver's
context. At this point, I think we at least need clear documentation regarding
multi-thread support, as we have none now. Since multi-thread use is clearly
not supported for now, I can create another patch to state this in the API
doc. What do you think?

Thanks,
Joonwon Kang


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-04 12:44         ` Joonwon Kang
@ 2026-04-05  0:58           ` zhang
  0 siblings, 0 replies; 12+ messages in thread
From: zhang @ 2026-04-05  0:58 UTC (permalink / raw)
  To: joonwonkang
  Cc: angelogioacchino.delregno, jassisinghbrar, jonathanh,
	linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra,
	matthias.bgg, stable, thierry.reding

Hi! Joonwon Kang.

I just looked at the content of your email, and I think we could design a
resource priority scheduling system with a 70%/30% priority allocation. The
specific idea is as follows:

During task execution, each task can be tagged. Important tasks can be
allocated the 30% of resources, while the remaining 70% can be used to run
low-load, repetitive pipeline tasks.

The specific algorithm could work as follows: reserve 30% of the runtime
space for the system's critical processes. For the remaining 70% of
non-critical processes, make a judgment: if resource usage exceeds 70%, mark
the excess processes with a priority-deferred tag and run them only when
resources are freed up.

-- 

the-essence-of-life


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-04-05  0:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-02 17:06 [PATCH v3 0/2] mailbox: Fix wrong completion order and improper send result in the blocking mode send API Joonwon Kang
2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang
2026-04-02 17:59   ` Jassi Brar
2026-04-03 14:51     ` Joonwon Kang
2026-04-03 16:19       ` Jassi Brar
2026-04-04 12:44         ` Joonwon Kang
2026-04-05  0:58           ` zhang
2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang
2026-04-02 18:03   ` Jassi Brar
2026-04-03 15:19     ` Joonwon Kang
2026-04-03 16:36       ` Jassi Brar
2026-04-04 11:47         ` Joonwon Kang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox