* [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order [not found] <20260402170641.2082547-1-joonwonkang@google.com> @ 2026-04-02 17:06 ` Joonwon Kang 2026-04-02 17:59 ` Jassi Brar 2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang 1 sibling, 1 reply; 11+ messages in thread From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw) To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno, thierry.reding, jonathanh Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra, Joonwon Kang, stable Previously, a sender thread in mbox_send_message() could be woken up at a wrong time in blocking mode. It is because there was only a single completion for a channel whereas messages from multiple threads could be sent in any order; since the shared completion could be signalled in any order, it could wake up a wrong sender thread. This commit resolves the false wake-up issue with the following changes: - Completions are created just as many as the number of concurrent sender threads - A completion is created on a sender thread's stack - Each slot of the message queue, i.e. 
`msg_data`, contains a pointer to its target completion - tx_tick() signals the completion of the currently active slot of the message queue Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/1490809381-28869-1-git-send-email-jaswinder.singh@linaro.org Signed-off-by: Joonwon Kang <joonwonkang@google.com> --- drivers/mailbox/mailbox.c | 86 +++++++++++++++++++----------- drivers/mailbox/mtk-vcp-mailbox.c | 2 +- drivers/mailbox/tegra-hsp.c | 2 +- include/linux/mailbox_controller.h | 20 ++++--- 4 files changed, 72 insertions(+), 38 deletions(-) diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index 138ffbcd4fde..d63386468982 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -21,7 +21,7 @@ static LIST_HEAD(mbox_cons); static DEFINE_MUTEX(con_mutex); -static int add_to_rbuf(struct mbox_chan *chan, void *mssg) +static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete) { int idx; @@ -32,7 +32,8 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg) return -ENOBUFS; idx = chan->msg_free; - chan->msg_data[idx] = mssg; + chan->msg_data[idx].data = mssg; + chan->msg_data[idx].tx_complete = tx_complete; chan->msg_count++; if (idx == MBOX_TX_QUEUE_LEN - 1) @@ -50,24 +51,33 @@ static void msg_submit(struct mbox_chan *chan) int err = -EBUSY; scoped_guard(spinlock_irqsave, &chan->lock) { - if (!chan->msg_count || chan->active_req != MBOX_NO_MSG) + if (chan->active_req >= 0) break; - count = chan->msg_count; - idx = chan->msg_free; - if (idx >= count) - idx -= count; - else - idx += MBOX_TX_QUEUE_LEN - count; + while (chan->msg_count > 0) { + count = chan->msg_count; + idx = chan->msg_free; + if (idx >= count) + idx -= count; + else + idx += MBOX_TX_QUEUE_LEN - count; - data = chan->msg_data[idx]; + data = chan->msg_data[idx].data; + if (data != MBOX_NO_MSG) + break; + + chan->msg_count--; + } + + if (!chan->msg_count) + break; if (chan->cl->tx_prepare) chan->cl->tx_prepare(chan->cl, data); 
/* Try to submit a message to the MBOX controller */ err = chan->mbox->ops->send_data(chan, data); if (!err) { - chan->active_req = data; + chan->active_req = idx; chan->msg_count--; } } @@ -79,27 +89,35 @@ static void msg_submit(struct mbox_chan *chan) } } -static void tx_tick(struct mbox_chan *chan, int r) +static void tx_tick(struct mbox_chan *chan, int r, int idx) { - void *mssg; + struct mbox_message mssg = {MBOX_NO_MSG, NULL}; scoped_guard(spinlock_irqsave, &chan->lock) { - mssg = chan->active_req; - chan->active_req = MBOX_NO_MSG; + if (idx >= 0 && idx != chan->active_req) { + chan->msg_data[idx].data = MBOX_NO_MSG; + chan->msg_data[idx].tx_complete = NULL; + return; + } + + if (chan->active_req >= 0) { + mssg = chan->msg_data[chan->active_req]; + chan->active_req = -1; + } } /* Submit next message */ msg_submit(chan); - if (mssg == MBOX_NO_MSG) + if (mssg.data == MBOX_NO_MSG) return; /* Notify the client */ if (chan->cl->tx_done) - chan->cl->tx_done(chan->cl, mssg, r); + chan->cl->tx_done(chan->cl, mssg.data, r); if (r != -ETIME && chan->cl->tx_block) - complete(&chan->tx_complete); + complete(mssg.tx_complete); } static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) @@ -112,10 +130,10 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) for (i = 0; i < mbox->num_chans; i++) { struct mbox_chan *chan = &mbox->chans[i]; - if (chan->active_req != MBOX_NO_MSG && chan->cl) { + if (chan->active_req >= 0 && chan->cl) { txdone = chan->mbox->ops->last_tx_done(chan); if (txdone) - tx_tick(chan, 0); + tx_tick(chan, 0, -1); else resched = true; } @@ -168,7 +186,7 @@ void mbox_chan_txdone(struct mbox_chan *chan, int r) return; } - tx_tick(chan, r); + tx_tick(chan, r, -1); } EXPORT_SYMBOL_GPL(mbox_chan_txdone); @@ -188,7 +206,7 @@ void mbox_client_txdone(struct mbox_chan *chan, int r) return; } - tx_tick(chan, r); + tx_tick(chan, r, -1); } EXPORT_SYMBOL_GPL(mbox_client_txdone); @@ -266,11 +284,19 @@ 
EXPORT_SYMBOL_GPL(mbox_chan_tx_slots_available); int mbox_send_message(struct mbox_chan *chan, void *mssg) { int t; + int idx; + struct completion tx_complete; if (!chan || !chan->cl || mssg == MBOX_NO_MSG) return -EINVAL; - t = add_to_rbuf(chan, mssg); + if (chan->cl->tx_block) { + init_completion(&tx_complete); + t = add_to_rbuf(chan, mssg, &tx_complete); + } else { + t = add_to_rbuf(chan, mssg, NULL); + } + if (t < 0) { dev_err(chan->mbox->dev, "Try increasing MBOX_TX_QUEUE_LEN\n"); return t; @@ -287,10 +313,11 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg) else wait = msecs_to_jiffies(chan->cl->tx_tout); - ret = wait_for_completion_timeout(&chan->tx_complete, wait); + ret = wait_for_completion_timeout(&tx_complete, wait); if (ret == 0) { + idx = t; t = -ETIME; - tx_tick(chan, t); + tx_tick(chan, t, idx); } } @@ -321,7 +348,7 @@ int mbox_flush(struct mbox_chan *chan, unsigned long timeout) ret = chan->mbox->ops->flush(chan, timeout); if (ret < 0) - tx_tick(chan, ret); + tx_tick(chan, ret, -1); return ret; } @@ -340,9 +367,8 @@ static int __mbox_bind_client(struct mbox_chan *chan, struct mbox_client *cl) scoped_guard(spinlock_irqsave, &chan->lock) { chan->msg_free = 0; chan->msg_count = 0; - chan->active_req = MBOX_NO_MSG; + chan->active_req = -1; chan->cl = cl; - init_completion(&chan->tx_complete); if (chan->txdone_method == TXDONE_BY_POLL && cl->knows_txdone) chan->txdone_method = TXDONE_BY_ACK; @@ -498,7 +524,7 @@ void mbox_free_channel(struct mbox_chan *chan) /* The queued TX requests are simply aborted, no callbacks are made */ scoped_guard(spinlock_irqsave, &chan->lock) { chan->cl = NULL; - chan->active_req = MBOX_NO_MSG; + chan->active_req = -1; if (chan->txdone_method == TXDONE_BY_ACK) chan->txdone_method = TXDONE_BY_POLL; } @@ -553,7 +579,7 @@ int mbox_controller_register(struct mbox_controller *mbox) chan->cl = NULL; chan->mbox = mbox; - chan->active_req = MBOX_NO_MSG; + chan->active_req = -1; chan->txdone_method = txdone; 
spin_lock_init(&chan->lock); } diff --git a/drivers/mailbox/mtk-vcp-mailbox.c b/drivers/mailbox/mtk-vcp-mailbox.c index 1b291b8ea15a..a7bab06ac686 100644 --- a/drivers/mailbox/mtk-vcp-mailbox.c +++ b/drivers/mailbox/mtk-vcp-mailbox.c @@ -84,7 +84,7 @@ static int mtk_vcp_mbox_send_data(struct mbox_chan *chan, void *data) static bool mtk_vcp_mbox_last_tx_done(struct mbox_chan *chan) { - struct mtk_ipi_info *ipi_info = chan->active_req; + struct mtk_ipi_info *ipi_info = chan->msg_data[chan->active_req].data; struct mtk_vcp_mbox *priv = chan->con_priv; return !(readl(priv->base + priv->cfg->set_in) & BIT(ipi_info->index)); diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c index 7b1e1b83ea29..efe0033cb5c5 100644 --- a/drivers/mailbox/tegra-hsp.c +++ b/drivers/mailbox/tegra-hsp.c @@ -495,7 +495,7 @@ static int tegra_hsp_mailbox_flush(struct mbox_chan *chan, mbox_chan_txdone(chan, 0); /* Wait until channel is empty */ - if (chan->active_req != MBOX_NO_MSG) + if (chan->active_req >= 0) continue; return 0; diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h index e3896b08f22e..912499ad08ed 100644 --- a/include/linux/mailbox_controller.h +++ b/include/linux/mailbox_controller.h @@ -113,16 +113,25 @@ struct mbox_controller { */ #define MBOX_TX_QUEUE_LEN 20 +/** + * struct mbox_message - Internal representation of a mailbox message + * @data: Data packet + * @tx_complete: Pointer to the transmission completion + */ +struct mbox_message { + void *data; + struct completion *tx_complete; +}; + /** * struct mbox_chan - s/w representation of a communication chan * @mbox: Pointer to the parent/provider of this channel * @txdone_method: Way to detect TXDone chosen by the API * @cl: Pointer to the current owner of this channel - * @tx_complete: Transmission completion - * @active_req: Currently active request hook + * @active_req: Index of the currently active slot in the queue * @msg_count: No. 
of mssg currently queued * @msg_free: Index of next available mssg slot - * @msg_data: Hook for data packet + * @msg_data: Queue of data packets * @lock: Serialise access to the channel * @con_priv: Hook for controller driver to attach private data */ @@ -130,10 +139,9 @@ struct mbox_chan { struct mbox_controller *mbox; unsigned txdone_method; struct mbox_client *cl; - struct completion tx_complete; - void *active_req; + int active_req; unsigned msg_count, msg_free; - void *msg_data[MBOX_TX_QUEUE_LEN]; + struct mbox_message msg_data[MBOX_TX_QUEUE_LEN]; spinlock_t lock; /* Serialise access to the channel */ void *con_priv; }; -- 2.53.0.1185.g05d4b7b318-goog ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order 2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang @ 2026-04-02 17:59 ` Jassi Brar 2026-04-03 14:51 ` Joonwon Kang 0 siblings, 1 reply; 11+ messages in thread From: Jassi Brar @ 2026-04-02 17:59 UTC (permalink / raw) To: Joonwon Kang Cc: matthias.bgg, angelogioacchino.delregno, thierry.reding, jonathanh, linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra, stable On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote: > > Previously, a sender thread in mbox_send_message() could be woken up at > a wrong time in blocking mode. It is because there was only a single > completion for a channel whereas messages from multiple threads could be > sent in any order; since the shared completion could be signalled in any > order, it could wake up a wrong sender thread. > > This commit resolves the false wake-up issue with the following changes: > - Completions are created just as many as the number of concurrent sender > threads > - A completion is created on a sender thread's stack > - Each slot of the message queue, i.e. `msg_data`, contains a pointer to > its target completion > - tx_tick() signals the completion of the currently active slot of the > message queue > I think I reviewed it already or is this happening on one-channel-one-client usage? Because mailbox api does not support channels shared among multiple clients. Thanks Jassi ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
  2026-04-02 17:59   ` Jassi Brar
@ 2026-04-03 14:51     ` Joonwon Kang
  2026-04-03 16:19       ` Jassi Brar
  0 siblings, 1 reply; 11+ messages in thread
From: Joonwon Kang @ 2026-04-03 14:51 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: angelogioacchino.delregno, jonathanh, joonwonkang, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable,
	thierry.reding

> On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > Previously, a sender thread in mbox_send_message() could be woken up at
> > a wrong time in blocking mode. It is because there was only a single
> > completion for a channel whereas messages from multiple threads could be
> > sent in any order; since the shared completion could be signalled in any
> > order, it could wake up a wrong sender thread.
> >
> > This commit resolves the false wake-up issue with the following changes:
> > - Completions are created just as many as the number of concurrent sender
> >   threads
> > - A completion is created on a sender thread's stack
> > - Each slot of the message queue, i.e. `msg_data`, contains a pointer to
> >   its target completion
> > - tx_tick() signals the completion of the currently active slot of the
> >   message queue
> >
> I think I reviewed it already or is this happening on
> one-channel-one-client usage? Because mailbox api does not support
> channels shared among multiple clients.

Yes, this patch is handling the one-channel-one-client usage, but when that
single channel is shared between multiple threads. From my understanding, the
discussion back then ended with how to circumvent the issue rather than
whether we will eventually solve this in the mailbox framework or not, and if
yes, how, and if not, why. I think it should still be resolved in the
framework for the reasons in the cover letter. Could you give this a second
review with regard to those aspects?
Thanks,
Joonwon Kang
* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order 2026-04-03 14:51 ` Joonwon Kang @ 2026-04-03 16:19 ` Jassi Brar 2026-04-04 12:44 ` Joonwon Kang 0 siblings, 1 reply; 11+ messages in thread From: Jassi Brar @ 2026-04-03 16:19 UTC (permalink / raw) To: Joonwon Kang Cc: angelogioacchino.delregno, jonathanh, linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable, thierry.reding On Fri, Apr 3, 2026 at 9:51 AM Joonwon Kang <joonwonkang@google.com> wrote: > > > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote: > > > > > > Previously, a sender thread in mbox_send_message() could be woken up at > > > a wrong time in blocking mode. It is because there was only a single > > > completion for a channel whereas messages from multiple threads could be > > > sent in any order; since the shared completion could be signalled in any > > > order, it could wake up a wrong sender thread. > > > > > > This commit resolves the false wake-up issue with the following changes: > > > - Completions are created just as many as the number of concurrent sender > > > threads > > > - A completion is created on a sender thread's stack > > > - Each slot of the message queue, i.e. `msg_data`, contains a pointer to > > > its target completion > > > - tx_tick() signals the completion of the currently active slot of the > > > message queue > > > > > I think I reviewed it already or is this happening on > > one-channel-one-client usage? Because mailbox api does not support > > channels shared among multiple clients. > > Yes, this patch is handling the one-channel-one-client usage but when that > single channel is shared between multiple threads. hmm.... how is this not single-channel-multiple-clients ? A channel is returned as an opaque token to the clients, if that client shares that with other threads - they will race. It is the job of the original client to serialize its threads' access to the channel. 
to the channel.
> From my understanding, the > discussion back then ended with how to circumvent the issue rather than whether > we will eventually solve this in the mailbox framework or not, and if yes, how > we will, and if not, why. It will be interesting to see how many current clients actually need to share channels. If there are enough, it makes sense to implement some helper api on top of existing code, instead of changing its nature totally. Thanks Jassi ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order 2026-04-03 16:19 ` Jassi Brar @ 2026-04-04 12:44 ` Joonwon Kang 2026-04-05 0:58 ` zhang 0 siblings, 1 reply; 11+ messages in thread From: Joonwon Kang @ 2026-04-04 12:44 UTC (permalink / raw) To: jassisinghbrar Cc: angelogioacchino.delregno, jonathanh, joonwonkang, linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable, thierry.reding > On Fri, Apr 3, 2026 at 9:51 AM Joonwon Kang <joonwonkang@google.com> wrote: > > > > > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote: > > > > > > > > Previously, a sender thread in mbox_send_message() could be woken up at > > > > a wrong time in blocking mode. It is because there was only a single > > > > completion for a channel whereas messages from multiple threads could be > > > > sent in any order; since the shared completion could be signalled in any > > > > order, it could wake up a wrong sender thread. > > > > > > > > This commit resolves the false wake-up issue with the following changes: > > > > - Completions are created just as many as the number of concurrent sender > > > > threads > > > > - A completion is created on a sender thread's stack > > > > - Each slot of the message queue, i.e. `msg_data`, contains a pointer to > > > > its target completion > > > > - tx_tick() signals the completion of the currently active slot of the > > > > message queue > > > > > > > I think I reviewed it already or is this happening on > > > one-channel-one-client usage? Because mailbox api does not support > > > channels shared among multiple clients. > > > > Yes, this patch is handling the one-channel-one-client usage but when that > > single channel is shared between multiple threads. > > hmm.... how is this not single-channel-multiple-clients ? > A channel is returned as an opaque token to the clients, if that > client shares that with other threads - they will race. 
They will race because of the current blocking mode implementation. With this
patch, they should not race, as it handles the known racing point. So, I think
it will be important to decide whether to support multiple threads in blocking
mode or not.

> It is the job of the original client to serialize its threads' access
> to the channel.

I can see the disparity with the non-blocking mode here. Currently, the client
does not need to serialize its threads' access to the channel in non-blocking
mode, whereas it needs to in blocking mode. It would be nice if the client did
not need to in either mode, but it may also depend on the necessity, as you
said.

> > From my understanding, the
> > discussion back then ended with how to circumvent the issue rather than whether
> > we will eventually solve this in the mailbox framework or not, and if yes, how
> > we will, and if not, why.
>
> It will be interesting to see how many current clients actually need
> to share channels. If there are enough, it makes sense to implement
> some helper api
> on top of existing code, instead of changing its nature totally.

I agree that we may need research on the current uses of channels and the
necessity of shared channels. However, that may require a non-trivial amount
of time since it requires a thorough understanding of the context of every
client driver. At this point, I think we at least need clear documentation of
multi-thread support, as we have none now. Since it is obvious that multiple
threads are not supported for now, I can create another patch to add this to
the API doc to be clear. What do you think?

Thanks,
Joonwon Kang
* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order 2026-04-04 12:44 ` Joonwon Kang @ 2026-04-05 0:58 ` zhang 0 siblings, 0 replies; 11+ messages in thread From: zhang @ 2026-04-05 0:58 UTC (permalink / raw) To: joonwonkang Cc: angelogioacchino.delregno, jassisinghbrar, jonathanh, linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable, thierry.reding Hi! Joonwon Kang. I just looked at the content of your email, and I think we can design a resource priority scheduling system with 70% and 30% priority allocation. The specific idea is as follows: During task execution, each task can be tagged. Important tasks can be allocated to the 30% of resources, while the remaining 70% can be used to run low-load and repetitive pipeline tasks. The specific algorithm can be written as follows: reserve 30% of the runtime space for the system's critical processes. For the remaining 70% of non-critical processes, a judgment can be made: if resource usage exceeds 70%, the excess processes are marked with a priority deferred tag and run only when resources are freed up. -- the-essence-of-life ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails [not found] <20260402170641.2082547-1-joonwonkang@google.com> 2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang @ 2026-04-02 17:06 ` Joonwon Kang 2026-04-02 18:03 ` Jassi Brar 1 sibling, 1 reply; 11+ messages in thread From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw) To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno, thierry.reding, jonathanh Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra, Joonwon Kang, stable When the mailbox controller failed transmitting message, the error code was only passed to the client's tx done handler and not to mbox_send_message(). For this reason, the function could return a false success. This commit resolves the issue by introducing the tx status and checking it before mbox_send_message() returns. Cc: stable@vger.kernel.org Signed-off-by: Joonwon Kang <joonwonkang@google.com> --- drivers/mailbox/mailbox.c | 20 +++++++++++++++----- include/linux/mailbox_controller.h | 2 ++ 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index d63386468982..ea9aec9dc947 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -21,7 +21,10 @@ static LIST_HEAD(mbox_cons); static DEFINE_MUTEX(con_mutex); -static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete) +static int add_to_rbuf(struct mbox_chan *chan, + void *mssg, + struct completion *tx_complete, + int *tx_status) { int idx; @@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx idx = chan->msg_free; chan->msg_data[idx].data = mssg; chan->msg_data[idx].tx_complete = tx_complete; + chan->msg_data[idx].tx_status = tx_status; chan->msg_count++; if (idx == MBOX_TX_QUEUE_LEN - 1) @@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan) static void 
tx_tick(struct mbox_chan *chan, int r, int idx) { - struct mbox_message mssg = {MBOX_NO_MSG, NULL}; + struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL}; scoped_guard(spinlock_irqsave, &chan->lock) { if (idx >= 0 && idx != chan->active_req) { chan->msg_data[idx].data = MBOX_NO_MSG; chan->msg_data[idx].tx_complete = NULL; + chan->msg_data[idx].tx_status = NULL; return; } @@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx) if (chan->cl->tx_done) chan->cl->tx_done(chan->cl, mssg.data, r); - if (r != -ETIME && chan->cl->tx_block) + if (r != -ETIME && chan->cl->tx_block) { + *mssg.tx_status = r; complete(mssg.tx_complete); + } } static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) @@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg) int t; int idx; struct completion tx_complete; + int tx_status = 0; if (!chan || !chan->cl || mssg == MBOX_NO_MSG) return -EINVAL; if (chan->cl->tx_block) { init_completion(&tx_complete); - t = add_to_rbuf(chan, mssg, &tx_complete); + t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status); } else { - t = add_to_rbuf(chan, mssg, NULL); + t = add_to_rbuf(chan, mssg, NULL, NULL); } if (t < 0) { @@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg) idx = t; t = -ETIME; tx_tick(chan, t, idx); + } else if (tx_status < 0) { + t = tx_status; } } diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h index 912499ad08ed..890da97bcb50 100644 --- a/include/linux/mailbox_controller.h +++ b/include/linux/mailbox_controller.h @@ -117,10 +117,12 @@ struct mbox_controller { * struct mbox_message - Internal representation of a mailbox message * @data: Data packet * @tx_complete: Pointer to the transmission completion + * @tx_status: Pointer to the transmission status */ struct mbox_message { void *data; struct completion *tx_complete; + int *tx_status; }; /** -- 2.53.0.1185.g05d4b7b318-goog ^ permalink raw reply related [flat|nested] 
11+ messages in thread
* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails 2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang @ 2026-04-02 18:03 ` Jassi Brar 2026-04-03 15:19 ` Joonwon Kang 0 siblings, 1 reply; 11+ messages in thread From: Jassi Brar @ 2026-04-02 18:03 UTC (permalink / raw) To: Joonwon Kang Cc: matthias.bgg, angelogioacchino.delregno, thierry.reding, jonathanh, linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra, stable On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote: > > When the mailbox controller failed transmitting message, the error code > was only passed to the client's tx done handler and not to > mbox_send_message(). For this reason, the function could return a false > success. This commit resolves the issue by introducing the tx status and > checking it before mbox_send_message() returns. > Can you please share the scenario when this becomes necessary? This can potentially change the ground underneath some clients, so we have to be sure this is really useful. 
Thanks Jassi > Cc: stable@vger.kernel.org > Signed-off-by: Joonwon Kang <joonwonkang@google.com> > --- > drivers/mailbox/mailbox.c | 20 +++++++++++++++----- > include/linux/mailbox_controller.h | 2 ++ > 2 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c > index d63386468982..ea9aec9dc947 100644 > --- a/drivers/mailbox/mailbox.c > +++ b/drivers/mailbox/mailbox.c > @@ -21,7 +21,10 @@ > static LIST_HEAD(mbox_cons); > static DEFINE_MUTEX(con_mutex); > > -static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete) > +static int add_to_rbuf(struct mbox_chan *chan, > + void *mssg, > + struct completion *tx_complete, > + int *tx_status) > { > int idx; > > @@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx > idx = chan->msg_free; > chan->msg_data[idx].data = mssg; > chan->msg_data[idx].tx_complete = tx_complete; > + chan->msg_data[idx].tx_status = tx_status; > chan->msg_count++; > > if (idx == MBOX_TX_QUEUE_LEN - 1) > @@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan) > > static void tx_tick(struct mbox_chan *chan, int r, int idx) > { > - struct mbox_message mssg = {MBOX_NO_MSG, NULL}; > + struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL}; > > scoped_guard(spinlock_irqsave, &chan->lock) { > if (idx >= 0 && idx != chan->active_req) { > chan->msg_data[idx].data = MBOX_NO_MSG; > chan->msg_data[idx].tx_complete = NULL; > + chan->msg_data[idx].tx_status = NULL; > return; > } > > @@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx) > if (chan->cl->tx_done) > chan->cl->tx_done(chan->cl, mssg.data, r); > > - if (r != -ETIME && chan->cl->tx_block) > + if (r != -ETIME && chan->cl->tx_block) { > + *mssg.tx_status = r; > complete(mssg.tx_complete); > + } > } > > static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) > @@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan 
*chan, void *mssg) > int t; > int idx; > struct completion tx_complete; > + int tx_status = 0; > > if (!chan || !chan->cl || mssg == MBOX_NO_MSG) > return -EINVAL; > > if (chan->cl->tx_block) { > init_completion(&tx_complete); > - t = add_to_rbuf(chan, mssg, &tx_complete); > + t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status); > } else { > - t = add_to_rbuf(chan, mssg, NULL); > + t = add_to_rbuf(chan, mssg, NULL, NULL); > } > > if (t < 0) { > @@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg) > idx = t; > t = -ETIME; > tx_tick(chan, t, idx); > + } else if (tx_status < 0) { > + t = tx_status; > } > } > > diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h > index 912499ad08ed..890da97bcb50 100644 > --- a/include/linux/mailbox_controller.h > +++ b/include/linux/mailbox_controller.h > @@ -117,10 +117,12 @@ struct mbox_controller { > * struct mbox_message - Internal representation of a mailbox message > * @data: Data packet > * @tx_complete: Pointer to the transmission completion > + * @tx_status: Pointer to the transmission status > */ > struct mbox_message { > void *data; > struct completion *tx_complete; > + int *tx_status; > }; > > /** > -- > 2.53.0.1185.g05d4b7b318-goog > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-02 18:03   ` Jassi Brar
@ 2026-04-03 15:19     ` Joonwon Kang
  2026-04-03 16:36       ` Jassi Brar
  0 siblings, 1 reply; 11+ messages in thread
From: Joonwon Kang @ 2026-04-03 15:19 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: angelogioacchino.delregno, jonathanh, joonwonkang, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable,
	thierry.reding, akpm

> On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > When the mailbox controller failed transmitting message, the error code
> > was only passed to the client's tx done handler and not to
> > mbox_send_message(). For this reason, the function could return a false
> > success. This commit resolves the issue by introducing the tx status and
> > checking it before mbox_send_message() returns.
> >
> Can you please share the scenario when this becomes necessary? This
> can potentially change the ground underneath some clients, so we have
> to be sure this is really useful.

I would say the problem here is generic enough to apply to all the cases where
the send result needs to be checked. Since the return value of the send API is
not the real send result, any user who believes that this blocking send API
returns the real send result could be misled; for example, users may think a
send was successful even though it actually was not. I believe it is uncommon
for users to have to register a callback solely to get the send result when
they are already using the blocking send API. Also, I see no special reason
why only the mailbox send API should work this way among other typical
blocking send APIs. For these reasons, this patch makes the send API return
the real send result. This way, users will not need to register the redundant
callback, and I think the return value will align with their common
expectation.
Regarding the change in the ground for some clients, could you clarify a
bit more what change you expect would surprise the clients?

Thanks,
Joonwon Kang

>
> Thanks
> Jassi
>
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joonwon Kang <joonwonkang@google.com>
> > ---
> >  drivers/mailbox/mailbox.c          | 20 +++++++++++++++-----
> >  include/linux/mailbox_controller.h |  2 ++
> >  2 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> > index d63386468982..ea9aec9dc947 100644
> > --- a/drivers/mailbox/mailbox.c
> > +++ b/drivers/mailbox/mailbox.c
> > @@ -21,7 +21,10 @@
> >  static LIST_HEAD(mbox_cons);
> >  static DEFINE_MUTEX(con_mutex);
> >
> > -static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
> > +static int add_to_rbuf(struct mbox_chan *chan,
> > +		       void *mssg,
> > +		       struct completion *tx_complete,
> > +		       int *tx_status)
> >  {
> >  	int idx;
> >
> > @@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx
> >  	idx = chan->msg_free;
> >  	chan->msg_data[idx].data = mssg;
> >  	chan->msg_data[idx].tx_complete = tx_complete;
> > +	chan->msg_data[idx].tx_status = tx_status;
> >  	chan->msg_count++;
> >
> >  	if (idx == MBOX_TX_QUEUE_LEN - 1)
> > @@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan)
> >
> >  static void tx_tick(struct mbox_chan *chan, int r, int idx)
> >  {
> > -	struct mbox_message mssg = {MBOX_NO_MSG, NULL};
> > +	struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL};
> >
> >  	scoped_guard(spinlock_irqsave, &chan->lock) {
> >  		if (idx >= 0 && idx != chan->active_req) {
> >  			chan->msg_data[idx].data = MBOX_NO_MSG;
> >  			chan->msg_data[idx].tx_complete = NULL;
> > +			chan->msg_data[idx].tx_status = NULL;
> >  			return;
> >  		}
> >
> > @@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx)
> >  	if (chan->cl->tx_done)
> >  		chan->cl->tx_done(chan->cl, mssg.data, r);
> >
> > -	if (r != -ETIME && chan->cl->tx_block)
> > +	if (r != -ETIME && chan->cl->tx_block) {
> > +		*mssg.tx_status = r;
> >  		complete(mssg.tx_complete);
> > +	}
> >  }
> >
> >  static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
> > @@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
> >  	int t;
> >  	int idx;
> >  	struct completion tx_complete;
> > +	int tx_status = 0;
> >
> >  	if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
> >  		return -EINVAL;
> >
> >  	if (chan->cl->tx_block) {
> >  		init_completion(&tx_complete);
> > -		t = add_to_rbuf(chan, mssg, &tx_complete);
> > +		t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status);
> >  	} else {
> > -		t = add_to_rbuf(chan, mssg, NULL);
> > +		t = add_to_rbuf(chan, mssg, NULL, NULL);
> >  	}
> >
> >  	if (t < 0) {
> > @@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
> >  		idx = t;
> >  		t = -ETIME;
> >  		tx_tick(chan, t, idx);
> > +	} else if (tx_status < 0) {
> > +		t = tx_status;
> >  	}
> >  }
> >
> > diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
> > index 912499ad08ed..890da97bcb50 100644
> > --- a/include/linux/mailbox_controller.h
> > +++ b/include/linux/mailbox_controller.h
> > @@ -117,10 +117,12 @@ struct mbox_controller {
> >   * struct mbox_message - Internal representation of a mailbox message
> >   * @data: Data packet
> >   * @tx_complete: Pointer to the transmission completion
> > + * @tx_status: Pointer to the transmission status
> >   */
> >  struct mbox_message {
> >  	void *data;
> >  	struct completion *tx_complete;
> > +	int *tx_status;
> >  };
> >
> >  /**
> > --
> > 2.53.0.1185.g05d4b7b318-goog
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-03 15:19     ` Joonwon Kang
@ 2026-04-03 16:36       ` Jassi Brar
  2026-04-04 11:47         ` Joonwon Kang
  0 siblings, 1 reply; 11+ messages in thread
From: Jassi Brar @ 2026-04-03 16:36 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: angelogioacchino.delregno, jonathanh, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-tegra, matthias.bgg, stable,
	thierry.reding, akpm

On Fri, Apr 3, 2026 at 10:19 AM Joonwon Kang <joonwonkang@google.com> wrote:
>
> > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> > >
> > > When the mailbox controller failed transmitting message, the error code
> > > was only passed to the client's tx done handler and not to
> > > mbox_send_message(). For this reason, the function could return a false
> > > success. This commit resolves the issue by introducing the tx status and
> > > checking it before mbox_send_message() returns.
> > >
> > Can you please share the scenario when this becomes necessary? This
> > can potentially change the ground underneath some clients, so we have
> > to be sure this is really useful.
>
> I would say the problem here is generic enough to apply to all the cases
> where the send result needs to be checked. Since the return value of the
> send API is not the real send result, any user who assumes this blocking
> send API returns the real send result can be misled. For example, users
> may think the send was successful when it actually failed. I believe it
> is unusual to require users to register a callback solely to get the
> send result when they are already using the blocking send API. Also,
> there seems to be no special reason why the mailbox send API, alone
> among typical blocking send APIs, should work this way. For these
> reasons, this patch makes the send API return the real send result.
> This way, users will not need to register the redundant callback, and I
> think the return value will align with their common expectation.
>
Clients submit a message into the Mailbox subsystem to be sent out to
the remote side, which can happen immediately or later. If submission
fails, clients get immediately notified. If transmission fails (which is
now internal to the subsystem), it is reported to the client by a
callback.
If the API was called mbox_submit_message (which it actually is) instead
of mbox_send_message, there would be no confusion.
We can argue how good or bad the current implementation is, but the fact
is that it is here, and I am reluctant to cause churn without good
reason. Again, as I said, any _legal_ setup scenario will help me
overcome my reluctance.

Thanks
Jassi

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
  2026-04-03 16:36       ` Jassi Brar
@ 2026-04-04 11:47         ` Joonwon Kang
  0 siblings, 0 replies; 11+ messages in thread
From: Joonwon Kang @ 2026-04-04 11:47 UTC (permalink / raw)
  To: jassisinghbrar
  Cc: akpm, angelogioacchino.delregno, jonathanh, joonwonkang,
	linux-arm-kernel, linux-kernel, linux-mediatek, linux-tegra,
	matthias.bgg, stable, thierry.reding, lee

> On Fri, Apr 3, 2026 at 10:19 AM Joonwon Kang <joonwonkang@google.com> wrote:
> >
> > > On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
> > > >
> > > > When the mailbox controller failed transmitting message, the error code
> > > > was only passed to the client's tx done handler and not to
> > > > mbox_send_message(). For this reason, the function could return a false
> > > > success. This commit resolves the issue by introducing the tx status and
> > > > checking it before mbox_send_message() returns.
> > > >
> > > Can you please share the scenario when this becomes necessary? This
> > > can potentially change the ground underneath some clients, so we have
> > > to be sure this is really useful.
> >
> > I would say the problem here is generic enough to apply to all the
> > cases where the send result needs to be checked. Since the return
> > value of the send API is not the real send result, any user who
> > assumes this blocking send API returns the real send result can be
> > misled. For example, users may think the send was successful when it
> > actually failed. I believe it is unusual to require users to register
> > a callback solely to get the send result when they are already using
> > the blocking send API. Also, there seems to be no special reason why
> > the mailbox send API, alone among typical blocking send APIs, should
> > work this way. For these reasons, this patch makes the send API
> > return the real send result. This way, users will not need to
> > register the redundant callback, and I think the return value will
> > align with their common expectation.
> >
> Clients submit a message into the Mailbox subsystem to be sent out to
> the remote side, which can happen immediately or later. If submission
> fails, clients get immediately notified. If transmission fails (which
> is now internal to the subsystem), it is reported to the client by a
> callback.
> If the API was called mbox_submit_message (which it actually is)
> instead of mbox_send_message, there would be no confusion.
> We can argue how good or bad the current implementation is, but the
> fact is that it is here, and I am reluctant to cause churn without
> good reason. Again, as I said, any _legal_ setup scenario will help me
> overcome my reluctance.

mbox_send_message() in blocking mode is not only for submitting the
message in the first place, if we read through the API docs.

From the API doc for `mbox_send_message()`:

```
 * If the client had set 'tx_block', the call will return
 * either when the remote receives the data or when 'tx_tout' millisecs
 * run out.
```

From the API doc for `struct mbox_client`:

```
 * @tx_block: If the mbox_send_message should block until data is
 *            transmitted.
```

With these docs, I think it is apparent that the API covers
"transmission" of the message, not only submission of it. If submitting
is the sole purpose of the API, what does the API block for in the first
place? We would not need the blocking mode at all then.

The current return value of the API in failure cases is as follows:

- When submission fails, it returns failure.
- When submission succeeds but a timeout occurs during transmission, it
  returns failure, i.e. -ETIME.
- When submission succeeds but transmission fails without a timeout, it
  returns success.

The third case looks problematic, and this patch focuses on it. There is
also a disparity in how failures are handled between timeout (the second
case) and non-timeout (the third case) errors.
Why does it not return failure when a non-timeout error occurs during
transmission, whereas it does when a timeout occurs during transmission?
If the API is solely for submission, why does it return failure instead
of success in the second case? In the third case, the controller (or the
client) informs the mailbox core of the transmission failure. Then, why
not return that failure as the return value of the API, given that the
core already has this information?

An alternative to fixing this issue would be amending the API doc to say
something like:

- In blocking mode, the send API will return -ETIME when a timeout
  occurs during transmission, but it will return success, not failure,
  when other errors occur during transmission (since submission itself
  succeeded before transmission). Users have to register a callback to
  collect the error code when a non-timeout error occurs.

But I think we can do away with this unnecessary confusion by fixing the
API to simply return the error code whenever an error occurs, timeout or
not. Then we could simply say:

- In blocking mode, the send API will return failure when an error
  occurs.

Since this patch is pointing out this anomaly in the send API's
behavior, I am not sure what further scenario we would need. In other
words, the current behavior would be more surprising to users than the
fixed version, as it was to me when I first found this issue. Do you
think this change will cause any other problem on the client side beyond
fixing the existing issue? If not, I wonder why we should not fix the
issue with this patch.

Thanks,
Joonwon Kang

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-04-05 0:58 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20260402170641.2082547-1-joonwonkang@google.com>
2026-04-02 17:06 ` [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order Joonwon Kang
2026-04-02 17:59 ` Jassi Brar
2026-04-03 14:51 ` Joonwon Kang
2026-04-03 16:19 ` Jassi Brar
2026-04-04 12:44 ` Joonwon Kang
2026-04-05 0:58 ` zhang
2026-04-02 17:06 ` [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails Joonwon Kang
2026-04-02 18:03 ` Jassi Brar
2026-04-03 15:19 ` Joonwon Kang
2026-04-03 16:36 ` Jassi Brar
2026-04-04 11:47 ` Joonwon Kang