[PATCH] mailbox: stop the release and reacquire of the chan lock

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mailbox: stop the release and reacquire of the chan lock
@ 2025-06-06 13:41 Tudor Ambarus
  2025-06-22 23:41 ` Jassi Brar
  0 siblings, 1 reply; 6+ messages in thread
From: Tudor Ambarus @ 2025-06-06 13:41 UTC (permalink / raw)
  To: Jassi Brar
  Cc: peter.griffin, andre.draszik, willmcvicker, cristian.marussi,
	sudeep.holla, kernel-team, arm-scmi, linux-kernel, Tudor Ambarus

There are two cases where the chan lock is released and reacquired
were it shouldn't really be:

1/ released at the end of add_to_rbuf() and reacquired at the beginning
of msg_submit(). After the lock is released at the end of add_to_rbuf(),
if the mailbox core is under heavy load, the mailbox software queue may
fill up without any of the threads getting the chance to drain the
software queue.
	T#0 acquires chan lock, fills rbuf, releases the lock, then
	T#1 acquires chan lock, fills rbuf, releases the lock, then
	...
	T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
We shall drain the software queue as fast as we can, while still holding
the channel lock.

2/ tx_tick() releases the lock after setting chan->active_req = NULL.
This gives again the possibility for the software queue to fill up, as
described in case 1/.

Address the cases from above by draining the software queue while still
holding the channel lock.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/mailbox/mailbox.c | 75 ++++++++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 5cd8ae22207309fadbe8fe7f6fd8b4bc2c345cfd..b064a0bd98fd07bfa4dc4186c90e5989d5dfd510 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -26,8 +26,6 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
 {
 	int idx;
 
-	guard(spinlock_irqsave)(&chan->lock);
-
 	/* See if there is any space left */
 	if (chan->msg_count == MBOX_TX_QUEUE_LEN)
 		return -ENOBUFS;
@@ -48,49 +46,49 @@ static void msg_submit(struct mbox_chan *chan)
 {
 	unsigned count, idx;
 	void *data;
-	int err = -EBUSY;
-
-	scoped_guard(spinlock_irqsave, &chan->lock) {
-		if (!chan->msg_count || chan->active_req)
-			break;
 
-		count = chan->msg_count;
-		idx = chan->msg_free;
-		if (idx >= count)
-			idx -= count;
-		else
-			idx += MBOX_TX_QUEUE_LEN - count;
+	count = chan->msg_count;
+	idx = chan->msg_free;
+	if (idx >= count)
+		idx -= count;
+	else
+		idx += MBOX_TX_QUEUE_LEN - count;
 
-		data = chan->msg_data[idx];
+	data = chan->msg_data[idx];
 
-		if (chan->cl->tx_prepare)
-			chan->cl->tx_prepare(chan->cl, data);
-		/* Try to submit a message to the MBOX controller */
-		err = chan->mbox->ops->send_data(chan, data);
-		if (!err) {
-			chan->active_req = data;
-			chan->msg_count--;
-		}
+	if (chan->cl->tx_prepare)
+		chan->cl->tx_prepare(chan->cl, data);
+	/* Try to submit a message to the MBOX controller */
+	if (!chan->mbox->ops->send_data(chan, data)) {
+		chan->active_req = data;
+		chan->msg_count--;
 	}
+}
 
-	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
-		/* kick start the timer immediately to avoid delays */
-		scoped_guard(spinlock_irqsave, &chan->mbox->poll_hrt_lock)
-			hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
-	}
+static void mbox_kick_start_timer(struct mbox_chan *chan)
+{
+	/* kick start the timer immediately to avoid delays */
+	scoped_guard(spinlock_irqsave, &chan->mbox->poll_hrt_lock)
+		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
 }
 
 static void tx_tick(struct mbox_chan *chan, int r)
 {
+	bool sent = false;
 	void *mssg;
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		mssg = chan->active_req;
 		chan->active_req = NULL;
+
+		if (chan->msg_count) {
+			msg_submit(chan);
+			sent = true;
+		}
 	}
 
-	/* Submit next message */
-	msg_submit(chan);
+	if (sent && (chan->txdone_method & TXDONE_BY_POLL))
+		mbox_kick_start_timer(chan);
 
 	if (!mssg)
 		return;
@@ -243,18 +241,27 @@ EXPORT_SYMBOL_GPL(mbox_client_peek_data);
  */
 int mbox_send_message(struct mbox_chan *chan, void *mssg)
 {
+	bool sent = false;
 	int t;
 
 	if (!chan || !chan->cl)
 		return -EINVAL;
 
-	t = add_to_rbuf(chan, mssg);
-	if (t < 0) {
-		dev_err(chan->mbox->dev, "Try increasing MBOX_TX_QUEUE_LEN\n");
-		return t;
+	scoped_guard(spinlock_irqsave, &chan->lock) {
+		t = add_to_rbuf(chan, mssg);
+		if (t < 0) {
+			dev_err(chan->mbox->dev, "Try increasing MBOX_TX_QUEUE_LEN\n");
+			return t;
+		}
+
+		if (!chan->active_req) {
+			msg_submit(chan);
+			sent = true;
+		}
 	}
 
-	msg_submit(chan);
+	if (sent && (chan->txdone_method & TXDONE_BY_POLL))
+		mbox_kick_start_timer(chan);
 
 	if (chan->cl->tx_block) {
 		unsigned long wait;

---
base-commit: a0bea9e39035edc56a994630e6048c8a191a99d8
change-id: 20250606-mbox-drop-reacquire-lock-14b7c5df65bd

Best regards,
-- 
Tudor Ambarus <tudor.ambarus@linaro.org>


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mailbox: stop the release and reacquire of the chan lock
  2025-06-06 13:41 [PATCH] mailbox: stop the release and reacquire of the chan lock Tudor Ambarus
@ 2025-06-22 23:41 ` Jassi Brar
  2025-07-04 12:30   ` Tudor Ambarus
  0 siblings, 1 reply; 6+ messages in thread
From: Jassi Brar @ 2025-06-22 23:41 UTC (permalink / raw)
  To: Tudor Ambarus
  Cc: peter.griffin, andre.draszik, willmcvicker, cristian.marussi,
	sudeep.holla, kernel-team, arm-scmi, linux-kernel

On Fri, Jun 6, 2025 at 8:41 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>
> There are two cases where the chan lock is released and reacquired
> were it shouldn't really be:
>
> 1/ released at the end of add_to_rbuf() and reacquired at the beginning
> of msg_submit(). After the lock is released at the end of add_to_rbuf(),
> if the mailbox core is under heavy load, the mailbox software queue may
> fill up without any of the threads getting the chance to drain the
> software queue.
>         T#0 acquires chan lock, fills rbuf, releases the lock, then
>         T#1 acquires chan lock, fills rbuf, releases the lock, then
>         ...
>         T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
> We shall drain the software queue as fast as we can, while still holding
> the channel lock.
>
I don't see any issue to fix to begin with.
T#0 does drain the queue by moving on to submit the message after
adding it to the rbuf.
And until the tx is done, T#1 would still be only adding to the rbuf
because of chan->active_req.

> 2/ tx_tick() releases the lock after setting chan->active_req = NULL.
> This gives again the possibility for the software queue to fill up, as
> described in case 1/.
>
This again is not an issue. The user(s) should account for the fact
that the message bus
 may be busy and there can be only limited buffers in the queue.

Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mailbox: stop the release and reacquire of the chan lock
  2025-06-22 23:41 ` Jassi Brar
@ 2025-07-04 12:30   ` Tudor Ambarus
  2025-07-06  2:19     ` Jassi Brar
  0 siblings, 1 reply; 6+ messages in thread
From: Tudor Ambarus @ 2025-07-04 12:30 UTC (permalink / raw)
  To: Jassi Brar
  Cc: peter.griffin, andre.draszik, willmcvicker, cristian.marussi,
	sudeep.holla, kernel-team, arm-scmi, linux-kernel

Hi, Jassi,

Sorry for the delay, I was out for a while.

On 6/23/25 12:41 AM, Jassi Brar wrote:
> On Fri, Jun 6, 2025 at 8:41 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>
>> There are two cases where the chan lock is released and reacquired
>> were it shouldn't really be:
>>
>> 1/ released at the end of add_to_rbuf() and reacquired at the beginning
>> of msg_submit(). After the lock is released at the end of add_to_rbuf(),
>> if the mailbox core is under heavy load, the mailbox software queue may
>> fill up without any of the threads getting the chance to drain the
>> software queue.
>>         T#0 acquires chan lock, fills rbuf, releases the lock, then
>>         T#1 acquires chan lock, fills rbuf, releases the lock, then
>>         ...
>>         T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
>> We shall drain the software queue as fast as we can, while still holding
>> the channel lock.
>>
> I don't see any issue to fix to begin with.
> T#0 does drain the queue by moving on to submit the message after
> adding it to the rbuf.

The problem is that the code releases the chan->lock after adding the
message to rbuf and then reacquires it on submit. A thread can be
preempted after add_to_rbuf(), without getting the chance to get to
msg_submit().

Let's assume that
T#0 adds to rbuf and gets preempted by T#1
T#1 adds to rbuf and gets preempted by T#2
...
T#n-1 adds to rbuf and gets preempted by T#n

We fill the mailbox software queue without any thread getting to
msg_submit().

Thanks,
ta

> And until the tx is done, T#1 would still be only adding to the rbuf
> because of chan->active_req.
> 
>> 2/ tx_tick() releases the lock after setting chan->active_req = NULL.
>> This gives again the possibility for the software queue to fill up, as
>> described in case 1/.
>>
> This again is not an issue. The user(s) should account for the fact
> that the message bus
>  may be busy and there can be only limited buffers in the queue.
> 
> Thanks


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mailbox: stop the release and reacquire of the chan lock
  2025-07-04 12:30   ` Tudor Ambarus
@ 2025-07-06  2:19     ` Jassi Brar
  2025-07-07  4:42       ` Peng Fan
  0 siblings, 1 reply; 6+ messages in thread
From: Jassi Brar @ 2025-07-06  2:19 UTC (permalink / raw)
  To: Tudor Ambarus
  Cc: peter.griffin, andre.draszik, willmcvicker, cristian.marussi,
	sudeep.holla, kernel-team, arm-scmi, linux-kernel

On Fri, Jul 4, 2025 at 7:30 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>
> Hi, Jassi,
>
> Sorry for the delay, I was out for a while.
>
> On 6/23/25 12:41 AM, Jassi Brar wrote:
> > On Fri, Jun 6, 2025 at 8:41 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>
> >> There are two cases where the chan lock is released and reacquired
> >> were it shouldn't really be:
> >>
> >> 1/ released at the end of add_to_rbuf() and reacquired at the beginning
> >> of msg_submit(). After the lock is released at the end of add_to_rbuf(),
> >> if the mailbox core is under heavy load, the mailbox software queue may
> >> fill up without any of the threads getting the chance to drain the
> >> software queue.
> >>         T#0 acquires chan lock, fills rbuf, releases the lock, then
> >>         T#1 acquires chan lock, fills rbuf, releases the lock, then
> >>         ...
> >>         T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
> >> We shall drain the software queue as fast as we can, while still holding
> >> the channel lock.
> >>
> > I don't see any issue to fix to begin with.
> > T#0 does drain the queue by moving on to submit the message after
> > adding it to the rbuf.
>
> The problem is that the code releases the chan->lock after adding the
> message to rbuf and then reacquires it on submit. A thread can be
> preempted after add_to_rbuf(), without getting the chance to get to
> msg_submit().
>
> Let's assume that
> T#0 adds to rbuf and gets preempted by T#1
> T#1 adds to rbuf and gets preempted by T#2
> ...
> T#n-1 adds to rbuf and gets preempted by T#n
>
> We fill the mailbox software queue without any thread getting to
> msg_submit().
>
I get that but I still don't see a problem.
When the queue gets filled the next submission will be rejected and
have to wait until the mailbox gets some work done. Which is the
expected behaviour. And will be the same even if we don't release the
lock between add_to_rbug and and msg_submit and there are enough
mbox_send_message calls coming in faster than the transmission bus.
Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mailbox: stop the release and reacquire of the chan lock
  2025-07-06  2:19     ` Jassi Brar
@ 2025-07-07  4:42       ` Peng Fan
  2025-07-07  8:04         ` Tudor Ambarus
  0 siblings, 1 reply; 6+ messages in thread
From: Peng Fan @ 2025-07-07  4:42 UTC (permalink / raw)
  To: Jassi Brar, Tudor Ambarus
  Cc: Tudor Ambarus, peter.griffin, andre.draszik, willmcvicker,
	cristian.marussi, sudeep.holla, kernel-team, arm-scmi,
	linux-kernel

Hi Jassi, Tudor
On Sat, Jul 05, 2025 at 09:19:08PM -0500, Jassi Brar wrote:
>On Fri, Jul 4, 2025 at 7:30???AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>
>> Hi, Jassi,
>>
>> Sorry for the delay, I was out for a while.
>>
>> On 6/23/25 12:41 AM, Jassi Brar wrote:
>> > On Fri, Jun 6, 2025 at 8:41???AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>> >>
>> >> There are two cases where the chan lock is released and reacquired
>> >> were it shouldn't really be:
>> >>
>> >> 1/ released at the end of add_to_rbuf() and reacquired at the beginning
>> >> of msg_submit(). After the lock is released at the end of add_to_rbuf(),
>> >> if the mailbox core is under heavy load, the mailbox software queue may
>> >> fill up without any of the threads getting the chance to drain the
>> >> software queue.
>> >>         T#0 acquires chan lock, fills rbuf, releases the lock, then
>> >>         T#1 acquires chan lock, fills rbuf, releases the lock, then
>> >>         ...
>> >>         T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
>> >> We shall drain the software queue as fast as we can, while still holding
>> >> the channel lock.
>> >>
>> > I don't see any issue to fix to begin with.
>> > T#0 does drain the queue by moving on to submit the message after
>> > adding it to the rbuf.
>>
>> The problem is that the code releases the chan->lock after adding the
>> message to rbuf and then reacquires it on submit. A thread can be
>> preempted after add_to_rbuf(), without getting the chance to get to
>> msg_submit().
>>
>> Let's assume that
>> T#0 adds to rbuf and gets preempted by T#1
>> T#1 adds to rbuf and gets preempted by T#2
>> ...
>> T#n-1 adds to rbuf and gets preempted by T#n
>>
>> We fill the mailbox software queue without any thread getting to
>> msg_submit().

I try to understand this case. Is this patch from code inspection or
the issue could be reproduced on real platform?

If the client continuously issues 'mbox_send_message',
there might be possibility that 'msg_submit' does not have chance to get
get chan lock.

>>
>I get that but I still don't see a problem.
>When the queue gets filled the next submission will be rejected and
>have to wait until the mailbox gets some work done. Which is the
>expected behaviour. And will be the same even if we don't release the
>lock between add_to_rbug and and msg_submit and there are enough
>mbox_send_message calls coming in faster than the transmission bus.

Not sure I get it clear, do you mean this?
When the mailbox queue is full, any new message submission will be rejected.
The sender must wait until space becomes available as messages are processed.

This is the expected behavior and ensures proper flow control. Even if we
avoid releasing the lock between add_to_rbuf() and msg_submit(), the outcome
remains the same: if mbox_send_message() calls arrive faster than the bus can
transmit, the queue will eventually fill up and block further submissions.

Thanks,
Peng

>Thanks.
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mailbox: stop the release and reacquire of the chan lock
  2025-07-07  4:42       ` Peng Fan
@ 2025-07-07  8:04         ` Tudor Ambarus
  0 siblings, 0 replies; 6+ messages in thread
From: Tudor Ambarus @ 2025-07-07  8:04 UTC (permalink / raw)
  To: Peng Fan, Jassi Brar
  Cc: peter.griffin, andre.draszik, willmcvicker, cristian.marussi,
	sudeep.holla, kernel-team, arm-scmi, linux-kernel



On 7/7/25 5:42 AM, Peng Fan wrote:
> Hi Jassi, Tudor

hiya. Thanks for chiming in.

> On Sat, Jul 05, 2025 at 09:19:08PM -0500, Jassi Brar wrote:
>> On Fri, Jul 4, 2025 at 7:30???AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>
>>> Hi, Jassi,
>>>
>>> Sorry for the delay, I was out for a while.
>>>
>>> On 6/23/25 12:41 AM, Jassi Brar wrote:
>>>> On Fri, Jun 6, 2025 at 8:41???AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>>>
>>>>> There are two cases where the chan lock is released and reacquired
>>>>> were it shouldn't really be:
>>>>>
>>>>> 1/ released at the end of add_to_rbuf() and reacquired at the beginning
>>>>> of msg_submit(). After the lock is released at the end of add_to_rbuf(),
>>>>> if the mailbox core is under heavy load, the mailbox software queue may
>>>>> fill up without any of the threads getting the chance to drain the
>>>>> software queue.
>>>>>         T#0 acquires chan lock, fills rbuf, releases the lock, then
>>>>>         T#1 acquires chan lock, fills rbuf, releases the lock, then
>>>>>         ...
>>>>>         T#MBOX_TX_QUEUE_LEN returns -ENOBUFS;
>>>>> We shall drain the software queue as fast as we can, while still holding
>>>>> the channel lock.
>>>>>
>>>> I don't see any issue to fix to begin with.
>>>> T#0 does drain the queue by moving on to submit the message after
>>>> adding it to the rbuf.
>>>
>>> The problem is that the code releases the chan->lock after adding the
>>> message to rbuf and then reacquires it on submit. A thread can be
>>> preempted after add_to_rbuf(), without getting the chance to get to
>>> msg_submit().
>>>
>>> Let's assume that
>>> T#0 adds to rbuf and gets preempted by T#1
>>> T#1 adds to rbuf and gets preempted by T#2
>>> ...
>>> T#n-1 adds to rbuf and gets preempted by T#n
>>>
>>> We fill the mailbox software queue without any thread getting to
>>> msg_submit().
> 
> I try to understand this case. Is this patch from code inspection or
> the issue could be reproduced on real platform?

code inspection. ACPM, the mailbox client that I'm using, does not yet
have enough clients to hit this.

> 
> If the client continuously issues 'mbox_send_message',
> there might be possibility that 'msg_submit' does not have chance to get
> get chan lock.

right. That's why I proposed to drain the FIFO before new requests are
introduced. This is a common practice when dealing with queues.

"Draining a FIFO" means ensuring that all existing data or requests
currently stored within the FIFO are processed, transmitted, or consumed
before new data or requests are introduced.

> 
>>>
>> I get that but I still don't see a problem.
>> When the queue gets filled the next submission will be rejected and
>> have to wait until the mailbox gets some work done. Which is the

yes, but it's not optimal.

>> expected behaviour. And will be the same even if we don't release the
>> lock between add_to_rbug and and msg_submit and there are enough
>> mbox_send_message calls coming in faster than the transmission bus.

I agree, but you're exposing a different scenario.

My scenario was that on high client requests the code may fill the queue
without any of the requests getting the chance to be consumed. Nothing
gets to the transmission bus.

If we don't release the lock between add_to_rbuf and and msg_submit we
make sure that all the requests in the FIFO are transmitted to the
mailbox controller, thus we allow the controller to consume the FIFO. We
shall fill the FIFO harder, as the controller consumes the requests faster.

> 
> Not sure I get it clear, do you mean this?
> When the mailbox queue is full, any new message submission will be rejected.
> The sender must wait until space becomes available as messages are processed.
> 
> This is the expected behavior and ensures proper flow control. Even if we
> avoid releasing the lock between add_to_rbuf() and msg_submit(), the outcome
> remains the same: if mbox_send_message() calls arrive faster than the bus can
> transmit, the queue will eventually fill up and block further submissions.

I guess I explained above the differences.

Cheers,
ta

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-07-07  8:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-06 13:41 [PATCH] mailbox: stop the release and reacquire of the chan lock Tudor Ambarus
2025-06-22 23:41 ` Jassi Brar
2025-07-04 12:30   ` Tudor Ambarus
2025-07-06  2:19     ` Jassi Brar
2025-07-07  4:42       ` Peng Fan
2025-07-07  8:04         ` Tudor Ambarus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).