Linux Remote Processor Subsystem development
From: Arnaud POULIQUEN <arnaud.pouliquen@foss.st.com>
To: Tim Blechmann <tim@klingt.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Tim Blechmann <tim.blechmann@gmail.com>,
	<linux-remoteproc@vger.kernel.org>
Subject: Re: [PATCH 1/1] rpmsg: virtio_rpmsg_bus - prevent possible race condition
Date: Fri, 8 Sep 2023 17:04:03 +0200
Message-ID: <a47f8cea-5dc4-cdb2-9c2d-daf84c6853e3@foss.st.com>
In-Reply-To: <00d5edfd-808f-51ac-0233-ce8489c6722c@klingt.org>



On 9/5/23 03:33, Tim Blechmann wrote:
>>>> when we cannot get a tx buffer (`get_a_tx_buf`) `rpmsg_upref_sleepers`
>>>> enables tx-complete interrupt.
>>>> however if the interrupt is executed after `get_a_tx_buf` and before
>>>> `rpmsg_upref_sleepers` we may miss the tx-complete interrupt and sleep
>>>> for the full 15 seconds.
>>>
>>>
>>> Is there any reason why your co-processor is unable to release the TX RPMSG
>>> buffers for 15 seconds? If not, you should first determine the reason why it is
>>> stalled.
>>
>> Arnaud's concern is valid.  If the remote processor can't consume a buffer
>> within 15 seconds, something is probably wrong.
>>
>> That said, I believe your assessment of the situation is correct.  *If* the TX
>> callback is disabled and there is no buffer available, there is a window of
>> opportunity between calls to get_a_tx_buf() and rpmsg_upref_sleepers() for an
>> interrupt to arrive in function rpmsg_send_offchannel_raw().
> 
> the remote processor certainly releases the tx buffer and according to my
> tracing the `vring_interrupt` fires immediately before `rpmsg_send` enters the
> `rpmsg_upref_sleepers`.


If I understood your point correctly, the issue occurs in the following race
condition:

- all the Tx buffers are in use
- in rpmsg_send_offchannel_raw(), we try to get a buffer using
  get_a_tx_buf(vrp), which returns NULL
- rpmsg_xmit_done() is called because a Tx buffer has been released by the
  remote processor and is now free
- in rpmsg_send_offchannel_raw(), rpmsg_upref_sleepers() is called

At this point nothing happens for 15 seconds, because rpmsg_xmit_done() is
never called again to wake up the waitqueue and retry get_a_tx_buf().

Am I right?

If so, what is not clear to me is that wait_event_interruptible_timeout() seems
to test the condition (and so call get_a_tx_buf()) before going to sleep [1]. A
free TX buffer should be found at this step.

[1]https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L534
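
To make the question above concrete, here is a minimal userspace sketch in C of
a check-before-sleep wait loop. It is not the kernel macro and not the driver
code: struct tx_pool, model_get_tx_buf(), model_xmit_done() and
model_wait_for_buf() are hypothetical names made up for this illustration, and
the "sleep" is only counted, never performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pool of TX buffers. */
struct tx_pool {
	int free_bufs;	/* buffers currently available */
	int sleeps;	/* how many times the waiter would have slept */
};

/* Models get_a_tx_buf(): returns true if a buffer could be taken. */
static bool model_get_tx_buf(struct tx_pool *p)
{
	if (p->free_bufs == 0)
		return false;
	p->free_bufs--;
	return true;
}

/* Models the remote side releasing one buffer (the tx-complete path). */
static void model_xmit_done(struct tx_pool *p)
{
	p->free_bufs++;
}

/*
 * Models the shape of a wait_event-style loop: the condition is tested
 * first, and the waiter only "sleeps" when the test fails. max_sleeps is
 * an assumption standing in for the timeout.
 * Returns true if a buffer was obtained, false on "timeout".
 */
static bool model_wait_for_buf(struct tx_pool *p, int max_sleeps)
{
	assert(max_sleeps >= 0);

	for (;;) {
		if (model_get_tx_buf(p))
			return true;
		if (p->sleeps >= max_sleeps)
			return false;
		p->sleeps++;	/* a real waiter would block here */
	}
}
```

In this model, if the pool is empty, a first model_get_tx_buf() fails, and
model_xmit_done() then runs before model_wait_for_buf() is entered, the loop
returns true with sleeps still at 0. If the real macro behaves the same way,
the missed wake-up alone would not explain a full 15 second sleep.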

Regards,
Arnaud

> 
> after applying this patch we haven't been able to reproduce the 15s timeout
> anymore, whereas before we could easily reproduce it with certain workloads.
> 
>> 3) This patch gets applied when rc1 comes out so that it has 6 or 7 weeks to
>> soak.  No errors or locks are reported due to this patch during that time.
> 
> mentioning locks: i was a bit uncertain about a good way to implement the retry,
> since `rpmsg_upref_sleepers` and `get_a_tx_buf` both acquire the same
> mutex. i briefly considered to add `get_a_tx_buf` into `rpmsg_upref_sleepers` to
> avoid locking the same mutex multiple times, though it adds a bit of complexity
> to the implementation and harms readability a bit.
> are there any recommendations on this topic or are (likely non-contended) locks
> not expensive enough to justify the added complexity?
> 
> thanks,
> tim
> 
> 
>>
>>>
>>> Regards,
>>> Arnaud
>>>
>>>>
>>>> in this case, so we re-try once before we really start to sleep
>>>>
>>>> Signed-off-by: Tim Blechmann <tim@klingt.org>
>>>> ---
>>>>   drivers/rpmsg/virtio_rpmsg_bus.c | 24 +++++++++++++++---------
>>>>   1 file changed, 15 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>> b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>> index 905ac7910c98..2a9d42225e60 100644
>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>> @@ -587,21 +587,27 @@ static int rpmsg_send_offchannel_raw(struct
>>>> rpmsg_device *rpdev,
>>>>         /* no free buffer ? wait for one (but bail after 15 seconds) */
>>>>       while (!msg) {
>>>>           /* enable "tx-complete" interrupts, if not already enabled */
>>>>           rpmsg_upref_sleepers(vrp);
>>>>   -        /*
>>>> -         * sleep until a free buffer is available or 15 secs elapse.
>>>> -         * the timeout period is not configurable because there's
>>>> -         * little point in asking drivers to specify that.
>>>> -         * if later this happens to be required, it'd be easy to add.
>>>> -         */
>>>> -        err = wait_event_interruptible_timeout(vrp->sendq,
>>>> -                    (msg = get_a_tx_buf(vrp)),
>>>> -                    msecs_to_jiffies(15000));
>>>> +        /* make sure to retry to grab tx buffer before we start waiting */
>>>> +        msg = get_a_tx_buf(vrp);
>>>> +        if (msg) {
>>>> +            err = 0;
>>>> +        } else {
>>>> +            /*
>>>> +             * sleep until a free buffer is available or 15 secs elapse.
>>>> +             * the timeout period is not configurable because there's
>>>> +             * little point in asking drivers to specify that.
>>>> +             * if later this happens to be required, it'd be easy to add.
>>>> +             */
>>>> +            err = wait_event_interruptible_timeout(vrp->sendq,
>>>> +                        (msg = get_a_tx_buf(vrp)),
>>>> +                        msecs_to_jiffies(15000));
>>>> +        }
>>>>             /* disable "tx-complete" interrupts if we're the last sleeper */
>>>>           rpmsg_downref_sleepers(vrp);
>>>>             /* timeout ? */
>>>>           if (!err) {
>>
> 
