From: Arnaud POULIQUEN <arnaud.pouliquen@foss.st.com>
To: "Agostiño Carballeira"
<agostino.carballeira@native-instruments.com>,
"Tim Blechmann" <tim@klingt.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>,
Tim Blechmann <tim.blechmann@gmail.com>,
<linux-remoteproc@vger.kernel.org>
Subject: Re: [PATCH 1/1] rpmsg: virtio_rpmsg_bus - prevent possible race condition
Date: Thu, 14 Sep 2023 19:25:03 +0200 [thread overview]
Message-ID: <12d59a09-bb83-26e7-321e-1407b3f814e8@foss.st.com> (raw)
In-Reply-To: <CAG2LOc42AG5H56=tzz8_2WrrBiy9d74qYmgPQaEVGrzWTNqodg@mail.gmail.com>
On 9/13/23 12:57, Agostiño Carballeira wrote:
> Hello!
>
> I am the main developer of the CM4 firmware for this project.
> First of all thanks for taking the time to analyse the trace.
> Further analysis on my side has shown that the CM4 is not completely stalled in
> this situation, but it is stuck on a busywait loop within the MAILBOX_Notify
> function, awaiting a window to send a "buffer used" notification to the CA7. So
> it seems that the mailbox is locked both ways and neither side is giving way to
> unclog the traffic jam.
The Cortex-M4 is probably blocked waiting for Linux to acknowledge a mailbox
notification [1].
Are you 100% sure that the firmware never exits this loop while the issue occurs?
Could you provide the value of the 'id' variable and the call stack?
That would mean that the mailbox has not been acknowledged by Linux ([2] or [3]).
I don't understand how that could be possible...
[1]
https://github.com/STMicroelectronics/STM32CubeMP1/blob/master/Middlewares/Third_Party/OpenAMP/mw_if/platform_if/mbox_ipcc_template.c#L182
[2] https://elixir.bootlin.com/linux/latest/source/drivers/mailbox/stm32-ipcc.c#L105
[3] https://elixir.bootlin.com/linux/latest/source/drivers/mailbox/stm32-ipcc.c#L134
> Interestingly, when we replace rpmsg_send by rpmsg_trysend + busywait loop, this
> mutual stall doesn't happen at all.
What do you mean by busywait? Do you add a delay between two rpmsg_trysend()
calls? If so, the added delay probably masks the issue.
That said, rpmsg_trysend() is recommended for bare-metal firmware, to avoid
blocking the system.
> Does that give you any clues?
>
> Thanks
> Agos
>
> On Wed, Sep 13, 2023 at 10:47 AM Tim Blechmann <tim@klingt.org
> <mailto:tim@klingt.org>> wrote:
>
> many thanks for your analysis, very interesting.
>
> > please find below an extract of your trace with my analysis:
> >
> >
> > stm32mp1_bulk_p-390 [001] ..... 907.241226: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0
> > stm32mp1_bulk_p-390 [001] ..... 907.241228: virtio_rpmsg_send <-rpmsg_send
> > stm32mp1_bulk_p-390 [001] ..... 907.241237: virtqueue_enable_cb <-rpmsg_send_offchannel_raw
> > stm32mp1_bulk_p-390 [001] ..... 907.241239: virtqueue_enable_cb_prepare
> >
> > At this point it seems that there are no more free TX buffers.
> >
> > <-rpmsg_recv_single
> > kworker/0:4-67 [000] ..... 907.242533: vring_interrupt <-rproc_vq_interrupt
> > kworker/0:4-67 [000] ..... 907.242536: rpmsg_xmit_done
> >
> > Here you receive an interrupt indicating that a TX buffer has been released
> > by the remote side; that's the expected behavior.
> >
> >
> > <-rpmsg_send_offchannel_raw
> > stm32mp1_bulk_p-390 [000] ..... 984.054941: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0
> > stm32mp1_bulk_p-390 [000] ..... 984.054943: virtio_rpmsg_send <-rpmsg_send
> > stm32mp1_bulk_p-390 [000] ..... 984.054956: virtqueue_enable_cb <-rpmsg_send_offchannel_raw
> > stm32mp1_bulk_p-390 [000] ..... 984.054958: virtqueue_enable_cb_prepare <-virtqueue_enable_cb
> > stm32mp1_bulk_p-390 [000] ..... 999.398667: virtqueue_disable_cb <-rpmsg_send_offchannel_raw
> > stm32mp1_bulk_p-390 [000] ..... 999.414840: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0
> > stm32mp1_bulk_p-390 [000] ..... 999.414843: virtio_rpmsg_send <-rpmsg_send
> > stm32mp1_bulk_p-390 [000] ..... 999.414855: virtqueue_enable_cb <-rpmsg_send_offchannel_raw
> > stm32mp1_bulk_p-390 [000] ..... 999.414857: virtqueue_enable_cb_prepare
> >
> > Here again there are no more TX buffers. From this point on there is no
> > more activity, neither in TX nor in RX, until the 15-second timeout.
> > If you look at rproc_vq_interrupt, the last one occurs at 907.242533.
> >
> >
> > As there are no more virtqueue interrupt calls in either direction, the
> > issue is probably either in the Cortex-M firmware, which seems to be
> > stalled, or due to the IRQs being disabled in Linux.
>
> afaict we can rule out a complete stall of the cortex-m firmware: if we
> change the rpmsg_send to a rpmsg_trysend/msleep loop, the trysend will
> succeed to get a buffer after a few iterations.
>
> > or due to the IRQs being disabled in Linux.
>
> do you have some recommendations how we could trace this?
>
> many thanks,
> tim
>
> > <-virtqueue_enable_cb
> > stm32mp1_bulk_p-390 [000] ..... 1014.758678: virtqueue_disable_cb <-rpmsg_send_offchannel_raw
> > stm32mp1_bulk_p-390 [000] ..... 1014.774802: rpmsg_send <-rpmsg_intercore_send_buffer.constprop.0
> > stm32mp1_bulk_p-390 [000] ..... 1014.774804: virtio_rpmsg_send <-rpmsg_send
> > stm32mp1_bulk_p-390 [000] ..... 1014.774815: virtqueue_enable_cb
>
>
>
>
> --
>
> Agostiño Carballeira
>
> Senior Embedded Software Engineer
>
> agostino.carballeira@native-instruments.com
> <mailto:agostino.carballeira@native-instruments.com>
>
>
> Native Instruments <https://www.native-instruments.com>, now including
> iZotope, Plugin Alliance, and Brainworx
2023-09-04 8:36 [PATCH 1/1] rpmsg: virtio_rpmsg_bus - prevent possible race condition Tim Blechmann
2023-09-04 13:52 ` Arnaud POULIQUEN
2023-09-04 20:43 ` Mathieu Poirier
2023-09-05 1:33 ` Tim Blechmann
2023-09-05 16:02 ` Mathieu Poirier
2023-09-08 15:04 ` Arnaud POULIQUEN
2023-09-09 6:28 ` Tim Blechmann
2023-09-11 17:20 ` Arnaud POULIQUEN
2023-09-13 1:07 ` Tim Blechmann
2023-09-13 1:11 ` Tim Blechmann
2023-09-13 7:44 ` Arnaud POULIQUEN
2023-09-13 8:47 ` Tim Blechmann
2023-09-13 10:02 ` Arnaud POULIQUEN
[not found] ` <CAG2LOc42AG5H56=tzz8_2WrrBiy9d74qYmgPQaEVGrzWTNqodg@mail.gmail.com>
2023-09-14 17:25 ` Arnaud POULIQUEN [this message]
2023-09-16 1:38 ` Tim Blechmann
2023-09-13 10:10 ` Arnaud POULIQUEN
2023-09-13 14:46 ` Mathieu Poirier
2023-09-07 4:51 Tim Blechmann