[RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ
@ 2025-01-20 21:24 Radu Rendec
  2025-01-21 10:49 ` Cristian Marussi
  0 siblings, 1 reply; 5+ messages in thread
From: Radu Rendec @ 2025-01-20 21:24 UTC (permalink / raw)
  To: arm-scmi, linux-arm-kernel; +Cc: Sudeep Holla, Cristian Marussi

With the introduction of no_completion_irq in struct scmi_chan_info in
commit a690b7e6e774 ("firmware: arm_scmi: Add configurable polling mode
for transports") it became possible to enable polling mode by default
when the transport does not support completion interrupts.

Mailbox controllers on the other hand have a similar mechanism to
indicate if completion interrupts are available, using the txdone_irq
flag in struct mbox_controller. This is available since the introduction
of the mailbox framework in commit 2b6d83e2b8b7 ("mailbox: Introduce
framework for mailbox").

Since the mailbox framework is already aware of whether the underlying
transport supports completion interrupts or not, set the flag correctly
in mailbox_chan_setup() to propagate the information to the SCMI core
layer. Without this change, scmi_wait_for_reply() would end up taking
the second branch for a mailbox transport with no completion IRQ
support, and time out waiting for an interrupt that never comes.

Signed-off-by: Radu Rendec <rrendec@redhat.com>
---
 drivers/firmware/arm_scmi/transports/mailbox.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/firmware/arm_scmi/transports/mailbox.c b/drivers/firmware/arm_scmi/transports/mailbox.c
index b66df29814566..33722f2d82787 100644
--- a/drivers/firmware/arm_scmi/transports/mailbox.c
+++ b/drivers/firmware/arm_scmi/transports/mailbox.c
@@ -9,6 +9,7 @@
 #include <linux/err.h>
 #include <linux/device.h>
 #include <linux/mailbox_client.h>
+#include <linux/mailbox_controller.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/platform_device.h>
@@ -221,6 +222,9 @@ static int mailbox_chan_setup(struct scmi_chan_info *cinfo, struct device *dev,
 		return ret;
 	}

+	if (!smbox->chan->mbox->txdone_irq)
+		cinfo->no_completion_irq = true;
+
 	/* Additional unidirectional channel for TX if needed */
 	if (tx && a2p_rx_chan) {
 		smbox->chan_receiver = mbox_request_channel(cl, a2p_rx_chan);
-- 
2.47.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ
  2025-01-20 21:24 [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ Radu Rendec
@ 2025-01-21 10:49 ` Cristian Marussi
  2025-01-22  2:19   ` Radu Rendec
  0 siblings, 1 reply; 5+ messages in thread
From: Cristian Marussi @ 2025-01-21 10:49 UTC (permalink / raw)
  To: Radu Rendec; +Cc: arm-scmi, linux-arm-kernel, Sudeep Holla, Cristian Marussi

On Mon, Jan 20, 2025 at 04:24:18PM -0500, Radu Rendec wrote:
> With the introduction of no_completion_irq in struct scmi_chan_info in
> commit a690b7e6e774 ("firmware: arm_scmi: Add configurable polling mode
> for transports") it became possible to enable polling mode by default
> when the transport does not support completion interrupts.
> 

Hi Radu,

> Mailbox controllers on the other hand have a similar mechanism to
> indicate if completion interrupts are available, using the txdone_irq
> flag in struct mbox_controller. This is available since the introduction
> of the mailbox framework in commit 2b6d83e2b8b7 ("mailbox: Introduce
> framework for mailbox").
> 

mmm...I dont think you can do this....it is a bit tricky to explain BUT
the optional txdone_irq in the mailbox subsystem is NOT a completion
interrupt as intended in the SCMI world, I'll try to explain why in the
following.

The SCMI mailbox transport is defined by a mailbox and an associated
shared memory area: when an SCMI client like Linux wants to send a
message to the server it places the message in the shmem, set the
channel BUSY bit, and rings the doorbell to alert the other side; at 
this point the SCMI server wakes up grab the message from the shmem,
process it as it sees fit, places the reply back into the same shmem 
area, clears the channnel BUSY bit, and CAN optionally ring back the
client with the mailbox doorbell: this completion IRQ is OPTIONAL, and,
if missing, the client will have instead to revert to polling the well
defined channel BUSY bit in the shmem area to know when the reply from
the server is ready.
The end result is that the channel is kept busy until the server has
replied, and cannot be reused until we are sure that the reply is available
AND that the client execution path, waiting for it, has comsumed such
expected reply message. (or times out)
Alternatively, even if we have this completion IRQ we can decide, client
side, to start a polling trabnsaction and dont use the completion
IRQ...but this is another sroty.

Now, in the mailbox subsystem the txdone_irq, when present is meant to
represnt a txACK IRQ, sent automatically by the mailbox controller, when
the transmission has completed and the issued command has been read by
the server(usually when the server clears some specific reg): it does NOT
assure the client that the server has processed the request and that a
reply has been placed into the shmem area, so it is not what SCMI intend
of a completion interrupt: it is just a mere indication that the
transmission client->server has completed in the mailbox and that the
mailbox channel(doorbell) is now free annd available for another
transmission.

Indeed, such TxACK, when received by the mailbox driver causes the next
enqueued message to be sent (i.e. the doorbell to be ring), and, while
it is certainly useful in some other context in which you could use such
TxACK to deduce that the channel is free and available for another
transmission, it is not enough in the SCMI world to guarantee that the
SCMI reply has come back and has been processed by the client.

From another point of view, you could say that the problem is that the
mailbox/doorbell/TxACK is only one side of the story, the other part,
the shared memory area used for the effective message transmission, is
unknown to the mailbox controller and in control of the SCMI stack, so
you cannot let the mailbox subsystem decide when to queue new messages
on the same shared memory area used for cmds and replies.

In fact, such TxACk interrupt has no practical usage in the SCMI stack
and it is even counterproductive to have it, since it can cause the
client to mistakenly attempt the next transmission before the previous
one has completed, so overwriting the in-flight reply.
(..and our busy-looping client side on the channel BUSY bit does not
help here...)

The following commit, though, needed mainly for different reasons, should
have indeed solved also the issue with mailbox controllers having a TxACK,
since it introduces a global channel lock that serializes and inhibits any
further transmissions, at the SCMI layer, even on systems that has a working
and enabled TxACK IRQ.

commit da1642bc97c4ef67f347edcd493bd0a52f88777b (tag: scmi-fixes-6.12)
Author: Justin Chen <justin.chen@broadcom.com>
Date:   Mon Oct 14 09:07:17 2024 -0700

    firmware: arm_scmi: Queue in scmi layer for mailbox implementation

In general, anyway, it would be even better, if possible, not to enable
at all such txACK IRQ at the mailbox level when such a controller is used
for SCMI: as an example, in arm_mhuv3.c, NOT even defining the txACK in
the DT when describimng the mailbox controller does the trick.

Did you see any specific anomaly regarding the SCMI stack when using a
mailbox controller which provides a txACK ?

Sorry for the not so short mail :P ... but if I am missing something
and you are seeing anomalies, please let us know.

Thanks,
Cristian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ
  2025-01-21 10:49 ` Cristian Marussi
@ 2025-01-22  2:19   ` Radu Rendec
  2025-01-22 12:14     ` Cristian Marussi
  0 siblings, 1 reply; 5+ messages in thread
From: Radu Rendec @ 2025-01-22  2:19 UTC (permalink / raw)
  To: Cristian Marussi; +Cc: arm-scmi, linux-arm-kernel, Sudeep Holla

Hi Cristian,

On Tue, 2025-01-21 at 10:49 +0000, Cristian Marussi wrote:
> mmm...I dont think you can do this....it is a bit tricky to explain BUT
> the optional txdone_irq in the mailbox subsystem is NOT a completion
> interrupt as intended in the SCMI world, I'll try to explain why in the
> following.
> 
> The SCMI mailbox transport is defined by a mailbox and an associated
> shared memory area: when an SCMI client like Linux wants to send a
> message to the server it places the message in the shmem, set the
> channel BUSY bit, and rings the doorbell to alert the other side; at 
> this point the SCMI server wakes up grab the message from the shmem,
> process it as it sees fit, places the reply back into the same shmem 
> area, clears the channnel BUSY bit, and CAN optionally ring back the
> client with the mailbox doorbell: this completion IRQ is OPTIONAL, and,
> if missing, the client will have instead to revert to polling the well
> defined channel BUSY bit in the shmem area to know when the reply from
> the server is ready.
> The end result is that the channel is kept busy until the server has
> replied, and cannot be reused until we are sure that the reply is available
> AND that the client execution path, waiting for it, has comsumed such
> expected reply message. (or times out)
> Alternatively, even if we have this completion IRQ we can decide, client
> side, to start a polling trabnsaction and dont use the completion
> IRQ...but this is another sroty.
> 
> Now, in the mailbox subsystem the txdone_irq, when present is meant to
> represnt a txACK IRQ, sent automatically by the mailbox controller, when
> the transmission has completed and the issued command has been read by
> the server(usually when the server clears some specific reg): it does NOT
> assure the client that the server has processed the request and that a
> reply has been placed into the shmem area, so it is not what SCMI intend
> of a completion interrupt: it is just a mere indication that the
> transmission client->server has completed in the mailbox and that the
> mailbox channel(doorbell) is now free annd available for another
> transmission.
> 
> Indeed, such TxACK, when received by the mailbox driver causes the next
> enqueued message to be sent (i.e. the doorbell to be ring), and, while
> it is certainly useful in some other context in which you could use such
> TxACK to deduce that the channel is free and available for another
> transmission, it is not enough in the SCMI world to guarantee that the
> SCMI reply has come back and has been processed by the client.
> 
> From another point of view, you could say that the problem is that the
> mailbox/doorbell/TxACK is only one side of the story, the other part,
> the shared memory area used for the effective message transmission, is
> unknown to the mailbox controller and in control of the SCMI stack, so
> you cannot let the mailbox subsystem decide when to queue new messages
> on the same shared memory area used for cmds and replies.
> 
> In fact, such TxACk interrupt has no practical usage in the SCMI stack
> and it is even counterproductive to have it, since it can cause the
> client to mistakenly attempt the next transmission before the previous
> one has completed, so overwriting the in-flight reply.
> (..and our busy-looping client side on the channel BUSY bit does not
> help here...)
> 
> The following commit, though, needed mainly for different reasons, should
> have indeed solved also the issue with mailbox controllers having a TxACK,
> since it introduces a global channel lock that serializes and inhibits any
> further transmissions, at the SCMI layer, even on systems that has a working
> and enabled TxACK IRQ.
> 
> commit da1642bc97c4ef67f347edcd493bd0a52f88777b (tag: scmi-fixes-6.12)
> Author: Justin Chen <justin.chen@broadcom.com>
> Date:   Mon Oct 14 09:07:17 2024 -0700
> 
>     firmware: arm_scmi: Queue in scmi layer for mailbox implementation
> 
> In general, anyway, it would be even better, if possible, not to enable
> at all such txACK IRQ at the mailbox level when such a controller is used
> for SCMI: as an example, in arm_mhuv3.c, NOT even defining the txACK in
> the DT when describimng the mailbox controller does the trick.
> 
> Did you see any specific anomaly regarding the SCMI stack when using a
> mailbox controller which provides a txACK ?
> 
> Sorry for the not so short mail :P ... but if I am missing something
> and you are seeing anomalies, please let us know.

No need to apologize, and many thanks for taking the time to explain
everything in such detail. Much appreciated! What you explained makes
perfect sense, and now I clearly understand why the changes I proposed
would not work.

I have quite the opposite problem. I'm using a mailbox controller that
does *not* provide a txACK interrupt (or any kind of interrupt for that
matter), and the problem is that SCMI transfers do not complete because
scmi_wait_for_reply() times out.

I tracked this down to xfer->hdr.poll_completion being false, and
therefore scmi_wait_for_reply() taking the second branch, where it
calls wait_for_completion_timeout(). Looking at do_xfer(),
xfer->hdr.poll_completion is set to true if is_polling_enabled()
returns true, and one of the conditions for this to happen is that the
no_completion_irq flag in struct scmi_chan_info is set to true. But it
defaults to false, and in my case there is nothing there to set it to
true. This is what led me to the RFC patch I sent.

Even if now that you explained it I understand why the txACK interrupt
is different from the SCMI message having been processed and replied, I
still fail to understand how it's all supposed to work for the mailbox
transport case where the completion IRQ (which you said was optional)
is indeed *not* implemented.

Since xfer->hdr.poll_completion defaults to false, I looked at what
could trigger the completion that scmi_wait_for_reply() waits on, in
the case of a mailbox transport. The way it's set up is quite
convoluted but if I'm reading the code correctly:
 * core->rx_callback (in mailbox.c) is set to scmi_rx_callback()
 * chan->cl->rx_callback is set to rx_callback() (in mailbox.c)
Therefore, the (reverse) call path leading to completion would be:
complete(&xfer->done)
scmi_handle_response()
scmi_rx_callback()
rx_callback()
mbox_chan_received_data()

If I understand correctly, mbox_chan_received_data() is supposed to be
called by the mailbox provider driver when the mailbox receives data
e.g. the SCMI server rings the bell to indicate completion. But doesn't
this mean that the completion notification (through mailbox doorbell)
is expected to be implemented by default?

Now I get that the mailbox layer has no knowledge of the client layer
(SCMI in this case) and that the txACK IRQ in the mailbox layer is not
meaningful or useful to the SCMI layer. But in my (uneducated) opinion,
there should still be a way to set the no_completion_irq flag in struct
scmi_chan_info to true for the mailbox transport, for the case when the
SCMI server does not implement the completion notification. Perhaps
through the device tree?

Now it's my turn to apologize for a long email :) Hopefully it makes
sense and it's clear now what problem I'm trying to solve. Thanks again
for all the input provided so far, and I'm looking forward to some more
answers.

-- 
Best regards,
Radu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ
  2025-01-22  2:19   ` Radu Rendec
@ 2025-01-22 12:14     ` Cristian Marussi
  2025-01-22 21:10       ` Radu Rendec
  0 siblings, 1 reply; 5+ messages in thread
From: Cristian Marussi @ 2025-01-22 12:14 UTC (permalink / raw)
  To: Radu Rendec; +Cc: Cristian Marussi, arm-scmi, linux-arm-kernel, Sudeep Holla

On Tue, Jan 21, 2025 at 09:19:37PM -0500, Radu Rendec wrote:
> Hi Cristian,
> 

Hi,

> On Tue, 2025-01-21 at 10:49 +0000, Cristian Marussi wrote:
> > mmm...I dont think you can do this....it is a bit tricky to explain BUT
> > the optional txdone_irq in the mailbox subsystem is NOT a completion
> > interrupt as intended in the SCMI world, I'll try to explain why in the
> > following.

[snip]

> > Sorry for the not so short mail :P ... but if I am missing something
> > and you are seeing anomalies, please let us know.
> 
> No need to apologize, and many thanks for taking the time to explain
> everything in such detail. Much appreciated! What you explained makes
> perfect sense, and now I clearly understand why the changes I proposed
> would not work.
> 
> I have quite the opposite problem. I'm using a mailbox controller that
> does *not* provide a txACK interrupt (or any kind of interrupt for that
> matter), and the problem is that SCMI transfers do not complete because
> scmi_wait_for_reply() times out.
> 

Do you mean a unidirectional mailbox controller (like ARM MHUv3) that
allow for bidirectionality ONLY if you effectively use a pair of them,
one for each direction ?

> I tracked this down to xfer->hdr.poll_completion being false, and
> therefore scmi_wait_for_reply() taking the second branch, where it
> calls wait_for_completion_timeout(). Looking at do_xfer(),
> xfer->hdr.poll_completion is set to true if is_polling_enabled()
> returns true, and one of the conditions for this to happen is that the
> no_completion_irq flag in struct scmi_chan_info is set to true. But it
> defaults to false, and in my case there is nothing there to set it to
> true. This is what led me to the RFC patch I sent.
> 
> Even if now that you explained it I understand why the txACK interrupt
> is different from the SCMI message having been processed and replied, I
> still fail to understand how it's all supposed to work for the mailbox
> transport case where the completion IRQ (which you said was optional)
> is indeed *not* implemented.
> 

Well...the answer here is easy...it is not supposed to work :P

...I introduced no_completion_irq to centralize message handling into the
SCMI core instead to leave it to the individual transports, BUT we did
that because it was needed by the SMC transport which originally had NO
completion IRQs by default, but that, later, was evolved to have an
optionally defined completion IRQ too.
You'll see that the no_completion_irq flag is set there based on the
actual DT description (or lack there of)

On the other side, mailbox HW (that we had at least) originally were
just bidirectional in nature (like ARM MHUv1 MHUv2) and only recently
with ARM MHUv3 we had to support unidirectional HW, BUT, again, in our
setup the unidirectional MHUv3 mboxes HW were anyway used in pair, so
as to achieve bidirectionality in any case, so again the case of
no_completion_irq on mailboxes was still not needed.

> Since xfer->hdr.poll_completion defaults to false, I looked at what
> could trigger the completion that scmi_wait_for_reply() waits on, in
> the case of a mailbox transport. The way it's set up is quite
> convoluted but if I'm reading the code correctly:
>  * core->rx_callback (in mailbox.c) is set to scmi_rx_callback()
>  * chan->cl->rx_callback is set to rx_callback() (in mailbox.c)
> Therefore, the (reverse) call path leading to completion would be:
> complete(&xfer->done)
> scmi_handle_response()
> scmi_rx_callback()
> rx_callback()
> mbox_chan_received_data()
> 
> If I understand correctly, mbox_chan_received_data() is supposed to be
> called by the mailbox provider driver when the mailbox receives data
> e.g. the SCMI server rings the bell to indicate completion. But doesn't
> this mean that the completion notification (through mailbox doorbell)
> is expected to be implemented by default?

Nope because when the SCMI client requires a polling transaction, beside
the busy-looping on the channel BUSY bit on its side, it will also ask
for a polling transaction by setting the proper bit in the channel flags
(as detailed in spec chap 5.1) of the shmem area, so that the server will
know that the completion IRQ, even if existing is NOT required for that
transaction: in any case, polling or not, the server will clear the channel
BUSY bit when it is done, since that is the mechanism that is used by the
client and server to pass back and forth the 'ownership' of the channel.
[BIG RED WARNING: I never remember if the channel bit I blabbed about
above is defined as a BUSY bit or as a FREE bit....so the logic could be
reversed.]

The above backtrace is only for the IRQ path...if you are polling, you
are indeed busy looping inside scmi_wait_for_reply() and if you dont get
a timeout (or an out of order invalid reply) the reply if picked up at
the end of the poll loop by a fetch_reponse() inside that same
wait_for_reply....since polling anyway is meant to pick up Delayed
Responses OR notification that does on the P2A channel (RX in the SCMI
stack) which needs to have an associated IRQ to work at all.

Beside your scenario of a system lacking a completion IRQ, polling
transactions can be asked for a speficic SCMI command at will like to
issue a clock_enable(), so polling is generally working as far as I know
and tested....you can easily test that also by just enaboing
force_polling flag in the transport desc (for testing/debug purposes
only)

> 
> Now I get that the mailbox layer has no knowledge of the client layer
> (SCMI in this case) and that the txACK IRQ in the mailbox layer is not
> meaningful or useful to the SCMI layer. But in my (uneducated) opinion,
> there should still be a way to set the no_completion_irq flag in struct
> scmi_chan_info to true for the mailbox transport, for the case when the
> SCMI server does not implement the completion notification. Perhaps
> through the device tree?

As I said, the flag is NOT settable, since it was a scenario that was
never needed....but you are right.. you will have to somehow set that
flag to handle this kind of setup.

Now, I was hoping to be able to somehow handle this with the existing
bindings, because if you look at 'mboxes' definition in:

  Documentation/devicetree/bindings/firmware/arm,scmi.yaml

..you will see that we try to handle all the possible combinations of
mailbox controller types implicitly by deducing the kind of controller
by the combination of the number of defined shmem/mbox. (this is because
unfortunately we cannot rely on naming for backward compatibility issue
since mbox names was NOT required prop from the start).

So, I would have loved to be able to deduce in mailbox_chan_setup() that
in a specific configuration there is no completion interrupt and set
accordingly the no_completion_irq flag using the existing bindings....

...but seems to me that it is not possible since there is the assumption
that unidirectional mailbox are always used in pair to provide a completion
IRQ....and we run out of possibilities to exploit this implicit info
(again because we address the most common case with the current bindings)

At the end, yes I think we should be able to describe this particular HW
configuration somehow, so a new binding could be the answer, since this
is exactly HW description, the tricky part would be to add something
that can cope well with all the above existing implict combinations
(maybe is not a great issue eh .... I have not though about it...)

...BUT...before all of this, just to be paranoid, I would check to make
sure that your HW is really unidirectional in nature and not that, instead,
is a bidirectional mailbox controller, where the SCMI server is really not
taking care to ring the doorbell back ? (which usually is the same
doorbell used in the client->server direction)

(I ignore which server implementation you use...so just to make sure...)

> 
> Now it's my turn to apologize for a long email :) Hopefully it makes
> sense and it's clear now what problem I'm trying to solve. Thanks again
> for all the input provided so far, and I'm looking forward to some more
> answers.
> 

... better stop apologizing here and now, given the average payload size
of these recent mail exchanges :P

Thanks,
Cristian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ
  2025-01-22 12:14     ` Cristian Marussi
@ 2025-01-22 21:10       ` Radu Rendec
  0 siblings, 0 replies; 5+ messages in thread
From: Radu Rendec @ 2025-01-22 21:10 UTC (permalink / raw)
  To: Cristian Marussi; +Cc: arm-scmi, linux-arm-kernel, Sudeep Holla

Hi Cristian,

On Wed, 2025-01-22 at 12:14 +0000, Cristian Marussi wrote:
> On Tue, Jan 21, 2025 at 09:19:37PM -0500, Radu Rendec wrote:
> 
> > On Tue, 2025-01-21 at 10:49 +0000, Cristian Marussi wrote:
> > > mmm...I dont think you can do this....it is a bit tricky to explain BUT
> > > the optional txdone_irq in the mailbox subsystem is NOT a completion
> > > interrupt as intended in the SCMI world, I'll try to explain why in the
> > > following.
> 
> [snip]
> 
> > > Sorry for the not so short mail :P ... but if I am missing something
> > > and you are seeing anomalies, please let us know.
> > 
> > No need to apologize, and many thanks for taking the time to explain
> > everything in such detail. Much appreciated! What you explained makes
> > perfect sense, and now I clearly understand why the changes I proposed
> > would not work.
> > 
> > I have quite the opposite problem. I'm using a mailbox controller that
> > does *not* provide a txACK interrupt (or any kind of interrupt for that
> > matter), and the problem is that SCMI transfers do not complete because
> > scmi_wait_for_reply() times out.
> 
> Do you mean a unidirectional mailbox controller (like ARM MHUv3) that
> allow for bidirectionality ONLY if you effectively use a pair of them,
> one for each direction ?

Yes, that's what I meant. But I had a very distorted understanding of
how that worked, and now it's much clearer after reading your email,
looking at arm,scmi.yaml and staring more at the code :) Please see
further details below.

[snip]

> > Since xfer->hdr.poll_completion defaults to false, I looked at what
> > could trigger the completion that scmi_wait_for_reply() waits on, in
> > the case of a mailbox transport. The way it's set up is quite
> > convoluted but if I'm reading the code correctly:
> >  * core->rx_callback (in mailbox.c) is set to scmi_rx_callback()
> >  * chan->cl->rx_callback is set to rx_callback() (in mailbox.c)
> > Therefore, the (reverse) call path leading to completion would be:
> > complete(&xfer->done)
> > scmi_handle_response()
> > scmi_rx_callback()
> > rx_callback()
> > mbox_chan_received_data()
> > 
> > If I understand correctly, mbox_chan_received_data() is supposed to be
> > called by the mailbox provider driver when the mailbox receives data
> > e.g. the SCMI server rings the bell to indicate completion. But doesn't
> > this mean that the completion notification (through mailbox doorbell)
> > is expected to be implemented by default?
> 
> Nope because when the SCMI client requires a polling transaction, beside
> the busy-looping on the channel BUSY bit on its side, it will also ask
> for a polling transaction by setting the proper bit in the channel flags
> (as detailed in spec chap 5.1) of the shmem area, so that the server will
> know that the completion IRQ, even if existing is NOT required for that
> transaction: in any case, polling or not, the server will clear the channel
> BUSY bit when it is done, since that is the mechanism that is used by the
> client and server to pass back and forth the 'ownership' of the channel.
> [BIG RED WARNING: I never remember if the channel bit I blabbed about
> above is defined as a BUSY bit or as a FREE bit....so the logic could be
> reversed.]
> 
> The above backtrace is only for the IRQ path...if you are polling, you
> are indeed busy looping inside scmi_wait_for_reply() and if you dont get
> a timeout (or an out of order invalid reply) the reply if picked up at
> the end of the poll loop by a fetch_reponse() inside that same
> wait_for_reply....since polling anyway is meant to pick up Delayed
> Responses OR notification that does on the P2A channel (RX in the SCMI
> stack) which needs to have an associated IRQ to work at all.
> 
> Beside your scenario of a system lacking a completion IRQ, polling
> transactions can be asked for a speficic SCMI command at will like to
> issue a clock_enable(), so polling is generally working as far as I know
> and tested....you can easily test that also by just enaboing
> force_polling flag in the transport desc (for testing/debug purposes
> only)

I'm sorry, I didn't phrase the question very well. Polling does work as
expected, and it was clearly proven by the fact that I actually got the
SCMI communication working with the patch I sent initially - because it
was forcing the polling mode (even though the patch itself was wrong
and had no practical value).

The backtrace I mentioned is clearly for the IRQ path. What I was
getting at was that it seems to be the only way for the completion to
be completed. And since in the default configuration (with the default
flags) scmi_wait_for_reply() takes the second branch, where it waits
for that completion, it means that by default the completion IRQ needs
to exist for the kernel to know that the transfer has been replied to.

What I really meant to ask was this: does it mean that in the default
configuration (with the default flags) the completion IRQ is required?
And I think you already answered, indirectly :)

> > Now I get that the mailbox layer has no knowledge of the client layer
> > (SCMI in this case) and that the txACK IRQ in the mailbox layer is not
> > meaningful or useful to the SCMI layer. But in my (uneducated) opinion,
> > there should still be a way to set the no_completion_irq flag in struct
> > scmi_chan_info to true for the mailbox transport, for the case when the
> > SCMI server does not implement the completion notification. Perhaps
> > through the device tree?
> 
> As I said, the flag is NOT settable, since it was a scenario that was
> never needed....but you are right.. you will have to somehow set that
> flag to handle this kind of setup.
> 
> Now, I was hoping to be able to somehow handle this with the existing
> bindings, because if you look at 'mboxes' definition in:
> 
>   Documentation/devicetree/bindings/firmware/arm,scmi.yaml
> 
> ..you will see that we try to handle all the possible combinations of
> mailbox controller types implicitly by deducing the kind of controller
> by the combination of the number of defined shmem/mbox. (this is because
> unfortunately we cannot rely on naming for backward compatibility issue
> since mbox names was NOT required prop from the start).

That makes perfect sense, and the description in arm,scmi.yaml (which I
should have looked at in the first place) was a real eye opener. Thanks
for that!!

> So, I would have loved to be able to deduce in mailbox_chan_setup() that
> in a specific configuration there is no completion interrupt and set
> accordingly the no_completion_irq flag using the existing bindings....
> 
> ...but seems to me that it is not possible since there is the assumption
> that unidirectional mailbox are always used in pair to provide a completion
> IRQ....and we run out of possibilities to exploit this implicit info
> (again because we address the most common case with the current bindings)

True, but maybe the assumption is wrong. In the scenario I'm describing
there are 2 shmem areas and only one unidirectional mailbox for Tx. The
Rx mailbox does not exist because there is no completion IRQ, and the
only way you can implement a Rx mailbox is through an IRQ. So in theory
you could infer this configuration from a "1 mbox / 2 shmem" setup,
which is not one of the currently supported combinations. But it would
probably not scale well for the more general case.

> At the end, yes I think we should be able to describe this particular HW
> configuration somehow, so a new binding could be the answer, since this
> is exactly HW description, the tricky part would be to add something
> that can cope well with all the above existing implict combinations
> (maybe is not a great issue eh .... I have not though about it...)
> 
> ...BUT...before all of this, just to be paranoid, I would check to make
> sure that your HW is really unidirectional in nature and not that, instead,
> is a bidirectional mailbox controller, where the SCMI server is really not
> taking care to ring the doorbell back ? (which usually is the same
> doorbell used in the client->server direction)

I think you nailed it. Looking at the mailbox driver implementation
(which actually has an IRQ handler that calls mbox_chan_received_data)
and the way the SCMI configuration is described in the device tree, I
think that's exactly the case. And the completion IRQ doesn't work
either because the SCMI server doesn't trigger it or there is a problem
with the IRQ plumbing along the way.

Unfortunately, because the mailboxes can be either unidirectional or
bidirectional and they are not necessarily mapped 1:1 to the SCMI
channels, it can be very confusing for the uninitiated eye (like
myself). If I had known everything I know now, it would have been
pretty obvious from the beginning that there was something wrong with
the SCMI server (or completion IRQ routing).

> (I ignore which server implementation you use...so just to make sure...)

Unfortunately the mailbox driver is not upstream (which is always a sad
thing), and that's why I didn't mention what it was. It's also based on
a 5.10.x kernel but simple enough that it compiles cleanly with the
latest upstream. With everything I have learned, now I can tell that
ironically the driver itself seems to be OK. For what it's worth, the
BSP kernel where this comes from includes an ugly hack that changes the
initialization of xfer->hdr.poll_completion to true instead of false to
force polling mode and cope with the broken completion IRQ :-/

> > Now it's my turn to apologize for a long email :) Hopefully it makes
> > sense and it's clear now what problem I'm trying to solve. Thanks again
> > for all the input provided so far, and I'm looking forward to some more
> > answers.
> 
> ... better stop apologizing here and now, given the average payload size
> of these recent mail exchanges :P

Fair enough :) Instead, let me thank you, again, for your time and
patience, and for all the detailed information you have provided.

-- 
Best regards,
Radu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-01-22 21:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-20 21:24 [RFC PATCH] firmware: arm_scmi: Support mailbox transports with no completion IRQ Radu Rendec
2025-01-21 10:49 ` Cristian Marussi
2025-01-22  2:19   ` Radu Rendec
2025-01-22 12:14     ` Cristian Marussi
2025-01-22 21:10       ` Radu Rendec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).