linux-arm-kernel.lists.infradead.org archive mirror
From: Sudeep Holla <sudeep.holla@arm.com>
To: jack21 <jackhuang021@gmail.com>
Cc: Cristian Marussi <cristian.marussi@arm.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	<arm-scmi@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	Huangjie <huangjie1663@phytium.com.cn>
Subject: Re: [PATCH] dirvers: scmi: poll again when transfer reach timeout
Date: Tue, 11 Feb 2025 15:37:27 +0000
Message-ID: <Z6tutxs2UE7vjjJ6@bogus>
In-Reply-To: <Z5KdJa0QOIYIIRv4@pluto>

On Thu, Jan 23, 2025 at 07:48:53PM +0000, Cristian Marussi wrote:
> On Thu, Jan 23, 2025 at 01:38:30PM +0300, Dan Carpenter wrote:
> > s/dirvers/drivers/
> > 
> > On Thu, Jan 23, 2025 at 04:33:24PM +0800, jack21 wrote:
> > > From: Huangjie <huangjie1663@phytium.com.cn>
> > > 
> > > spin_until_cond() does not really hold a spin lock, so a spurious timeout
> > > may occur on a preemptible kernel if the task is preempted after
> > > spin_until_cond().
> > > 
> > > Check the status again when the timeout is reached to prevent an
> > > incorrect judgement of timeout.
> > > 
> 
> Hi,
> 
> probably another not so short email of mine :P ... 
> 
> > 
> > I suspect the real issue is that we exit the spin loop when
> > try_wait_for_completion(&xfer->done) is true.  Probably we should add
> > that as a Fixes tag?:
> > 
> 
> The Kernel SCMI stack, acting as an SCMI agent, has to cope with the
> possible (even though rare) scenario of receiving Out of Order messages
> when dealing with async commands...
> 
> ...IOW, what to do if, after having issued an AsyncCmd, the Delayed Response
> is received BEFORE the corresponding immediate reply to the initial request:
> such a wicked (and rare) situation could be the result of a misbehaving
> platform server OR simply due to parallelization of activities on the A2P
> and P2A channels on some transports, i.e. the platform sent the reply and
> the delayed_response in the correct order but the transport delivered them
> OoO, being transmitted on 2 distinct physical channels...hard but not
> impossible.
> 
> Kernel side we address this OoO scenario by assuming that, if a valid
> Delayed Response to an in-flight Async-cmd is received on the P2A chan
> before the immediate A2P reply, the transaction itself is good and we
> can progress by just logging and ignoring/swallowing the missing
> immediate-reply, which will probably arrive later, and just carry on
> processing the Delayed Response.
> 
> In order to do that, we maintain, in fact, a per-message state-machine,
> and inside scmi_msg_response_validate(), when we detect the OoO condition,
> we cause the wait-for-immediate-reply to terminate by forcibly issuing a
> complete(xfer->done)....
> 
> ...this works straight away for the non-polled IRQ transactions since
> it causes the wait_for_completion() to terminate cleanly, BUT in order to
> also cut short the busy-wait in the polling case we need that additional
> try_wait_for_completion()....so as not to spin forever for a message
> that we don't care about anymore...
> [ note that any late-arriving immediate-reply will be discarded at this
>  point ]
> 
> For this reason, I think that this patch, while correctly checking the
> poll_done() condition when the spinloop terminates for the reason
> explained in the commit message, should also check whether the loop was
> instead forcibly terminated by the OoO scenario...if not, we will end up
> considering the polling to have timed out, while instead it was forcibly
> terminated by the OoO state machine complete() and we just want to ignore
> such a missing message...
> 
> So what about checking this instead (untested):
> 
> +	if (!completion_done(&xfer->done) &&
> +	    !info->desc->ops->poll_done(cinfo, xfer))

Were you able to check this? I am waiting to hear back from you on this.
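
For anyone following along, the intent of the check quoted above can be
modelled in userspace roughly as below. This is only a sketch under stated
assumptions: struct model_xfer and model_poll_timed_out() are hypothetical
names, the two booleans merely stand in for the results of
completion_done(&xfer->done) and ops->poll_done(cinfo, xfer), and the actual
kernel completion semantics are not modelled.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the transfer state; these are NOT
 * the real SCMI structures.
 */
struct model_xfer {
	bool completed;   /* stands in for completion_done(&xfer->done) */
	bool reply_ready; /* stands in for ops->poll_done(cinfo, xfer) */
};

/*
 * Mirrors the suggested condition: report a timeout only when the wait was
 * not cut short by the OoO complete() AND the transport still has no reply.
 */
static bool model_poll_timed_out(const struct model_xfer *x)
{
	return !x->completed && !x->reply_ready;
}
```

With this model, a transfer whose reply showed up late (reply_ready set) or
whose wait was terminated by the OoO path (completed set) is not reported as
timed out; only the case where both are false is.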

-- 
Regards,
Sudeep


Thread overview: 4+ messages
2025-01-23  8:33 [PATCH] dirvers: scmi: poll again when transfer reach timeout jack21
2025-01-23 10:38 ` Dan Carpenter
2025-01-23 19:48   ` Cristian Marussi
2025-02-11 15:37     ` Sudeep Holla [this message]