From: Cristian Marussi <cristian.marussi@arm.com>
To: Artem Shimko <artyom.shimko@gmail.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>,
Cristian Marussi <cristian.marussi@arm.com>,
a.shimko.dev@gmail.com, arm-scmi@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drivers: scmi: Add completion timeout handling for raw mode transfers
Date: Wed, 1 Oct 2025 12:57:27 +0100 [thread overview]
Message-ID: <aN0T0tvsL6t6R3yF@pluto> (raw)
In-Reply-To: <20250929142856.540590-1-a.shimko.dev@gmail.com>
On Mon, Sep 29, 2025 at 05:28:55PM +0300, Artem Shimko wrote:
> Fix race conditions in SCMI raw mode implementation by adding proper
> completion timeout handling. Multiple tests[1] in the SCMI test suite
> were failing due to early clearing of SCMI_XFER_FLAG_IS_RAW flag in
> scmi_xfer_raw_put() function.
Hi Artem,
>
> TRANS=raw
> PROTOCOLS=base,clock,power_domain,performance,system_power,sensor,
> voltage,reset,powercap,pin_control VERBOSE=5
>
Glad to see that someone is using the test-suite im RAW mode out there
in the real world :P
> The root cause:
> Tests were failing on poll() system calls with this condition:
> if (!raw || (idx == SCMI_RAW_REPLY_QUEUE && !SCMI_XFER_IS_RAW(xfer)))
> return;
>
> The SCMI_XFER_FLAG_IS_RAW flag was being cleared prematurely before
> the transfer completion was properly acknowledged, causing the poll
> to return on timeout and tests to fail.
>
I have to say I had seen some anomalies, hardly reproducible, and I
think you have a point here, good catch !
The xfer flags handling is racy and it could lead to the observed
anomalies...BUT....
> Сhanges implemented:
> 1. Add completion wait with timeout in scmi_xfer_raw_worker()
> 2. Signal completion in scmi_raw_message_report()
>
.. I think the fix in your patch is overkill...it is really not needed
to add another layer of synchronization like you are doing, because the
bug is really around the xfer put and it is very well evident, now that
you have pointed at it...
What about, instead of your patch, this:
-----8<----
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 801d59e6b3bc..d3453130896a 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -822,6 +822,8 @@ __scmi_xfer_put(struct scmi_xfers_info *minfo, struct scmi_xfer *xfer)
scmi_dec_count(info->dbg->counters, XFERS_INFLIGHT);
}
+ xfer->flags &= ~SCMI_XFER_FLAG_IS_RAW;
+ xfer->flags &= ~SCMI_XFER_FLAG_CHAN_SET;
hlist_add_head(&xfer->node, &minfo->free_xfers);
}
spin_unlock_irqrestore(&minfo->xfer_lock, flags);
@@ -840,8 +842,6 @@ void scmi_xfer_raw_put(const struct scmi_handle *handle, struct scmi_xfer *xfer)
{
struct scmi_info *info = handle_to_scmi_info(handle);
- xfer->flags &= ~SCMI_XFER_FLAG_IS_RAW;
- xfer->flags &= ~SCMI_XFER_FLAG_CHAN_SET;
return __scmi_xfer_put(&info->tx_minfo, xfer);
}
---->8----
..because the xfer itself is refcounted properly, BUT as of now the
flags are cleared out of the block inside __scmi_xfer_put:
if (refcount_dec_and_test(&xfer->users))
...and that is the problem.
The above patch solves for me all the observed anomalies...
... probably EVEN BETTER if you just clear all:
---8<-----
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 801d59e6b3bc..914510de3abb 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -822,6 +822,7 @@ __scmi_xfer_put(struct scmi_xfers_info *minfo, struct scmi_xfer *xfer)
scmi_dec_count(info->dbg->counters, XFERS_INFLIGHT);
}
+ xfer->flags = 0;
hlist_add_head(&xfer->node, &minfo->free_xfers);
}
spin_unlock_irqrestore(&minfo->xfer_lock, flags);
@@ -840,8 +841,6 @@ void scmi_xfer_raw_put(const struct scmi_handle *handle, struct scmi_xfer *xfer)
{
struct scmi_info *info = handle_to_scmi_info(handle);
- xfer->flags &= ~SCMI_XFER_FLAG_IS_RAW;
- xfer->flags &= ~SCMI_XFER_FLAG_CHAN_SET;
return __scmi_xfer_put(&info->tx_minfo, xfer);
}
---->8----
...to simply clear all flags when xfer is put...
If this solves for you too, please send a new patch with also a Fixes tag.
Last but not least, are you running the test-suite against a Kernel
configured with RAW cohexistence disabled ?
CONFIG_ARM_SCMI_RAW_MODE_SUPPORT_COEX = n
...because having RAW COEX enabled when running the test-suite could lead to more
false positives even with your patch applied, due to the interference of regular
drivers..
Thanks for this !
Cristian
next prev parent reply other threads:[~2025-10-01 11:57 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-29 14:28 [PATCH] drivers: scmi: Add completion timeout handling for raw mode transfers Artem Shimko
2025-10-01 11:57 ` Cristian Marussi [this message]
2025-10-03 19:22 ` [PATCH v2] " Artem Shimko
[not found] ` <20251003192233.1618447-1-a.shimko.dev@gmail.com_quarantine>
2025-10-07 16:55 ` Cristian Marussi
2025-10-08 9:10 ` [PATCH v3] firmware: arm_scmi: Fix premature SCMI_XFER_FLAG_IS_RAW clearing in raw mode Artem Shimko
2025-10-08 10:03 ` Cristian Marussi
2025-10-16 9:30 ` Sudeep Holla
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aN0T0tvsL6t6R3yF@pluto \
--to=cristian.marussi@arm.com \
--cc=a.shimko.dev@gmail.com \
--cc=arm-scmi@vger.kernel.org \
--cc=artyom.shimko@gmail.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sudeep.holla@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).