* [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message
@ 2025-06-05 18:09 Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 1/2] x86/platform/amd: move final timeout check to after final sleep Jake Hillion via B4 Relay
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Jake Hillion via B4 Relay @ 2025-06-05 18:09 UTC (permalink / raw)
To: Naveen Krishna Chatradhi, Carlos Bilbao, Hans de Goede,
Ilpo Järvinen
Cc: platform-driver-x86, linux-kernel, sched-ext, Jake Hillion,
Blaise Sanouillet, Suma Hegde
hsmp_send_message currently relies in 2 places on the assumption that
usleep_range will complete in well under 100ms. This is not guaranteed,
and is prevalent when running sched_ext schedulers or possible under
other high load conditions.
These patches alter the behaviour in two ways:
1. Checks the result of `mbox_status` a final time if the sleep took us
past the timeout. This gives a useful result under test when there
are significant scheduling delays, rather than -ETIMEDOUT.
2. Removes the 100ms limit on awaiting the semaphore. This allows a
second thread to compete even when the other suffers large scheduling
delays.
Signed-off-by: Jake Hillion <jake@hillion.co.uk>
---
Changes in v2:
- Corrected -ETIMEDOUT to -ETIME in patch 2 as requested by reviewers.
- Link to v1: https://lore.kernel.org/r/20250530-amd-hsmp-v1-0-3222bffa4008@hillion.co.uk
---
Jake Hillion (2):
x86/platform/amd: move final timeout check to after final sleep
x86/platform/amd: replace down_timeout with down_interruptible
drivers/platform/x86/amd/hsmp/hsmp.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
---
base-commit: 44ed0f35df343d00b8d38006854f96e333104a66
change-id: 20250527-amd-hsmp-8c358985b4fb
Best regards,
--
Jake Hillion <jake@hillion.co.uk>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/2] x86/platform/amd: move final timeout check to after final sleep
2025-06-05 18:09 [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Jake Hillion via B4 Relay
@ 2025-06-05 18:09 ` Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 2/2] x86/platform/amd: replace down_timeout with down_interruptible Jake Hillion via B4 Relay
2025-06-09 7:53 ` [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Ilpo Järvinen
2 siblings, 0 replies; 4+ messages in thread
From: Jake Hillion via B4 Relay @ 2025-06-05 18:09 UTC (permalink / raw)
To: Naveen Krishna Chatradhi, Carlos Bilbao, Hans de Goede,
Ilpo Järvinen
Cc: platform-driver-x86, linux-kernel, sched-ext, Jake Hillion,
Blaise Sanouillet, Suma Hegde
From: Jake Hillion <jake@hillion.co.uk>
__hsmp_send_message sleeps between result read attempts and has a
timeout of 100ms. Under extreme load it's possible for these sleeps to
take a long time, exceeding the 100ms. In this case the current code
does not check the register and fails with ETIMEDOUT.
Refactor the loop to ensure there is at least one read of the register
after a sleep of any duration. This removes instances of ETIMEDOUT with
a single caller, even with a misbehaving scheduler. Tested on AMD
Bergamo machines.
Suggested-by: Blaise Sanouillet <linux@blaise.sanouillet.com>
Reviewed-by: Suma Hegde <suma.hegde@amd.com>
Tested-by: Suma Hegde <suma.hegde@amd.com>
Signed-off-by: Jake Hillion <jake@hillion.co.uk>
---
drivers/platform/x86/amd/hsmp/hsmp.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/x86/amd/hsmp/hsmp.c b/drivers/platform/x86/amd/hsmp/hsmp.c
index e262e8a97b4542a389e09a82dad71f7d2e8b2449..f35c639457ac425e79dead2515c0eddea0759323 100644
--- a/drivers/platform/x86/amd/hsmp/hsmp.c
+++ b/drivers/platform/x86/amd/hsmp/hsmp.c
@@ -99,7 +99,7 @@ static int __hsmp_send_message(struct hsmp_socket *sock, struct hsmp_message *ms
short_sleep = jiffies + msecs_to_jiffies(HSMP_SHORT_SLEEP);
timeout = jiffies + msecs_to_jiffies(HSMP_MSG_TIMEOUT);
- while (time_before(jiffies, timeout)) {
+ while (true) {
ret = sock->amd_hsmp_rdwr(sock, mbinfo->msg_resp_off, &mbox_status, HSMP_RD);
if (ret) {
dev_err(sock->dev, "Error %d reading mailbox status\n", ret);
@@ -108,6 +108,10 @@ static int __hsmp_send_message(struct hsmp_socket *sock, struct hsmp_message *ms
if (mbox_status != HSMP_STATUS_NOT_READY)
break;
+
+ if (!time_before(jiffies, timeout))
+ break;
+
if (time_before(jiffies, short_sleep))
usleep_range(50, 100);
else
--
2.47.2
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v2 2/2] x86/platform/amd: replace down_timeout with down_interruptible
2025-06-05 18:09 [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 1/2] x86/platform/amd: move final timeout check to after final sleep Jake Hillion via B4 Relay
@ 2025-06-05 18:09 ` Jake Hillion via B4 Relay
2025-06-09 7:53 ` [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Ilpo Järvinen
2 siblings, 0 replies; 4+ messages in thread
From: Jake Hillion via B4 Relay @ 2025-06-05 18:09 UTC (permalink / raw)
To: Naveen Krishna Chatradhi, Carlos Bilbao, Hans de Goede,
Ilpo Järvinen
Cc: platform-driver-x86, linux-kernel, sched-ext, Jake Hillion,
Suma Hegde
From: Jake Hillion <jake@hillion.co.uk>
Currently hsmp_send_message uses down_timeout with a 100ms timeout to
take the semaphore. However __hsmp_send_message, the content of the
critical section, has a sleep in it. On systems with significantly
delayed scheduling behaviour this may take over 100ms.
Convert this method to down_interruptible. Leave the error handling the
same as the documentation currently is not specific about what error is
returned.
Previous behaviour: a caller who competes with another caller stuck in
the critical section due to scheduler delays would receive -ETIME.
New behaviour: a caller who competes with another caller stuck in the
critical section due to scheduler delays will complete successfully.
Reviewed-by: Suma Hegde <suma.hegde@amd.com>
Tested-by: Suma Hegde <suma.hegde@amd.com>
Signed-off-by: Jake Hillion <jake@hillion.co.uk>
---
drivers/platform/x86/amd/hsmp/hsmp.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/platform/x86/amd/hsmp/hsmp.c b/drivers/platform/x86/amd/hsmp/hsmp.c
index f35c639457ac425e79dead2515c0eddea0759323..6c30bb3edc1d77939b10047b771a5c574e5f2a1e 100644
--- a/drivers/platform/x86/amd/hsmp/hsmp.c
+++ b/drivers/platform/x86/amd/hsmp/hsmp.c
@@ -216,13 +216,7 @@ int hsmp_send_message(struct hsmp_message *msg)
return -ENODEV;
sock = &hsmp_pdev.sock[msg->sock_ind];
- /*
- * The time taken by smu operation to complete is between
- * 10us to 1ms. Sometime it may take more time.
- * In SMP system timeout of 100 millisecs should
- * be enough for the previous thread to finish the operation
- */
- ret = down_timeout(&sock->hsmp_sem, msecs_to_jiffies(HSMP_MSG_TIMEOUT));
+ ret = down_interruptible(&sock->hsmp_sem);
if (ret < 0)
return ret;
--
2.47.2
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message
2025-06-05 18:09 [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 1/2] x86/platform/amd: move final timeout check to after final sleep Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 2/2] x86/platform/amd: replace down_timeout with down_interruptible Jake Hillion via B4 Relay
@ 2025-06-09 7:53 ` Ilpo Järvinen
2 siblings, 0 replies; 4+ messages in thread
From: Ilpo Järvinen @ 2025-06-09 7:53 UTC (permalink / raw)
To: Naveen Krishna Chatradhi, Carlos Bilbao, Hans de Goede,
Jake Hillion
Cc: platform-driver-x86, linux-kernel, sched-ext, Blaise Sanouillet,
Suma Hegde
On Thu, 05 Jun 2025 19:09:25 +0100, Jake Hillion wrote:
> hsmp_send_message currently relies in 2 places on the assumption that
> usleep_range will complete in well under 100ms. This is not guaranteed,
> and is prevalent when running sched_ext schedulers or possible under
> other high load conditions.
>
> These patches alter the behaviour in two ways:
> 1. Checks the result of `mbox_status` a final time if the sleep took us
> past the timeout. This gives a useful result under test when there
> are significant scheduling delays, rather than -ETIMEDOUT.
> 2. Removes the 100ms limit on awaiting the semaphore. This allows a
> second thread to compete even when the other suffers large scheduling
> delays.
>
> [...]
Thank you for your contribution, it has been applied to my local
review-ilpo-fixes branch. Note it will show up in the public
platform-drivers-x86/review-ilpo-fixes branch only once I've pushed my
local branch there, which might take a while.
The list of commits applied:
[1/2] x86/platform/amd: move final timeout check to after final sleep
commit: f8afb12a2d7503de6558c23cacd7acbf6e9fe678
[2/2] x86/platform/amd: replace down_timeout with down_interruptible
commit: 784e48a82976ee0b645788750343cd1b28a372f3
--
i.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-06-09 7:53 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-05 18:09 [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 1/2] x86/platform/amd: move final timeout check to after final sleep Jake Hillion via B4 Relay
2025-06-05 18:09 ` [PATCH v2 2/2] x86/platform/amd: replace down_timeout with down_interruptible Jake Hillion via B4 Relay
2025-06-09 7:53 ` [PATCH v2 0/2] x86/platform/amd: better handle scheduling delays in hsmp_send_message Ilpo Järvinen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).