[PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update
@ 2026-05-05  2:51 tze.yee.ng
  2026-05-05 23:40 ` Dinh Nguyen
  0 siblings, 1 reply; 2+ messages in thread
From: tze.yee.ng @ 2026-05-05  2:51 UTC (permalink / raw)
  To: Dinh Nguyen, linux-kernel; +Cc: Adrian Ng Ho Yin, Nazim Amirul, Tze Yee Ng

From: Tze Yee Ng <tze.yee.ng@altera.com>

Use mutex_trylock() in rsu_send_msg() instead of mutex_lock() so callers
that already hold priv->lock or race with an in-flight RSU operation get
-EAGAIN instead of blocking forever.

In reboot_image_store(), treat -EAGAIN as a no-op success so sysfs write
does not hang when the RSU path is busy.

Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
---
 drivers/firmware/stratix10-rsu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/stratix10-rsu.c b/drivers/firmware/stratix10-rsu.c
index e1912108a0fe..f3ee0b4e49a8 100644
--- a/drivers/firmware/stratix10-rsu.c
+++ b/drivers/firmware/stratix10-rsu.c
@@ -254,7 +254,13 @@ static void rsu_async_get_spt_table_callback(struct device *dev,
  * is necessary to get RSU boot log or set the address of bitstream to
  * boot after reboot.
  *
- * Returns 0 on success or -ETIMEDOUT on error.
+ * Return:
+ * * 0 on success
+ * * -EAGAIN if the driver mutex could not be acquired
+ * * a negative errno from stratix10_svc_send() on failure to send the request
+ * * -ETIMEDOUT if waiting for the SMC callback times out
+ * * a negative errno from wait_for_completion_interruptible_timeout() if
+ *   interrupted or otherwise signaled while waiting (for example -ERESTARTSYS)
  */
 static int rsu_send_msg(struct stratix10_rsu_priv *priv,
 			enum stratix10_svc_command_code command,
@@ -264,7 +270,8 @@ static int rsu_send_msg(struct stratix10_rsu_priv *priv,
 	struct stratix10_svc_client_msg msg;
 	int ret;
 
-	mutex_lock(&priv->lock);
+	if (!mutex_trylock(&priv->lock))
+		return -EAGAIN;
 	reinit_completion(&priv->completion);
 	priv->client.receive_cb = callback;
 
@@ -597,7 +604,9 @@ static ssize_t reboot_image_store(struct device *dev,
 
 	ret = rsu_send_msg(priv, COMMAND_RSU_UPDATE,
 			   address, rsu_command_callback);
-	if (ret) {
+	if (ret == -EAGAIN) {
+		return 0;
+	} else if (ret) {
 		dev_err(dev, "Error, RSU update returned %i\n", ret);
 		return ret;
 	}
-- 
2.43.7


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update
  2026-05-05  2:51 [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update tze.yee.ng
@ 2026-05-05 23:40 ` Dinh Nguyen
  0 siblings, 0 replies; 2+ messages in thread
From: Dinh Nguyen @ 2026-05-05 23:40 UTC (permalink / raw)
  To: tze.yee.ng, linux-kernel; +Cc: Adrian Ng Ho Yin, Nazim Amirul

 From sashiko claude-sonnet-4-6,

On 5/4/26 21:51, tze.yee.ng@altera.com wrote:
> From: Tze Yee Ng <tze.yee.ng@altera.com>
> 
> Use mutex_trylock() in rsu_send_msg() instead of mutex_lock() so callers
> that already hold priv->lock or race with an in-flight RSU operation get
> -EAGAIN instead of blocking forever.
> 
> In reboot_image_store(), treat -EAGAIN as a no-op success so sysfs write
> does not hang when the RSU path is busy.
> 
> Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
> ---
>   drivers/firmware/stratix10-rsu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/firmware/stratix10-rsu.c b/drivers/firmware/stratix10-rsu.c
> index e1912108a0fe..f3ee0b4e49a8 100644
> --- a/drivers/firmware/stratix10-rsu.c
> +++ b/drivers/firmware/stratix10-rsu.c
> @@ -254,7 +254,13 @@ static void rsu_async_get_spt_table_callback(struct device *dev,
>    * is necessary to get RSU boot log or set the address of bitstream to
>    * boot after reboot.
>    *
> - * Returns 0 on success or -ETIMEDOUT on error.
> + * Return:
> + * * 0 on success
> + * * -EAGAIN if the driver mutex could not be acquired
> + * * a negative errno from stratix10_svc_send() on failure to send the request
> + * * -ETIMEDOUT if waiting for the SMC callback times out
> + * * a negative errno from wait_for_completion_interruptible_timeout() if
> + *   interrupted or otherwise signaled while waiting (for example -ERESTARTSYS)
>    */
>   static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>   			enum stratix10_svc_command_code command,
> @@ -264,7 +270,8 @@ static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>   	struct stratix10_svc_client_msg msg;
>   	int ret;
>   
> -	mutex_lock(&priv->lock);
> +	if (!mutex_trylock(&priv->lock))
> +		return -EAGAIN;

The commit message says this prevents deadlocks from "callers that
already hold priv->lock." Does such a caller actually exist?

Looking at the driver, priv->lock is initialized in
stratix10_rsu_probe() and is acquired only inside rsu_send_msg()
itself. The completion callbacks (rsu_command_callback and friends)
call complete() without taking priv->lock, so there is no reentrant
acquisition path through callbacks either.

Two concurrent sysfs writes would serialize correctly under the
original mutex_lock() — the second writer would block until the first
completed. That is the expected behavior for a one-shot firmware
configuration command, not a deadlock.

>   	reinit_completion(&priv->completion);
>   	priv->client.receive_cb = callback;
>   
> @@ -597,7 +604,9 @@ static ssize_t reboot_image_store(struct device *dev,
>   
>   	ret = rsu_send_msg(priv, COMMAND_RSU_UPDATE,
>   			   address, rsu_command_callback);
> -	if (ret) {
> +	if (ret == -EAGAIN) {
> +		return 0;

Two concerns here.

First, does returning 0 violate the sysfs store callback contract?
Sysfs store handlers are expected to return the number of bytes
consumed on success, or a negative errno on error. Returning 0 tells
kernfs that zero bytes were consumed, which is a short write. A
POSIX-compliant write loop seeing 0 bytes returned on a non-zero
write will retry indefinitely. The intended silent no-op should
return count, not 0.

Second, when mutex_trylock fails, COMMAND_RSU_UPDATE is never sent
to the firmware, yet reboot_image_store() returns without an error.
The operator's request to change the persistent boot image is
silently dropped with no log message and no error to userspace.

On the next reboot the device will boot the old image rather than
the one that was written to the attribute.

For a one-shot persistent firmware state change, is silently
discarding the request the right tradeoff? Blocking in a sysfs
store callback is permitted, so keeping the original mutex_lock()
would let concurrent writers serialize correctly without any data
loss.

Dinh

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-05 23:40 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-05  2:51 [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update tze.yee.ng
2026-05-05 23:40 ` Dinh Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox