Re: [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Dinh Nguyen <dinguyen@kernel.org>
To: tze.yee.ng@altera.com, linux-kernel@vger.kernel.org
Cc: Adrian Ng Ho Yin <adrian.ho.yin.ng@altera.com>,
	Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
Subject: Re: [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update
Date: Tue, 5 May 2026 18:40:53 -0500	[thread overview]
Message-ID: <4307e321-ea7b-4798-890c-adba537fc5e0@kernel.org> (raw)
In-Reply-To: <ddbc5453f6bdbea132ae3df8a5307ba27f275c63.1777949109.git.tze.yee.ng@altera.com>

 From sashiko claude-sonnet-4-6,

On 5/4/26 21:51, tze.yee.ng@altera.com wrote:
> From: Tze Yee Ng <tze.yee.ng@altera.com>
> 
> Use mutex_trylock() in rsu_send_msg() instead of mutex_lock() so callers
> that already hold priv->lock or race with an in-flight RSU operation get
> -EAGAIN instead of blocking forever.
> 
> In reboot_image_store(), treat -EAGAIN as a no-op success so sysfs write
> does not hang when the RSU path is busy.
> 
> Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
> ---
>   drivers/firmware/stratix10-rsu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/firmware/stratix10-rsu.c b/drivers/firmware/stratix10-rsu.c
> index e1912108a0fe..f3ee0b4e49a8 100644
> --- a/drivers/firmware/stratix10-rsu.c
> +++ b/drivers/firmware/stratix10-rsu.c
> @@ -254,7 +254,13 @@ static void rsu_async_get_spt_table_callback(struct device *dev,
>    * is necessary to get RSU boot log or set the address of bitstream to
>    * boot after reboot.
>    *
> - * Returns 0 on success or -ETIMEDOUT on error.
> + * Return:
> + * * 0 on success
> + * * -EAGAIN if the driver mutex could not be acquired
> + * * a negative errno from stratix10_svc_send() on failure to send the request
> + * * -ETIMEDOUT if waiting for the SMC callback times out
> + * * a negative errno from wait_for_completion_interruptible_timeout() if
> + *   interrupted or otherwise signaled while waiting (for example -ERESTARTSYS)
>    */
>   static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>   			enum stratix10_svc_command_code command,
> @@ -264,7 +270,8 @@ static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>   	struct stratix10_svc_client_msg msg;
>   	int ret;
>   
> -	mutex_lock(&priv->lock);
> +	if (!mutex_trylock(&priv->lock))
> +		return -EAGAIN;

The commit message says this prevents deadlocks from "callers that
already hold priv->lock." Does such a caller actually exist?

Looking at the driver, priv->lock is initialized in
stratix10_rsu_probe() and is acquired only inside rsu_send_msg()
itself. The completion callbacks (rsu_command_callback and friends)
call complete() without taking priv->lock, so there is no reentrant
acquisition path through callbacks either.

Two concurrent sysfs writes would serialize correctly under the
original mutex_lock() — the second writer would block until the first
completed. That is the expected behavior for a one-shot firmware
configuration command, not a deadlock.

>   	reinit_completion(&priv->completion);
>   	priv->client.receive_cb = callback;
>   
> @@ -597,7 +604,9 @@ static ssize_t reboot_image_store(struct device *dev,
>   
>   	ret = rsu_send_msg(priv, COMMAND_RSU_UPDATE,
>   			   address, rsu_command_callback);
> -	if (ret) {
> +	if (ret == -EAGAIN) {
> +		return 0;

Two concerns here.

First, does returning 0 violate the sysfs store callback contract?
Sysfs store handlers are expected to return the number of bytes
consumed on success, or a negative errno on error. Returning 0 tells
kernfs that zero bytes were consumed, which is a short write. A
POSIX-compliant write loop seeing 0 bytes returned on a non-zero
write will retry indefinitely. The intended silent no-op should
return count, not 0.

Second, when mutex_trylock fails, COMMAND_RSU_UPDATE is never sent
to the firmware, yet reboot_image_store() returns without an error.
The operator's request to change the persistent boot image is
silently dropped with no log message and no error to userspace.

On the next reboot the device will boot the old image rather than
the one that was written to the attribute.

For a one-shot persistent firmware state change, is silently
discarding the request the right tradeoff? Blocking in a sysfs
store callback is permitted, so keeping the original mutex_lock()
would let concurrent writers serialize correctly without any data
loss.

Dinh

     prev parent reply	other threads:[~2026-05-05 23:40 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  2:51 [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on update tze.yee.ng
2026-05-05 23:40 ` Dinh Nguyen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4307e321-ea7b-4798-890c-adba537fc5e0@kernel.org \
    --to=dinguyen@kernel.org \
    --cc=adrian.ho.yin.ng@altera.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=muhammad.nazim.amirul.nazle.asmade@altera.com \
    --cc=tze.yee.ng@altera.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox