From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A782A40DFDA
	for <linux-kernel@vger.kernel.org>; Tue,  5 May 2026 23:40:55 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1778024455; cv=none; b=ULlbkbQiGjiqxZ05+gTt6FV1C6qvcOOfwaRR+qjRsjbVpgXpCW7lE6ol3YMnvAUUngiNl94kXZxR1KXjT6twIQorute2vmxjmCgavt3C5xQhV8HltSGO6jpYuiFJ32Cfox5SY55E+QmRwL7fwb86eqFenzRlB6EZzZLWEGVrJCo=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1778024455; c=relaxed/simple;
	bh=f20BOxJoF77F75xIuq1SIbdRBJaMXVGiM1e8pvOxdvk=;
	h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:
	 In-Reply-To:Content-Type; b=m4G9b+IxAaF3tmj6Y1CG3kiRB1GziC4W0o30aCpiFdq+HNwKlAvUC5YCIocWeCgIa8jiEiGI9xv3G8FHlvK1wyyNKJgbGGhSmRr8kN2mghtImqr6Z5dupK+yxLisrmsud2P9FQSPTM5qaWl77OJ1selPsAaGQYb/mCv9K51g1Tg=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=O3bNbkbt; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="O3bNbkbt"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id F01EBC2BCB4;
	Tue,  5 May 2026 23:40:54 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1778024455;
	bh=f20BOxJoF77F75xIuq1SIbdRBJaMXVGiM1e8pvOxdvk=;
	h=Date:Subject:To:Cc:References:From:In-Reply-To:From;
	b=O3bNbkbtz/JoykujKm2mCDqQxEAmuAYRY6O4s9erx6GH9JpLpLx9hXcPscyjN6Pq/
	 kBb9qO/VyJVejNgs40TYTaHvCQqbB3pQsHBPXwpV3OuQzv3WBEN9jIX85d5n18dDrD
	 HBc/WXKwLI+kjgMutO/4zP0AvAc9a4TdzWPxJadxpPBbgT9mOdQqD3gLxZQ3zzlNhy
	 ikF9o2L8gscNX4Ha/kCw2uxJ7dwsgBp7Rutat+0veKn6OPyFlv1+Ytskw38RvrBID+
	 Ylrdpga+rVNEFg7E3bxwLyP+mnipcGOurVe+/ePX+WAi94Dp0g0qZ75IRH1YN32a6w
	 0skYxQQIbVoKQ==
Message-ID: <4307e321-ea7b-4798-890c-adba537fc5e0@kernel.org>
Date: Tue, 5 May 2026 18:40:53 -0500
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] firmware: stratix10-rsu: avoid RSU mutex deadlock on
 update
Content-Language: en-US
To: tze.yee.ng@altera.com, linux-kernel@vger.kernel.org
Cc: Adrian Ng Ho Yin <adrian.ho.yin.ng@altera.com>,
 Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
References: <ddbc5453f6bdbea132ae3df8a5307ba27f275c63.1777949109.git.tze.yee.ng@altera.com>
From: Dinh Nguyen <dinguyen@kernel.org>
In-Reply-To: <ddbc5453f6bdbea132ae3df8a5307ba27f275c63.1777949109.git.tze.yee.ng@altera.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

 From sashiko claude-sonnet-4-6,

On 5/4/26 21:51, tze.yee.ng@altera.com wrote:
> From: Tze Yee Ng <tze.yee.ng@altera.com>
> 
> Use mutex_trylock() in rsu_send_msg() instead of mutex_lock() so callers
> that already hold priv->lock or race with an in-flight RSU operation get
> -EAGAIN instead of blocking forever.
> 
> In reboot_image_store(), treat -EAGAIN as a no-op success so sysfs write
> does not hang when the RSU path is busy.
> 
> Signed-off-by: Tze Yee Ng <tze.yee.ng@altera.com>
> ---
>   drivers/firmware/stratix10-rsu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/firmware/stratix10-rsu.c b/drivers/firmware/stratix10-rsu.c
> index e1912108a0fe..f3ee0b4e49a8 100644
> --- a/drivers/firmware/stratix10-rsu.c
> +++ b/drivers/firmware/stratix10-rsu.c
> @@ -254,7 +254,13 @@ static void rsu_async_get_spt_table_callback(struct device *dev,
>    * is necessary to get RSU boot log or set the address of bitstream to
>    * boot after reboot.
>    *
> - * Returns 0 on success or -ETIMEDOUT on error.
> + * Return:
> + * * 0 on success
> + * * -EAGAIN if the driver mutex could not be acquired
> + * * a negative errno from stratix10_svc_send() on failure to send the request
> + * * -ETIMEDOUT if waiting for the SMC callback times out
> + * * a negative errno from wait_for_completion_interruptible_timeout() if
> + *   interrupted or otherwise signaled while waiting (for example -ERESTARTSYS)
>    */
>   static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>   			enum stratix10_svc_command_code command,
> @@ -264,7 +270,8 @@ static int rsu_send_msg(struct stratix10_rsu_priv *priv,
>   	struct stratix10_svc_client_msg msg;
>   	int ret;
>   
> -	mutex_lock(&priv->lock);
> +	if (!mutex_trylock(&priv->lock))
> +		return -EAGAIN;

The commit message says this prevents deadlocks from "callers that
already hold priv->lock." Does such a caller actually exist?

Looking at the driver, priv->lock is initialized in
stratix10_rsu_probe() and is acquired only inside rsu_send_msg()
itself. The completion callbacks (rsu_command_callback and friends)
call complete() without taking priv->lock, so there is no reentrant
acquisition path through callbacks either.

Two concurrent sysfs writes would serialize correctly under the
original mutex_lock() — the second writer would block until the first
completed. That is the expected behavior for a one-shot firmware
configuration command, not a deadlock.


>   	reinit_completion(&priv->completion);
>   	priv->client.receive_cb = callback;
>   
> @@ -597,7 +604,9 @@ static ssize_t reboot_image_store(struct device *dev,
>   
>   	ret = rsu_send_msg(priv, COMMAND_RSU_UPDATE,
>   			   address, rsu_command_callback);
> -	if (ret) {
> +	if (ret == -EAGAIN) {
> +		return 0;

Two concerns here.

First, does returning 0 violate the sysfs store callback contract?
Sysfs store handlers are expected to return the number of bytes
consumed on success, or a negative errno on error. Returning 0 tells
kernfs that zero bytes were consumed, which is a short write. A
POSIX-compliant write loop seeing 0 bytes returned on a non-zero
write will retry indefinitely. The intended silent no-op should
return count, not 0.

Second, when mutex_trylock fails, COMMAND_RSU_UPDATE is never sent
to the firmware, yet reboot_image_store() returns without an error.
The operator's request to change the persistent boot image is
silently dropped with no log message and no error to userspace.

On the next reboot the device will boot the old image rather than
the one that was written to the attribute.

For a one-shot persistent firmware state change, is silently
discarding the request the right tradeoff? Blocking in a sysfs
store callback is permitted, so keeping the original mutex_lock()
would let concurrent writers serialize correctly without any data
loss.

Dinh