[PATCH v1] ufs: core: Fix runtime suspend error deadlock

public inbox for linux-mediatek@lists.infradead.org
 help / color / mirror / Atom feed

* [PATCH v1] ufs: core: Fix runtime suspend error deadlock
@ 2025-09-23  8:20 peter.wang
  2025-09-23 16:07 ` Bart Van Assche
  0 siblings, 1 reply; 5+ messages in thread
From: peter.wang @ 2025-09-23  8:20 UTC (permalink / raw)
  To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, yi-fan.peng,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu, ed.tsai,
	bvanassche

From: Peter Wang <peter.wang@mediatek.com>

Resolve the deadlock issue during runtime suspend when an
error triggers the error handler. Prevent the deadlock
by checking pm_op_in_progress and performing a quick recovery.
This approach ensures that the error handler does not wait
indefinitely for runtime PM to resume, allowing runtime
suspend to proceed smoothly.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 88a0e9289ca6..0735bd5df1cc 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6681,6 +6681,11 @@ static void ufshcd_err_handler(struct work_struct *work)
 	}
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
+	if (hba->pm_op_in_progress) {
+		ufshcd_link_recovery(hba);
+		return;
+	}
+
 	ufshcd_err_handling_prepare(hba);
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ufs: core: Fix runtime suspend error deadlock
  2025-09-23  8:20 [PATCH v1] ufs: core: Fix runtime suspend error deadlock peter.wang
@ 2025-09-23 16:07 ` Bart Van Assche
  2025-09-24  9:26   ` Peter Wang (王信友)
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2025-09-23 16:07 UTC (permalink / raw)
  To: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
	jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, yi-fan.peng, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu, ed.tsai

On 9/23/25 1:20 AM, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
> 
> Resolve the deadlock issue during runtime suspend when an
> error triggers the error handler. Prevent the deadlock
> by checking pm_op_in_progress and performing a quick recovery.
> This approach ensures that the error handler does not wait
> indefinitely for runtime PM to resume, allowing runtime
> suspend to proceed smoothly.
> 
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> Suggested-by: Bart Van Assche <bvanassche@acm.org>
> Tested-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 88a0e9289ca6..0735bd5df1cc 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -6681,6 +6681,11 @@ static void ufshcd_err_handler(struct work_struct *work)
>   	}
>   	spin_unlock_irqrestore(hba->host->host_lock, flags);
>   
> +	if (hba->pm_op_in_progress) {
> +		ufshcd_link_recovery(hba);
> +		return;
> +	}
> +
>   	ufshcd_err_handling_prepare(hba);
>   
>   	spin_lock_irqsave(hba->host->host_lock, flags);

Does this patch introduce a race condition? Can a power management
operation start after hba->pm_op_in_progress has been tested and before
ufshcd_err_handling_prepare() is called? Should the added code perhaps
be surrounded with ufshcd_rpm_get_noresume() and ufshcd_rpm_put()?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ufs: core: Fix runtime suspend error deadlock
  2025-09-23 16:07 ` Bart Van Assche
@ 2025-09-24  9:26   ` Peter Wang (王信友)
  2025-09-24 21:51     ` Bart Van Assche
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Wang (王信友) @ 2025-09-24  9:26 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, jejb@linux.ibm.com,
	bvanassche@acm.org, avri.altman@wdc.com,
	martin.petersen@oracle.com, alim.akhtar@samsung.com
  Cc: Alice Chao (趙珮均),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Ed Tsai (蔡宗軒), wsd_upstream,
	Chaotian Jing (井朝天),
	Chun-Hung Wu (巫駿宏),
	Yi-fan Peng (彭羿凡),
	Qilin Tan (谭麒麟),
	linux-mediatek@lists.infradead.org,
	Jiajie Hao (郝加节), Lin Gui (桂林),
	Naomi Chu (朱詠田),
	Tun-yu Yu (游敦聿)

On Tue, 2025-09-23 at 09:07 -0700, Bart Van Assche wrote:
> 
> 
> Does this patch introduce a race condition? Can a power management
> operation start after hba->pm_op_in_progress has been tested and
> before
> ufshcd_err_handling_prepare() is called? Should the added code
> perhaps
> be surrounded with ufshcd_rpm_get_noresume() and ufshcd_rpm_put()?
> 
> Thanks,
> 
> Bart.

Hi Bart,

Are you saying that after the runtime suspend timer expires, 
the error handler might be triggered just before runtime suspend?
I don't think this can happen, since there shouldn't be anything 
triggering the error handler when the bus has been idle for a long
time.

Thanks
Peter


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ufs: core: Fix runtime suspend error deadlock
  2025-09-24  9:26   ` Peter Wang (王信友)
@ 2025-09-24 21:51     ` Bart Van Assche
  2025-09-25 10:03       ` Peter Wang (王信友)
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2025-09-24 21:51 UTC (permalink / raw)
  To: Peter Wang (王信友), linux-scsi@vger.kernel.org,
	jejb@linux.ibm.com, avri.altman@wdc.com,
	martin.petersen@oracle.com, alim.akhtar@samsung.com
  Cc: Alice Chao (趙珮均),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Ed Tsai (蔡宗軒), wsd_upstream,
	Chaotian Jing (井朝天),
	Chun-Hung Wu (巫駿宏),
	Yi-fan Peng (彭羿凡),
	Qilin Tan (谭麒麟),
	linux-mediatek@lists.infradead.org,
	Jiajie Hao (郝加节), Lin Gui (桂林),
	Naomi Chu (朱詠田),
	Tun-yu Yu (游敦聿)

On 9/24/25 2:26 AM, Peter Wang (王信友) wrote:
> Are you saying that after the runtime suspend timer expires,
> the error handler might be triggered just before runtime suspend?
> I don't think this can happen, since there shouldn't be anything
> triggering the error handler when the bus has been idle for a long
> time.
Hi Peter,

The UFS error handler is not only triggered by bus errors. It can also
be activated via debugfs. Do you agree that activation via debugfs can
happen concurrently with runtime suspend?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ufs: core: Fix runtime suspend error deadlock
  2025-09-24 21:51     ` Bart Van Assche
@ 2025-09-25 10:03       ` Peter Wang (王信友)
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Wang (王信友) @ 2025-09-25 10:03 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, jejb@linux.ibm.com,
	bvanassche@acm.org, avri.altman@wdc.com,
	martin.petersen@oracle.com, alim.akhtar@samsung.com
  Cc: Tun-yu Yu (游敦聿),
	Alice Chao (趙珮均),
	Eddie Huang (黃智傑),
	CC Chou (周志杰),
	Ed Tsai (蔡宗軒), wsd_upstream,
	Chaotian Jing (井朝天),
	Chun-Hung Wu (巫駿宏),
	Yi-fan Peng (彭羿凡),
	Qilin Tan (谭麒麟),
	linux-mediatek@lists.infradead.org,
	Jiajie Hao (郝加节), Lin Gui (桂林),
	Naomi Chu (朱詠田)

On Wed, 2025-09-24 at 14:51 -0700, Bart Van Assche wrote:
> 
> Hi Peter,
> 
> The UFS error handler is not only triggered by bus errors. It can
> also
> be activated via debugfs. Do you agree that activation via debugfs
> can
> happen concurrently with runtime suspend?
> 
> Thanks,
> 
> Bart.

Hi Bart,

Do you mean that debugfs could schedule EH work just before
suspend? If so, there is indeed a chance that this patch 
might not work as intended. Even if we surround it with 
ufshcd_rpm_get_noresume() and ufshcd_rpm_put(), there is 
still a possibility that runtime PM is ongoing and the suspend
callback hasn't been called yet, so the result would be the 
same, just with a lower chance of occurrence.

Because this patch mainly targets errors that occur during
the suspend process. If the error happens before suspend, 
it’s outside the scope of what this patch is intended to fix. 
But I will add ufshcd_rpm_get_noresume() and ufshcd_rpm_put() 
next version.

Thanks.
Peter

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-09-25 10:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-23  8:20 [PATCH v1] ufs: core: Fix runtime suspend error deadlock peter.wang
2025-09-23 16:07 ` Bart Van Assche
2025-09-24  9:26   ` Peter Wang (王信友)
2025-09-24 21:51     ` Bart Van Assche
2025-09-25 10:03       ` Peter Wang (王信友)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox