[PATCH v5] ufs: core: wlun send SSU timeout recovery

Linux SCSI subsystem development

* [PATCH v5] ufs: core: wlun send SSU timeout recovery
From: peter.wang @ 2023-09-27  3:35 UTC
  To: stanley.chu, linux-scsi, martin.petersen, avri.altman,
	alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu

From: Peter Wang <peter.wang@mediatek.com>

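When the START STOP UNIT (SSU) command sent by runtime PM times out,
the SCSI core invokes eh_host_reset_handler. Its UFS implementation,
ufshcd_eh_host_reset_handler, schedules eh_work and then blocks in
flush_work(&hba->eh_work). However, ufshcd_err_handler (the eh_work
function) hangs waiting for the runtime PM resume to complete, so
neither side can make progress. In this case only, perform link
recovery directly instead of scheduling eh_work.
Below is the I/O hang stack dump from kernel 6.1: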

kworker/4:0     D
<ffffffd7d31f6fb4> __switch_to+0x180/0x344
<ffffffd7d31f779c> __schedule+0x5ec/0xa14
<ffffffd7d31f7c3c> schedule+0x78/0xe0
<ffffffd7d31fefbc> schedule_timeout+0xb0/0x15c
<ffffffd7d31f8120> io_schedule_timeout+0x48/0x70
<ffffffd7d31f8e40> do_wait_for_common+0x108/0x19c
<ffffffd7d31f837c> wait_for_completion_io_timeout+0x50/0x78
<ffffffd7d2876bc0> blk_execute_rq+0x1b8/0x218
<ffffffd7d2b4297c> scsi_execute_cmd+0x148/0x238
<ffffffd7d2da7358> ufshcd_set_dev_pwr_mode+0xe8/0x244
<ffffffd7d2da7e40> __ufshcd_wl_resume+0x1e0/0x45c
<ffffffd7d2da7b28> ufshcd_wl_runtime_resume+0x3c/0x174
<ffffffd7d2b4f290> scsi_runtime_resume+0x7c/0xc8
<ffffffd7d2ae1d48> __rpm_callback+0xa0/0x410
<ffffffd7d2ae0128> rpm_resume+0x43c/0x67c
<ffffffd7d2ae1e98> __rpm_callback+0x1f0/0x410
<ffffffd7d2ae014c> rpm_resume+0x460/0x67c
<ffffffd7d2ae1450> pm_runtime_work+0xa4/0xac
<ffffffd7d22e39ac> process_one_work+0x208/0x598
<ffffffd7d22e3fc0> worker_thread+0x228/0x438
<ffffffd7d22eb038> kthread+0x104/0x1d4
<ffffffd7d22171a0> ret_from_fork+0x10/0x20

scsi_eh_0       D
<ffffffd7d31f6fb4> __switch_to+0x180/0x344
<ffffffd7d31f779c> __schedule+0x5ec/0xa14
<ffffffd7d31f7c3c> schedule+0x78/0xe0
<ffffffd7d31fef50> schedule_timeout+0x44/0x15c
<ffffffd7d31f8e40> do_wait_for_common+0x108/0x19c
<ffffffd7d31f8234> wait_for_completion+0x48/0x64
<ffffffd7d22deb88> __flush_work+0x260/0x2d0
<ffffffd7d22de918> flush_work+0x10/0x20
<ffffffd7d2da4728> ufshcd_eh_host_reset_handler+0x88/0xcc
<ffffffd7d2b41da4> scsi_try_host_reset+0x48/0xe0
<ffffffd7d2b410fc> scsi_eh_ready_devs+0x934/0xa40
<ffffffd7d2b41618> scsi_error_handler+0x168/0x374
<ffffffd7d22eb038> kthread+0x104/0x1d4
<ffffffd7d22171a0> ret_from_fork+0x10/0x20

kworker/u16:5   D
<ffffffd7d31f6fb4> __switch_to+0x180/0x344
<ffffffd7d31f779c> __schedule+0x5ec/0xa14
<ffffffd7d31f7c3c> schedule+0x78/0xe0
<ffffffd7d2adfe00> rpm_resume+0x114/0x67c
<ffffffd7d2adfca8> __pm_runtime_resume+0x70/0xb4
<ffffffd7d2d9cf48> ufshcd_err_handler+0x1a0/0xe68
<ffffffd7d22e39ac> process_one_work+0x208/0x598
<ffffffd7d22e3fc0> worker_thread+0x228/0x438
<ffffffd7d22eb038> kthread+0x104/0x1d4
<ffffffd7d22171a0> ret_from_fork+0x10/0x20

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index c2df07545f96..0619cefa092e 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -7716,6 +7716,20 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
 
 	hba = shost_priv(cmd->device->host);
 
+	/*
+	 * If the SSU sent by runtime PM times out, scsi_error_handler gets
+	 * stuck in this function waiting for flush_work(&hba->eh_work), while
+	 * ufshcd_err_handler (eh_work) is stuck waiting for runtime PM to
+	 * become active. Calling ufshcd_link_recovery() here instead of
+	 * scheduling eh_work prevents this deadlock.
+	 */
+	if (hba->pm_op_in_progress) {
+		if (ufshcd_link_recovery(hba))
+			err = FAILED;
+
+		return err;
+	}
+
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->force_reset = true;
 	ufshcd_schedule_eh_work(hba);
-- 
2.18.0



* Re: [PATCH v5] ufs: core: wlun send SSU timeout recovery
From: Bart Van Assche @ 2023-09-27 18:31 UTC
  To: peter.wang, stanley.chu, linux-scsi, martin.petersen, avri.altman,
	alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu

On 9/26/23 20:35, peter.wang@mediatek.com wrote:
> +	if (hba->pm_op_in_progress) {
> +		if (ufshcd_link_recovery(hba))
> +			err = FAILED;
> +
> +		return err;
> +	}

This patch looks good to me, but I wish the above code had been
written in the style that is preferred in the Linux kernel:

	if (hba->pm_op_in_progress && ufshcd_link_recovery(hba) < 0)
		return FAILED;

Thanks,

Bart.



* Re: [PATCH v5] ufs: core: wlun send SSU timeout recovery
From: Peter Wang (王信友) @ 2023-09-28  1:58 UTC
  To: linux-scsi@vger.kernel.org, bvanassche@acm.org,
	avri.altman@wdc.com, Stanley Chu (朱原陞),
	alim.akhtar@samsung.com, martin.petersen@oracle.com,
	jejb@linux.ibm.com
  Cc: linux-mediatek@lists.infradead.org,
	Jiajie Hao (郝加节),
	CC Chou (周志杰),
	Eddie Huang (黃智傑),
	Alice Chao (趙珮均), wsd_upstream,
	Lin Gui (桂林),
	Chun-Hung Wu (巫駿宏),
	Tun-yu Yu (游敦聿),
	Chaotian Jing (井朝天),
	Powen Kao (高伯文),
	Naomi Chu (朱詠田),
	Qilin Tan (谭麒麟)

On Wed, 2023-09-27 at 11:31 -0700, Bart Van Assche wrote:
> On 9/26/23 20:35, peter.wang@mediatek.com wrote:
> > +	if (hba->pm_op_in_progress) {
> > +		if (ufshcd_link_recovery(hba))
> > +			err = FAILED;
> > +
> > +		return err;
> > +	}
> 
> This patch looks good to me but I wish the above code would have been
> written using the style that is preferred in the Linux kernel:
> 
> 	if (hba->pm_op_in_progress && ufshcd_link_recovery(hba) < 0)
> 		return FAILED;
> 
> Thanks,
> 
> Bart.
> 

Hi Bart,

It is more concise, but it does not handle this deadlock case.
If pm_op_in_progress is true and ufshcd_link_recovery() returns 0, we
must return SUCCESS directly. Otherwise execution continues through
this function, eh_work is scheduled, and the deadlock occurs.

Thanks.
Peter



* Re: [PATCH v5] ufs: core: wlun send SSU timeout recovery
From: Bart Van Assche @ 2023-09-28 16:30 UTC
  To: peter.wang, stanley.chu, linux-scsi, martin.petersen, avri.altman,
	alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu

On 9/26/23 20:35, peter.wang@mediatek.com wrote:
> When runtime pm send SSU times out, the SCSI core invokes
> eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler
> schedule eh_work and stuck at wait flush_work(&hba->eh_work).
> However, ufshcd_err_handler hangs in wait rpm resume.
> Do link recovery only in this case.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>


* Re: [PATCH v5] ufs: core: wlun send SSU timeout recovery
From: Stanley Chu @ 2023-09-29 14:14 UTC
  To: peter.wang
  Cc: stanley.chu, linux-scsi, martin.petersen, avri.altman,
	alim.akhtar, jejb, wsd_upstream, linux-mediatek, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu

On Wed, Sep 27, 2023 at 1:43 PM <peter.wang@mediatek.com> wrote:
>
> From: Peter Wang <peter.wang@mediatek.com>
>
> When runtime pm send SSU times out, the SCSI core invokes
> eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler
> schedule eh_work and stuck at wait flush_work(&hba->eh_work).
> However, ufshcd_err_handler hangs in wait rpm resume.
> Do link recovery only in this case.
> Below is IO hang stack dump in kernel-6.1

Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>


* Re: [PATCH v5] ufs: core: wlun send SSU timeout recovery
From: Martin K. Petersen @ 2023-10-10  1:23 UTC
  To: peter.wang
  Cc: stanley.chu, linux-scsi, martin.petersen, avri.altman,
	alim.akhtar, jejb, wsd_upstream, linux-mediatek, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu


Peter,

> When runtime pm send SSU times out, the SCSI core invokes
> eh_host_reset_handler, which hooks function
> ufshcd_eh_host_reset_handler schedule eh_work and stuck at wait
> flush_work(&hba->eh_work). However, ufshcd_err_handler hangs in wait
> rpm resume. Do link recovery only in this case.

Applied to 6.7/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

