linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: ufs: preventing bus hang crash during emergency power off
@ 2025-05-19  7:38 Bo Ye
  2025-05-19 15:31 ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Bo Ye @ 2025-05-19  7:38 UTC (permalink / raw)
  To: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, AngeloGioacchino Del Regno
  Cc: xiujuan.tan, Qilin Tan, Bosser Ye, linux-scsi, linux-kernel,
	linux-arm-kernel, linux-mediatek

From: Qilin Tan <qilin.tan@mediatek.com>

When kernel_power_off is called directly without freezing userspace,
it may cause UFS crashes:

Callback:
    ...... 0xBFFFFFC080C6156C()
    vmlinux readl() + 52
    vmlinux ufshcd_add_command_trace() + 552
    vmlinux ufshcd_send_command() + 84

When kernel_power_off is executed, ufshcd_wl_shutdown is also called
to turn off the UFS reference clock, VCC, and VCCQ. If I/O requests
are still being sent to the UFS host and accessing the interrupt
status register at this time, AP read timeouts may occur, causing bus
hang crashes.

The root cause is that scsi_device_quiesce and blk_mq_freeze_queue
only drain the requests in the request queue but don't guarantee that
all requests have been dispatched to the UFS host and completed.
Requests may remain pending in the hardware dispatch queue and be
rescheduled later. If the UFS reference clock has already been turned
off at this point, a bus hang crash will occur.

Example of the race condition:
Thread 1                                                   Thread 2
kernel_power_off
-> ufshcd_wl_shutdown
 -> scsi_device_quiesce(sdev)
  -> blk_mq_freeze_queue(q)
   -> blk_mq_run_hw_queue(htx, false)
    -> blk_mq_delay_run_hw_queue(hctx, 0)                blk_mq_run_work_fn
 -> ufshcd_suspend(hba) // disable ref clk                -> blk_mq_dispatch_rq_list
                                                           -> blk_mq_run_hw_queue()
                                                            -> ufshcd_send_command()
                                                             -> ufshcd_add_command_trace()
                                                              -> ufshcd_readl(hba, REG_INTERRUPT_STATUS)

When Thread-2's dispatch request is delayed due to heavy CPU load,
the interrupt status register may be read after the reference clock
is disabled, resulting in a bus hang crash.

To avoid this issue, call ufshcd_wait_for_doorbell_clr to wait until
all requests are processed before disabling the reference clock.

Signed-off-by: Qilin Tan <qilin.tan@mediatek.com>
Signed-off-by: Bosser Ye <bo.ye@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 7735421e3991..a1013aea8e90 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -10262,6 +10262,7 @@ static void ufshcd_wl_shutdown(struct device *dev)
 		scsi_device_set_state(sdev, SDEV_OFFLINE);
 		mutex_unlock(&sdev->state_mutex);
 	}
+	ufshcd_wait_for_doorbell_clr(hba, 5 * USEC_PER_SEC);
 	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
 
 	/*
-- 
2.17.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: ufs: preventing bus hang crash during emergency power off
  2025-05-19  7:38 [PATCH] scsi: ufs: preventing bus hang crash during emergency power off Bo Ye
@ 2025-05-19 15:31 ` Bart Van Assche
  2025-06-24 16:14 ` Bart Van Assche
  2025-07-07 16:14 ` Bart Van Assche
  2 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2025-05-19 15:31 UTC (permalink / raw)
  To: Bo Ye, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, AngeloGioacchino Del Regno
  Cc: xiujuan.tan, Qilin Tan, linux-scsi, linux-kernel,
	linux-arm-kernel, linux-mediatek

On 5/19/25 12:38 AM, Bo Ye wrote:
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 7735421e3991..a1013aea8e90 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -10262,6 +10262,7 @@ static void ufshcd_wl_shutdown(struct device *dev)
>   		scsi_device_set_state(sdev, SDEV_OFFLINE);
>   		mutex_unlock(&sdev->state_mutex);
>   	}
> +	ufshcd_wait_for_doorbell_clr(hba, 5 * USEC_PER_SEC);
>   	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
>   
>   	/*

UFS driver kernel patches should be prepared against Martin's for-next
branch. Commit a3a951064f7e ("scsi: ufs: Rework 
ufshcd_clock_scaling_prepare()") removed ufshcd_wait_for_doorbell_clr().

Please fix the build failure introduced by this patch.

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: ufs: preventing bus hang crash during emergency power off
  2025-05-19  7:38 [PATCH] scsi: ufs: preventing bus hang crash during emergency power off Bo Ye
  2025-05-19 15:31 ` Bart Van Assche
@ 2025-06-24 16:14 ` Bart Van Assche
       [not found]   ` <SEYPR03MB6531CD4D3FAD80339F8E693B944FA@SEYPR03MB6531.apcprd03.prod.outlook.com>
  2025-07-07 16:14 ` Bart Van Assche
  2 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2025-06-24 16:14 UTC (permalink / raw)
  To: Bo Ye, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, AngeloGioacchino Del Regno
  Cc: xiujuan.tan, Qilin Tan, linux-scsi, linux-kernel,
	linux-arm-kernel, linux-mediatek

On 5/19/25 12:38 AM, Bo Ye wrote:
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 7735421e3991..a1013aea8e90 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -10262,6 +10262,7 @@ static void ufshcd_wl_shutdown(struct device *dev)
>   		scsi_device_set_state(sdev, SDEV_OFFLINE);
>   		mutex_unlock(&sdev->state_mutex);
>   	}
> +	ufshcd_wait_for_doorbell_clr(hba, 5 * USEC_PER_SEC);
>   	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
>   
>   	/*

This code path is not only triggered when using a UFSHCI 3.0 controller
but also when using a UFSHCI 4.0 controller.
ufshcd_wait_for_doorbell_clr() only supports the legacy single doorbell
mode. Please make sure that the fix supports both the legacy single
doorbell mode and MCQ.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 回覆: [PATCH] scsi: ufs: preventing bus hang crash during emergency power off
       [not found]   ` <SEYPR03MB6531CD4D3FAD80339F8E693B944FA@SEYPR03MB6531.apcprd03.prod.outlook.com>
@ 2025-07-07 15:39     ` Bart Van Assche
  0 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2025-07-07 15:39 UTC (permalink / raw)
  To: Bo Ye (叶波), Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Matthias Brugger,
	AngeloGioacchino Del Regno
  Cc: Xiujuan Tan (谭秀娟),
	Qilin Tan (谭麒麟), linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org

On 7/7/25 12:49 AM, Bo Ye (叶波) wrote:
> Thank you for your feedback.
> I believe ufshcd_wait_for_doorbell_clr() can handle both legacy single 
> doorbell mode and MCQ [ ... ]

Agreed.

> Therefore, the current implementation should support both modes.
> Please let me know if you have further concerns.

The Fixes tag seems wrong to me. I think it should be changed into the
following:

Fixes: 19a198b67767 ("scsi: ufs: core: Set SDEV_OFFLINE when UFS is shut 
down")

I will add more comments as a reply to the original patch.

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] scsi: ufs: preventing bus hang crash during emergency power off
  2025-05-19  7:38 [PATCH] scsi: ufs: preventing bus hang crash during emergency power off Bo Ye
  2025-05-19 15:31 ` Bart Van Assche
  2025-06-24 16:14 ` Bart Van Assche
@ 2025-07-07 16:14 ` Bart Van Assche
  2 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2025-07-07 16:14 UTC (permalink / raw)
  To: Bo Ye, Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Matthias Brugger, AngeloGioacchino Del Regno
  Cc: xiujuan.tan, Qilin Tan, linux-scsi, linux-kernel,
	linux-arm-kernel, linux-mediatek


On 5/19/25 12:38 AM, Bo Ye wrote:
> The root cause is that scsi_device_quiesce and blk_mq_freeze_queue
> only drain the requests in the request queue but don't guarantee that
> all requests have been dispatched to the UFS host and completed.
> Requests may remain pending in the hardware dispatch queue and be
> rescheduled later.

The above is confusing. scsi_device_quiesce() drains both the request
queue and the hardware queues for all commands associated with a single
logical unit. The problem that you are encountering is probably that
scsi_device_quiesce() is only called for the WLUN but not for the other
logical units and hence that there still may be commands being processed
for other logical units after scsi_device_quiesce() has returned.

> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 7735421e3991..a1013aea8e90 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -10262,6 +10262,7 @@ static void ufshcd_wl_shutdown(struct device *dev)
>   		scsi_device_set_state(sdev, SDEV_OFFLINE);
>   		mutex_unlock(&sdev->state_mutex);
>   	}
> +	ufshcd_wait_for_doorbell_clr(hba, 5 * USEC_PER_SEC);
>   	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
>   
>   	/*

The name of the ufshcd_wait_for_doorbell_clr() function is confusing
since "doorbell" refers to a legacy single doorbell mode concept while
the function supports both legacy mode and MCQ. After this patch has
been queued I plan to submit a patch that renames this function.

If the patch description of this patch is improved I will add my
Reviewed-by to this patch.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-07-07 17:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-19  7:38 [PATCH] scsi: ufs: preventing bus hang crash during emergency power off Bo Ye
2025-05-19 15:31 ` Bart Van Assche
2025-06-24 16:14 ` Bart Van Assche
     [not found]   ` <SEYPR03MB6531CD4D3FAD80339F8E693B944FA@SEYPR03MB6531.apcprd03.prod.outlook.com>
2025-07-07 15:39     ` 回覆: " Bart Van Assche
2025-07-07 16:14 ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).