* [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error
@ 2025-11-12 6:32 Po-Wen Kao
2025-11-12 8:17 ` Peter Wang (王信友)
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Po-Wen Kao @ 2025-11-12 6:32 UTC (permalink / raw)
Cc: Brian Kao, Alim Akhtar, Avri Altman, Bart Van Assche,
James E.J. Bottomley, Martin K. Petersen, Peter Wang, Bean Huo,
Manivannan Sadhasivam, Bao D. Nguyen, Adrian Hunter,
open list:UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER,
open list
From: Brian Kao <powenkao@google.com>
When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN
device back to active because the parent target is not active.
This results in the following errors:
google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
...
ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
This patch addresses this by:
1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
attempting to set the W-LUN to active within ufshcd_recover_pm_error().
2. Explicitly checking for power.runtime_error on the HBA and W-LUN
devices before calling pm_runtime_set_active() to clear the error state.
3. Adding pm_runtime_get_sync(hba->dev) in
ufshcd_err_handling_prepare() to ensure the HBA itself
is active during error recovery, even if a child device resume failed.
These changes ensure the device power states are managed correctly
during error recovery.
Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
---
drivers/ufs/core/ufshcd.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
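The parent/child ordering that item 1 of the description relies on can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `toy_*` name is hypothetical, the `-EBUSY` return and the `child_count` bookkeeping are simplifying assumptions about how the runtime PM core behaves, and locking, `disable_depth` and `ignore_children` are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy userspace model of runtime PM state; all names are hypothetical. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

struct toy_dev {
	struct toy_dev *parent;
	enum toy_rpm_status status;
	int runtime_error;	/* non-zero after a failed resume */
	int usage_count;	/* references taken via toy_get_sync() */
	int child_count;	/* number of active children */
};

/* Take a reference and resume the device (resume assumed to succeed). */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	if (dev->runtime_error)
		return -EINVAL;
	dev->status = TOY_RPM_ACTIVE;
	return 0;
}

/* Drop a reference; suspend only if nothing else keeps the device busy. */
static void toy_put_sync(struct toy_dev *dev)
{
	if (--dev->usage_count == 0 && dev->child_count == 0)
		dev->status = TOY_RPM_SUSPENDED;
}

/*
 * Mark a device active and clear its error. Modeled rule: a child cannot
 * be marked active while its parent is not active (the condition behind
 * the "parent (target0:0:0) is not active" message).
 */
static int toy_set_active(struct toy_dev *dev)
{
	struct toy_dev *parent = dev->parent;

	if (dev->status != TOY_RPM_ACTIVE) {
		if (parent && parent->status != TOY_RPM_ACTIVE)
			return -EBUSY;
		if (parent)
			parent->child_count++;
		dev->status = TOY_RPM_ACTIVE;
	}
	dev->runtime_error = 0;
	return 0;
}

/* Same ordering as the patch: resume parent, activate child, drop ref. */
static int toy_recover_child(struct toy_dev *child)
{
	int ret;

	assert(child->parent != NULL);
	if (!child->runtime_error)
		return 0;

	toy_get_sync(child->parent);
	ret = toy_set_active(child);
	toy_put_sync(child->parent);
	return ret;
}
```

In this model, calling `toy_set_active()` on a child whose parent is suspended fails and leaves the child's error in place, whereas `toy_recover_child()` succeeds. The parent also stays active after the final put, because the newly active child is counted against it.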
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index fc4d1b6576dc..9176d7ce25b4 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6504,6 +6504,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
{
+ /*
+ * A WLUN resume failure could potentially lead to the HBA being
+ * runtime suspended, so take an extra reference on hba->dev.
+ */
+ pm_runtime_get_sync(hba->dev);
ufshcd_rpm_get_sync(hba);
if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
hba->is_sys_suspended) {
@@ -6543,6 +6548,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
if (ufshcd_is_clkscaling_supported(hba))
ufshcd_clk_scaling_suspend(hba, false);
ufshcd_rpm_put(hba);
+ pm_runtime_put(hba->dev);
}
static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
@@ -6557,28 +6563,42 @@ static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
#ifdef CONFIG_PM
static void ufshcd_recover_pm_error(struct ufs_hba *hba)
{
+ struct scsi_target *starget = hba->ufs_device_wlun->sdev_target;
struct Scsi_Host *shost = hba->host;
struct scsi_device *sdev;
struct request_queue *q;
- int ret;
+ bool resume_sdev_queues = false;
hba->is_sys_suspended = false;
+
/*
- * Set RPM status of wlun device to RPM_ACTIVE,
- * this also clears its runtime error.
+ * Ensure the parent's error status is cleared before proceeding
+ * to the child, as the parent must be active to activate the child.
*/
- ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+ if (hba->dev->power.runtime_error) {
+ /* hba->dev has no functional parent, thus simply set RPM_ACTIVE */
+ pm_runtime_set_active(hba->dev);
+ resume_sdev_queues = true;
+ }
+
+ if (hba->ufs_device_wlun->sdev_gendev.power.runtime_error) {
+ /*
+ * starget, parent of wlun, might be suspended if wlun resume failed.
+ * Make sure the parent is resumed before setting the child (wlun) active.
+ */
+ pm_runtime_get_sync(&starget->dev);
+ pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+ pm_runtime_put_sync(&starget->dev);
+ resume_sdev_queues = true;
+ }
- /* hba device might have a runtime error otherwise */
- if (ret)
- ret = pm_runtime_set_active(hba->dev);
/*
* If wlun device had runtime error, we also need to resume those
* consumer scsi devices in case any of them has failed to be
* resumed due to supplier runtime resume failure. This is to unblock
* blk_queue_enter in case there are bios waiting inside it.
*/
- if (!ret) {
+ if (resume_sdev_queues) {
shost_for_each_device(sdev, shost) {
q = sdev->request_queue;
if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
base-commit: c53a741a7fd4b8e9d07acf1861b5e4a188c6585a
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error
2025-11-12 6:32 [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error Po-Wen Kao
@ 2025-11-12 8:17 ` Peter Wang (王信友)
[not found] ` <CA+=0d2YnrDL41DXtC8kDmtXioy4+hohGsmrOPxJY31jqt22uww@mail.gmail.com>
2025-11-20 16:32 ` Bart Van Assche
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Peter Wang (王信友) @ 2025-11-12 8:17 UTC (permalink / raw)
To: powenkao@google.com
Cc: beanhuo@micron.com, avri.altman@wdc.com, quic_nguyenb@quicinc.com,
linux-scsi@vger.kernel.org, bvanassche@acm.org,
linux-kernel@vger.kernel.org, alim.akhtar@samsung.com,
adrian.hunter@intel.com, mani@kernel.org,
James.Bottomley@HansenPartnership.com, martin.petersen@oracle.com
On Wed, 2025-11-12 at 06:32 +0000, Po-Wen Kao wrote:
> google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state
> eh_fatal; ...
> ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1,
> result 40000
> ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
> ...
> ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child
> device 0:0:0:49488 but parent (target0:0:0) is not active
>
Hi Brian,
How is ufshcd_err_handler triggered before the parent device
resumes? I mean, what causes ufshcd_err_handler to be
triggered, an error interrupt or something else?
Thanks
Peter
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error
2025-11-12 6:32 [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error Po-Wen Kao
2025-11-12 8:17 ` Peter Wang (王信友)
@ 2025-11-20 16:32 ` Bart Van Assche
2025-11-29 20:02 ` Martin K. Petersen
2025-12-09 3:21 ` Martin K. Petersen
3 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2025-11-20 16:32 UTC (permalink / raw)
To: Po-Wen Kao
Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
Martin K. Petersen, Peter Wang, Bean Huo, Manivannan Sadhasivam,
Bao D. Nguyen, Adrian Hunter,
open list:UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER,
open list
On 11/11/25 10:32 PM, Po-Wen Kao wrote:
> When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
> including the scsi_target, may be runtime suspended. Subsequently, the
> error handler in ufshcd_recover_pm_error() fails to set the W-LUN
> device back to active because the parent target is not active.
> This results in the following errors:
>
> google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
> ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
> ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
> ...
> ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
>
> This patch addresses this by:
>
> 1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
> attempting to set the W-LUN to active within ufshcd_recover_pm_error().
> 2. Explicitly checking for power.runtime_error on the HBA and W-LUN
> devices before calling pm_runtime_set_active() to clear the error state.
> 3. Adding pm_runtime_get_sync(hba->dev) in
> ufshcd_err_handling_prepare() to ensure the HBA itself
> is active during error recovery, even if a child device resume failed.
>
> These changes ensure the device power states are managed correctly
> during error recovery.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error
2025-11-12 6:32 [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error Po-Wen Kao
2025-11-12 8:17 ` Peter Wang (王信友)
2025-11-20 16:32 ` Bart Van Assche
@ 2025-11-29 20:02 ` Martin K. Petersen
2025-12-09 3:21 ` Martin K. Petersen
3 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2025-11-29 20:02 UTC (permalink / raw)
To: Po-Wen Kao
Cc: Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
Martin K. Petersen, Peter Wang, Bean Huo, Manivannan Sadhasivam,
Bao D. Nguyen, Adrian Hunter,
open list:UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER,
open list
Po-Wen,
> When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
> including the scsi_target, may be runtime suspended. Subsequently, the
> error handler in ufshcd_recover_pm_error() fails to set the W-LUN
> device back to active because the parent target is not active. This
> results in the following errors:
Applied to 6.19/scsi-staging, thanks!
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error
2025-11-12 6:32 [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error Po-Wen Kao
` (2 preceding siblings ...)
2025-11-29 20:02 ` Martin K. Petersen
@ 2025-12-09 3:21 ` Martin K. Petersen
3 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2025-12-09 3:21 UTC (permalink / raw)
To: Po-Wen Kao
Cc: Martin K . Petersen, Alim Akhtar, Avri Altman, Bart Van Assche,
James E.J. Bottomley, Peter Wang, Bean Huo, Manivannan Sadhasivam,
Bao D. Nguyen, Adrian Hunter,
open list:UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER,
open list
On Wed, 12 Nov 2025 06:32:02 +0000, Po-Wen Kao wrote:
> When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
> including the scsi_target, may be runtime suspended. Subsequently, the
> error handler in ufshcd_recover_pm_error() fails to set the W-LUN
> device back to active because the parent target is not active.
> This results in the following errors:
>
> google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
> ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
> ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
> ...
> ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
>
> [...]
Applied to 6.19/scsi-queue, thanks!
[1/1] scsi: ufs: core: Fix EH failure after wlun resume error
https://git.kernel.org/mkp/scsi/c/b4bb6daf4ac4
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-12-09 3:21 UTC | newest]
Thread overview: 11+ messages
2025-11-12 6:32 [PATCH 1/1] scsi: ufs: core: Fix EH failure after wlun resume error Po-Wen Kao
2025-11-12 8:17 ` Peter Wang (王信友)
[not found] ` <CA+=0d2YnrDL41DXtC8kDmtXioy4+hohGsmrOPxJY31jqt22uww@mail.gmail.com>
2025-11-12 9:29 ` Brian Kao
2025-11-13 10:06 ` Peter Wang (王信友)
2025-11-18 7:26 ` Brian Kao
2025-11-19 9:18 ` Peter Wang (王信友)
2025-11-19 23:44 ` Brian Kao
2025-11-20 13:14 ` Peter Wang (王信友)
2025-11-20 16:32 ` Bart Van Assche
2025-11-29 20:02 ` Martin K. Petersen
2025-12-09 3:21 ` Martin K. Petersen