[PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
       [not found] <CGME20250714090630epcas1p28ab8afec11bbab4d256dfe6649d3b00b@epcas1p2.samsung.com>
@ 2025-07-14  9:06 ` Seunghui Lee
  2025-07-14 11:21   ` Bean Huo
  0 siblings, 1 reply; 12+ messages in thread
From: Seunghui Lee @ 2025-07-14  9:06 UTC (permalink / raw)
  To: alim.akhtar, avri.altman, bvanassche, James.Bottomley,
	martin.petersen, linux-scsi, sdriver.sec
  Cc: Seunghui Lee

If the h8 exit fails during runtime resume process,
the runtime thread enters runtime suspend immediately
and the error handler operates at the same time.
It becomes stuck and cannot be recovered through the error handler.
To fix this, use link recovery instead of the error handler.

Signed-off-by: Seunghui Lee <sh043.lee@samsung.com>
---
 drivers/ufs/core/ufshcd.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 50adfb8b335b..dc2845c32d72 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4340,7 +4340,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 	hba->uic_async_done = NULL;
 	if (reenable_intr)
 		ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
-	if (ret) {
+	if (ret && !hba->pm_op_in_progress) {
 		ufshcd_set_link_broken(hba);
 		ufshcd_schedule_eh_work(hba);
 	}
@@ -4348,6 +4348,14 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	mutex_unlock(&hba->uic_cmd_mutex);
 
+	/*
+	 * If the h8 exit fails during the runtime resume process,
+	 * it becomes stuck and cannot be recovered through the error handler.
+	 * To fix this, use link recovery instead of the error handler.
+	 */
+	if (ret && hba->pm_op_in_progress)
+		ret = ufshcd_link_recovery(hba);
+
 	return ret;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-14  9:06 ` [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume Seunghui Lee
@ 2025-07-14 11:21   ` Bean Huo
  2025-07-15  2:23     ` 이승희
  0 siblings, 1 reply; 12+ messages in thread
From: Bean Huo @ 2025-07-14 11:21 UTC (permalink / raw)
  To: Seunghui Lee, alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

On Mon, 2025-07-14 at 18:06 +0900, Seunghui Lee wrote:
> If the h8 exit fails during runtime resume process,
> the runtime thread enters runtime suspend immediately
> and the error handler operates at the same time.
> It becomes stuck and cannot be recovered through the error handler.
> To fix this, use link recovery instead of the error handler.
> 
> Signed-off-by: Seunghui Lee <sh043.lee@samsung.com>
> ---
>  drivers/ufs/core/ufshcd.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 50adfb8b335b..dc2845c32d72 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4340,7 +4340,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
>         hba->uic_async_done = NULL;
>         if (reenable_intr)
>                 ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
> -       if (ret) {
> +       if (ret && !hba->pm_op_in_progress) {
>                 ufshcd_set_link_broken(hba);
>                 ufshcd_schedule_eh_work(hba);
>         }
> @@ -4348,6 +4348,14 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
>         spin_unlock_irqrestore(hba->host->host_lock, flags);
>         mutex_unlock(&hba->uic_cmd_mutex);
>  
> +       /*
> +        * If the h8 exit fails during the runtime resume process,
> +        * it becomes stuck and cannot be recovered through the error handler.
> +        * To fix this, use link recovery instead of the error handler.
> +        */
> +       if (ret && hba->pm_op_in_progress)
> +               ret = ufshcd_link_recovery(hba);
> +
>         return ret;
>  }
>  
I have one queston:

In the error handler, if the link is broken(set by ufshcd_set_link_broken()), then in ufshcd_err_handler(),
will ufshcd_reset_and_restore(hba), does not this work?


Kind regards,
Bean


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-14 11:21   ` Bean Huo
@ 2025-07-15  2:23     ` 이승희
  2025-07-15 12:47       ` Bean Huo
  2025-07-15 15:21       ` Bean Huo
  0 siblings, 2 replies; 12+ messages in thread
From: 이승희 @ 2025-07-15  2:23 UTC (permalink / raw)
  To: 'Bean Huo', alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

> -----Original Message-----
> From: Bean Huo <huobean@gmail.com>
> Sent: Monday, July 14, 2025 8:21 PM
> To: Seunghui Lee <sh043.lee@samsung.com>; alim.akhtar@samsung.com;
> avri.altman@wdc.com; bvanassche@acm.org;
> James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com; linux-
> scsi@vger.kernel.org; sdriver.sec@samsung.com
> Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure
> during runtime resume
> 
> On Mon, 2025-07-14 at 18:06 +0900, Seunghui Lee wrote:
> > If the h8 exit fails during runtime resume process, the runtime thread
> > enters runtime suspend immediately and the error handler operates at
> > the same time.
> > It becomes stuck and cannot be recovered through the error handler.
> > To fix this, use link recovery instead of the error handler.
> >
> > Signed-off-by: Seunghui Lee <sh043.lee@samsung.com>
> > ---
> >  drivers/ufs/core/ufshcd.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> > index 50adfb8b335b..dc2845c32d72 100644
> > --- a/drivers/ufs/core/ufshcd.c
> > +++ b/drivers/ufs/core/ufshcd.c
> > @@ -4340,7 +4340,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba
> > *hba, struct uic_command *cmd)
> >         hba->uic_async_done = NULL;
> >         if (reenable_intr)
> >                 ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
> > -       if (ret) {
> > +       if (ret && !hba->pm_op_in_progress) {
> >                 ufshcd_set_link_broken(hba);
> >                 ufshcd_schedule_eh_work(hba);
> >         }
> > @@ -4348,6 +4348,14 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba
> > *hba, struct uic_command *cmd)
> >         spin_unlock_irqrestore(hba->host->host_lock, flags);
> >         mutex_unlock(&hba->uic_cmd_mutex);
> >
> > +       /*
> > +        * If the h8 exit fails during the runtime resume process,
> > +        * it becomes stuck and cannot be recovered through the error
> handler.
> > +        * To fix this, use link recovery instead of the error handler.
> > +        */
> > +       if (ret && hba->pm_op_in_progress)
> > +               ret = ufshcd_link_recovery(hba);
> > +
> >         return ret;
> >  }
> >
> I have one queston:
> 
> In the error handler, if the link is broken(set by
> ufshcd_set_link_broken()), then in ufshcd_err_handler(), will
> ufshcd_reset_and_restore(hba), does not this work?
> 
> 
> Kind regards,
> Bean
> 


Unfortunately, it doesn't work.
Please refer to the below log.

[  310.118416] [4:    kworker/4:4:  786] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
[  310.118423] [4:    kworker/4:4:  786] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110

[  310.118424] [0:  kworker/u32:0:   12] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken

[  310.119046] [4:    kworker/4:4:  786] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
[  310.119051] [4:    kworker/4:4:  786] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 560926us, pwr_mode(1), link state(3)
-> ufshcd_wl_runtime_resume failed done.
   -> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()

[  310.119104] [4:    kworker/4:4:  786] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend start.
-> ufshcd_runtime_suspend()
   -> ufshcd_suspend()
      -> ufshcd_disable_irq() / ufshcd_setup_clocks( , false) / ufshcd_vreg_set_lpm() / ufshcd_hba_vreg_set_lpm()
[  310.119111]I[0:  kworker/u32:0:   12] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: Auto Hibern8 Exit failed - status: 0x00000020, upmcrs: 0x00000001
[  310.119119]I[0:  kworker/u32:0:   12] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20 saved_uic_err 0x0
<snip>
[  310.119162] [4:    kworker/4:4:  786]  gcc_ufs_mem_phy_gdsc: genpd_power_off
[  310.119167] [4:    kworker/4:4:  786] CPU: 4 UID: 0 PID: 786 Comm: kworker/4:4 Tainted: G        W  O       6.12.23-android16-5-31706220-ud-abogki31706220-4k #1 5bbb440b8bd7ff2d31dd25fab2106d21aa9b6357
[  310.119176] [4:    kworker/4:4:  786] Tainted: [W]=WARN, [O]=OOT_MODULE
[  310.119179] [4:    kworker/4:4:  786] Hardware name: Samsung M2Q PROJECT (board-id,05) (DT)
[  310.119183] [4:    kworker/4:4:  786] Workqueue: pm pm_runtime_work
[  310.119189] [4:    kworker/4:4:  786] 
[  310.119192] [4:    kworker/4:4:  786] Call trace:
[  310.119195] [4:    kworker/4:4:  786]  dump_backtrace+0xec/0x128
[  310.119205] [4:    kworker/4:4:  786]  show_stack+0x18/0x28
[  310.119212] [4:    kworker/4:4:  786]  dump_stack_lvl+0x40/0x88
[  310.119219] [4:    kworker/4:4:  786]  dump_stack+0x18/0x24
[  310.119226] [4:    kworker/4:4:  786]  genpd_power_off+0x304/0x308
[  310.119232] [4:    kworker/4:4:  786]  genpd_runtime_suspend+0x260/0x38c
[  310.119238] [4:    kworker/4:4:  786]  __rpm_callback+0x94/0x390
[  310.119242] [4:    kworker/4:4:  786]  rpm_suspend+0x284/0x640
[  310.119249] [4:    kworker/4:4:  786]  rpm_idle+0x58/0x37c
[  310.119255] [4:    kworker/4:4:  786]  __pm_runtime_idle+0x60/0x150
[  310.119262] [4:    kworker/4:4:  786]  ufs_qcom_phy_qmp_v4_power_control+0x144/0x15c [phy_qcom_ufs_qmp_v4_canoe f98acc73ebd0cc2fea7bd12b574d8fd77ee19353]
[  310.119276] [4:    kworker/4:4:  786]  ufs_qcom_phy_power_off+0x44/0x308 [phy_qcom_ufs b7bbd5d6bbe64d4ff5f3e89a124ca991f7c4f08a]
[  310.119291] [4:    kworker/4:4:  786]  phy_power_off+0x58/0xdc
[  310.119300] [4:    kworker/4:4:  786]  ufs_qcom_setup_clocks+0x3f4/0x7e8 [ufs_qcom 01a09a66f1eae71e0199ddd5861db80b7fe6c630]
[  310.119341] [4:    kworker/4:4:  786]  ufshcd_setup_clocks+0x74/0x3d8
[  310.119351] [4:    kworker/4:4:  786]  ufshcd_suspend+0x48/0x160
[  310.119357] [4:    kworker/4:4:  786]  ufshcd_runtime_suspend+0x70/0x1b8
[  310.119363] [4:    kworker/4:4:  786]  pm_generic_runtime_suspend+0x40/0x58
[  310.119369] [4:    kworker/4:4:  786]  genpd_runtime_suspend+0x128/0x38c
[  310.119375] [4:    kworker/4:4:  786]  __rpm_callback+0x94/0x390
[  310.119378] [4:    kworker/4:4:  786]  rpm_suspend+0x2a4/0x640
[  310.119385] [4:    kworker/4:4:  786]  pm_runtime_work+0x8c/0xa8
[  310.119389] [4:    kworker/4:4:  786]  process_scheduled_works+0x1c4/0x45c
[  310.119394] [4:    kworker/4:4:  786]  worker_thread+0x32c/0x3e8
[  310.119399] [4:    kworker/4:4:  786]  kthread+0x11c/0x1b0
[  310.119405] [4:    kworker/4:4:  786]  ret_from_fork+0x10/0x20

[  310.120394] [0:  kworker/u32:0:   12]  gcc_ufs_mem_phy_gdsc: genpd_power_on
[  310.120398] [0:  kworker/u32:0:   12] CPU: 0 UID: 0 PID: 12 Comm: kworker/u32:0 Tainted: G        W  O       6.12.23-android16-5-31706220-ud-abogki31706220-4k #1 5bbb440b8bd7ff2d31dd25fab2106d21aa9b6357
[  310.120408] [0:  kworker/u32:0:   12] Tainted: [W]=WARN, [O]=OOT_MODULE
[  310.120410] [0:  kworker/u32:0:   12] Hardware name: Samsung M2Q PROJECT (board-id,05) (DT)
[  310.120413] [0:  kworker/u32:0:   12] Workqueue: ufs_eh_wq_0 ufshcd_err_handler
[  310.120423] [0:  kworker/u32:0:   12] Call trace:
[  310.120426] [0:  kworker/u32:0:   12]  dump_backtrace+0xec/0x128
[  310.120435] [0:  kworker/u32:0:   12]  show_stack+0x18/0x28
[  310.120442] [0:  kworker/u32:0:   12]  dump_stack_lvl+0x40/0x88
[  310.120448] [0:  kworker/u32:0:   12]  dump_stack+0x18/0x24
[  310.120455] [0:  kworker/u32:0:   12]  genpd_power_on+0x34c/0x3b0
[  310.120463] [0:  kworker/u32:0:   12]  genpd_runtime_resume+0x16c/0x43c
[  310.120469] [0:  kworker/u32:0:   12]  __rpm_callback+0x94/0x390
[  310.120473] [0:  kworker/u32:0:   12]  rpm_resume+0x3bc/0x5a8
[  310.120480] [0:  kworker/u32:0:   12]  __pm_runtime_resume+0x48/0x8c
[  310.120487] [0:  kworker/u32:0:   12]  ufs_qcom_phy_qmp_v4_power_control+0x34/0x15c [phy_qcom_ufs_qmp_v4_canoe f98acc73ebd0cc2fea7bd12b574d8fd77ee19353]
[  310.120497] [0:  kworker/u32:0:   12]  ufs_qcom_phy_power_on+0x13c/0x79c [phy_qcom_ufs b7bbd5d6bbe64d4ff5f3e89a124ca991f7c4f08a]
[  310.120512] [0:  kworker/u32:0:   12]  phy_power_on+0x9c/0x124
[  310.120520] [0:  kworker/u32:0:   12]  ufs_qcom_setup_clocks+0xb4/0x7e8 [ufs_qcom 01a09a66f1eae71e0199ddd5861db80b7fe6c630]
[  310.120560] [0:  kworker/u32:0:   12]  ufshcd_setup_clocks+0x150/0x3d8
[  310.120567] [0:  kworker/u32:0:   12]  ufshcd_err_handler+0x42c/0xd20
[  310.120574] [0:  kworker/u32:0:   12]  process_scheduled_works+0x1c4/0x45c
[  310.120579] [0:  kworker/u32:0:   12]  worker_thread+0x32c/0x3e8
[  310.120584] [0:  kworker/u32:0:   12]  kthread+0x11c/0x1b0
[  310.120590] [0:  kworker/u32:0:   12]  ret_from_fork+0x10/0x20
-> ufshcd_err_handler()
   -> ufshcd_err_handling_prepare()
      -> ufshcd_setup_clocks( , true)

[  310.122158] [4:    kworker/4:4:  786] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: 3056us, pwr_mode(1), link state(3)
-> ufshcd_runtime_suspend done.

[  310.122187] [4:    kworker/4:4:  786]  gcc_ufs_phy_gdsc: genpd_power_off
[  310.122191] [4:    kworker/4:4:  786] CPU: 4 UID: 0 PID: 786 Comm: kworker/4:4 Tainted: G        W  O       6.12.23-android16-5-31706220-ud-abogki31706220-4k #1 5bbb440b8bd7ff2d31dd25fab2106d21aa9b6357
[  310.122198] [4:    kworker/4:4:  786] Tainted: [W]=WARN, [O]=OOT_MODULE
[  310.122200] [4:    kworker/4:4:  786] Hardware name: Samsung M2Q PROJECT (board-id,05) (DT)
[  310.122203] [4:    kworker/4:4:  786] Workqueue: pm pm_runtime_work
[  310.122208] [4:    kworker/4:4:  786] Call trace:
[  310.122210] [4:    kworker/4:4:  786]  dump_backtrace+0xec/0x128
[  310.122218] [4:    kworker/4:4:  786]  show_stack+0x18/0x28
[  310.122225] [4:    kworker/4:4:  786]  dump_stack_lvl+0x40/0x88
[  310.122231] [4:    kworker/4:4:  786]  dump_stack+0x18/0x24
[  310.122238] [4:    kworker/4:4:  786]  genpd_power_off+0x304/0x308
[  310.122244] [4:    kworker/4:4:  786]  genpd_runtime_suspend+0x260/0x38c
[  310.122249] [4:    kworker/4:4:  786]  __rpm_callback+0x94/0x390
[  310.122253] [4:    kworker/4:4:  786]  rpm_suspend+0x2a4/0x640
[  310.122260] [4:    kworker/4:4:  786]  pm_runtime_work+0x8c/0xa8
[  310.122263] [4:    kworker/4:4:  786]  process_scheduled_works+0x1c4/0x45c
[  310.122268] [4:    kworker/4:4:  786]  worker_thread+0x32c/0x3e8
[  310.122272] [4:    kworker/4:4:  786]  kthread+0x11c/0x1b0
[  310.122279] [4:    kworker/4:4:  786]  ret_from_fork+0x10/0x20

[  313.197378] [4:  kworker/u32:0:   12] ufs_qcom_phy_qmp_v4_canoe 1d80000.ufsphy_mem: ufs_qcom_phy_qmp_v4_is_pcs_ready: poll for pcs failed err = -110
[  313.197398] [4:  kworker/u32:0:   12] ufshcd-qcom 1d84000.ufshc: ufs_qcom_power_up_sequence: Failed to calibrate PHY -110
[  313.255403] [0:  kworker/u32:0:   12] ufshcd-qcom 1d84000.ufshc: Controller enable failed



To fix this, I've tried two more things.

1> The error handler waits until ufshcd_runtime_suspend done.
-> It doesn't work, either.

--- a/common/drivers/ufs/core/ufshcd.c
+++ b/common/drivers/ufs/core/ufshcd.c
@@ -6551,6 +6551,11 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
            hba->is_sys_suspended) {
                enum ufs_pm_op pm_op;
 
+               while (!pm_runtime_status_suspended(hba->dev)) {
+                       dev_err(hba->dev, "%s: waiting for complete suspend\n", __func__);
+                       msleep(10);
+               }
+
                /*
                 * Don't assume anything of resume, if
                 * resume fails, irq and clocks can be OFF, and powers


[  335.876451] [0:    kworker/0:3:  713] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
[  335.876456] [0:    kworker/0:3:  713] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110

[  335.876460] [4:  kworker/u32:2:   88] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken

[  335.877072] [0:    kworker/0:3:  713] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
[  335.877076] [0:    kworker/0:3:  713] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 544445us, pwr_mode(1), link state(3)
-> ufshcd_wl_runtime_resume failed done.
   -> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()

[  335.877112] [4:  kworker/u32:2:   88] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handling_prepare: waiting for complete suspend
-> wait until ufshcd_runtime_suspend done.

[  335.877123] [0:    kworker/0:3:  713] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend start.
[  335.879805] [0:    kworker/0:3:  713] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: 2685us, pwr_mode(1), link state(3)

[  337.950174] [4:  kworker/u32:2:   88] ufshcd-qcom 1d84000.ufshc: ESI configured
[  337.950302] [4:  kworker/u32:2:   88] ufshcd-qcom 1d84000.ufshc: MCQ configured, nr_queues=9, io_queues=8, read_queue=0, poll_queues=1, queue_depth=64
-> ufshcd_reset_and_restore() works well.

[  337.970178] [0:  kworker/u32:2:   88] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
-> ufschd_recover_pm_err()
   -> Because of this error, pm_request_resume doesn't call here.

[  337.970212] [0:  kworker/u32:2:   88] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler finished; HBA state operational



2> if eh_in_progress, err EBUSY return in ufshcd_runtime_suspend to guarantee the error handling done.
-> It doesn't work as well.

--- a/common/drivers/ufs/core/ufshcd.c
+++ b/common/drivers/ufs/core/ufshcd.c
@@ -10371,6 +10371,9 @@ int ufshcd_runtime_suspend(struct device *dev)
        int ret;
        ktime_t start = ktime_get();
 
+       if (ufshcd_eh_in_progress(hba))
+               return -EBUSY;
+
        ret = ufshcd_suspend(hba);
 
        trace_ufshcd_runtime_suspend(hba, ret,

[   63.010841] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
[   63.010844] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110

[   63.010845] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken

[   63.011430] [4:    kworker/4:0:   52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
[   63.011433] [4:    kworker/4:0:   52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 574917us, pwr_mode(1), link state(3)
-> ufshcd_wl_runtime_resume failed done.
   -> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()

[   63.011457] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: eh_in_progress
-> EBUSY return in ufshcd_runtime_suspend due to eh_in_progress

[   63.011464]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: Auto Hibern8 Exit failed - status: 0x00000020, upmcrs: 0x00000001
[   63.011468]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20 saved_uic_err 0x0

[   63.039824] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufs_qcom_device_reset: Waiting for device internal cache flush

[   65.084604] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ESI configured
[   65.084728] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: MCQ configured, nr_queues=9, io_queues=8, read_queue=0, poll_queues=1, queue_depth=64
-> ufshcd_reset_and_restore() works well.

[   65.105186] [1: kworker/u32:10:13604] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
-> ufschd_recover_pm_err()
   -> Because of this error, pm_request_resume doesn't call here.
   
[   65.105305] [1: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler finished; HBA state operational
[   65.105310] [1: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state operational; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-15  2:23     ` 이승희
@ 2025-07-15 12:47       ` Bean Huo
  2025-07-15 15:21       ` Bean Huo
  1 sibling, 0 replies; 12+ messages in thread
From: Bean Huo @ 2025-07-15 12:47 UTC (permalink / raw)
  To: 이승희, alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

On Tue, 2025-07-15 at 11:23 +0900, 이승희 wrote:
> 2> if eh_in_progress, err EBUSY return in ufshcd_runtime_suspend to guarantee the error handling done.
> -> It doesn't work as well.
> 
> --- a/common/drivers/ufs/core/ufshcd.c
> +++ b/common/drivers/ufs/core/ufshcd.c
> @@ -10371,6 +10371,9 @@ int ufshcd_runtime_suspend(struct device *dev)
>         int ret;
>         ktime_t start = ktime_get();
>  
> +       if (ufshcd_eh_in_progress(hba))
> +               return -EBUSY;
> +
>         ret = ufshcd_suspend(hba);
>  
>         trace_ufshcd_runtime_suspend(hba, ret,
> 
> [   63.010841] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
> [   63.010844] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110
> 
> [   63.010845] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0;
> link is broken
> 
> [   63.011430] [4:    kworker/4:0:   52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
> [   63.011433] [4:    kworker/4:0:   52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 574917us, pwr_mode(1), link state(3)
> -> ufshcd_wl_runtime_resume failed done.
>    -> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()
> 
> [   63.011457] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: eh_in_progress
> -> EBUSY return in ufshcd_runtime_suspend due to eh_in_progress
> 
> [   63.011464]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: Auto Hibern8 Exit failed - status: 0x00000020, upmcrs: 0x00000001
> [   63.011468]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20 saved_uic_err 0x0
> 
> [   63.039824] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufs_qcom_device_reset: Waiting for device internal cache flush
> 
> [   65.084604] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ESI configured
> [   65.084728] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: MCQ configured, nr_queues=9, io_queues=8, read_queue=0, poll_queues=1, queue_depth=64
> -> ufshcd_reset_and_restore() works well.
> 
> [   65.105186] [1: kworker/u32:10:13604] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
> -> ufschd_recover_pm_err()
>    -> Because of this error, pm_request_resume doesn't call here.
>    
> [   65.105305] [1: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler finished; HBA state operational
> [   65.105310] [1: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state operational; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 


Thanks for updating the logs — I can now see the issue clearly.
Please add the appropriate Fixes tag to the patch.

By the way, I prefer the second approach as well and would rather handle recovery in a single interface.
It’s strange that it's not working as expected.
However, I don’t see ufschd_recover_pm_err() being invoked in my code either.

kind regards,
Bean

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-15  2:23     ` 이승희
  2025-07-15 12:47       ` Bean Huo
@ 2025-07-15 15:21       ` Bean Huo
  2025-07-16  7:01         ` Seunghui Lee
  1 sibling, 1 reply; 12+ messages in thread
From: Bean Huo @ 2025-07-15 15:21 UTC (permalink / raw)
  To: 이승희, alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

On Tue, 2025-07-15 at 11:23 +0900, 이승희 wrote:
> 
> 2> if eh_in_progress, err EBUSY return in ufshcd_runtime_suspend to guarantee the error handling done.
> -> It doesn't work as well.
> 
> --- a/common/drivers/ufs/core/ufshcd.c
> +++ b/common/drivers/ufs/core/ufshcd.c
> @@ -10371,6 +10371,9 @@ int ufshcd_runtime_suspend(struct device *dev)
>         int ret;
>         ktime_t start = ktime_get();
>  
> +       if (ufshcd_eh_in_progress(hba))
> +               return -EBUSY;
> +
>         ret = ufshcd_suspend(hba);
>  
>         trace_ufshcd_runtime_suspend(hba, ret,
> 
> [   63.010841] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
> [   63.010844] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110
> 
> [   63.010845] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0;
> link is broken
> 
> [   63.011430] [4:    kworker/4:0:   52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
> [   63.011433] [4:    kworker/4:0:   52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 574917us, pwr_mode(1), link state(3)
> -> ufshcd_wl_runtime_resume failed done.
>    -> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()
> 
> [   63.011457] [4:    kworker/4:0:   52] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: eh_in_progress
> -> EBUSY return in ufshcd_runtime_suspend due to eh_in_progress
> 
> [   63.011464]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: Auto Hibern8 Exit failed - status: 0x00000020, upmcrs: 0x00000001
> [   63.011468]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20 saved_uic_err 0x0
> 
> [   63.039824] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufs_qcom_device_reset: Waiting for device internal cache flush
> 
> [   65.084604] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ESI configured
> [   65.084728] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: MCQ configured, nr_queues=9, io_queues=8, read_queue=0, poll_queues=1, queue_depth=64
> -> ufshcd_reset_and_restore() works well.
> 
> [   65.105186] [1: kworker/u32:10:13604] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
> -> ufschd_recover_pm_err()
>    -> Because of this error, pm_request_resume doesn't call here.


You patched ufshcd_runtime_suspend() to return -EBUSY if eh_in_progress is true — meant to avoid suspend during error handling. I don't understand why here parent is not active?

would like to try return -EAGAIN? 


Kind regards,
Bean


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-15 15:21       ` Bean Huo
@ 2025-07-16  7:01         ` Seunghui Lee
  2025-07-16  7:49           ` Seunghui Lee
  2025-07-16 15:14           ` Bart Van Assche
  0 siblings, 2 replies; 12+ messages in thread
From: Seunghui Lee @ 2025-07-16  7:01 UTC (permalink / raw)
  To: 'Bean Huo', alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

> -----Original Message-----
> From: Bean Huo <huobean@gmail.com>
> Sent: Wednesday, July 16, 2025 12:22 AM
> To: 이승희 <sh043.lee@samsung.com>; alim.akhtar@samsung.com;
> avri.altman@wdc.com; bvanassche@acm.org;
> James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com; linux-
> scsi@vger.kernel.org; sdriver.sec@samsung.com
> Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure
> during runtime resume
> 
> 
> You patched ufshcd_runtime_suspend() to return -EBUSY if eh_in_progress is
> true — meant to avoid suspend during error handling. I don't understand
> why here parent is not active?

After checking the RPM devices, I found that the parent device is suspended due to the failure of ufshcd_wl_runtime_resume.
Even if we guarantee both the completion of ufshcd_runtime_suspend and the error handling completion to avoid races, ufshcd_recover_pm_error can't fully recover from a runtime pm failure scenario.

> 
> would like to try return -EAGAIN?
> 
> 
> Kind regards,
> Bean
> 

I've tried that as well, but it doesn't work either.
[  328.915154] [0:  kworker/u32:7:  941] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
[  328.915156] [0:  kworker/u32:7:  941] ufs_device_wlun 0:0:0:49488: ufshcd_recover_pm_error: rpm set_active ret(-16)
[  328.915157] [0:  kworker/u32:7:  941] ufshcd-qcom 1d84000.ufshc: ufshcd_recover_pm_error: rpm set_active ret(-11)
Due to this error, pm_request_resume(q->dev) for each scsi device can't execute.
This causes the I/O stack to become stuck.

This issue arises from the race condition between the runtime pm worker and the error handler worker.
I believe it would be safer to handle recovery within the runtime pm worker itself.

For reference, ufshcd_link_recovery is useful for the similar issue. (resolving the race condition between the runtime pm worker and the SCSI error handler worker)
https://patchwork.kernel.org/project/linux-scsi/patch/20230927033557.13801-1-peter.wang@mediatek.com/

I'll add Fixes tags and modify v2 as requested.

Thank you,
Seunghui Lee.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-16  7:01         ` Seunghui Lee
@ 2025-07-16  7:49           ` Seunghui Lee
  2025-07-16 13:35             ` Bean Huo
  2025-07-16 15:14           ` Bart Van Assche
  1 sibling, 1 reply; 12+ messages in thread
From: Seunghui Lee @ 2025-07-16  7:49 UTC (permalink / raw)
  To: 'Bean Huo', alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

> -----Original Message-----
> From: Seunghui Lee <sh043.lee@samsung.com>
> Sent: Wednesday, July 16, 2025 4:01 PM
> To: 'Bean Huo' <huobean@gmail.com>; alim.akhtar@samsung.com;
> avri.altman@wdc.com; bvanassche@acm.org;
> James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com; linux-
> scsi@vger.kernel.org; sdriver.sec@samsung.com
> Subject: RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure
> during runtime resume
> 
> > -----Original Message-----
> > From: Bean Huo <huobean@gmail.com>
> > Sent: Wednesday, July 16, 2025 12:22 AM
> > To: 이승희 <sh043.lee@samsung.com>; alim.akhtar@samsung.com;
> > avri.altman@wdc.com; bvanassche@acm.org;
> > James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com;
> > linux- scsi@vger.kernel.org; sdriver.sec@samsung.com
> > Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit
> > failure during runtime resume
> >
> >
> > You patched ufshcd_runtime_suspend() to return -EBUSY if
> > eh_in_progress is true — meant to avoid suspend during error handling.
> > I don't understand why here parent is not active?
> 
> After checking the RPM devices, I found that the parent device is
> suspended due to the failure of ufshcd_wl_runtime_resume.
> Even if we guarantee both the completion of ufshcd_runtime_suspend and the
> error handling completion to avoid races, ufshcd_recover_pm_error can't
> fully recover from a runtime pm failure scenario.
> 
> >
> > would like to try return -EAGAIN?
> >
> >
> > Kind regards,
> > Bean
> >
> 
> I've tried that as well, but it doesn't work either.
> [  328.915154] [0:  kworker/u32:7:  941] ufs_device_wlun 0:0:0:49488:
> runtime PM trying to activate child device 0:0:0:49488 but parent
> (target0:0:0) is not active [  328.915156] [0:  kworker/u32:7:  941]
> ufs_device_wlun 0:0:0:49488: ufshcd_recover_pm_error: rpm set_active ret(-
> 16) [  328.915157] [0:  kworker/u32:7:  941] ufshcd-qcom 1d84000.ufshc:
> ufshcd_recover_pm_error: rpm set_active ret(-11) Due to this error,
> pm_request_resume(q->dev) for each scsi device can't execute.
> This causes the I/O stack to become stuck.
> 
> This issue arises from the race condition between the runtime pm worker
> and the error handler worker.
> I believe it would be safer to handle recovery within the runtime pm
> worker itself.
> 
> For reference, ufshcd_link_recovery is useful for the similar issue.
> (resolving the race condition between the runtime pm worker and the SCSI
> error handler worker) https://patchwork.kernel.org/project/linux-
> scsi/patch/20230927033557.13801-1-peter.wang@mediatek.com/
> 
> I'll add Fixes tags and modify v2 as requested.
> 
> Thank you,
> Seunghui Lee.
> 

I would love to find the previous tag, but it's old code.
And plus, it's hard to pick one commit for Fixes.
Please understand this.

If this commit is okay, please review this commit.

Thank you,
Seunghui Lee.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-16  7:49           ` Seunghui Lee
@ 2025-07-16 13:35             ` Bean Huo
  2025-07-16 15:02               ` Bart Van Assche
  2025-07-17  6:12               ` Seunghui Lee
  0 siblings, 2 replies; 12+ messages in thread
From: Bean Huo @ 2025-07-16 13:35 UTC (permalink / raw)
  To: Seunghui Lee, alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

On Wed, 2025-07-16 at 16:49 +0900, Seunghui Lee wrote:
> > -----Original Message-----
> > From: Seunghui Lee <sh043.lee@samsung.com>
> > Sent: Wednesday, July 16, 2025 4:01 PM
> > To: 'Bean Huo' <huobean@gmail.com>; alim.akhtar@samsung.com;
> > avri.altman@wdc.com; bvanassche@acm.org;
> > James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com; linux-
> > scsi@vger.kernel.org; sdriver.sec@samsung.com
> > Subject: RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure
> > during runtime resume
> > 
> > > -----Original Message-----
> > > From: Bean Huo <huobean@gmail.com>
> > > Sent: Wednesday, July 16, 2025 12:22 AM
> > > To: 이승희 <sh043.lee@samsung.com>; alim.akhtar@samsung.com;
> > > avri.altman@wdc.com; bvanassche@acm.org;
> > > James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com;
> > > linux- scsi@vger.kernel.org; sdriver.sec@samsung.com
> > > Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit
> > > failure during runtime resume
> > > 
> > > 
> > > You patched ufshcd_runtime_suspend() to return -EBUSY if
> > > eh_in_progress is true — meant to avoid suspend during error handling.
> > > I don't understand why here parent is not active?
> > 
> > After checking the RPM devices, I found that the parent device is
> > suspended due to the failure of ufshcd_wl_runtime_resume.
> > Even if we guarantee both the completion of ufshcd_runtime_suspend and the
> > error handling completion to avoid races, ufshcd_recover_pm_error can't
> > fully recover from a runtime pm failure scenario.
> > 
> > > 
> > > would like to try return -EAGAIN?
> > > 
> > > 
> > > Kind regards,
> > > Bean
> > > 
> > 
> > I've tried that as well, but it doesn't work either.
> > [  328.915154] [0:  kworker/u32:7:  941] ufs_device_wlun 0:0:0:49488:
> > runtime PM trying to activate child device 0:0:0:49488 but parent
> > (target0:0:0) is not active [  328.915156] [0:  kworker/u32:7:  941]
> > ufs_device_wlun 0:0:0:49488: ufshcd_recover_pm_error: rpm set_active ret(-
> > 16) [  328.915157] [0:  kworker/u32:7:  941] ufshcd-qcom 1d84000.ufshc:
> > ufshcd_recover_pm_error: rpm set_active ret(-11) Due to this error,
> > pm_request_resume(q->dev) for each scsi device can't execute.
> > This causes the I/O stack to become stuck.
> > 
> > This issue arises from the race condition between the runtime pm worker
> > and the error handler worker.
> > I believe it would be safer to handle recovery within the runtime pm
> > worker itself.
> > 
> > For reference, ufshcd_link_recovery is useful for the similar issue.
> > (resolving the race condition between the runtime pm worker and the SCSI
> > error handler worker) https://patchwork.kernel.org/project/linux-
> > scsi/patch/20230927033557.13801-1-peter.wang@mediatek.com/
> > 
> > I'll add Fixes tags and modify v2 as requested.
> > 
> > Thank you,
> > Seunghui Lee.
> > 
> 
> I would love to find the previous tag, but it's old code.
> And plus, it's hard to pick one commit for Fixes.
> Please understand this.
> 
> If this commit is okay, please review this commit.
> 
> Thank you,
> Seunghui Lee.



Based on my analysis, the tag might be:

commit 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and other error recovery paths"),
This commit added the error handling logic that calls ufshcd_set_link_broken() and ufshcd_schedule_eh_work()
when ret is non-zero in the ufshcd_uic_pwr_ctrl() function.

your patch adds a check for pm_op_in_progress to skip the error handler and use link recovery instead during PM operations.

Therefore, the appropriate Fixes tag for your patch would be:

Fixes: 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and other error recovery paths")

But, the problem is dd11376b9f1b ("scsi: ufs: Split the drivers/scsi/ufs directory") reorganzied the UFS code layout, 
it is true that not easy to add a proper tag, 

Or we can use both fix tags, or just the last one.

Let Bart comment on this,

Kind regards, 
Bean

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-16 13:35             ` Bean Huo
@ 2025-07-16 15:02               ` Bart Van Assche
  2025-07-17  6:12               ` Seunghui Lee
  1 sibling, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2025-07-16 15:02 UTC (permalink / raw)
  To: Bean Huo, Seunghui Lee, alim.akhtar, avri.altman, James.Bottomley,
	martin.petersen, linux-scsi, sdriver.sec

On 7/16/25 6:35 AM, Bean Huo wrote:
> But, the problem is dd11376b9f1b ("scsi: ufs: Split the drivers/ 
> scsi/ufs directory") reorganzied the UFS code layout, it is true
> that not easy to add a proper tag,
Full history of the UFS driver can be obtained as follows:

git log -p -- drivers/ufs include/ufs drivers/scsi/ufs

The above command shows the UFS driver history back up to the commit
that introduced the UFS driver, namely commit 7a3e97b0dc4b ("[SCSI]
ufshcd: UFS Host controller driver") from 2012.

The double hyphen suppresses complaints about paths that do not exist in
git HEAD.

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-16  7:01         ` Seunghui Lee
  2025-07-16  7:49           ` Seunghui Lee
@ 2025-07-16 15:14           ` Bart Van Assche
  2025-07-17  6:01             ` Seunghui Lee
  1 sibling, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2025-07-16 15:14 UTC (permalink / raw)
  To: Seunghui Lee, 'Bean Huo', alim.akhtar, avri.altman,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec


On 7/16/25 12:01 AM, Seunghui Lee wrote:
> This issue arises from the race condition between the runtime pm
> worker and the error handler worker.

How could there be a race condition between the runtime PM code and UFS
error handling? Runtime PM is disabled while UFS error handling is in
progress by surrounding error handling with ufshcd_rpm_get_sync(hba) and
ufshcd_rpm_put(hba).

> I believe it would be safer to handle recovery within the runtime pm
> worker itself.
I do not agree. If runtime PM and the UFS error handler really happen in
parallel, the proper way to fix this is by serializing both activities
instead of introducing concurrent error handling.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-16 15:14           ` Bart Van Assche
@ 2025-07-17  6:01             ` Seunghui Lee
  0 siblings, 0 replies; 12+ messages in thread
From: Seunghui Lee @ 2025-07-17  6:01 UTC (permalink / raw)
  To: 'Bart Van Assche', 'Bean Huo', alim.akhtar,
	avri.altman, James.Bottomley, martin.petersen, linux-scsi,
	sdriver.sec

> Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure
> during runtime resume
> 
> 
> On 7/16/25 12:01 AM, Seunghui Lee wrote:
> > This issue arises from the race condition between the runtime pm
> > worker and the error handler worker.
> 
> How could there be a race condition between the runtime PM code and UFS
> error handling? Runtime PM is disabled while UFS error handling is in
> progress by surrounding error handling with ufshcd_rpm_get_sync(hba) and
> ufshcd_rpm_put(hba).
> 
> > I believe it would be safer to handle recovery within the runtime pm
> > worker itself.
> I do not agree. If runtime PM and the UFS error handler really happen in
> parallel, the proper way to fix this is by serializing both activities
> instead of introducing concurrent error handling.
> 
> Thanks,
> 
> Bart.


Even though adding ufshcd_rpm_get_sync(hba),ufshcd_rpm_put(hba) in ufshcd_err_handler,
the failure of ufshcd_wl_runtime_resume causes parent devices's runtime suspend.

Here is the fault scenario more details.

1> runtime PM worker
2> err handler worker

1> sd 0:0:0:0: rpm_resume: rpmflags(0x2)
1> scsi target0:0:0: rpm_resume: rpmflags(0x0)
1> scsi host0: rpm_resume: rpmflags(0x0)
1> ufshcd-qcom 1d84000.ufshc: rpm_resume: rpmflags(0x0)
   -> enable irq, setup clocks on
1> ufshcd-qcom 1d84000.ufshc: rpm_resume: end. retval(0)
1> scsi host0: rpm_resume: end. retval(0)
1> scsi target0:0:0: rpm_resume: end. retval(0)
1> ufs_device_wlun 0:0:0:49488: rpm_resume: rpmflags(0x4)
1> __ufshcd_wl_resume: hibern8 exit failed -110
   -> set link broken, schedule error handler

	2> ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken
	   ufshcd_err_handling_prepare()
	   ufshcd_rpm_get_sync()	// hba->ufs_device_wlun->sdev_gendev
	   -> sdev_gendev is suspended

1> ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
   -> sync wl_runtime_resume here, but set dev->power.runtime_error.
1> ufs_device_wlun 0:0:0:49488: rpm_resume: end. retval(-110)	// hba->ufs_device_wlun->sdev_gendev
1> sd 0:0:0:0: rpm_resume: end. retval(-110)
   -> Due to the child device PM operation error, PM operation invokes runtime suspend.

1> scsi target0:0:0: rpm_suspend: rpmflags(0xa)
1> scsi target0:0:0: rpm_suspend: end. retval(0)
1> scsi host0: rpm_suspend: rpmflags(0xa)
1> scsi host0: rpm_suspend: end. retval(0)
1> ufshcd-qcom 1d84000.ufshc: rpm_suspend: rpmflags(0xa)
1> ufshcd-qcom 1d84000.ufshc: rpm_suspend: end. retval(0)	// hba->dev
   -> disable irq, setup clocks off

	2> ufshcd_err_handling_prepare()
	   ufshcd_enable_irq(), ufshcd_setup_clocks( , on)
	2> ufshcd_reset_and_restore()
	2> ufshcd_recover_pm_error()
	   pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev)
	   -> ret error due to scsi target0:0:0 is suspended.
	   pm_runtime_set_active(hba->dev)
	   -> rev error due to dev->power.runtime_error.
	   -> Due to the upper errors, host can't invoke pm_request_resume any more.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
  2025-07-16 13:35             ` Bean Huo
  2025-07-16 15:02               ` Bart Van Assche
@ 2025-07-17  6:12               ` Seunghui Lee
  1 sibling, 0 replies; 12+ messages in thread
From: Seunghui Lee @ 2025-07-17  6:12 UTC (permalink / raw)
  To: 'Bean Huo', alim.akhtar, avri.altman, bvanassche,
	James.Bottomley, martin.petersen, linux-scsi, sdriver.sec

> > > > Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit
> > > > failure during runtime resume
> > > >
<snip> 
> 
> Based on my analysis, the tag might be:
> 
> commit 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and
> other error recovery paths"), This commit added the error handling logic
> that calls ufshcd_set_link_broken() and ufshcd_schedule_eh_work() when ret
> is non-zero in the ufshcd_uic_pwr_ctrl() function.
> 
> your patch adds a check for pm_op_in_progress to skip the error handler
> and use link recovery instead during PM operations.
> 
> Therefore, the appropriate Fixes tag for your patch would be:
> 
> Fixes: 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and
> other error recovery paths")
> 
> But, the problem is dd11376b9f1b ("scsi: ufs: Split the drivers/scsi/ufs
> directory") reorganzied the UFS code layout, it is true that not easy to
> add a proper tag,
> 
> Or we can use both fix tags, or just the last one.
> 
> Let Bart comment on this,
> 
> Kind regards,
> Bean


Thank you so much for sharing it.
I'll add it v2 as requested.

Fixes: 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and
other error recovery paths")




^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-07-17  6:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CGME20250714090630epcas1p28ab8afec11bbab4d256dfe6649d3b00b@epcas1p2.samsung.com>
2025-07-14  9:06 ` [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume Seunghui Lee
2025-07-14 11:21   ` Bean Huo
2025-07-15  2:23     ` 이승희
2025-07-15 12:47       ` Bean Huo
2025-07-15 15:21       ` Bean Huo
2025-07-16  7:01         ` Seunghui Lee
2025-07-16  7:49           ` Seunghui Lee
2025-07-16 13:35             ` Bean Huo
2025-07-16 15:02               ` Bart Van Assche
2025-07-17  6:12               ` Seunghui Lee
2025-07-16 15:14           ` Bart Van Assche
2025-07-17  6:01             ` Seunghui Lee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).