From: Seunghui Lee <sh043.lee@samsung.com>
To: "'Bean Huo'" <huobean@gmail.com>, <alim.akhtar@samsung.com>,
<avri.altman@wdc.com>, <bvanassche@acm.org>,
<James.Bottomley@HansenPartnership.com>,
<martin.petersen@oracle.com>, <linux-scsi@vger.kernel.org>,
<sdriver.sec@samsung.com>
Subject: RE: [PATCH] ufs: core: Use link recovery when the h8 exit failure during runtime resume
Date: Tue, 15 Jul 2025 11:23:01 +0900
Message-ID: <000901dbf52f$63a69090$2af3b1b0$@samsung.com>
In-Reply-To: <b8fa773234058e68e6006127b3cd848046b75e6f.camel@gmail.com>
> -----Original Message-----
> From: Bean Huo <huobean@gmail.com>
> Sent: Monday, July 14, 2025 8:21 PM
> To: Seunghui Lee <sh043.lee@samsung.com>; alim.akhtar@samsung.com;
> avri.altman@wdc.com; bvanassche@acm.org;
> James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com;
> linux-scsi@vger.kernel.org; sdriver.sec@samsung.com
> Subject: Re: [PATCH] ufs: core: Use link recovery when the h8 exit failure
> during runtime resume
>
> On Mon, 2025-07-14 at 18:06 +0900, Seunghui Lee wrote:
> > If the h8 exit fails during runtime resume process, the runtime thread
> > enters runtime suspend immediately and the error handler operates at
> > the same time.
> > It becomes stuck and cannot be recovered through the error handler.
> > To fix this, use link recovery instead of the error handler.
> >
> > Signed-off-by: Seunghui Lee <sh043.lee@samsung.com>
> > ---
> > drivers/ufs/core/ufshcd.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> > index 50adfb8b335b..dc2845c32d72 100644
> > --- a/drivers/ufs/core/ufshcd.c
> > +++ b/drivers/ufs/core/ufshcd.c
> > @@ -4340,7 +4340,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
> >  	hba->uic_async_done = NULL;
> >  	if (reenable_intr)
> >  		ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
> > -	if (ret) {
> > +	if (ret && !hba->pm_op_in_progress) {
> >  		ufshcd_set_link_broken(hba);
> >  		ufshcd_schedule_eh_work(hba);
> >  	}
> > @@ -4348,6 +4348,14 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
> >  	spin_unlock_irqrestore(hba->host->host_lock, flags);
> >  	mutex_unlock(&hba->uic_cmd_mutex);
> >
> > +	/*
> > +	 * If the h8 exit fails during the runtime resume process,
> > +	 * it becomes stuck and cannot be recovered through the error handler.
> > +	 * To fix this, use link recovery instead of the error handler.
> > +	 */
> > +	if (ret && hba->pm_op_in_progress)
> > +		ret = ufshcd_link_recovery(hba);
> > +
> >  	return ret;
> >  }
> >
> I have one question:
>
> In the error handler, if the link is broken (set by
> ufshcd_set_link_broken()), ufshcd_err_handler() will call
> ufshcd_reset_and_restore(hba). Doesn't that work?
>
>
> Kind regards,
> Bean
>
Unfortunately, it doesn't work.
Please refer to the log below.
[ 310.118416] [4: kworker/4:4: 786] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
[ 310.118423] [4: kworker/4:4: 786] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110
[ 310.118424] [0: kworker/u32:0: 12] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken
[ 310.119046] [4: kworker/4:4: 786] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
[ 310.119051] [4: kworker/4:4: 786] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 560926us, pwr_mode(1), link state(3)
-> ufshcd_wl_runtime_resume has failed and returned.
-> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()
[ 310.119104] [4: kworker/4:4: 786] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend start.
-> ufshcd_runtime_suspend()
-> ufshcd_suspend()
-> ufshcd_disable_irq() / ufshcd_setup_clocks( , false) / ufshcd_vreg_set_lpm() / ufshcd_hba_vreg_set_lpm()
[ 310.119111]I[0: kworker/u32:0: 12] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: Auto Hibern8 Exit failed - status: 0x00000020, upmcrs: 0x00000001
[ 310.119119]I[0: kworker/u32:0: 12] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20 saved_uic_err 0x0
<snip>
[ 310.119162] [4: kworker/4:4: 786] gcc_ufs_mem_phy_gdsc: genpd_power_off
[ 310.119167] [4: kworker/4:4: 786] CPU: 4 UID: 0 PID: 786 Comm: kworker/4:4 Tainted: G W O 6.12.23-android16-5-31706220-ud-abogki31706220-4k #1 5bbb440b8bd7ff2d31dd25fab2106d21aa9b6357
[ 310.119176] [4: kworker/4:4: 786] Tainted: [W]=WARN, [O]=OOT_MODULE
[ 310.119179] [4: kworker/4:4: 786] Hardware name: Samsung M2Q PROJECT (board-id,05) (DT)
[ 310.119183] [4: kworker/4:4: 786] Workqueue: pm pm_runtime_work
[ 310.119189] [4: kworker/4:4: 786]
[ 310.119192] [4: kworker/4:4: 786] Call trace:
[ 310.119195] [4: kworker/4:4: 786] dump_backtrace+0xec/0x128
[ 310.119205] [4: kworker/4:4: 786] show_stack+0x18/0x28
[ 310.119212] [4: kworker/4:4: 786] dump_stack_lvl+0x40/0x88
[ 310.119219] [4: kworker/4:4: 786] dump_stack+0x18/0x24
[ 310.119226] [4: kworker/4:4: 786] genpd_power_off+0x304/0x308
[ 310.119232] [4: kworker/4:4: 786] genpd_runtime_suspend+0x260/0x38c
[ 310.119238] [4: kworker/4:4: 786] __rpm_callback+0x94/0x390
[ 310.119242] [4: kworker/4:4: 786] rpm_suspend+0x284/0x640
[ 310.119249] [4: kworker/4:4: 786] rpm_idle+0x58/0x37c
[ 310.119255] [4: kworker/4:4: 786] __pm_runtime_idle+0x60/0x150
[ 310.119262] [4: kworker/4:4: 786] ufs_qcom_phy_qmp_v4_power_control+0x144/0x15c [phy_qcom_ufs_qmp_v4_canoe f98acc73ebd0cc2fea7bd12b574d8fd77ee19353]
[ 310.119276] [4: kworker/4:4: 786] ufs_qcom_phy_power_off+0x44/0x308 [phy_qcom_ufs b7bbd5d6bbe64d4ff5f3e89a124ca991f7c4f08a]
[ 310.119291] [4: kworker/4:4: 786] phy_power_off+0x58/0xdc
[ 310.119300] [4: kworker/4:4: 786] ufs_qcom_setup_clocks+0x3f4/0x7e8 [ufs_qcom 01a09a66f1eae71e0199ddd5861db80b7fe6c630]
[ 310.119341] [4: kworker/4:4: 786] ufshcd_setup_clocks+0x74/0x3d8
[ 310.119351] [4: kworker/4:4: 786] ufshcd_suspend+0x48/0x160
[ 310.119357] [4: kworker/4:4: 786] ufshcd_runtime_suspend+0x70/0x1b8
[ 310.119363] [4: kworker/4:4: 786] pm_generic_runtime_suspend+0x40/0x58
[ 310.119369] [4: kworker/4:4: 786] genpd_runtime_suspend+0x128/0x38c
[ 310.119375] [4: kworker/4:4: 786] __rpm_callback+0x94/0x390
[ 310.119378] [4: kworker/4:4: 786] rpm_suspend+0x2a4/0x640
[ 310.119385] [4: kworker/4:4: 786] pm_runtime_work+0x8c/0xa8
[ 310.119389] [4: kworker/4:4: 786] process_scheduled_works+0x1c4/0x45c
[ 310.119394] [4: kworker/4:4: 786] worker_thread+0x32c/0x3e8
[ 310.119399] [4: kworker/4:4: 786] kthread+0x11c/0x1b0
[ 310.119405] [4: kworker/4:4: 786] ret_from_fork+0x10/0x20
[ 310.120394] [0: kworker/u32:0: 12] gcc_ufs_mem_phy_gdsc: genpd_power_on
[ 310.120398] [0: kworker/u32:0: 12] CPU: 0 UID: 0 PID: 12 Comm: kworker/u32:0 Tainted: G W O 6.12.23-android16-5-31706220-ud-abogki31706220-4k #1 5bbb440b8bd7ff2d31dd25fab2106d21aa9b6357
[ 310.120408] [0: kworker/u32:0: 12] Tainted: [W]=WARN, [O]=OOT_MODULE
[ 310.120410] [0: kworker/u32:0: 12] Hardware name: Samsung M2Q PROJECT (board-id,05) (DT)
[ 310.120413] [0: kworker/u32:0: 12] Workqueue: ufs_eh_wq_0 ufshcd_err_handler
[ 310.120423] [0: kworker/u32:0: 12] Call trace:
[ 310.120426] [0: kworker/u32:0: 12] dump_backtrace+0xec/0x128
[ 310.120435] [0: kworker/u32:0: 12] show_stack+0x18/0x28
[ 310.120442] [0: kworker/u32:0: 12] dump_stack_lvl+0x40/0x88
[ 310.120448] [0: kworker/u32:0: 12] dump_stack+0x18/0x24
[ 310.120455] [0: kworker/u32:0: 12] genpd_power_on+0x34c/0x3b0
[ 310.120463] [0: kworker/u32:0: 12] genpd_runtime_resume+0x16c/0x43c
[ 310.120469] [0: kworker/u32:0: 12] __rpm_callback+0x94/0x390
[ 310.120473] [0: kworker/u32:0: 12] rpm_resume+0x3bc/0x5a8
[ 310.120480] [0: kworker/u32:0: 12] __pm_runtime_resume+0x48/0x8c
[ 310.120487] [0: kworker/u32:0: 12] ufs_qcom_phy_qmp_v4_power_control+0x34/0x15c [phy_qcom_ufs_qmp_v4_canoe f98acc73ebd0cc2fea7bd12b574d8fd77ee19353]
[ 310.120497] [0: kworker/u32:0: 12] ufs_qcom_phy_power_on+0x13c/0x79c [phy_qcom_ufs b7bbd5d6bbe64d4ff5f3e89a124ca991f7c4f08a]
[ 310.120512] [0: kworker/u32:0: 12] phy_power_on+0x9c/0x124
[ 310.120520] [0: kworker/u32:0: 12] ufs_qcom_setup_clocks+0xb4/0x7e8 [ufs_qcom 01a09a66f1eae71e0199ddd5861db80b7fe6c630]
[ 310.120560] [0: kworker/u32:0: 12] ufshcd_setup_clocks+0x150/0x3d8
[ 310.120567] [0: kworker/u32:0: 12] ufshcd_err_handler+0x42c/0xd20
[ 310.120574] [0: kworker/u32:0: 12] process_scheduled_works+0x1c4/0x45c
[ 310.120579] [0: kworker/u32:0: 12] worker_thread+0x32c/0x3e8
[ 310.120584] [0: kworker/u32:0: 12] kthread+0x11c/0x1b0
[ 310.120590] [0: kworker/u32:0: 12] ret_from_fork+0x10/0x20
-> ufshcd_err_handler()
-> ufshcd_err_handling_prepare()
-> ufshcd_setup_clocks( , true)
[ 310.122158] [4: kworker/4:4: 786] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: 3056us, pwr_mode(1), link state(3)
-> ufshcd_runtime_suspend done.
[ 310.122187] [4: kworker/4:4: 786] gcc_ufs_phy_gdsc: genpd_power_off
[ 310.122191] [4: kworker/4:4: 786] CPU: 4 UID: 0 PID: 786 Comm: kworker/4:4 Tainted: G W O 6.12.23-android16-5-31706220-ud-abogki31706220-4k #1 5bbb440b8bd7ff2d31dd25fab2106d21aa9b6357
[ 310.122198] [4: kworker/4:4: 786] Tainted: [W]=WARN, [O]=OOT_MODULE
[ 310.122200] [4: kworker/4:4: 786] Hardware name: Samsung M2Q PROJECT (board-id,05) (DT)
[ 310.122203] [4: kworker/4:4: 786] Workqueue: pm pm_runtime_work
[ 310.122208] [4: kworker/4:4: 786] Call trace:
[ 310.122210] [4: kworker/4:4: 786] dump_backtrace+0xec/0x128
[ 310.122218] [4: kworker/4:4: 786] show_stack+0x18/0x28
[ 310.122225] [4: kworker/4:4: 786] dump_stack_lvl+0x40/0x88
[ 310.122231] [4: kworker/4:4: 786] dump_stack+0x18/0x24
[ 310.122238] [4: kworker/4:4: 786] genpd_power_off+0x304/0x308
[ 310.122244] [4: kworker/4:4: 786] genpd_runtime_suspend+0x260/0x38c
[ 310.122249] [4: kworker/4:4: 786] __rpm_callback+0x94/0x390
[ 310.122253] [4: kworker/4:4: 786] rpm_suspend+0x2a4/0x640
[ 310.122260] [4: kworker/4:4: 786] pm_runtime_work+0x8c/0xa8
[ 310.122263] [4: kworker/4:4: 786] process_scheduled_works+0x1c4/0x45c
[ 310.122268] [4: kworker/4:4: 786] worker_thread+0x32c/0x3e8
[ 310.122272] [4: kworker/4:4: 786] kthread+0x11c/0x1b0
[ 310.122279] [4: kworker/4:4: 786] ret_from_fork+0x10/0x20
[ 313.197378] [4: kworker/u32:0: 12] ufs_qcom_phy_qmp_v4_canoe 1d80000.ufsphy_mem: ufs_qcom_phy_qmp_v4_is_pcs_ready: poll for pcs failed err = -110
[ 313.197398] [4: kworker/u32:0: 12] ufshcd-qcom 1d84000.ufshc: ufs_qcom_power_up_sequence: Failed to calibrate PHY -110
[ 313.255403] [0: kworker/u32:0: 12] ufshcd-qcom 1d84000.ufshc: Controller enable failed
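As a rough illustration of the interleaving in the log above, here is a small userspace sketch. It is not driver code. The names model_step and model_replay are hypothetical, the two power domains in the log are collapsed into one shared flag, and the assumption that the PHY poll times out because power was removed underneath the error handler is a reading of the log, not something the model proves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric value as ETIMEDOUT on Linux; the log shows it as -110. */
#define MODEL_ETIMEDOUT 110

/* Steps of the two workers, named after what the log shows. */
enum model_step {
	PM_CLOCKS_OFF,     /* pm worker: ufshcd_suspend() turns clocks and PHY off */
	EH_CLOCKS_ON,      /* eh worker: ufshcd_err_handling_prepare() turns them on */
	PM_DOMAIN_OFF,     /* pm worker: genpd_power_off once runtime suspend is done */
	EH_POLL_PHY_READY, /* eh worker: PHY calibration while re-enabling the controller */
};

/*
 * Replay the steps against a single shared power flag (a simplification of
 * the two power domains in the log). Returns the result of the last
 * EH_POLL_PHY_READY step: 0 if power was on, -MODEL_ETIMEDOUT if it was off.
 * Returns 0 if the sequence contains no poll step.
 */
static int model_replay(const enum model_step *steps, size_t n)
{
	bool powered = true;
	int ret = 0;
	size_t i;

	assert(steps != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (steps[i]) {
		case PM_CLOCKS_OFF:
		case PM_DOMAIN_OFF:
			/* Either PM step removes power, whoever asked for it last. */
			powered = false;
			break;
		case EH_CLOCKS_ON:
			powered = true;
			break;
		case EH_POLL_PHY_READY:
			ret = powered ? 0 : -MODEL_ETIMEDOUT;
			break;
		}
	}
	return ret;
}
```

Replaying the order seen in the log (PM_CLOCKS_OFF, EH_CLOCKS_ON, PM_DOMAIN_OFF, EH_POLL_PHY_READY) makes the poll fail in this model. If both PM steps complete before EH_CLOCKS_ON, the poll succeeds.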
To fix this, I tried two more things.
1> Make the error handler wait until ufshcd_runtime_suspend() is done.
-> This doesn't work either.
--- a/common/drivers/ufs/core/ufshcd.c
+++ b/common/drivers/ufs/core/ufshcd.c
@@ -6551,6 +6551,11 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 	    hba->is_sys_suspended) {
 		enum ufs_pm_op pm_op;
 
+		while (!pm_runtime_status_suspended(hba->dev)) {
+			dev_err(hba->dev, "%s: waiting for complete suspend\n", __func__);
+			msleep(10);
+		}
+
 		/*
 		 * Don't assume anything of resume, if
 		 * resume fails, irq and clocks can be OFF, and powers
[ 335.876451] [0: kworker/0:3: 713] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
[ 335.876456] [0: kworker/0:3: 713] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110
[ 335.876460] [4: kworker/u32:2: 88] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken
[ 335.877072] [0: kworker/0:3: 713] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
[ 335.877076] [0: kworker/0:3: 713] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 544445us, pwr_mode(1), link state(3)
-> ufshcd_wl_runtime_resume has failed and returned.
-> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()
[ 335.877112] [4: kworker/u32:2: 88] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handling_prepare: waiting for complete suspend
-> The error handler waits until ufshcd_runtime_suspend is done.
[ 335.877123] [0: kworker/0:3: 713] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend start.
[ 335.879805] [0: kworker/0:3: 713] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: 2685us, pwr_mode(1), link state(3)
[ 337.950174] [4: kworker/u32:2: 88] ufshcd-qcom 1d84000.ufshc: ESI configured
[ 337.950302] [4: kworker/u32:2: 88] ufshcd-qcom 1d84000.ufshc: MCQ configured, nr_queues=9, io_queues=8, read_queue=0, poll_queues=1, queue_depth=64
-> ufshcd_reset_and_restore() works well.
[ 337.970178] [0: kworker/u32:2: 88] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
-> ufshcd_recover_pm_error()
-> Because of this error, pm_request_resume() is not called here.
[ 337.970212] [0: kworker/u32:2: 88] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler finished; HBA state operational
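The "parent ... is not active" message above comes from the runtime PM core refusing to mark a child device active while its parent is not. A minimal userspace sketch of that check follows. All model_* names are hypothetical, the ignore_children flag and locking are left out, and model_recover_pm_error() only mirrors the behaviour described in the note above (no resume request once activation fails). It is not the body of the driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric value as EBUSY on Linux. */
#define MODEL_EBUSY 16

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

/* Hypothetical stand-in for the runtime PM state of a struct device. */
struct model_dev {
	struct model_dev *parent;
	enum model_rpm_status status;
};

/*
 * Models the parent check behind pm_runtime_set_active(): a child cannot be
 * marked active while its parent is not active.
 */
static int model_set_active(struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->parent && dev->parent->status != MODEL_RPM_ACTIVE)
		return -MODEL_EBUSY;

	dev->status = MODEL_RPM_ACTIVE;
	return 0;
}

/*
 * Models the behaviour described in the log annotation: resume requests for
 * the waiting queues are issued only if the wlun device could be marked
 * active. Returns the number of resume requests issued.
 */
static int model_recover_pm_error(struct model_dev *wlun, int nr_waiting_queues)
{
	if (model_set_active(wlun) != 0)
		return 0; /* activation refused, so nothing gets resumed */

	return nr_waiting_queues;
}
```

With a suspended parent the model issues no resume requests and leaves the child suspended. With an active parent the child is marked active and every waiting queue gets a request.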
2> If the error handler is in progress, return -EBUSY from ufshcd_runtime_suspend() so that error handling is guaranteed to finish first.
-> This doesn't work either.
--- a/common/drivers/ufs/core/ufshcd.c
+++ b/common/drivers/ufs/core/ufshcd.c
@@ -10371,6 +10371,9 @@ int ufshcd_runtime_suspend(struct device *dev)
 	int ret;
 	ktime_t start = ktime_get();
 
+	if (ufshcd_eh_in_progress(hba))
+		return -EBUSY;
+
 	ret = ufshcd_suspend(hba);
 
 	trace_ufshcd_runtime_suspend(hba, ret,
[ 63.010841] [4: kworker/4:0: 52] ufshcd-qcom 1d84000.ufshc: ufshcd_uic_hibern8_exit: hibern8 exit failed. ret = -110
[ 63.010844] [4: kworker/4:0: 52] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_resume: hibern8 exit failed -110
[ 63.010845] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state eh_fatal; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0; link is broken
[ 63.011430] [4: kworker/4:0: 52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -110
[ 63.011433] [4: kworker/4:0: 52] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume: 574917us, pwr_mode(1), link state(3)
-> ufshcd_wl_runtime_resume has failed and returned.
-> ufshcd_rpm_get_sync() in ufshcd_err_handling_prepare()
[ 63.011457] [4: kworker/4:0: 52] ufshcd-qcom 1d84000.ufshc: ufshcd_runtime_suspend: eh_in_progress
-> ufshcd_runtime_suspend returns -EBUSY because eh_in_progress is set.
[ 63.011464]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: Auto Hibern8 Exit failed - status: 0x00000020, upmcrs: 0x00000001
[ 63.011468]I[0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_check_errors: saved_err 0x20 saved_uic_err 0x0
[ 63.039824] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufs_qcom_device_reset: Waiting for device internal cache flush
[ 65.084604] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ESI configured
[ 65.084728] [0: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: MCQ configured, nr_queues=9, io_queues=8, read_queue=0, poll_queues=1, queue_depth=64
-> ufshcd_reset_and_restore() works well.
[ 65.105186] [1: kworker/u32:10:13604] ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active
-> ufshcd_recover_pm_error()
-> Because of this error, pm_request_resume() is not called here.
[ 65.105305] [1: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler finished; HBA state operational
[ 65.105310] [1: kworker/u32:10:13604] ufshcd-qcom 1d84000.ufshc: ufshcd_err_handler started; HBA state operational; powered 1; shutting down 0; saved_err = 0; saved_uic_err = 0; force_reset = 0
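For comparison with the two experiments, the branch that the original patch adds at the end of ufshcd_uic_pwr_ctrl() can be modelled in userspace roughly as below. This is only a sketch of the decision. struct model_hba and the model_* functions are hypothetical stand-ins, and the real ufshcd_link_recovery() is replaced by a stub that returns a preset value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct ufs_hba fields the patch reads. */
struct model_hba {
	bool pm_op_in_progress;   /* set while a PM callback is running */
	bool link_broken;         /* models ufshcd_set_link_broken() */
	bool eh_scheduled;        /* models ufshcd_schedule_eh_work() */
	int  link_recovery_calls; /* how often link recovery was attempted */
	int  link_recovery_ret;   /* value the stubbed link recovery returns */
};

/* Stub for ufshcd_link_recovery(): count the call, return the preset result. */
static int model_link_recovery(struct model_hba *hba)
{
	hba->link_recovery_calls++;
	return hba->link_recovery_ret;
}

/*
 * Models the tail of ufshcd_uic_pwr_ctrl() with the patch applied.
 * 'ret' is the result of the UIC power control command (0 on success).
 */
static int model_uic_pwr_ctrl_tail(struct model_hba *hba, int ret)
{
	assert(hba != NULL);

	/* Outside PM callbacks, a failure still goes to the error handler. */
	if (ret && !hba->pm_op_in_progress) {
		hba->link_broken = true;
		hba->eh_scheduled = true;
	}

	/* Inside a PM callback, recover synchronously in the same thread. */
	if (ret && hba->pm_op_in_progress)
		ret = model_link_recovery(hba);

	return ret;
}
```

In this model, a failure with pm_op_in_progress set never schedules the error handler, so there is no second worker that can run while the PM thread goes on to suspend.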