* [PATCH v1] ufs: core: wlun suspend SSU fail recovery @ 2022-11-01 14:24 peter.wang 2022-11-30 10:17 ` Adrian Hunter 0 siblings, 1 reply; 9+ messages in thread From: peter.wang @ 2022-11-01 14:24 UTC (permalink / raw) To: stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu, alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu From: Peter Wang <peter.wang@mediatek.com> When SSU fail in wlun suspend flow, trigger error handlder and return busy to break the suspend. If not, wlun runtime pm status become error and the consumer will stuck in runtime suspend status. Signed-off-by: Peter Wang <peter.wang@mediatek.com> --- drivers/ufs/core/ufshcd.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index b1f59a5fe632..2f2d3d5d8684 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8970,6 +8970,7 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) enum ufs_pm_level pm_lvl; enum ufs_dev_pwr_mode req_dev_pwr_mode; enum uic_link_state req_link_state; + unsigned long flags; hba->pm_op_in_progress = true; if (pm_op != UFS_SHUTDOWN_PM) { @@ -9049,8 +9050,21 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) if (!hba->dev_info.b_rpm_dev_flush_capable) { ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode); - if (ret) + if (ret) { + /* + * If retrun err in suspend flow, IO will hang. + * Trigger error handler and break suspend for + * error recovery. + */ + spin_lock_irqsave(hba->host->host_lock, flags); + hba->force_reset = true; + ufshcd_schedule_eh_work(hba); + spin_unlock_irqrestore(hba->host->host_lock, + flags); + + ret = -EBUSY; goto enable_scaling; + } } } -- 2.18.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-11-01 14:24 [PATCH v1] ufs: core: wlun suspend SSU fail recovery peter.wang @ 2022-11-30 10:17 ` Adrian Hunter 2022-11-30 17:49 ` Bart Van Assche 2022-12-01 9:19 ` Peter Wang (王信友) 0 siblings, 2 replies; 9+ messages in thread From: Adrian Hunter @ 2022-11-30 10:17 UTC (permalink / raw) To: peter.wang, stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu On 1/11/22 16:24, peter.wang@mediatek.com wrote: > From: Peter Wang <peter.wang@mediatek.com> > > When SSU fail in wlun suspend flow, trigger error handlder and handlder -> handler Why / how does SSU fail? > return busy to break the suspend. > If not, wlun runtime pm status become error and the consumer will > stuck in runtime suspend status. > > Signed-off-by: Peter Wang <peter.wang@mediatek.com> > --- > drivers/ufs/core/ufshcd.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index b1f59a5fe632..2f2d3d5d8684 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -8970,6 +8970,7 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) > enum ufs_pm_level pm_lvl; > enum ufs_dev_pwr_mode req_dev_pwr_mode; > enum uic_link_state req_link_state; > + unsigned long flags; > > hba->pm_op_in_progress = true; > if (pm_op != UFS_SHUTDOWN_PM) { > @@ -9049,8 +9050,21 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) > > if (!hba->dev_info.b_rpm_dev_flush_capable) { > ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode); > - if (ret) > + if (ret) { > + /* > + * If retrun err in suspend flow, IO will hang. retrun -> return > + * Trigger error handler and break suspend for > + * error recovery. > + */ > + spin_lock_irqsave(hba->host->host_lock, flags); > + hba->force_reset = true; > + ufshcd_schedule_eh_work(hba); __ufshcd_wl_suspend() is also used for shutdown and poweroff where scheduling EH is not appropriate. > + spin_unlock_irqrestore(hba->host->host_lock, > + flags); > + > + ret = -EBUSY; > goto enable_scaling; > + } > } > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-11-30 10:17 ` Adrian Hunter @ 2022-11-30 17:49 ` Bart Van Assche 2022-12-01 9:31 ` Peter Wang (王信友) 2022-12-01 23:32 ` Daniil Lunev 2022-12-01 9:19 ` Peter Wang (王信友) 1 sibling, 2 replies; 9+ messages in thread From: Bart Van Assche @ 2022-11-30 17:49 UTC (permalink / raw) To: Adrian Hunter, peter.wang, stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu On 11/30/22 02:17, Adrian Hunter wrote: > On 1/11/22 16:24, peter.wang@mediatek.com wrote: >> From: Peter Wang <peter.wang@mediatek.com> >> >> When SSU fail in wlun suspend flow, trigger error handlder and > > handlder -> handler > > Why / how does SSU fail? I'm not sure but the issue that Peter is trying to fix with this patch may already have been fixed by my patch series "Fix a deadlock in the UFS driver". Thanks, Bart. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-11-30 17:49 ` Bart Van Assche @ 2022-12-01 9:31 ` Peter Wang (王信友) 2022-12-01 23:32 ` Daniil Lunev 1 sibling, 0 replies; 9+ messages in thread From: Peter Wang (王信友) @ 2022-12-01 9:31 UTC (permalink / raw) To: jejb@linux.ibm.com, avri.altman@wdc.com, bvanassche@acm.org, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, alim.akhtar@samsung.com, Stanley Chu (朱原陞), adrian.hunter@intel.com Cc: linux-mediatek@lists.infradead.org, Jiajie Hao (郝加节), CC Chou (周志杰), Eddie Huang (黃智傑), Alice Chao (趙珮均), wsd_upstream, Lin Gui (桂林), Chun-Hung Wu (巫駿宏), Tun-yu Yu (游敦聿), Chaotian Jing (井朝天), Powen Kao (高伯文), Naomi Chu (朱詠田), Qilin Tan (谭麒麟) On Wed, 2022-11-30 at 09:49 -0800, Bart Van Assche wrote: > On 11/30/22 02:17, Adrian Hunter wrote: > > On 1/11/22 16:24, peter.wang@mediatek.com wrote: > > > From: Peter Wang <peter.wang@mediatek.com> > > > > > > When SSU fail in wlun suspend flow, trigger error handlder and > > > > handlder -> handler > > > > Why / how does SSU fail? > > I'm not sure but the issue that Peter is trying to fix with this > patch > may already have been fixed by my patch series "Fix a deadlock in > the > UFS driver". > > Thanks, > > Bart. > Hi Bart, Yes, it may also can fix by "[v4,07/10] scsi: ufs: Try harder to change the power mode" But I am not sure if any outher corner case SSU will got antother err which retry canoot fix? (ex, device always return same error sense code which host is not expect to receive again, again and again let retry sill fail) By the way, in wl suspend flow, not only SSU, but also enter hibern8(0x17) could got fail. I think it is better fix in both fail case. Hera is our 0x17 timeout fail log and runtime suspend retrun timeout. 473-c(1), 21140.355276017,31742,18, rs, ret=-110, time_us= 2892755, pwr_mode=2, link_status=3 472-u(1), 21139.829423708,31742, 7,0x17,arg1=0x0,arg2=0x0,arg3=0x0, 0 471-r(1), 21139.829361861, 0, 1,0x1b,t=46,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 1738384 470-r(1), 21139.827623708,31742, 0,0x1b,t=46,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 0 Thanks. Peter ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-11-30 17:49 ` Bart Van Assche 2022-12-01 9:31 ` Peter Wang (王信友) @ 2022-12-01 23:32 ` Daniil Lunev 2022-12-02 6:40 ` Peter Wang (王信友) 1 sibling, 1 reply; 9+ messages in thread From: Daniil Lunev @ 2022-12-01 23:32 UTC (permalink / raw) To: Bart Van Assche Cc: Adrian Hunter, peter.wang, stanley.chu, linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb, wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu On Thu, Dec 1, 2022 at 4:50 AM Bart Van Assche <bvanassche@acm.org> wrote: > > On 11/30/22 02:17, Adrian Hunter wrote: > > On 1/11/22 16:24, peter.wang@mediatek.com wrote: > >> From: Peter Wang <peter.wang@mediatek.com> > >> > >> When SSU fail in wlun suspend flow, trigger error handlder and > > > > handlder -> handler > > > > Why / how does SSU fail? > > I'm not sure but the issue that Peter is trying to fix with this patch > may already have been fixed by my patch series "Fix a deadlock in the > UFS driver". We observe a similar failure scenario when a device tries to change power mode (e.g. due to autosuspend) while "purge" is in progress. It appears that in this case the "pm_only" counter is increased on the device queue prior to the attempt of entering a runtime suspended state, but not reduced upon it failing due to the device being busy with an ongoing purge. That leads to a deadlock of subsequent IO operations to the device. --Daniil ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-12-01 23:32 ` Daniil Lunev @ 2022-12-02 6:40 ` Peter Wang (王信友) 2022-12-04 20:02 ` Daniil Lunev 0 siblings, 1 reply; 9+ messages in thread From: Peter Wang (王信友) @ 2022-12-02 6:40 UTC (permalink / raw) To: bvanassche@acm.org, dlunev@chromium.org Cc: linux-mediatek@lists.infradead.org, Jiajie Hao (郝加节), CC Chou (周志杰), Eddie Huang (黃智傑), Alice Chao (趙珮均), jejb@linux.ibm.com, wsd_upstream, avri.altman@wdc.com, martin.petersen@oracle.com, Lin Gui (桂林), Chun-Hung Wu (巫駿宏), linux-scsi@vger.kernel.org, alim.akhtar@samsung.com, Tun-yu Yu (游敦聿), Chaotian Jing (井朝天), Powen Kao (高伯文), Naomi Chu (朱詠田), Stanley Chu (朱原陞), Qilin Tan (谭麒麟), adrian.hunter@intel.com On Fri, 2022-12-02 at 10:32 +1100, Daniil Lunev wrote: > On Thu, Dec 1, 2022 at 4:50 AM Bart Van Assche <bvanassche@acm.org> > wrote: > > > > On 11/30/22 02:17, Adrian Hunter wrote: > > > On 1/11/22 16:24, peter.wang@mediatek.com wrote: > > > > From: Peter Wang <peter.wang@mediatek.com> > > > > > > > > When SSU fail in wlun suspend flow, trigger error handlder and > > > > > > handlder -> handler > > > > > > Why / how does SSU fail? > > > > I'm not sure but the issue that Peter is trying to fix with this > > patch > > may already have been fixed by my patch series "Fix a deadlock in > > the > > UFS driver". > > We observe a similar failure scenario when a device tries to change > power mode (e.g. due to autosuspend) while "purge" is in progress. It > appears that in this case the "pm_only" counter is increased on the > device queue prior to the attempt of entering a runtime suspended > state, but not reduced upon it failing due to the device being busy > with an ongoing purge. That leads to a deadlock of subsequent IO > operations to the device. > > --Daniil Hi Daniil, May I have a question, which dievce you use in this failure scenario? I think it is same issue due to SSU fail, and wlun devcie pm enter error state. So the cunsomer(scsi lu device) is block in suspend state and connot resume to reduce pm_only lead to IO hang. Thanks. BR Peter ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-12-02 6:40 ` Peter Wang (王信友) @ 2022-12-04 20:02 ` Daniil Lunev 2022-12-05 7:39 ` Peter Wang (王信友) 0 siblings, 1 reply; 9+ messages in thread From: Daniil Lunev @ 2022-12-04 20:02 UTC (permalink / raw) To: Peter Wang (王信友) Cc: bvanassche@acm.org, linux-mediatek@lists.infradead.org, Jiajie Hao (郝加节), CC Chou (周志杰), Eddie Huang (黃智傑), Alice Chao (趙珮均), jejb@linux.ibm.com, wsd_upstream, avri.altman@wdc.com, martin.petersen@oracle.com, Lin Gui (桂林), Chun-Hung Wu (巫駿宏), linux-scsi@vger.kernel.org, alim.akhtar@samsung.com, Tun-yu Yu (游敦聿), Chaotian Jing (井朝天), Powen Kao (高伯文), Naomi Chu (朱詠田), Stanley Chu (朱原陞), Qilin Tan (谭麒麟), adrian.hunter@intel.com > May I have a question, which dievce you use in this failure scenario? > I think it is same issue due to SSU fail, and wlun devcie pm enter > error state. So the cunsomer(scsi lu device) is block in suspend state > and connot resume to reduce pm_only lead to IO hang. > > Thanks. > BR > Peter I don't think it is device specific. I am pretty sure it is the same problem, for it traces to the same origin and your patch fixes the issue (though with a lot of log spam which is a bit unfortunate, but we can live with that) --Daniil ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-12-04 20:02 ` Daniil Lunev @ 2022-12-05 7:39 ` Peter Wang (王信友) 0 siblings, 0 replies; 9+ messages in thread From: Peter Wang (王信友) @ 2022-12-05 7:39 UTC (permalink / raw) To: dlunev@chromium.org Cc: linux-mediatek@lists.infradead.org, Jiajie Hao (郝加节), CC Chou (周志杰), Eddie Huang (黃智傑), Alice Chao (趙珮均), jejb@linux.ibm.com, wsd_upstream, avri.altman@wdc.com, bvanassche@acm.org, Lin Gui (桂林), martin.petersen@oracle.com, Chun-Hung Wu (巫駿宏), linux-scsi@vger.kernel.org, alim.akhtar@samsung.com, Chaotian Jing (井朝天), Powen Kao (高伯文), Tun-yu Yu (游敦聿), Naomi Chu (朱詠田), Stanley Chu (朱原陞), Qilin Tan (谭麒麟), adrian.hunter@intel.com Hi Daniil, Thanks your feedback. I am glad to hear your problem can be solved by this patch. Please also help have a review of my v3 patch https://patchwork.kernel.org/project/linux-scsi/patch/20221202073532.7884-1-peter.wang@mediatek.com/ Thanks. BR Peter On Mon, 2022-12-05 at 07:02 +1100, Daniil Lunev wrote: > > May I have a question, which dievce you use in this failure > > scenario? > > I think it is same issue due to SSU fail, and wlun devcie pm enter > > error state. So the cunsomer(scsi lu device) is block in suspend > > state > > and connot resume to reduce pm_only lead to IO hang. > > > > Thanks. > > BR > > Peter > > I don't think it is device specific. I am pretty sure it is the same > problem, for it traces to the same origin and your patch fixes the > issue (though with a lot of log spam which is a bit unfortunate, but > we can live with that) > > --Daniil ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] ufs: core: wlun suspend SSU fail recovery 2022-11-30 10:17 ` Adrian Hunter 2022-11-30 17:49 ` Bart Van Assche @ 2022-12-01 9:19 ` Peter Wang (王信友) 1 sibling, 0 replies; 9+ messages in thread From: Peter Wang (王信友) @ 2022-12-01 9:19 UTC (permalink / raw) To: linux-scsi@vger.kernel.org, avri.altman@wdc.com, alim.akhtar@samsung.com, Stanley Chu (朱原陞), jejb@linux.ibm.com, martin.petersen@oracle.com, adrian.hunter@intel.com Cc: linux-mediatek@lists.infradead.org, Jiajie Hao (郝加节), CC Chou (周志杰), Eddie Huang (黃智傑), Alice Chao (趙珮均), wsd_upstream, Lin Gui (桂林), Chun-Hung Wu (巫駿宏), Tun-yu Yu (游敦聿), Chaotian Jing (井朝天), Powen Kao (高伯文), Naomi Chu (朱詠田), Qilin Tan (谭麒麟) On Wed, 2022-11-30 at 12:17 +0200, Adrian Hunter wrote: > On 1/11/22 16:24, peter.wang@mediatek.com wrote: > > From: Peter Wang <peter.wang@mediatek.com> > > > > When SSU fail in wlun suspend flow, trigger error handlder and > > handlder -> handler will fix next versio, thanks. > > Why / how does SSU fail? > Unknow, it is seldom and we cannot catch fail the failure scenarios. We suspect is maybe the device problem which response wrong sense code. So, I think the driver is better to cover this error. Here is our cmd histroy log with 3 times SSU fail, and retrun busy to do error handler. 26-c(1), 2576.085601525, 9396,18, rs, ret=-16, time_us= 41136, pwr_mode=1, link_status=1 25-r(1), 2576.085027371, 9396, 1,0x1b,t=53,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 31308 24-r(1), 2576.084996910, 9396, 0,0x1b,t=53,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 0 23-r(1), 2576.084919679, 9396, 1,0x1b,t=52,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 32846 22-r(1), 2576.084887679, 9396, 0,0x1b,t=52,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 0 21-r(1), 2576.084704295, 0, 1,0x1b,t=55,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 39777539 20-r(1), 2576.044927679, 9396, 0,0x1b,t=55,db:0x 0,is:0x 80000,crypt:0,0,lba= 0,len= -1, 0 > > return busy to break the suspend. > > If not, wlun runtime pm status become error and the consumer will > > stuck in runtime suspend status. > > > > Signed-off-by: Peter Wang <peter.wang@mediatek.com> > > --- > > drivers/ufs/core/ufshcd.c | 16 +++++++++++++++- > > 1 file changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index b1f59a5fe632..2f2d3d5d8684 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -8970,6 +8970,7 @@ static int __ufshcd_wl_suspend(struct ufs_hba > > *hba, enum ufs_pm_op pm_op) > > enum ufs_pm_level pm_lvl; > > enum ufs_dev_pwr_mode req_dev_pwr_mode; > > enum uic_link_state req_link_state; > > + unsigned long flags; > > > > hba->pm_op_in_progress = true; > > if (pm_op != UFS_SHUTDOWN_PM) { > > @@ -9049,8 +9050,21 @@ static int __ufshcd_wl_suspend(struct > > ufs_hba *hba, enum ufs_pm_op pm_op) > > > > if (!hba->dev_info.b_rpm_dev_flush_capable) { > > ret = ufshcd_set_dev_pwr_mode(hba, > > req_dev_pwr_mode); > > - if (ret) > > + if (ret) { > > + /* > > + * If retrun err in suspend flow, IO > > will hang. > > retrun -> return will fix next version, thanks > > > + * Trigger error handler and break > > suspend for > > + * error recovery. > > + */ > > + spin_lock_irqsave(hba->host->host_lock, > > flags); > > + hba->force_reset = true; > > + ufshcd_schedule_eh_work(hba); > > __ufshcd_wl_suspend() is also used for shutdown and poweroff > where scheduling EH is not appropriate. Will check pm_op and bypass UFS_SHUTDOWN_PM next version, thanks. > > > + spin_unlock_irqrestore(hba->host- > > >host_lock, > > + flags); > > + > > + ret = -EBUSY; > > goto enable_scaling; > > + } > > } > > } > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-12-05 7:39 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-11-01 14:24 [PATCH v1] ufs: core: wlun suspend SSU fail recovery peter.wang 2022-11-30 10:17 ` Adrian Hunter 2022-11-30 17:49 ` Bart Van Assche 2022-12-01 9:31 ` Peter Wang (王信友) 2022-12-01 23:32 ` Daniil Lunev 2022-12-02 6:40 ` Peter Wang (王信友) 2022-12-04 20:02 ` Daniil Lunev 2022-12-05 7:39 ` Peter Wang (王信友) 2022-12-01 9:19 ` Peter Wang (王信友)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox