From: Can Guo <cang@codeaurora.org>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
hongwus@codeaurora.org, ziqichen@codeaurora.org,
linux-scsi@vger.kernel.org, kernel-team@android.com,
Alim Akhtar <alim.akhtar@samsung.com>,
Avri Altman <avri.altman@wdc.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Stanley Chu <stanley.chu@mediatek.com>,
Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation
Date: Fri, 11 Jun 2021 11:01:12 +0800
Message-ID: <f0ae504bccc428fa674a183608174bdd@codeaurora.org>
In-Reply-To: <6abb81f6-4dd2-082e-9440-4b549f105788@intel.com>

Hi Adrian,
On 2021-06-10 20:30, Adrian Hunter wrote:
> On 10/06/21 7:43 am, Can Guo wrote:
>> Commit cb7e6f05fce67c965194ac04467e1ba7bc70b069 ("scsi: ufs: core:
>> Enable power management for wlun") moves UFS operations out of
>> ufshcd_resume(), so in error handling preparation, if ufshcd hba has
>> failed to resume, there is no point to re-enable IRQ/clk/pwr.
>
> I am not sure how cb7e6f05fce67c965194ac04467e1ba7bc70b069 made things
> any different,
Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
ufshcd_resume() could turn off pwr and clk because of a UFS error, e.g.,
a link transition failure or an SSU error/abort (and such UFS errors
would also invoke error handling). When error handling kicked in, it had
to re-enable pwr and clk before proceeding. Now, commit
cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume() purely
control pwr and clk, meaning that if ufshcd_resume() fails, there is
nothing we can do about it: enabling pwr or clk must have failed, and
the failure is not caused by a UFS error. This is why I am removing the
re-enabling of pwr/clk from error handling preparation.
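
To make the new flow concrete, here is a minimal userspace model of the
reworked preparation step in the quoted patch. It is only a sketch:
struct eh_model and the model_* functions are hypothetical stand-ins,
plain integers replace the runtime-PM usage counts, and hba_resume_ok
models whether ufshcd_resume() managed to enable pwr/clk. It relies on
one real property of the kernel API: pm_runtime_get_sync() bumps the
usage count even when the resume fails, so the failure path must still
drop that reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; NOT struct ufs_hba from the driver. */
struct eh_model {
	int hba_usage;       /* models the runtime-PM usage count of hba->dev */
	int wlun_usage;      /* models the reference taken by ufshcd_rpm_get_sync() */
	bool hba_resume_ok;  /* can ufshcd_resume() enable pwr/clk? */
	bool hba_suspended;  /* models pm_runtime_suspended(hba->dev) */
	bool eh_in_progress; /* models ufshcd_set/clear_eh_in_progress() */
};

/* Follows the order of operations in the new ufshcd_err_handling_prepare(). */
static int model_prepare(struct eh_model *m)
{
	/* pm_runtime_get_sync(hba->dev): takes a reference even on failure. */
	m->hba_usage++;
	if (m->hba_resume_ok)
		m->hba_suspended = false;

	/* pwr/clk could not be enabled: nothing error handling can fix. */
	if (m->hba_suspended) {
		m->hba_usage--; /* pm_runtime_put(hba->dev) */
		return -EINVAL;
	}

	m->eh_in_progress = true; /* ufshcd_set_eh_in_progress() */
	m->wlun_usage++;          /* ufshcd_rpm_get_sync() */
	return 0;
}

/* Follows the new ufshcd_err_handling_unprepare(): drops both references. */
static void model_unprepare(struct eh_model *m)
{
	assert(m->hba_usage > 0 && m->wlun_usage > 0);
	m->eh_in_progress = false; /* ufshcd_clear_eh_in_progress() */
	m->wlun_usage--;           /* ufshcd_rpm_put() */
	m->hba_usage--;            /* pm_runtime_put(hba->dev) */
}
```

With hba_resume_ok false, model_prepare() returns -EINVAL and leaves
both counts at zero (the "End of the world" case in the patch);
otherwise one model_prepare()/model_unprepare() pair leaves the counts
balanced.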
> but what I really wonder is why we don't just do recovery directly in
> __ufshcd_wl_suspend() and __ufshcd_wl_resume() and strip all the PM
> complexity out of ufshcd_err_handling()?
>
This is a good question, and I have been struggling with this idea ever
since I started to fix error handling.

Just so you know, there are both runtime and system suspend/resume.
Error handling has the same nature as user access: it is unpredictable,
meaning it can be invoked at any time (from the IRQ handler), even when
there are no ongoing cmd/data transactions (e.g., auto-hibern8 failures
and UIC errors, such as DME errors and some data link layer errors) [1],
unless you disable the UFS IRQ.

For runtime suspend/resume this is fine, since we call
pm_runtime_get/put_sync() in error handling, so error handling cannot
run in parallel with runtime suspend/resume.
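
The runtime half of this argument rests on usage counting: while error
handling holds a runtime-PM reference, the device cannot
runtime-suspend underneath it. The sketch below is a deliberately
simplified, hypothetical model of that rule (struct rpm_model and the
model_rpm_* helpers are made up for illustration); the real PM core
also deals with idle callbacks, autosuspend, parent devices and
locking, and the -EAGAIN returned for a refused suspend is simply the
model's choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified runtime-PM state for one device. */
struct rpm_model {
	int usage_count;
	bool suspended;
};

/* Like a get_sync call: take a reference, then make sure the device is active. */
static void model_rpm_get_sync(struct rpm_model *d)
{
	d->usage_count++;
	d->suspended = false;
}

/* Like a put call: drop the reference taken above. */
static void model_rpm_put(struct rpm_model *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* A runtime suspend attempt is refused while any reference is held. */
static int model_rpm_try_suspend(struct rpm_model *d)
{
	if (d->usage_count > 0)
		return -EAGAIN;
	d->suspended = true;
	return 0;
}
```

Error handling brackets its work with a get/put pair, so in this model
any runtime suspend attempted in between fails, and the two can never
overlap.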

For system suspend/resume, since error handling has the same nature as
user access, we use host_sem to avoid concurrency between error handling
and system suspend/resume.

Back to your question: can we just do recovery directly in
__ufshcd_wl_suspend() and __ufshcd_wl_resume()? Yes, we can. However, I
chose not to do it that way for the following reasons (although error
handling preparation has become much simpler after applying this
change):

1. I want to keep all the complexity within the error handler and
redirect all error recovery needs to it. This avoids calling
ufshcd_reset_and_restore() and/or flush_work(&hba->eh_work) here and
there. The entire UFS suspend/resume path is already complex enough,
and I don't want to mess with it.

2. We do explicit recovery only when we see certain errors, e.g., when
the H8 enter function returns an error during suspend. But, as
mentioned above [1], error handling may already have been invoked from
the IRQ handler (due to all kinds of UIC errors) before the H8 enter
function returns. So we still need host_sem (in the case of system
suspend/resume) to avoid concurrency.

3. During system suspend/resume, error handling can be invoked (due to
non-fatal errors) even though UFS cmds return no error at all. As
above, we need host_sem to avoid concurrency.
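
Points 2 and 3 boil down to ordering: an error raised while system
suspend is running may queue the error handler at any moment, and
host_sem makes that handler wait until suspend has released the
semaphore. The following userspace sketch models this with POSIX
threads and an unnamed POSIX semaphore standing in for hba->host_sem
(assumes Linux/glibc). The thread functions and event codes are
hypothetical and only mimic the locking pattern described here, not the
driver's real suspend or error handling code.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Event codes recorded in the order they happen (hypothetical). */
enum { EV_SUSPEND_START = 1, EV_SUSPEND_END, EV_EH_RUN };

static sem_t host_sem;        /* stands in for hba->host_sem */
static sem_t suspend_entered; /* suspend now holds host_sem */
static sem_t irq_fired;       /* a UIC error "IRQ" arrived mid-suspend */
static int events[3];
static int n_events;          /* only written while host_sem is held */

/* Model of system suspend: holds host_sem for its whole duration. */
static void *system_suspend(void *arg)
{
	(void)arg;
	sem_wait(&host_sem);            /* down(&hba->host_sem) */
	events[n_events++] = EV_SUSPEND_START;
	sem_post(&suspend_entered);
	sem_wait(&irq_fired);           /* error reported while suspending */
	events[n_events++] = EV_SUSPEND_END;
	sem_post(&host_sem);            /* up(&hba->host_sem) */
	return NULL;
}

/* Model of the error handler queued from the IRQ: takes host_sem first. */
static void *err_handler(void *arg)
{
	(void)arg;
	sem_wait(&host_sem);
	events[n_events++] = EV_EH_RUN;
	sem_post(&host_sem);
	return NULL;
}

/* Runs one scenario and returns the number of recorded events. */
static int run_scenario(void)
{
	pthread_t suspend_thr, eh_thr;

	n_events = 0;
	sem_init(&host_sem, 0, 1);
	sem_init(&suspend_entered, 0, 0);
	sem_init(&irq_fired, 0, 0);

	pthread_create(&suspend_thr, NULL, system_suspend, NULL);
	sem_wait(&suspend_entered);     /* suspend is inside its critical section */
	pthread_create(&eh_thr, NULL, err_handler, NULL); /* "IRQ" queues EH */
	sem_post(&irq_fired);

	pthread_join(suspend_thr, NULL);
	pthread_join(eh_thr, NULL);

	sem_destroy(&irq_fired);
	sem_destroy(&suspend_entered);
	sem_destroy(&host_sem);
	return n_events;
}
```

Build with -pthread. Whatever the scheduling, the recorded order is
suspend start, suspend end, then the error handler, because the handler
cannot take host_sem until suspend releases it.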

There are more reasons why I chose this way, but in the end it is just
one way among others. I am glad to see someone cares about error
handling and can make it better and more robust, whichever way that
turns out to be. :)
Thanks,
Can Guo.
>>
>> Signed-off-by: Can Guo <cang@codeaurora.org>
>> ---
>> drivers/scsi/ufs/ufshcd.c | 58 +++++++++++++++++++++++++----------------------
>> 1 file changed, 31 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 7dc0fda..0afad6b 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -2727,8 +2727,8 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>> break;
>> case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>> /*
>> - * pm_runtime_get_sync() is used at error handling preparation
>> - * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
>> + * ufshcd_rpm_get_sync() is used at error handling preparation
>> + * stage. If a scsi cmd, e.g., the SSU cmd, is sent from the
>> * PM ops, it can never be finished if we let SCSI layer keep
>> * retrying it, which gets err handler stuck forever. Neither
>> * can we let the scsi cmd pass through, because UFS is in bad
>> @@ -5915,29 +5915,26 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>> }
>> }
>>
>> -static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> +static int ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> {
>> + /*
>> + * Exclusively call pm_runtime_get_sync(hba->dev) once, in case
>> + * following ufshcd_rpm_get_sync() fails.
>> + */
>> + pm_runtime_get_sync(hba->dev);
>> + /* End of the world. */
>> + if (pm_runtime_suspended(hba->dev)) {
>> + pm_runtime_put(hba->dev);
>> + return -EINVAL;
>> + }
>> +
>> + ufshcd_set_eh_in_progress(hba);
>> ufshcd_rpm_get_sync(hba);
>> - if (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>> + if (pm_runtime_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>> hba->is_wl_sys_suspended) {
>> - enum ufs_pm_op pm_op;
>> + enum ufs_pm_op pm_op = hba->is_wl_sys_suspended ?
>> + UFS_SYSTEM_PM : UFS_RUNTIME_PM;
>>
>> - /*
>> - * Don't assume anything of resume, if
>> - * resume fails, irq and clocks can be OFF, and powers
>> - * can be OFF or in LPM.
>> - */
>> - ufshcd_setup_hba_vreg(hba, true);
>> - ufshcd_setup_vreg(hba, true);
>> - ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq);
>> - ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq2);
>> - ufshcd_hold(hba, false);
>> - if (!ufshcd_is_clkgating_allowed(hba)) {
>> - ufshcd_setup_clocks(hba, true);
>> - ufshcd_enable_irq(hba);
>> - }
>> - ufshcd_release(hba);
>> - pm_op = hba->is_wl_sys_suspended ? UFS_SYSTEM_PM : UFS_RUNTIME_PM;
>> ufshcd_vops_resume(hba, pm_op);
>> } else {
>> ufshcd_hold(hba, false);
>> @@ -5951,22 +5948,25 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> down_write(&hba->clk_scaling_lock);
>> up_write(&hba->clk_scaling_lock);
>> cancel_work_sync(&hba->eeh_work);
>> + return 0;
>> }
>>
>> static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>> {
>> + ufshcd_clear_eh_in_progress(hba);
>> ufshcd_scsi_unblock_requests(hba);
>> ufshcd_release(hba);
>> if (ufshcd_is_clkscaling_supported(hba))
>> ufshcd_clk_scaling_suspend(hba, false);
>> ufshcd_clear_ua_wluns(hba);
>> ufshcd_rpm_put(hba);
>> + pm_runtime_put(hba->dev);
>> }
>>
>> static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
>> {
>> return (!hba->is_powered || hba->shutting_down ||
>> - !hba->sdev_ufs_device ||
>> + !hba->sdev_ufs_device || hba->is_sys_suspended ||
>> hba->ufshcd_state == UFSHCD_STATE_ERROR ||
>> (!(hba->saved_err || hba->saved_uic_err || hba->force_reset ||
>> ufshcd_is_link_broken(hba))));
>> @@ -6052,9 +6052,13 @@ static void ufshcd_err_handler(struct work_struct *work)
>> up(&hba->host_sem);
>> return;
>> }
>> - ufshcd_set_eh_in_progress(hba);
>> spin_unlock_irqrestore(hba->host->host_lock, flags);
>> - ufshcd_err_handling_prepare(hba);
>> + if (ufshcd_err_handling_prepare(hba)) {
>> + dev_err(hba->dev, "%s: error handling preparation failed\n",
>> + __func__);
>> + up(&hba->host_sem);
>> + return;
>> + }
>> /* Complete requests that have door-bell cleared by h/w */
>> ufshcd_complete_requests(hba);
>> spin_lock_irqsave(hba->host->host_lock, flags);
>> @@ -6198,7 +6202,6 @@ static void ufshcd_err_handler(struct work_struct *work)
>> dev_err_ratelimited(hba->dev, "%s: exit: saved_err 0x%x saved_uic_err 0x%x",
>> __func__, hba->saved_err, hba->saved_uic_err);
>> }
>> - ufshcd_clear_eh_in_progress(hba);
>> spin_unlock_irqrestore(hba->host->host_lock, flags);
>> ufshcd_err_handling_unprepare(hba);
>> up(&hba->host_sem);
>> @@ -8999,6 +9002,9 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>>
>> /* Enable Auto-Hibernate if configured */
>> ufshcd_auto_hibern8_enable(hba);
>> +
>> + hba->clk_gating.is_suspended = false;
>> + ufshcd_release(hba);
>> goto out;
>>
>> set_old_link_state:
>> @@ -9008,8 +9014,6 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>> out:
>> if (ret)
>> ufshcd_update_evt_hist(hba, UFS_EVT_WL_RES_ERR, (u32)ret);
>> - hba->clk_gating.is_suspended = false;
>> - ufshcd_release(hba);
>> hba->wl_pm_op_in_progress = false;
>> return ret <= 0 ? ret : -EINVAL;
>> }
>>