From: Can Guo <cang@codeaurora.org>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
hongwus@codeaurora.org, ziqichen@codeaurora.org,
linux-scsi@vger.kernel.org, kernel-team@android.com,
Alim Akhtar <alim.akhtar@samsung.com>,
Avri Altman <avri.altman@wdc.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Stanley Chu <stanley.chu@mediatek.com>,
Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume
Date: Mon, 28 Jun 2021 15:26:34 +0800
Message-ID: <7c6e2baa3578eb30f2d4bd1696e800eb@codeaurora.org>
In-Reply-To: <f1c997f3-66e4-3f1f-08f5-83449b65c397@intel.com>
On 2021-06-24 18:04, Adrian Hunter wrote:
> On 24/06/21 9:31 am, Can Guo wrote:
>> On 2021-06-24 14:23, Adrian Hunter wrote:
>>> On 24/06/21 9:12 am, Can Guo wrote:
>>>> On 2021-06-24 13:52, Adrian Hunter wrote:
>>>>> On 24/06/21 5:16 am, Can Guo wrote:
>>>>>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>>>>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>>>>>> To protect system suspend/resume from being disturbed by error
>>>>>>>> handling,
>>>>>>>> instead of using host_sem, let error handler call
>>>>>>>> lock_system_sleep() and
>>>>>>>> unlock_system_sleep() which achieve the same purpose. Remove the
>>>>>>>> host_sem
>>>>>>>> used in suspend/resume paths to make the code more readable.
>>>>>>>>
>>>>>>>> Suggested-by: Bart Van Assche <bvanassche@acm.org>
>>>>>>>> Signed-off-by: Can Guo <cang@codeaurora.org>
>>>>>>>> ---
>>>>>>>> drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>>>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c
>>>>>>>> b/drivers/scsi/ufs/ufshcd.c
>>>>>>>> index 3695dd2..a09e4a2 100644
>>>>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>>>>> @@ -5907,6 +5907,11 @@ static void
>>>>>>>> ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>>>>>>>
>>>>>>>> static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>>>>> {
>>>>>>>> + /*
>>>>>>>> + * It is not safe to perform error handling while suspend
>>>>>>>> or resume is
>>>>>>>> + * in progress. Hence the lock_system_sleep() call.
>>>>>>>> + */
>>>>>>>> + lock_system_sleep();
>>>>>>>
>>>>>>> It looks to me like the system takes this lock quite early, even
>>>>>>> before
>>>>>>> freezing tasks, so if anything needs the error handler to run it
>>>>>>> will
>>>>>>> deadlock.
>>>>>>
>>>>>> Hi Adrian,
>>>>>>
>>>>>> UFS/hba system suspend/resume does not invoke or call error
>>>>>> handling in a
>>>>>> synchronous way. So, whatever UFS errors (which schedules the
>>>>>> error handler)
>>>>>> happens during suspend/resume, error handler will just wait here
>>>>>> till system
>>>>>> suspend/resume release the lock. Hence no worries of deadlock
>>>>>> here.
>>>>>
>>>>> It looks to me like the state can change to
>>>>> UFSHCD_STATE_EH_SCHEDULED_FATAL
>>>>> and since user processes are not frozen, nor file systems sync'ed,
>>>>> everything
>>>>> is going to deadlock.
>>>>> i.e.
>>>>> I/O is blocked waiting on error handling
>>>>> error handling is blocked waiting on lock_system_sleep()
>>>>> suspend is blocked waiting on I/O
>>>>>
>>>>
>>>> Hi Adrian,
>>>>
>>>> First of all, enter_state(suspend_state_t state) uses
>>>> mutex_trylock(&system_transition_mutex).
>>>
>>> Yes, in the case I am outlining it gets the mutex.
>>>
>>>> Second, even that happens, in ufshcd_queuecommand(), below logic
>>>> will break the cycle, by
>>>> fast failing the PM request (below codes are from the code tip with
>>>> this whole series applied).
>>>
>>> It won't get that far because the suspend will be waiting to sync
>>> filesystems.
>>> Filesystems will be waiting on I/O.
>>> I/O will be waiting on the error handler.
>>> The error handler will be waiting on system_transition_mutex.
>>> But system_transition_mutex is already held by PM core.
>>
>> Hi Adrian,
>>
>> You are right.... I missed the action of syncing filesystems...
>>
>> Using back host_sem in suspend_prepare()/resume_complete() won't have
>> this
>> problem of deadlock, right?
>
> I am not sure, but what was problem that the V3 patch was fixing?
> Can you give an example?

V3 moved host_sem from wl_system_suspend/resume() to
ufshcd_suspend_prepare()/ufshcd_resume_complete(). The aim was to
make sure error handling does not run concurrently with system PM,
since error handling recovers/clears the runtime PM errors of all
the SCSI devices under the hba (in patch #8). Error handling has to
do so because the runtime PM framework may save the runtime error of
a supplier to one or more of its consumers (unlike the child-parent
relationship). For example, if the wlu resume fails, sda and/or other
SCSI devices may save the resume error, and then they will be left
runtime suspended permanently.

Thanks,
Can Guo.
>
>>
>> Thanks,
>>
>> Can Guo.
>>
>>>
>>>>
>>>> case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>>>> /*
>>>> * ufshcd_rpm_get_sync() is used at error handling
>>>> preparation
>>>> * stage. If a scsi cmd, e.g., the SSU cmd, is sent
>>>> from the
>>>> * PM ops, it can never be finished if we let SCSI
>>>> layer keep
>>>> * retrying it, which gets err handler stuck
>>>> forever. Neither
>>>> * can we let the scsi cmd pass through, because UFS
>>>> is in bad
>>>> * state, the scsi cmd may eventually time out,
>>>> which will get
>>>> * err handler blocked for too long. So, just fail
>>>> the scsi cmd
>>>> * sent from PM ops, err handler can recover PM
>>>> error anyways.
>>>> */
>>>> if (cmd->request->rq_flags & RQF_PM) {
>>>> hba->force_reset = true;
>>>> set_host_byte(cmd, DID_BAD_TARGET);
>>>> cmd->scsi_done(cmd);
>>>> goto out;
>>>> }
>>>> fallthrough;
>>>> case UFSHCD_STATE_RESET:
>>>>
>>>> Thanks,
>>>>
>>>> Can Guo.
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Can Guo.
>>>>>>
>>>>>>>
>>>>>>>> ufshcd_rpm_get_sync(hba);
>>>>>>>> if
>>>>>>>> (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev)
>>>>>>>> ||
>>>>>>>> hba->is_wlu_sys_suspended) {
>>>>>>>> @@ -5951,6 +5956,7 @@ static void
>>>>>>>> ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>>>>> ufshcd_clk_scaling_suspend(hba, false);
>>>>>>>> ufshcd_clear_ua_wluns(hba);
>>>>>>>> ufshcd_rpm_put(hba);
>>>>>>>> + unlock_system_sleep();
>>>>>>>> }
>>>>>>>>
>>>>>>>> static inline bool ufshcd_err_handling_should_stop(struct
>>>>>>>> ufs_hba *hba)
>>>>>>>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct
>>>>>>>> device *dev)
>>>>>>>> ktime_t start = ktime_get();
>>>>>>>>
>>>>>>>> hba = shost_priv(sdev->host);
>>>>>>>> - down(&hba->host_sem);
>>>>>>>>
>>>>>>>> if (pm_runtime_suspended(dev))
>>>>>>>> goto out;
>>>>>>>>
>>>>>>>> ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
>>>>>>>> - if (ret) {
>>>>>>>> + if (ret)
>>>>>>>> dev_err(&sdev->sdev_gendev, "%s failed: %d\n",
>>>>>>>> __func__, ret);
>>>>>>>> - up(&hba->host_sem);
>>>>>>>> - }
>>>>>>>>
>>>>>>>> out:
>>>>>>>> if (!ret)
>>>>>>>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device
>>>>>>>> *dev)
>>>>>>>> hba->curr_dev_pwr_mode, hba->uic_link_state);
>>>>>>>> if (!ret)
>>>>>>>> hba->is_wlu_sys_suspended = false;
>>>>>>>> - up(&hba->host_sem);
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>> #endif
>>>>>>>>
Thread overview: 59+ messages
2021-06-23 7:34 [PATCH v4 00/10] Complementary changes for error handling Can Guo
2021-06-23 7:35 ` [PATCH v4 01/10] scsi: ufs: Rename flags pm_op_in_progress and is_sys_suspended Can Guo
2021-06-23 20:05 ` Bart Van Assche
2021-06-23 20:57 ` Bart Van Assche
2021-06-24 2:02 ` Can Guo
2021-06-24 2:34 ` Can Guo
2021-06-24 6:04 ` Adrian Hunter
2021-06-23 20:42 ` Bjorn Andersson
2021-06-23 22:41 ` Bart Van Assche
2021-06-24 2:04 ` Can Guo
2021-06-24 17:32 ` Bart Van Assche
2021-06-24 23:42 ` Bart Van Assche
2021-06-28 7:01 ` Can Guo
2021-06-28 7:35 ` Can Guo
2021-06-28 17:07 ` Bart Van Assche
2021-06-23 7:35 ` [PATCH v4 02/10] scsi: ufs: Add " Can Guo
2021-06-23 12:33 ` Adrian Hunter
2021-06-24 2:05 ` Can Guo
2021-06-23 20:59 ` Bart Van Assche
2021-06-24 2:07 ` Can Guo
2021-06-24 17:35 ` Bart Van Assche
2021-06-28 7:11 ` Can Guo
2021-06-23 7:35 ` [PATCH v4 03/10] scsi: ufs: Update the return value of supplier pm ops Can Guo
2021-06-23 21:08 ` Bart Van Assche
2021-06-24 2:11 ` Can Guo
2021-06-23 7:35 ` [PATCH v4 04/10] scsi: ufs: Enable IRQ after enabling clocks in error handling preparation Can Guo
2021-06-23 21:20 ` Bart Van Assche
2021-06-23 7:35 ` [PATCH 05/10] scsi: ufs: Complete the cmd before returning in queuecommand Can Guo
2021-06-23 7:39 ` Can Guo
2021-06-23 7:35 ` [PATCH v4 05/10] scsi: ufs: Remove a redundant tag check in ufshcd_queuecommand() Can Guo
2021-06-23 21:24 ` Bart Van Assche
2021-06-23 7:35 ` [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume Can Guo
2021-06-23 14:30 ` Adrian Hunter
2021-06-24 2:16 ` Can Guo
2021-06-24 5:52 ` Adrian Hunter
2021-06-24 6:12 ` Can Guo
2021-06-24 6:23 ` Adrian Hunter
2021-06-24 6:31 ` Can Guo
2021-06-24 10:04 ` Adrian Hunter
2021-06-28 7:26 ` Can Guo [this message]
2021-07-07 19:04 ` Adrian Hunter
2021-06-24 17:11 ` Bart Van Assche
2021-06-28 8:17 ` Can Guo
2021-06-28 17:31 ` Bart Van Assche
2021-06-29 6:23 ` Can Guo
2021-06-29 18:01 ` Bart Van Assche
2021-06-29 21:50 ` Can Guo
2021-06-23 7:35 ` [PATCH v4 07/10] scsi: ufs: Simplify error handling preparation Can Guo
2021-06-23 21:30 ` Bart Van Assche
2021-06-23 7:35 ` [PATCH v4 08/10] scsi: ufs: Update ufshcd_recover_pm_error() Can Guo
2021-06-23 7:35 ` [PATCH v4 09/10] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests Can Guo
2021-06-23 21:33 ` Bart Van Assche
2021-06-24 4:16 ` Can Guo
2021-06-24 16:57 ` Bart Van Assche
2021-06-23 7:35 ` [PATCH v4 10/10] scsi: ufs: Apply more limitations to user access Can Guo
2021-06-23 21:51 ` Bart Van Assche
2021-06-24 2:23 ` Can Guo
2021-06-24 22:25 ` Bart Van Assche
2021-06-28 7:16 ` Can Guo