From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
hongwus@codeaurora.org, ziqichen@codeaurora.org,
linux-scsi@vger.kernel.org, kernel-team@android.com,
Alim Akhtar <alim.akhtar@samsung.com>,
Avri Altman <avri.altman@wdc.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Stanley Chu <stanley.chu@mediatek.com>,
Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
Date: Tue, 15 Jun 2021 11:17:40 +0800
Message-ID: <ef1b4408a5fd87b3b2c9cb0e891b892f@codeaurora.org>
In-Reply-To: <3fce15502c2742a4388817538eb4db97@codeaurora.org>
On 2021-06-15 10:36, Can Guo wrote:
> Hi Bart,
>
> On 2021-06-15 02:49, Bart Van Assche wrote:
>> On 6/13/21 7:42 AM, Can Guo wrote:
>>> 2. ufshcd_abort() invokes ufshcd_err_handler() synchronously can have a
>>> live lock issue, which is why I chose the asynchronous way (from the first
>>> day I started to fix error handling). The live lock happens when abort
>>> happens to a PM request, e.g., a SSU cmd sent from suspend/resume. Because
>>> UFS error handler is synchronized with suspend/resume (by calling
>>> pm_runtime_get_sync() and lock_system_sleep()), the sequence is like:
>>> [1] ufshcd_wl_resume() sends SSU cmd
>>> [2] ufshcd_abort() calls UFS error handler
>>> [3] UFS error handler calls lock_system_sleep() and pm_runtime_get_sync()
>>>
>>> In above sequence, either lock_system_sleep() or pm_runtime_get_sync()
>>> shall be blocked - [3] is blocked by [1], [2] is blocked by [3], while [1]
>>> is blocked by [2].
>>>
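To make the cycle concrete, here is a small user-space sketch (not driver
code) that models the three steps as a wait-for chain and checks whether
the chain loops back on itself. The enum, the STEP_NONE marker and
has_circular_wait() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three steps quoted above. */
enum step {
	STEP_RESUME_SSU,	/* [1] ufshcd_wl_resume() waiting for the SSU cmd */
	STEP_ABORT,		/* [2] ufshcd_abort() */
	STEP_ERR_HANDLER,	/* [3] UFS error handler */
	STEP_COUNT,
	STEP_NONE = STEP_COUNT	/* marker: not blocked by anything */
};

/*
 * blocked_by[s] names the step that s waits for, or STEP_NONE.
 * Returns true if following the chain from start leads back to start.
 */
static bool has_circular_wait(const enum step blocked_by[STEP_COUNT],
			      enum step start)
{
	enum step cur = start;
	int hops;

	assert(start < STEP_COUNT);
	for (hops = 0; hops < (int)STEP_COUNT; hops++) {
		cur = blocked_by[cur];
		if (cur == STEP_NONE)
			return false;	/* chain ends, no cycle */
		if (cur == start)
			return true;	/* came back to where we started */
	}
	return false;
}
```

With blocked_by = { STEP_ABORT, STEP_ERR_HANDLER, STEP_RESUME_SSU } (abort
waits for the error handler) the chain loops. If the abort step does not
wait, i.e. the middle entry is STEP_NONE, the chain ends there.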
>>> For PM requests, I chose to abort them fast to unblock suspend/resume,
>>> suspend/resume shall fail of course, but UFS error handler recovers
>>> PM errors anyways.
>>
>> In the above sequence, does [2] perhaps refer to aborting the SSU
>> command submitted in step [1] (this is not clear to me)?
>
> Yes, your understanding is right.
>
>> If so, how about breaking the circular waiting cycle as follows:
>> - If it can happen that SSU succeeds after more than scsi_timeout
>> seconds, define a custom timeout handler. From inside the timeout
>> handler, schedule a link check and return BLK_EH_RESET_TIMER. If the
>> link is no longer operational, run the error handler. If the link
>> cannot be recovered by the error handler, fail all pending commands.
>> This will prevent that ufshcd_abort() is called if a SSU command takes
>> longer than expected. See also commit 0dd0dec1677e.
>> - Modify the UFS error handler such that it accepts a context argument.
>> The context argument specifies whether or not the UFS error handler is
>> called from inside a system suspend or system resume handler. If the
>> UFS error handler is called from inside a system suspend or resume
>> callback, skip the lock_system_sleep() and unlock_system_sleep() calls.
>>
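For reference, the context-argument idea could look roughly like the
user-space sketch below. This is only an illustration under assumptions:
enum eh_caller, struct eh_trace and sketch_err_handler() are hypothetical
names, and the counters merely stand in for the real lock_system_sleep(),
unlock_system_sleep() and reset-and-restore calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context values; not an existing ufshcd type. */
enum eh_caller { EH_FROM_ABORT, EH_FROM_SYSTEM_PM };

/* Counters standing in for the real calls so the sketch runs in user space. */
struct eh_trace {
	int sleep_lock_calls;	/* stand-in for lock_system_sleep() */
	int sleep_unlock_calls;	/* stand-in for unlock_system_sleep() */
	int recoveries;		/* stand-in for reset and restore */
};

static void sketch_err_handler(struct eh_trace *t, enum eh_caller caller)
{
	/*
	 * When called on behalf of system suspend/resume, skip the lock:
	 * taking it would wait for the transition that is in progress.
	 */
	int take_lock = (caller != EH_FROM_SYSTEM_PM);

	assert(t != NULL);
	if (take_lock)
		t->sleep_lock_calls++;
	t->recoveries++;
	if (take_lock)
		t->sleep_unlock_calls++;
}
```

The recovery itself runs in both cases; only the locking differs, which is
why the PM path in this sketch is not serialized against suspend/resume.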
>
> I am aware of commit 0dd0dec1677e, I gave my reviewed-by tag. Thank you
> for your suggestion and I believe it can resolve the cycle, because
> actually I've considered the similar way (leverage hba->host->eh_noresume)
> last year, but I didn't take this way due to below reasons:
>
> 1. UFS error handler basically does one thing - reset and restore, which
> stops hba [1], resets device [2] and re-probes the device [3]. Stopping
> hba [1] shall complete any pending requests in the doorbell (with error or
> no error). After [1], suspend/resume contexts, blocked by SSU cmd, shall be
> unblocked right away to do whatever it needs to handle the SSU cmd failure
> (completed in [1], so scsi_execute() returns an error), e.g., put link back
> to the old state, call ufshcd_vops_suspend(), turn off irq/clocks/powers
> and etc... However, reset and restore ([2] and [3]) is still running, and
> it can (most likely) be disturbed by suspend/resume. So passing a parameter
> or using hba->host->eh_noresume to skip lock_system_sleep() and
> unlock_system_sleep() can break the cycle, but error handling may run
> concurrently with suspend/resume. Of course we can modify suspend/resume to
> avoid it, but I was pursuing a minimal change to get this fixed.
>
One more point: the SSU cmd is not the only PM request sent during
suspend/resume. Last year (before your changes came in), suspend/resume
also sent a request sense cmd without checking its return value, so if the
request sense cmd was aborted, suspend/resume still moved forward and could
run concurrently with error handling. So I was pursuing a way to make the
error handler less dependent on the behaviours of these contexts.
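
A rough user-space sketch of that difference, with made-up names
(struct pm_cmd_results and sketch_pm_flow() are hypothetical, not driver
code): when the request sense result is ignored, a failed or aborted
request sense does not stop the flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes of the two PM commands; 0 means success. */
struct pm_cmd_results {
	int request_sense_ret;
	int ssu_ret;
};

/*
 * Returns 0 if the suspend/resume flow keeps going, or the first error
 * that makes it stop. With check_request_sense == false the request
 * sense result is ignored, as in the older behaviour described above.
 */
static int sketch_pm_flow(const struct pm_cmd_results *r,
			  bool check_request_sense)
{
	assert(r != NULL);
	if (check_request_sense && r->request_sense_ret)
		return r->request_sense_ret;
	if (r->ssu_ret)
		return r->ssu_ret;
	return 0;
}
```

With request_sense_ret set to -EIO and ssu_ret set to 0, the unchecked
variant returns 0, so the flow would carry on while error handling may
still be in progress.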
Thanks,
Can Guo.
> 2. Whatever way we take to break the cycle, suspend/resume shall fail and
> RPM framework shall save the error to dev.power.runtime_error, leaving
> the device in runtime suspended or active mode permanently. If it is left
> runtime suspended, UFS driver won't accept cmd anymore, while if it is left
> runtime active, powers of UFS device and host will be left ON, leading to
> power penalty. So my main idea is to let suspend/resume contexts, blocked
> by PM cmds, fail fast first and then error handler recover everything back
> to work.
>
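The "fail fast, then recover" idea in point 2 can be pictured with the
user-space model below. It is a sketch under assumptions, not the RPM core:
struct model_pm and both functions are hypothetical, and the model only
captures that a failed callback latches an error, that later requests are
refused while the error is latched, and that a recovery step clears it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime_error field of dev.power. */
struct model_pm {
	int runtime_error;
};

/*
 * Models one runtime PM request. callback_ret is the result of the
 * driver's suspend/resume callback (0 on success).
 */
static int model_rpm_request(struct model_pm *pm, int callback_ret)
{
	assert(pm != NULL);
	if (pm->runtime_error)
		return -EINVAL;		/* refused while an error is latched */
	if (callback_ret) {
		pm->runtime_error = callback_ret;	/* latch the failure */
		return callback_ret;
	}
	return 0;
}

/* Models the error handler clearing the latched PM error after recovery. */
static void model_recover_pm_error(struct model_pm *pm)
{
	assert(pm != NULL);
	pm->runtime_error = 0;
}
```

In this model the first failing request returns its own error, every later
request returns -EINVAL, and requests succeed again only after
model_recover_pm_error() has run.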
> Thanks,
>
> Can Guo.
>
>> Thanks,
>>
>> Bart.