linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
Date: Sun, 13 Jun 2021 22:42:55 +0800	[thread overview]
Message-ID: <75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org> (raw)
In-Reply-To: <926d8c4a-0fbf-a973-188a-b10c9acaa444@acm.org>

Hi Bart,

On 2021-06-13 00:50, Bart Van Assche wrote:
> On 6/12/21 12:07 AM, Can Guo wrote:
>> Sigh... I also want my life and work to be easier...
> 
> How about reducing the number of states and state transitions in the 
> UFS
> driver? One source of complexity is that ufshcd_err_handler() is 
> scheduled
> independently of the SCSI error handler and hence may run concurrently
> with the SCSI error handler. Has the following already been considered?
> - Call ufshcd_err_handler() synchronously from ufshcd_abort() and
> ufshcd_eh_host_reset_handler() instead of asynchronously.

1. ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and 
flushes
it, so it is synchronous. ufshcd_eh_host_reset_handler() used to call
reset_and_restore() directly, which can run concurrently with UFS error 
handler,
so I fixed it last year [1].

2. ufshcd_abort() invokes ufshcd_err_handler() synchronously can have a
live lock issue, which is why I chose the asynchronous way (from the 
first
day I started to fix error handling). The live lock happens when abort 
happens
to a PM request, e.g., a SSU cmd sent from suspend/resume. Because UFS 
error
handler is synchronized with suspend/resume (by calling 
pm_runtime_get_sync()
and lock_system_sleep()), the sequence is like:
[1] ufshcd_wl_resume() sends SSU cmd
[2] ufshcd_abort() calls UFS error handler
[3] UFS error handler calls lock_system_sleep() and 
pm_runtime_get_sync()

In above sequence, either lock_system_sleep() or pm_runtime_get_sync() 
shall
be blocked - [3] is blocked by [1], [2] is blocked by [3], while [1] is 
blocked by [2].

For PM requests, I chose to abort them fast to unblock suspend/resume,
suspend/resume shall fail of course, but UFS error handler recovers
PM errors anyways.

> - Call scsi_schedule_eh() from ufshcd_uic_pwr_ctrl() and
> ufshcd_check_errors() instead of ufshcd_schedule_eh_work().

When ufshcd_uic_pwr_ctrl() and/or ufshcd_check_errors() report errors,
usually they are fatal errors, according to UFSHCI spec, SW should 
re-probe
UFS to recover.

However scsi_schedule_eh() does more than that - scsi_unjam_host() sends
request sense cmd and calls scsi_eh_ready_devs(), while 
scsi_eh_ready_devs()
sends test unit ready cmd and calls all the way down to 
scsi_eh_device/target/
bus/host_reset(). But we only need scsi_eh_host_reset() in this case. I 
know
you have concerns that scsi_schedule_eh() may run concurrently with UFS 
error
handler, but as I mentioned above in [1] - I've made 
ufshcd_eh_host_reset_handler()
synchronized with UFS error handler, hope that can ease your concern.

I am not saying your idea won't work, it is a good suggestion. I will 
try
it after these changes go in, because it would require extra effort and 
the
effort won't be minor - I need to consider how to remove/reduce the 
ufshcd
states along with the change and the error injection and stability test 
all
over again, which is a long way to go. As for now, at least current 
changes
works well as per my test and we really need these changes for 
Andriod12-5.10.

Thanks,

Can Guo.

> 
> These changes will guarantee that all commands have completed or timed
> out before ufshcd_err_handler() is called. I think that would allow to
> remove e.g. the following code from the error handler:
> 
> 	ufshcd_scsi_block_requests(hba);
> 	/* Drain ufshcd_queuecommand() */
> 	down_write(&hba->clk_scaling_lock);
> 	up_write(&hba->clk_scaling_lock);
> 
> Thanks,
> 
> Bart.

  reply	other threads:[~2021-06-13 14:43 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10  4:43 [PATCH v3 0/9] Complementary changes for error handling Can Guo
2021-06-10  4:43 ` [PATCH v3 1/9] scsi: ufs: Differentiate status between hba pm ops and wl pm ops Can Guo
2021-06-10 11:15   ` Adrian Hunter
2021-06-11  0:53     ` Can Guo
2021-06-11 20:40   ` Bart Van Assche
2021-06-12  6:20     ` Can Guo
2021-06-16 17:50   ` Bart Van Assche
2021-06-23  1:32     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 2/9] scsi: ufs: Update the return value of supplier " Can Guo
2021-06-10  4:43 ` [PATCH v3 3/9] scsi: ufs: Enable IRQ after enabling clocks in error handling preparation Can Guo
2021-06-10  4:43 ` [PATCH v3 4/9] scsi: ufs: Complete the cmd before returning in queuecommand Can Guo
2021-06-11 20:52   ` Bart Van Assche
2021-06-12  7:38     ` Can Guo
2021-06-12 15:50       ` Bart Van Assche
2021-06-13 13:30         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation Can Guo
2021-06-10 12:30   ` Adrian Hunter
2021-06-11  3:01     ` Can Guo
2021-06-11 20:58       ` Bart Van Assche
2021-06-12  6:46         ` Can Guo
2021-06-12  9:49           ` Can Guo
2021-06-10  4:43 ` [PATCH v3 6/9] scsi: ufs: Update ufshcd_recover_pm_error() Can Guo
2021-06-10  4:43 ` [PATCH v3 7/9] scsi: ufs: Let host_sem cover the entire system suspend/resume Can Guo
2021-06-10 13:32   ` Adrian Hunter
2021-06-11  3:06     ` Can Guo
2021-06-11 21:00   ` Bart Van Assche
2021-06-12  6:46     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests Can Guo
2021-06-11 21:02   ` Bart Van Assche
2021-06-12  7:07     ` Can Guo
2021-06-12 16:50       ` Bart Van Assche
2021-06-13 14:42         ` Can Guo [this message]
2021-06-14 18:49           ` Bart Van Assche
2021-06-15  2:36             ` Can Guo
2021-06-15  3:17               ` Can Guo
2021-06-15 18:25               ` Bart Van Assche
2021-06-16  4:00                 ` Can Guo
2021-06-16  4:40                   ` Bart Van Assche
2021-06-16  8:47                     ` Can Guo
2021-06-16 17:55                       ` Bart Van Assche
2021-06-23  1:34                         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 9/9] scsi: ufs: Apply more limitations to user access Can Guo
2021-06-11 21:03   ` Bart Van Assche
2021-06-12  7:13     ` Can Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org \
    --to=cang@codeaurora.org \
    --cc=alim.akhtar@samsung.com \
    --cc=asutoshd@codeaurora.org \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=hongwus@codeaurora.org \
    --cc=jaegeuk@kernel.org \
    --cc=jejb@linux.ibm.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nguyenb@codeaurora.org \
    --cc=stanley.chu@mediatek.com \
    --cc=ziqichen@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).