From: Asutosh Das <quic_asutoshd@quicinc.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: "Martin K . Petersen" <martin.petersen@oracle.com>,
Jaegeuk Kim <jaegeuk@kernel.org>, <linux-scsi@vger.kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
Bean Huo <beanhuo@micron.com>, Avri Altman <avri.altman@wdc.com>,
Jinyoung Choi <j-young.choi@samsung.com>
Subject: Re: [PATCH v2 8/8] scsi: ufs: Fix a deadlock between PM and the SCSI error handler
Date: Tue, 27 Sep 2022 12:30:12 -0700
Message-ID: <20220927193012.GE15228@asutoshd-linux1.qualcomm.com>
In-Reply-To: <20220927184309.2223322-9-bvanassche@acm.org>

On Tue, Sep 27 2022 at 11:45 -0700, Bart Van Assche wrote:
>The following deadlock has been observed on multiple test setups:
>* ufshcd_wl_suspend() is waiting for blk_execute_rq() to complete while it
> holds host_sem.
>* ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
> latter function tries to obtain host_sem.
>This is a deadlock because blk_execute_rq() can't execute SCSI commands
>while the host is in the SHOST_RECOVERY state and because the error
>handler cannot make progress either.
>
>Fix this deadlock as follows:
>* Fail attempts to suspend the system while the SCSI error handler is in
> progress.
>* If the system is suspending and a START STOP UNIT command times out,
> handle the SCSI command timeout from inside the context of the SCSI
> timeout handler instead of activating the SCSI error handler.
>* Reduce the START STOP UNIT command timeout to one second since on
> Android devices a kernel panic is triggered if an attempt to suspend
> the system takes more than 20 seconds. One second should be enough for
> the START STOP UNIT command since this command completes in less than
> a millisecond for the UFS devices I have access to.
>
>The runtime power management code is not affected by this deadlock since
>hba->host_sem is not touched by the runtime power management functions
>in the UFS driver.
>
>Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>---
> drivers/ufs/core/ufshcd.c | 51 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 50 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
>index 5507d93a4bba..010a5d1b984b 100644
>--- a/drivers/ufs/core/ufshcd.c
>+++ b/drivers/ufs/core/ufshcd.c
>@@ -8295,6 +8295,54 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
> }
> }
>
>+static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd *scmd)
>+{
>+	struct ufs_hba *hba = shost_priv(scmd->device->host);
>+	bool reset_controller = false;
>+	int tag, ret;
>+
>+	if (!hba->system_suspending) {
>+		/* Activate the error handler in the SCSI core. */
>+		return SCSI_EH_NOT_HANDLED;
>+	}
>+
>+	/*
>+	 * Handle errors directly to prevent a deadlock between
>+	 * ufshcd_set_dev_pwr_mode() and ufshcd_err_handler().
>+	 */
>+	for_each_set_bit(tag, &hba->outstanding_reqs, hba->nutrs) {
>+		ret = ufshcd_try_to_abort_task(hba, tag);
>+		dev_info(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
>+			 hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
>+			 ret == 0 ? "succeeded" : "failed");
>+		if (ret != 0) {
>+			reset_controller = true;
>+			break;
>+		}
>+	}
>+	for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
>+		ret = ufshcd_clear_tm_cmd(hba, tag);

If reset_controller is already true at this point, the host controller
will be reset anyway, and that reset should clear up all resources. Is
clearing the task management commands here still needed in that case?
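
To make the question concrete, here is a minimal userspace sketch (not
kernel code, and not a proposed replacement for the patch). struct
fake_hba and the fake_*() helpers are hypothetical stand-ins for struct
ufs_hba, ufshcd_try_to_abort_task(), ufshcd_clear_tm_cmd() and
ufshcd_reset_and_restore(). It assumes that a reset drops all outstanding
requests and TMFs, which is the premise that would need confirming.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ufs_hba; not the real kernel structure. */
struct fake_hba {
	unsigned long outstanding_reqs;  /* bitmask of pending transfer requests */
	unsigned long outstanding_tasks; /* bitmask of pending TMFs */
	unsigned long abort_fail_mask;   /* tags whose abort is made to fail */
	int nutrs, nutmrs;               /* number of request / TMF slots */
	int tm_clear_calls;              /* how often the TMF clear stub ran */
	int resets;                      /* how often the reset stub ran */
};

/* Stub for ufshcd_try_to_abort_task(): 0 on success, -1 on failure. */
static int fake_abort_task(struct fake_hba *hba, int tag)
{
	if (hba->abort_fail_mask & (1UL << tag))
		return -1;
	hba->outstanding_reqs &= ~(1UL << tag);
	return 0;
}

/* Stub for ufshcd_clear_tm_cmd(): always succeeds in this sketch. */
static int fake_clear_tm_cmd(struct fake_hba *hba, int tag)
{
	hba->tm_clear_calls++;
	hba->outstanding_tasks &= ~(1UL << tag);
	return 0;
}

/* Stub for ufshcd_reset_and_restore(): ASSUMED to drop all pending work. */
static void fake_reset_and_restore(struct fake_hba *hba)
{
	hba->resets++;
	hba->outstanding_reqs = 0;
	hba->outstanding_tasks = 0;
}

/* Returns true if the (fake) controller had to be reset. */
static bool timeout_sketch(struct fake_hba *hba)
{
	bool reset_controller = false;
	int tag;

	assert(hba->nutrs <= 32 && hba->nutmrs <= 32);

	for (tag = 0; tag < hba->nutrs; tag++) {
		if (!(hba->outstanding_reqs & (1UL << tag)))
			continue;
		if (fake_abort_task(hba, tag) != 0) {
			reset_controller = true;
			break;
		}
	}

	/* Skip the per-TMF clearing once a reset is already pending. */
	for (tag = 0; !reset_controller && tag < hba->nutmrs; tag++) {
		if (!(hba->outstanding_tasks & (1UL << tag)))
			continue;
		if (fake_clear_tm_cmd(hba, tag) != 0)
			reset_controller = true;
	}

	if (reset_controller)
		fake_reset_and_restore(hba);

	return reset_controller;
}
```

With one abort forced to fail, the TMF stub is never called and the
pending TMFs are dropped by the reset stub alone. Whether the real
hardware and driver behave that way is the open question.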
>
>+		dev_info(hba->dev, "Aborting TMF %d %s\n", tag,
>+			 ret == 0 ? "succeeded" : "failed");
>+		if (ret != 0) {
>+			reset_controller = true;
>+			break;
>+		}
>+	}
>+	if (reset_controller) {
>+		dev_info(hba->dev, "Resetting controller\n");
>+		ufshcd_reset_and_restore(hba);
>+		if (ufshcd_clear_cmds(hba, 0xffffffff))

ufshcd_reset_and_restore() resets both the host and the device.
Is the ufshcd_clear_cmds() call still needed after that?

>+			dev_err(hba->dev,
>+				"Clearing outstanding commands failed\n");
>+	}
>+	ufshcd_complete_requests(hba);
>+	dev_info(hba->dev, "%s() finished; outstanding_tasks = %#lx.\n",
>+		 __func__, hba->outstanding_tasks);
>+
>+	return hba->outstanding_tasks ? SCSI_EH_RESET_TIMER : SCSI_EH_DONE;