From: "Asutosh Das (asd)" <asutoshd@codeaurora.org>
To: Can Guo <cang@codeaurora.org>,
	nguyenb@codeaurora.org, hongwus@codeaurora.org,
	ziqichen@codeaurora.org, rnayak@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	saravanak@google.com, salyzyn@google.com
Cc: Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Satya Tangirala <satyat@google.com>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 2/3] scsi: ufs: Fix a racing problem between ufshcd_abort and eh_work
Date: Mon, 30 Nov 2020 18:42:07 -0800
Message-ID: <0a3a545b-55f8-e358-7d62-00cde64d40aa@codeaurora.org>
In-Reply-To: <1605596660-2987-3-git-send-email-cang@codeaurora.org>

On 11/16/2020 11:04 PM, Can Guo wrote:
> In the current task abort routine, if a task abort targets the device W-LU,
> the code jumps directly to ufshcd_eh_host_reset_handler() to perform a
> full reset and restore, then returns FAIL or SUCCESS. Commands sent to the
> device W-LU are most likely the SSU cmds sent during UFS PM operations. If
> such an SSU cmd enters the task abort routine, a race occurs when
> ufshcd_eh_host_reset_handler() flushes eh_work, because err_handler is
> serialized with all PM operations.
> 
> Since the main idea of aborting a cmd to the device W-LU is to perform
> a full reset and restore, resolve the race by merely cleaning up the lrb
> taken by this cmd, queuing the eh_work and aborting the cmd. Because the
> cmd has been aborted, the PM operation that sent it simply errors out, so
> err_handler is not blocked by ongoing PM operations. As a further benefit
> of this change, err_handler can also recover from the PM error, if any.
> 
> Because such a cmd is aborted before it is actually cleared from HW, set
> the lrb->in_use flag to prevent subsequent cmds, including SCSI cmds and
> dev cmds, from taking the lrb released by this cmd. The lrb->in_use flag
> is eventually cleared in __ufshcd_transfer_req_compl(), invoked by the full
> reset and restore from err_handler.
> 
> Signed-off-by: Can Guo <cang@codeaurora.org>
> ---

Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
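
For anyone following along, the in_use checks added to the issue paths
can be summarised with a small userspace model. This is only a sketch, not
driver code: struct model_lrb, enum queue_result, model_queuecommand() and
model_exec_dev_cmd() are hypothetical names invented for illustration, and
the enum values merely stand in for set_host_byte(cmd, DID_BAD_TARGET) and
SCSI_MLQUEUE_HOST_BUSY from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for this model only. */
enum queue_result {
	QUEUE_OK,          /* slot is free, the cmd may take it */
	QUEUE_BAD_TARGET,  /* stands in for set_host_byte(cmd, DID_BAD_TARGET) */
	QUEUE_HOST_BUSY,   /* stands in for SCSI_MLQUEUE_HOST_BUSY */
};

/* Hypothetical stand-in for struct ufshcd_lrb; only the new flag is kept. */
struct model_lrb {
	bool in_use;  /* slot still reserved by an aborted cmd */
};

/* Models the check added to ufshcd_queuecommand(). */
static enum queue_result model_queuecommand(const struct model_lrb *lrbp,
					    bool pm_op_in_progress)
{
	assert(lrbp != NULL);
	if (lrbp->in_use)
		return pm_op_in_progress ? QUEUE_BAD_TARGET : QUEUE_HOST_BUSY;
	return QUEUE_OK;
}

/* Models the check added to the dev cmd and devman UPIU paths. */
static int model_exec_dev_cmd(const struct model_lrb *lrbp)
{
	assert(lrbp != NULL);
	return lrbp->in_use ? -EBUSY : 0;
}
```

In this model a reserved slot never accepts a new cmd: SCSI cmds are either
marked with an error (PM operation in progress) or reported as host busy,
and dev cmds get -EBUSY.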

>   drivers/scsi/ufs/ufshcd.c | 55 ++++++++++++++++++++++++++++++++++++-----------
>   drivers/scsi/ufs/ufshcd.h |  2 ++
>   2 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 7e764e8..cd7394e 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2539,6 +2539,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>   		(hba->clk_gating.state != CLKS_ON));
>   
>   	lrbp = &hba->lrb[tag];
> +	if (unlikely(lrbp->in_use)) {
> +		if (hba->pm_op_in_progress)
> +			set_host_byte(cmd, DID_BAD_TARGET);
> +		else
> +			err = SCSI_MLQUEUE_HOST_BUSY;
> +		ufshcd_release(hba);
> +		goto out;
> +	}
>   
>   	WARN_ON(lrbp->cmd);
>   	lrbp->cmd = cmd;
> @@ -2781,6 +2789,11 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>   
>   	init_completion(&wait);
>   	lrbp = &hba->lrb[tag];
> +	if (unlikely(lrbp->in_use)) {
> +		err = -EBUSY;
> +		goto out;
> +	}
> +
>   	WARN_ON(lrbp->cmd);
>   	err = ufshcd_compose_dev_cmd(hba, lrbp, cmd_type, tag);
>   	if (unlikely(err))
> @@ -2797,6 +2810,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>   
>   	err = ufshcd_wait_for_dev_cmd(hba, lrbp, timeout);
>   
> +out:
>   	ufshcd_add_query_upiu_trace(hba, tag,
>   			err ? "query_complete_err" : "query_complete");
>   
> @@ -4932,6 +4946,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
>   
>   	for_each_set_bit(index, &completed_reqs, hba->nutrs) {
>   		lrbp = &hba->lrb[index];
> +		lrbp->in_use = false;
>   		lrbp->compl_time_stamp = ktime_get();
>   		cmd = lrbp->cmd;
>   		if (cmd) {
> @@ -6374,8 +6389,12 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
>   
>   	init_completion(&wait);
>   	lrbp = &hba->lrb[tag];
> -	WARN_ON(lrbp->cmd);
> +	if (unlikely(lrbp->in_use)) {
> +		err = -EBUSY;
> +		goto out;
> +	}
>   
> +	WARN_ON(lrbp->cmd);
>   	lrbp->cmd = NULL;
>   	lrbp->sense_bufflen = 0;
>   	lrbp->sense_buffer = NULL;
> @@ -6447,6 +6466,7 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
>   		}
>   	}
>   
> +out:
>   	blk_put_request(req);
>   out_unlock:
>   	up_read(&hba->clk_scaling_lock);
> @@ -6696,16 +6716,6 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
>   		BUG();
>   	}
>   
> -	/*
> -	 * Task abort to the device W-LUN is illegal. When this command
> -	 * will fail, due to spec violation, scsi err handling next step
> -	 * will be to send LU reset which, again, is a spec violation.
> -	 * To avoid these unnecessary/illegal step we skip to the last error
> -	 * handling stage: reset and restore.
> -	 */
> -	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN)
> -		return ufshcd_eh_host_reset_handler(cmd);
> -
>   	ufshcd_hold(hba, false);
>   	reg = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
>   	/* If command is already aborted/completed, return SUCCESS */
> @@ -6726,7 +6736,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
>   	 * to reduce repeated printouts. For other aborted requests only print
>   	 * basic details.
>   	 */
> -	scsi_print_command(hba->lrb[tag].cmd);
> +	scsi_print_command(cmd);
>   	if (!hba->req_abort_count) {
>   		ufshcd_update_reg_hist(&hba->ufs_stats.task_abort, 0);
>   		ufshcd_print_host_regs(hba);
> @@ -6745,6 +6755,27 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
>   		goto cleanup;
>   	}
>   
> +	/*
> +	 * Task abort to the device W-LUN is illegal. When this command
> +	 * will fail, due to spec violation, scsi err handling next step
> +	 * will be to send LU reset which, again, is a spec violation.
> +	 * To avoid these unnecessary/illegal steps, first we clean up
> +	 * the lrb taken by this cmd and mark the lrb as in_use, then
> +	 * queue the eh_work and bail.
> +	 */
> +	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
> +		spin_lock_irqsave(host->host_lock, flags);
> +		if (lrbp->cmd) {
> +			__ufshcd_transfer_req_compl(hba, (1UL << tag));
> +			__set_bit(tag, &hba->outstanding_reqs);
> +			lrbp->in_use = true;
> +			hba->force_reset = true;
> +			ufshcd_schedule_eh_work(hba);
> +		}
> +		spin_unlock_irqrestore(host->host_lock, flags);
> +		goto out;
> +	}
> +
>   	/* Skip task abort in case previous aborts failed and report failure */
>   	if (lrbp->req_abort_skip)
>   		err = -EIO;
> diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
> index 1e680bf..66e5338 100644
> --- a/drivers/scsi/ufs/ufshcd.h
> +++ b/drivers/scsi/ufs/ufshcd.h
> @@ -163,6 +163,7 @@ struct ufs_pm_lvl_states {
>    * @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
>    * @data_unit_num: the data unit number for the first block for inline crypto
>    * @req_abort_skip: skip request abort task flag
> + * @in_use: indicates that this lrb is still in use
>    */
>   struct ufshcd_lrb {
>   	struct utp_transfer_req_desc *utr_descriptor_ptr;
> @@ -192,6 +193,7 @@ struct ufshcd_lrb {
>   #endif
>   
>   	bool req_abort_skip;
> +	bool in_use;
>   };
>   
>   /**
> 
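
The ordering in the new W-LUN branch of ufshcd_abort() (complete the cmd,
set the tag in outstanding_reqs again, mark the lrb in_use, then queue
eh_work) may be easier to see in a userspace model. The sketch below is an
illustration under assumptions, not the driver: all model_* names and
MODEL_NUTRS are hypothetical, locking is omitted, and the model assumes the
completion routine clears the completed bits from outstanding_reqs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUTRS 8  /* hypothetical number of transfer request slots */

struct model_lrb {
	const void *cmd;  /* non-NULL while a cmd owns the slot */
	bool in_use;      /* slot reserved until reset and restore runs */
};

struct model_hba {
	struct model_lrb lrb[MODEL_NUTRS];
	unsigned long outstanding_reqs;
	bool force_reset;
	bool eh_scheduled;  /* stands in for ufshcd_schedule_eh_work() */
};

/*
 * Models __ufshcd_transfer_req_compl(): release every slot in the mask.
 * Clearing the bits from outstanding_reqs here is an assumption of the model.
 */
static void model_transfer_req_compl(struct model_hba *hba,
				     unsigned long completed)
{
	for (int tag = 0; tag < MODEL_NUTRS; tag++) {
		if (!(completed & (1UL << tag)))
			continue;
		hba->lrb[tag].in_use = false;
		hba->lrb[tag].cmd = NULL;
	}
	hba->outstanding_reqs &= ~completed;
}

/* Models the new W-LUN branch in ufshcd_abort(). */
static void model_abort_wlun(struct model_hba *hba, int tag)
{
	assert(tag >= 0 && tag < MODEL_NUTRS);
	if (!hba->lrb[tag].cmd)
		return;
	model_transfer_req_compl(hba, 1UL << tag);
	/* Keep the tag outstanding so the later reset completes it again. */
	hba->outstanding_reqs |= 1UL << tag;
	hba->lrb[tag].in_use = true;
	hba->force_reset = true;
	hba->eh_scheduled = true;
}

/* Models the completion pass done by reset and restore in err_handler. */
static void model_eh_reset_and_restore(struct model_hba *hba)
{
	model_transfer_req_compl(hba, hba->outstanding_reqs);
	hba->force_reset = false;
	hba->eh_scheduled = false;
}
```

In the model, the slot stays reserved between model_abort_wlun() and
model_eh_reset_and_restore(), which mirrors the window the commit message
describes: the cmd is gone from the software side, but the slot cannot be
reused until the reset has run.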


-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project
