* [PATCH v10 1/2] ufs: core: fix the issue of ICU failure
2024-10-01 9:19 [PATCH v10 0/2] fix abort defect peter.wang
@ 2024-10-01 9:19 ` peter.wang
2024-10-01 9:19 ` [PATCH v10 2/2] ufs: core: requeue aborted request peter.wang
2024-10-16 2:38 ` [PATCH v10 0/2] fix abort defect Martin K. Petersen
2 siblings, 0 replies; 13+ messages in thread
From: peter.wang @ 2024-10-01 9:19 UTC (permalink / raw)
To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu, ed.tsai,
bvanassche, quic_nguyenb, stable
From: Peter Wang <peter.wang@mediatek.com>
When setting the ICU bit without using read-modify-write,
SQRTCy will restart SQ again and receive an RTC return
error code 2 (Failure - SQ not stopped).
Additionally, the error log has been modified so that
this type of error can be observed.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
---
drivers/ufs/core/ufs-mcq.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 5891cdacd0b3..3903947dbed1 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -539,7 +539,7 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
struct scsi_cmnd *cmd = lrbp->cmd;
struct ufs_hw_queue *hwq;
void __iomem *reg, *opr_sqd_base;
- u32 nexus, id, val;
+ u32 nexus, id, val, rtc;
int err;
if (hba->quirks & UFSHCD_QUIRK_MCQ_BROKEN_RTC)
@@ -569,17 +569,18 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
opr_sqd_base = mcq_opr_base(hba, OPR_SQD, id);
writel(nexus, opr_sqd_base + REG_SQCTI);
- /* SQRTCy.ICU = 1 */
- writel(SQ_ICU, opr_sqd_base + REG_SQRTC);
+ /* Initiate Cleanup */
+ writel(readl(opr_sqd_base + REG_SQRTC) | SQ_ICU,
+ opr_sqd_base + REG_SQRTC);
/* Poll SQRTSy.CUS = 1. Return result from SQRTSy.RTC */
reg = opr_sqd_base + REG_SQRTS;
err = read_poll_timeout(readl, val, val & SQ_CUS, 20,
MCQ_POLL_US, false, reg);
- if (err)
- dev_err(hba->dev, "%s: failed. hwq=%d, tag=%d err=%ld\n",
- __func__, id, task_tag,
- FIELD_GET(SQ_ICU_ERR_CODE_MASK, readl(reg)));
+ rtc = FIELD_GET(SQ_ICU_ERR_CODE_MASK, readl(reg));
+ if (err || rtc)
+ dev_err(hba->dev, "%s: failed. hwq=%d, tag=%d err=%d RTC=%d\n",
+ __func__, id, task_tag, err, rtc);
if (ufshcd_mcq_sq_start(hba, hwq))
err = -ETIMEDOUT;
--
2.45.2
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-01 9:19 [PATCH v10 0/2] fix abort defect peter.wang
2024-10-01 9:19 ` [PATCH v10 1/2] ufs: core: fix the issue of ICU failure peter.wang
@ 2024-10-01 9:19 ` peter.wang
2024-10-01 17:13 ` Bart Van Assche
2024-10-09 17:59 ` Bart Van Assche
2024-10-16 2:38 ` [PATCH v10 0/2] fix abort defect Martin K. Petersen
2 siblings, 2 replies; 13+ messages in thread
From: peter.wang @ 2024-10-01 9:19 UTC (permalink / raw)
To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu, ed.tsai,
bvanassche, quic_nguyenb, stable
From: Peter Wang <peter.wang@mediatek.com>
After the SQ cleanup fix, the CQ will receive a response with
the corresponding tag marked as OCS: ABORTED. To align with
the behavior of Legacy SDB mode, the handling of OCS: ABORTED
has been changed to match that of OCS_INVALID_COMMAND_STATUS
(SDB), with both returning a SCSI result of DID_REQUEUE.
Furthermore, the workaround implemented before the SQ cleanup
fix can be removed.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
drivers/ufs/core/ufshcd.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 24a32e2fd75e..8e2a7889a565 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
}
break;
case OCS_ABORTED:
- result |= DID_ABORT << 16;
- break;
case OCS_INVALID_COMMAND_STATUS:
result |= DID_REQUEUE << 16;
+ dev_warn(hba->dev,
+ "OCS %s from controller for tag %d\n",
+ (ocs == OCS_ABORTED? "aborted" : "invalid"),
+ lrbp->task_tag);
break;
case OCS_INVALID_CMD_TABLE_ATTR:
case OCS_INVALID_PRDT_ATTR:
@@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
struct ufs_hba *hba = shost_priv(shost);
- struct ufshcd_lrb *lrbp = &hba->lrb[tag];
- struct ufs_hw_queue *hwq;
- unsigned long flags;
*ret = ufshcd_try_to_abort_task(hba, tag);
dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
*ret ? "failed" : "succeeded");
- /* Release cmd in MCQ mode if abort succeeds */
- if (hba->mcq_enabled && (*ret == 0)) {
- hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
- if (!hwq)
- return 0;
- spin_lock_irqsave(&hwq->cq_lock, flags);
- if (ufshcd_cmd_inflight(lrbp->cmd))
- ufshcd_release_scsi_cmd(hba, lrbp);
- spin_unlock_irqrestore(&hwq->cq_lock, flags);
- }
-
return *ret == 0;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-01 9:19 ` [PATCH v10 2/2] ufs: core: requeue aborted request peter.wang
@ 2024-10-01 17:13 ` Bart Van Assche
2024-10-02 12:42 ` Peter Wang (王信友)
2024-10-09 17:59 ` Bart Van Assche
1 sibling, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2024-10-01 17:13 UTC (permalink / raw)
To: peter.wang, linux-scsi, martin.petersen; +Cc: wsd_upstream, linux-mediatek
On 10/1/24 2:19 AM, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
>
> After the SQ cleanup fix, the CQ will receive a response with
> the corresponding tag marked as OCS: ABORTED. To align with
> the behavior of Legacy SDB mode, the handling of OCS: ABORTED
> has been changed to match that of OCS_INVALID_COMMAND_STATUS
> (SDB), with both returning a SCSI result of DID_REQUEUE.
>
> Furthermore, the workaround implemented before the SQ cleanup
> fix can be removed.
>
> Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
> Cc: stable@vger.kernel.org
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> ---
> drivers/ufs/core/ufshcd.c | 20 ++++----------------
> 1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 24a32e2fd75e..8e2a7889a565 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
> }
> break;
> case OCS_ABORTED:
> - result |= DID_ABORT << 16;
> - break;
> case OCS_INVALID_COMMAND_STATUS:
> result |= DID_REQUEUE << 16;
> + dev_warn(hba->dev,
> + "OCS %s from controller for tag %d\n",
> + (ocs == OCS_ABORTED? "aborted" : "invalid"),
> + lrbp->task_tag);
> break;
> case OCS_INVALID_CMD_TABLE_ATTR:
> case OCS_INVALID_PRDT_ATTR:
> @@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
> struct scsi_device *sdev = cmd->device;
> struct Scsi_Host *shost = sdev->host;
> struct ufs_hba *hba = shost_priv(shost);
> - struct ufshcd_lrb *lrbp = &hba->lrb[tag];
> - struct ufs_hw_queue *hwq;
> - unsigned long flags;
>
> *ret = ufshcd_try_to_abort_task(hba, tag);
> dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
> hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
> *ret ? "failed" : "succeeded");
>
> - /* Release cmd in MCQ mode if abort succeeds */
> - if (hba->mcq_enabled && (*ret == 0)) {
> - hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
> - if (!hwq)
> - return 0;
> - spin_lock_irqsave(&hwq->cq_lock, flags);
> - if (ufshcd_cmd_inflight(lrbp->cmd))
> - ufshcd_release_scsi_cmd(hba, lrbp);
> - spin_unlock_irqrestore(&hwq->cq_lock, flags);
> - }
> -
> return *ret == 0;
> }
As mentioned before, ufshcd_try_to_abort_task() cannot handle concurrent
scsi_done() calls. ufshcd_abort_one() calls ufshcd_try_to_abort_task()
without even trying to prevent that scsi_done() is called concurrently.
Since this could result in a kernel crash, I think that it is important
that this gets fixed, even if it requires modifying the SCSI core.
Bart.
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-01 17:13 ` Bart Van Assche
@ 2024-10-02 12:42 ` Peter Wang (王信友)
2024-10-03 20:02 ` Bart Van Assche
0 siblings, 1 reply; 13+ messages in thread
From: Peter Wang (王信友) @ 2024-10-02 12:42 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org, bvanassche@acm.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On Tue, 2024-10-01 at 10:13 -0700, Bart Van Assche wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> On 10/1/24 2:19 AM, peter.wang@mediatek.com wrote:
> > From: Peter Wang <peter.wang@mediatek.com>
> >
> > After the SQ cleanup fix, the CQ will receive a response with
> > the corresponding tag marked as OCS: ABORTED. To align with
> > the behavior of Legacy SDB mode, the handling of OCS: ABORTED
> > has been changed to match that of OCS_INVALID_COMMAND_STATUS
> > (SDB), with both returning a SCSI result of DID_REQUEUE.
> >
> > Furthermore, the workaround implemented before the SQ cleanup
> > fix can be removed.
> >
> > Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ
> mode")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> > ---
> > drivers/ufs/core/ufshcd.c | 20 ++++----------------
> > 1 file changed, 4 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> > index 24a32e2fd75e..8e2a7889a565 100644
> > --- a/drivers/ufs/core/ufshcd.c
> > +++ b/drivers/ufs/core/ufshcd.c
> > @@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba
> *hba, struct ufshcd_lrb *lrbp,
> > }
> > break;
> > case OCS_ABORTED:
> > -result |= DID_ABORT << 16;
> > -break;
> > case OCS_INVALID_COMMAND_STATUS:
> > result |= DID_REQUEUE << 16;
> > +dev_warn(hba->dev,
> > +"OCS %s from controller for tag %d\n",
> > +(ocs == OCS_ABORTED? "aborted" : "invalid"),
> > +lrbp->task_tag);
> > break;
> > case OCS_INVALID_CMD_TABLE_ATTR:
> > case OCS_INVALID_PRDT_ATTR:
> > @@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request
> *rq, void *priv)
> > struct scsi_device *sdev = cmd->device;
> > struct Scsi_Host *shost = sdev->host;
> > struct ufs_hba *hba = shost_priv(shost);
> > -struct ufshcd_lrb *lrbp = &hba->lrb[tag];
> > -struct ufs_hw_queue *hwq;
> > -unsigned long flags;
> >
> > *ret = ufshcd_try_to_abort_task(hba, tag);
> > dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
> > hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
> > *ret ? "failed" : "succeeded");
> >
> > -/* Release cmd in MCQ mode if abort succeeds */
> > -if (hba->mcq_enabled && (*ret == 0)) {
> > -hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
> > -if (!hwq)
> > -return 0;
> > -spin_lock_irqsave(&hwq->cq_lock, flags);
> > -if (ufshcd_cmd_inflight(lrbp->cmd))
> > -ufshcd_release_scsi_cmd(hba, lrbp);
> > -spin_unlock_irqrestore(&hwq->cq_lock, flags);
> > -}
> > -
> > return *ret == 0;
> > }
>
> As mentioned before, ufshcd_try_to_abort_task() cannot handle
> concurrent
> scsi_done() calls. ufshcd_abort_one() calls
> ufshcd_try_to_abort_task()
> without even trying to prevent that scsi_done() is called
> concurrently.
> Since this could result in a kernel crash, I think that it is
> important
> that this gets fixed, even if it requires modifying the SCSI core.
>
> Bart.
>
>
Hi Bart,
This patch merely aligns with the approach of SDB mode
and does not involve the flow of scsi_done. Besides,
I don't see any issue with concurrency between
ufshcd_abort_one() calling ufshcd_try_to_abort_task()
and scsi_done(). Can you point out the specific flow where
the problem occurs? If there is one, shouldn't SDB mode
have the same issue?
Thanks
Peter
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-02 12:42 ` Peter Wang (王信友)
@ 2024-10-03 20:02 ` Bart Van Assche
2024-10-07 7:20 ` Peter Wang (王信友)
0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2024-10-03 20:02 UTC (permalink / raw)
To: Peter Wang (王信友), linux-scsi@vger.kernel.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On 10/2/24 5:42 AM, Peter Wang (王信友) wrote:
> This patch merely aligns with the approach of SDB mode
> and does not involve the flow of scsi_done. Besides,
> I don't see any issue with concurrency between
> ufshcd_abort_one() calling ufshcd_try_to_abort_task()
> and scsi_done(). Can you point out the specific flow where
> the problem occurs? If there is one, shouldn't SDB mode
> have the same issue?
Hi Peter,
Correct, my comment applies to both legacy mode and MCQ mode. From the
section in the UFS standard about ABORT TASK: "A response of FUNCTION
COMPLETE shall indicate that the command was aborted or was not in the
task set." In other words, if a command completes just before
ufshcd_try_to_abort_task() calls ufshcd_issue_tm_cmd(), then
ufshcd_try_to_abort_task() will call ufshcd_clear_cmd() for a command
that has already completed. In legacy mode, this call will succeed.
Hence, both ufshcd_compl_one_cqe() and ufshcd_abort_all() will call
ufshcd_release(hba). This will cause hba->clk_gating.active_reqs to be
decremented twice instead of only once. Do you agree that this can
happen and also that it should be prevented that this happens?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-03 20:02 ` Bart Van Assche
@ 2024-10-07 7:20 ` Peter Wang (王信友)
2024-10-08 18:29 ` Bart Van Assche
0 siblings, 1 reply; 13+ messages in thread
From: Peter Wang (王信友) @ 2024-10-07 7:20 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org, bvanassche@acm.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On Thu, 2024-10-03 at 13:02 -0700, Bart Van Assche wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> On 10/2/24 5:42 AM, Peter Wang (王信友) wrote:
> > This patch merely aligns with the approach of SDB mode
> > and does not involve the flow of scsi_done. Besides,
> > I don't see any issue with concurrency between
> > ufshcd_abort_one() calling ufshcd_try_to_abort_task()
> > and scsi_done(). Can you point out the specific flow where
> > the problem occurs? If there is one, shouldn't SDB mode
> > have the same issue?
>
> Hi Peter,
>
> Correct, my comment applies to both legacy mode and MCQ mode. From
> the
> section in the UFS standard about ABORT TASK: "A response of FUNCTION
> COMPLETE shall indicate that the command was aborted or was not in
> the
> task set." In other words, if a command completes just before
> ufshcd_try_to_abort_task() calls ufshcd_issue_tm_cmd(), then
> ufshcd_try_to_abort_task() will call ufshcd_clear_cmd() for a command
> that has already completed. In legacy mode, this call will succeed.
>
Hi Bart,
Yes, the legacy SDB mode is protected by the outstanding_lock.
> Hence, both ufshcd_compl_one_cqe() and ufshcd_abort_all() will call
> ufshcd_release(hba). This will cause hba->clk_gating.active_reqs to
> be
> decremented twice instead of only once. Do you agree that this can
> happen and also that it should be prevented that this happens?
>
> Thanks,
>
> Bart.
Sorry, I still don't understand why both ufshcd_compl_one_cqe()
and ufshcd_abort_all() will call ufshcd_release(hba)?
Because I have already removed the ufshcd_release_scsi_cmd from
ufshcd_abort_one, so the command won't be released immediately
after ufshcd_try_to_abort_task succeeds. Instead, it will wait
until the CQ Entry comes in before releasing. And since it is
protected by the cq_lock, it should only release once, right?
Thanks.
Peter
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-07 7:20 ` Peter Wang (王信友)
@ 2024-10-08 18:29 ` Bart Van Assche
2024-10-09 2:17 ` Peter Wang (王信友)
0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2024-10-08 18:29 UTC (permalink / raw)
To: Peter Wang (王信友), linux-scsi@vger.kernel.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On 10/7/24 12:20 AM, Peter Wang (王信友) wrote:
> On Thu, 2024-10-03 at 13:02 -0700, Bart Van Assche wrote:
>>
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>> On 10/2/24 5:42 AM, Peter Wang (王信友) wrote:
>>> This patch merely aligns with the approach of SDB mode
>>> and does not involve the flow of scsi_done. Besides,
>>> I don't see any issue with concurrency between
>>> ufshcd_abort_one() calling ufshcd_try_to_abort_task()
>>> and scsi_done(). Can you point out the specific flow where
>>> the problem occurs? If there is one, shouldn't SDB mode
>>> have the same issue?
>>
>> Hi Peter,
>>
>> Correct, my comment applies to both legacy mode and MCQ mode. From
>> the
>> section in the UFS standard about ABORT TASK: "A response of FUNCTION
>> COMPLETE shall indicate that the command was aborted or was not in
>> the
>> task set." In other words, if a command completes just before
>> ufshcd_try_to_abort_task() calls ufshcd_issue_tm_cmd(), then
>> ufshcd_try_to_abort_task() will call ufshcd_clear_cmd() for a command
>> that has already completed. In legacy mode, this call will succeed.
>>
>
> Hi Bart,
>
> Yes, the legacy SDB mode is protected by the outstanding_lock.
>
>
>> Hence, both ufshcd_compl_one_cqe() and ufshcd_abort_all() will call
>> ufshcd_release(hba). This will cause hba->clk_gating.active_reqs to
>> be
>> decremented twice instead of only once. Do you agree that this can
>> happen and also that it should be prevented that this happens?
>>
>> Thanks,
>>
>> Bart.
>
> Sorry, I still don't understand why both ufshcd_compl_one_cqe()
> and ufshcd_abort_all() will call ufshcd_release(hba)?
> Because I have already removed the ufshcd_release_scsi_cmd from
> ufshcd_abort_one, so the command won't be released immediately
> after ufshcd_try_to_abort_task succeeds. Instead, it will wait
> until the CQ Entry comes in before releasing. And since it is
> protected by the cq_lock, it should only release once, right?
Hi Peter,
I think what you wrote applies to MCQ mode only. In my previous email
I clearly referred to "legacy mode" (SDB mode). Summarizing my previous
email, I think that in legacy mode it is possible that ufshcd_release()
is called twice while it only should be called once. Here are the
possible solutions I see:
* Add a function to the SCSI core for setting SCMD_STATE_COMPLETE. This
may be controversial since no other SCSI LLD needs this functionality.
* Changing the error handling approach in the UFS driver to the same
approach other SCSI LLDs use: instead of using queue_work() to
activate the error handler, call scsi_schedule_eh(). This will cause
the error handler to be activated later, namely after all pending
commands have timed out instead of aborting any pending commands
first.
* Add a variant of scsi_schedule_eh() to the SCSI core that accelerates
error handling by calling scsi_timeout() on all pending commands.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-08 18:29 ` Bart Van Assche
@ 2024-10-09 2:17 ` Peter Wang (王信友)
2024-10-09 18:06 ` Bart Van Assche
0 siblings, 1 reply; 13+ messages in thread
From: Peter Wang (王信友) @ 2024-10-09 2:17 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org, bvanassche@acm.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On Tue, 2024-10-08 at 11:29 -0700, Bart Van Assche wrote:
> Hi Peter,
>
> I think what you wrote applies to MCQ mode only. In my previous email
> I clearly referred to "legacy mode" (SDB mode). Summarizing my
> previous
> email, I think that in legacy mode it is possible that
> ufshcd_release()
> is called twice while it only should be called once. Here are the
> possible solutions I see:
> * Add a function to the SCSI core for setting SCMD_STATE_COMPLETE.
> This
> may be controversial since no other SCSI LLD needs this
> functionality.
> * Changing the error handling approach in the UFS driver to the same
> approach other SCSI LLDs use: instead of using queue_work() to
> activate the error handler, call scsi_schedule_eh(). This will
> cause
> the error handler to be activated later, namely after all pending
> commands have timed out instead of aborting any pending commands
> first.
> * Add a variant of scsi_schedule_eh() to the SCSI core that
> accelerates
> error handling by calling scsi_timeout() on all pending commands.
>
> Thanks,
>
> Bart.
>
Hi Bart,
Yes, this patch is only for MCQ mode, because only MCQ mode
receives OCS: ABORTED, right? This patch doesn't modify
any of the Legacy mode flows, does it?
Additionally, I still don't understand why you say there would
be an issue with legacy mode having duplicate ufshcd_release(hba)
calls. As I mentioned before, it is protected by the
outstanding_lock. Could you please clarify the detailed
error flow?
Furthermore, even if there is an issue with Legacy mode, it
should be addressed by a separate patch, not by this one, which is
intended to resolve the MCQ mode issue. We shouldn't mix two
different issues together, don't you agree?
Thanks
Peter
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-09 2:17 ` Peter Wang (王信友)
@ 2024-10-09 18:06 ` Bart Van Assche
2024-10-11 5:44 ` Peter Wang (王信友)
0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2024-10-09 18:06 UTC (permalink / raw)
To: Peter Wang (王信友), linux-scsi@vger.kernel.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On 10/8/24 7:17 PM, Peter Wang (王信友) wrote:
> Yes, this patch is only for MCQ mode, because only MCQ mode
> receives OCS: ABORTED, right? This patch doesn't modify
> any of the Legacy mode flows, does it?
Agreed. What I mentioned in my email is an existing bug in the legacy
flow for ufshcd_abort_all().
> Furthermore, even if there is an issue with Legacy mode, it
> should be addressed by a separate patch, not by this one, which is
> intended to resolve the MCQ mode issue. We shouldn't mix two
> different issues together, don't you agree?
Let's proceed with this patch series and let's address what I brought
up in my email separately.
With the current approach for error handling in the UFS driver, anyone
who wants to verify or modify ufshcd_try_to_abort_task() has to consider
all possible interleavings of ufshcd_try_to_abort_task() and the
completion path (ufshcd_compl_one_cqe()). That's an unnecessary burden
on UFS driver contributors. Additionally, this is error-prone. This
applies to both modes (legacy and MCQ). I know of reports of sporadic
crashes in legacy mode related to UFS error handling. I'm wondering
whether these are perhaps the result of the issue I mentioned in a
previous email. Anyway, I will look further into this myself as soon as
I have the time.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-09 18:06 ` Bart Van Assche
@ 2024-10-11 5:44 ` Peter Wang (王信友)
0 siblings, 0 replies; 13+ messages in thread
From: Peter Wang (王信友) @ 2024-10-11 5:44 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org, bvanassche@acm.org,
martin.petersen@oracle.com
Cc: wsd_upstream, linux-mediatek@lists.infradead.org
On Wed, 2024-10-09 at 11:06 -0700, Bart Van Assche wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>
> On 10/8/24 7:17 PM, Peter Wang (王信友) wrote:
> > Yes, this patch is only for MCQ mode, because only MCQ mode
> > receives OCS: ABORTED, right? This patch doesn't modify
> > any of the Legacy mode flows, does it?
>
> Agreed. What I mentioned in my email is an existing bug in the legacy
> flow for ufshcd_abort_all().
>
> > Furthermore, even if there is an issue with Legacy mode, it
> > should be addressed by a separate patch, not by this one, which is
> > intended to resolve the MCQ mode issue. We shouldn't mix two
> > different issues together, don't you agree?
>
> Let's proceed with this patch series and let's address what I brought
> up in my email separately.
>
> With the current approach for error handling in the UFS driver,
> anyone
> who wants to verify or modify ufshcd_try_to_abort_task() has to
> consider
> all possible interleavings of ufshcd_try_to_abort_task() and the
> completion path (ufshcd_compl_one_cqe()). That's an unnecessary
> burden
> on UFS driver contributors. Additionally, this is error-prone. This
> applies to both modes (legacy and MCQ). I know of reports of sporadic
> crashes in legacy mode related to UFS error handling. I'm wondering
> whether these are perhaps the result of the issue I mentioned in a
> previous email. Anyway, I will look further into this myself as soon
> as
> I have the time.
>
> Thanks,
>
> Bart.
Hi Bart,
Thank you for your review.
I currently cannot see the issue of duplicate releases in
legacy SDB mode. ufshcd_try_to_abort_task() will directly
reset if it fails. It is only in the case of success that
we need to consider the possibility of ufshcd_compl_one_cqe.
I believe the original design flow has already taken this
into account, which is why there is protection with
outstanding_lock/cq_lock. Perhaps we can wait for an actual
example to occur before making corrections. Even if there
is an issue, I think the probability should be very low,
because the flow for legacy SDB mode has been in use
for several years.
Thank you again for your review.
Thanks
Peter
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 2/2] ufs: core: requeue aborted request
2024-10-01 9:19 ` [PATCH v10 2/2] ufs: core: requeue aborted request peter.wang
2024-10-01 17:13 ` Bart Van Assche
@ 2024-10-09 17:59 ` Bart Van Assche
1 sibling, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2024-10-09 17:59 UTC (permalink / raw)
To: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
tun-yu.yu, eddie.huang, naomi.chu, ed.tsai, quic_nguyenb, stable
On 10/1/24 2:19 AM, peter.wang@mediatek.com wrote:
> After the SQ cleanup fix, the CQ will receive a response with
> the corresponding tag marked as OCS: ABORTED. To align with
> the behavior of Legacy SDB mode, the handling of OCS: ABORTED
> has been changed to match that of OCS_INVALID_COMMAND_STATUS
> (SDB), with both returning a SCSI result of DID_REQUEUE.
>
> Furthermore, the workaround implemented before the SQ cleanup
> fix can be removed.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v10 0/2] fix abort defect
2024-10-01 9:19 [PATCH v10 0/2] fix abort defect peter.wang
2024-10-01 9:19 ` [PATCH v10 1/2] ufs: core: fix the issue of ICU failure peter.wang
2024-10-01 9:19 ` [PATCH v10 2/2] ufs: core: requeue aborted request peter.wang
@ 2024-10-16 2:38 ` Martin K. Petersen
2 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2024-10-16 2:38 UTC (permalink / raw)
To: linux-scsi, avri.altman, alim.akhtar, jejb, peter.wang
Cc: Martin K . Petersen, wsd_upstream, linux-mediatek, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu, ed.tsai,
bvanassche, quic_nguyenb
On Tue, 01 Oct 2024 17:19:15 +0800, peter.wang@mediatek.com wrote:
> V10:
> - Requeue OCS: ABORTED request in MCQ mode.
>
> V9:
> - Revise the OCS content printed.
>
> V8:
> - Remove the abort variable to simplify the abort process.
> - Correct error handler successfully aborts release flow.
> - Ingore MCQ OCS: ABORTED.
>
> [...]
Applied to 6.12/scsi-fixes, thanks!
[1/2] ufs: core: fix the issue of ICU failure
https://git.kernel.org/mkp/scsi/c/bf0c6cc73f7f
[2/2] ufs: core: requeue aborted request
https://git.kernel.org/mkp/scsi/c/8fa075804cb3
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread