From: "Peter Wang (王信友)" <peter.wang@mediatek.com>
To: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"bvanassche@acm.org" <bvanassche@acm.org>,
"avri.altman@wdc.com" <avri.altman@wdc.com>,
"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
"alim.akhtar@samsung.com" <alim.akhtar@samsung.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-mediatek@lists.infradead.org"
<linux-mediatek@lists.infradead.org>,
"Jiajie Hao (郝加节)" <jiajie.hao@mediatek.com>,
"CC Chou (周志杰)" <cc.chou@mediatek.com>,
"Eddie Huang (黃智傑)" <eddie.huang@mediatek.com>,
"Alice Chao (趙珮均)" <Alice.Chao@mediatek.com>,
"quic_nguyenb@quicinc.com" <quic_nguyenb@quicinc.com>,
wsd_upstream <wsd_upstream@mediatek.com>,
"Ed Tsai (蔡宗軒)" <Ed.Tsai@mediatek.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"Lin Gui (桂林)" <Lin.Gui@mediatek.com>,
"Chun-Hung Wu (巫駿宏)" <Chun-hung.Wu@mediatek.com>,
"Tun-yu Yu (游敦聿)" <Tun-yu.Yu@mediatek.com>,
"Chaotian Jing (井朝天)" <Chaotian.Jing@mediatek.com>,
"Powen Kao (高伯文)" <Powen.Kao@mediatek.com>,
"Naomi Chu (朱詠田)" <Naomi.Chu@mediatek.com>,
"Qilin Tan (谭麒麟)" <Qilin.Tan@mediatek.com>
Subject: Re: [PATCH v4 2/2] ufs: core: requeue aborted request
Date: Thu, 19 Sep 2024 12:16:16 +0000 [thread overview]
Message-ID: <f350a1dee5a03347b5e88b9d7249223ce7b72c08.camel@mediatek.com> (raw)
In-Reply-To: <78c7fc74-81c2-40e4-b050-1d65dec96d0a@acm.org>
On Wed, 2024-09-18 at 11:29 -0700, Bart Van Assche wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> On 9/18/24 6:29 AM, Peter Wang (王信友) wrote:
> > Basically, this patch currently only needs to handle requeueing
> > for the error handler abort.
> > The approach for DBR mode and MCQ mode should be consistent.
> > If receive an interrupt response (OCS:ABORTED or
> INVALID_OCS_VALUE),
> > then set DID_REQUEUE. If there is no interrupt, it will also set
> > SCSI DID_REQUEUE in ufshcd_err_handler through
> > ufshcd_complete_requests
> > with force_compl = true.
>
> Reporting a completion for commands cleared by writing into the
> legacy
> UTRLCLR register is not compliant with any version of the UFSHCI
> standard. Reporting a completion for commands cleared by writing into
> that register is problematic because it causes
> ufshcd_release_scsi_cmd()
> to be called as follows:
>
> ufshcd_sl_intr()
> ufshcd_transfer_req_compl()
> ufshcd_poll()
> __ufshcd_transfer_req_compl()
> ufshcd_compl_one_cqe()
> cmd->result = ...
> ufshcd_release_scsi_cmd()
> scsi_done()
>
> Calling ufshcd_release_scsi_cmd() if a command has been cleared is
> problematic because the SCSI core does not expect this. If
> ufshcd_try_to_abort_task() clears a SCSI command,
> ufshcd_release_scsi_cmd() must not be called until the SCSI core
> decides to release the command. This is why I wrote in a previous
> mail
> that I think that a quirk should be introduced to suppress the
> completions generated by clearing a SCSI command.
>
Hi Bart,
I'm not sure if I'm misunderstanding your point, but I feel that
ufshcd_release_scsi_cmd should always be called. It's scsi_done
that shouldn't be called, as it should be left to the SCSI layer
to decide how to handle this command.
Because ufshcd_release_scsi_cmd is just about releasing resources
related to ufshcd_map_sg and the clock at the UFS driver level.
scsi_done is what notifies the SCSI layer that the cmd has finished,
asking it to look at the result to decide how to proceed.
> > The more problematic part is with MCQ mode. To imitate the DBR
> > approach, we just need to set DID_REQUEUE upon receiving an
> interrupt.
> > Everything else remains the same. This would make things simpler.
> >
> > Moving forward, if we want to simplify things and we have also
> > taken stock of the two or three scenarios where OCS: ABORTED
> occurs,
> > do we even need a flag? Couldn't we just set DID_REQUEUE directly
> > for OCS: ABORTED?
> > What do you think?
>
> How about making ufshcd_compl_one_cqe() skip entries with status
> OCS_ABORTED? That would make ufshcd_compl_one_cqe() behave as the
> SCSI core expects, namely not freeing any command resources if a
> SCSI command is aborted successfully.
>
> This approach may require further changes to ufshcd_abort_all().
> In that function there are separate code paths for legacy and MCQ
> mode. This is less than ideal. Would it be possible to combine
> these code paths by removing the ufshcd_complete_requests() call
> from ufshcd_abort_all() and by handling completions from inside
> ufshcd_abort_one()?
>
> Thanks,
>
> Bart.
The four case flows for abort are as follows:
----------------------------------------------------------------
Case1: DBR ufshcd_abort
In this case, you can see that ufshcd_release_scsi_cmd will
definitely be called.
ufshcd_abort()
ufshcd_try_to_abort_task() // It should trigger an
interrupt, but the tensor might not
get outstanding_lock
clear outstanding_reqs tag
ufshcd_release_scsi_cmd()
release outstanding_lock
ufshcd_intr()
ufshcd_sl_intr()
ufshcd_transfer_req_compl()
ufshcd_poll()
get outstanding_lock
clear outstanding_reqs tag
release outstanding_lock
__ufshcd_transfer_req_compl()
ufshcd_compl_one_cqe()
cmd->result = DID_REQUEUE // mediatek may need quirk
change DID_ABORT to DID_REQUEUE
ufshcd_release_scsi_cmd()
scsi_done();
In most cases, ufshcd_intr will not reach scsi_done because the
outstanding_reqs tag is cleared by the original thread.
Therefore, whether there is an interrupt or not doesn't affect
the result because the ISR will do nothing in most cases.
In a very low chance, the ISR will reach scsi_done and notify
SCSI to requeue, and the original thread will not
call ufshcd_release_scsi_cmd.
MediaTek may need to change DID_ABORT to DID_REQUEUE in this
situation, or perhaps not handle this ISR at all.
----------------------------------------------------------------
Case2: MCQ ufshcd_abort
In the case of MCQ ufshcd_abort, you can also see that
ufshcd_release_scsi_cmd will definitely be called too.
However, there seems to be a problem here, as
ufshcd_release_scsi_cmd might be called twice.
This is because cmd is not null in ufshcd_release_scsi_cmd,
which the previous version would set cmd to null.
Skipping OCS: ABORTED in ufshcd_compl_one_cqe indeed
can avoid this problem. This part needs further
consideration on how to handle it.
ufshcd_abort()
ufshcd_mcq_abort()
ufshcd_try_to_abort_task() // will trigger ISR
ufshcd_release_scsi_cmd()
ufs_mtk_mcq_intr()
ufshcd_mcq_poll_cqe_lock()
ufshcd_mcq_process_cqe()
ufshcd_compl_one_cqe()
cmd->result = DID_ABORT
ufshcd_release_scsi_cmd() // will release twice
scsi_done()
----------------------------------------------------------------
Case3: DBR ufshcd_err_handler
In the case of the DBR mode error handler, it's the same;
ufshcd_release_scsi_cmd will also be executed, and scsi_done
will definitely be used to notify SCSI to requeue.
ufshcd_err_handler()
ufshcd_abort_all()
ufshcd_abort_one()
ufshcd_try_to_abort_task() // It should trigger an
interrupt, but the tensor might not
ufshcd_complete_requests()
ufshcd_transfer_req_compl()
ufshcd_poll()
get outstanding_lock
clear outstanding_reqs tag
release outstanding_lock
__ufshcd_transfer_req_compl()
ufshcd_compl_one_cqe()
cmd->result = DID_REQUEUE // mediatek may need quirk
change DID_ABORT to DID_REQUEUE
ufshcd_release_scsi_cmd()
scsi_done()
ufshcd_intr()
ufshcd_sl_intr()
ufshcd_transfer_req_compl()
ufshcd_poll()
get outstanding_lock
clear outstanding_reqs tag
release outstanding_lock
__ufshcd_transfer_req_compl()
ufshcd_compl_one_cqe()
cmd->result = DID_REQUEUE // mediatek may need quirk change
DID_ABORT to DID_REQUEUE
ufshcd_release_scsi_cmd()
scsi_done();
At this time, the same actions are taken regardless of whether
there is an ISR, and with the protection of outstanding_lock,
only one thread will execute ufshcd_release_scsi_cmd and scsi_done.
----------------------------------------------------------------
Case4: MCQ ufshcd_err_handler
It's the same with MCQ mode; there is protection from the cqe lock,
so only one thread will execute. What my patch 2 aims to do is to
change DID_ABORT to DID_REQUEUE in this situation.
ufshcd_err_handler()
ufshcd_abort_all()
ufshcd_abort_one()
ufshcd_try_to_abort_task() // will trigger irq thread
ufshcd_complete_requests()
ufshcd_mcq_compl_pending_transfer()
ufshcd_mcq_poll_cqe_lock()
ufshcd_mcq_process_cqe()
ufshcd_compl_one_cqe()
cmd->result = DID_ABORT // should change to DID_REQUEUE
ufshcd_release_scsi_cmd()
scsi_done()
ufs_mtk_mcq_intr()
ufshcd_mcq_poll_cqe_lock()
ufshcd_mcq_process_cqe()
ufshcd_compl_one_cqe()
cmd->result = DID_ABORT // should change to DID_REQUEUE
ufshcd_release_scsi_cmd()
scsi_done()
Thanks
Peter
next prev parent reply other threads:[~2024-09-19 12:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240910073035.25974-1-peter.wang@mediatek.com>
2024-09-10 7:30 ` [PATCH v4 1/2] ufs: core: fix the issue of ICU failure peter.wang
2024-09-10 7:30 ` [PATCH v4 2/2] ufs: core: requeue aborted request peter.wang
2024-09-10 17:59 ` Bart Van Assche
2024-09-11 6:03 ` Peter Wang (王信友)
2024-09-11 19:11 ` Bart Van Assche
2024-09-12 13:31 ` Peter Wang (王信友)
2024-09-12 21:17 ` Bart Van Assche
2024-09-13 7:10 ` Peter Wang (王信友)
2024-09-13 17:41 ` Bart Van Assche
2024-09-18 13:29 ` Peter Wang (王信友)
2024-09-18 18:29 ` Bart Van Assche
2024-09-19 12:16 ` Peter Wang (王信友) [this message]
2024-09-19 18:49 ` Bart Van Assche
2024-09-20 2:02 ` Peter Wang (王信友)
2024-09-20 18:39 ` Bart Van Assche
2024-09-23 7:06 ` Peter Wang (王信友)
2024-09-14 16:13 ` Bart Van Assche
2024-09-18 13:30 ` Peter Wang (王信友)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f350a1dee5a03347b5e88b9d7249223ce7b72c08.camel@mediatek.com \
--to=peter.wang@mediatek.com \
--cc=Alice.Chao@mediatek.com \
--cc=Chaotian.Jing@mediatek.com \
--cc=Chun-hung.Wu@mediatek.com \
--cc=Ed.Tsai@mediatek.com \
--cc=Lin.Gui@mediatek.com \
--cc=Naomi.Chu@mediatek.com \
--cc=Powen.Kao@mediatek.com \
--cc=Qilin.Tan@mediatek.com \
--cc=Tun-yu.Yu@mediatek.com \
--cc=alim.akhtar@samsung.com \
--cc=avri.altman@wdc.com \
--cc=bvanassche@acm.org \
--cc=cc.chou@mediatek.com \
--cc=eddie.huang@mediatek.com \
--cc=jejb@linux.ibm.com \
--cc=jiajie.hao@mediatek.com \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=quic_nguyenb@quicinc.com \
--cc=stable@vger.kernel.org \
--cc=wsd_upstream@mediatek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox