Linux kernel -stable discussions
From: "Peter Wang (王信友)" <peter.wang@mediatek.com>
To: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"avri.altman@wdc.com" <avri.altman@wdc.com>,
	"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
	"alim.akhtar@samsung.com" <alim.akhtar@samsung.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-mediatek@lists.infradead.org"
	<linux-mediatek@lists.infradead.org>,
	"Jiajie Hao (郝加节)" <jiajie.hao@mediatek.com>,
	"CC Chou (周志杰)" <cc.chou@mediatek.com>,
	"Eddie Huang (黃智傑)" <eddie.huang@mediatek.com>,
	"Alice Chao (趙珮均)" <Alice.Chao@mediatek.com>,
	"quic_nguyenb@quicinc.com" <quic_nguyenb@quicinc.com>,
	wsd_upstream <wsd_upstream@mediatek.com>,
	"Ed Tsai (蔡宗軒)" <Ed.Tsai@mediatek.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Lin Gui (桂林)" <Lin.Gui@mediatek.com>,
	"Chun-Hung Wu (巫駿宏)" <Chun-hung.Wu@mediatek.com>,
	"Tun-yu Yu (游敦聿)" <Tun-yu.Yu@mediatek.com>,
	"Chaotian Jing (井朝天)" <Chaotian.Jing@mediatek.com>,
	"Powen Kao (高伯文)" <Powen.Kao@mediatek.com>,
	"Naomi Chu (朱詠田)" <Naomi.Chu@mediatek.com>,
	"Qilin Tan (谭麒麟)" <Qilin.Tan@mediatek.com>
Subject: Re: [PATCH v4 2/2] ufs: core: requeue aborted request
Date: Fri, 20 Sep 2024 02:02:03 +0000
Message-ID: <4f9e2ac99bcb981b11dc6454165818c5de6fd4d6.camel@mediatek.com>
In-Reply-To: <beeec868-b4ac-4025-859b-35a828cd2f8e@acm.org>

On Thu, 2024-09-19 at 11:49 -0700, Bart Van Assche wrote:
> On 9/19/24 5:16 AM, Peter Wang (王信友) wrote:
> > The four case flows for abort are as follows:
> > ----------------------------------------------------------------
> > 
> > Case1: DBR ufshcd_abort
> 
> Please follow the terminology from the UFSHCI 4.0 standard and use the
> word "legacy" instead of "DBR".
> 

Hi Bart,

Okay, but the current code comments all use 'SDB mode'. 
Should we just stick with that term?


> > In this case, you can see that ufshcd_release_scsi_cmd will
> > definitely be called.
> > 
> > ufshcd_abort()
> >    ufshcd_try_to_abort_task() // It should trigger an interrupt,
> >                               // but the tensor might not
> >    get outstanding_lock
> >    clear outstanding_reqs tag
> >    ufshcd_release_scsi_cmd()
> >    release outstanding_lock
> > 
> > ufshcd_intr()
> >    ufshcd_sl_intr()
> >      ufshcd_transfer_req_compl()
> >        ufshcd_poll()
> >          get outstanding_lock
> >          clear outstanding_reqs tag
> >          release outstanding_lock
> >          __ufshcd_transfer_req_compl()
> >            ufshcd_compl_one_cqe()
> >              cmd->result = DID_REQUEUE // mediatek may need a quirk
> >                                        // to change DID_ABORT to DID_REQUEUE
> >              ufshcd_release_scsi_cmd()
> >              scsi_done()
> > 
> > In most cases, ufshcd_intr will not reach scsi_done because the
> > outstanding_reqs tag is cleared by the original thread.
> > Therefore, whether there is an interrupt or not doesn't affect
> > the result because the ISR will do nothing in most cases.
> > 
> > In rare cases, the ISR will reach scsi_done and notify
> > SCSI to requeue, and the original thread will not
> > call ufshcd_release_scsi_cmd.
> > MediaTek may need to change DID_ABORT to DID_REQUEUE in this
> > situation, or perhaps not handle this ISR at all.
> 
> Please modify ufshcd_compl_one_cqe() such that it ignores commands
> with status OCS_ABORTED. This will make the UFSHCI driver behave in
> the same way for all UFSHCI controllers, whether or not clearing a
> command triggers a completion interrupt.
> 

Yes, I am considering how to modify the code here.

> > ----------------------------------------------------------------
> > 
> > Case2: MCQ ufshcd_abort
> > 
> > In the case of MCQ ufshcd_abort, you can also see that
> > ufshcd_release_scsi_cmd will definitely be called too.
> > However, there seems to be a problem here, as
> > ufshcd_release_scsi_cmd might be called twice.
> > This is because cmd is not null in ufshcd_release_scsi_cmd,
> > whereas the previous version set cmd to null.
> > Skipping OCS: ABORTED in ufshcd_compl_one_cqe indeed
> > can avoid this problem. This part needs further
> > consideration on how to handle it.
> > 
> > ufshcd_abort()
> >    ufshcd_mcq_abort()
> >      ufshcd_try_to_abort_task()// will trigger ISR
> >      ufshcd_release_scsi_cmd()
> > 
> > ufs_mtk_mcq_intr()
> >    ufshcd_mcq_poll_cqe_lock()
> >      ufshcd_mcq_process_cqe()
> >        ufshcd_compl_one_cqe()
> >          cmd->result = DID_ABORT
> >          ufshcd_release_scsi_cmd() // will release twice
> >          scsi_done()
> 
> Do you agree that this case can be addressed with the
> ufshcd_compl_one_cqe() change proposed above?
> 

Agree.
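
To make the double release in Case 2 concrete, here is a minimal user-space sketch. It is not driver code: struct slot_model and every *_model function are hypothetical stand-ins for the driver functions named in the call chains above, and the skip_aborted flag models the proposed "ignore OCS: ABORTED" change.

```c
#include <stdbool.h>

/* Hypothetical model of one MCQ command slot; not the kernel's data structure. */
struct slot_model {
	bool owned;        /* true while a SCSI command occupies the slot */
	int release_calls; /* calls to the ufshcd_release_scsi_cmd() stand-in */
	int done_calls;    /* calls to the scsi_done() stand-in */
};

/* Stand-in for ufshcd_release_scsi_cmd(). */
static void release_model(struct slot_model *s)
{
	s->owned = false;
	s->release_calls++;
}

/* Stand-in for ufshcd_compl_one_cqe() handling a CQE with OCS: ABORTED. */
static void compl_aborted_cqe_model(struct slot_model *s, bool skip_aborted)
{
	if (skip_aborted)
		return; /* proposed change: leave the command to the abort path */
	release_model(s);
	s->done_calls++;
}

/*
 * Case 2 flow: the abort path always releases the command; the ISR path
 * runs only if the controller raises a completion interrupt for the abort.
 * Returns the slot state after both paths have run.
 */
static struct slot_model run_mcq_abort_model(bool irq_fires, bool skip_aborted)
{
	struct slot_model s = { .owned = true };

	release_model(&s); /* abort path */
	if (irq_fires)
		compl_aborted_cqe_model(&s, skip_aborted); /* ISR path */
	return s;
}
```

In this model, run_mcq_abort_model(true, false) ends with two release calls, while any run with skip_aborted set ends with exactly one, whether or not the interrupt fires.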

> > ----------------------------------------------------------------
> > 
> > Case3: DBR ufshcd_err_handler
> > 
> > In the case of the DBR mode error handler, it's the same;
> > ufshcd_release_scsi_cmd will also be executed, and scsi_done
> > will definitely be used to notify SCSI to requeue.
> > 
> > ufshcd_err_handler()
> >    ufshcd_abort_all()
> >      ufshcd_abort_one()
> >        ufshcd_try_to_abort_task() // It should trigger an interrupt,
> >                                   // but the tensor might not
> >      ufshcd_complete_requests()
> >        ufshcd_transfer_req_compl()
> >          ufshcd_poll()
> >            get outstanding_lock
> >            clear outstanding_reqs tag
> >            release outstanding_lock
> >            __ufshcd_transfer_req_compl()
> >              ufshcd_compl_one_cqe()
> >                cmd->result = DID_REQUEUE // mediatek may need a quirk
> >                                          // to change DID_ABORT to DID_REQUEUE
> >                ufshcd_release_scsi_cmd()
> >                scsi_done()
> > 
> > ufshcd_intr()
> >    ufshcd_sl_intr()
> >      ufshcd_transfer_req_compl()
> >        ufshcd_poll()
> >          get outstanding_lock
> >          clear outstanding_reqs tag
> >          release outstanding_lock
> >          __ufshcd_transfer_req_compl()
> >            ufshcd_compl_one_cqe()
> >              cmd->result = DID_REQUEUE // mediatek may need a quirk
> >                                        // to change DID_ABORT to DID_REQUEUE
> >              ufshcd_release_scsi_cmd()
> >              scsi_done()
> > 
> > At this time, the same actions are taken regardless of whether
> > there is an ISR, and with the protection of outstanding_lock,
> > only one thread will execute ufshcd_release_scsi_cmd and scsi_done.
> > ----------------------------------------------------------------
> > 
> > Case4: MCQ ufshcd_err_handler
> > 
> > It's the same with MCQ mode; there is protection from the cqe lock,
> > so only one thread will execute. What my patch 2 aims to do is to
> > change DID_ABORT to DID_REQUEUE in this situation.
> > 
> > ufshcd_err_handler()
> >    ufshcd_abort_all()
> >      ufshcd_abort_one()
> >        ufshcd_try_to_abort_task()// will trigger irq thread
> >      ufshcd_complete_requests()
> >        ufshcd_mcq_compl_pending_transfer()
> >          ufshcd_mcq_poll_cqe_lock()
> >            ufshcd_mcq_process_cqe()
> >              ufshcd_compl_one_cqe()
> >                cmd->result = DID_ABORT // should change to DID_REQUEUE
> >                ufshcd_release_scsi_cmd()
> >                scsi_done()
> > 
> > ufs_mtk_mcq_intr()
> >    ufshcd_mcq_poll_cqe_lock()
> >      ufshcd_mcq_process_cqe()
> >        ufshcd_compl_one_cqe()
> >          cmd->result = DID_ABORT  // should change to DID_REQUEUE
> >          ufshcd_release_scsi_cmd()
> >          scsi_done()
> 
> For legacy and MCQ mode, I prefer the following behavior for
> ufshcd_abort_all():
> * ufshcd_compl_one_cqe() ignores commands with status OCS_ABORTED.
> * ufshcd_release_scsi_cmd() is called either by ufshcd_abort_one() or
>    by ufshcd_abort_all().
> 
> Do you agree with making the changes proposed above?
> 
> Thank you,

This might not work, as SDB mode doesn't ignore 
OCS: INVALID_OCS_VALUE but rather notifies SCSI to requeue. 
So what we need to correct is to notify SCSI to requeue 
when MCQ mode receives OCS: ABORTED as well.
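
As a rough illustration of the mapping I mean, here is a user-space sketch. The enum names and values are illustrative stand-ins and are not copied from the kernel headers; the requeue_aborted parameter is hypothetical and only models "treat OCS: ABORTED like the SDB-mode case". The one fact relied on is that the SCSI host byte occupies bits 16..23 of the result word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; the numeric values are not taken from kernel headers. */
enum ocs_model {
	OCS_MODEL_SUCCESS,
	OCS_MODEL_ABORTED,
	OCS_MODEL_INVALID_VALUE, /* the SDB-mode status mentioned above */
	OCS_MODEL_OTHER_ERROR,
};

enum host_byte_model {
	HOST_MODEL_OK,
	HOST_MODEL_ABORT,
	HOST_MODEL_ERROR,
	HOST_MODEL_REQUEUE,
};

/* Place a host byte in bits 16..23 of the result word. */
static uint32_t host_result_model(enum host_byte_model hb)
{
	return (uint32_t)hb << 16;
}

/*
 * Map a completion status to a result word. With requeue_aborted set,
 * ABORTED yields the same requeue result as INVALID_VALUE; without it,
 * ABORTED yields the abort result.
 */
static uint32_t ocs_to_result_model(enum ocs_model ocs, bool requeue_aborted)
{
	switch (ocs) {
	case OCS_MODEL_SUCCESS:
		return host_result_model(HOST_MODEL_OK);
	case OCS_MODEL_INVALID_VALUE:
		return host_result_model(HOST_MODEL_REQUEUE);
	case OCS_MODEL_ABORTED:
		return host_result_model(requeue_aborted ? HOST_MODEL_REQUEUE
							 : HOST_MODEL_ABORT);
	default:
		return host_result_model(HOST_MODEL_ERROR);
	}
}
```

With requeue_aborted set, both modes end up asking SCSI to requeue, which is the consistency I am after.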

Furthermore, ufshcd_compl_one_cqe, whether it comes from 
ufshcd_abort_all or ISR, does the same thing and is protected 
by a lock. Therefore, there is no need for special handling 
specifically for ufshcd_abort_all.
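
The "protected by a lock" argument can be sketched in user space as a test-and-clear of the tag bit under a lock. This is only a model under stated assumptions: struct hba_model, the spinlock built on C11 atomic_flag, and all *_model functions are hypothetical stand-ins, not the driver's outstanding_lock or cqe lock implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the host state; not the kernel's struct ufs_hba. */
struct hba_model {
	atomic_flag lock;               /* stand-in for the lock in the flows above */
	unsigned long outstanding_reqs; /* one bit per in-flight tag */
	int done_calls;                 /* calls to the scsi_done() stand-in */
};

static void hba_model_init(struct hba_model *hba, unsigned long reqs)
{
	atomic_flag_clear(&hba->lock);
	hba->outstanding_reqs = reqs;
	hba->done_calls = 0;
}

/* Minimal spinlock: spin until the flag was previously clear. */
static void lock_model(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void unlock_model(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Clear the tag bit under the lock; true only for the caller that found it set. */
static bool claim_tag_model(struct hba_model *hba, unsigned int tag)
{
	bool claimed;

	if (tag >= sizeof(unsigned long) * CHAR_BIT)
		return false;
	lock_model(&hba->lock);
	claimed = (hba->outstanding_reqs & (1UL << tag)) != 0;
	hba->outstanding_reqs &= ~(1UL << tag);
	unlock_model(&hba->lock);
	return claimed;
}

/* Shared completion path: used by both the error handler and the ISR. */
static bool complete_tag_model(struct hba_model *hba, unsigned int tag)
{
	if (!claim_tag_model(hba, tag))
		return false; /* the other path already completed this tag */
	hba->done_calls++;
	return true;
}
```

Whichever caller reaches complete_tag_model() first for a given tag completes it; the second call returns false and does nothing, so the same code serves both paths.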

After discussing with you, I realized that there are indeed many 
deficiencies and inconsistencies here that need to be addressed. 
I will submit a new patch for the content discussed above.

Thanks.
Peter

>
> Bart.


Thread overview: 18+ messages
     [not found] <20240910073035.25974-1-peter.wang@mediatek.com>
2024-09-10  7:30 ` [PATCH v4 1/2] ufs: core: fix the issue of ICU failure peter.wang
2024-09-10  7:30 ` [PATCH v4 2/2] ufs: core: requeue aborted request peter.wang
2024-09-10 17:59   ` Bart Van Assche
2024-09-11  6:03     ` Peter Wang (王信友)
2024-09-11 19:11       ` Bart Van Assche
2024-09-12 13:31         ` Peter Wang (王信友)
2024-09-12 21:17           ` Bart Van Assche
2024-09-13  7:10             ` Peter Wang (王信友)
2024-09-13 17:41               ` Bart Van Assche
2024-09-18 13:29                 ` Peter Wang (王信友)
2024-09-18 18:29                   ` Bart Van Assche
2024-09-19 12:16                     ` Peter Wang (王信友)
2024-09-19 18:49                       ` Bart Van Assche
2024-09-20  2:02                         ` Peter Wang (王信友) [this message]
2024-09-20 18:39                           ` Bart Van Assche
2024-09-23  7:06                             ` Peter Wang (王信友)
2024-09-14 16:13       ` Bart Van Assche
2024-09-18 13:30         ` Peter Wang (王信友)
