From: Bart Van Assche <bvanassche@acm.org>
To: "Peter Wang (王信友)" <peter.wang@mediatek.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"avri.altman@wdc.com" <avri.altman@wdc.com>,
"quic_nguyenb@quicinc.com" <quic_nguyenb@quicinc.com>,
"alim.akhtar@samsung.com" <alim.akhtar@samsung.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"jejb@linux.ibm.com" <jejb@linux.ibm.com>
Cc: "linux-mediatek@lists.infradead.org"
<linux-mediatek@lists.infradead.org>,
"Jiajie Hao (郝加节)" <jiajie.hao@mediatek.com>,
"CC Chou (周志杰)" <cc.chou@mediatek.com>,
"Eddie Huang (黃智傑)" <eddie.huang@mediatek.com>,
"Alice Chao (趙珮均)" <Alice.Chao@mediatek.com>,
wsd_upstream <wsd_upstream@mediatek.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"Lin Gui (桂林)" <Lin.Gui@mediatek.com>,
"Chun-Hung Wu (巫駿宏)" <Chun-hung.Wu@mediatek.com>,
"Tun-yu Yu (游敦聿)" <Tun-yu.Yu@mediatek.com>,
"chu.stanley@gmail.com" <chu.stanley@gmail.com>,
"Chaotian Jing (井朝天)" <Chaotian.Jing@mediatek.com>,
"Powen Kao (高伯文)" <Powen.Kao@mediatek.com>,
"Naomi Chu (朱詠田)" <Naomi.Chu@mediatek.com>,
"Qilin Tan (谭麒麟)" <Qilin.Tan@mediatek.com>
Subject: Re: [PATCH v2] ufs: core: fix ufshcd_abort_all racing issue
Date: Tue, 25 Jun 2024 09:42:33 -0700
Message-ID: <795a89bb-12eb-4ac8-93df-6ec5173fb679@acm.org>
In-Reply-To: <4c4d10aae216e0b6925445b0317e55a3dd0ce629.camel@mediatek.com>
On 6/25/24 1:29 AM, Peter Wang (王信友) wrote:
> On Mon, 2024-06-24 at 11:01 -0700, Bart Van Assche wrote:
>> On 6/24/24 5:11 AM, peter.wang@mediatek.com wrote:
>>> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
>>> index 8944548c30fa..3b2e5bcb08a7 100644
>>> --- a/drivers/ufs/core/ufs-mcq.c
>>> +++ b/drivers/ufs/core/ufs-mcq.c
>>> @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
>>>  		return -ETIMEDOUT;
>>>
>>>  	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
>>> -		if (!cmd)
>>> -			return -EINVAL;
>>> +		/* Should return 0 if cmd is already complete by irq */
>>> +		if (!cmd || !ufshcd_cmd_inflight(cmd))
>>> +			return 0;
>>>  		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
>>>  	} else {
>>>  		hwq = hba->dev_cmd_queue;
>>
>> Does the call trace show that blk_mq_unique_tag() tries to
>> dereference
>> address 0x194? If so, how is this possible? There are
>> only two lrbp->cmd assignments in the UFS driver. These assignments
>> either assign a valid SCSI command pointer or NULL. Even after a SCSI
>> command has been completed, the SCSI command pointer remains valid.
>> So
>> how can an invalid pointer be passed to blk_mq_unique_tag()? Please
>> root-cause this issue instead of posting a code change that reduces a
>> race window without closing the race window completely.
>
> blk_mq_unique_tag() tries to dereference address 0x194, and the pointer
> is NULL. Because the ISR ends this I/O via scsi_done(), the request is
> freed and mq_hctx is set to NULL.
> The call path is
> scsi_done -> scsi_done_internal -> blk_mq_complete_request ->
> scsi_complete ->
> scsi_finish_command -> scsi_io_completion -> scsi_end_request ->
> __blk_mq_end_request ->
> blk_mq_free_request -> __blk_mq_free_request
>
> blk_mq_unique_tag() then accesses mq_hctx and hits the NULL pointer
> error. Please refer to:
> https://elixir.bootlin.com/linux/latest/source/block/blk-mq.c#L713
> https://elixir.bootlin.com/linux/latest/source/block/blk-mq-tag.c#L680
>
> So the root cause is very simple: the request is freed, and then hwq is
> looked up. This patch only looks up hwq if the request has not been
> freed (is still in flight).
> Though it still has a race window, it is better than doing nothing,
> right?
> Or maybe we could take all the cq_lock locks before getting hwq to close
> the race window. But the code may be ugly. What do you think?
Please include a full root cause analysis when reposting fixes for the
reported crashes. It is not clear to me how it is possible that an
invalid pointer is passed to blk_mq_unique_tag() (0x194). As I mentioned
in my previous email, freeing a request does not modify the request
pointer and does not modify the SCSI command pointer either. As one can
derive from the blk_mq_alloc_rqs() call stack, memory for struct request
and struct scsi_cmnd is allocated at request queue allocation time and
is not freed until the request queue is freed. Hence, for a given tag,
neither the request pointer nor the SCSI command pointer changes as long
as a request queue exists. Hence my request for an explanation how it is
possible that an invalid pointer was passed to blk_mq_unique_tag().
Thanks,
Bart.
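
The two positions in this exchange can be illustrated with a small
userspace sketch. It is not kernel code: the sketch_* names and both
struct layouts are hypothetical, and placing queue_num at offset 0x194 is
an assumption chosen only to match the reported address. The sketch models
a request pool that is allocated once and indexed by tag, so the request
pointer for a tag never changes, while freeing a request clears its
mq_hctx field as described in the quoted call path. A later read of
mq_hctx->queue_num would then target NULL plus the field offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins. These are NOT the kernel layouts. */
struct sketch_hctx {
	char pad[0x194];		/* assumption: puts queue_num at 0x194 */
	unsigned int queue_num;
};

struct sketch_request {
	struct sketch_hctx *mq_hctx;	/* cleared when the request is freed */
	int tag;
};

#define SKETCH_NR_TAGS 4

/* Allocated once, like the per-tag requests of a request queue. */
static struct sketch_hctx sketch_hw_queue;
static struct sketch_request sketch_pool[SKETCH_NR_TAGS];

/* The request pointer for a tag is fixed for the lifetime of the pool. */
static struct sketch_request *sketch_tag_to_rq(int tag)
{
	return &sketch_pool[tag];
}

/* Starting a request binds it to a hardware queue. */
static void sketch_start(int tag)
{
	sketch_pool[tag].mq_hctx = &sketch_hw_queue;
	sketch_pool[tag].tag = tag;
}

/* Freeing a request clears mq_hctx but leaves the request memory alone. */
static void sketch_free(int tag)
{
	sketch_pool[tag].mq_hctx = NULL;
}

/*
 * Address that a read of rq->mq_hctx->queue_num would touch. It is
 * computed arithmetically so that nothing is actually dereferenced.
 */
static uintptr_t sketch_queue_num_addr(const struct sketch_request *rq)
{
	return (uintptr_t)rq->mq_hctx + offsetof(struct sketch_hctx, queue_num);
}
```

With these definitions, sketch_tag_to_rq(2) returns the same pointer
before and after sketch_free(2), yet sketch_queue_num_addr() evaluates to
0x194 once the request has been freed (on platforms where a null pointer
converts to 0). Under the stated assumptions, a valid request pointer and
a faulting address of 0x194 are therefore not contradictory.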