* [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue
@ 2024-06-28 7:00 peter.wang
2024-06-28 7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: peter.wang @ 2024-06-28 7:00 UTC (permalink / raw)
To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
chu.stanley
From: Peter Wang <peter.wang@mediatek.com>
This series fixes a race condition that causes a kernel exception (KE) in
ufshcd_err_handler: in MCQ mode, ufshcd_abort_all can try to abort a request
that the ISR has already completed.
V3:
- Modify ufshcd_mcq_req_to_hwq to distinguish whether the cmd has completed
- Split into two patches and add a more detailed race description.
V2:
- Change patch description and add Fixes/Cc tag
Peter Wang (2):
ufs: core: fix ufshcd_clear_cmd racing issue
ufs: core: fix ufshcd_abort_one racing issue
drivers/ufs/core/ufs-mcq.c | 11 ++++++-----
drivers/ufs/core/ufshcd.c | 2 ++
2 files changed, 8 insertions(+), 5 deletions(-)
--
2.18.0
* [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd racing issue
2024-06-28 7:00 [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue peter.wang
@ 2024-06-28 7:00 ` peter.wang
2024-06-28 19:25 ` Bart Van Assche
2024-06-28 7:00 ` [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one " peter.wang
2024-07-05 3:56 ` [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all " Martin K. Petersen
2 siblings, 1 reply; 6+ messages in thread
From: peter.wang @ 2024-06-28 7:00 UTC (permalink / raw)
To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
chu.stanley, stable
From: Peter Wang <peter.wang@mediatek.com>
When ufshcd_clear_cmd races with the completion ISR, the ISR sets the
mq_hctx pointer of the completed request to NULL. ufshcd_clear_cmd then
calls ufshcd_mcq_req_to_hwq, which dereferences that NULL pointer and
triggers a KE.
Return success when the request has already been completed by the ISR,
because the SQ doesn't need cleanup.
The racing flow is:
Thread A
ufshcd_err_handler                          step 1
  ufshcd_try_to_abort_task
    ufshcd_cmd_inflight(true)               step 3
    ufshcd_clear_cmd
      ...
      ufshcd_mcq_req_to_hwq
      blk_mq_unique_tag
        rq->mq_hctx->queue_num              step 5

Thread B
ufs_mtk_mcq_intr(cq complete ISR)           step 2
  scsi_done
    ...
    __blk_mq_free_request
      rq->mq_hctx = NULL;                   step 4
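
The flow above can be modelled in userspace as a rough sketch. Everything in
it is a hypothetical stand-in (struct hw_ctx, struct request, struct hw_queue,
complete_request, req_to_hwq, sq_cleanup); none of it is the kernel's real
layout or API, and the volatile cast only approximates what READ_ONCE() does.
It is single-threaded, so it shows the ordering of steps 4 and 5 rather than
a true concurrent race:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct hw_ctx { unsigned int queue_num; };
struct request { struct hw_ctx *mq_hctx; };
struct hw_queue { int id; };

#define NR_QUEUES 4
static struct hw_queue uhq[NR_QUEUES];

/* Models step 4: the completion path frees the request and clears mq_hctx. */
static void complete_request(struct request *rq)
{
	rq->mq_hctx = NULL;
}

/*
 * Models the fixed lookup: read the pointer exactly once, then test and
 * dereference only that snapshot. The volatile cast stands in for READ_ONCE().
 */
static struct hw_queue *req_to_hwq(struct request *rq)
{
	struct hw_ctx *hctx = *(struct hw_ctx * volatile *)&rq->mq_hctx;

	return hctx ? &uhq[hctx->queue_num] : NULL;
}

/*
 * Models the caller: a NULL queue means the request already completed,
 * so there is nothing to clean up and success (0) is returned.
 */
static int sq_cleanup(struct request *rq, bool *did_cleanup)
{
	struct hw_queue *hwq = req_to_hwq(rq);

	*did_cleanup = false;
	if (!hwq)
		return 0;

	/* The real driver would stop the SQ and nullify the entry here. */
	*did_cleanup = true;
	return 0;
}
```

Calling sq_cleanup() before complete_request() performs the cleanup; calling
it afterwards returns 0 without touching any queue, which is the behaviour
this patch aims for.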
The KE backtrace is below:
ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise]
Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
Call trace:
dump_backtrace+0xf8/0x148
show_stack+0x18/0x24
dump_stack_lvl+0x60/0x7c
dump_stack+0x18/0x3c
mrdump_common_die+0x24c/0x398 [mrdump]
ipanic_die+0x20/0x34 [mrdump]
notify_die+0x80/0xd8
die+0x94/0x2b8
__do_kernel_fault+0x264/0x298
do_page_fault+0xa4/0x4b8
do_translation_fault+0x38/0x54
do_mem_abort+0x58/0x118
el1_abort+0x3c/0x5c
el1h_64_sync_handler+0x54/0x90
el1h_64_sync+0x68/0x6c
blk_mq_unique_tag+0x8/0x14
ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
process_one_work+0x208/0x4fc
worker_thread+0x228/0x438
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20
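
The fault address 0x194 in the trace above is not 0 even though the pointer
is NULL. That is consistent with a member load through a NULL base: the
faulting address is the base plus the member's offset. A small sketch of the
arithmetic follows. struct fake_hw_ctx and fault_address are hypothetical,
and the padding is chosen only so that queue_num lands at 0x194; the real
struct blk_mq_hw_ctx layout depends on the kernel version and config:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct blk_mq_hw_ctx. The padding is
 * picked so that queue_num sits at offset 0x194 for illustration only.
 */
struct fake_hw_ctx {
	unsigned char pad[0x194];
	unsigned int queue_num;
};

/* Address touched by a load of ->queue_num through the given base pointer. */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct fake_hw_ctx, queue_num);
}
```

With a NULL base, fault_address(0) evaluates to 0x194, matching the shape of
the address reported in the oops under this assumed layout.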
Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort")
Cc: <stable@vger.kernel.org> 6.6.x
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
drivers/ufs/core/ufs-mcq.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 8944548c30fa..c532416aec22 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -105,16 +105,15 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_config_mac);
* @hba: per adapter instance
* @req: pointer to the request to be issued
*
- * Return: the hardware queue instance on which the request would
- * be queued.
+ * Return: the hardware queue instance on which the request will be or has
+ * been queued. %NULL if the request has already been freed.
*/
struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
struct request *req)
{
- u32 utag = blk_mq_unique_tag(req);
- u32 hwq = blk_mq_unique_tag_to_hwq(utag);
+ struct blk_mq_hw_ctx *hctx = READ_ONCE(req->mq_hctx);
- return &hba->uhq[hwq];
+ return hctx ? &hba->uhq[hctx->queue_num] : NULL;
}
/**
@@ -515,6 +514,8 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
if (!cmd)
return -EINVAL;
hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
+ if (!hwq)
+ return 0;
} else {
hwq = hba->dev_cmd_queue;
}
--
2.18.0
* [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one racing issue
2024-06-28 7:00 [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue peter.wang
2024-06-28 7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
@ 2024-06-28 7:00 ` peter.wang
2024-06-28 19:26 ` Bart Van Assche
2024-07-05 3:56 ` [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all " Martin K. Petersen
2 siblings, 1 reply; 6+ messages in thread
From: peter.wang @ 2024-06-28 7:00 UTC (permalink / raw)
To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
chu.stanley, stable
From: Peter Wang <peter.wang@mediatek.com>
When ufshcd_abort_one races with the completion ISR, the ISR sets the
mq_hctx pointer of the completed request to NULL. This is the same race
condition as in the previous patch.
Return success when the request has already been completed by the ISR,
because ufshcd_abort_one doesn't need to do anything.
The racing flow is:
Thread A
ufshcd_err_handler                          step 1
  ...
  ufshcd_abort_one
    ufshcd_try_to_abort_task
      ufshcd_cmd_inflight(true)             step 3
    ufshcd_mcq_req_to_hwq
      blk_mq_unique_tag
        rq->mq_hctx->queue_num              step 5

Thread B
ufs_mtk_mcq_intr(cq complete ISR)           step 2
  scsi_done
    ...
    __blk_mq_free_request
      rq->mq_hctx = NULL;                   step 4
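
Step 5 dereferences mq_hctx because, as far as I understand the block layer,
blk_mq_unique_tag builds its result from the hardware queue number and the
per-queue tag: the queue number goes in the upper 16 bits and the tag in the
lower 16 bits. A userspace sketch of that packing is below. The names
unique_tag, unique_tag_to_hwq, unique_tag_to_tag and the UNIQUE_TAG_*
macros are local stand-ins for illustration, not the kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring the 16/16 split assumed above. */
#define UNIQUE_TAG_BITS 16
#define UNIQUE_TAG_MASK ((1u << UNIQUE_TAG_BITS) - 1)

/* Pack a hardware queue number and a per-queue tag into one 32-bit value. */
static uint32_t unique_tag(uint32_t queue_num, uint32_t tag)
{
	return (queue_num << UNIQUE_TAG_BITS) | (tag & UNIQUE_TAG_MASK);
}

/* Recover the hardware queue number from the packed value. */
static uint16_t unique_tag_to_hwq(uint32_t utag)
{
	return (uint16_t)(utag >> UNIQUE_TAG_BITS);
}

/* Recover the per-queue tag from the packed value. */
static uint16_t unique_tag_to_tag(uint32_t utag)
{
	return (uint16_t)(utag & UNIQUE_TAG_MASK);
}
```

Because the packed value needs the queue number, it cannot be computed
without reading through mq_hctx, which is why a freed request faults there.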
The KE backtrace is below:
ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device.
ufshcd_try_to_abort_task: cmd at tag=41 is cleared.
Aborting tag 41 / CDB 0x28 succeeded
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise]
do_mem_abort+0x58/0x118
el1_abort+0x3c/0x5c
el1h_64_sync_handler+0x54/0x90
el1h_64_sync+0x68/0x6c
blk_mq_unique_tag+0x8/0x14
ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
process_one_work+0x208/0x4fc
worker_thread+0x228/0x438
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20
Fixes: 93e6c0e19d5b ("scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode")
Cc: <stable@vger.kernel.org> 6.6.x
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
drivers/ufs/core/ufshcd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e5e9da61f15d..7214417a5ddc 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6456,6 +6456,8 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
/* Release cmd in MCQ mode if abort succeeds */
if (is_mcq_enabled(hba) && (*ret == 0)) {
hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
+ if (!hwq)
+ return 0;
spin_lock_irqsave(&hwq->cq_lock, flags);
if (ufshcd_cmd_inflight(lrbp->cmd))
ufshcd_release_scsi_cmd(hba, lrbp);
--
2.18.0
* Re: [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd racing issue
2024-06-28 7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
@ 2024-06-28 19:25 ` Bart Van Assche
0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2024-06-28 19:25 UTC (permalink / raw)
To: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
tun-yu.yu, eddie.huang, naomi.chu, chu.stanley, stable
On 6/28/24 12:00 AM, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
>
> When ufshcd_clear_cmd racing with complete ISR,
> the completed tag of request's mq_hctx pointer will set NULL by ISR.
> And ufshcd_clear_cmd call ufshcd_mcq_req_to_hwq will get NULL pointer KE.
> Return success when request is completed by ISR beacuse sq dosen't
> need cleanup.
>
> The racing flow is:
>
> Thread A
> ufshcd_err_handler step 1
> ufshcd_try_to_abort_task
> ufshcd_cmd_inflight(true) step 3
> ufshcd_clear_cmd
> ...
> ufshcd_mcq_req_to_hwq
> blk_mq_unique_tag
> rq->mq_hctx->queue_num step 5
>
> Thread B
> ufs_mtk_mcq_intr(cq complete ISR) step 2
> scsi_done
> ...
> __blk_mq_free_request
> rq->mq_hctx = NULL; step 4
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* Re: [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one racing issue
2024-06-28 7:00 ` [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one " peter.wang
@ 2024-06-28 19:26 ` Bart Van Assche
0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2024-06-28 19:26 UTC (permalink / raw)
To: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
jejb
Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
tun-yu.yu, eddie.huang, naomi.chu, chu.stanley, stable
On 6/28/24 12:00 AM, peter.wang@mediatek.com wrote:
> When ufshcd_abort_one racing with complete ISR,
> the completed tag of request's mq_hctx pointer will set NULL by ISR.
> Same as previous patch race condition.
> Return success when request is completed by ISR beacuse ufshcd_abort_one
> dose't need do anything.
dose't -> doesn't. Anyway:
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* Re: [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue
2024-06-28 7:00 [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue peter.wang
2024-06-28 7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
2024-06-28 7:00 ` [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one " peter.wang
@ 2024-07-05 3:56 ` Martin K. Petersen
2 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2024-07-05 3:56 UTC (permalink / raw)
To: linux-scsi, avri.altman, alim.akhtar, jejb, peter.wang
Cc: Martin K . Petersen, wsd_upstream, linux-mediatek, chun-hung.wu,
alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
chu.stanley
On Fri, 28 Jun 2024 15:00:28 +0800, peter.wang@mediatek.com wrote:
> This series fixes race condition KE in ufshcd_err_handler, which call
> ufshcd_abort_all abort an already completed request by ISR in MCQ mode.
>
> V3:
> - Modify ufshcd_mcq_req_to_hwq to distinguish cmd is completed or not
> - Split two patches and add more race description.
>
> [...]
Applied to 6.10/scsi-fixes, thanks!
[1/2] ufs: core: fix ufshcd_clear_cmd racing issue
https://git.kernel.org/mkp/scsi/c/9307a998cb98
[2/2] ufs: core: fix ufshcd_abort_one racing issue
https://git.kernel.org/mkp/scsi/c/74736103fb41
--
Martin K. Petersen Oracle Linux Engineering