linux-mediatek.lists.infradead.org archive mirror
* [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue
@ 2024-06-28  7:00 peter.wang
  2024-06-28  7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: peter.wang @ 2024-06-28  7:00 UTC (permalink / raw)
  To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
	chu.stanley

From: Peter Wang <peter.wang@mediatek.com>

This series fixes a race condition that causes a kernel exception (KE) in
ufshcd_err_handler: in MCQ mode, ufshcd_abort_all can try to abort a
request that the ISR has already completed.

V3:
 - Modify ufshcd_mcq_req_to_hwq to distinguish whether the cmd has completed
 - Split into two patches and describe the race in more detail.

V2:
 - Change patch description and add Fixes/Cc tag

Peter Wang (2):
  ufs: core: fix ufshcd_clear_cmd racing issue
  ufs: core: fix ufshcd_abort_one racing issue

 drivers/ufs/core/ufs-mcq.c | 11 ++++++-----
 drivers/ufs/core/ufshcd.c  |  2 ++
 2 files changed, 8 insertions(+), 5 deletions(-)

-- 
2.18.0




* [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd racing issue
  2024-06-28  7:00 [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue peter.wang
@ 2024-06-28  7:00 ` peter.wang
  2024-06-28 19:25   ` Bart Van Assche
  2024-06-28  7:00 ` [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one " peter.wang
  2024-07-05  3:56 ` [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all " Martin K. Petersen
  2 siblings, 1 reply; 6+ messages in thread
From: peter.wang @ 2024-06-28  7:00 UTC (permalink / raw)
  To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
	chu.stanley, stable

From: Peter Wang <peter.wang@mediatek.com>

When ufshcd_clear_cmd races with the completion ISR, the ISR sets the
mq_hctx pointer of the completed request to NULL. ufshcd_clear_cmd then
calls ufshcd_mcq_req_to_hwq, which dereferences that NULL pointer and
triggers a kernel exception (KE). Return success when the request has
already been completed by the ISR, because the SQ does not need cleanup.

The racing flow is:

Thread A
ufshcd_err_handler					step 1
	ufshcd_try_to_abort_task
		ufshcd_cmd_inflight(true)		step 3
		ufshcd_clear_cmd
			...
			ufshcd_mcq_req_to_hwq
				blk_mq_unique_tag
					rq->mq_hctx->queue_num	step 5

Thread B
ufs_mtk_mcq_intr(cq complete ISR)			step 2
	scsi_done
		...
		__blk_mq_free_request
			rq->mq_hctx = NULL;		step 4
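
The lookup side of this race can be modelled in userspace. The following is
a minimal sketch, not part of the patch: struct hw_ctx, struct hw_queue,
struct request, uhq[], complete_request() and lookup_demo() are hypothetical
stand-ins for the kernel objects, and a relaxed C11 atomic load stands in
for READ_ONCE(). It only illustrates taking a single snapshot of mq_hctx
and returning NULL when the request has already been freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct hw_ctx { unsigned int queue_num; };
struct hw_queue { int id; };
struct request { _Atomic(struct hw_ctx *) mq_hctx; };

#define NR_QUEUES 4
struct hw_queue uhq[NR_QUEUES];

/*
 * Read mq_hctx exactly once and use only that snapshot, so a concurrent
 * store of NULL cannot slip in between the check and the dereference.
 */
struct hw_queue *req_to_hwq(struct request *req)
{
	struct hw_ctx *hctx =
		atomic_load_explicit(&req->mq_hctx, memory_order_relaxed);

	return hctx ? &uhq[hctx->queue_num] : NULL;
}

/* Models the completion path clearing the pointer when the request is freed. */
void complete_request(struct request *req)
{
	atomic_store_explicit(&req->mq_hctx, NULL, memory_order_relaxed);
}

/* Single-threaded walk through steps 3 to 5 of the flow above. */
int lookup_demo(void)
{
	struct hw_ctx hctx = { .queue_num = 1 };
	struct request rq;

	atomic_init(&rq.mq_hctx, &hctx);
	assert(req_to_hwq(&rq) == &uhq[1]);	/* still in flight */
	complete_request(&rq);			/* step 4 */
	assert(req_to_hwq(&rq) == NULL);	/* step 5 no longer faults */
	return 0;
}
```

The snapshot only guards the pointer read itself; it assumes the hardware
context that the pointer referred to stays valid while it is being used.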

Below is the KE backtrace:

  ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
   pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
   lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise]
   Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
   Call trace:
    dump_backtrace+0xf8/0x148
    show_stack+0x18/0x24
    dump_stack_lvl+0x60/0x7c
    dump_stack+0x18/0x3c
    mrdump_common_die+0x24c/0x398 [mrdump]
    ipanic_die+0x20/0x34 [mrdump]
    notify_die+0x80/0xd8
    die+0x94/0x2b8
    __do_kernel_fault+0x264/0x298
    do_page_fault+0xa4/0x4b8
    do_translation_fault+0x38/0x54
    do_mem_abort+0x58/0x118
    el1_abort+0x3c/0x5c
    el1h_64_sync_handler+0x54/0x90
    el1h_64_sync+0x68/0x6c
    blk_mq_unique_tag+0x8/0x14
    ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
    ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
    ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
    process_one_work+0x208/0x4fc
    worker_thread+0x228/0x438
    kthread+0x104/0x1d4
    ret_from_fork+0x10/0x20

Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort")
Cc: <stable@vger.kernel.org> 6.6.x
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufs-mcq.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 8944548c30fa..c532416aec22 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -105,16 +105,15 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_config_mac);
  * @hba: per adapter instance
  * @req: pointer to the request to be issued
  *
- * Return: the hardware queue instance on which the request would
- * be queued.
+ * Return: the hardware queue instance on which the request will be or has
+ * been queued. %NULL if the request has already been freed.
  */
 struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
 					 struct request *req)
 {
-	u32 utag = blk_mq_unique_tag(req);
-	u32 hwq = blk_mq_unique_tag_to_hwq(utag);
+	struct blk_mq_hw_ctx *hctx = READ_ONCE(req->mq_hctx);
 
-	return &hba->uhq[hwq];
+	return hctx ? &hba->uhq[hctx->queue_num] : NULL;
 }
 
 /**
@@ -515,6 +514,8 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
 		if (!cmd)
 			return -EINVAL;
 		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
+		if (!hwq)
+			return 0;
 	} else {
 		hwq = hba->dev_cmd_queue;
 	}
-- 
2.18.0




* [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one racing issue
  2024-06-28  7:00 [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue peter.wang
  2024-06-28  7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
@ 2024-06-28  7:00 ` peter.wang
  2024-06-28 19:26   ` Bart Van Assche
  2024-07-05  3:56 ` [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all " Martin K. Petersen
  2 siblings, 1 reply; 6+ messages in thread
From: peter.wang @ 2024-06-28  7:00 UTC (permalink / raw)
  To: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb
  Cc: wsd_upstream, linux-mediatek, peter.wang, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
	chu.stanley, stable

From: Peter Wang <peter.wang@mediatek.com>

When ufshcd_abort_one races with the completion ISR, the ISR sets the
mq_hctx pointer of the completed request to NULL. This is the same race
condition as in the previous patch. Return success when the request has
already been completed by the ISR, because ufshcd_abort_one does not
need to do anything.

The racing flow is:

Thread A
ufshcd_err_handler					step 1
	...
	ufshcd_abort_one
		ufshcd_try_to_abort_task
			ufshcd_cmd_inflight(true)	step 3
		ufshcd_mcq_req_to_hwq
			blk_mq_unique_tag
				rq->mq_hctx->queue_num	step 5

Thread B
ufs_mtk_mcq_intr(cq complete ISR)			step 2
	scsi_done
		...
		__blk_mq_free_request
			rq->mq_hctx = NULL;		step 4
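
The caller-side guard can be sketched the same way. This is an illustrative
userspace model under stated assumptions, not the driver code: struct
hw_queue, hwq_init(), release_if_inflight() and guard_demo() are
hypothetical names, and a C11 atomic_flag spinlock stands in for cq_lock.
The point is only that a NULL queue must be handled before the lock inside
it is touched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a hardware queue: a toy lock plus a release counter. */
struct hw_queue {
	atomic_flag cq_lock;
	int released;
};

void hwq_init(struct hw_queue *hwq)
{
	atomic_flag_clear(&hwq->cq_lock);
	hwq->released = 0;
}

/*
 * hwq is NULL when the completion path has already freed the request.
 * In that case there is nothing left to release, and hwq->cq_lock must
 * not be touched, so report success immediately.
 */
int release_if_inflight(struct hw_queue *hwq, bool inflight)
{
	if (!hwq)
		return 0;

	while (atomic_flag_test_and_set(&hwq->cq_lock))
		;	/* spin until the toy lock is free */
	if (inflight)
		hwq->released++;
	atomic_flag_clear(&hwq->cq_lock);
	return 0;
}

/* Exercises both the already-completed and the still-in-flight case. */
int guard_demo(void)
{
	struct hw_queue q;

	hwq_init(&q);
	assert(release_if_inflight(NULL, true) == 0);	/* already completed */
	assert(release_if_inflight(&q, true) == 0);	/* still in flight */
	assert(q.released == 1);
	return 0;
}
```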

Below is the KE backtrace:
  ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device.
  ufshcd_try_to_abort_task: cmd at tag=41 is cleared.
  Aborting tag 41 / CDB 0x28 succeeded
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
  pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
  lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise]
   do_mem_abort+0x58/0x118
   el1_abort+0x3c/0x5c
   el1h_64_sync_handler+0x54/0x90
   el1h_64_sync+0x68/0x6c
   blk_mq_unique_tag+0x8/0x14
   ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
   process_one_work+0x208/0x4fc
   worker_thread+0x228/0x438
   kthread+0x104/0x1d4
   ret_from_fork+0x10/0x20

Fixes: 93e6c0e19d5b ("scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode")
Cc: <stable@vger.kernel.org> 6.6.x
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e5e9da61f15d..7214417a5ddc 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6456,6 +6456,8 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
 	/* Release cmd in MCQ mode if abort succeeds */
 	if (is_mcq_enabled(hba) && (*ret == 0)) {
 		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
+		if (!hwq)
+			return 0;
 		spin_lock_irqsave(&hwq->cq_lock, flags);
 		if (ufshcd_cmd_inflight(lrbp->cmd))
 			ufshcd_release_scsi_cmd(hba, lrbp);
-- 
2.18.0




* Re: [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd racing issue
  2024-06-28  7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
@ 2024-06-28 19:25   ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2024-06-28 19:25 UTC (permalink / raw)
  To: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
	jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu, chu.stanley, stable

On 6/28/24 12:00 AM, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
> 
> When ufshcd_clear_cmd racing with complete ISR,
> the completed tag of request's mq_hctx pointer will set NULL by ISR.
> And ufshcd_clear_cmd call ufshcd_mcq_req_to_hwq will get NULL pointer KE.
> Return success when request is completed by ISR beacuse sq dosen't
> need cleanup.
> 
> The racing flow is:
> 
> Thread A
> ufshcd_err_handler					step 1
> 	ufshcd_try_to_abort_task
> 		ufshcd_cmd_inflight(true)		step 3
> 		ufshcd_clear_cmd
> 			...
> 			ufshcd_mcq_req_to_hwq
> 			blk_mq_unique_tag
> 				rq->mq_hctx->queue_num	step 5
> 
> Thread B
> ufs_mtk_mcq_intr(cq complete ISR)			step 2
> 	scsi_done
> 		...
> 		__blk_mq_free_request
> 			rq->mq_hctx = NULL;		step 4

Reviewed-by: Bart Van Assche <bvanassche@acm.org>



* Re: [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one racing issue
  2024-06-28  7:00 ` [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one " peter.wang
@ 2024-06-28 19:26   ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2024-06-28 19:26 UTC (permalink / raw)
  To: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
	jejb
  Cc: wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, powen.kao, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu, chu.stanley, stable

On 6/28/24 12:00 AM, peter.wang@mediatek.com wrote:
> When ufshcd_abort_one racing with complete ISR,
> the completed tag of request's mq_hctx pointer will set NULL by ISR.
> Same as previous patch race condition.
> Return success when request is completed by ISR beacuse ufshcd_abort_one
> dose't need do anything.

dose't -> doesn't. Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>



* Re: [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue
  2024-06-28  7:00 [PATCH v3 0/2] ufs: core: fix ufshcd_abort_all racing issue peter.wang
  2024-06-28  7:00 ` [PATCH v3 1/2] ufs: core: fix ufshcd_clear_cmd " peter.wang
  2024-06-28  7:00 ` [PATCH v3 2/2] ufs: core: fix ufshcd_abort_one " peter.wang
@ 2024-07-05  3:56 ` Martin K. Petersen
  2 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2024-07-05  3:56 UTC (permalink / raw)
  To: linux-scsi, avri.altman, alim.akhtar, jejb, peter.wang
  Cc: Martin K . Petersen, wsd_upstream, linux-mediatek, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, powen.kao,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu,
	chu.stanley

On Fri, 28 Jun 2024 15:00:28 +0800, peter.wang@mediatek.com wrote:

> This series fixes race condition KE in ufshcd_err_handler, which call
> ufshcd_abort_all abort an already completed request by ISR in MCQ mode.
> 
> V3:
>  - Modify ufshcd_mcq_req_to_hwq to distinguish cmd is completed or not
>  - Split two patches and add more race description.
> 
> [...]

Applied to 6.10/scsi-fixes, thanks!

[1/2] ufs: core: fix ufshcd_clear_cmd racing issue
      https://git.kernel.org/mkp/scsi/c/9307a998cb98
[2/2] ufs: core: fix ufshcd_abort_one racing issue
      https://git.kernel.org/mkp/scsi/c/74736103fb41

-- 
Martin K. Petersen	Oracle Linux Engineering

