* [PATCH] nvme: do not try to resubmit a canceled IO
@ 2023-05-11 10:20 Chunguang Xu
From: Chunguang Xu @ 2023-05-11 10:20 UTC (permalink / raw)
To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel
From: "chunguang.xu" <chunguang.xu@shopee.com>
Currently, if the NVMe over RDMA or NVMe over TCP transport detects that the
controller is INACTIVE on the I/O timeout path, it calls
nvmf_complete_timed_out_request() to terminate the I/O. However,
nvme_complete_rq() will still try to retry the I/O. Because the
request_queue is quiescing at that point, if the target cannot be
connected or the host actively executes a disconnect, the I/O hangs
in the hctx dispatch queue and can never be processed, resulting in
a hung task. The call trace is as follows:
[ 1575.570245] INFO: task kworker/u129:6:758 blocked for more than 966 seconds.
[ 1575.577829] Tainted: G OE 5.4.0-77-shopee-generic #86+5
[ 1575.585323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1575.593670] kworker/u129:6 D 0 758 2 0x80004000
[ 1575.593681] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 1575.593683] Call Trace:
[ 1575.593689] __schedule+0x2ee/0x750
[ 1575.593691] schedule+0x42/0xb0
[ 1575.593693] io_schedule+0x16/0x40
[ 1575.593696] do_read_cache_page+0x438/0x840
[ 1575.593698] ? __switch_to_asm+0x40/0x70
[ 1575.593700] ? file_fdatawait_range+0x30/0x30
[ 1575.593702] read_cache_page+0x12/0x20
[ 1575.593704] read_dev_sector+0x27/0xd0
[ 1575.593705] read_lba+0xbd/0x220
[ 1575.593707] ? kmem_cache_alloc_trace+0x1b0/0x240
[ 1575.593708] efi_partition+0x1e0/0x700
[ 1575.593710] ? vsnprintf+0x39e/0x4e0
[ 1575.593712] ? snprintf+0x49/0x60
[ 1575.593714] check_partition+0x154/0x250
[ 1575.593715] rescan_partitions+0xae/0x280
[ 1575.593718] bdev_disk_changed+0x5f/0x70
[ 1575.593719] __blkdev_get+0x3e3/0x550
[ 1575.593720] blkdev_get+0x3d/0x150
[ 1575.593722] __device_add_disk+0x329/0x480
[ 1575.593723] device_add_disk+0x13/0x20
[ 1575.593727] nvme_mpath_set_live+0x125/0x130 [nvme_core]
[ 1575.593731] nvme_mpath_add_disk+0x11e/0x130 [nvme_core]
[ 1575.593734] nvme_validate_ns+0x6a8/0x9d0 [nvme_core]
[ 1575.593736] ? __switch_to_asm+0x40/0x70
[ 1575.593739] nvme_scan_work+0x1e0/0x350 [nvme_core]
[ 1575.593743] process_one_work+0x1eb/0x3b0
[ 1575.593745] worker_thread+0x4d/0x400
[ 1575.593747] kthread+0x104/0x140
[ 1575.593748] ? process_one_work+0x3b0/0x3b0
[ 1575.593750] ? kthread_park+0x90/0x90
[ 1575.593751] ret_from_fork+0x1f/0x40
This issue does not appear to be fixed in the latest kernel, so try to fix it here.
Signed-off-by: chunguang.xu <chunguang.xu@shopee.com>
---
drivers/nvme/host/core.c | 3 +++
drivers/nvme/host/fabrics.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ccb6eb1282f8..bf9273081595 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -349,6 +349,9 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
if (likely(nvme_req(req)->status == 0))
return COMPLETE;
+ if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
+ return COMPLETE;
+
if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
return AUTHENTICATE;
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index dcac3df8a5f7..40a5d6659af0 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -199,6 +199,7 @@ static inline void nvmf_complete_timed_out_request(struct request *rq)
{
if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
+ nvme_req(rq)->flags |= NVME_REQ_CANCELLED;
blk_mq_complete_request(rq);
}
}
--
2.25.1
* Re: [PATCH] nvme: do not try to resubmit a canceled IO
@ 2023-05-13 16:15 ` brookxu
From: brookxu @ 2023-05-13 16:15 UTC (permalink / raw)
To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel
Hi all:
Sorry to make noise, please ignore this patch.
Thanks.
On 2023/5/11 18:20, Chunguang Xu wrote:
> [original patch quoted in full; snipped]