linux-kernel.vger.kernel.org archive mirror
* [PATCH] nvme: clear nvme request for nonready request
@ 2025-06-28  6:46 Yu Kuai
  2025-06-28 10:13 ` Sagi Grimberg
  2025-07-14 13:42 ` Christoph Hellwig
  0 siblings, 2 replies; 4+ messages in thread
From: Yu Kuai @ 2025-06-28  6:46 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi
  Cc: yi.zhang, linux-nvme, linux-kernel, yukuai3, yukuai1, yi.zhang,
	yangerkun, johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

It was found that the nvme-mpath IO inflight counter can be decremented
to a negative value via the following stack:

CPU: 12 UID: 0 PID: 466 Comm: kworker/12:1H Tainted: G
   6.16.0-rc3.yu+ #2 PREEMPT(voluntary)
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:bdev_end_io_acct+0x494/0x5c0
Call Trace:
 <TASK>
 nvme_end_req+0x4d/0x70 [nvme_core]
 nvme_failover_req+0x3bd/0x530 [nvme_core]
 nvme_fail_nonready_command+0x12c/0x170 [nvme_core]
 nvme_fc_queue_rq+0x463/0x720 [nvme_fc]
 blk_mq_dispatch_rq_list+0x358/0x1260
 __blk_mq_sched_dispatch_requests+0x2dd/0x480
 blk_mq_sched_dispatch_requests+0xa6/0x140
 blk_mq_run_work_fn+0x1bb/0x2a0
 process_one_work+0x8ca/0x1950
 worker_thread+0x58d/0xcf0
 kthread+0x3d5/0x7a0
 ret_from_fork+0x403/0x510
 ret_from_fork_asm+0x1a/0x30
 </TASK>

nvme_fail_nonready_command() is reached before the IO inflight counter
has been incremented, so decrementing it on this path makes the counter
negative.

This is not a problem for the blk-mq request itself, because it is
already initialized before being issued. However, the nvme request is
only initialized later, in nvme_setup_cmd(), which has not run yet at
this point. Fix the problem by clearing the nvme request in
nvme_fail_nonready_command() when it has not been prepared yet
(RQF_DONTPREP is not set).
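To make the accounting mismatch concrete, here is a minimal user-space
model in C. It assumes that the per-request driver flags still carry an
"accounted" bit left over from an earlier use of the same request,
because nothing has reset them before nvme_setup_cmd() runs. Every name
in it (model_request, MODEL_MPATH_IO_STATS, MODEL_DONTPREP,
model_simulate, ...) is hypothetical and only loosely mirrors
nvme_req()->flags and RQF_DONTPREP; it is not kernel code.

```c
#include <assert.h>  /* assert() for callers checking the model */
#include <stdbool.h>

/*
 * Hypothetical model only: these names are NOT the real kernel
 * definitions, they just mimic the roles described in the patch.
 */
#define MODEL_MPATH_IO_STATS (1u << 0) /* "this I/O was accounted" */
#define MODEL_DONTPREP       (1u << 1) /* stands in for RQF_DONTPREP */

struct model_request {
	unsigned int rq_flags;   /* blk-mq level flags, reset before issue */
	unsigned int nvme_flags; /* driver state, reset only by setup/clear */
};

static int inflight; /* models the mpath IO inflight counter */

/* Models nvme_clear_nvme_request(): reset driver state, mark prepared. */
static void model_clear_nvme_request(struct model_request *rq)
{
	rq->nvme_flags = 0;
	rq->rq_flags |= MODEL_DONTPREP;
}

/* Models the completion side: decrement only if the flag says so. */
static void model_end_req(struct model_request *rq)
{
	if (rq->nvme_flags & MODEL_MPATH_IO_STATS)
		inflight--;
}

/* Models nvme_fail_nonready_command(), optionally with the fix. */
static void model_fail_nonready(struct model_request *rq, bool apply_fix)
{
	if (apply_fix && !(rq->rq_flags & MODEL_DONTPREP))
		model_clear_nvme_request(rq);
	model_end_req(rq); /* the failover path completes the request */
}

/*
 * Fail a reused request before "setup" ran and return the final
 * counter value. Nothing was accounted for this I/O, so the correct
 * result is 0.
 */
int model_simulate(bool apply_fix)
{
	struct model_request rq = {
		.rq_flags = 0,                      /* fresh blk-mq state */
		.nvme_flags = MODEL_MPATH_IO_STATS, /* stale from earlier use */
	};

	inflight = 0;
	model_fail_nonready(&rq, apply_fix);
	return inflight;
}
```

In this model, model_simulate(false) leaves the counter at -1, while
model_simulate(true) leaves it at 0. The DONTPREP-style guard leaves a
request that has already been through setup untouched, so state that
was legitimately set during setup is not discarded.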

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs_+dauobyYyP805t33WMJVzOWj=7+51p4_j9rA63D9sog@mail.gmail.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/nvme/host/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 92697f98c601..8caafa25c010 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -764,6 +764,9 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
 	    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
+
+	if (!(rq->rq_flags & RQF_DONTPREP))
+		nvme_clear_nvme_request(rq);
 	return nvme_host_path_error(rq);
 }
 EXPORT_SYMBOL_GPL(nvme_fail_nonready_command);
-- 
2.39.2



* Re: [PATCH] nvme: clear nvme request for nonready request
  2025-06-28  6:46 [PATCH] nvme: clear nvme request for nonready request Yu Kuai
@ 2025-06-28 10:13 ` Sagi Grimberg
  2025-07-14 13:42 ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Sagi Grimberg @ 2025-06-28 10:13 UTC (permalink / raw)
  To: Yu Kuai, kbusch, axboe, hch
  Cc: yi.zhang, linux-nvme, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

First, we need to change the patch title to clarify that it fixes a bug.

i.e. something like:
nvme: fix nvme-mpath misaccounting of inflight active IO

Second, we need to add a Fixes tag (i.e. pointing to the commit that
added nvme-mpath nr_active accounting).

Third, we need a code-comment that explains this subtlety because it is 
not trivial.




* Re: [PATCH] nvme: clear nvme request for nonready request
  2025-06-28  6:46 [PATCH] nvme: clear nvme request for nonready request Yu Kuai
  2025-06-28 10:13 ` Sagi Grimberg
@ 2025-07-14 13:42 ` Christoph Hellwig
  2025-07-15  1:08   ` Yu Kuai
  1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2025-07-14 13:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: kbusch, axboe, hch, sagi, yi.zhang, linux-nvme, linux-kernel,
	yukuai3, yi.zhang, yangerkun, johnny.chenyi

Are you going to resend this with the feedback from Sagi taken into
account?



* Re: [PATCH] nvme: clear nvme request for nonready request
  2025-07-14 13:42 ` Christoph Hellwig
@ 2025-07-15  1:08   ` Yu Kuai
  0 siblings, 0 replies; 4+ messages in thread
From: Yu Kuai @ 2025-07-15  1:08 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: kbusch, axboe, sagi, yi.zhang, linux-nvme, linux-kernel, yi.zhang,
	yangerkun, johnny.chenyi, yukuai (C)

Hi,

On 2025/07/14 21:42, Christoph Hellwig wrote:
> Are you going to resend this with the feedback from Sagi taken into
> account?
> 

Sure, sorry, I completely forgot about this patch.

Thanks,
Kuai


