* [PATCH 0/3] misc nvme related fixes
@ 2025-01-28 16:34 Daniel Wagner
  2025-01-28 16:34 ` [PATCH 1/3] nvme-tcp: rate limit error message in send path Daniel Wagner
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Daniel Wagner @ 2025-01-28 16:34 UTC
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block, Daniel Wagner

While working on a patchset for TP4129 (KATO corrections and
clarifications) I ran into a bunch of small issues:

- nvme-tcp was spamming the kernel log, so I think it makes sense to
rate limit those messages. I observed this with my changes, so it might
not happen that often in mainline right now, but rate limiting them
still seems worthwhile.

- nvme-fc should use the nvme state accessor to ensure it always sees
the current state (a sketch of the accessor follows below).

- blk_mq_tagset_wait_completed_request was hanging on the ctrl deletion
path, and after a bit of head scratching I figured out that
blk_mq_tagset_wait_completed_request does not do what it claims. After
this fix, the shutdown path works fine, and I think this could go in
without the other pending stuff I am working on. As the only user of
this function is the nvme subsystem, I included it in this mini series.
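
For reference, the state accessor mentioned above is just a READ_ONCE
wrapper (from drivers/nvme/host/nvme.h, quoted from memory, so double
check against the tree):

	static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
	{
		return READ_ONCE(ctrl->state);
	}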

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
Daniel Wagner (3):
      nvme-tcp: rate limit error message in send path
      nvme-fc: use ctrl state getter
      blk-mq: fix wait condition for tagset wait completed check

 block/blk-mq-tag.c      | 6 +++---
 drivers/nvme/host/fc.c  | 9 ++++++---
 drivers/nvme/host/tcp.c | 4 ++--
 3 files changed, 11 insertions(+), 8 deletions(-)
---
base-commit: fd6102e646a888931ad6ab21843c6ee4355e7525
change-id: 20250128-nvme-misc-fixes-07e8dad616cd

Best regards,
-- 
Daniel Wagner <wagi@kernel.org>



* [PATCH 1/3] nvme-tcp: rate limit error message in send path
  2025-01-28 16:34 [PATCH 0/3] misc nvme related fixes Daniel Wagner
@ 2025-01-28 16:34 ` Daniel Wagner
  2025-01-29  6:05   ` Christoph Hellwig
  2025-01-31  8:09   ` Sagi Grimberg
  2025-01-28 16:34 ` [PATCH 2/3] nvme-fc: use ctrl state getter Daniel Wagner
  2025-01-28 16:34 ` [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check Daniel Wagner
  2 siblings, 2 replies; 17+ messages in thread
From: Daniel Wagner @ 2025-01-28 16:34 UTC
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block, Daniel Wagner

If a lot of requests are in the queue, this message spams the logs,
thus rate limit it.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 drivers/nvme/host/tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index dc5bbca58c6dcbce40cfa3de893592d768ebc939..1ed0bc10b2dffe534536b1073abc0302056aa51e 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1257,8 +1257,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	if (ret == -EAGAIN) {
 		ret = 0;
 	} else if (ret < 0) {
-		dev_err(queue->ctrl->ctrl.device,
-			"failed to send request %d\n", ret);
+		dev_err_ratelimited(queue->ctrl->ctrl.device,
+				    "failed to send request %d\n", ret);
 		nvme_tcp_fail_request(queue->request);
 		nvme_tcp_done_send_req(queue);
 	}

-- 
2.48.1



* [PATCH 2/3] nvme-fc: use ctrl state getter
  2025-01-28 16:34 [PATCH 0/3] misc nvme related fixes Daniel Wagner
  2025-01-28 16:34 ` [PATCH 1/3] nvme-tcp: rate limit error message in send path Daniel Wagner
@ 2025-01-28 16:34 ` Daniel Wagner
  2025-01-29  6:05   ` Christoph Hellwig
                     ` (2 more replies)
  2025-01-28 16:34 ` [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check Daniel Wagner
  2 siblings, 3 replies; 17+ messages in thread
From: Daniel Wagner @ 2025-01-28 16:34 UTC
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block, Daniel Wagner

Do not access the state variable directly; instead use proper
synchronization so that no stale data is read.

Fixes: e6e7f7ac03e4 ("nvme: ensure reset state check ordering")
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 drivers/nvme/host/fc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 55884d3df6f291cfddb4742e135b54a72f1cfa05..f4f1866fbd5b8b05730a785c7d256108c9344e62 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2087,7 +2087,8 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 		nvme_fc_complete_rq(rq);
 
 check_error:
-	if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING)
+	if (terminate_assoc &&
+	    nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
 		queue_work(nvme_reset_wq, &ctrl->ioerr_work);
 }
 
@@ -2541,6 +2542,8 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 static void
 nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 {
+	enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
+
 	/*
 	 * if an error (io timeout, etc) while (re)connecting, the remote
 	 * port requested terminating of the association (disconnect_ls)
@@ -2548,7 +2551,7 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 	 * the controller.  Abort any ios on the association and let the
 	 * create_association error path resolve things.
 	 */
-	if (ctrl->ctrl.state == NVME_CTRL_CONNECTING) {
+	if (state == NVME_CTRL_CONNECTING) {
 		__nvme_fc_abort_outstanding_ios(ctrl, true);
 		dev_warn(ctrl->ctrl.device,
 			"NVME-FC{%d}: transport error during (re)connect\n",
@@ -2557,7 +2560,7 @@ nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
 	}
 
 	/* Otherwise, only proceed if in LIVE state - e.g. on first error */
-	if (ctrl->ctrl.state != NVME_CTRL_LIVE)
+	if (state != NVME_CTRL_LIVE)
 		return;
 
 	dev_warn(ctrl->ctrl.device,

-- 
2.48.1



* [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check
  2025-01-28 16:34 [PATCH 0/3] misc nvme related fixes Daniel Wagner
  2025-01-28 16:34 ` [PATCH 1/3] nvme-tcp: rate limit error message in send path Daniel Wagner
  2025-01-28 16:34 ` [PATCH 2/3] nvme-fc: use ctrl state getter Daniel Wagner
@ 2025-01-28 16:34 ` Daniel Wagner
  2025-01-29  6:07   ` Christoph Hellwig
                     ` (2 more replies)
  2 siblings, 3 replies; 17+ messages in thread
From: Daniel Wagner @ 2025-01-28 16:34 UTC
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block, Daniel Wagner

blk_mq_tagset_count_completed_rqs returns the number of completed
requests. The only user of this function is
blk_mq_tagset_wait_completed_request, which wants to know how many
requests are not yet completed. Thus count the number of in-flight
requests instead and terminate the wait loop when there are no
in-flight requests.

Fixes: f9934a80f91d ("blk-mq: introduce blk_mq_tagset_wait_completed_request()")
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 block/blk-mq-tag.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index b9f417d980b46d54b74dec8adcb5b04e6a78635c..3ce46afb65e3c3de9f11ca440bf0f335f21d0998 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -450,11 +450,11 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
 
-static bool blk_mq_tagset_count_completed_rqs(struct request *rq, void *data)
+static bool blk_mq_tagset_count_inflight_rqs(struct request *rq, void *data)
 {
 	unsigned *count = data;
 
-	if (blk_mq_request_completed(rq))
+	if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
 		(*count)++;
 	return true;
 }
@@ -472,7 +472,7 @@ void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
 		unsigned count = 0;
 
 		blk_mq_tagset_busy_iter(tagset,
-				blk_mq_tagset_count_completed_rqs, &count);
+				blk_mq_tagset_count_inflight_rqs, &count);
 		if (!count)
 			break;
 		msleep(5);

-- 
2.48.1



* Re: [PATCH 1/3] nvme-tcp: rate limit error message in send path
  2025-01-28 16:34 ` [PATCH 1/3] nvme-tcp: rate limit error message in send path Daniel Wagner
@ 2025-01-29  6:05   ` Christoph Hellwig
  2025-01-30 15:25     ` Daniel Wagner
  2025-01-31  8:09   ` Sagi Grimberg
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-01-29  6:05 UTC
  To: Daniel Wagner
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei, James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
> If a lot of requests are in the queue, this message spams the logs,
> thus rate limit it.

Are in the queue when what happens?  Not that I'm against this,
but if we have a known condition where this error is printed a lot
we should probably skip it entirely for that?



* Re: [PATCH 2/3] nvme-fc: use ctrl state getter
  2025-01-28 16:34 ` [PATCH 2/3] nvme-fc: use ctrl state getter Daniel Wagner
@ 2025-01-29  6:05   ` Christoph Hellwig
  2025-01-29 18:39   ` Keith Busch
  2025-01-31  8:09   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2025-01-29  6:05 UTC
  To: Daniel Wagner
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei, James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check
  2025-01-28 16:34 ` [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check Daniel Wagner
@ 2025-01-29  6:07   ` Christoph Hellwig
  2025-01-29  9:54   ` Nilay Shroff
  2025-01-31  8:13   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2025-01-29  6:07 UTC
  To: Daniel Wagner
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Ming Lei, James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

On Tue, Jan 28, 2025 at 05:34:48PM +0100, Daniel Wagner wrote:
> blk_mq_tagset_count_completed_rqs returns the number of completed
> requests. The only user of this function is
> blk_mq_tagset_wait_completed_request, which wants to know how many
> requests are not yet completed. Thus count the number of in-flight
> requests instead and terminate the wait loop when there are no
> in-flight requests.

This looks sensible, but you should probably send it directly to
Jens instead of tagging it onto an nvme series.

The code could also really use a comment on what we're counting and why.
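
Something along these lines, perhaps (just an untested sketch of what
that comment could say, on top of your patch):

	/*
	 * Only count requests the driver still owns (MQ_RQ_IN_FLIGHT);
	 * idle requests and requests whose completion has already been
	 * signalled do not need to be waited for.
	 */
	static bool blk_mq_tagset_count_inflight_rqs(struct request *rq, void *data)
	{
		unsigned *count = data;

		if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
			(*count)++;
		return true;
	}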



* Re: [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check
  2025-01-28 16:34 ` [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check Daniel Wagner
  2025-01-29  6:07   ` Christoph Hellwig
@ 2025-01-29  9:54   ` Nilay Shroff
  2025-01-31  8:13   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Nilay Shroff @ 2025-01-29  9:54 UTC
  To: Daniel Wagner, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block



On 1/28/25 10:04 PM, Daniel Wagner wrote:
> blk_mq_tagset_count_completed_rqs returns the number of completed
> requests. The only user of this function is
> blk_mq_tagset_wait_completed_request, which wants to know how many
> requests are not yet completed. Thus count the number of in-flight
> requests instead and terminate the wait loop when there are no
> in-flight requests.
> 
> Fixes: f9934a80f91d ("blk-mq: introduce blk_mq_tagset_wait_completed_request()")
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>  block/blk-mq-tag.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index b9f417d980b46d54b74dec8adcb5b04e6a78635c..3ce46afb65e3c3de9f11ca440bf0f335f21d0998 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -450,11 +450,11 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>  }
>  EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
>  
> -static bool blk_mq_tagset_count_completed_rqs(struct request *rq, void *data)
> +static bool blk_mq_tagset_count_inflight_rqs(struct request *rq, void *data)
>  {
>  	unsigned *count = data;
>  
> -	if (blk_mq_request_completed(rq))
> +	if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
>  		(*count)++;
>  	return true;
>  }
> @@ -472,7 +472,7 @@ void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
>  		unsigned count = 0;
>  
>  		blk_mq_tagset_busy_iter(tagset,
> -				blk_mq_tagset_count_completed_rqs, &count);
> +				blk_mq_tagset_count_inflight_rqs, &count);
>  		if (!count)
>  			break;
>  		msleep(5);
> 
I see that blk_mq_tagset_wait_completed_request() is called from
nvme_cancel_tagset() and nvme_cancel_admin_tagset(), and it seems to me that
the intention here is to wait until each completed request is freed (i.e.
changes its state to MQ_RQ_IDLE).

Looking at the code, nvme_cancel_xxx() first invokes blk_mq_tagset_busy_iter(),
which iterates through the tagset, cancels each in-flight request, and sets the
request state to MQ_RQ_COMPLETE. Then, in blk_mq_tagset_wait_completed_request(),
we wait for each completed request's state to change to anything but
MQ_RQ_COMPLETE. The next state would naturally be MQ_RQ_IDLE once the request
is freed. So in blk_mq_tagset_wait_completed_request(), we essentially wait for
the request state to change from MQ_RQ_COMPLETE to MQ_RQ_IDLE; an abbreviated
sketch of the call path follows below.
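
The call path, abbreviated (nvme_cancel_tagset() from drivers/nvme/host/core.c,
quoted from memory; the comments are my reading of the state transitions, so
double check):

	void nvme_cancel_tagset(struct nvme_ctrl *ctrl)
	{
		if (ctrl->tagset) {
			/* complete every started request:
			 * MQ_RQ_IN_FLIGHT -> MQ_RQ_COMPLETE
			 */
			blk_mq_tagset_busy_iter(ctrl->tagset,
					nvme_cancel_request, ctrl);
			/* poll until no request is left in MQ_RQ_COMPLETE,
			 * i.e. all completions have run and the requests
			 * were freed back to MQ_RQ_IDLE
			 */
			blk_mq_tagset_wait_completed_request(ctrl->tagset);
		}
	}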

So now, if the proposal is that blk_mq_tagset_wait_completed_request() only has
to wait while there is an in-flight request, then it seems this function would
never need to wait at all and looks redundant, because req->state would never
be MQ_RQ_IN_FLIGHT: those requests have already been moved to MQ_RQ_COMPLETE
when nvme_cancel_xxx() invoked blk_mq_tagset_busy_iter(ctrl->tagset,
nvme_cancel_request, ctrl).

Having said that, I am not sure what the real intention is here: in
nvme_cancel_xxx(), do we really only need to wait until in-flight requests are
completed, or do we need to synchronize with the requests' completion callbacks
(i.e. wait until all completed requests are freed)?

Thanks,
--Nilay


* Re: [PATCH 2/3] nvme-fc: use ctrl state getter
  2025-01-28 16:34 ` [PATCH 2/3] nvme-fc: use ctrl state getter Daniel Wagner
  2025-01-29  6:05   ` Christoph Hellwig
@ 2025-01-29 18:39   ` Keith Busch
  2025-01-31  8:09   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2025-01-29 18:39 UTC
  To: Daniel Wagner
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Ming Lei,
	James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

On Tue, Jan 28, 2025 at 05:34:47PM +0100, Daniel Wagner wrote:
> Do not access the state variable directly; instead use proper
> synchronization so that no stale data is read.
> 
> Fixes: e6e7f7ac03e4 ("nvme: ensure reset state check ordering")

Thanks, applied to nvme-6.14.


* Re: [PATCH 1/3] nvme-tcp: rate limit error message in send path
  2025-01-29  6:05   ` Christoph Hellwig
@ 2025-01-30 15:25     ` Daniel Wagner
  2025-01-31  7:29       ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2025-01-30 15:25 UTC
  To: Christoph Hellwig
  Cc: Daniel Wagner, Keith Busch, Jens Axboe, Sagi Grimberg, Ming Lei,
	James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

On Wed, Jan 29, 2025 at 07:05:34AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
> > If a lot of requests are in the queue, this message spams the logs,
> > thus rate limit it.
> 
> Are in the queue when what happens?  Not that I'm against this,
> but if we have a known condition where this error is printed a lot
> we should probably skip it entirely for that?

The condition is that all the elements in the queue->send_list could fail
as a batch. I had a bug in my patches which re-queued all the failed
commands immediately and sent them out again, thus spamming the log.

This behavior doesn't exist in upstream. I just thought it might make
sense to rate limit as a precaution. I don't know if it is worth the code
churn.


* Re: [PATCH 1/3] nvme-tcp: rate limit error message in send path
  2025-01-30 15:25     ` Daniel Wagner
@ 2025-01-31  7:29       ` Christoph Hellwig
  2025-01-31  8:09         ` Sagi Grimberg
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-01-31  7:29 UTC
  To: Daniel Wagner
  Cc: Christoph Hellwig, Daniel Wagner, Keith Busch, Jens Axboe,
	Sagi Grimberg, Ming Lei, James Smart, Hannes Reinecke, linux-nvme,
	linux-kernel, linux-block

On Thu, Jan 30, 2025 at 04:25:35PM +0100, Daniel Wagner wrote:
> On Wed, Jan 29, 2025 at 07:05:34AM +0100, Christoph Hellwig wrote:
> > On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
> > > If a lot of requests are in the queue, this message spams the logs,
> > > thus rate limit it.
> > 
> > Are in the queue when what happens?  Not that I'm against this,
> > but if we have a known condition where this error is printed a lot
> > we should probably skip it entirely for that?
> 
> The condition is that all the elements in the queue->send_list could fail
> as a batch. I had a bug in my patches which re-queued all the failed
> commands immediately and sent them out again, thus spamming the log.
> 
> This behavior doesn't exist in upstream. I just thought it might make
> sense to rate limit as a precaution. I don't know if it is worth the code
> churn.

I'm fine with the rate limiting.  I was just wondering if there is
a case where we'd easily hit it and could do even better.


* Re: [PATCH 1/3] nvme-tcp: rate limit error message in send path
  2025-01-31  7:29       ` Christoph Hellwig
@ 2025-01-31  8:09         ` Sagi Grimberg
  0 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2025-01-31  8:09 UTC
  To: Christoph Hellwig, Daniel Wagner
  Cc: Daniel Wagner, Keith Busch, Jens Axboe, Ming Lei, James Smart,
	Hannes Reinecke, linux-nvme, linux-kernel, linux-block




On 31/01/2025 9:29, Christoph Hellwig wrote:
> On Thu, Jan 30, 2025 at 04:25:35PM +0100, Daniel Wagner wrote:
>> On Wed, Jan 29, 2025 at 07:05:34AM +0100, Christoph Hellwig wrote:
>>> On Tue, Jan 28, 2025 at 05:34:46PM +0100, Daniel Wagner wrote:
>>>> If a lot of requests are in the queue, this message spams the logs,
>>>> thus rate limit it.
>>> Are in the queue when what happens?  Not that I'm against this,
>>> but if we have a known condition where this error is printed a lot
>>> we should probably skip it entirely for that?
>> The condition is that all the elements in the queue->send_list could fail
>> as a batch. I had a bug in my patches which re-queued all the failed
>> commands immediately and sent them out again, thus spamming the log.
>>
>> This behavior doesn't exist in upstream. I just thought it might make
>> sense to rate limit as a precaution. I don't know if it is worth the code
>> churn.
> I'm fine with the rate limiting.  I was just wondering if there is
> a case where we'd easily hit it and could do even better.

I agree with the change. The reason I think it's useful to keep is that
it has been a really useful indication when debugging UAF panics.


* Re: [PATCH 1/3] nvme-tcp: rate limit error message in send path
  2025-01-28 16:34 ` [PATCH 1/3] nvme-tcp: rate limit error message in send path Daniel Wagner
  2025-01-29  6:05   ` Christoph Hellwig
@ 2025-01-31  8:09   ` Sagi Grimberg
  1 sibling, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2025-01-31  8:09 UTC
  To: Daniel Wagner, Keith Busch, Jens Axboe, Christoph Hellwig,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 2/3] nvme-fc: use ctrl state getter
  2025-01-28 16:34 ` [PATCH 2/3] nvme-fc: use ctrl state getter Daniel Wagner
  2025-01-29  6:05   ` Christoph Hellwig
  2025-01-29 18:39   ` Keith Busch
@ 2025-01-31  8:09   ` Sagi Grimberg
  2 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2025-01-31  8:09 UTC
  To: Daniel Wagner, Keith Busch, Jens Axboe, Christoph Hellwig,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check
  2025-01-28 16:34 ` [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check Daniel Wagner
  2025-01-29  6:07   ` Christoph Hellwig
  2025-01-29  9:54   ` Nilay Shroff
@ 2025-01-31  8:13   ` Sagi Grimberg
  2025-01-31  8:46     ` Daniel Wagner
  2 siblings, 1 reply; 17+ messages in thread
From: Sagi Grimberg @ 2025-01-31  8:13 UTC
  To: Daniel Wagner, Keith Busch, Jens Axboe, Christoph Hellwig,
	Ming Lei
  Cc: James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block




On 28/01/2025 18:34, Daniel Wagner wrote:
> blk_mq_tagset_count_completed_rqs returns the number of completed
> requests. The only user of this function is
> blk_mq_tagset_wait_completed_request, which wants to know how many
> requests are not yet completed. Thus count the number of in-flight
> requests instead and terminate the wait loop when there are no
> in-flight requests.
>
> Fixes: f9934a80f91d ("blk-mq: introduce blk_mq_tagset_wait_completed_request()")

Can you please describe what this patch is fixing? i.e. what is the 
observed bug?
It is not clear (to me) from the patch.


* Re: [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check
  2025-01-31  8:13   ` Sagi Grimberg
@ 2025-01-31  8:46     ` Daniel Wagner
  2025-01-31  8:54       ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2025-01-31  8:46 UTC
  To: Sagi Grimberg
  Cc: Daniel Wagner, Keith Busch, Jens Axboe, Christoph Hellwig,
	Ming Lei, James Smart, Hannes Reinecke, linux-nvme, linux-kernel,
	linux-block

On Fri, Jan 31, 2025 at 10:13:47AM +0200, Sagi Grimberg wrote:
> On 28/01/2025 18:34, Daniel Wagner wrote:
> > blk_mq_tagset_count_completed_rqs returns the number of completed
> > requests. The only user of this function is
> > blk_mq_tagset_wait_completed_request, which wants to know how many
> > requests are not yet completed. Thus count the number of in-flight
> > requests instead and terminate the wait loop when there are no
> > in-flight requests.
> > 
> > Fixes: f9934a80f91d ("blk-mq: introduce blk_mq_tagset_wait_completed_request()")
> 
> Can you please describe what this patch is fixing? i.e. what is the observed
> bug?
> It is not clear (to me) from the patch.

I have to double check my reasoning again after reading Nilay's reply.

The problem I am running into with my WIP TP4129 patchset is that
requests are pending on a newly introduced requeue_list and never get
canceled, because these requests stay in the COMPLETE state at that
point. This blocked the shutdown path.

I don't think this problem exists in upstream right now, but I was under
the impression that the check is incorrect here. I mean, the only
requests we need to cancel are the ones which are not idle, no?
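
(For reference, the request life cycle states from include/linux/blk-mq.h,
quoted from memory:)

	enum mq_rq_state {
		MQ_RQ_IDLE	= 0,
		MQ_RQ_IN_FLIGHT	= 1,
		MQ_RQ_COMPLETE	= 2,
	};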

Anyway, I'll have to go back and do some more homework.


* Re: [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check
  2025-01-31  8:46     ` Daniel Wagner
@ 2025-01-31  8:54       ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2025-01-31  8:54 UTC
  To: Daniel Wagner
  Cc: Sagi Grimberg, Daniel Wagner, Keith Busch, Jens Axboe,
	Christoph Hellwig, Ming Lei, James Smart, Hannes Reinecke,
	linux-nvme, linux-kernel, linux-block

On Fri, Jan 31, 2025 at 09:46:46AM +0100, Daniel Wagner wrote:
> The problem I am running into with my WIP TP4129 patchset is that
> requests are pending on a newly introduced requeue_list and never get
> canceled, because these requests stay in the COMPLETE state at that
> point. This blocked the shutdown path.

I'd recommend not bothering to implement 4129 as it's a fucked up mess.


