* [PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets
@ 2025-04-29 8:17 Hannes Reinecke
2025-04-29 8:17 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke
2025-04-29 8:17 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke
0 siblings, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2025-04-29 8:17 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke
Hi all,
I have been chasing keep-alive timeouts with TLS enabled in the last few
weeks (monthsA, even ...). On larger setups (eg with 32 queues) the connection
never got established properly as I've been hitting keep-alive timeouts before
the last queue got connected.
Turns out that occasionally we simply do not send the keep-alive request; it's
been added to the request list but the io_work workqueue function is never
restarted as it bails out after nvme_tcp_try_recv() returns -EAGAIN.
During debugging I also found that we're quite lazy with the list
handling of requests, so I've added a patche to ensure that all list
elements are properly terminated.
As usual, comments and reviews are welcome.
Changes to v4:
- Drop check for 'queue->req' as noticed by Sagi
Changes to v3:
- Drop first patch as it already had been applied
- Include reviews from Sagi
- Check for sk_sock_is_writeable() to avoid requeing io_work when
the socket is blocked
Changes to v2:
- Removed AEN patches again
Changes to the original submission:
- Include reviews from Chris Leech
- Add patch to requeue namespace scan
- Add patch to re-read ANA log page
Hannes Reinecke (2):
nvme-tcp: sanitize request list handling
nvme-tcp: fix I/O stalls on congested sockets
drivers/nvme/host/tcp.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] nvme-tcp: sanitize request list handling
2025-04-29 8:17 [PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets Hannes Reinecke
@ 2025-04-29 8:17 ` Hannes Reinecke
2025-04-29 9:12 ` Sagi Grimberg
2025-04-29 8:17 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke
1 sibling, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2025-04-29 8:17 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke
Validate the request in nvme_tcp_handle_r2t() to ensure it's not
part of any list, otherwise a malicious R2T PDU might inject a
loop in request list processing.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
drivers/nvme/host/tcp.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a9d455c39652..1c39a307456c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -452,7 +452,8 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
return NULL;
}
- list_del(&req->entry);
+ list_del_init(&req->entry);
+ init_llist_node(&req->lentry);
return req;
}
@@ -560,6 +561,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
req->queue = queue;
nvme_req(rq)->ctrl = &ctrl->ctrl;
nvme_req(rq)->cmd = &pdu->cmd;
+ init_llist_node(&req->lentry);
+ INIT_LIST_HEAD(&req->entry);
return 0;
}
@@ -764,6 +767,14 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
return -EPROTO;
}
+ if (llist_on_list(&req->lentry) ||
+ !list_empty(&req->entry)) {
+ dev_err(queue->ctrl->ctrl.device,
+ "req %d unexpected r2t while processing request\n",
+ rq->tag);
+ return -EPROTO;
+ }
+
req->pdu_len = 0;
req->h2cdata_left = r2t_length;
req->h2cdata_offset = r2t_offset;
@@ -2611,6 +2622,8 @@ static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg)
ctrl->async_req.offset = 0;
ctrl->async_req.curr_bio = NULL;
ctrl->async_req.data_len = 0;
+ init_llist_node(&ctrl->async_req.lentry);
+ INIT_LIST_HEAD(&ctrl->async_req.entry);
nvme_tcp_queue_request(&ctrl->async_req, true);
}
--
2.35.3
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets
2025-04-29 8:17 [PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets Hannes Reinecke
2025-04-29 8:17 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke
@ 2025-04-29 8:17 ` Hannes Reinecke
2025-04-29 9:13 ` Sagi Grimberg
1 sibling, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2025-04-29 8:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, linux-nvme, Hannes Reinecke,
Chris Leech
When the socket is busy processing nvme_tcp_try_recv() might
return -EAGAIN, but this doesn't automatically imply that
the sending side is blocked, too.
So check if there are pending requests once nvme_tcp_try_recv()
returns -EAGAIN and continue with the sending loop to avoid
I/O stalls.
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
drivers/nvme/host/tcp.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1c39a307456c..eb4222c5a77a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1361,7 +1361,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
queue->nr_cqe = 0;
consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
release_sock(sk);
- return consumed;
+ return consumed == -EAGAIN ? 0 : consumed;
}
static void nvme_tcp_io_work(struct work_struct *w)
@@ -1389,6 +1389,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
else if (unlikely(result < 0))
return;
+ /* did we get some space after spending time in recv? */
+ if (nvme_tcp_queue_has_pending(queue) &&
+ sk_stream_is_writeable(queue->sock->sk))
+ pending = true;
+
if (!pending || !queue->rd_enabled)
return;
--
2.35.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets
2025-04-29 8:17 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke
@ 2025-04-29 9:13 ` Sagi Grimberg
2025-04-29 11:31 ` Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2025-04-29 9:13 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig, Kamaljit Singh
Cc: Keith Busch, linux-nvme, Chris Leech
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
I want kamaljit to give his Tested-by tag.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets
2025-04-29 9:13 ` Sagi Grimberg
@ 2025-04-29 11:31 ` Hannes Reinecke
0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2025-04-29 11:31 UTC (permalink / raw)
To: Sagi Grimberg, Hannes Reinecke, Christoph Hellwig, Kamaljit Singh
Cc: Keith Busch, linux-nvme, Chris Leech
On 4/29/25 11:13, Sagi Grimberg wrote:
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
>
> I want kamaljit to give his Tested-by tag.
Sure.
I just wanted to push out the patches to give him a clean state
for his testing.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCHv4 0/2] nvme-tcp: fixup I/O stall on congested sockets
@ 2025-04-28 6:50 Hannes Reinecke
2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls " Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2025-04-28 6:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, linux-nvme, Kamaljit Singh,
Hannes Reinecke
Hi all,
I have been chasing keep-alive timeouts with TLS enabled in the last few
weeks (months, even ...). On larger setups (eg with 32 queues) the connection
never got established properly as I've been hitting keep-alive timeouts before
the last queue got connected.
Turns out that occasionally we simply do not send the keep-alive request; it's
been added to the request list but the io_work workqueue function is never
restarted as it bails out after nvme_tcp_try_recv() returns -EAGAIN.
During debugging I also found that we're quite lazy with the list
handling of requests, so I've added a patche to ensure that all list
elements are properly terminated.
As usual, comments and reviews are welcome.
Changes to v3:
- Drop first patch as it already had been applied
- Include reviews from Sagi
- Check for sk_sock_is_writeable() to avoid requeing io_work when
the socket is blocked
Changes to v2:
- Removed AEN patches again
Changes to the original submission:
- Include reviews from Chris Leech
- Add patch to requeue namespace scan
- Add patch to re-read ANA log page
Hannes Reinecke (2):
nvme-tcp: sanitize request list handling
nvme-tcp: fix I/O stalls on congested sockets
drivers/nvme/host/tcp.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets
2025-04-28 6:50 [PATCHv4 0/2] nvme-tcp: fixup I/O stall " Hannes Reinecke
@ 2025-04-28 6:50 ` Hannes Reinecke
2025-04-28 11:24 ` Sagi Grimberg
0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2025-04-28 6:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, linux-nvme, Kamaljit Singh,
Hannes Reinecke, Chris Leech
When the socket is busy processing nvme_tcp_try_recv() might
return -EAGAIN, but this doesn't automatically imply that
the sending side is blocked, too.
So check if there are pending requests once nvme_tcp_try_recv()
returns -EAGAIN and continue with the sending loop to avoid
I/O stalls.
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
drivers/nvme/host/tcp.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 10ccd7cf5b8c..5c73af022273 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1363,7 +1363,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
queue->nr_cqe = 0;
consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
release_sock(sk);
- return consumed;
+ return consumed == -EAGAIN ? 0 : consumed;
}
static void nvme_tcp_io_work(struct work_struct *w)
@@ -1391,6 +1391,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
else if (unlikely(result < 0))
return;
+ /* did we get some space after spending time in recv? */
+ if (nvme_tcp_queue_has_pending(queue) &&
+ sk_stream_is_writeable(queue->sock->sk))
+ pending = true;
+
if (!pending || !queue->rd_enabled)
return;
--
2.35.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets
2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls " Hannes Reinecke
@ 2025-04-28 11:24 ` Sagi Grimberg
0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2025-04-28 11:24 UTC (permalink / raw)
To: Hannes Reinecke, Christoph Hellwig
Cc: Keith Busch, linux-nvme, Kamaljit Singh, Chris Leech
On 28/04/2025 9:50, Hannes Reinecke wrote:
> When the socket is busy processing nvme_tcp_try_recv() might
> return -EAGAIN, but this doesn't automatically imply that
> the sending side is blocked, too.
> So check if there are pending requests once nvme_tcp_try_recv()
> returns -EAGAIN and continue with the sending loop to avoid
> I/O stalls.
>
> Acked-by: Chris Leech <cleech@redhat.com>
> Signed-off-by: Hannes Reinecke <hare@kernel.org>
> ---
> drivers/nvme/host/tcp.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 10ccd7cf5b8c..5c73af022273 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1363,7 +1363,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
> queue->nr_cqe = 0;
> consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
> release_sock(sk);
> - return consumed;
> + return consumed == -EAGAIN ? 0 : consumed;
> }
>
> static void nvme_tcp_io_work(struct work_struct *w)
> @@ -1391,6 +1391,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
> else if (unlikely(result < 0))
> return;
>
> + /* did we get some space after spending time in recv? */
> + if (nvme_tcp_queue_has_pending(queue) &&
> + sk_stream_is_writeable(queue->sock->sk))
> + pending = true;
> +
> if (!pending || !queue->rd_enabled)
> return;
>
Looks fine,
Need Kamaljit to verify it.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-04-29 11:41 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-29 8:17 [PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets Hannes Reinecke
2025-04-29 8:17 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke
2025-04-29 9:12 ` Sagi Grimberg
2025-04-29 8:17 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke
2025-04-29 9:13 ` Sagi Grimberg
2025-04-29 11:31 ` Hannes Reinecke
-- strict thread matches above, loose matches on Subject: below --
2025-04-28 6:50 [PATCHv4 0/2] nvme-tcp: fixup I/O stall " Hannes Reinecke
2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls " Hannes Reinecke
2025-04-28 11:24 ` Sagi Grimberg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox