* [PATCHv4 0/2] nvme-tcp: fixup I/O stall on congested sockets @ 2025-04-28 6:50 Hannes Reinecke 2025-04-28 6:50 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke 2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke 0 siblings, 2 replies; 5+ messages in thread From: Hannes Reinecke @ 2025-04-28 6:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, Sagi Grimberg, linux-nvme, Kamaljit Singh, Hannes Reinecke Hi all, I have been chasing keep-alive timeouts with TLS enabled in the last few weeks (months, even ...). On larger setups (eg with 32 queues) the connection never got established properly as I've been hitting keep-alive timeouts before the last queue got connected. Turns out that occasionally we simply do not send the keep-alive request; it's been added to the request list but the io_work workqueue function is never restarted as it bails out after nvme_tcp_try_recv() returns -EAGAIN. During debugging I also found that we're quite lazy with the list handling of requests, so I've added a patche to ensure that all list elements are properly terminated. As usual, comments and reviews are welcome. Changes to v3: - Drop first patch as it already had been applied - Include reviews from Sagi - Check for sk_sock_is_writeable() to avoid requeing io_work when the socket is blocked Changes to v2: - Removed AEN patches again Changes to the original submission: - Include reviews from Chris Leech - Add patch to requeue namespace scan - Add patch to re-read ANA log page Hannes Reinecke (2): nvme-tcp: sanitize request list handling nvme-tcp: fix I/O stalls on congested sockets drivers/nvme/host/tcp.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) -- 2.35.3 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] nvme-tcp: sanitize request list handling 2025-04-28 6:50 [PATCHv4 0/2] nvme-tcp: fixup I/O stall on congested sockets Hannes Reinecke @ 2025-04-28 6:50 ` Hannes Reinecke 2025-04-28 11:24 ` Sagi Grimberg 2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke 1 sibling, 1 reply; 5+ messages in thread From: Hannes Reinecke @ 2025-04-28 6:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, Sagi Grimberg, linux-nvme, Kamaljit Singh, Hannes Reinecke Validate the request in nvme_tcp_handle_r2t() to ensure it's not part of any list, otherwise a malicious R2T PDU might inject a loop in request list processing. Signed-off-by: Hannes Reinecke <hare@kernel.org> --- drivers/nvme/host/tcp.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index a9d455c39652..10ccd7cf5b8c 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -452,7 +452,8 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue) return NULL; } - list_del(&req->entry); + list_del_init(&req->entry); + init_llist_node(&req->lentry); return req; } @@ -560,6 +561,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set, req->queue = queue; nvme_req(rq)->ctrl = &ctrl->ctrl; nvme_req(rq)->cmd = &pdu->cmd; + init_llist_node(&req->lentry); + INIT_LIST_HEAD(&req->entry); return 0; } @@ -764,6 +767,15 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue, return -EPROTO; } + if (queue->request == req || + llist_on_list(&req->lentry) || + !list_empty(&req->entry)) { + dev_err(queue->ctrl->ctrl.device, + "req %d unexpected r2t while processing request\n", + rq->tag); + return -EPROTO; + } + req->pdu_len = 0; req->h2cdata_left = r2t_length; req->h2cdata_offset = r2t_offset; @@ -772,7 +784,8 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue, nvme_tcp_setup_h2c_data_pdu(req); llist_add(&req->lentry, &queue->req_list); - queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work); + if (list_empty(&queue->send_list)) + queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work); return 0; } @@ -2611,6 +2624,8 @@ static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg) ctrl->async_req.offset = 0; ctrl->async_req.curr_bio = NULL; ctrl->async_req.data_len = 0; + init_llist_node(&ctrl->async_req.lentry); + INIT_LIST_HEAD(&ctrl->async_req.entry); nvme_tcp_queue_request(&ctrl->async_req, true); } -- 2.35.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] nvme-tcp: sanitize request list handling 2025-04-28 6:50 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke @ 2025-04-28 11:24 ` Sagi Grimberg 0 siblings, 0 replies; 5+ messages in thread From: Sagi Grimberg @ 2025-04-28 11:24 UTC (permalink / raw) To: Hannes Reinecke, Christoph Hellwig Cc: Keith Busch, linux-nvme, Kamaljit Singh On 28/04/2025 9:50, Hannes Reinecke wrote: > Validate the request in nvme_tcp_handle_r2t() to ensure it's not > part of any list, otherwise a malicious R2T PDU might inject a > loop in request list processing. > > Signed-off-by: Hannes Reinecke <hare@kernel.org> > --- > drivers/nvme/host/tcp.c | 19 +++++++++++++++++-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > index a9d455c39652..10ccd7cf5b8c 100644 > --- a/drivers/nvme/host/tcp.c > +++ b/drivers/nvme/host/tcp.c > @@ -452,7 +452,8 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue) > return NULL; > } > > - list_del(&req->entry); > + list_del_init(&req->entry); > + init_llist_node(&req->lentry); > return req; > } > > @@ -560,6 +561,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set, > req->queue = queue; > nvme_req(rq)->ctrl = &ctrl->ctrl; > nvme_req(rq)->cmd = &pdu->cmd; > + init_llist_node(&req->lentry); > + INIT_LIST_HEAD(&req->entry); > > return 0; > } > @@ -764,6 +767,15 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue, > return -EPROTO; > } > > + if (queue->request == req || This is wrong, please remove this condition. queue->request can still reference the request when handling r2t. ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets 2025-04-28 6:50 [PATCHv4 0/2] nvme-tcp: fixup I/O stall on congested sockets Hannes Reinecke 2025-04-28 6:50 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke @ 2025-04-28 6:50 ` Hannes Reinecke 2025-04-28 11:24 ` Sagi Grimberg 1 sibling, 1 reply; 5+ messages in thread From: Hannes Reinecke @ 2025-04-28 6:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, Sagi Grimberg, linux-nvme, Kamaljit Singh, Hannes Reinecke, Chris Leech When the socket is busy processing nvme_tcp_try_recv() might return -EAGAIN, but this doesn't automatically imply that the sending side is blocked, too. So check if there are pending requests once nvme_tcp_try_recv() returns -EAGAIN and continue with the sending loop to avoid I/O stalls. Acked-by: Chris Leech <cleech@redhat.com> Signed-off-by: Hannes Reinecke <hare@kernel.org> --- drivers/nvme/host/tcp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 10ccd7cf5b8c..5c73af022273 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1363,7 +1363,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue) queue->nr_cqe = 0; consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb); release_sock(sk); - return consumed; + return consumed == -EAGAIN ? 0 : consumed; } static void nvme_tcp_io_work(struct work_struct *w) @@ -1391,6 +1391,11 @@ static void nvme_tcp_io_work(struct work_struct *w) else if (unlikely(result < 0)) return; + /* did we get some space after spending time in recv? */ + if (nvme_tcp_queue_has_pending(queue) && + sk_stream_is_writeable(queue->sock->sk)) + pending = true; + if (!pending || !queue->rd_enabled) return; -- 2.35.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets 2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke @ 2025-04-28 11:24 ` Sagi Grimberg 0 siblings, 0 replies; 5+ messages in thread From: Sagi Grimberg @ 2025-04-28 11:24 UTC (permalink / raw) To: Hannes Reinecke, Christoph Hellwig Cc: Keith Busch, linux-nvme, Kamaljit Singh, Chris Leech On 28/04/2025 9:50, Hannes Reinecke wrote: > When the socket is busy processing nvme_tcp_try_recv() might > return -EAGAIN, but this doesn't automatically imply that > the sending side is blocked, too. > So check if there are pending requests once nvme_tcp_try_recv() > returns -EAGAIN and continue with the sending loop to avoid > I/O stalls. > > Acked-by: Chris Leech <cleech@redhat.com> > Signed-off-by: Hannes Reinecke <hare@kernel.org> > --- > drivers/nvme/host/tcp.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > index 10ccd7cf5b8c..5c73af022273 100644 > --- a/drivers/nvme/host/tcp.c > +++ b/drivers/nvme/host/tcp.c > @@ -1363,7 +1363,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue) > queue->nr_cqe = 0; > consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb); > release_sock(sk); > - return consumed; > + return consumed == -EAGAIN ? 0 : consumed; > } > > static void nvme_tcp_io_work(struct work_struct *w) > @@ -1391,6 +1391,11 @@ static void nvme_tcp_io_work(struct work_struct *w) > else if (unlikely(result < 0)) > return; > > + /* did we get some space after spending time in recv? */ > + if (nvme_tcp_queue_has_pending(queue) && > + sk_stream_is_writeable(queue->sock->sk)) > + pending = true; > + > if (!pending || !queue->rd_enabled) > return; > Looks fine, Need Kamaljit to verify it. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-04-28 11:53 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-28 6:50 [PATCHv4 0/2] nvme-tcp: fixup I/O stall on congested sockets Hannes Reinecke 2025-04-28 6:50 ` [PATCH 1/2] nvme-tcp: sanitize request list handling Hannes Reinecke 2025-04-28 11:24 ` Sagi Grimberg 2025-04-28 6:50 ` [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets Hannes Reinecke 2025-04-28 11:24 ` Sagi Grimberg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox