[PATCH AUTOSEL 6.14 01/29] nvme-tcp: fix I/O stalls on congested sockets

linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 6.14 01/29] nvme-tcp: fix I/O stalls on congested sockets
@ 2025-06-09 13:44 Sasha Levin
  2025-06-09 13:44 ` [PATCH AUTOSEL 6.14 02/29] nvme-tcp: sanitize request list handling Sasha Levin
  0 siblings, 1 reply; 2+ messages in thread
From: Sasha Levin @ 2025-06-09 13:44 UTC (permalink / raw)
  To: patches, stable
  Cc: Hannes Reinecke, Chris Leech, Sagi Grimberg, Christoph Hellwig,
	Sasha Levin, kbusch, linux-nvme

From: Hannes Reinecke <hare@kernel.org>

[ Upstream commit f42d4796ee100fade86086d1cf98537fb4d326c8 ]

When the socket is busy processing nvme_tcp_try_recv() might return
-EAGAIN, but this doesn't automatically imply that the sending side is
blocked, too.  So check if there are pending requests once
nvme_tcp_try_recv() returns -EAGAIN and continue with the sending loop
to avoid I/O stalls.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Acked-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

Based on my analysis of the commit and comparison with similar commits,
here is my determination:

**YES**

This commit should be backported to stable kernel trees for the
following reasons:

1. **Fixes a real user-impacting bug**: The commit addresses I/O stalls
   on congested sockets, which is a serious issue that can cause system
   hangs or severe performance degradation. When the socket is congested
   and `nvme_tcp_try_recv()` returns -EAGAIN, the current code
   incorrectly assumes that the sending side is also blocked, leading to
   I/O stalls.

2. **Small and contained fix**: The changes are minimal and localized to
   the `nvme_tcp_io_work()` function:
   - Changes `nvme_tcp_try_recv()` to return 0 instead of -EAGAIN to
     prevent premature exit
   - Adds a check after receive processing to see if the socket became
     writable
   - Only 5 lines of actual code changes

3. **Clear logic fix**: The patch addresses a specific logic error
   where:
   - The receive side returns -EAGAIN (socket would block on receive)
   - But this doesn't mean the send side is also blocked
   - The fix checks if there are pending requests and if the socket is
     writable after receive processing

4. **Similar to other backported fixes**: Looking at the historical
   commits:
   - Commit #2 (backported): Fixed hangs waiting for icresp response
   - Commit #3 (backported): Fixed wrong stop condition in io_work
   - Commit #4 (backported): Fixed UAF when detecting digest errors
   - Commit #5 (backported): Fixed possible null deref on timed out
     connections

   All these commits that were backported involved fixing hangs, stalls,
or error conditions in the nvme-tcp driver.

5. **No architectural changes**: The commit doesn't introduce new
   features or change the architecture. It simply adds a missing check
   to prevent I/O stalls, which aligns with stable kernel criteria.

6. **Critical subsystem**: NVMe-TCP is used for storage access, and I/O
   stalls can have severe consequences for system stability and data
   integrity.

The specific code changes show:
- `return consumed == -EAGAIN ? 0 : consumed;` - prevents treating
  EAGAIN as an error
- The new check `if (nvme_tcp_queue_has_pending(queue) &&
  sk_stream_is_writeable(queue->sock->sk))` ensures that if there are
  pending requests and the socket is writable after receive processing,
  we continue processing instead of stalling.

This is exactly the type of bug fix that should be backported to stable
kernels to ensure reliable storage operation.

 drivers/nvme/host/tcp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d991baa82a1c2..a2e825e37b38b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1349,7 +1349,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
 	queue->nr_cqe = 0;
 	consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
 	release_sock(sk);
-	return consumed;
+	return consumed == -EAGAIN ? 0 : consumed;
 }
 
 static void nvme_tcp_io_work(struct work_struct *w)
@@ -1377,6 +1377,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
 		else if (unlikely(result < 0))
 			return;
 
+		/* did we get some space after spending time in recv? */
+		if (nvme_tcp_queue_has_pending(queue) &&
+		    sk_stream_is_writeable(queue->sock->sk))
+			pending = true;
+
 		if (!pending || !queue->rd_enabled)
 			return;
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH AUTOSEL 6.14 02/29] nvme-tcp: sanitize request list handling
  2025-06-09 13:44 [PATCH AUTOSEL 6.14 01/29] nvme-tcp: fix I/O stalls on congested sockets Sasha Levin
@ 2025-06-09 13:44 ` Sasha Levin
  0 siblings, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-06-09 13:44 UTC (permalink / raw)
  To: patches, stable
  Cc: Hannes Reinecke, Sagi Grimberg, Christoph Hellwig, Sasha Levin,
	kbusch, linux-nvme

From: Hannes Reinecke <hare@kernel.org>

[ Upstream commit 0bf04c874fcb1ae46a863034296e4b33d8fbd66c ]

Validate the request in nvme_tcp_handle_r2t() to ensure it's not part of
any list, otherwise a malicious R2T PDU might inject a loop in request
list processing.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

Based on my analysis of the commit message and code changes, I provide
the following assessment:

**YES**, this commit should be backported to stable kernel trees.

## Extensive Explanation:

### 1. **Security Vulnerability Fix**
This commit fixes a serious security vulnerability where a malicious
NVMe-TCP target can inject loops in request list processing. The code
changes show clear validation being added to prevent this attack:

```c
if (llist_on_list(&req->lentry) || !list_empty(&req->entry)) {
    dev_err(queue->ctrl->ctrl.device,
        "req %d unexpected r2t while processing request\n",
        rq->tag);
    return -EPROTO;
}
```

### 2. **Attack Vector and Impact**
The vulnerability allows a compromised storage target to send R2T (Ready
to Transfer) PDUs for requests that are already being processed. Without
this validation:
- The same request could be added to lists multiple times
- This creates circular references or list corruption
- Can lead to infinite loops, system hangs, or memory corruption
- Potential for arbitrary code execution through use-after-free
  conditions

### 3. **Minimal and Contained Fix**
The fix is:
- Small in scope (only touches request list handling)
- Well-contained to the nvme-tcp subsystem
- Adds simple validation checks without architectural changes
- Initializes list nodes properly to maintain clean state

### 4. **Clear Bug Fix Pattern**
Looking at the similar commits provided:
- Commit #3 (marked YES for backport) also fixed a validation issue in
  nvme-tcp
- This follows the same pattern of adding necessary validation to
  prevent malicious behavior
- It's fixing a clear bug rather than adding features or optimizations

### 5. **High Security Impact**
This vulnerability affects:
- Any system using NVMe-TCP with untrusted storage targets
- Cloud environments and multi-tenant storage systems
- Data centers with NVMe-over-Fabrics deployments

The security impact is significant enough to warrant backporting to
protect stable kernel users.

### 6. **No Complex Dependencies**
The changes are self-contained and don't depend on major architectural
changes or new features. The fix:
- Uses existing list handling APIs (`llist_on_list`, `list_empty`)
- Adds proper initialization with standard macros
- Returns existing error codes (`-EPROTO`)

### 7. **Follows Stable Kernel Rules**
This commit clearly follows stable kernel criteria:
- Fixes a real bug that affects users (security vulnerability)
- Small change with minimal risk
- No new features or architectural changes
- Important for system security and stability

The commit prevents a serious security issue where malicious storage
devices could compromise system stability or potentially gain
unauthorized access through memory corruption. This makes it a critical
candidate for stable kernel backporting.

 drivers/nvme/host/tcp.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a2e825e37b38b..28310b124b4ea 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -453,7 +453,8 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
 			return NULL;
 	}
 
-	list_del(&req->entry);
+	list_del_init(&req->entry);
+	init_llist_node(&req->lentry);
 	return req;
 }
 
@@ -561,6 +562,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
 	req->queue = queue;
 	nvme_req(rq)->ctrl = &ctrl->ctrl;
 	nvme_req(rq)->cmd = &pdu->cmd;
+	init_llist_node(&req->lentry);
+	INIT_LIST_HEAD(&req->entry);
 
 	return 0;
 }
@@ -765,6 +768,14 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
 		return -EPROTO;
 	}
 
+	if (llist_on_list(&req->lentry) ||
+	    !list_empty(&req->entry)) {
+		dev_err(queue->ctrl->ctrl.device,
+			"req %d unexpected r2t while processing request\n",
+			rq->tag);
+		return -EPROTO;
+	}
+
 	req->pdu_len = 0;
 	req->h2cdata_left = r2t_length;
 	req->h2cdata_offset = r2t_offset;
@@ -2588,6 +2599,8 @@ static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg)
 	ctrl->async_req.offset = 0;
 	ctrl->async_req.curr_bio = NULL;
 	ctrl->async_req.data_len = 0;
+	init_llist_node(&ctrl->async_req.lentry);
+	INIT_LIST_HEAD(&ctrl->async_req.entry);
 
 	nvme_tcp_queue_request(&ctrl->async_req, true, true);
 }
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-06-09 13:47 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-09 13:44 [PATCH AUTOSEL 6.14 01/29] nvme-tcp: fix I/O stalls on congested sockets Sasha Levin
2025-06-09 13:44 ` [PATCH AUTOSEL 6.14 02/29] nvme-tcp: sanitize request list handling Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).