From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Laurent Vivier <lvivier@redhat.com>,
Lei Yang <leiyang@redhat.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Jason Wang <jasowang@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Paolo Abeni <pabeni@redhat.com>, Sasha Levin <sashal@kernel.org>,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 6.15 2/8] virtio_net: Enforce minimum TX ring size for reliability
Date: Mon, 7 Jul 2025 20:02:09 -0400
Message-ID: <20250708000215.793090-2-sashal@kernel.org>
In-Reply-To: <20250708000215.793090-1-sashal@kernel.org>
From: Laurent Vivier <lvivier@redhat.com>
[ Upstream commit 24b2f5df86aaebbe7bac40304eaf5a146c02367c ]
The `tx_may_stop()` logic stops TX queues if free descriptors
(`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2).
If the total ring size (`ring_num`) is not strictly greater than this
value, queues can become persistently stopped or stop after minimal
use, severely degrading performance.
A single sk_buff transmission typically requires descriptors for:
- The virtio_net_hdr (1 descriptor)
- The sk_buff's linear data (head) (1 descriptor)
- Paged fragments (up to MAX_SKB_FRAGS descriptors)
This patch enforces that the TX ring size ('ring_num') must be strictly
greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is
always large enough to hold at least one maximally-fragmented packet
plus at least one additional slot.
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
**YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## 1. **Critical Bug Fix**
The commit fixes a severe bug where TX queues can become permanently
stopped, causing complete network transmission failure. The code change
adds a validation check at the top of `virtnet_tx_resize()`:
```c
if (ring_num <= MAX_SKB_FRAGS + 2) {
	netdev_err(vi->dev, "tx size (%d) cannot be smaller than %d\n",
		   ring_num, MAX_SKB_FRAGS + 2);
	return -EINVAL;
}
```
This prevents a configuration that would make the network interface
essentially unusable.
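To make the boundary of the check concrete, here is a minimal user-space sketch of the same comparison. It is not the driver code: `check_tx_ring_size()` is a hypothetical name, `fprintf()` stands in for `netdev_err()`, and `MAX_SKB_FRAGS` is assumed to be 17 (the usual default; the real value depends on the kernel configuration).

```c
#include <errno.h>
#include <stdio.h>

/* Assumption: 17 is the usual default; the kernel config may differ. */
#define MAX_SKB_FRAGS 17U

/*
 * Hypothetical user-space mirror of the check added to
 * virtnet_tx_resize(): reject any TX ring size that is not strictly
 * greater than MAX_SKB_FRAGS + 2.
 */
static int check_tx_ring_size(unsigned int ring_num)
{
	if (ring_num <= MAX_SKB_FRAGS + 2U) {
		fprintf(stderr, "tx size (%u) cannot be smaller than %u\n",
			ring_num, MAX_SKB_FRAGS + 2U);
		return -EINVAL;
	}

	return 0;
}
```

Under the assumed value of 17, a ring of 19 entries is rejected with `-EINVAL`, and 20 is the smallest size that passes.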
## 2. **Root Cause Analysis**
The bug occurs because the `tx_may_stop()` logic (used throughout
virtio_net for flow control) stops the TX queue when free descriptors
fall below `MAX_SKB_FRAGS + 2`. If the total ring size is not strictly
greater than this threshold, the queue can:
- Stop after transmitting just one packet
- Never have enough free slots to wake up again
- Result in a permanently stalled TX queue
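The stall can be illustrated with a small model of the free-descriptor accounting. This is a simplified sketch, not the driver's logic: the helper names are hypothetical, `MAX_SKB_FRAGS` is assumed to be 17, and the model assumes the queue is woken only once the free count is back at or above the same threshold that stops it.

```c
#include <stdbool.h>

/* Assumption: 17 is the usual default; the kernel config may differ. */
#define MAX_SKB_FRAGS 17U
#define STOP_THRESHOLD (MAX_SKB_FRAGS + 2U)

/* Model: the queue is stopped while fewer than STOP_THRESHOLD
 * descriptors are free. */
static bool queue_stopped(unsigned int num_free)
{
	return num_free < STOP_THRESHOLD;
}

/* True if even a completely empty ring stays below the threshold,
 * so the queue can never be woken in this model. */
static bool ring_can_never_wake(unsigned int ring_num)
{
	return queue_stopped(ring_num);
}

/* True if sending one packet that uses `descs` descriptors into an
 * otherwise empty ring leaves the queue stopped. */
static bool stops_after_one_packet(unsigned int ring_num, unsigned int descs)
{
	if (descs > ring_num)
		return true; /* the packet does not even fit */

	return queue_stopped(ring_num - descs);
}
```

In this model a 16-entry ring can never wake, and a 19-entry ring stops after a single two-descriptor packet (header plus linear data) and cannot resume until the ring is completely drained. A 256-entry ring keeps running even after a maximally fragmented packet.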
## 3. **User Impact**
This affects real users who:
- Use virtio_net in virtualized environments (very common in cloud/VM
deployments)
- Configure TX ring sizes via ethtool or other management tools
- May unknowingly set ring sizes that trigger this bug
The commit carries Reported-by and Tested-by tags from Lei Yang at Red
Hat, which suggests the problem was observed in practice.
## 4. **Minimal and Safe Fix**
The fix is:
- **Small**: Only 6 lines added (one of them blank)
- **Contained**: Only affects TX ring resize operations
- **Safe**: Adds validation without changing existing behavior for valid
configurations
- **Clear**: Returns -EINVAL with a descriptive error message
## 5. **Meets Stable Criteria**
According to stable kernel rules, this commit qualifies because it:
- Fixes a bug that affects users (network stalls)
- Is relatively small and contained
- Has no major side effects
- Doesn't introduce new features
- Has been tested (by Lei Yang), reviewed (by Xuan Zhuo), and acked by
maintainers (Jason Wang, Michael S. Tsirkin)
## 6. **Comparison with Similar Commits**
Looking at the historical commits:
- Similar commit #3 (virtio_net: ensure netdev_tx_reset_queue) was
backported because it fixed TX queue handling issues
- This commit is even more critical as it prevents complete TX failure
- The fix is simpler than commit #3 (just validation vs. callback
handling)
## 7. **No Architectural Changes**
Unlike commits #1 and #2 which made architectural changes to ring
sizing, this commit:
- Only adds input validation
- Doesn't change data structures
- Doesn't modify existing workflows
- Simply prevents an invalid configuration
The fact that experienced maintainers like Jason Wang and Michael S.
Tsirkin acked this change further supports its importance for stable
backporting. This is a textbook example of a fix that should go to
stable: it addresses a real bug with minimal risk.
drivers/net/virtio_net.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e53ba600605a5..3054b2b4f6a0c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3481,6 +3481,12 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
 {
 	int qindex, err;
 
+	if (ring_num <= MAX_SKB_FRAGS + 2) {
+		netdev_err(vi->dev, "tx size (%d) cannot be smaller than %d\n",
+			   ring_num, MAX_SKB_FRAGS + 2);
+		return -EINVAL;
+	}
+
 	qindex = sq - vi->sq;
 
 	virtnet_tx_pause(vi, sq);
--
2.39.5