* [PATCH net 0/2] vsock/virtio: collapse receive queue under memory pressure
@ 2026-06-26 13:48 Stefano Garzarella
2026-06-26 13:48 ` [PATCH net 1/2] " Stefano Garzarella
2026-06-26 13:48 ` [PATCH net 2/2] vsock/test: add test for small packets under pressure Stefano Garzarella
0 siblings, 2 replies; 6+ messages in thread
From: Stefano Garzarella @ 2026-06-26 13:48 UTC (permalink / raw)
To: netdev
Cc: Jason Wang, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin, kvm,
virtualization, Xuan Zhuo, Eric Dumazet, Simon Horman,
Stefano Garzarella, linux-kernel, Stefan Hajnoczi,
David S. Miller, Eugenio Pérez
This series contains a patch (the first one) that is part of work I'm
doing to improve the tracking of memory used by AF_VSOCK sockets.
The second patch is a test for our suite that highlights the issue.
Since Brien reported an issue with his environment (based on Linux 6.12.y)
related to the work I’m doing, I extracted this patch and tried to make it
as easy as possible to backport. Brien tested it by backporting it to
6.12.y, which now contains the backport of the 059b7dbd20a6
("vsock/virtio: fix potential unbounded skb queue").
This patch primarily fixes STREAM sockets, but also partially fixes
SEQPACKET (with the exception of EOMs, which are kept in separate skbs to
avoid overcomplicating the code).
The rest of the work, I feel, is more net-next material and still needs
some work to be completed.
Thanks,
Stefano
Stefano Garzarella (2):
vsock/virtio: collapse receive queue under memory pressure
vsock/test: add test for small packets under pressure
net/vmw_vsock/virtio_transport_common.c | 148 +++++++++++++++++++++++-
tools/testing/vsock/vsock_test.c | 87 ++++++++++++++
2 files changed, 233 insertions(+), 2 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net 1/2] vsock/virtio: collapse receive queue under memory pressure
2026-06-26 13:48 [PATCH net 0/2] vsock/virtio: collapse receive queue under memory pressure Stefano Garzarella
@ 2026-06-26 13:48 ` Stefano Garzarella
2026-06-30 9:53 ` Paolo Abeni
2026-07-01 16:34 ` Bobby Eshleman
2026-06-26 13:48 ` [PATCH net 2/2] vsock/test: add test for small packets under pressure Stefano Garzarella
1 sibling, 2 replies; 6+ messages in thread
From: Stefano Garzarella @ 2026-06-26 13:48 UTC (permalink / raw)
To: netdev
Cc: Jason Wang, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin, kvm,
virtualization, Xuan Zhuo, Eric Dumazet, Simon Horman,
Stefano Garzarella, linux-kernel, Stefan Hajnoczi,
David S. Miller, Eugenio Pérez, stable, Brien Oberstein
From: Stefano Garzarella <sgarzare@redhat.com>
When many small packets accumulate in the receive queue, the skb overhead
can exceed buf_alloc even while the payload is within bounds. This causes
virtio_transport_inc_rx_pkt() to reject packets, leading to connection
resets during large transfers under backpressure.
The issue was reported by Brien, who has a reproducer, but it is also
easily reproducible with iperf-vsock [1] using a small packet size:
iperf3 --vsock -c $CID -l 129
which fails immediately without this patch but with commit 059b7dbd20a6
("vsock/virtio: fix potential unbounded skb queue").
Inspired by TCP's tcp_collapse() which solves a similar problem, add
virtio_transport_collapse_rx_queue() that walks the receive queue and
re-copies data into compact linear skbs to reduce the overhead.
The collapse is triggered from virtio_transport_recv_enqueue() when
virtio_transport_inc_rx_pkt() fails. A pre-scan counts the eligible bytes
to size each allocation precisely, avoiding waste for isolated small
packets. Partially consumed skbs are kept as-is to preserve
buf_used/fwd_cnt accounting, EOM-marked skbs to maintain SEQPACKET
message boundaries, and skbs already larger than the collapse target
because they already have a good data-to-overhead ratio.
[1] https://github.com/stefano-garzarella/iperf-vsock
Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
Cc: stable@vger.kernel.org
Reported-by: Brien Oberstein <brienpub@gmail.com>
Closes: https://lore.kernel.org/netdev/618701dd023e$063de350$12b9a9f0$@gmail.com/
Tested-by: Brien Oberstein <brienpub@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
net/vmw_vsock/virtio_transport_common.c | 148 +++++++++++++++++++++++-
1 file changed, 146 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 09475007165b..304ea424995d 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -420,6 +420,137 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
return ret;
}
+static bool virtio_transport_can_collapse(struct sk_buff *skb,
+ unsigned int size)
+{
+ /* skbs that are partially consumed, mark a SEQPACKET message boundary,
+ * or are already large enough should not be collapsed: they either
+ * need special accounting, carry protocol state, or already have a
+ * good data-to-overhead ratio.
+ */
+ if (VIRTIO_VSOCK_SKB_CB(skb)->offset)
+ return false;
+ if (le32_to_cpu(virtio_vsock_hdr(skb)->flags) & VIRTIO_VSOCK_SEQ_EOM)
+ return false;
+ if (skb->len >= size)
+ return false;
+ return true;
+}
+
+/* Iterate through the packets in the queue starting from the current skb to
+ * count the number of bytes we can collapse.
+ */
+static unsigned int
+virtio_transport_collapse_size(struct sk_buff *skb,
+ struct sk_buff_head *queue,
+ unsigned int max_size)
+{
+ unsigned int target = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
+
+ while ((skb = skb_peek_next(skb, queue)) &&
+ virtio_transport_can_collapse(skb, max_size)) {
+ unsigned int len = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
+
+ if (len > max_size - target)
+ return target;
+
+ target += len;
+ }
+
+ return target;
+}
+
+/* Called under lock_sock when skb overhead exceeds the budget. */
+static void virtio_transport_collapse_rx_queue(struct virtio_vsock_sock *vvs)
+{
+ /* Use the same linear allocation threshold as virtio_vsock_alloc_skb()
+ * to avoid adding pressure on the page allocator.
+ */
+ unsigned int collapse_max = SKB_MAX_ORDER(VIRTIO_VSOCK_SKB_HEADROOM,
+ PAGE_ALLOC_COSTLY_ORDER);
+ struct sk_buff *skb, *next_skb, *new_skb = NULL;
+ struct sk_buff_head new_queue;
+
+ __skb_queue_head_init(&new_queue);
+
+ skb_queue_walk_safe(&vvs->rx_queue, skb, next_skb) {
+ struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+ u32 src_off = VIRTIO_VSOCK_SKB_CB(skb)->offset;
+ u32 src_len = skb->len - src_off;
+ bool keep = false;
+
+ if (!virtio_transport_can_collapse(skb, collapse_max)) {
+ /* Finalize pending collapsed skb to preserve packet
+ * ordering.
+ */
+ if (new_skb) {
+ __skb_queue_tail(&new_queue, new_skb);
+ new_skb = NULL;
+ }
+ keep = true;
+ goto next;
+ }
+
+ /* Finalize if this packet won't fit in the remaining tailroom,
+ * so we can allocate a right-sized new_skb.
+ */
+ if (new_skb && src_len > skb_tailroom(new_skb)) {
+ __skb_queue_tail(&new_queue, new_skb);
+ new_skb = NULL;
+ }
+
+ if (!new_skb) {
+ unsigned int alloc_size;
+
+ alloc_size = virtio_transport_collapse_size(skb, &vvs->rx_queue,
+ collapse_max);
+
+ /* Only this skb's data is eligible, nothing to merge
+ * with. Keep as-is.
+ */
+ if (alloc_size <= src_len) {
+ keep = true;
+ goto next;
+ }
+
+ new_skb = virtio_vsock_alloc_linear_skb(alloc_size +
+ VIRTIO_VSOCK_SKB_HEADROOM, GFP_KERNEL);
+ if (!new_skb)
+ goto out;
+
+ memcpy(virtio_vsock_hdr(new_skb), hdr,
+ sizeof(struct virtio_vsock_hdr));
+ virtio_vsock_hdr(new_skb)->len = 0;
+ }
+
+ /* Cannot fail since src_off/src_len are within bounds, but if
+ * it does, discard new_skb to avoid queuing corrupted data.
+ */
+ if (WARN_ON_ONCE(skb_copy_bits(skb, src_off,
+ skb_put(new_skb, src_len),
+ src_len))) {
+ kfree_skb(new_skb);
+ new_skb = NULL;
+ goto out;
+ }
+
+ le32_add_cpu(&virtio_vsock_hdr(new_skb)->len, src_len);
+ virtio_vsock_hdr(new_skb)->flags |= hdr->flags;
+
+next:
+ __skb_unlink(skb, &vvs->rx_queue);
+ if (keep)
+ __skb_queue_tail(&new_queue, skb);
+ else
+ consume_skb(skb);
+ }
+out:
+ if (new_skb)
+ __skb_queue_tail(&new_queue, new_skb);
+
+ skb_queue_splice(&new_queue, &vvs->rx_queue);
+}
+
static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
u32 len)
{
@@ -1363,8 +1494,21 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
spin_lock_bh(&vvs->rx_lock);
can_enqueue = virtio_transport_inc_rx_pkt(vvs, len);
- if (!can_enqueue)
- goto out;
+ if (!can_enqueue) {
+ /* Try to collapse the receive queue to reduce skb overhead and
+ * make room for this packet.
+ * Unlock rx_lock since the collapse may sleep or, in any case,
+ * take some time to collapse the skbs, but this is safe, since
+ * sk_lock is held by caller so no one else can enqueue or
+ * dequeue.
+ */
+ spin_unlock_bh(&vvs->rx_lock);
+ virtio_transport_collapse_rx_queue(vvs);
+ spin_lock_bh(&vvs->rx_lock);
+ can_enqueue = virtio_transport_inc_rx_pkt(vvs, len);
+ if (!can_enqueue)
+ goto out;
+ }
if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
vvs->msg_count++;
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH net 2/2] vsock/test: add test for small packets under pressure
2026-06-26 13:48 [PATCH net 0/2] vsock/virtio: collapse receive queue under memory pressure Stefano Garzarella
2026-06-26 13:48 ` [PATCH net 1/2] " Stefano Garzarella
@ 2026-06-26 13:48 ` Stefano Garzarella
1 sibling, 0 replies; 6+ messages in thread
From: Stefano Garzarella @ 2026-06-26 13:48 UTC (permalink / raw)
To: netdev
Cc: Jason Wang, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin, kvm,
virtualization, Xuan Zhuo, Eric Dumazet, Simon Horman,
Stefano Garzarella, linux-kernel, Stefan Hajnoczi,
David S. Miller, Eugenio Pérez
From: Stefano Garzarella <sgarzare@redhat.com>
Add a test that sends 2 MB of data using randomly sized small packets
(129-512 bytes) over a SOCK_STREAM connection. Packets above
GOOD_COPY_LEN (128) bypass the in-place coalescing in recv_enqueue(),
forcing each one into its own skb.
Without receive queue collapsing, the per-skb overhead eventually
exceeds buf_alloc and the connection is reset. The test verifies
that all data arrives and that content integrity is preserved.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
tools/testing/vsock/vsock_test.c | 87 ++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 76be0e4a7f0e..b4ff9f946565 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -2347,6 +2347,88 @@ static void test_stream_tx_credit_bounds_server(const struct test_opts *opts)
close(fd);
}
+/* Test that many small packets don't cause a connection reset under pressure
+ * and that data integrity is preserved. Packet sizes vary randomly between
+ * 129 and 512 bytes, above GOOD_COPY_LEN (128) to bypass in-place coalescing
+ * in recv_enqueue, forcing each one into its own skb. Without receive queue
+ * collapsing, the per-skb overhead eventually exceeds buf_alloc and the
+ * connection is reset.
+ */
+#define COLLAPSE_PKT_MIN 129
+#define COLLAPSE_PKT_MAX 512
+#define COLLAPSE_TOTAL (2 * 1024 * 1024)
+
+static void test_stream_collapse_client(const struct test_opts *opts)
+{
+ unsigned char *data;
+ unsigned long hash;
+ size_t offset = 0;
+ int i, fd;
+
+ data = malloc(COLLAPSE_TOTAL);
+ if (!data) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ for (i = 0; i < COLLAPSE_TOTAL; i++)
+ data[i] = rand() & 0xff;
+
+ fd = vsock_stream_connect(opts->peer_cid, opts->peer_port);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ while (offset < COLLAPSE_TOTAL) {
+ size_t pkt_size = COLLAPSE_PKT_MIN +
+ rand() % (COLLAPSE_PKT_MAX - COLLAPSE_PKT_MIN + 1);
+
+ pkt_size = min(pkt_size, COLLAPSE_TOTAL - offset);
+
+ send_buf(fd, data + offset, pkt_size, 0, pkt_size);
+ offset += pkt_size;
+ }
+
+ hash = hash_djb2(data, COLLAPSE_TOTAL);
+ control_writeulong(hash);
+
+ free(data);
+ close(fd);
+}
+
+static void test_stream_collapse_server(const struct test_opts *opts)
+{
+ unsigned long hash, remote_hash;
+ unsigned char *data;
+ int fd;
+
+ data = malloc(COLLAPSE_TOTAL);
+ if (!data) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, opts->peer_port, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ recv_buf(fd, data, COLLAPSE_TOTAL, 0, COLLAPSE_TOTAL);
+
+ hash = hash_djb2(data, COLLAPSE_TOTAL);
+ remote_hash = control_readulong();
+ if (hash != remote_hash) {
+ fprintf(stderr, "hash mismatch: local %lu remote %lu\n",
+ hash, remote_hash);
+ exit(EXIT_FAILURE);
+ }
+
+ free(data);
+ close(fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -2546,6 +2628,11 @@ static struct test_case test_cases[] = {
.run_client = test_stream_msg_peek_client,
.run_server = test_stream_peek_after_recv_server,
},
+ {
+ .name = "SOCK_STREAM small packets backpressure",
+ .run_client = test_stream_collapse_client,
+ .run_server = test_stream_collapse_server,
+ },
{},
};
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net 1/2] vsock/virtio: collapse receive queue under memory pressure
2026-06-26 13:48 ` [PATCH net 1/2] " Stefano Garzarella
@ 2026-06-30 9:53 ` Paolo Abeni
2026-07-01 10:13 ` Stefano Garzarella
2026-07-01 16:34 ` Bobby Eshleman
1 sibling, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2026-06-30 9:53 UTC (permalink / raw)
To: Stefano Garzarella, netdev
Cc: Jason Wang, Jakub Kicinski, Michael S. Tsirkin, kvm,
virtualization, Xuan Zhuo, Eric Dumazet, Simon Horman,
linux-kernel, Stefan Hajnoczi, David S. Miller,
Eugenio Pérez, stable, Brien Oberstein
On 6/26/26 3:48 PM, Stefano Garzarella wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When many small packets accumulate in the receive queue, the skb overhead
> can exceed buf_alloc even while the payload is within bounds. This causes
> virtio_transport_inc_rx_pkt() to reject packets, leading to connection
> resets during large transfers under backpressure.
>
> The issue was reported by Brien, who has a reproducer, but it is also
> easily reproducible with iperf-vsock [1] using a small packet size:
>
> iperf3 --vsock -c $CID -l 129
>
> which fails immediately without this patch but with commit 059b7dbd20a6
> ("vsock/virtio: fix potential unbounded skb queue").
>
> Inspired by TCP's tcp_collapse() which solves a similar problem, add
> virtio_transport_collapse_rx_queue() that walks the receive queue and
> re-copies data into compact linear skbs to reduce the overhead.
>
> The collapse is triggered from virtio_transport_recv_enqueue() when
> virtio_transport_inc_rx_pkt() fails. A pre-scan counts the eligible bytes
> to size each allocation precisely, avoiding waste for isolated small
> packets. Partially consumed skbs are kept as-is to preserve
> buf_used/fwd_cnt accounting, EOM-marked skbs to maintain SEQPACKET
> message boundaries, and skbs already larger than the collapse target
> because they already have a good data-to-overhead ratio.
>
> [1] https://github.com/stefano-garzarella/iperf-vsock
>
> Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> Cc: stable@vger.kernel.org
> Reported-by: Brien Oberstein <brienpub@gmail.com>
> Closes: https://lore.kernel.org/netdev/618701dd023e$063de350$12b9a9f0$@gmail.com/
> Tested-by: Brien Oberstein <brienpub@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> net/vmw_vsock/virtio_transport_common.c | 148 +++++++++++++++++++++++-
> 1 file changed, 146 insertions(+), 2 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 09475007165b..304ea424995d 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -420,6 +420,137 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> return ret;
> }
>
> +static bool virtio_transport_can_collapse(struct sk_buff *skb,
> + unsigned int size)
Why passing a `size` argument here? AFAICS the actual argument is always
a constant and IMHO rightfully so.
> +{
> + /* skbs that are partially consumed, mark a SEQPACKET message boundary,
> + * or are already large enough should not be collapsed: they either
> + * need special accounting, carry protocol state, or already have a
> + * good data-to-overhead ratio.
> + */
> + if (VIRTIO_VSOCK_SKB_CB(skb)->offset)
> + return false;
> + if (le32_to_cpu(virtio_vsock_hdr(skb)->flags) & VIRTIO_VSOCK_SEQ_EOM)
> + return false;
> + if (skb->len >= size)
> + return false;
> + return true;
> +}
> +
> +/* Iterate through the packets in the queue starting from the current skb to
> + * count the number of bytes we can collapse.
> + */
> +static unsigned int
> +virtio_transport_collapse_size(struct sk_buff *skb,
> + struct sk_buff_head *queue,
> + unsigned int max_size)
> +{
> + unsigned int target = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
> +
> + while ((skb = skb_peek_next(skb, queue)) &&
> + virtio_transport_can_collapse(skb, max_size)) {
> + unsigned int len = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
> +
> + if (len > max_size - target)
> + return target;
> +
> + target += len;
> + }
> +
> + return target;
> +}
> +
> +/* Called under lock_sock when skb overhead exceeds the budget. */
> +static void virtio_transport_collapse_rx_queue(struct virtio_vsock_sock *vvs)
> +{
> + /* Use the same linear allocation threshold as virtio_vsock_alloc_skb()
> + * to avoid adding pressure on the page allocator.
> + */
> + unsigned int collapse_max = SKB_MAX_ORDER(VIRTIO_VSOCK_SKB_HEADROOM,
> + PAGE_ALLOC_COSTLY_ORDER);
> + struct sk_buff *skb, *next_skb, *new_skb = NULL;
> + struct sk_buff_head new_queue;
> +
> + __skb_queue_head_init(&new_queue);
> +
> + skb_queue_walk_safe(&vvs->rx_queue, skb, next_skb) {
If the queue is relevantly big, walking all of it may take a significant
amount of time/cache misses and causes traffic burstines. I think you
could add an additional stop condition, i.e. when the current queue size
is below a reasonable threshold (allowing the current packet to be
inserted plus some more slack).
/P
> + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
> + u32 src_off = VIRTIO_VSOCK_SKB_CB(skb)->offset;
> + u32 src_len = skb->len - src_off;
> + bool keep = false;
> +
> + if (!virtio_transport_can_collapse(skb, collapse_max)) {
Minor nit, possibly something alike the following lead to more
compact/more readable code:
keep = !virtio_transport_can_collapse(skb, collapse_max);
if (keep) {
> + /* Finalize pending collapsed skb to preserve packet
> + * ordering.
> + */
> + if (new_skb) {
> + __skb_queue_tail(&new_queue, new_skb);
> + new_skb = NULL;
> + }
> + keep = true;
> + goto next;
> + }
> +
> + /* Finalize if this packet won't fit in the remaining tailroom,
> + * so we can allocate a right-sized new_skb.
> + */
> + if (new_skb && src_len > skb_tailroom(new_skb)) {
> + __skb_queue_tail(&new_queue, new_skb);
> + new_skb = NULL;
Possibly introduce an helper for the above 2 statements?
/P
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net 1/2] vsock/virtio: collapse receive queue under memory pressure
2026-06-30 9:53 ` Paolo Abeni
@ 2026-07-01 10:13 ` Stefano Garzarella
0 siblings, 0 replies; 6+ messages in thread
From: Stefano Garzarella @ 2026-07-01 10:13 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, Jason Wang, Jakub Kicinski, Michael S. Tsirkin, kvm,
virtualization, Xuan Zhuo, Eric Dumazet, Simon Horman,
linux-kernel, Stefan Hajnoczi, David S. Miller,
Eugenio Pérez, stable, Brien Oberstein
On Tue, Jun 30, 2026 at 11:53:04AM +0200, Paolo Abeni wrote:
>On 6/26/26 3:48 PM, Stefano Garzarella wrote:
>> From: Stefano Garzarella <sgarzare@redhat.com>
>>
>> When many small packets accumulate in the receive queue, the skb overhead
>> can exceed buf_alloc even while the payload is within bounds. This causes
>> virtio_transport_inc_rx_pkt() to reject packets, leading to connection
>> resets during large transfers under backpressure.
>>
>> The issue was reported by Brien, who has a reproducer, but it is also
>> easily reproducible with iperf-vsock [1] using a small packet size:
>>
>> iperf3 --vsock -c $CID -l 129
>>
>> which fails immediately without this patch but with commit 059b7dbd20a6
>> ("vsock/virtio: fix potential unbounded skb queue").
>>
>> Inspired by TCP's tcp_collapse() which solves a similar problem, add
>> virtio_transport_collapse_rx_queue() that walks the receive queue and
>> re-copies data into compact linear skbs to reduce the overhead.
>>
>> The collapse is triggered from virtio_transport_recv_enqueue() when
>> virtio_transport_inc_rx_pkt() fails. A pre-scan counts the eligible bytes
>> to size each allocation precisely, avoiding waste for isolated small
>> packets. Partially consumed skbs are kept as-is to preserve
>> buf_used/fwd_cnt accounting, EOM-marked skbs to maintain SEQPACKET
>> message boundaries, and skbs already larger than the collapse target
>> because they already have a good data-to-overhead ratio.
>>
>> [1] https://github.com/stefano-garzarella/iperf-vsock
>>
>> Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
>> Cc: stable@vger.kernel.org
>> Reported-by: Brien Oberstein <brienpub@gmail.com>
>> Closes: https://lore.kernel.org/netdev/618701dd023e$063de350$12b9a9f0$@gmail.com/
>> Tested-by: Brien Oberstein <brienpub@gmail.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>> net/vmw_vsock/virtio_transport_common.c | 148 +++++++++++++++++++++++-
>> 1 file changed, 146 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 09475007165b..304ea424995d 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -420,6 +420,137 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> return ret;
>> }
>>
>> +static bool virtio_transport_can_collapse(struct sk_buff *skb,
>> + unsigned int size)
>
>Why passing a `size` argument here? AFAICS the actual argument is always
>a constant and IMHO rightfully so.
This comes from a previous implementation where this was not constant.
With the current code, I agree that a macro should be better.
I'll fix it.
>
>> +{
>> + /* skbs that are partially consumed, mark a SEQPACKET message boundary,
>> + * or are already large enough should not be collapsed: they either
>> + * need special accounting, carry protocol state, or already have a
>> + * good data-to-overhead ratio.
>> + */
>> + if (VIRTIO_VSOCK_SKB_CB(skb)->offset)
>> + return false;
>> + if (le32_to_cpu(virtio_vsock_hdr(skb)->flags) & VIRTIO_VSOCK_SEQ_EOM)
>> + return false;
>> + if (skb->len >= size)
>> + return false;
>> + return true;
>> +}
>> +
>> +/* Iterate through the packets in the queue starting from the current skb to
>> + * count the number of bytes we can collapse.
>> + */
>> +static unsigned int
>> +virtio_transport_collapse_size(struct sk_buff *skb,
>> + struct sk_buff_head *queue,
>> + unsigned int max_size)
>> +{
>> + unsigned int target = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
>> +
>> + while ((skb = skb_peek_next(skb, queue)) &&
>> + virtio_transport_can_collapse(skb, max_size)) {
>> + unsigned int len = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
>> +
>> + if (len > max_size - target)
>> + return target;
>> +
>> + target += len;
>> + }
>> +
>> + return target;
>> +}
>> +
>> +/* Called under lock_sock when skb overhead exceeds the budget. */
>> +static void virtio_transport_collapse_rx_queue(struct virtio_vsock_sock *vvs)
>> +{
>> + /* Use the same linear allocation threshold as virtio_vsock_alloc_skb()
>> + * to avoid adding pressure on the page allocator.
>> + */
>> + unsigned int collapse_max = SKB_MAX_ORDER(VIRTIO_VSOCK_SKB_HEADROOM,
>> + PAGE_ALLOC_COSTLY_ORDER);
>> + struct sk_buff *skb, *next_skb, *new_skb = NULL;
>> + struct sk_buff_head new_queue;
>> +
>> + __skb_queue_head_init(&new_queue);
>> +
>> + skb_queue_walk_safe(&vvs->rx_queue, skb, next_skb) {
>
>If the queue is relevantly big, walking all of it may take a significant
>amount of time/cache misses and causes traffic burstines. I think you
>could add an additional stop condition, i.e. when the current queue size
>is below a reasonable threshold (allowing the current packet to be
>inserted plus some more slack).
Makes sense, any suggestion on the threshold?
I was thinking something like this: merge until we have space for at
least 2 skbs (because for now we estimate the overhead based on the
number of skbs, but in the future I'd like to support truesize), but
still trying to fill collapse_max as much as possible.
Does that make sense, or should we be more aggressive?
>
>/P
>
>> + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>> + u32 src_off = VIRTIO_VSOCK_SKB_CB(skb)->offset;
>> + u32 src_len = skb->len - src_off;
>> + bool keep = false;
>> +
>> + if (!virtio_transport_can_collapse(skb, collapse_max)) {
>
>Minor nit, possibly something alike the following lead to more
>compact/more readable code:
>
>
> keep = !virtio_transport_can_collapse(skb, collapse_max);
> if (keep) {
>
Yeah, so I can remove the initialization to false. I'll change it.
>> + /* Finalize pending collapsed skb to preserve packet
>> + * ordering.
>> + */
>> + if (new_skb) {
>> + __skb_queue_tail(&new_queue, new_skb);
>> + new_skb = NULL;
>> + }
>> + keep = true;
>> + goto next;
>> + }
>> +
>> + /* Finalize if this packet won't fit in the remaining tailroom,
>> + * so we can allocate a right-sized new_skb.
>> + */
>> + if (new_skb && src_len > skb_tailroom(new_skb)) {
>> + __skb_queue_tail(&new_queue, new_skb);
>> + new_skb = NULL;
>
>Possibly introduce an helper for the above 2 statements?
Do you mean something like this?
static void virtio_transport_queue_skb(struct sk_buff_head *queue,
struct sk_buff **skb)
{
__skb_queue_tail(queue, *skb);
*skb = NULL;
}
Not sure, just for 2 places, but if you prefer it, I can change.
Thanks,
Stefano
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net 1/2] vsock/virtio: collapse receive queue under memory pressure
2026-06-26 13:48 ` [PATCH net 1/2] " Stefano Garzarella
2026-06-30 9:53 ` Paolo Abeni
@ 2026-07-01 16:34 ` Bobby Eshleman
1 sibling, 0 replies; 6+ messages in thread
From: Bobby Eshleman @ 2026-07-01 16:34 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, Jason Wang, Jakub Kicinski, Paolo Abeni,
Michael S. Tsirkin, kvm, virtualization, Xuan Zhuo, Eric Dumazet,
Simon Horman, linux-kernel, Stefan Hajnoczi, David S. Miller,
Eugenio Pérez, stable, Brien Oberstein
On Fri, Jun 26, 2026 at 03:48:22PM +0200, Stefano Garzarella wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When many small packets accumulate in the receive queue, the skb overhead
> can exceed buf_alloc even while the payload is within bounds. This causes
> virtio_transport_inc_rx_pkt() to reject packets, leading to connection
> resets during large transfers under backpressure.
>
> The issue was reported by Brien, who has a reproducer, but it is also
> easily reproducible with iperf-vsock [1] using a small packet size:
>
> iperf3 --vsock -c $CID -l 129
>
> which fails immediately without this patch but with commit 059b7dbd20a6
> ("vsock/virtio: fix potential unbounded skb queue").
>
> Inspired by TCP's tcp_collapse() which solves a similar problem, add
> virtio_transport_collapse_rx_queue() that walks the receive queue and
> re-copies data into compact linear skbs to reduce the overhead.
>
> The collapse is triggered from virtio_transport_recv_enqueue() when
> virtio_transport_inc_rx_pkt() fails. A pre-scan counts the eligible bytes
> to size each allocation precisely, avoiding waste for isolated small
> packets. Partially consumed skbs are kept as-is to preserve
> buf_used/fwd_cnt accounting, EOM-marked skbs to maintain SEQPACKET
> message boundaries, and skbs already larger than the collapse target
> because they already have a good data-to-overhead ratio.
>
> [1] https://github.com/stefano-garzarella/iperf-vsock
>
> Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> Cc: stable@vger.kernel.org
> Reported-by: Brien Oberstein <brienpub@gmail.com>
> Closes: https://lore.kernel.org/netdev/618701dd023e$063de350$12b9a9f0$@gmail.com/
> Tested-by: Brien Oberstein <brienpub@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> net/vmw_vsock/virtio_transport_common.c | 148 +++++++++++++++++++++++-
> 1 file changed, 146 insertions(+), 2 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 09475007165b..304ea424995d 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -420,6 +420,137 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> return ret;
> }
>
> +static bool virtio_transport_can_collapse(struct sk_buff *skb,
> + unsigned int size)
> +{
> + /* skbs that are partially consumed, mark a SEQPACKET message boundary,
> + * or are already large enough should not be collapsed: they either
> + * need special accounting, carry protocol state, or already have a
> + * good data-to-overhead ratio.
> + */
> + if (VIRTIO_VSOCK_SKB_CB(skb)->offset)
> + return false;
> + if (le32_to_cpu(virtio_vsock_hdr(skb)->flags) & VIRTIO_VSOCK_SEQ_EOM)
> + return false;
> + if (skb->len >= size)
> + return false;
> + return true;
> +}
> +
> +/* Iterate through the packets in the queue starting from the current skb to
> + * count the number of bytes we can collapse.
> + */
> +static unsigned int
> +virtio_transport_collapse_size(struct sk_buff *skb,
> + struct sk_buff_head *queue,
> + unsigned int max_size)
> +{
> + unsigned int target = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
> +
> + while ((skb = skb_peek_next(skb, queue)) &&
> + virtio_transport_can_collapse(skb, max_size)) {
> + unsigned int len = skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset;
> +
> + if (len > max_size - target)
> + return target;
> +
> + target += len;
> + }
> +
> + return target;
> +}
> +
> +/* Called under lock_sock when skb overhead exceeds the budget. */
> +static void virtio_transport_collapse_rx_queue(struct virtio_vsock_sock *vvs)
> +{
> + /* Use the same linear allocation threshold as virtio_vsock_alloc_skb()
> + * to avoid adding pressure on the page allocator.
> + */
> + unsigned int collapse_max = SKB_MAX_ORDER(VIRTIO_VSOCK_SKB_HEADROOM,
> + PAGE_ALLOC_COSTLY_ORDER);
> + struct sk_buff *skb, *next_skb, *new_skb = NULL;
> + struct sk_buff_head new_queue;
> +
> + __skb_queue_head_init(&new_queue);
> +
> + skb_queue_walk_safe(&vvs->rx_queue, skb, next_skb) {
> + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
> + u32 src_off = VIRTIO_VSOCK_SKB_CB(skb)->offset;
> + u32 src_len = skb->len - src_off;
> + bool keep = false;
> +
> + if (!virtio_transport_can_collapse(skb, collapse_max)) {
> + /* Finalize pending collapsed skb to preserve packet
> + * ordering.
> + */
> + if (new_skb) {
> + __skb_queue_tail(&new_queue, new_skb);
> + new_skb = NULL;
> + }
> + keep = true;
> + goto next;
> + }
> +
> + /* Finalize if this packet won't fit in the remaining tailroom,
> + * so we can allocate a right-sized new_skb.
> + */
> + if (new_skb && src_len > skb_tailroom(new_skb)) {
> + __skb_queue_tail(&new_queue, new_skb);
> + new_skb = NULL;
> + }
> +
> + if (!new_skb) {
> + unsigned int alloc_size;
> +
> + alloc_size = virtio_transport_collapse_size(skb, &vvs->rx_queue,
> + collapse_max);
> +
> + /* Only this skb's data is eligible, nothing to merge
> + * with. Keep as-is.
> + */
> + if (alloc_size <= src_len) {
> + keep = true;
> + goto next;
> + }
> +
> + new_skb = virtio_vsock_alloc_linear_skb(alloc_size +
> + VIRTIO_VSOCK_SKB_HEADROOM, GFP_KERNEL);
> + if (!new_skb)
> + goto out;
> +
> + memcpy(virtio_vsock_hdr(new_skb), hdr,
> + sizeof(struct virtio_vsock_hdr));
> + virtio_vsock_hdr(new_skb)->len = 0;
> + }
> +
> + /* Cannot fail since src_off/src_len are within bounds, but if
> + * it does, discard new_skb to avoid queuing corrupted data.
> + */
> + if (WARN_ON_ONCE(skb_copy_bits(skb, src_off,
> + skb_put(new_skb, src_len),
> + src_len))) {
> + kfree_skb(new_skb);
> + new_skb = NULL;
> + goto out;
> + }
> +
> + le32_add_cpu(&virtio_vsock_hdr(new_skb)->len, src_len);
> + virtio_vsock_hdr(new_skb)->flags |= hdr->flags;
> +
> +next:
> + __skb_unlink(skb, &vvs->rx_queue);
> + if (keep)
> + __skb_queue_tail(&new_queue, skb);
> + else
> + consume_skb(skb);
> + }
> +out:
> + if (new_skb)
> + __skb_queue_tail(&new_queue, new_skb);
> +
> + skb_queue_splice(&new_queue, &vvs->rx_queue);
I think the new skbs will also need skb_set_owner_sk_safe(skb, sk)
when adding to rx_queue?
Best,
Bobby
> +}
> +
> static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> u32 len)
> {
> @@ -1363,8 +1494,21 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> spin_lock_bh(&vvs->rx_lock);
>
> can_enqueue = virtio_transport_inc_rx_pkt(vvs, len);
> - if (!can_enqueue)
> - goto out;
> + if (!can_enqueue) {
> + /* Try to collapse the receive queue to reduce skb overhead and
> + * make room for this packet.
> + * Unlock rx_lock since the collapse may sleep or, in any case,
> + * take some time to collapse the skbs, but this is safe, since
> + * sk_lock is held by caller so no one else can enqueue or
> + * dequeue.
> + */
> + spin_unlock_bh(&vvs->rx_lock);
> + virtio_transport_collapse_rx_queue(vvs);
> + spin_lock_bh(&vvs->rx_lock);
> + can_enqueue = virtio_transport_inc_rx_pkt(vvs, len);
> + if (!can_enqueue)
> + goto out;
> + }
>
> if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
> vvs->msg_count++;
> --
> 2.54.0
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-07-01 16:34 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-26 13:48 [PATCH net 0/2] vsock/virtio: collapse receive queue under memory pressure Stefano Garzarella
2026-06-26 13:48 ` [PATCH net 1/2] " Stefano Garzarella
2026-06-30 9:53 ` Paolo Abeni
2026-07-01 10:13 ` Stefano Garzarella
2026-07-01 16:34 ` Bobby Eshleman
2026-06-26 13:48 ` [PATCH net 2/2] vsock/test: add test for small packets under pressure Stefano Garzarella
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox