* [PATCH net] vsock/virtio: fix potential unbounded skb queue
@ 2026-04-30 12:26 Eric Dumazet
2026-05-05 2:20 ` patchwork-bot+netdevbpf
2026-05-05 13:51 ` Stefano Garzarella
0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2026-04-30 12:26 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet, Arseniy Krasnov,
Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization
virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.

virtio_transport_recv_enqueue() skips coalescing for packets
with VIRTIO_VSOCK_SEQ_EOM.

If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
a very large number of packets can be queued
because vvs->rx_bytes stays at 0.

Fix this by estimating the skb metadata size:

  (Number of skbs in the queue) * SKB_TRUESIZE(0)
Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux.dev
---
net/vmw_vsock/virtio_transport_common.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
 					u32 len)
 {
-	if (vvs->rx_bytes + len > vvs->buf_alloc)
+	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
+
+	if (skb_overhead + vvs->rx_bytes + len > vvs->buf_alloc)
 		return false;
 
 	vvs->rx_bytes += len;
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-04-30 12:26 [PATCH net] vsock/virtio: fix potential unbounded skb queue Eric Dumazet
@ 2026-05-05  2:20 ` patchwork-bot+netdevbpf
  2026-05-05 13:51 ` Stefano Garzarella
  1 sibling, 0 replies; 26+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-05  2:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, netdev, eric.dumazet, AVKrasnov,
      stefanha, sgarzare, mst, jasowang, xuanzhuo, eperezma, kvm,
      virtualization

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 30 Apr 2026 12:26:52 +0000 you wrote:
> virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
>
> virtio_transport_recv_enqueue() skips coalescing for packets
> with VIRTIO_VSOCK_SEQ_EOM.
>
> If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
> a very large number of packets can be queued
> because vvs->rx_bytes stays at 0.
>
> [...]

Here is the summary with links:
  - [net] vsock/virtio: fix potential unbounded skb queue
    https://git.kernel.org/netdev/net/c/059b7dbd20a6

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-04-30 12:26 [PATCH net] vsock/virtio: fix potential unbounded skb queue Eric Dumazet
  2026-05-05  2:20 ` patchwork-bot+netdevbpf
@ 2026-05-05 13:51 ` Stefano Garzarella
  2026-05-05 14:14   ` Eric Dumazet
  1 sibling, 1 reply; 26+ messages in thread
From: Stefano Garzarella @ 2026-05-05 13:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
      netdev, eric.dumazet, Arseniy Krasnov, Stefan Hajnoczi,
      Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm,
      virtualization

On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote:
>virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
>
>virtio_transport_recv_enqueue() skips coalescing for packets
>with VIRTIO_VSOCK_SEQ_EOM.
>
>If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
>a very large number of packets can be queued
>because vvs->rx_bytes stays at 0.
>
>Fix this by estimating the skb metadata size:
>
>  (Number of skbs in the queue) * SKB_TRUESIZE(0)
>
>[...]
>
>@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> 					u32 len)
> {
>-	if (vvs->rx_bytes + len > vvs->buf_alloc)
>+	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
>+
>+	if (skb_overhead + vvs->rx_bytes + len > vvs->buf_alloc)
> 		return false;

I'm not sure about this fix, I mean that maybe this is incomplete.
In virtio-vsock, there is a credit mechanism between the two peers:
https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003

This takes only the payload into account, so it’s true that this problem
exists; however, perhaps we should also inform the other peer of a lower
credit balance, otherwise the other peer will believe it has much more
credit than it actually does, send a large payload, and then the packet
will be discarded and the data lost (there are no retransmissions,
etc.).

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-05 13:51 ` Stefano Garzarella
@ 2026-05-05 14:14   ` Eric Dumazet
  2026-05-05 16:11     ` Stefano Garzarella
  2026-05-06 15:15     ` Michael S. Tsirkin
  0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2026-05-05 14:14 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
      netdev, eric.dumazet, Arseniy Krasnov, Stefan Hajnoczi,
      Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm,
      virtualization

On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote:
> >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
> >
> >[...]
> >
> >-	if (vvs->rx_bytes + len > vvs->buf_alloc)
> >+	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> >+
> >+	if (skb_overhead + vvs->rx_bytes + len > vvs->buf_alloc)
> > 		return false;
>
> I'm not sure about this fix, I mean that maybe this is incomplete.
> In virtio-vsock, there is a credit mechanism between the two peers:
> https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003
>
> This takes only the payload into account, so it’s true that this problem
> exists; however, perhaps we should also inform the other peer of a lower
> credit balance, otherwise the other peer will believe it has much more
> credit than it actually does, send a large payload, and then the packet
> will be discarded and the data lost (there are no retransmissions,
> etc.).

I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff
state to account credit") and find a better fix then?

There is always a discrepancy between skb->len and skb->truesize.
You will not be able to announce a 1MB window, and accept one million
skbs of 1 byte each.

This kind of contract is broken.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-05 14:14   ` Eric Dumazet
@ 2026-05-05 16:11     ` Stefano Garzarella
  2026-05-05 16:37       ` Bobby Eshleman
  2026-05-06 15:37       ` Michael S. Tsirkin
  1 sibling, 2 replies; 26+ messages in thread
From: Stefano Garzarella @ 2026-05-05 16:11 UTC (permalink / raw)
  To: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
      Michael S. Tsirkin
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
      netdev, eric.dumazet, Arseniy Krasnov, Stefan Hajnoczi,
      Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm,
      virtualization

On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
>On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote:
>> >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
>> >
>> >[...]
>>
>> I'm not sure about this fix, I mean that maybe this is incomplete.
>> In virtio-vsock, there is a credit mechanism between the two peers:
>> https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003
>>
>> This takes only the payload into account, so it’s true that this problem
>> exists; however, perhaps we should also inform the other peer of a lower
>> credit balance, otherwise the other peer will believe it has much more
>> credit than it actually does, send a large payload, and then the packet
>> will be discarded and the data lost (there are no retransmissions,
>> etc.).
>
>I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff
>state to account credit")
>and find a better fix then?

IIRC the same issue was there before the commit fixed by that one (commit
71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so
not sure about reverting it TBH.

CCing Arseniy and Bobby.

>
>There is always a discrepancy between skb->len and skb->truesize.
>You will not be able to announce a 1MB window, and accept one milliion
>skb of 1-byte each.
>
>This kind of contract is broken.
>

Yep, I agree, but before we start discarding data (and losing it), IMHO we
should at least inform the other peer that we're out of space.

@Stefan, @Michael, do you think we can do something in the spec to avoid
this issue and in some way take into account also the metadata in the
credit. I mean to avoid the 1-byte packets flooding.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-05 16:11     ` Stefano Garzarella
@ 2026-05-05 16:37       ` Bobby Eshleman
  2026-05-06  9:50         ` Arseniy Krasnov
  1 sibling, 1 reply; 26+ messages in thread
From: Bobby Eshleman @ 2026-05-05 16:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
      Michael S. Tsirkin, David S . Miller, Jakub Kicinski, Paolo Abeni,
      Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang,
      Xuan Zhuo, Eugenio Pérez, kvm, virtualization

On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote:
> On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > [...]
> >
> > There is always a discrepancy between skb->len and skb->truesize.
> > You will not be able to announce a 1MB window, and accept one milliion
> > skb of 1-byte each.
> >
> > This kind of contract is broken.
> >
>
> Yep, I agree, but before we start discarding data (and losing it), IMHO we
> should at least inform the other peer that we're out of space.
>
> @Stefan, @Michael, do you think we can do something in the spec to avoid
> this issue and in some way take into account also the metadata in the
> credit. I mean to avoid the 1-byte packets flooding.
>
> Thanks,
> Stefano
>

Indeed the old pre-fix skb code would have the same issue.

I can't think of any way around this without extending the spec.

Best,
Bobby

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-05 16:37       ` Bobby Eshleman
@ 2026-05-06  9:50         ` Arseniy Krasnov
  2026-05-06 14:00           ` Stefano Garzarella
  0 siblings, 1 reply; 26+ messages in thread
From: Arseniy Krasnov @ 2026-05-06  9:50 UTC (permalink / raw)
  To: Bobby Eshleman, Stefano Garzarella
  Cc: Eric Dumazet, Bobby Eshleman, Stefan Hajnoczi, Michael S. Tsirkin,
      David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
      netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
      Eugenio Pérez, kvm, virtualization

05.05.2026 19:37, Bobby Eshleman wrote:
> On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote:
>> [...]
>>
>> CCing Arseniy and Bobby.

Thanks!

>>
>>> There is always a discrepancy between skb->len and skb->truesize.
>>> You will not be able to announce a 1MB window, and accept one milliion
>>> skb of 1-byte each.
>>>
>>> This kind of contract is broken.
>>>
>>
>> Yep, I agree, but before we start discarding data (and losing it), IMHO we
>> should at least inform the other peer that we're out of space.
>>
>> @Stefan, @Michael, do you think we can do something in the spec to avoid
>> this issue and in some way take into account also the metadata in the
>> credit. I mean to avoid the 1-byte packets flooding.
>>
>> Thanks,
>> Stefano
>>
>
> Indeed the old pre-fix skb code would have the same issue.
>
> I can't think of any way around this without extending the spec.

Hi, thanks. I agree with Bobby: accounting for metadata (e.g. the skb size
here) was left out of the credit logic "by design", since the other side of
the data exchange knows nothing about it. The same situation also existed
before the skb implementation was added by Bobby. So it looks like the spec
may need to be updated.

Thanks!

>
> Best,
> Bobby

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-06  9:50         ` Arseniy Krasnov
@ 2026-05-06 14:00           ` Stefano Garzarella
  0 siblings, 0 replies; 26+ messages in thread
From: Stefano Garzarella @ 2026-05-06 14:00 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, Eric Dumazet, Bobby Eshleman, Stefan Hajnoczi,
      Michael S. Tsirkin, David S . Miller, Jakub Kicinski, Paolo Abeni,
      Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang,
      Xuan Zhuo, Eugenio Pérez, kvm, virtualization

On Wed, May 06, 2026 at 12:50:04PM +0300, Arseniy Krasnov wrote:
>05.05.2026 19:37, Bobby Eshleman wrote:
>> On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote:
>>> [...]
>>
>> Indeed the old pre-fix skb code would have the same issue.
>>
>> I can't think of any way around this without extending the spec.
>
>Hi, thanks, agree with Bobby, that accounting metadata (e.g. skb size here)
>was not implemented "by design" in credit logic - another side of data
>exchange knows nothing about that. Also the same situation was before skb
>implementation was added by Bobby. So looks like need to update spec may be.
>

Even if we change the specifications, we still need to work with older
devices, so we should find a solution for this as well.

My main concern is data loss, so I'm considering the following options:

1. Notify the other peer of a smaller buf_alloc from the start, leave some
   room for overhead, and when it's running out, notify them that
   buf_alloc = 0. This way, the peer realizes it can’t send anything else.

2. Or update buf_alloc each time by removing the overhead, similar to
   what’s currently done in virtio_transport_inc_rx_pkt(), but also do it
   in virtio_transport_inc_tx_pkt().

As I said, IMO this patch alone is incomplete; we need to communicate with
the peer somehow regarding space. I don’t think including the overhead in
fwd_cnt is spec compliant, since the other peer has no idea how much
overhead is needed, but reducing buf_alloc should be okay, even though I’m
concerned about packets in flight.

As a quick fix, I think option 2 might be the easiest; I'll run some tests
and send over a patch.

But in the long run, I think we absolutely need to improve memory
management in vsock, perhaps by avoiding custom solutions.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-05 16:11 ` Stefano Garzarella 2026-05-05 16:37 ` Bobby Eshleman @ 2026-05-06 15:37 ` Michael S. Tsirkin 2026-05-07 9:09 ` Stefano Garzarella 1 sibling, 1 reply; 26+ messages in thread From: Michael S. Tsirkin @ 2026-05-06 15:37 UTC (permalink / raw) To: Stefano Garzarella Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote: > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote: > > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote: > > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. > > > > > > > >virtio_transport_recv_enqueue() skips coalescing for packets > > > >with VIRTIO_VSOCK_SEQ_EOM. > > > > > > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, > > > >a very large number of packets can be queued > > > >because vvs->rx_bytes stays at 0. > > > > > > > >Fix this by estimating the skb metadata size: > > > > > > > > (Number of skbs in the queue) * SKB_TRUESIZE(0) > > > > > > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") > > > >Signed-off-by: Eric Dumazet <edumazet@google.com> > > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > >Cc: Stefan Hajnoczi <stefanha@redhat.com> > > > >Cc: Stefano Garzarella <sgarzare@redhat.com> > > > >Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > >Cc: Jason Wang <jasowang@redhat.com> > > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > >Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > >Cc: kvm@vger.kernel.org > > > >Cc: virtualization@lists.linux.dev > > > >--- > > > > net/vmw_vsock/virtio_transport_common.c | 4 +++- > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644 > > > >--- a/net/vmw_vsock/virtio_transport_common.c > > > >+++ b/net/vmw_vsock/virtio_transport_common.c > > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, > > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, > > > > u32 len) > > > > { > > > >- if (vvs->buf_used + len > vvs->buf_alloc) > > > >+ u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0); > > > >+ > > > >+ if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc) > > > > return false; > > > > > > I'm not sure about this fix, I mean that maybe this is incomplete. > > > In virtio-vsock, there is a credit mechanism between the two peers: > > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003 > > > > > > This takes only the payload into account, so it’s true that this problem > > > exists; however, perhaps we should also inform the other peer of a lower > > > credit balance, otherwise the other peer will believe it has much more > > > credit than it actually does, send a large payload, and then the packet > > > will be discarded and the data lost (there are no retransmissions, > > > etc.). > > > > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff > > state to account credit") > > and find a better fix then? 
> > IIRC the same issue was there before the commit fixed by that one (commit > 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so > not sure about reverting it TBH. > > CCing Arseniy and Bobby. > > > > > There is always a discrepancy between skb->len and skb->truesize. > > You will not be able to announce a 1MB window, and accept one milliion > > skb of 1-byte each. > > > > This kind of contract is broken. > > > > Yep, I agree, but before we start discarding data (and losing it), IMHO we > should at least inform the other peer that we're out of space. > > @Stefan, @Michael, do you think we can do something in the spec to avoid > this issue and in some way take into account also the metadata in the > credit. I mean to avoid the 1-byte packets flooding. > > Thanks, > Stefano Why do we need the metadata? Just don't keep it around if you begin running low on memory. -- MST ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-06 15:37 ` Michael S. Tsirkin @ 2026-05-07 9:09 ` Stefano Garzarella 2026-05-07 11:45 ` Michael S. Tsirkin 0 siblings, 1 reply; 26+ messages in thread From: Stefano Garzarella @ 2026-05-07 9:09 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote: >On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote: >> On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote: >> > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote: >> > > >> > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote: >> > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. >> > > > >> > > >virtio_transport_recv_enqueue() skips coalescing for packets >> > > >with VIRTIO_VSOCK_SEQ_EOM. >> > > > >> > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, >> > > >a very large number of packets can be queued >> > > >because vvs->rx_bytes stays at 0. >> > > > >> > > >Fix this by estimating the skb metadata size: >> > > > >> > > > (Number of skbs in the queue) * SKB_TRUESIZE(0) >> > > > >> > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") >> > > >Signed-off-by: Eric Dumazet <edumazet@google.com> >> > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >> > > >Cc: Stefan Hajnoczi <stefanha@redhat.com> >> > > >Cc: Stefano Garzarella <sgarzare@redhat.com> >> > > >Cc: "Michael S. 
Tsirkin" <mst@redhat.com> >> > > >Cc: Jason Wang <jasowang@redhat.com> >> > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >> > > >Cc: "Eugenio Pérez" <eperezma@redhat.com> >> > > >Cc: kvm@vger.kernel.org >> > > >Cc: virtualization@lists.linux.dev >> > > >--- >> > > > net/vmw_vsock/virtio_transport_common.c | 4 +++- >> > > > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > > >> > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c >> > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644 >> > > >--- a/net/vmw_vsock/virtio_transport_common.c >> > > >+++ b/net/vmw_vsock/virtio_transport_common.c >> > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, >> > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, >> > > > u32 len) >> > > > { >> > > >- if (vvs->buf_used + len > vvs->buf_alloc) >> > > >+ u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0); >> > > >+ >> > > >+ if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc) >> > > > return false; >> > > >> > > I'm not sure about this fix, I mean that maybe this is incomplete. >> > > In virtio-vsock, there is a credit mechanism between the two peers: >> > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003 >> > > >> > > This takes only the payload into account, so it’s true that this problem >> > > exists; however, perhaps we should also inform the other peer of a lower >> > > credit balance, otherwise the other peer will believe it has much more >> > > credit than it actually does, send a large payload, and then the packet >> > > will be discarded and the data lost (there are no retransmissions, >> > > etc.). >> > >> > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff >> > state to account credit") >> > and find a better fix then? 
>> >> IIRC the same issue was there before the commit fixed by that one (commit >> 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so >> not sure about reverting it TBH. >> >> CCing Arseniy and Bobby. >> >> > >> > There is always a discrepancy between skb->len and skb->truesize. >> > You will not be able to announce a 1MB window, and accept one milliion >> > skb of 1-byte each. >> > >> > This kind of contract is broken. >> > >> >> Yep, I agree, but before we start discarding data (and losing it), IMHO we >> should at least inform the other peer that we're out of space. >> >> @Stefan, @Michael, do you think we can do something in the spec to avoid >> this issue and in some way take into account also the metadata in the >> credit. I mean to avoid the 1-byte packets flooding. >> >> Thanks, >> Stefano > >Why do we need the metadata? Just don't keep it around if you begin >running low on memory. I don't think removing the skbuffs will be easy; we added them for eBPF, zero-copy, and seqpacket as well. For now, we're already doing something: merging the skbuffs if they don't have EOM set. As a quick fix, I'm thinking of reducing the `buf_alloc` value to account for the overhead and notifying the other peer, at least until we find a better solution. Stefano ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 9:09 ` Stefano Garzarella @ 2026-05-07 11:45 ` Michael S. Tsirkin 2026-05-07 12:59 ` Stefano Garzarella 0 siblings, 1 reply; 26+ messages in thread From: Michael S. Tsirkin @ 2026-05-07 11:45 UTC (permalink / raw) To: Stefano Garzarella Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote: > On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote: > > On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote: > > > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote: > > > > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote: > > > > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. > > > > > > > > > > > >virtio_transport_recv_enqueue() skips coalescing for packets > > > > > >with VIRTIO_VSOCK_SEQ_EOM. > > > > > > > > > > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, > > > > > >a very large number of packets can be queued > > > > > >because vvs->rx_bytes stays at 0. > > > > > > > > > > > >Fix this by estimating the skb metadata size: > > > > > > > > > > > > (Number of skbs in the queue) * SKB_TRUESIZE(0) > > > > > > > > > > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") > > > > > >Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > >Cc: Stefan Hajnoczi <stefanha@redhat.com> > > > > > >Cc: Stefano Garzarella <sgarzare@redhat.com> > > > > > >Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > >Cc: Jason Wang <jasowang@redhat.com> > > > > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > >Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > > > >Cc: kvm@vger.kernel.org > > > > > >Cc: virtualization@lists.linux.dev > > > > > >--- > > > > > > net/vmw_vsock/virtio_transport_common.c | 4 +++- > > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > > > > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644 > > > > > >--- a/net/vmw_vsock/virtio_transport_common.c > > > > > >+++ b/net/vmw_vsock/virtio_transport_common.c > > > > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, > > > > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, > > > > > > u32 len) > > > > > > { > > > > > >- if (vvs->buf_used + len > vvs->buf_alloc) > > > > > >+ u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0); > > > > > >+ > > > > > >+ if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc) > > > > > > return false; > > > > > > > > > > I'm not sure about this fix, I mean that maybe this is incomplete. > > > > > In virtio-vsock, there is a credit mechanism between the two peers: > > > > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003 > > > > > > > > > > This takes only the payload into account, so it’s true that this problem > > > > > exists; however, perhaps we should also inform the other peer of a lower > > > > > credit balance, otherwise the other peer will believe it has much more > > > > > credit than it actually does, send a large payload, and then the packet > > > > > will be discarded and the data lost (there are no retransmissions, > > > > > etc.). 
> > > > > > > > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff > > > > state to account credit") > > > > and find a better fix then? > > > > > > IIRC the same issue was there before the commit fixed by that one (commit > > > 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so > > > not sure about reverting it TBH. > > > > > > CCing Arseniy and Bobby. > > > > > > > > > > > There is always a discrepancy between skb->len and skb->truesize. > > > > You will not be able to announce a 1MB window, and accept one milliion > > > > skb of 1-byte each. > > > > > > > > This kind of contract is broken. > > > > > > > > > > Yep, I agree, but before we start discarding data (and losing it), IMHO we > > > should at least inform the other peer that we're out of space. > > > > > > @Stefan, @Michael, do you think we can do something in the spec to avoid > > > this issue and in some way take into account also the metadata in the > > > credit. I mean to avoid the 1-byte packets flooding. > > > > > > Thanks, > > > Stefano > > > > Why do we need the metadata? Just don't keep it around if you begin > > running low on memory. > > I don't think removing the skuffs will be easy; we added them for ebpf, > zero-copy, and seqpacket as well. You do not need to remove them completely. > For now, we're already doing something: > merging the skuffs if they don't have EOM set. Right that's good. You could go further and merge with EOM too if you stick the info about message boundaries somewhere else. > As a quick fix, I'm thinking of reducing the `buf_alloc` value to account > for the overhead and notifying the other peer, at least until we find a > better solution. > > Stefano well if you want to support pathological cases such as 1 byte messages that would mean like 100x reduction no? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 11:45 ` Michael S. Tsirkin @ 2026-05-07 12:59 ` Stefano Garzarella 2026-05-07 14:33 ` Jakub Kicinski ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Stefano Garzarella @ 2026-05-07 12:59 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote: >On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote: >> On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote: >> > On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote: >> > > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote: >> > > > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote: >> > > > > >> > > > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote: >> > > > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. >> > > > > > >> > > > > >virtio_transport_recv_enqueue() skips coalescing for packets >> > > > > >with VIRTIO_VSOCK_SEQ_EOM. >> > > > > > >> > > > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, >> > > > > >a very large number of packets can be queued >> > > > > >because vvs->rx_bytes stays at 0. >> > > > > > >> > > > > >Fix this by estimating the skb metadata size: >> > > > > > >> > > > > > (Number of skbs in the queue) * SKB_TRUESIZE(0) >> > > > > > >> > > > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") >> > > > > >Signed-off-by: Eric Dumazet <edumazet@google.com> >> > > > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> >> > > > > >Cc: Stefan Hajnoczi <stefanha@redhat.com> >> > > > > >Cc: Stefano Garzarella <sgarzare@redhat.com> >> > > > > >Cc: "Michael S. 
Tsirkin" <mst@redhat.com> >> > > > > >Cc: Jason Wang <jasowang@redhat.com> >> > > > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >> > > > > >Cc: "Eugenio Pérez" <eperezma@redhat.com> >> > > > > >Cc: kvm@vger.kernel.org >> > > > > >Cc: virtualization@lists.linux.dev >> > > > > >--- >> > > > > > net/vmw_vsock/virtio_transport_common.c | 4 +++- >> > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > > > > >> > > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c >> > > > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644 >> > > > > >--- a/net/vmw_vsock/virtio_transport_common.c >> > > > > >+++ b/net/vmw_vsock/virtio_transport_common.c >> > > > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, >> > > > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, >> > > > > > u32 len) >> > > > > > { >> > > > > >- if (vvs->buf_used + len > vvs->buf_alloc) >> > > > > >+ u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0); >> > > > > >+ >> > > > > >+ if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc) >> > > > > > return false; >> > > > > >> > > > > I'm not sure about this fix, I mean that maybe this is incomplete. >> > > > > In virtio-vsock, there is a credit mechanism between the two peers: >> > > > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003 >> > > > > >> > > > > This takes only the payload into account, so it’s true that this problem >> > > > > exists; however, perhaps we should also inform the other peer of a lower >> > > > > credit balance, otherwise the other peer will believe it has much more >> > > > > credit than it actually does, send a large payload, and then the packet >> > > > > will be discarded and the data lost (there are no retransmissions, >> > > > > etc.). 
>> > > > >> > > > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff >> > > > state to account credit") >> > > > and find a better fix then? >> > > >> > > IIRC the same issue was there before the commit fixed by that one (commit >> > > 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so >> > > not sure about reverting it TBH. >> > > >> > > CCing Arseniy and Bobby. >> > > >> > > > >> > > > There is always a discrepancy between skb->len and skb->truesize. >> > > > You will not be able to announce a 1MB window, and accept one milliion >> > > > skb of 1-byte each. >> > > > >> > > > This kind of contract is broken. >> > > > >> > > >> > > Yep, I agree, but before we start discarding data (and losing it), IMHO we >> > > should at least inform the other peer that we're out of space. >> > > >> > > @Stefan, @Michael, do you think we can do something in the spec to avoid >> > > this issue and in some way take into account also the metadata in the >> > > credit. I mean to avoid the 1-byte packets flooding. >> > > >> > > Thanks, >> > > Stefano >> > >> > Why do we need the metadata? Just don't keep it around if you begin >> > running low on memory. >> >> I don't think removing the skuffs will be easy; we added them for ebpf, >> zero-copy, and seqpacket as well. > >You do not need to remove them completely. > >> For now, we're already doing something: >> merging the skuffs if they don't have EOM set. > > >Right that's good. You could go further and merge with EOM too >if you stick the info about message boundaries somewhere else. This adds a lot of complexity IMO, but we can try. Do you have something in mind? > >> As a quick fix, I'm thinking of reducing the `buf_alloc` value to account >> for the overhead and notifying the other peer, at least until we find a >> better solution. >> >> Stefano > >well if you want to support pathological cases such as 1 byte messages >that would mean like 100x reduction no? 
> Yep, but since this patch is already merged, IMHO that is better than losing data in those pathological cases. Thanks, Stefano ^ permalink raw reply [flat|nested] 26+ messages in thread
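[Editorial note] The credit mechanism the thread keeps referring to counts payload bytes only; on the sender side it boils down to arithmetic like the following (field names follow `struct virtio_vsock_sock` in the spirit of `virtio_transport_get_credit()`, but this is an illustration, not kernel code):

```c
#include <stdint.h>

/* Free credit as seen by the sender: the peer's advertised buffer
 * minus bytes in flight (sent but not yet forwarded by the peer).
 * Unsigned 32-bit subtraction keeps this correct across counter
 * wraparound. */
static uint32_t peer_free_credit(uint32_t peer_buf_alloc,
				 uint32_t tx_cnt, uint32_t peer_fwd_cnt)
{
	return peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
}
```

Because only payload counts here, a sender that fully respects this credit can still pin far more receiver memory than `peer_buf_alloc` by splitting its payload into many tiny messages, which is the gap under discussion.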
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 12:59 ` Stefano Garzarella @ 2026-05-07 14:33 ` Jakub Kicinski 2026-05-07 16:05 ` Stefano Garzarella 2026-05-07 14:52 ` Michael S. Tsirkin 2026-05-07 22:48 ` Michael S. Tsirkin 2 siblings, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2026-05-07 14:33 UTC (permalink / raw) To: Stefano Garzarella Cc: Michael S. Tsirkin, Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, 7 May 2026 14:59:13 +0200 Stefano Garzarella wrote: > >well if you want to support pathological cases such as 1 byte messages > >that would mean like 100x reduction no? > > Yep, but since this patch is already merged, IMHO that is better than > losing data in those pathological cases. We can revert if you think that the risk of regression is high.. Please LMK soon, we can do it before patch reaches Linus. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 14:33 ` Jakub Kicinski @ 2026-05-07 16:05 ` Stefano Garzarella 2026-05-07 16:32 ` Eric Dumazet 0 siblings, 1 reply; 26+ messages in thread From: Stefano Garzarella @ 2026-05-07 16:05 UTC (permalink / raw) To: Jakub Kicinski Cc: Michael S. Tsirkin, Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 07:33:40AM -0700, Jakub Kicinski wrote: >On Thu, 7 May 2026 14:59:13 +0200 Stefano Garzarella wrote: >> >well if you want to support pathological cases such as 1 byte messages >> >that would mean like 100x reduction no? >> >> Yep, but since this patch is already merged, IMHO that is better than >> losing data in those pathological cases. > >We can revert if you think that the risk of regression is high.. >Please LMK soon, we can do it before patch reaches Linus. > Some tests in tools/testing/vsock/vsock_test.c are failing with this patch applied. Test 18 is failing sometimes in this way (I guess because we are dropping packets): 18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch Test 22 is failing 100% in this way: 22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed: Resource temporarily unavailable With my followup patch that also advertises the reduced credit to the other peer (still a local draft, waiting for Michael's proposal) I saw test 22 failing, because the test expects that it can use the entire buf_alloc, but now we are reducing it. So IMO we should do as in `__sock_set_rcvbuf()` and double the buffer size, or at least absorb an overhead equal to the buffer size set by the user via SO_VM_SOCKETS_BUFFER_SIZE (yeah, AF_VSOCK has had its own sockopt since the beginning :-(). With that approach the tests are passing, but I'd like to stress that patch a bit more. 
I'll send it tomorrow as a fixup of this patch, or if you prefer to revert, I'll send it as a standalone patch. Thanks, Stefano ^ permalink raw reply [flat|nested] 26+ messages in thread
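[Editorial note] The two options Stefano floats (double the allocation like `__sock_set_rcvbuf()`, or advertise less credit than is allocated) can be sketched as plain arithmetic; both helper names are invented for illustration:

```c
#include <stdint.h>

/* Option A, in the spirit of __sock_set_rcvbuf(): allocate twice the
 * user-requested size, so metadata has headroom equal to the request. */
static uint64_t budget_doubled(uint32_t requested)
{
	return (uint64_t)requested * 2;
}

/* Option B: keep the allocation, but shrink the credit advertised to
 * the peer by the estimated metadata of the skbs already queued
 * (truesize0 approximates SKB_TRUESIZE(0)). */
static uint32_t advertised_credit(uint32_t buf_alloc, uint32_t queued_skbs,
				  uint32_t truesize0)
{
	uint64_t overhead = (uint64_t)queued_skbs * truesize0;

	return overhead >= buf_alloc ? 0 : buf_alloc - (uint32_t)overhead;
}
```

Option A keeps tests that consume the whole advertised `buf_alloc` passing, at the cost of real memory headroom; Option B keeps the allocation honest but shrinks what the peer may send.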
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 16:05 ` Stefano Garzarella @ 2026-05-07 16:32 ` Eric Dumazet 2026-05-07 17:18 ` Jakub Kicinski 0 siblings, 1 reply; 26+ messages in thread From: Eric Dumazet @ 2026-05-07 16:32 UTC (permalink / raw) To: Stefano Garzarella Cc: Jakub Kicinski, Michael S. Tsirkin, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 7, 2026 at 9:05 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Thu, May 07, 2026 at 07:33:40AM -0700, Jakub Kicinski wrote: > >On Thu, 7 May 2026 14:59:13 +0200 Stefano Garzarella wrote: > >> >well if you want to support pathological cases such as 1 byte messages > >> >that would mean like 100x reduction no? > >> > >> Yep, but since this patch is already merged, IMHO that is better than > >> losing data in those pathological cases. > > > >We can revert if you think that the risk of regression is high.. > >Please LMK soon, we can do it before patch reaches Linus. > > > > Some tests in tools/testing/vsock/vsock_test.c are failing with this > patch applied. > > Test 18 are failing sometime in this way (I guess because we are > dropping packets): > > 18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch > > Test 22 is failing 100% in this way: > > 22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed: > Resource temporarily unavailable > > > With my followup patch adding also advertisement to the other peer > (still draft locally, waiting for Michael proposal) I saw 22 failing, > because tests expects that can use the entire buf_alloc, but now we are > reducing it. So IMO we should do like in `__sock_set_rcvbuf()` and > double the buffer size, or at least digest an overhead equal to the > buffer size set by the user via SO_VM_SOCKETS_BUFFER_SIZE (yeah, > AF_VSOCK has it owns sockopt since the beginning :-(). 
> > With that approach tests are passing, but I'd like to stress a bit more > that patch. I'll send it tomorrow as fixup of this patch, or if you > prefer to revert, I'll send as standalone. > A plain revert is a big issue, now users know how to crash hypervisors. This vulnerability allows a compromised guest (controlling virtio_vsock_hdr fields) to continuously flood the host's vsock receive queue without triggering any memory accounting limits or reader wakeups, resulting in unbounded host kernel memory consumption (Host DoS via OOM). A vulnerability where a KVM guest can crash or deadlock its host is classified as a KVM DoS. Am I missing something? ^ permalink raw reply [flat|nested] 26+ messages in thread
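[Editorial note] The flood Eric describes is easy to quantify: each len == 0, EOM-flagged packet pins roughly one skb's worth of kernel memory while vvs->rx_bytes stays 0, so under the pre-patch check the cost grows linearly in the number of packets with no bound. The 768 below is an assumed stand-in for `SKB_TRUESIZE(0)`:

```c
#include <stdint.h>

/* Memory pinned by a flood of zero-length, EOM-flagged packets under
 * the old check: rx_bytes never grows, so nothing limits `packets`.
 * 768 is an assumed per-skb cost standing in for SKB_TRUESIZE(0). */
static uint64_t flood_memory_bytes(uint64_t packets)
{
	return packets * 768u;
}
```

Ten million such packets, well within what a guest can push, would pin on the order of 7 GiB of host memory, which is the OOM scenario described above.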
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 16:32 ` Eric Dumazet @ 2026-05-07 17:18 ` Jakub Kicinski 2026-05-08 9:26 ` Stefano Garzarella 0 siblings, 1 reply; 26+ messages in thread From: Jakub Kicinski @ 2026-05-07 17:18 UTC (permalink / raw) To: Eric Dumazet Cc: Stefano Garzarella, Michael S. Tsirkin, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, 7 May 2026 09:32:24 -0700 Eric Dumazet wrote: > On Thu, May 7, 2026 at 9:05 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Thu, May 07, 2026 at 07:33:40AM -0700, Jakub Kicinski wrote: > > >We can revert if you think that the risk of regression is high.. > > >Please LMK soon, we can do it before patch reaches Linus. > > > > Some tests in tools/testing/vsock/vsock_test.c are failing with this > > patch applied. > > > > Test 18 are failing sometime in this way (I guess because we are > > dropping packets): > > > > 18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch > > > > Test 22 is failing 100% in this way: > > > > 22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed: > > Resource temporarily unavailable > > > > > > With my followup patch adding also advertisement to the other peer > > (still draft locally, waiting for Michael proposal) I saw 22 failing, > > because tests expects that can use the entire buf_alloc, but now we are > > reducing it. So IMO we should do like in `__sock_set_rcvbuf()` and > > double the buffer size, or at least digest an overhead equal to the > > buffer size set by the user via SO_VM_SOCKETS_BUFFER_SIZE (yeah, > > AF_VSOCK has it owns sockopt since the beginning :-(). > > > > With that approach tests are passing, but I'd like to stress a bit more > > that patch. I'll send it tomorrow as fixup of this patch, or if you > > prefer to revert, I'll send as standalone. 
> > A plain revert is a big issue, now users now how to crash hypervisors. > > This vulnerability allows a compromised guest (controlling > virtio_vsock_hdr fields) > to continuously flood the host's vsock receive queue without > triggering any memory > accounting limits or reader wakeups, resulting in unbounded host > kernel memory consumption (Host DoS via OOM). > > A vulnerability where a KVM guest can crash or deadlock its host is > classified as a KVM DoS. > > Am I missing something? Alright, let's leave it. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 17:18 ` Jakub Kicinski @ 2026-05-08 9:26 ` Stefano Garzarella 0 siblings, 0 replies; 26+ messages in thread From: Stefano Garzarella @ 2026-05-08 9:26 UTC (permalink / raw) To: Jakub Kicinski Cc: Eric Dumazet, Michael S. Tsirkin, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 10:18:05AM -0700, Jakub Kicinski wrote: >On Thu, 7 May 2026 09:32:24 -0700 Eric Dumazet wrote: >> On Thu, May 7, 2026 at 9:05 AM Stefano Garzarella <sgarzare@redhat.com> wrote: >> > On Thu, May 07, 2026 at 07:33:40AM -0700, Jakub Kicinski wrote: >> > >We can revert if you think that the risk of regression is high.. >> > >Please LMK soon, we can do it before patch reaches Linus. >> > >> > Some tests in tools/testing/vsock/vsock_test.c are failing with this >> > patch applied. >> > >> > Test 18 are failing sometime in this way (I guess because we are >> > dropping packets): >> > >> > 18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch >> > >> > Test 22 is failing 100% in this way: >> > >> > 22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed: >> > Resource temporarily unavailable >> > >> > >> > With my followup patch adding also advertisement to the other peer >> > (still draft locally, waiting for Michael proposal) I saw 22 failing, >> > because tests expects that can use the entire buf_alloc, but now we are >> > reducing it. So IMO we should do like in `__sock_set_rcvbuf()` and >> > double the buffer size, or at least digest an overhead equal to the >> > buffer size set by the user via SO_VM_SOCKETS_BUFFER_SIZE (yeah, >> > AF_VSOCK has it owns sockopt since the beginning :-(). >> > >> > With that approach tests are passing, but I'd like to stress a bit more >> > that patch. 
I'll send it tomorrow as fixup of this patch, or if you >> > prefer to revert, I'll send as standalone. >> >> A plain revert is a big issue, now users now how to crash hypervisors. >> >> This vulnerability allows a compromised guest (controlling >> virtio_vsock_hdr fields) >> to continuously flood the host's vsock receive queue without >> triggering any memory >> accounting limits or reader wakeups, resulting in unbounded host >> kernel memory consumption (Host DoS via OOM). >> >> A vulnerability where a KVM guest can crash or deadlock its host is >> classified as a KVM DoS. >> >> Am I missing something? > >Alright, let's leave it. > I posted a potential fixup: https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/ Thanks, Stefano ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 12:59 ` Stefano Garzarella 2026-05-07 14:33 ` Jakub Kicinski @ 2026-05-07 14:52 ` Michael S. Tsirkin 2026-05-07 22:48 ` Michael S. Tsirkin 2 siblings, 0 replies; 26+ messages in thread From: Michael S. Tsirkin @ 2026-05-07 14:52 UTC (permalink / raw) To: Stefano Garzarella Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote: > On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote: > > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote: > > > On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote: > > > > On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote: > > > > > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote: > > > > > > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote: > > > > > > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. > > > > > > > > > > > > > > > >virtio_transport_recv_enqueue() skips coalescing for packets > > > > > > > >with VIRTIO_VSOCK_SEQ_EOM. > > > > > > > > > > > > > > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, > > > > > > > >a very large number of packets can be queued > > > > > > > >because vvs->rx_bytes stays at 0. 
> > > > > > > > > > > > > > > >Fix this by estimating the skb metadata size: > > > > > > > > > > > > > > > > (Number of skbs in the queue) * SKB_TRUESIZE(0) > > > > > > > > > > > > > > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") > > > > > > > >Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > >Cc: Stefan Hajnoczi <stefanha@redhat.com> > > > > > > > >Cc: Stefano Garzarella <sgarzare@redhat.com> > > > > > > > >Cc: "Michael S. Tsirkin" <mst@redhat.com> > > > > > > > >Cc: Jason Wang <jasowang@redhat.com> > > > > > > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > >Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > > > > > >Cc: kvm@vger.kernel.org > > > > > > > >Cc: virtualization@lists.linux.dev > > > > > > > >--- > > > > > > > > net/vmw_vsock/virtio_transport_common.c | 4 +++- > > > > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > > > > > > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644 > > > > > > > >--- a/net/vmw_vsock/virtio_transport_common.c > > > > > > > >+++ b/net/vmw_vsock/virtio_transport_common.c > > > > > > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, > > > > > > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, > > > > > > > > u32 len) > > > > > > > > { > > > > > > > >- if (vvs->buf_used + len > vvs->buf_alloc) > > > > > > > >+ u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0); > > > > > > > >+ > > > > > > > >+ if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc) > > > > > > > > return false; > > > > > > > > > > > > > > I'm not sure about this fix, I mean that maybe this is incomplete. 
> > > > > > > In virtio-vsock, there is a credit mechanism between the two peers: > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003 > > > > > > > > > > > > > > This takes only the payload into account, so it’s true that this problem > > > > > > > exists; however, perhaps we should also inform the other peer of a lower > > > > > > > credit balance, otherwise the other peer will believe it has much more > > > > > > > credit than it actually does, send a large payload, and then the packet > > > > > > > will be discarded and the data lost (there are no retransmissions, > > > > > > > etc.). > > > > > > > > > > > > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff > > > > > > state to account credit") > > > > > > and find a better fix then? > > > > > > > > > > IIRC the same issue was there before the commit fixed by that one (commit > > > > > 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so > > > > > not sure about reverting it TBH. > > > > > > > > > > CCing Arseniy and Bobby. > > > > > > > > > > > > > > > > > There is always a discrepancy between skb->len and skb->truesize. > > > > > > You will not be able to announce a 1MB window, and accept one milliion > > > > > > skb of 1-byte each. > > > > > > > > > > > > This kind of contract is broken. > > > > > > > > > > > > > > > > Yep, I agree, but before we start discarding data (and losing it), IMHO we > > > > > should at least inform the other peer that we're out of space. > > > > > > > > > > @Stefan, @Michael, do you think we can do something in the spec to avoid > > > > > this issue and in some way take into account also the metadata in the > > > > > credit. I mean to avoid the 1-byte packets flooding. > > > > > > > > > > Thanks, > > > > > Stefano > > > > > > > > Why do we need the metadata? Just don't keep it around if you begin > > > > running low on memory. 
> > > > > > I don't think removing the skbuffs will be easy; we added them for ebpf, > > > zero-copy, and seqpacket as well. > > > > You do not need to remove them completely. > > > > > For now, we're already doing something: > > > merging the skbuffs if they don't have EOM set. > > > > > > Right, that's good. You could go further and merge with EOM too > > if you stick the info about message boundaries somewhere else. > > This adds a lot of complexity IMO, but we can try. > > Do you have something in mind? I'll send something shortly just to give you an idea. > > > > > As a quick fix, I'm thinking of reducing the `buf_alloc` value to account > > > for the overhead and notifying the other peer, at least until we find a > > > better solution. > > > > > > Stefano > > > > well if you want to support pathological cases such as 1-byte messages > > that would mean like a 100x reduction, no? > > > > Yep, but since this patch is already merged, IMHO that is better than losing > data in those pathological cases. > > Thanks, > Stefano ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 12:59 ` Stefano Garzarella 2026-05-07 14:33 ` Jakub Kicinski 2026-05-07 14:52 ` Michael S. Tsirkin @ 2026-05-07 22:48 ` Michael S. Tsirkin 2026-05-08 9:41 ` Stefano Garzarella 2 siblings, 1 reply; 26+ messages in thread From: Michael S. Tsirkin @ 2026-05-07 22:48 UTC (permalink / raw) To: Stefano Garzarella Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote: > On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote: > > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote: > > > On Wed, May 06, 2026 at 11:37:45AM -0400, Michael S. Tsirkin wrote: > > > > On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote: > > > > > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote: > > > > > > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote: > > > > > > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc. > > > > > > > > > > > > > > > >virtio_transport_recv_enqueue() skips coalescing for packets > > > > > > > >with VIRTIO_VSOCK_SEQ_EOM. > > > > > > > > > > > > > > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM, > > > > > > > >a very large number of packets can be queued > > > > > > > >because vvs->rx_bytes stays at 0. 
> > > > > > > > > > > > > > > >Fix this by estimating the skb metadata size: > > > > > > > > > > > > > > > > (Number of skbs in the queue) * SKB_TRUESIZE(0) > > > > > > > > > > > > > > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit") > > > > > > > >Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru> > > > > > > > >Cc: Stefan Hajnoczi <stefanha@redhat.com> > > > > > > > >Cc: Stefano Garzarella <sgarzare@redhat.com> > > > > > > > >Cc: "Michael S. Tsirkin" <mst@redhat.com> > > > > > > > >Cc: Jason Wang <jasowang@redhat.com> > > > > > > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > >Cc: "Eugenio Pérez" <eperezma@redhat.com> > > > > > > > >Cc: kvm@vger.kernel.org > > > > > > > >Cc: virtualization@lists.linux.dev > > > > > > > >--- > > > > > > > > net/vmw_vsock/virtio_transport_common.c | 4 +++- > > > > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > > > > > > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644 > > > > > > > >--- a/net/vmw_vsock/virtio_transport_common.c > > > > > > > >+++ b/net/vmw_vsock/virtio_transport_common.c > > > > > > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, > > > > > > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, > > > > > > > > u32 len) > > > > > > > > { > > > > > > > >- if (vvs->buf_used + len > vvs->buf_alloc) > > > > > > > >+ u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0); > > > > > > > >+ > > > > > > > >+ if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc) > > > > > > > > return false; > > > > > > > > > > > > > > I'm not sure about this fix, I mean that maybe this is incomplete. 
> > > > > > > In virtio-vsock, there is a credit mechanism between the two peers: > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003 > > > > > > > > > > > > > > This takes only the payload into account, so it’s true that this problem > > > > > > > exists; however, perhaps we should also inform the other peer of a lower > > > > > > > credit balance, otherwise the other peer will believe it has much more > > > > > > > credit than it actually does, send a large payload, and then the packet > > > > > > > will be discarded and the data lost (there are no retransmissions, > > > > > > > etc.). > > > > > > > > > > > > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff > > > > > > state to account credit") > > > > > > and find a better fix then? > > > > > > > > > > IIRC the same issue was there before the commit fixed by that one (commit > > > > > 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so > > > > > not sure about reverting it TBH. > > > > > > > > > > CCing Arseniy and Bobby. > > > > > > > > > > > > > > > > > There is always a discrepancy between skb->len and skb->truesize. > > > > > > You will not be able to announce a 1MB window, and accept one milliion > > > > > > skb of 1-byte each. > > > > > > > > > > > > This kind of contract is broken. > > > > > > > > > > > > > > > > Yep, I agree, but before we start discarding data (and losing it), IMHO we > > > > > should at least inform the other peer that we're out of space. > > > > > > > > > > @Stefan, @Michael, do you think we can do something in the spec to avoid > > > > > this issue and in some way take into account also the metadata in the > > > > > credit. I mean to avoid the 1-byte packets flooding. > > > > > > > > > > Thanks, > > > > > Stefano > > > > > > > > Why do we need the metadata? Just don't keep it around if you begin > > > > running low on memory. 
> > > > > > I don't think removing the skbuffs will be easy; we added them for ebpf, > > > zero-copy, and seqpacket as well. > > > > You do not need to remove them completely. > > > > > For now, we're already doing something: > > > merging the skbuffs if they don't have EOM set. > > > > > > Right, that's good. You could go further and merge with EOM too > > if you stick the info about message boundaries somewhere else. > > This adds a lot of complexity IMO, but we can try. > > Do you have something in mind? BER is clearly overkill, but here's a POC that Claude made for me, just to give you an idea. It clearly has a ton of issues; for example, I dislike how GFP_ATOMIC is handled. Yet it seems to work fine in light testing. --> vsock/virtio: use DWARF ULEB128 to record EOM boundaries, enable cross-EOM skb coalescing virtio_transport_recv_enqueue() currently refuses to coalesce an incoming skb with the previous one when the previous skb carries VIRTIO_VSOCK_SEQ_EOM. This forces one skb per seqpacket message. For workloads with many small or zero-byte messages the per-skb overhead (~960 bytes) dominates, causing unbounded memory growth. Decouple message boundary tracking from the skb structure: store boundary offsets in a compact side buffer using DWARF ULEB128 encoding with the EOR flag folded into the low bit, then allow the data of multiple complete messages to be coalesced into a single skb. Cross-EOM coalescing fires only when: - both the tail skb and the incoming packet carry EOM (complete msgs) - the incoming packet fits in the tail skb's tailroom - no BPF psock is attached (read_skb expects one msg per skb) On allocation failure the code falls back to separate skbs (existing behaviour). Credit accounting is unchanged; the boundary buffer is capped at PAGE_SIZE. Signed-off-by: Michael S.
Tsirkin <mst@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index f91704731057..e36b9ab28372 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -12,6 +12,7 @@ struct virtio_vsock_skb_cb { bool reply; bool tap_delivered; + bool has_boundary_entries; u32 offset; }; @@ -167,6 +168,12 @@ struct virtio_vsock_sock { u32 buf_used; struct sk_buff_head rx_queue; u32 msg_count; + + /* ULEB128-encoded seqpacket message boundary buffer */ + u8 *boundary_buf; + u32 boundary_len; + u32 boundary_alloc; + u32 boundary_off; }; struct virtio_vsock_pkt_info { diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 416d533f493d..81654f70f72c 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -11,6 +11,7 @@ #include <linux/sched/signal.h> #include <linux/ctype.h> #include <linux/list.h> +#include <linux/skmsg.h> #include <linux/virtio_vsock.h> #include <uapi/linux/vsockmon.h> @@ -26,6 +27,91 @@ /* Threshold for detecting small packets to copy */ #define GOOD_COPY_LEN 128 +#define VSOCK_BOUNDARY_BUF_INIT 64 +#define VSOCK_BOUNDARY_BUF_MAX PAGE_SIZE + +/* ULEB128 boundary encoding: value = (msg_len << 1) | eor. + * Each byte carries 7 data bits; bit 7 is set on all but the last byte. + * Max 5 bytes for a u32 msg_len (33 bits with eor shift). 
+ */ +static int vsock_uleb_encode_boundary(u8 *buf, u32 msg_len, bool eor) +{ + u64 val = ((u64)msg_len << 1) | eor; + int n = 0; + + do { + buf[n] = val & 0x7f; + val >>= 7; + if (val) + buf[n] |= 0x80; + n++; + } while (val); + + return n; +} + +static int vsock_uleb_decode_boundary(const u8 *buf, u32 avail, + u32 *msg_len, bool *eor) +{ + u64 val = 0; + int shift = 0; + int n = 0; + + do { + if (n >= avail || shift >= 35) + return -EINVAL; + val |= (u64)(buf[n] & 0x7f) << shift; + shift += 7; + } while (buf[n++] & 0x80); + + *eor = val & 1; + *msg_len = val >> 1; + return n; +} + +static void vsock_boundary_buf_compact(struct virtio_vsock_sock *vvs) +{ + if (vvs->boundary_off == 0) + return; + + vvs->boundary_len -= vvs->boundary_off; + memmove(vvs->boundary_buf, vvs->boundary_buf + vvs->boundary_off, + vvs->boundary_len); + vvs->boundary_off = 0; +} + +static int vsock_boundary_buf_ensure(struct virtio_vsock_sock *vvs, u32 needed) +{ + u32 new_alloc; + u8 *new_buf; + + if (vvs->boundary_alloc >= needed) + return 0; + + /* Reclaim consumed space before growing */ + if (vvs->boundary_off) { + needed -= vvs->boundary_off; + vsock_boundary_buf_compact(vvs); + if (vvs->boundary_alloc >= needed) + return 0; + } + + new_alloc = max(needed, vvs->boundary_alloc ? 
vvs->boundary_alloc * 2 + : VSOCK_BOUNDARY_BUF_INIT); + if (new_alloc > VSOCK_BOUNDARY_BUF_MAX) + new_alloc = VSOCK_BOUNDARY_BUF_MAX; + if (new_alloc < needed) + return -ENOMEM; + + new_buf = krealloc(vvs->boundary_buf, new_alloc, GFP_ATOMIC); + if (!new_buf) + return -ENOMEM; + + vvs->boundary_buf = new_buf; + vvs->boundary_alloc = new_alloc; + return 0; +} + static void virtio_transport_cancel_close_work(struct vsock_sock *vsk, bool cancel_timeout); static s64 virtio_transport_has_space(struct virtio_vsock_sock *vvs); @@ -682,41 +768,74 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk, total = 0; len = msg_data_left(msg); - skb_queue_walk(&vvs->rx_queue, skb) { - struct virtio_vsock_hdr *hdr; + skb = skb_peek(&vvs->rx_queue); + if (skb && VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) { + u32 msg_len, offset; + size_t bytes; + bool eor; + int ret; - if (total < len) { - size_t bytes; + ret = vsock_uleb_decode_boundary( + vvs->boundary_buf + vvs->boundary_off, + vvs->boundary_len - vvs->boundary_off, + &msg_len, &eor); + if (ret < 0) + goto unlock; + + offset = VIRTIO_VSOCK_SKB_CB(skb)->offset; + bytes = min(len, (size_t)msg_len); + + if (bytes) { int err; - bytes = len - total; - if (bytes > skb->len) - bytes = skb->len; - spin_unlock_bh(&vvs->rx_lock); - - /* sk_lock is held by caller so no one else can dequeue. - * Unlock rx_lock since skb_copy_datagram_iter() may sleep. 
- */ - err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset, + err = skb_copy_datagram_iter(skb, offset, &msg->msg_iter, bytes); if (err) return err; - spin_lock_bh(&vvs->rx_lock); } - total += skb->len; - hdr = virtio_vsock_hdr(skb); + total = msg_len; + if (eor) + msg->msg_flags |= MSG_EOR; + } else { + skb_queue_walk(&vvs->rx_queue, skb) { + struct virtio_vsock_hdr *hdr; - if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { - if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) - msg->msg_flags |= MSG_EOR; + if (total < len) { + size_t bytes; + int err; - break; + bytes = len - total; + if (bytes > skb->len) + bytes = skb->len; + + spin_unlock_bh(&vvs->rx_lock); + + err = skb_copy_datagram_iter( + skb, + VIRTIO_VSOCK_SKB_CB(skb)->offset, + &msg->msg_iter, bytes); + if (err) + return err; + + spin_lock_bh(&vvs->rx_lock); + } + + total += skb->len; + hdr = virtio_vsock_hdr(skb); + + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { + if (le32_to_cpu(hdr->flags) & + VIRTIO_VSOCK_SEQ_EOR) + msg->msg_flags |= MSG_EOR; + break; + } } } +unlock: spin_unlock_bh(&vvs->rx_lock); return total; @@ -740,57 +859,105 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, } while (!msg_ready) { - struct virtio_vsock_hdr *hdr; - size_t pkt_len; - - skb = __skb_dequeue(&vvs->rx_queue); + skb = skb_peek(&vvs->rx_queue); if (!skb) break; - hdr = virtio_vsock_hdr(skb); - pkt_len = (size_t)le32_to_cpu(hdr->len); - if (dequeued_len >= 0) { + if (VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) { size_t bytes_to_copy; + u32 msg_len, offset; + bool eor; + int ret; - bytes_to_copy = min(user_buf_len, pkt_len); + ret = vsock_uleb_decode_boundary( + vvs->boundary_buf + vvs->boundary_off, + vvs->boundary_len - vvs->boundary_off, + &msg_len, &eor); + if (ret < 0) + break; + vvs->boundary_off += ret; - if (bytes_to_copy) { + offset = VIRTIO_VSOCK_SKB_CB(skb)->offset; + bytes_to_copy = min(user_buf_len, (size_t)msg_len); + + if (bytes_to_copy && 
dequeued_len >= 0) { int err; - /* sk_lock is held by caller so no one else can dequeue. - * Unlock rx_lock since skb_copy_datagram_iter() may sleep. - */ spin_unlock_bh(&vvs->rx_lock); - - err = skb_copy_datagram_iter(skb, 0, + err = skb_copy_datagram_iter(skb, offset, &msg->msg_iter, bytes_to_copy); - if (err) { - /* Copy of message failed. Rest of - * fragments will be freed without copy. - */ - dequeued_len = err; - } else { - user_buf_len -= bytes_to_copy; - } - spin_lock_bh(&vvs->rx_lock); + if (err) + dequeued_len = err; + else + user_buf_len -= bytes_to_copy; } if (dequeued_len >= 0) - dequeued_len += pkt_len; - } + dequeued_len += msg_len; - if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { + VIRTIO_VSOCK_SKB_CB(skb)->offset += msg_len; msg_ready = true; vvs->msg_count--; - if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) + if (eor) msg->msg_flags |= MSG_EOR; - } - virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len); - kfree_skb(skb); + virtio_transport_dec_rx_pkt(vvs, msg_len, msg_len); + + if (VIRTIO_VSOCK_SKB_CB(skb)->offset >= skb->len) { + __skb_unlink(skb, &vvs->rx_queue); + kfree_skb(skb); + } + + if (vvs->boundary_off >= vvs->boundary_len / 2) + vsock_boundary_buf_compact(vvs); + } else { + struct virtio_vsock_hdr *hdr; + size_t pkt_len; + + skb = __skb_dequeue(&vvs->rx_queue); + if (!skb) + break; + hdr = virtio_vsock_hdr(skb); + pkt_len = (size_t)le32_to_cpu(hdr->len); + + if (dequeued_len >= 0) { + size_t bytes_to_copy; + + bytes_to_copy = min(user_buf_len, pkt_len); + + if (bytes_to_copy) { + int err; + + spin_unlock_bh(&vvs->rx_lock); + err = skb_copy_datagram_iter( + skb, 0, &msg->msg_iter, + bytes_to_copy); + if (err) + dequeued_len = err; + else + user_buf_len -= bytes_to_copy; + spin_lock_bh(&vvs->rx_lock); + } + + if (dequeued_len >= 0) + dequeued_len += pkt_len; + } + + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { + msg_ready = true; + vvs->msg_count--; + + if (le32_to_cpu(hdr->flags) & + VIRTIO_VSOCK_SEQ_EOR) + 
msg->msg_flags |= MSG_EOR; + } + + virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len); + kfree_skb(skb); + } } spin_unlock_bh(&vvs->rx_lock); @@ -1132,6 +1299,7 @@ void virtio_transport_destruct(struct vsock_sock *vsk) virtio_transport_cancel_close_work(vsk, true); + kfree(vvs->boundary_buf); kfree(vvs); vsk->trans = NULL; } @@ -1224,6 +1392,11 @@ static void virtio_transport_remove_sock(struct vsock_sock *vsk) * removing it. */ __skb_queue_purge(&vvs->rx_queue); + kfree(vvs->boundary_buf); + vvs->boundary_buf = NULL; + vvs->boundary_len = 0; + vvs->boundary_alloc = 0; + vvs->boundary_off = 0; vsock_remove_sock(vsk); } @@ -1395,23 +1568,62 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk, !skb_is_nonlinear(skb)) { struct virtio_vsock_hdr *last_hdr; struct sk_buff *last_skb; + bool last_has_eom; + bool has_eom; last_skb = skb_peek_tail(&vvs->rx_queue); last_hdr = virtio_vsock_hdr(last_skb); + last_has_eom = le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM; + has_eom = le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM; - /* If there is space in the last packet queued, we copy the - * new packet in its buffer. We avoid this if the last packet - * queued has VIRTIO_VSOCK_SEQ_EOM set, because this is - * delimiter of SEQPACKET message, so 'pkt' is the first packet - * of a new message. - */ - if (skb->len < skb_tailroom(last_skb) && - !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)) { - memcpy(skb_put(last_skb, skb->len), skb->data, skb->len); - free_pkt = true; - last_hdr->flags |= hdr->flags; - le32_add_cpu(&last_hdr->len, len); - goto out; + if (skb->len < skb_tailroom(last_skb)) { + if (!last_has_eom) { + /* Same-message coalescing (existing path) */ + memcpy(skb_put(last_skb, skb->len), + skb->data, skb->len); + free_pkt = true; + last_hdr->flags |= hdr->flags; + le32_add_cpu(&last_hdr->len, len); + goto out; + } + + /* Cross-EOM: coalesce complete messages into one skb, + * recording message boundaries in a compact BER buffer. 
+ * Only when incoming packet also has EOM (complete msg). + */ + if (has_eom && !sk_psock(sk_vsock(vsk))) { + bool prev_eor, cur_eor; + u8 tmp[12]; + int n = 0; + + cur_eor = le32_to_cpu(hdr->flags) & + VIRTIO_VSOCK_SEQ_EOR; + + if (!VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries) { + u32 prev_len = le32_to_cpu(last_hdr->len); + + prev_eor = le32_to_cpu(last_hdr->flags) & + VIRTIO_VSOCK_SEQ_EOR; + n += vsock_uleb_encode_boundary( + tmp + n, prev_len, prev_eor); + } + n += vsock_uleb_encode_boundary( + tmp + n, len, cur_eor); + + if (!vsock_boundary_buf_ensure( + vvs, vvs->boundary_len + n)) { + memcpy(vvs->boundary_buf + + vvs->boundary_len, tmp, n); + vvs->boundary_len += n; + VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries = true; + memcpy(skb_put(last_skb, skb->len), + skb->data, skb->len); + free_pkt = true; + last_hdr->flags |= hdr->flags; + le32_add_cpu(&last_hdr->len, len); + goto out; + } + } } } ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue 2026-05-07 22:48 ` Michael S. Tsirkin @ 2026-05-08 9:41 ` Stefano Garzarella 2026-05-08 9:43 ` Michael S. Tsirkin 2026-05-08 9:58 ` Michael S. Tsirkin 0 siblings, 2 replies; 26+ messages in thread From: Stefano Garzarella @ 2026-05-08 9:41 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi, David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote: >On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote: >> On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote: >> > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote: [...] >> > > For now, we're already doing something: >> > > merging the skuffs if they don't have EOM set. >> > >> > >> > Right that's good. You could go further and merge with EOM too >> > if you stick the info about message boundaries somewhere else. >> >> This adds a lot of complexity IMO, but we can try. >> >> Do you have something in mind? > >BER is clearly overkill but here's a POC that claude made for me, >just to give u an idea. It's clearly has a ton of issues, >for example I dislike how GFP_ATOMIC is handled. Okay, I somewhat understand, but clearly this isn't net material, so for now I think the best thing to do is to merge the fixup I sent (or something similar): https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/ This is a major change that should be merged with more caution. Could this have too much of an impact on performance? Thanks, Stefano >Yet it seems to work fine in light testing. 
> >--> > > >vsock/virtio: use DWARF ULEB128 to record EOM boundaries, enable cross-EOM skb coalescing > >virtio_transport_recv_enqueue() currently refuses to coalesce an >incoming skb with the previous one when the previous skb carries >VIRTIO_VSOCK_SEQ_EOM. This forces one skb per seqpacket message. >For workloads with many small or zero-byte messages the per-skb >overhead (~960 bytes) dominates, causing unbounded memory growth. > >Decouple message boundary tracking from the skb structure: store >boundary offsets in a compact side buffer using DWARF ULEB128 >encoding with the EOR flag folded into the low bit, then allow >the data of multiple complete messages to be coalesced into a single >skb. > >Cross-EOM coalescing fires only when: >- both the tail skb and the incoming packet carry EOM (complete msgs) >- the incoming packet fits in the tail skb's tailroom >- no BPF psock is attached (read_skb expects one msg per skb) > >On allocation failure the code falls back to separate skbs (existing >behaviour). Credit accounting is unchanged; the boundary buffer is >capped at PAGE_SIZE. > >Signed-off-by: Michael S. 
Tsirkin <mst@redhat.com> >Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> > >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h >index f91704731057..e36b9ab28372 100644 >--- a/include/linux/virtio_vsock.h >+++ b/include/linux/virtio_vsock.h >@@ -12,6 +12,7 @@ > struct virtio_vsock_skb_cb { > bool reply; > bool tap_delivered; >+ bool has_boundary_entries; > u32 offset; > }; > >@@ -167,6 +168,12 @@ struct virtio_vsock_sock { > u32 buf_used; > struct sk_buff_head rx_queue; > u32 msg_count; >+ >+ /* ULEB128-encoded seqpacket message boundary buffer */ >+ u8 *boundary_buf; >+ u32 boundary_len; >+ u32 boundary_alloc; >+ u32 boundary_off; > }; > > struct virtio_vsock_pkt_info { >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c >index 416d533f493d..81654f70f72c 100644 >--- a/net/vmw_vsock/virtio_transport_common.c >+++ b/net/vmw_vsock/virtio_transport_common.c >@@ -11,6 +11,7 @@ > #include <linux/sched/signal.h> > #include <linux/ctype.h> > #include <linux/list.h> >+#include <linux/skmsg.h> > #include <linux/virtio_vsock.h> > #include <uapi/linux/vsockmon.h> > >@@ -26,6 +27,91 @@ > /* Threshold for detecting small packets to copy */ > #define GOOD_COPY_LEN 128 > >+#define VSOCK_BOUNDARY_BUF_INIT 64 >+#define VSOCK_BOUNDARY_BUF_MAX PAGE_SIZE >+ >+/* ULEB128 boundary encoding: value = (msg_len << 1) | eor. >+ * Each byte carries 7 data bits; bit 7 is set on all but the last byte. >+ * Max 5 bytes for a u32 msg_len (33 bits with eor shift). 
>+ */ >+static int vsock_uleb_encode_boundary(u8 *buf, u32 msg_len, bool eor) >+{ >+ u64 val = ((u64)msg_len << 1) | eor; >+ int n = 0; >+ >+ do { >+ buf[n] = val & 0x7f; >+ val >>= 7; >+ if (val) >+ buf[n] |= 0x80; >+ n++; >+ } while (val); >+ >+ return n; >+} >+ >+static int vsock_uleb_decode_boundary(const u8 *buf, u32 avail, >+ u32 *msg_len, bool *eor) >+{ >+ u64 val = 0; >+ int shift = 0; >+ int n = 0; >+ >+ do { >+ if (n >= avail || shift >= 35) >+ return -EINVAL; >+ val |= (u64)(buf[n] & 0x7f) << shift; >+ shift += 7; >+ } while (buf[n++] & 0x80); >+ >+ *eor = val & 1; >+ *msg_len = val >> 1; >+ return n; >+} >+ >+static void vsock_boundary_buf_compact(struct virtio_vsock_sock *vvs) >+{ >+ if (vvs->boundary_off == 0) >+ return; >+ >+ vvs->boundary_len -= vvs->boundary_off; >+ memmove(vvs->boundary_buf, vvs->boundary_buf + vvs->boundary_off, >+ vvs->boundary_len); >+ vvs->boundary_off = 0; >+} >+ >+static int vsock_boundary_buf_ensure(struct virtio_vsock_sock *vvs, u32 needed) >+{ >+ u32 new_alloc; >+ u8 *new_buf; >+ >+ if (vvs->boundary_alloc >= needed) >+ return 0; >+ >+ /* Reclaim consumed space before growing */ >+ if (vvs->boundary_off) { >+ needed -= vvs->boundary_off; >+ vsock_boundary_buf_compact(vvs); >+ if (vvs->boundary_alloc >= needed) >+ return 0; >+ } >+ >+ new_alloc = max(needed, vvs->boundary_alloc ? 
vvs->boundary_alloc * 2 >+ : VSOCK_BOUNDARY_BUF_INIT); >+ if (new_alloc > VSOCK_BOUNDARY_BUF_MAX) >+ new_alloc = VSOCK_BOUNDARY_BUF_MAX; >+ if (new_alloc < needed) >+ return -ENOMEM; >+ >+ new_buf = krealloc(vvs->boundary_buf, new_alloc, GFP_ATOMIC); >+ if (!new_buf) >+ return -ENOMEM; >+ >+ vvs->boundary_buf = new_buf; >+ vvs->boundary_alloc = new_alloc; >+ return 0; >+} >+ > static void virtio_transport_cancel_close_work(struct vsock_sock *vsk, > bool cancel_timeout); > static s64 virtio_transport_has_space(struct virtio_vsock_sock *vvs); >@@ -682,41 +768,74 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk, > total = 0; > len = msg_data_left(msg); > >- skb_queue_walk(&vvs->rx_queue, skb) { >- struct virtio_vsock_hdr *hdr; >+ skb = skb_peek(&vvs->rx_queue); >+ if (skb && VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) { >+ u32 msg_len, offset; >+ size_t bytes; >+ bool eor; >+ int ret; > >- if (total < len) { >- size_t bytes; >+ ret = vsock_uleb_decode_boundary( >+ vvs->boundary_buf + vvs->boundary_off, >+ vvs->boundary_len - vvs->boundary_off, >+ &msg_len, &eor); >+ if (ret < 0) >+ goto unlock; >+ >+ offset = VIRTIO_VSOCK_SKB_CB(skb)->offset; >+ bytes = min(len, (size_t)msg_len); >+ >+ if (bytes) { > int err; > >- bytes = len - total; >- if (bytes > skb->len) >- bytes = skb->len; >- > spin_unlock_bh(&vvs->rx_lock); >- >- /* sk_lock is held by caller so no one else can dequeue. >- * Unlock rx_lock since skb_copy_datagram_iter() may sleep. 
>- */
>- err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset,
>+ err = skb_copy_datagram_iter(skb, offset,
> &msg->msg_iter, bytes);
> if (err)
> return err;
>-
> spin_lock_bh(&vvs->rx_lock);
> }
>
>- total += skb->len;
>- hdr = virtio_vsock_hdr(skb);
>+ total = msg_len;
>+ if (eor)
>+ msg->msg_flags |= MSG_EOR;
>+ } else {
>+ skb_queue_walk(&vvs->rx_queue, skb) {
>+ struct virtio_vsock_hdr *hdr;
>
>- if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
>- if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
>- msg->msg_flags |= MSG_EOR;
>+ if (total < len) {
>+ size_t bytes;
>+ int err;
>
>- break;
>+ bytes = len - total;
>+ if (bytes > skb->len)
>+ bytes = skb->len;
>+
>+ spin_unlock_bh(&vvs->rx_lock);
>+
>+ err = skb_copy_datagram_iter(
>+ skb,
>+ VIRTIO_VSOCK_SKB_CB(skb)->offset,
>+ &msg->msg_iter, bytes);
>+ if (err)
>+ return err;
>+
>+ spin_lock_bh(&vvs->rx_lock);
>+ }
>+
>+ total += skb->len;
>+ hdr = virtio_vsock_hdr(skb);
>+
>+ if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
>+ if (le32_to_cpu(hdr->flags) &
>+ VIRTIO_VSOCK_SEQ_EOR)
>+ msg->msg_flags |= MSG_EOR;
>+ break;
>+ }
> }
> }
>
>+unlock:
> spin_unlock_bh(&vvs->rx_lock);
>
> return total;
>@@ -740,57 +859,105 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
> }
>
> while (!msg_ready) {
>- struct virtio_vsock_hdr *hdr;
>- size_t pkt_len;
>-
>- skb = __skb_dequeue(&vvs->rx_queue);
>+ skb = skb_peek(&vvs->rx_queue);
> if (!skb)
> break;
>- hdr = virtio_vsock_hdr(skb);
>- pkt_len = (size_t)le32_to_cpu(hdr->len);
>
>- if (dequeued_len >= 0) {
>+ if (VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) {
> size_t bytes_to_copy;
>+ u32 msg_len, offset;
>+ bool eor;
>+ int ret;
>
>- bytes_to_copy = min(user_buf_len, pkt_len);
>+ ret = vsock_uleb_decode_boundary(
>+ vvs->boundary_buf + vvs->boundary_off,
>+ vvs->boundary_len - vvs->boundary_off,
>+ &msg_len, &eor);
>+ if (ret < 0)
>+ break;
>+ vvs->boundary_off += ret;
>
>- if (bytes_to_copy) {
>+ offset = VIRTIO_VSOCK_SKB_CB(skb)->offset;
>+ bytes_to_copy = min(user_buf_len, (size_t)msg_len);
>+
>+ if (bytes_to_copy && dequeued_len >= 0) {
> int err;
>
>- /* sk_lock is held by caller so no one else can dequeue.
>- * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
>- */
> spin_unlock_bh(&vvs->rx_lock);
>-
>- err = skb_copy_datagram_iter(skb, 0,
>+ err = skb_copy_datagram_iter(skb, offset,
> &msg->msg_iter,
> bytes_to_copy);
>- if (err) {
>- /* Copy of message failed. Rest of
>- * fragments will be freed without copy.
>- */
>- dequeued_len = err;
>- } else {
>- user_buf_len -= bytes_to_copy;
>- }
>-
> spin_lock_bh(&vvs->rx_lock);
>+ if (err)
>+ dequeued_len = err;
>+ else
>+ user_buf_len -= bytes_to_copy;
> }
>
> if (dequeued_len >= 0)
>- dequeued_len += pkt_len;
>- }
>+ dequeued_len += msg_len;
>
>- if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
>+ VIRTIO_VSOCK_SKB_CB(skb)->offset += msg_len;
> msg_ready = true;
> vvs->msg_count--;
>
>- if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
>+ if (eor)
> msg->msg_flags |= MSG_EOR;
>- }
>
>- virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
>- kfree_skb(skb);
>+ virtio_transport_dec_rx_pkt(vvs, msg_len, msg_len);
>+
>+ if (VIRTIO_VSOCK_SKB_CB(skb)->offset >= skb->len) {
>+ __skb_unlink(skb, &vvs->rx_queue);
>+ kfree_skb(skb);
>+ }
>+
>+ if (vvs->boundary_off >= vvs->boundary_len / 2)
>+ vsock_boundary_buf_compact(vvs);
>+ } else {
>+ struct virtio_vsock_hdr *hdr;
>+ size_t pkt_len;
>+
>+ skb = __skb_dequeue(&vvs->rx_queue);
>+ if (!skb)
>+ break;
>+ hdr = virtio_vsock_hdr(skb);
>+ pkt_len = (size_t)le32_to_cpu(hdr->len);
>+
>+ if (dequeued_len >= 0) {
>+ size_t bytes_to_copy;
>+
>+ bytes_to_copy = min(user_buf_len, pkt_len);
>+
>+ if (bytes_to_copy) {
>+ int err;
>+
>+ spin_unlock_bh(&vvs->rx_lock);
>+ err = skb_copy_datagram_iter(
>+ skb, 0, &msg->msg_iter,
>+ bytes_to_copy);
>+ if (err)
>+ dequeued_len = err;
>+ else
>+ user_buf_len -= bytes_to_copy;
>+ spin_lock_bh(&vvs->rx_lock);
>+ }
>+
>+ if (dequeued_len >= 0)
>+ dequeued_len += pkt_len;
>+ }
>+
>+ if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
>+ msg_ready = true;
>+ vvs->msg_count--;
>+
>+ if (le32_to_cpu(hdr->flags) &
>+ VIRTIO_VSOCK_SEQ_EOR)
>+ msg->msg_flags |= MSG_EOR;
>+ }
>+
>+ virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
>+ kfree_skb(skb);
>+ }
> }
>
> spin_unlock_bh(&vvs->rx_lock);
>@@ -1132,6 +1299,7 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
>
> virtio_transport_cancel_close_work(vsk, true);
>
>+ kfree(vvs->boundary_buf);
> kfree(vvs);
> vsk->trans = NULL;
> }
>@@ -1224,6 +1392,11 @@ static void virtio_transport_remove_sock(struct vsock_sock *vsk)
> * removing it.
> */
> __skb_queue_purge(&vvs->rx_queue);
>+ kfree(vvs->boundary_buf);
>+ vvs->boundary_buf = NULL;
>+ vvs->boundary_len = 0;
>+ vvs->boundary_alloc = 0;
>+ vvs->boundary_off = 0;
> vsock_remove_sock(vsk);
> }
>
>@@ -1395,23 +1568,62 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> !skb_is_nonlinear(skb)) {
> struct virtio_vsock_hdr *last_hdr;
> struct sk_buff *last_skb;
>+ bool last_has_eom;
>+ bool has_eom;
>
> last_skb = skb_peek_tail(&vvs->rx_queue);
> last_hdr = virtio_vsock_hdr(last_skb);
>+ last_has_eom = le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM;
>+ has_eom = le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM;
>
>- /* If there is space in the last packet queued, we copy the
>- * new packet in its buffer. We avoid this if the last packet
>- * queued has VIRTIO_VSOCK_SEQ_EOM set, because this is
>- * delimiter of SEQPACKET message, so 'pkt' is the first packet
>- * of a new message.
>- */
>- if (skb->len < skb_tailroom(last_skb) &&
>- !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)) {
>- memcpy(skb_put(last_skb, skb->len), skb->data, skb->len);
>- free_pkt = true;
>- last_hdr->flags |= hdr->flags;
>- le32_add_cpu(&last_hdr->len, len);
>- goto out;
>+ if (skb->len < skb_tailroom(last_skb)) {
>+ if (!last_has_eom) {
>+ /* Same-message coalescing (existing path) */
>+ memcpy(skb_put(last_skb, skb->len),
>+ skb->data, skb->len);
>+ free_pkt = true;
>+ last_hdr->flags |= hdr->flags;
>+ le32_add_cpu(&last_hdr->len, len);
>+ goto out;
>+ }
>+
>+ /* Cross-EOM: coalesce complete messages into one skb,
>+ * recording message boundaries in a compact BER buffer.
>+ * Only when incoming packet also has EOM (complete msg).
>+ */
>+ if (has_eom && !sk_psock(sk_vsock(vsk))) {
>+ bool prev_eor, cur_eor;
>+ u8 tmp[12];
>+ int n = 0;
>+
>+ cur_eor = le32_to_cpu(hdr->flags) &
>+ VIRTIO_VSOCK_SEQ_EOR;
>+
>+ if (!VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries) {
>+ u32 prev_len = le32_to_cpu(last_hdr->len);
>+
>+ prev_eor = le32_to_cpu(last_hdr->flags) &
>+ VIRTIO_VSOCK_SEQ_EOR;
>+ n += vsock_uleb_encode_boundary(
>+ tmp + n, prev_len, prev_eor);
>+ }
>+ n += vsock_uleb_encode_boundary(
>+ tmp + n, len, cur_eor);
>+
>+ if (!vsock_boundary_buf_ensure(
>+ vvs, vvs->boundary_len + n)) {
>+ memcpy(vvs->boundary_buf +
>+ vvs->boundary_len, tmp, n);
>+ vvs->boundary_len += n;
>+ VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries = true;
>+ memcpy(skb_put(last_skb, skb->len),
>+ skb->data, skb->len);
>+ free_pkt = true;
>+ last_hdr->flags |= hdr->flags;
>+ le32_add_cpu(&last_hdr->len, len);
>+ goto out;
>+ }
>+ }
> }
> }
>

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
2026-05-08 9:41 ` Stefano Garzarella
@ 2026-05-08 9:43 ` Michael S. Tsirkin
2026-05-08 9:58 ` Michael S. Tsirkin
1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2026-05-08 9:43 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
Eugenio Pérez, kvm, virtualization

On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote:
> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
> > > On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
> [...]
>
> > > > > For now, we're already doing something:
> > > > > merging the skbuffs if they don't have EOM set.
> > > >
> > > > Right that's good. You could go further and merge with EOM too
> > > > if you stick the info about message boundaries somewhere else.
> > >
> > > This adds a lot of complexity IMO, but we can try.
> > >
> > > Do you have something in mind?
> >
> > BER is clearly overkill but here's a POC that claude made for me,
> > just to give you an idea. It clearly has a ton of issues,
> > for example I dislike how GFP_ATOMIC is handled.
>
> Okay, I somewhat understand, but clearly this isn't net material, so for now
> I think the best thing to do is to merge the fixup I sent (or something
> similar):
> https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/

will respond there.

> This is a major change that should be merged with more caution.
> Could this have too much of an impact on performance?
>
> Thanks,
> Stefano

Upstream is so broken now, I'd not worry about perf even.

> > Yet it seems to work fine in light testing.
> >
> > -->
> >
> > vsock/virtio: use DWARF ULEB128 to record EOM boundaries, enable cross-EOM skb coalescing
> >
> > virtio_transport_recv_enqueue() currently refuses to coalesce an
> > incoming skb with the previous one when the previous skb carries
> > VIRTIO_VSOCK_SEQ_EOM. This forces one skb per seqpacket message.
> > For workloads with many small or zero-byte messages the per-skb
> > overhead (~960 bytes) dominates, causing unbounded memory growth.
> >
> > Decouple message boundary tracking from the skb structure: store
> > boundary offsets in a compact side buffer using DWARF ULEB128
> > encoding with the EOR flag folded into the low bit, then allow
> > the data of multiple complete messages to be coalesced into a single
> > skb.
> >
> > Cross-EOM coalescing fires only when:
> > - both the tail skb and the incoming packet carry EOM (complete msgs)
> > - the incoming packet fits in the tail skb's tailroom
> > - no BPF psock is attached (read_skb expects one msg per skb)
> >
> > On allocation failure the code falls back to separate skbs (existing
> > behaviour). Credit accounting is unchanged; the boundary buffer is
> > capped at PAGE_SIZE.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index f91704731057..e36b9ab28372 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -12,6 +12,7 @@
> > struct virtio_vsock_skb_cb {
> > bool reply;
> > bool tap_delivered;
> > + bool has_boundary_entries;
> > u32 offset;
> > };
> >
> > @@ -167,6 +168,12 @@ struct virtio_vsock_sock {
> > u32 buf_used;
> > struct sk_buff_head rx_queue;
> > u32 msg_count;
> > +
> > + /* ULEB128-encoded seqpacket message boundary buffer */
> > + u8 *boundary_buf;
> > + u32 boundary_len;
> > + u32 boundary_alloc;
> > + u32 boundary_off;
> > };
> >
> > struct virtio_vsock_pkt_info {
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 416d533f493d..81654f70f72c 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -11,6 +11,7 @@
> > #include <linux/sched/signal.h>
> > #include <linux/ctype.h>
> > #include <linux/list.h>
> > +#include <linux/skmsg.h>
> > #include <linux/virtio_vsock.h>
> > #include <uapi/linux/vsockmon.h>
> >
> > @@ -26,6 +27,91 @@
> > /* Threshold for detecting small packets to copy */
> > #define GOOD_COPY_LEN 128
> >
> > +#define VSOCK_BOUNDARY_BUF_INIT 64
> > +#define VSOCK_BOUNDARY_BUF_MAX PAGE_SIZE
> > +
> > +/* ULEB128 boundary encoding: value = (msg_len << 1) | eor.
> > + * Each byte carries 7 data bits; bit 7 is set on all but the last byte.
> > + * Max 5 bytes for a u32 msg_len (33 bits with eor shift).
> > + */
> > +static int vsock_uleb_encode_boundary(u8 *buf, u32 msg_len, bool eor)
> > +{
> > + u64 val = ((u64)msg_len << 1) | eor;
> > + int n = 0;
> > +
> > + do {
> > + buf[n] = val & 0x7f;
> > + val >>= 7;
> > + if (val)
> > + buf[n] |= 0x80;
> > + n++;
> > + } while (val);
> > +
> > + return n;
> > +}
> > +
> > +static int vsock_uleb_decode_boundary(const u8 *buf, u32 avail,
> > + u32 *msg_len, bool *eor)
> > +{
> > + u64 val = 0;
> > + int shift = 0;
> > + int n = 0;
> > +
> > + do {
> > + if (n >= avail || shift >= 35)
> > + return -EINVAL;
> > + val |= (u64)(buf[n] & 0x7f) << shift;
> > + shift += 7;
> > + } while (buf[n++] & 0x80);
> > +
> > + *eor = val & 1;
> > + *msg_len = val >> 1;
> > + return n;
> > +}
> > +
> > +static void vsock_boundary_buf_compact(struct virtio_vsock_sock *vvs)
> > +{
> > + if (vvs->boundary_off == 0)
> > + return;
> > +
> > + vvs->boundary_len -= vvs->boundary_off;
> > + memmove(vvs->boundary_buf, vvs->boundary_buf + vvs->boundary_off,
> > + vvs->boundary_len);
> > + vvs->boundary_off = 0;
> > +}
> > +
> > +static int vsock_boundary_buf_ensure(struct virtio_vsock_sock *vvs, u32 needed)
> > +{
> > + u32 new_alloc;
> > + u8 *new_buf;
> > +
> > + if (vvs->boundary_alloc >= needed)
> > + return 0;
> > +
> > + /* Reclaim consumed space before growing */
> > + if (vvs->boundary_off) {
> > + needed -= vvs->boundary_off;
> > + vsock_boundary_buf_compact(vvs);
> > + if (vvs->boundary_alloc >= needed)
> > + return 0;
> > + }
> > +
> > + new_alloc = max(needed, vvs->boundary_alloc ? vvs->boundary_alloc * 2
> > + : VSOCK_BOUNDARY_BUF_INIT);
> > + if (new_alloc > VSOCK_BOUNDARY_BUF_MAX)
> > + new_alloc = VSOCK_BOUNDARY_BUF_MAX;
> > + if (new_alloc < needed)
> > + return -ENOMEM;
> > +
> > + new_buf = krealloc(vvs->boundary_buf, new_alloc, GFP_ATOMIC);
> > + if (!new_buf)
> > + return -ENOMEM;
> > +
> > + vvs->boundary_buf = new_buf;
> > + vvs->boundary_alloc = new_alloc;
> > + return 0;
> > +}
> > +
> > static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
> > bool cancel_timeout);
> > static s64 virtio_transport_has_space(struct virtio_vsock_sock *vvs);
> > @@ -682,41 +768,74 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk,
> > total = 0;
> > len = msg_data_left(msg);
> >
> > - skb_queue_walk(&vvs->rx_queue, skb) {
> > - struct virtio_vsock_hdr *hdr;
> > + skb = skb_peek(&vvs->rx_queue);
> > + if (skb && VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) {
> > + u32 msg_len, offset;
> > + size_t bytes;
> > + bool eor;
> > + int ret;
> >
> > - if (total < len) {
> > - size_t bytes;
> > + ret = vsock_uleb_decode_boundary(
> > + vvs->boundary_buf + vvs->boundary_off,
> > + vvs->boundary_len - vvs->boundary_off,
> > + &msg_len, &eor);
> > + if (ret < 0)
> > + goto unlock;
> > +
> > + offset = VIRTIO_VSOCK_SKB_CB(skb)->offset;
> > + bytes = min(len, (size_t)msg_len);
> > +
> > + if (bytes) {
> > int err;
> >
> > - bytes = len - total;
> > - if (bytes > skb->len)
> > - bytes = skb->len;
> > -
> > spin_unlock_bh(&vvs->rx_lock);
> > -
> > - /* sk_lock is held by caller so no one else can dequeue.
> > - * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
> > - */
> > [...]

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
2026-05-08 9:41 ` Stefano Garzarella
2026-05-08 9:43 ` Michael S. Tsirkin
@ 2026-05-08 9:58 ` Michael S. Tsirkin
2026-05-08 10:11 ` Stefano Garzarella
1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2026-05-08 9:58 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
Eugenio Pérez, kvm, virtualization

On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote:
> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
> > > On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
> [...]
>
> > > > > For now, we're already doing something:
> > > > > merging the skbuffs if they don't have EOM set.
> > > >
> > > > Right that's good. You could go further and merge with EOM too
> > > > if you stick the info about message boundaries somewhere else.
> > >
> > > This adds a lot of complexity IMO, but we can try.
> > >
> > > Do you have something in mind?
> >
> > BER is clearly overkill but here's a POC that claude made for me,
> > just to give you an idea. It clearly has a ton of issues,
> > for example I dislike how GFP_ATOMIC is handled.
>
> Okay, I somewhat understand, but clearly this isn't net material

I doubt we have many other options, given that reverting the regression
was ruled out.

> so for now
> I think the best thing to do is to merge the fixup I sent (or something
> similar):
> https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/

I reviewed that one; the problem is it's a spec violation/change that
we'll have to support forever.

> This is a major change that should be merged with more caution.
> Could this have too much of an impact on performance? > > Thanks, > Stefano It's really a POC, real patch is left as an excersise for the reader :). The correct approach IMHO is to only start using this when we wasted a lot of memory on small packets. For example, if sum(truesize) >= buf size. then we'll not see a perf impact unless it's already pathological. > > Yet it seems to work fine in light testing. > > > > --> > > > > > > vsock/virtio: use DWARF ULEB128 to record EOM boundaries, enable cross-EOM skb coalescing > > > > virtio_transport_recv_enqueue() currently refuses to coalesce an > > incoming skb with the previous one when the previous skb carries > > VIRTIO_VSOCK_SEQ_EOM. This forces one skb per seqpacket message. > > For workloads with many small or zero-byte messages the per-skb > > overhead (~960 bytes) dominates, causing unbounded memory growth. > > > > Decouple message boundary tracking from the skb structure: store > > boundary offsets in a compact side buffer using DWARF ULEB128 > > encoding with the EOR flag folded into the low bit, then allow > > the data of multiple complete messages to be coalesced into a single > > skb. > > > > Cross-EOM coalescing fires only when: > > - both the tail skb and the incoming packet carry EOM (complete msgs) > > - the incoming packet fits in the tail skb's tailroom > > - no BPF psock is attached (read_skb expects one msg per skb) > > > > On allocation failure the code falls back to separate skbs (existing > > behaviour). Credit accounting is unchanged; the boundary buffer is > > capped at PAGE_SIZE. > > > > Signed-off-by: Michael S. 
Tsirkin <mst@redhat.com> > > Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> > > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > > index f91704731057..e36b9ab28372 100644 > > --- a/include/linux/virtio_vsock.h > > +++ b/include/linux/virtio_vsock.h > > @@ -12,6 +12,7 @@ > > struct virtio_vsock_skb_cb { > > bool reply; > > bool tap_delivered; > > + bool has_boundary_entries; > > u32 offset; > > }; > > > > @@ -167,6 +168,12 @@ struct virtio_vsock_sock { > > u32 buf_used; > > struct sk_buff_head rx_queue; > > u32 msg_count; > > + > > + /* ULEB128-encoded seqpacket message boundary buffer */ > > + u8 *boundary_buf; > > + u32 boundary_len; > > + u32 boundary_alloc; > > + u32 boundary_off; > > }; > > > > struct virtio_vsock_pkt_info { > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > > index 416d533f493d..81654f70f72c 100644 > > --- a/net/vmw_vsock/virtio_transport_common.c > > +++ b/net/vmw_vsock/virtio_transport_common.c > > @@ -11,6 +11,7 @@ > > #include <linux/sched/signal.h> > > #include <linux/ctype.h> > > #include <linux/list.h> > > +#include <linux/skmsg.h> > > #include <linux/virtio_vsock.h> > > #include <uapi/linux/vsockmon.h> > > > > @@ -26,6 +27,91 @@ > > /* Threshold for detecting small packets to copy */ > > #define GOOD_COPY_LEN 128 > > > > +#define VSOCK_BOUNDARY_BUF_INIT 64 > > +#define VSOCK_BOUNDARY_BUF_MAX PAGE_SIZE > > + > > +/* ULEB128 boundary encoding: value = (msg_len << 1) | eor. > > + * Each byte carries 7 data bits; bit 7 is set on all but the last byte. > > + * Max 5 bytes for a u32 msg_len (33 bits with eor shift). 
> > + */ > > +static int vsock_uleb_encode_boundary(u8 *buf, u32 msg_len, bool eor) > > +{ > > + u64 val = ((u64)msg_len << 1) | eor; > > + int n = 0; > > + > > + do { > > + buf[n] = val & 0x7f; > > + val >>= 7; > > + if (val) > > + buf[n] |= 0x80; > > + n++; > > + } while (val); > > + > > + return n; > > +} > > + > > +static int vsock_uleb_decode_boundary(const u8 *buf, u32 avail, > > + u32 *msg_len, bool *eor) > > +{ > > + u64 val = 0; > > + int shift = 0; > > + int n = 0; > > + > > + do { > > + if (n >= avail || shift >= 35) > > + return -EINVAL; > > + val |= (u64)(buf[n] & 0x7f) << shift; > > + shift += 7; > > + } while (buf[n++] & 0x80); > > + > > + *eor = val & 1; > > + *msg_len = val >> 1; > > + return n; > > +} > > + > > +static void vsock_boundary_buf_compact(struct virtio_vsock_sock *vvs) > > +{ > > + if (vvs->boundary_off == 0) > > + return; > > + > > + vvs->boundary_len -= vvs->boundary_off; > > + memmove(vvs->boundary_buf, vvs->boundary_buf + vvs->boundary_off, > > + vvs->boundary_len); > > + vvs->boundary_off = 0; > > +} > > + > > +static int vsock_boundary_buf_ensure(struct virtio_vsock_sock *vvs, u32 needed) > > +{ > > + u32 new_alloc; > > + u8 *new_buf; > > + > > + if (vvs->boundary_alloc >= needed) > > + return 0; > > + > > + /* Reclaim consumed space before growing */ > > + if (vvs->boundary_off) { > > + needed -= vvs->boundary_off; > > + vsock_boundary_buf_compact(vvs); > > + if (vvs->boundary_alloc >= needed) > > + return 0; > > + } > > + > > + new_alloc = max(needed, vvs->boundary_alloc ? 
vvs->boundary_alloc * 2 > > + : VSOCK_BOUNDARY_BUF_INIT); > > + if (new_alloc > VSOCK_BOUNDARY_BUF_MAX) > > + new_alloc = VSOCK_BOUNDARY_BUF_MAX; > > + if (new_alloc < needed) > > + return -ENOMEM; > > + > > + new_buf = krealloc(vvs->boundary_buf, new_alloc, GFP_ATOMIC); > > + if (!new_buf) > > + return -ENOMEM; > > + > > + vvs->boundary_buf = new_buf; > > + vvs->boundary_alloc = new_alloc; > > + return 0; > > +} > > + > > static void virtio_transport_cancel_close_work(struct vsock_sock *vsk, > > bool cancel_timeout); > > static s64 virtio_transport_has_space(struct virtio_vsock_sock *vvs); > > @@ -682,41 +768,74 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk, > > total = 0; > > len = msg_data_left(msg); > > > > - skb_queue_walk(&vvs->rx_queue, skb) { > > - struct virtio_vsock_hdr *hdr; > > + skb = skb_peek(&vvs->rx_queue); > > + if (skb && VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) { > > + u32 msg_len, offset; > > + size_t bytes; > > + bool eor; > > + int ret; > > > > - if (total < len) { > > - size_t bytes; > > + ret = vsock_uleb_decode_boundary( > > + vvs->boundary_buf + vvs->boundary_off, > > + vvs->boundary_len - vvs->boundary_off, > > + &msg_len, &eor); > > + if (ret < 0) > > + goto unlock; > > + > > + offset = VIRTIO_VSOCK_SKB_CB(skb)->offset; > > + bytes = min(len, (size_t)msg_len); > > + > > + if (bytes) { > > int err; > > > > - bytes = len - total; > > - if (bytes > skb->len) > > - bytes = skb->len; > > - > > spin_unlock_bh(&vvs->rx_lock); > > - > > - /* sk_lock is held by caller so no one else can dequeue. > > - * Unlock rx_lock since skb_copy_datagram_iter() may sleep. 
> > - */ > > - err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset, > > + err = skb_copy_datagram_iter(skb, offset, > > &msg->msg_iter, bytes); > > if (err) > > return err; > > - > > spin_lock_bh(&vvs->rx_lock); > > } > > > > - total += skb->len; > > - hdr = virtio_vsock_hdr(skb); > > + total = msg_len; > > + if (eor) > > + msg->msg_flags |= MSG_EOR; > > + } else { > > + skb_queue_walk(&vvs->rx_queue, skb) { > > + struct virtio_vsock_hdr *hdr; > > > > - if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { > > - if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) > > - msg->msg_flags |= MSG_EOR; > > + if (total < len) { > > + size_t bytes; > > + int err; > > > > - break; > > + bytes = len - total; > > + if (bytes > skb->len) > > + bytes = skb->len; > > + > > + spin_unlock_bh(&vvs->rx_lock); > > + > > + err = skb_copy_datagram_iter( > > + skb, > > + VIRTIO_VSOCK_SKB_CB(skb)->offset, > > + &msg->msg_iter, bytes); > > + if (err) > > + return err; > > + > > + spin_lock_bh(&vvs->rx_lock); > > + } > > + > > + total += skb->len; > > + hdr = virtio_vsock_hdr(skb); > > + > > + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { > > + if (le32_to_cpu(hdr->flags) & > > + VIRTIO_VSOCK_SEQ_EOR) > > + msg->msg_flags |= MSG_EOR; > > + break; > > + } > > } > > } > > > > +unlock: > > spin_unlock_bh(&vvs->rx_lock); > > > > return total; > > @@ -740,57 +859,105 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, > > } > > > > while (!msg_ready) { > > - struct virtio_vsock_hdr *hdr; > > - size_t pkt_len; > > - > > - skb = __skb_dequeue(&vvs->rx_queue); > > + skb = skb_peek(&vvs->rx_queue); > > if (!skb) > > break; > > - hdr = virtio_vsock_hdr(skb); > > - pkt_len = (size_t)le32_to_cpu(hdr->len); > > > > - if (dequeued_len >= 0) { > > + if (VIRTIO_VSOCK_SKB_CB(skb)->has_boundary_entries) { > > size_t bytes_to_copy; > > + u32 msg_len, offset; > > + bool eor; > > + int ret; > > > > - bytes_to_copy = min(user_buf_len, pkt_len); > > + ret = 
vsock_uleb_decode_boundary(
> > +					vvs->boundary_buf + vvs->boundary_off,
> > +					vvs->boundary_len - vvs->boundary_off,
> > +					&msg_len, &eor);
> > +			if (ret < 0)
> > +				break;
> > +			vvs->boundary_off += ret;
> >
> > -		if (bytes_to_copy) {
> > +			offset = VIRTIO_VSOCK_SKB_CB(skb)->offset;
> > +			bytes_to_copy = min(user_buf_len, (size_t)msg_len);
> > +
> > +			if (bytes_to_copy && dequeued_len >= 0) {
> >  				int err;
> >
> > -			/* sk_lock is held by caller so no one else can dequeue.
> > -			 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
> > -			 */
> >  				spin_unlock_bh(&vvs->rx_lock);
> > -
> > -			err = skb_copy_datagram_iter(skb, 0,
> > +				err = skb_copy_datagram_iter(skb, offset,
> >  							     &msg->msg_iter,
> >  							     bytes_to_copy);
> > -			if (err) {
> > -				/* Copy of message failed. Rest of
> > -				 * fragments will be freed without copy.
> > -				 */
> > -				dequeued_len = err;
> > -			} else {
> > -				user_buf_len -= bytes_to_copy;
> > -			}
> > -
> >  				spin_lock_bh(&vvs->rx_lock);
> > +				if (err)
> > +					dequeued_len = err;
> > +				else
> > +					user_buf_len -= bytes_to_copy;
> >  			}
> >
> >  			if (dequeued_len >= 0)
> > -				dequeued_len += pkt_len;
> > -		}
> > +				dequeued_len += msg_len;
> >
> > -		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
> > +			VIRTIO_VSOCK_SKB_CB(skb)->offset += msg_len;
> >  			msg_ready = true;
> >  			vvs->msg_count--;
> >
> > -			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
> > +			if (eor)
> >  				msg->msg_flags |= MSG_EOR;
> > -		}
> >
> > -		virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
> > -		kfree_skb(skb);
> > +			virtio_transport_dec_rx_pkt(vvs, msg_len, msg_len);
> > +
> > +			if (VIRTIO_VSOCK_SKB_CB(skb)->offset >= skb->len) {
> > +				__skb_unlink(skb, &vvs->rx_queue);
> > +				kfree_skb(skb);
> > +			}
> > +
> > +			if (vvs->boundary_off >= vvs->boundary_len / 2)
> > +				vsock_boundary_buf_compact(vvs);
> > +		} else {
> > +			struct virtio_vsock_hdr *hdr;
> > +			size_t pkt_len;
> > +
> > +			skb = __skb_dequeue(&vvs->rx_queue);
> > +			if (!skb)
> > +				break;
> > +			hdr = virtio_vsock_hdr(skb);
> > +			pkt_len = (size_t)le32_to_cpu(hdr->len);
> > +
> > +			if (dequeued_len >= 0) {
> > +				size_t bytes_to_copy;
> > +
> > +				bytes_to_copy = min(user_buf_len, pkt_len);
> > +
> > +				if (bytes_to_copy) {
> > +					int err;
> > +
> > +					spin_unlock_bh(&vvs->rx_lock);
> > +					err = skb_copy_datagram_iter(
> > +						skb, 0, &msg->msg_iter,
> > +						bytes_to_copy);
> > +					if (err)
> > +						dequeued_len = err;
> > +					else
> > +						user_buf_len -= bytes_to_copy;
> > +					spin_lock_bh(&vvs->rx_lock);
> > +				}
> > +
> > +				if (dequeued_len >= 0)
> > +					dequeued_len += pkt_len;
> > +			}
> > +
> > +			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
> > +				msg_ready = true;
> > +				vvs->msg_count--;
> > +
> > +				if (le32_to_cpu(hdr->flags) &
> > +				    VIRTIO_VSOCK_SEQ_EOR)
> > +					msg->msg_flags |= MSG_EOR;
> > +			}
> > +
> > +			virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
> > +			kfree_skb(skb);
> > +		}
> >  	}
> >
> >  	spin_unlock_bh(&vvs->rx_lock);
> > @@ -1132,6 +1299,7 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
> >
> >  	virtio_transport_cancel_close_work(vsk, true);
> >
> > +	kfree(vvs->boundary_buf);
> >  	kfree(vvs);
> >  	vsk->trans = NULL;
> >  }
> > @@ -1224,6 +1392,11 @@ static void virtio_transport_remove_sock(struct vsock_sock *vsk)
> >  	 * removing it.
> >  	 */
> >  	__skb_queue_purge(&vvs->rx_queue);
> > +	kfree(vvs->boundary_buf);
> > +	vvs->boundary_buf = NULL;
> > +	vvs->boundary_len = 0;
> > +	vvs->boundary_alloc = 0;
> > +	vvs->boundary_off = 0;
> >  	vsock_remove_sock(vsk);
> >  }
> >
> > @@ -1395,23 +1568,62 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> >  	    !skb_is_nonlinear(skb)) {
> >  		struct virtio_vsock_hdr *last_hdr;
> >  		struct sk_buff *last_skb;
> > +		bool last_has_eom;
> > +		bool has_eom;
> >
> >  		last_skb = skb_peek_tail(&vvs->rx_queue);
> >  		last_hdr = virtio_vsock_hdr(last_skb);
> > +		last_has_eom = le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM;
> > +		has_eom = le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM;
> >
> > -		/* If there is space in the last packet queued, we copy the
> > -		 * new packet in its buffer. We avoid this if the last packet
> > -		 * queued has VIRTIO_VSOCK_SEQ_EOM set, because this is
> > -		 * delimiter of SEQPACKET message, so 'pkt' is the first packet
> > -		 * of a new message.
> > -		 */
> > -		if (skb->len < skb_tailroom(last_skb) &&
> > -		    !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)) {
> > -			memcpy(skb_put(last_skb, skb->len), skb->data, skb->len);
> > -			free_pkt = true;
> > -			last_hdr->flags |= hdr->flags;
> > -			le32_add_cpu(&last_hdr->len, len);
> > -			goto out;
> > +		if (skb->len < skb_tailroom(last_skb)) {
> > +			if (!last_has_eom) {
> > +				/* Same-message coalescing (existing path) */
> > +				memcpy(skb_put(last_skb, skb->len),
> > +				       skb->data, skb->len);
> > +				free_pkt = true;
> > +				last_hdr->flags |= hdr->flags;
> > +				le32_add_cpu(&last_hdr->len, len);
> > +				goto out;
> > +			}
> > +
> > +			/* Cross-EOM: coalesce complete messages into one skb,
> > +			 * recording message boundaries in a compact BER buffer.
> > +			 * Only when incoming packet also has EOM (complete msg).
> > +			 */
> > +			if (has_eom && !sk_psock(sk_vsock(vsk))) {
> > +				bool prev_eor, cur_eor;
> > +				u8 tmp[12];
> > +				int n = 0;
> > +
> > +				cur_eor = le32_to_cpu(hdr->flags) &
> > +					  VIRTIO_VSOCK_SEQ_EOR;
> > +
> > +				if (!VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries) {
> > +					u32 prev_len = le32_to_cpu(last_hdr->len);
> > +
> > +					prev_eor = le32_to_cpu(last_hdr->flags) &
> > +						   VIRTIO_VSOCK_SEQ_EOR;
> > +					n += vsock_uleb_encode_boundary(
> > +						tmp + n, prev_len, prev_eor);
> > +				}
> > +				n += vsock_uleb_encode_boundary(
> > +					tmp + n, len, cur_eor);
> > +
> > +				if (!vsock_boundary_buf_ensure(
> > +					    vvs, vvs->boundary_len + n)) {
> > +					memcpy(vvs->boundary_buf +
> > +					       vvs->boundary_len, tmp, n);
> > +					vvs->boundary_len += n;
> > +					VIRTIO_VSOCK_SKB_CB(last_skb)->has_boundary_entries = true;
> > +					memcpy(skb_put(last_skb, skb->len),
> > +					       skb->data, skb->len);
> > +					free_pkt = true;
> > +					last_hdr->flags |= hdr->flags;
> > +					le32_add_cpu(&last_hdr->len, len);
> > +					goto out;
> > +				}
> > +			}
> >  		}
> >  	}
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread
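[Editor's sketch] To make the quoted POC concrete, here is a standalone sketch of what a `vsock_uleb_encode_boundary()`/`vsock_uleb_decode_boundary()` pair could look like: one record per coalesced message, ULEB128-encoded, with the low bit of the value carrying the EOR flag and the remaining bits the message length. The names and encoding details are inferred from the diff above, not taken from an actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the boundary helpers in the quoted POC.
 * Encoded value: (len << 1) | eor, emitted as standard ULEB128. */

/* Returns the number of bytes written to out[] (at most 5 for a u32 len). */
int uleb_encode_boundary(uint8_t *out, uint32_t len, int eor)
{
	uint64_t v = ((uint64_t)len << 1) | (eor ? 1 : 0);
	int n = 0;

	do {
		uint8_t b = v & 0x7f;

		v >>= 7;
		if (v)
			b |= 0x80;	/* continuation bit */
		out[n++] = b;
	} while (v);
	return n;
}

/* Returns bytes consumed, or -1 if the record is truncated. */
int uleb_decode_boundary(const uint8_t *in, size_t avail,
			 uint32_t *len, int *eor)
{
	uint64_t v = 0;
	int shift = 0, n = 0;

	while ((size_t)n < avail) {
		uint8_t b = in[n++];

		v |= (uint64_t)(b & 0x7f) << shift;
		if (!(b & 0x80)) {
			*eor = v & 1;
			*len = (uint32_t)(v >> 1);
			return n;
		}
		shift += 7;
	}
	return -1;	/* incomplete entry, caller must wait for more */
}
```

With this layout a message of up to 63 bytes needs a single record byte, which is where the "needn't exceed 50%" overhead figure later in the thread comes from.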
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-08  9:58 ` Michael S. Tsirkin
@ 2026-05-08 10:11   ` Stefano Garzarella
  0 siblings, 0 replies; 26+ messages in thread

From: Stefano Garzarella @ 2026-05-08 10:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, kvm, virtualization

On Fri, May 08, 2026 at 05:58:06AM -0400, Michael S. Tsirkin wrote:
>On Fri, May 08, 2026 at 11:41:21AM +0200, Stefano Garzarella wrote:
>> On Thu, May 07, 2026 at 06:48:47PM -0400, Michael S. Tsirkin wrote:
>> > On Thu, May 07, 2026 at 02:59:13PM +0200, Stefano Garzarella wrote:
>> > > On Thu, May 07, 2026 at 07:45:10AM -0400, Michael S. Tsirkin wrote:
>> > > > On Thu, May 07, 2026 at 11:09:47AM +0200, Stefano Garzarella wrote:
>>
>> [...]
>>
>> > > > > For now, we're already doing something:
>> > > > > merging the skbuffs if they don't have EOM set.
>> > > >
>> > > > Right, that's good. You could go further and merge with EOM too
>> > > > if you stick the info about message boundaries somewhere else.
>> > >
>> > > This adds a lot of complexity IMO, but we can try.
>> > >
>> > > Do you have something in mind?
>> >
>> > BER is clearly overkill, but here's a POC that Claude made for me,
>> > just to give you an idea. It clearly has a ton of issues;
>> > for example, I dislike how GFP_ATOMIC is handled.
>>
>> Okay, I somewhat understand, but clearly this isn't net material
>
>I doubt we have many other options given reverting the regression was
>ruled out.

As Eric pointed out, we can't revert it.

>
>> so for now
>> I think the best thing to do is to merge the fixup I sent (or something
>> similar):
>> https://lore.kernel.org/netdev/20260508092330.69690-1-sgarzare@redhat.com/
>
>I reviewed that one; the problem is it's a spec violation/change that
>we'll have to support forever.

I have a few points to make on this, but let's discuss them there.

>
>> This is a major change that should be merged with more caution.
>> Could this have too much of an impact on performance?
>>
>> Thanks,
>> Stefano
>
>It's really a POC; the real patch is left as an exercise for the reader
>:).

eheh, I see, but honestly, this overcomplication scares me.
I'll try to think it over.

>The correct approach IMHO is to only start using this
>when we have wasted a lot of memory on small packets.
>
>For example, if sum(truesize) >= buf size.
>
>Then we'll not see a perf impact unless it's already pathological.

Agree on this, which is similar to what I'm doing in that patch:
reducing the advertised buf_alloc only in pathological cases
(e.g. overhead > buf_alloc).

Stefano
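[Editor's sketch] The "reduce the advertised buf_alloc only in pathological cases" idea Stefano describes can be modeled in a few lines: leave the advertised buffer untouched while metadata overhead stays within the configured window, and shrink it by the excess only once the overhead exceeds it. The function name and thresholds here are assumptions for illustration, not the actual fixup.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative credit-window shrink: `overhead` is the estimated total
 * skb metadata cost of the rx queue. Normal traffic sees the full
 * window; only pathological traffic (overhead > buf_alloc) loses credit. */
uint32_t advertised_buf_alloc(uint32_t buf_alloc, uint64_t overhead)
{
	if (overhead <= buf_alloc)		/* normal case: no change */
		return buf_alloc;
	if (overhead - buf_alloc >= buf_alloc)	/* fully pathological */
		return 0;
	return buf_alloc - (uint32_t)(overhead - buf_alloc);
}
```

The appeal of this shape is that the fast path is a single comparison, so well-behaved workloads pay nothing.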
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-05 14:14   ` Eric Dumazet
  2026-05-05 16:11     ` Stefano Garzarella
@ 2026-05-06 15:15     ` Michael S. Tsirkin
  2026-05-06 15:38       ` Eric Dumazet
  1 sibling, 1 reply; 26+ messages in thread

From: Michael S. Tsirkin @ 2026-05-06 15:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stefano Garzarella, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Stefan Hajnoczi,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization

On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> There is always a discrepancy between skb->len and skb->truesize.
> You will not be able to announce a 1MB window, and accept one million
> skbs of 1 byte each.

We can if we copy.

> This kind of contract is broken.
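[Editor's sketch] Eric's contract argument is exactly what the fix at the head of this thread enforces: charge an estimated metadata cost per queued skb, so that even zero-length VIRTIO_VSOCK_SEQ_EOM packets consume the window. A userspace model of that check follows; the 768-byte constant merely stands in for SKB_TRUESIZE(0), whose real value depends on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for SKB_TRUESIZE(0); illustrative value only. */
#define FAKE_SKB_TRUESIZE_0 768u

/* Returns 1 if a packet of `len` payload bytes may be queued given the
 * current payload bytes (rx_bytes), queue depth, and advertised buffer. */
int rx_pkt_fits(uint32_t rx_bytes, uint32_t len,
		uint32_t queue_len, uint32_t buf_alloc)
{
	/* One metadata charge per queued skb, plus one for this packet. */
	uint64_t skb_overhead =
		((uint64_t)queue_len + 1) * FAKE_SKB_TRUESIZE_0;

	return (uint64_t)rx_bytes + len + skb_overhead <= buf_alloc;
}
```

With a 1 MiB buf_alloc, a flood of zero-length EOM packets is cut off after roughly buf_alloc / SKB_TRUESIZE(0) skbs instead of growing without bound.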
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-06 15:15     ` Michael S. Tsirkin
@ 2026-05-06 15:38       ` Eric Dumazet
  2026-05-06 15:49         ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread

From: Eric Dumazet @ 2026-05-06 15:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Garzarella, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Stefan Hajnoczi,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization

On Wed, May 6, 2026 at 8:15 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > There is always a discrepancy between skb->len and skb->truesize.
> > You will not be able to announce a 1MB window, and accept one million
> > skbs of 1 byte each.
>
> We can if we copy.

You mean, ignore VIRTIO_VSOCK_SEQ_EOM?
* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
  2026-05-06 15:38       ` Eric Dumazet
@ 2026-05-06 15:49         ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread

From: Michael S. Tsirkin @ 2026-05-06 15:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stefano Garzarella, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, Arseniy Krasnov, Stefan Hajnoczi,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm, virtualization

On Wed, May 06, 2026 at 08:38:39AM -0700, Eric Dumazet wrote:
> On Wed, May 6, 2026 at 8:15 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > > There is always a discrepancy between skb->len and skb->truesize.
> > > You will not be able to announce a 1MB window, and accept one million
> > > skbs of 1 byte each.
> >
> > We can if we copy.
>
> You mean, ignore VIRTIO_VSOCK_SEQ_EOM?

No, I mean saving it in a compact form. If packet boundaries are the
only concern, the overhead needn't exceed 50% even for 1-byte messages.
ASN.1 BER, for an over-engineered example?
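[Editor's sketch] Michael's 50% figure can be checked for a ULEB128-style boundary record (length shifted left one bit to fold in an EOR flag, as in the POC quoted earlier in the thread). The record for an n-byte message takes ceil(bits(2n+1)/7) bytes, so the worst case is one record byte per one payload byte: half of the combined stream. The helper below is hypothetical, for the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed by one ULEB128-style boundary record for a message of
 * `len` payload bytes, assuming the EOR flag bit is set (worst case). */
int boundary_entry_bytes(uint32_t len)
{
	uint64_t v = ((uint64_t)len << 1) | 1;
	int n = 0;

	do {
		v >>= 7;	/* one output byte per 7 value bits */
		n++;
	} while (v);
	return n;
}
```

Messages up to 63 bytes fit in a single record byte, and the record grows only logarithmically after that, so the relative overhead shrinks quickly for larger messages.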
end of thread, other threads:[~2026-05-08 10:11 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2026-04-30 12:26 [PATCH net] vsock/virtio: fix potential unbounded skb queue Eric Dumazet
2026-05-05  2:20 ` patchwork-bot+netdevbpf
2026-05-05 13:51 ` Stefano Garzarella
2026-05-05 14:14 ` Eric Dumazet
2026-05-05 16:11 ` Stefano Garzarella
2026-05-05 16:37 ` Bobby Eshleman
2026-05-06  9:50 ` Arseniy Krasnov
2026-05-06 14:00 ` Stefano Garzarella
2026-05-06 15:37 ` Michael S. Tsirkin
2026-05-07  9:09 ` Stefano Garzarella
2026-05-07 11:45 ` Michael S. Tsirkin
2026-05-07 12:59 ` Stefano Garzarella
2026-05-07 14:33 ` Jakub Kicinski
2026-05-07 16:05 ` Stefano Garzarella
2026-05-07 16:32 ` Eric Dumazet
2026-05-07 17:18 ` Jakub Kicinski
2026-05-08  9:26 ` Stefano Garzarella
2026-05-07 14:52 ` Michael S. Tsirkin
2026-05-07 22:48 ` Michael S. Tsirkin
2026-05-08  9:41 ` Stefano Garzarella
2026-05-08  9:43 ` Michael S. Tsirkin
2026-05-08  9:58 ` Michael S. Tsirkin
2026-05-08 10:11 ` Stefano Garzarella
2026-05-06 15:15 ` Michael S. Tsirkin
2026-05-06 15:38 ` Eric Dumazet
2026-05-06 15:49 ` Michael S. Tsirkin