Netdev List
* [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
@ 2026-05-08  9:23 Stefano Garzarella
  2026-05-08  9:53 ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-08  9:23 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Michael S. Tsirkin, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Stefano Garzarella,
	Eugenio Pérez

From: Stefano Garzarella <sgarzare@redhat.com>

After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
buf_alloc when checking whether a new packet fits. This reduces the
effective receive buffer below what the user configured via
SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
silently dropped and applications that rely on the full buffer size
to deadlock.

Also, the reduced space is not communicated to the remote peer, so
its credit calculation grants more credit than the receiver will
actually accept, causing data loss (there is no retransmission).

This also causes failures in tools/testing/vsock/vsock_test.c.
Test 18 sometimes fails, while test 22 always fails in this way:
    18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch

    22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
    Resource temporarily unavailable

Fix this by introducing virtio_transport_rx_buf_size() to calculate the
size of the RX buffer based on the overhead. Use it in the acceptance
check, the advertised buf_alloc, and the credit update decision.
Use buf_alloc * 2 as total budget (payload + overhead), similar to how
SO_RCVBUF is doubled to reserve space for sk_buff metadata.
The function returns buf_alloc as long as overhead fits within the
reservation, then gradually reduces toward 0 as overhead exceeds
buf_alloc (e.g. under small-packet flooding), informing the peer to
slow down.
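The piecewise behaviour can be illustrated with a standalone userspace
sketch (PER_SKB_OVERHEAD is a made-up stand-in for SKB_TRUESIZE(0),
whose real value depends on struct sk_buff and skb_shared_info):

```c
#include <stdint.h>

/* PER_SKB_OVERHEAD is a hypothetical constant standing in for
 * SKB_TRUESIZE(0); the real value is kernel- and arch-dependent. */
#define PER_SKB_OVERHEAD 768u

/* Sketch of the piecewise budget: return the full buf_alloc while the
 * per-skb overhead fits in the reserved second half of the doubled
 * budget, then ramp linearly down to 0 as overhead keeps growing. */
static uint32_t rx_buf_size(uint32_t buf_alloc, uint32_t queued_skbs)
{
	uint64_t overhead = (uint64_t)(queued_skbs + 1) * PER_SKB_OVERHEAD;
	uint64_t budget = (uint64_t)buf_alloc * 2;

	if (overhead < buf_alloc)
		return buf_alloc;		/* overhead fits the reservation */
	if (overhead < budget)
		return budget - overhead;	/* gradual reduction toward 0 */
	return 0;				/* budget exhausted by overhead */
}
```

With a 256 KiB buf_alloc and the assumed 768-byte overhead, the
advertised window stays at the full 256 KiB for the first ~340 queued
packets and only then starts to shrink.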

Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 31 +++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9b8014516f4f..94a4beb8fd61 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -444,12 +444,32 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	return ret;
 }
 
+/* vvs->rx_lock held by the caller */
+static u32 virtio_transport_rx_buf_size(struct virtio_vsock_sock *vvs)
+{
+	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
+	/* Use buf_alloc * 2 as total budget (payload + overhead), similar to
+	 * how SO_RCVBUF is doubled to reserve space for sk_buff metadata.
+	 */
+	u64 total_budget = (u64)vvs->buf_alloc * 2;
+
+	/* Overhead within buf_alloc: full buf_alloc available for payload */
+	if (skb_overhead < vvs->buf_alloc)
+		return vvs->buf_alloc;
+
+	/* Overhead exceeded buf_alloc: gradually reduce to bound skb queue */
+	if (skb_overhead < total_budget)
+		return total_budget - skb_overhead;
+
+	return 0;
+}
+
 static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
 					u32 len)
 {
-	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
+	u32 rx_buf_size = virtio_transport_rx_buf_size(vvs);
 
-	if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
+	if (!rx_buf_size || vvs->buf_used + len > rx_buf_size)
 		return false;
 
 	vvs->rx_bytes += len;
@@ -472,7 +492,7 @@ void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *
 	spin_lock_bh(&vvs->rx_lock);
 	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
-	hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc);
+	hdr->buf_alloc = cpu_to_le32(virtio_transport_rx_buf_size(vvs));
 	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
@@ -594,6 +614,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	bool low_rx_bytes;
 	int err = -EFAULT;
 	size_t total = 0;
+	u32 rx_buf_size;
 	u32 free_space;
 
 	spin_lock_bh(&vvs->rx_lock);
@@ -639,7 +660,9 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
-	free_space = vvs->buf_alloc - fwd_cnt_delta;
+	rx_buf_size = virtio_transport_rx_buf_size(vvs);
+	free_space = rx_buf_size > fwd_cnt_delta ?
+		     rx_buf_size - fwd_cnt_delta : 0;
 	low_rx_bytes = (vvs->rx_bytes <
 			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread
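For context, the sender-side credit computation that the commit message
refers to can be sketched as follows (simplified and with illustrative
names, modeled on virtio_transport_get_credit):

```c
#include <stdint.h>

/* Simplified sender-side view of virtio-vsock credit: the receiver
 * advertises buf_alloc and fwd_cnt in every packet header, and the
 * sender may have at most buf_alloc - (tx_cnt - fwd_cnt) bytes in
 * flight.  If the receiver's effective buffer shrinks without the
 * advertised buf_alloc changing, this estimate overshoots and the
 * excess bytes are dropped with no retransmission. */
static uint32_t peer_credit(uint32_t peer_buf_alloc, uint32_t tx_cnt,
			    uint32_t peer_fwd_cnt)
{
	uint32_t in_flight = tx_cnt - peer_fwd_cnt; /* u32 wraparound is fine */

	return in_flight < peer_buf_alloc ? peer_buf_alloc - in_flight : 0;
}
```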

* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-08  9:23 [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc Stefano Garzarella
@ 2026-05-08  9:53 ` Michael S. Tsirkin
  2026-05-08 10:01   ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-05-08  9:53 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
> 
> After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
> buf_alloc when checking whether a new packet fits. This reduces the
> effective receive buffer below what the user configured via
> SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
> silently dropped and applications that rely on the full buffer size
> to deadlock.
> 
> Also, the reduced space is not communicated to the remote peer, so
> its credit calculation accounts more credit than the receiver will
> actually accept, causing data loss (there is no retransmission).
> 
> This also causes failures in tools/testing/vsock/vsock_test.c.
> Test 18 sometimes fails, while test 22 always fails in this way:
>     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> 
>     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
>     Resource temporarily unavailable
> 
> Fix this by introducing virtio_transport_rx_buf_size() to calculate the
> size of the RX buffer based on the overhead. Using it in the acceptance
> check, the advertised buf_alloc, and the credit update decision.
> Use buf_alloc * 2 as total budget (payload + overhead), similar to how
> SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> The function returns buf_alloc as long as overhead fits within the
> reservation, then gradually reduces toward 0 as overhead exceeds
> buf_alloc (e.g. under small-packet flooding), informing the peer to
> slow down.
> 
> Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


unfortunately, this is a bit of a spec violation and there is no guarantee
it helps.

a spec violation because the spec says:
Only payload bytes are counted and header bytes are not
included

and the implication is that a side can not reduce its own buf_alloc.

no guarantee because the other side is not required to process your
packets, so it might not see your buf alloc reduction.

as designed in the current spec, you can only increase your buf alloc,
not decrease it.

what can be done:
- more efficient storage for small packets (poc i posted)
- reduce buf alloc ahead of the time

> ---
>  net/vmw_vsock/virtio_transport_common.c | 31 +++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 9b8014516f4f..94a4beb8fd61 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -444,12 +444,32 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	return ret;
>  }
>  
> +/* vvs->rx_lock held by the caller */
> +static u32 virtio_transport_rx_buf_size(struct virtio_vsock_sock *vvs)
> +{
> +	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> +	/* Use buf_alloc * 2 as total budget (payload + overhead), similar to
> +	 * how SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> +	 */
> +	u64 total_budget = (u64)vvs->buf_alloc * 2;
> +
> +	/* Overhead within buf_alloc: full buf_alloc available for payload */
> +	if (skb_overhead < vvs->buf_alloc)
> +		return vvs->buf_alloc;
> +
> +	/* Overhead exceeded buf_alloc: gradually reduce to bound skb queue */
> +	if (skb_overhead < total_budget)
> +		return total_budget - skb_overhead;
> +
> +	return 0;
> +}
> +
>  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
>  					u32 len)
>  {
> -	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> +	u32 rx_buf_size = virtio_transport_rx_buf_size(vvs);
>  
> -	if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
> +	if (!rx_buf_size || vvs->buf_used + len > rx_buf_size)
>  		return false;
>  
>  	vvs->rx_bytes += len;
> @@ -472,7 +492,7 @@ void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *
>  	spin_lock_bh(&vvs->rx_lock);
>  	vvs->last_fwd_cnt = vvs->fwd_cnt;
>  	hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> -	hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc);
> +	hdr->buf_alloc = cpu_to_le32(virtio_transport_rx_buf_size(vvs));
>  	spin_unlock_bh(&vvs->rx_lock);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> @@ -594,6 +614,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  	bool low_rx_bytes;
>  	int err = -EFAULT;
>  	size_t total = 0;
> +	u32 rx_buf_size;
>  	u32 free_space;
>  
>  	spin_lock_bh(&vvs->rx_lock);
> @@ -639,7 +660,9 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  	}
>  
>  	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
> -	free_space = vvs->buf_alloc - fwd_cnt_delta;
> +	rx_buf_size = virtio_transport_rx_buf_size(vvs);
> +	free_space = rx_buf_size > fwd_cnt_delta ?
> +		     rx_buf_size - fwd_cnt_delta : 0;
>  	low_rx_bytes = (vvs->rx_bytes <
>  			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
>  
> -- 
> 2.54.0



* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-08  9:53 ` Michael S. Tsirkin
@ 2026-05-08 10:01   ` Stefano Garzarella
  2026-05-08 10:33     ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-08 10:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
>On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
>> From: Stefano Garzarella <sgarzare@redhat.com>
>>
>> After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
>> queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
>> buf_alloc when checking whether a new packet fits. This reduces the
>> effective receive buffer below what the user configured via
>> SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
>> silently dropped and applications that rely on the full buffer size
>> to deadlock.
>>
>> Also, the reduced space is not communicated to the remote peer, so
>> its credit calculation accounts more credit than the receiver will
>> actually accept, causing data loss (there is no retransmission).
>>
>> This also causes failures in tools/testing/vsock/vsock_test.c.
>> Test 18 sometimes fails, while test 22 always fails in this way:
>>     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
>>
>>     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
>>     Resource temporarily unavailable
>>
>> Fix this by introducing virtio_transport_rx_buf_size() to calculate the
>> size of the RX buffer based on the overhead. Using it in the acceptance
>> check, the advertised buf_alloc, and the credit update decision.
>> Use buf_alloc * 2 as total budget (payload + overhead), similar to how
>> SO_RCVBUF is doubled to reserve space for sk_buff metadata.
>> The function returns buf_alloc as long as overhead fits within the
>> reservation, then gradually reduces toward 0 as overhead exceeds
>> buf_alloc (e.g. under small-packet flooding), informing the peer to
>> slow down.
>>
>> Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
>
>unfortunately, this is a bit of a spec violation and there is no guarantee
>it helps.

Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.

>
>a spec violation because the spec says:
>Only payload bytes are counted and header bytes are not
>included
>
>and the implication is that a side can not reduce its own buf_alloc.
>
>no guarantee because the other side is not required to process your
>packets, so it might not see your buf alloc reduction.
>
>as designed in the current spec, you can only increase your buf alloc,
>not decrease it.

We never enforced this; currently a user can reduce it via 
SO_VM_SOCKETS_BUFFER_SIZE, and we haven't blocked that since 
virtio-vsock was introduced. Should we update the spec?

>
>what can be done:
>- more efficient storage for small packets (poc i posted)
>- reduce buf alloc ahead of the time

That's basically what I'm doing here: I'm using twice the size of 
`buf_alloc` (just like `SO_RCVBUF` does for other socket types) and 
telling the other peer just `buf_alloc`.

But then, somehow, we have to let the other peer know that we're 
running out of space. With this patch that only happens when the other 
peer isn't behaving properly, sending so many small packets that the 
overhead exceeds `buf_alloc`.
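For a sense of scale, a back-of-the-envelope sketch of that flooding
case (PER_SKB_OVERHEAD is an assumed constant, not the real
SKB_TRUESIZE(0)):

```c
#include <stdint.h>

/* Each queued 1-byte packet costs ~PER_SKB_OVERHEAD bytes of metadata
 * (hypothetical constant), so the overhead crosses buf_alloc, and the
 * advertised window starts shrinking, after roughly
 * buf_alloc / PER_SKB_OVERHEAD packets, long before the payload credit
 * itself is used up. */
#define PER_SKB_OVERHEAD 768u

static uint32_t pkts_until_throttle(uint32_t buf_alloc)
{
	return buf_alloc / PER_SKB_OVERHEAD + 1;
}
```

With a 256 KiB buf_alloc, throttling starts after only ~342 one-byte
packets, i.e. after ~342 bytes of payload out of a 262144-byte window.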

Stefano

>
>> ---
>>  net/vmw_vsock/virtio_transport_common.c | 31 +++++++++++++++++++++----
>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 9b8014516f4f..94a4beb8fd61 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -444,12 +444,32 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>  	return ret;
>>  }
>>
>> +/* vvs->rx_lock held by the caller */
>> +static u32 virtio_transport_rx_buf_size(struct virtio_vsock_sock *vvs)
>> +{
>> +	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
>> +	/* Use buf_alloc * 2 as total budget (payload + overhead), similar to
>> +	 * how SO_RCVBUF is doubled to reserve space for sk_buff metadata.
>> +	 */
>> +	u64 total_budget = (u64)vvs->buf_alloc * 2;
>> +
>> +	/* Overhead within buf_alloc: full buf_alloc available for payload */
>> +	if (skb_overhead < vvs->buf_alloc)
>> +		return vvs->buf_alloc;
>> +
>> +	/* Overhead exceeded buf_alloc: gradually reduce to bound skb queue */
>> +	if (skb_overhead < total_budget)
>> +		return total_budget - skb_overhead;
>> +
>> +	return 0;
>> +}
>> +
>>  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
>>  					u32 len)
>>  {
>> -	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
>> +	u32 rx_buf_size = virtio_transport_rx_buf_size(vvs);
>>
>> -	if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
>> +	if (!rx_buf_size || vvs->buf_used + len > rx_buf_size)
>>  		return false;
>>
>>  	vvs->rx_bytes += len;
>> @@ -472,7 +492,7 @@ void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *
>>  	spin_lock_bh(&vvs->rx_lock);
>>  	vvs->last_fwd_cnt = vvs->fwd_cnt;
>>  	hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
>> -	hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc);
>> +	hdr->buf_alloc = cpu_to_le32(virtio_transport_rx_buf_size(vvs));
>>  	spin_unlock_bh(&vvs->rx_lock);
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
>> @@ -594,6 +614,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>>  	bool low_rx_bytes;
>>  	int err = -EFAULT;
>>  	size_t total = 0;
>> +	u32 rx_buf_size;
>>  	u32 free_space;
>>
>>  	spin_lock_bh(&vvs->rx_lock);
>> @@ -639,7 +660,9 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>>  	}
>>
>>  	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
>> -	free_space = vvs->buf_alloc - fwd_cnt_delta;
>> +	rx_buf_size = virtio_transport_rx_buf_size(vvs);
>> +	free_space = rx_buf_size > fwd_cnt_delta ?
>> +		     rx_buf_size - fwd_cnt_delta : 0;
>>  	low_rx_bytes = (vvs->rx_bytes <
>>  			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
>>
>> --
>> 2.54.0
>



* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-08 10:01   ` Stefano Garzarella
@ 2026-05-08 10:33     ` Michael S. Tsirkin
  2026-05-08 10:38       ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-05-08 10:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > 
> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> > > queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
> > > buf_alloc when checking whether a new packet fits. This reduces the
> > > effective receive buffer below what the user configured via
> > > SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
> > > silently dropped and applications that rely on the full buffer size
> > > to deadlock.
> > > 
> > > Also, the reduced space is not communicated to the remote peer, so
> > > its credit calculation accounts more credit than the receiver will
> > > actually accept, causing data loss (there is no retransmission).
> > > 
> > > This also causes failures in tools/testing/vsock/vsock_test.c.
> > > Test 18 sometimes fails, while test 22 always fails in this way:
> > >     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> > > 
> > >     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
> > >     Resource temporarily unavailable
> > > 
> > > Fix this by introducing virtio_transport_rx_buf_size() to calculate the
> > > size of the RX buffer based on the overhead. Using it in the acceptance
> > > check, the advertised buf_alloc, and the credit update decision.
> > > Use buf_alloc * 2 as total budget (payload + overhead), similar to how
> > > SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> > > The function returns buf_alloc as long as overhead fits within the
> > > reservation, then gradually reduces toward 0 as overhead exceeds
> > > buf_alloc (e.g. under small-packet flooding), informing the peer to
> > > slow down.
> > > 
> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > 
> > unfortunately, this is a bit of a spec violation and there is no guarantee
> > it helps.
> 
> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
> 
> > 
> > a spec violation because the spec says:
> > Only payload bytes are counted and header bytes are not
> > included
> > 
> > and the implication is that a side can not reduce its own buf_alloc.
> > 
> > no guarantee because the other side is not required to process your
> > packets, so it might not see your buf alloc reduction.
> > 
> > as designed in the current spec, you can only increase your buf alloc,
> > not decrease it.
> 
> We never enforced this, currently an user can reduce it by
> SO_VM_SOCKETS_BUFFER_SIZE and we haven't blocked it since virtio-vsock was
> introduced, should we update the spec?


it's not that we need to enforce it, it's that all synchronization
assumes this. as in, anyone can use an old copy until they run out
of credits.


> > 
> > what can be done:
> > - more efficient storage for small packets (poc i posted)
> > - reduce buf alloc ahead of the time
> 
> That's basically what I'm doing here: I'm using twice the size of
> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and telling
> the other peer just `buf_alloc`.
> 
> But then, somehow, we have to let the other person know that we're running
> out of space. With this patch that only happens when the other peer isn't
> behaving properly, sending so many small packets that the overhead exceeds
> `buf_alloc`.
> 
> Stefano

what is "not proper" here, it is up to the application what to send.


> > 
> > > ---
> > >  net/vmw_vsock/virtio_transport_common.c | 31 +++++++++++++++++++++----
> > >  1 file changed, 27 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 9b8014516f4f..94a4beb8fd61 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -444,12 +444,32 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > >  	return ret;
> > >  }
> > > 
> > > +/* vvs->rx_lock held by the caller */
> > > +static u32 virtio_transport_rx_buf_size(struct virtio_vsock_sock *vvs)
> > > +{
> > > +	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> > > +	/* Use buf_alloc * 2 as total budget (payload + overhead), similar to
> > > +	 * how SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> > > +	 */
> > > +	u64 total_budget = (u64)vvs->buf_alloc * 2;
> > > +
> > > +	/* Overhead within buf_alloc: full buf_alloc available for payload */
> > > +	if (skb_overhead < vvs->buf_alloc)
> > > +		return vvs->buf_alloc;
> > > +
> > > +	/* Overhead exceeded buf_alloc: gradually reduce to bound skb queue */
> > > +	if (skb_overhead < total_budget)
> > > +		return total_budget - skb_overhead;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> > >  					u32 len)
> > >  {
> > > -	u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> > > +	u32 rx_buf_size = virtio_transport_rx_buf_size(vvs);
> > > 
> > > -	if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
> > > +	if (!rx_buf_size || vvs->buf_used + len > rx_buf_size)
> > >  		return false;
> > > 
> > >  	vvs->rx_bytes += len;
> > > @@ -472,7 +492,7 @@ void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *
> > >  	spin_lock_bh(&vvs->rx_lock);
> > >  	vvs->last_fwd_cnt = vvs->fwd_cnt;
> > >  	hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > > -	hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > > +	hdr->buf_alloc = cpu_to_le32(virtio_transport_rx_buf_size(vvs));
> > >  	spin_unlock_bh(&vvs->rx_lock);
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> > > @@ -594,6 +614,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > >  	bool low_rx_bytes;
> > >  	int err = -EFAULT;
> > >  	size_t total = 0;
> > > +	u32 rx_buf_size;
> > >  	u32 free_space;
> > > 
> > >  	spin_lock_bh(&vvs->rx_lock);
> > > @@ -639,7 +660,9 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > >  	}
> > > 
> > >  	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
> > > -	free_space = vvs->buf_alloc - fwd_cnt_delta;
> > > +	rx_buf_size = virtio_transport_rx_buf_size(vvs);
> > > +	free_space = rx_buf_size > fwd_cnt_delta ?
> > > +		     rx_buf_size - fwd_cnt_delta : 0;
> > >  	low_rx_bytes = (vvs->rx_bytes <
> > >  			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
> > > 
> > > --
> > > 2.54.0
> > 



* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-08 10:33     ` Michael S. Tsirkin
@ 2026-05-08 10:38       ` Stefano Garzarella
  2026-05-11 10:54         ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-08 10:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Fri, May 08, 2026 at 06:33:13AM -0400, Michael S. Tsirkin wrote:
>On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
>> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
>> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
>> > > From: Stefano Garzarella <sgarzare@redhat.com>
>> > >
>> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
>> > > queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
>> > > buf_alloc when checking whether a new packet fits. This reduces the
>> > > effective receive buffer below what the user configured via
>> > > SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
>> > > silently dropped and applications that rely on the full buffer size
>> > > to deadlock.
>> > >
>> > > Also, the reduced space is not communicated to the remote peer, so
>> > > its credit calculation accounts more credit than the receiver will
>> > > actually accept, causing data loss (there is no retransmission).
>> > >
>> > > This also causes failures in tools/testing/vsock/vsock_test.c.
>> > > Test 18 sometimes fails, while test 22 always fails in this way:
>> > >     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
>> > >
>> > >     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
>> > >     Resource temporarily unavailable
>> > >
>> > > Fix this by introducing virtio_transport_rx_buf_size() to calculate the
>> > > size of the RX buffer based on the overhead. Using it in the acceptance
>> > > check, the advertised buf_alloc, and the credit update decision.
>> > > Use buf_alloc * 2 as total budget (payload + overhead), similar to how
>> > > SO_RCVBUF is doubled to reserve space for sk_buff metadata.
>> > > The function returns buf_alloc as long as overhead fits within the
>> > > reservation, then gradually reduces toward 0 as overhead exceeds
>> > > buf_alloc (e.g. under small-packet flooding), informing the peer to
>> > > slow down.
>> > >
>> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
>> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> >
>> >
>> > unfortunately, this is a bit of a spec violation and there is no guarantee
>> > it helps.
>>
>> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
>>
>> >
>> > a spec violation because the spec says:
>> > Only payload bytes are counted and header bytes are not
>> > included
>> >
>> > and the implication is that a side can not reduce its own buf_alloc.
>> >
>> > no guarantee because the other side is not required to process your
>> > packets, so it might not see your buf alloc reduction.
>> >
>> > as designed in the current spec, you can only increase your buf alloc,
>> > not decrease it.
>>
>> We never enforced this, currently an user can reduce it by
>> SO_VM_SOCKETS_BUFFER_SIZE and we haven't blocked it since virtio-vsock was
>> introduced, should we update the spec?
>
>
>it's not that we need to enforce it, it's that all synchronization
>assumes this. as in, anyone can use an old copy until they run out
>of credits.
>
>
>> >
>> > what can be done:
>> > - more efficient storage for small packets (poc i posted)
>> > - reduce buf alloc ahead of the time
>>
>> That's basically what I'm doing here: I'm using twice the size of
>> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and telling
>> the other peer just `buf_alloc`.
>>
>> But then, somehow, we have to let the other person know that we're running
>> out of space. With this patch that only happens when the other peer isn't
>> behaving properly, sending so many small packets that the overhead exceeds
>> `buf_alloc`.
>>
>> Stefano
>
>what is "not proper" here, it is up to the application what to send.

Sure, but here we're just slowing down the application by telling it we 
don't have any more space.

Again, without this patch we are just dropping data, which IMO is even 
worse.

So I think we should merge this for now, while we handle the EOM better.
If you prefer, I can drop the part where we reduce the buf_alloc 
advertised to the other peer, but at least we should drop data after 
`buf_alloc * 2` IMO.


Stefano



* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-08 10:38       ` Stefano Garzarella
@ 2026-05-11 10:54         ` Stefano Garzarella
  2026-05-11 12:46           ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-11 10:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Fri, 8 May 2026 at 12:38, Stefano Garzarella <sgarzare@redhat.com> 
wrote:
>
> On Fri, May 08, 2026 at 06:33:13AM -0400, Michael S. Tsirkin wrote:
> >On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
> >> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
> >> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
> >> > >
> >> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> >> > > queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
> >> > > buf_alloc when checking whether a new packet fits. This reduces the
> >> > > effective receive buffer below what the user configured via
> >> > > SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
> >> > > silently dropped and applications that rely on the full buffer size
> >> > > to deadlock.
> >> > >
> >> > > Also, the reduced space is not communicated to the remote peer, so
> >> > > its credit calculation accounts more credit than the receiver will
> >> > > actually accept, causing data loss (there is no retransmission).
> >> > >
> >> > > This also causes failures in tools/testing/vsock/vsock_test.c.
> >> > > Test 18 sometimes fails, while test 22 always fails in this way:
> >> > >     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> >> > >
> >> > >     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
> >> > >     Resource temporarily unavailable
> >> > >
> >> > > Fix this by introducing virtio_transport_rx_buf_size() to calculate the
> >> > > size of the RX buffer based on the overhead. Using it in the acceptance
> >> > > check, the advertised buf_alloc, and the credit update decision.
> >> > > Use buf_alloc * 2 as total budget (payload + overhead), similar to how
> >> > > SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> >> > > The function returns buf_alloc as long as overhead fits within the
> >> > > reservation, then gradually reduces toward 0 as overhead exceeds
> >> > > buf_alloc (e.g. under small-packet flooding), informing the peer to
> >> > > slow down.
> >> > >
> >> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >> >
> >> >
> >> > unfortunately, this is a bit of a spec violation and there is no guarantee
> >> > it helps.
> >>
> >> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
> >>
> >> >
> >> > a spec violation because the spec says:
> >> > Only payload bytes are counted and header bytes are not
> >> > included
> >> >
> >> > and the implication is that a side can not reduce its own buf_alloc.
> >> >
> >> > no guarantee because the other side is not required to process your
> >> > packets, so it might not see your buf alloc reduction.
> >> >
> >> > as designed in the current spec, you can only increase your buf alloc,
> >> > not decrease it.
> >>
> >> We never enforced this; currently a user can reduce it via
> >> SO_VM_SOCKETS_BUFFER_SIZE, and we haven't blocked it since virtio-vsock
> >> was introduced. Should we update the spec?
> >
> >
> >it's not that we need to enforce it, it's that all synchronization
> >assumes this. as in, anyone can use an old copy until they run out
> >of credits.
> >
> >
> >> >
> >> > what can be done:
> >> > - more efficient storage for small packets (poc i posted)
> >> > - reduce buf alloc ahead of time
> >>
> >> That's basically what I'm doing here: I'm using twice the size of
> >> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and telling
> >> the other peer just `buf_alloc`.
> >>
> >> But then, somehow, we have to let the other peer know that we're running
> >> out of space. With this patch, that only happens when the other peer isn't
> >> behaving properly, sending so many small packets that the overhead exceeds
> >> `buf_alloc`.
> >>
> >> Stefano
> >
> >what is "not proper" here? it is up to the application what to send.
>
> Sure, but here we're just slowing down the application by telling it we
> don't have any more space.
>
> Again, without this patch we are just dropping data, which IMO is even
> worse.
>
> > So I think we should merge this for now, while we handle the EOM better.
> If you prefer, I can drop the part where we reduce the buf_alloc
> advertised to the other peer, but at least we should drop data after
> `buf_alloc * 2` IMO.

Okay, I thought it over during the weekend, and I agree that this patch 
still doesn't solve the problem and would still result in packet loss.
So, until we resolve the issue permanently, and since 059b7dbd20a6 is 
coming to stable, I'd like to rework this patch so that we only start 
dropping packets when the overhead plus the queued bytes exceeds 
`buf_alloc * 2`.
I'd remove the other changes that reduce the advertised buf_alloc, but 
terminate the connection so that both peers are aware that something 
went wrong.

Any objections?

Stefano


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-11 10:54         ` Stefano Garzarella
@ 2026-05-11 12:46           ` Michael S. Tsirkin
  2026-05-11 13:17             ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-05-11 12:46 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Mon, May 11, 2026 at 12:54:44PM +0200, Stefano Garzarella wrote:
> On Fri, 8 May 2026 at 12:38, Stefano Garzarella <sgarzare@redhat.com> 
> wrote:
> >
> > On Fri, May 08, 2026 at 06:33:13AM -0400, Michael S. Tsirkin wrote:
> > >On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
> > >> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
> > >> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> > >> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > >> > >
> > >> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> > >> > > queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
> > >> > > buf_alloc when checking whether a new packet fits. This reduces the
> > >> > > effective receive buffer below what the user configured via
> > >> > > SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
> > >> > > silently dropped and applications that rely on the full buffer size
> > >> > > to deadlock.
> > >> > >
> > >> > > Also, the reduced space is not communicated to the remote peer, so
> > >> > > its credit calculation accounts more credit than the receiver will
> > >> > > actually accept, causing data loss (there is no retransmission).
> > >> > >
> > >> > > This also causes failures in tools/testing/vsock/vsock_test.c.
> > >> > > Test 18 sometimes fails, while test 22 always fails in this way:
> > >> > >     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> > >> > >
> > >> > >     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
> > >> > >     Resource temporarily unavailable
> > >> > >
> > >> > > Fix this by introducing virtio_transport_rx_buf_size() to calculate the
> > >> > > size of the RX buffer based on the overhead. Using it in the acceptance
> > >> > > check, the advertised buf_alloc, and the credit update decision.
> > >> > > Use buf_alloc * 2 as total budget (payload + overhead), similar to how
> > >> > > SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> > >> > > The function returns buf_alloc as long as overhead fits within the
> > >> > > reservation, then gradually reduces toward 0 as overhead exceeds
> > >> > > buf_alloc (e.g. under small-packet flooding), informing the peer to
> > >> > > slow down.
> > >> > >
> > >> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> > >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >> >
> > >> >
> > >> > unfortunately, this is a bit of a spec violation and there is no guarantee
> > >> > it helps.
> > >>
> > >> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
> > >>
> > >> >
> > >> > a spec violation because the spec says:
> > >> > Only payload bytes are counted and header bytes are not
> > >> > included
> > >> >
> > >> > and the implication is that a side can not reduce its own buf_alloc.
> > >> >
> > >> > no guarantee because the other side is not required to process your
> > >> > packets, so it might not see your buf alloc reduction.
> > >> >
> > >> > as designed in the current spec, you can only increase your buf alloc,
> > >> > not decrease it.
> > >>
> > >> We never enforced this; currently a user can reduce it via
> > >> SO_VM_SOCKETS_BUFFER_SIZE, and we haven't blocked it since virtio-vsock
> > >> was introduced. Should we update the spec?
> > >
> > >
> > >it's not that we need to enforce it, it's that all synchronization
> > >assumes this. as in, anyone can use an old copy until they run out
> > >of credits.
> > >
> > >
> > >> >
> > >> > what can be done:
> > >> > - more efficient storage for small packets (poc i posted)
> > >> > - reduce buf alloc ahead of time
> > >>
> > >> That's basically what I'm doing here: I'm using twice the size of
> > >> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and telling
> > >> the other peer just `buf_alloc`.
> > >>
> > >> But then, somehow, we have to let the other peer know that we're running
> > >> out of space. With this patch, that only happens when the other peer isn't
> > >> behaving properly, sending so many small packets that the overhead exceeds
> > >> `buf_alloc`.
> > >>
> > >> Stefano
> > >
> > >what is "not proper" here? it is up to the application what to send.
> >
> > Sure, but here we're just slowing down the application by telling it we
> > don't have any more space.
> >
> > Again, without this patch we are just dropping data, which IMO is even
> > worse.
> >
> > > So I think we should merge this for now, while we handle the EOM better.
> > If you prefer, I can drop the part where we reduce the buf_alloc
> > advertised to the other peer, but at least we should drop data after
> > `buf_alloc * 2` IMO.
> 
> Okay, I thought it over during the weekend, and I agree that this patch
> still doesn't solve the problem and would still result in packet loss.
> So, until we resolve the issue permanently, and since 059b7dbd20a6 is
> coming to stable, I'd like to rework this patch so that we only start
> dropping packets when the overhead plus the queued bytes exceeds
> `buf_alloc * 2`.
> I'd remove the other changes that reduce the advertised buf_alloc, but
> terminate the connection so that both peers are aware that something
> went wrong.
> 
> Any objections?
> 
> Stefano

Let's try to fix it properly upstream first, please. We can discuss
backporting later.

-- 
MST



* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-11 12:46           ` Michael S. Tsirkin
@ 2026-05-11 13:17             ` Stefano Garzarella
  2026-05-11 23:48               ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-11 13:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Eric Dumazet, Stefan Hajnoczi, virtualization,
	David S. Miller, Jason Wang, Simon Horman, linux-kernel,
	Paolo Abeni, Xuan Zhuo, kvm, Jakub Kicinski, Eugenio Pérez

On Mon, 11 May 2026 at 14:46, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, May 11, 2026 at 12:54:44PM +0200, Stefano Garzarella wrote:
> > On Fri, 8 May 2026 at 12:38, Stefano Garzarella <sgarzare@redhat.com>
> > wrote:
> > >
> > > On Fri, May 08, 2026 at 06:33:13AM -0400, Michael S. Tsirkin wrote:
> > > >On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
> > > >> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
> > > >> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> > > >> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > >> > >
> > > >> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> > > >> > > queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
> > > >> > > buf_alloc when checking whether a new packet fits. This reduces the
> > > >> > > effective receive buffer below what the user configured via
> > > >> > > SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
> > > >> > > silently dropped and applications that rely on the full buffer size
> > > >> > > to deadlock.
> > > >> > >
> > > >> > > Also, the reduced space is not communicated to the remote peer, so
> > > >> > > its credit calculation accounts more credit than the receiver will
> > > >> > > actually accept, causing data loss (there is no retransmission).
> > > >> > >
> > > >> > > This also causes failures in tools/testing/vsock/vsock_test.c.
> > > >> > > Test 18 sometimes fails, while test 22 always fails in this way:
> > > >> > >     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> > > >> > >
> > > >> > >     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
> > > >> > >     Resource temporarily unavailable
> > > >> > >
> > > >> > > Fix this by introducing virtio_transport_rx_buf_size() to calculate the
> > > >> > > size of the RX buffer based on the overhead. Using it in the acceptance
> > > >> > > check, the advertised buf_alloc, and the credit update decision.
> > > >> > > Use buf_alloc * 2 as total budget (payload + overhead), similar to how
> > > >> > > SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> > > >> > > The function returns buf_alloc as long as overhead fits within the
> > > >> > > reservation, then gradually reduces toward 0 as overhead exceeds
> > > >> > > buf_alloc (e.g. under small-packet flooding), informing the peer to
> > > >> > > slow down.
> > > >> > >
> > > >> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> > > >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >> >
> > > >> >
> > > >> > unfortunately, this is a bit of a spec violation and there is no guarantee
> > > >> > it helps.
> > > >>
> > > >> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
> > > >>
> > > >> >
> > > >> > a spec violation because the spec says:
> > > >> > Only payload bytes are counted and header bytes are not
> > > >> > included
> > > >> >
> > > >> > and the implication is that a side can not reduce its own buf_alloc.
> > > >> >
> > > >> > no guarantee because the other side is not required to process your
> > > >> > packets, so it might not see your buf alloc reduction.
> > > >> >
> > > >> > as designed in the current spec, you can only increase your buf alloc,
> > > >> > not decrease it.
> > > >>
> > > >> We never enforced this; currently a user can reduce it via
> > > >> SO_VM_SOCKETS_BUFFER_SIZE, and we haven't blocked it since virtio-vsock
> > > >> was introduced. Should we update the spec?
> > > >
> > > >
> > > >it's not that we need to enforce it, it's that all synchronization
> > > >assumes this. as in, anyone can use an old copy until they run out
> > > >of credits.
> > > >
> > > >
> > > >> >
> > > >> > what can be done:
> > > >> > - more efficient storage for small packets (poc i posted)
> > > >> > - reduce buf alloc ahead of time
> > > >>
> > > >> That's basically what I'm doing here: I'm using twice the size of
> > > >> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and telling
> > > >> the other peer just `buf_alloc`.
> > > >>
> > > >> But then, somehow, we have to let the other peer know that we're running
> > > >> out of space. With this patch, that only happens when the other peer isn't
> > > >> behaving properly, sending so many small packets that the overhead exceeds
> > > >> `buf_alloc`.
> > > >>
> > > >> Stefano
> > > >
> > > >what is "not proper" here? it is up to the application what to send.
> > >
> > > Sure, but here we're just slowing down the application by telling it we
> > > don't have any more space.
> > >
> > > Again, without this patch we are just dropping data, which IMO is even
> > > worse.
> > >
> > > So I think we should merge this for now, while we handle the EOM better.
> > > If you prefer, I can drop the part where we reduce the buf_alloc
> > > advertised to the other peer, but at least we should drop data after
> > > `buf_alloc * 2` IMO.
> >
> > Okay, I thought it over during the weekend, and I agree that this patch
> > still doesn't solve the problem and would still result in packet loss.
> > So, until we resolve the issue permanently, and since 059b7dbd20a6 is
> > coming to stable, I'd like to rework this patch so that we only start
> > dropping packets when the overhead plus the queued bytes exceeds
> > `buf_alloc * 2`.
> > I'd remove the other changes that reduce the advertised buf_alloc, but
> > terminate the connection so that both peers are aware that something
> > went wrong.
> >
> > Any objections?
> >
> > Stefano
>
> Let's try to first fix it upstream properly please. Discuss backporting
> later.

Commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
queue") is already in Linus' tree and will land in stable soon. What
issue do you see with having a patch on top of that to close the
connection instead of losing data and breaking our test suite?

IMO we need that change in any case, because the previous code also
discarded packets without any notification, and breaking the
connection would be better in that case, too.

Stefano



* Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
  2026-05-11 13:17             ` Stefano Garzarella
@ 2026-05-11 23:48               ` Jakub Kicinski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2026-05-11 23:48 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, Eric Dumazet, Stefan Hajnoczi,
	virtualization, David S. Miller, Jason Wang, Simon Horman,
	linux-kernel, Paolo Abeni, Xuan Zhuo, kvm, Eugenio Pérez

On Mon, 11 May 2026 15:17:56 +0200 Stefano Garzarella wrote:
> > > Okay, I thought it over over the weekend, and I agree that this patch
> > > still doesn't solve the problem and would still result in packet loss.
> > > So, until we resolve the issue permanently, and since 059b7dbd20a6 is
> > > coming to stable, I'd like to rework this patch so that we only start
> > > dropping packets when the overhead plus the queued bytes exceeds
> > > `buf_alloc * 2`.
> > > Removing the other changes to reduce the buf_alloc advertised, but
> > > terminating the connection so that both peers are aware that something
> > > went wrong.
> > >
> > > Any objections?
> > >
> > > Stefano  
> >
> > Let's try to first fix it upstream properly please. Discuss backporting
> > later.  
> 
> Commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> queue") is already in Linus' tree and will land in stable soon. What
> issue do you see with having a patch on top of that to close the
> connection instead of losing data and breaking our test suite?
> 
> IMO we need that change in any case, because the previous code also
> discarded packets without any notification, and breaking the
> connection would be better in that case, too.

Sorry if I'm speaking out of turn or misunderstanding, but, like Michael
says, let's focus on fixing this the way we want it to be fixed.
IIUC you are trying to minimize the size of the fix; please don't worry
about the LoC in the diff at this stage.


end of thread, other threads:[~2026-05-11 23:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-08  9:23 [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc Stefano Garzarella
2026-05-08  9:53 ` Michael S. Tsirkin
2026-05-08 10:01   ` Stefano Garzarella
2026-05-08 10:33     ` Michael S. Tsirkin
2026-05-08 10:38       ` Stefano Garzarella
2026-05-11 10:54         ` Stefano Garzarella
2026-05-11 12:46           ` Michael S. Tsirkin
2026-05-11 13:17             ` Stefano Garzarella
2026-05-11 23:48               ` Jakub Kicinski
