Re: [PATCH net v3] vsock/virtio: cap TX credit to local buffer size

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Melbin K Mathew <mlbnkm1@gmail.com>
Cc: stefanha@redhat.com, sgarzare@redhat.com, kvm@vger.kernel.org,
	netdev@vger.kernel.org, virtualization@lists.linux.dev,
	linux-kernel@vger.kernel.org, jasowang@redhat.com,
	xuanzhuo@linux.alibaba.com, eperezma@redhat.com,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org
Subject: Re: [PATCH net v3] vsock/virtio: cap TX credit to local buffer size
Date: Thu, 11 Dec 2025 08:05:11 -0500	[thread overview]
Message-ID: <20251211080251-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20251211125104.375020-1-mlbnkm1@gmail.com>

On Thu, Dec 11, 2025 at 01:51:04PM +0100, Melbin K Mathew wrote:
> The virtio vsock transport currently derives its TX credit directly from
> peer_buf_alloc, which is populated from the remote endpoint's
> SO_VM_SOCKETS_BUFFER_SIZE value.
> 
> On the host side, this means the amount of data we are willing to queue
> for a given connection is scaled purely by a peer-chosen value, rather
> than by the host's own vsock buffer configuration. A guest that
> advertises a very large buffer and reads slowly can cause the host to
> allocate a correspondingly large amount of sk_buff memory for that
> connection.
> 
> In practice, a malicious guest can:
> 
>   - set a large AF_VSOCK buffer size (e.g. 2 GiB) with
>     SO_VM_SOCKETS_BUFFER_MAX_SIZE / SO_VM_SOCKETS_BUFFER_SIZE, and
> 
>   - open multiple connections to a host vsock service that sends data
>     while the guest drains slowly.
> 
> On an unconstrained host this can drive Slab/SUnreclaim into the tens of
> GiB range, causing allocation failures and OOM kills in unrelated host
> processes while the offending VM remains running.
> 
> On non-virtio transports and compatibility:
> 
>   - VMCI uses the AF_VSOCK buffer knobs to size its queue pairs per
>     socket based on the local vsk->buffer_* values; the remote side
>     can’t enlarge those queues beyond what the local endpoint
>     configured.
> 
>   - Hyper-V’s vsock transport uses fixed-size VMBus ring buffers and
>     an MTU bound; there is no peer-controlled credit field comparable
>     to peer_buf_alloc, and the remote endpoint can’t drive in-flight
>     kernel memory above those ring sizes.
> 
>   - The loopback path reuses virtio_transport_common.c, so it
>     naturally follows the same semantics as the virtio transport.
> 
> Make virtio-vsock consistent with that model by intersecting the peer’s
> advertised receive window with the local vsock buffer size when
> computing TX credit. We introduce a small helper and use it in
> virtio_transport_get_credit(), virtio_transport_has_space() and
> virtio_transport_seqpacket_enqueue(), so that:
> 
>     effective_tx_window = min(peer_buf_alloc, buf_alloc)
> 
> This prevents a remote endpoint from forcing us to queue more data than
> our own configuration allows, while preserving the existing credit
> semantics and keeping virtio-vsock compatible with the other transports.
> 
> On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
> 32 guest vsock connections advertising 2 GiB each and reading slowly
> drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB and the system only
> recovered after killing the QEMU process.
> 
> With this patch applied, rerunning the same PoC yields:
> 
>   Before:
>     MemFree:        ~61.6 GiB
>     MemAvailable:   ~62.3 GiB
>     Slab:           ~142 MiB
>     SUnreclaim:     ~117 MiB
> 
>   After 32 high-credit connections:
>     MemFree:        ~61.5 GiB
>     MemAvailable:   ~62.3 GiB
>     Slab:           ~178 MiB
>     SUnreclaim:     ~152 MiB
> 
> i.e. only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the
> guest remains responsive.
> 
> Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko")
> Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 27 ++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index dcc8a1d58..02eeb96dd 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -491,6 +491,25 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
>  
> +/* Return the effective peer buffer size for TX credit computation.
> + *
> + * The peer advertises its receive buffer via peer_buf_alloc, but we
> + * cap that to our local buf_alloc (derived from
> + * SO_VM_SOCKETS_BUFFER_SIZE and already clamped to buffer_max_size)
> + * so that a remote endpoint cannot force us to queue more data than
> + * our own configuration allows.
> + */
> +static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs)
> +{
> +	return min(vvs->peer_buf_alloc, vvs->buf_alloc);
> +}
> +
>  u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
>  {
>  	u32 ret;
> @@ -499,7 +518,8 @@ u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
>  		return 0;
>  
>  	spin_lock_bh(&vvs->tx_lock);
> -	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> +	ret = virtio_transport_tx_buf_alloc(vvs) -
> +		(vvs->tx_cnt - vvs->peer_fwd_cnt);
>  	if (ret > credit)
>  		ret = credit;
>  	vvs->tx_cnt += ret;
> @@ -831,7 +851,7 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
>  
>  	spin_lock_bh(&vvs->tx_lock);
>  
> -	if (len > vvs->peer_buf_alloc) {
> +	if (len > virtio_transport_tx_buf_alloc(vvs)) {
>  		spin_unlock_bh(&vvs->tx_lock);
>  		return -EMSGSIZE;
>  	}
> @@ -882,7 +902,8 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk)
>  	struct virtio_vsock_sock *vvs = vsk->trans;
>  	s64 bytes;
>  
> -	bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> +	bytes = (s64)virtio_transport_tx_buf_alloc(vvs) -
> +		(vvs->tx_cnt - vvs->peer_fwd_cnt);
>  	if (bytes < 0)
>  		bytes = 0;
>  

Acked-by: Michael S. Tsirkin <mst@redhat.com>


Looking at this, why is one place casting to s64 the other is not?




> -- 
> 2.34.1

next prev parent reply	other threads:[~2025-12-11 13:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-11 12:51 [PATCH net v3] vsock/virtio: cap TX credit to local buffer size Melbin K Mathew
2025-12-11 13:05 ` Michael S. Tsirkin [this message]
2025-12-11 13:56   ` Stefano Garzarella
2025-12-11 14:43     ` Melbin Mathew Antony
2025-12-11 14:51       ` Stefano Garzarella
2025-12-12  9:56         ` Melbin Mathew Antony
2025-12-12 10:40           ` Stefano Garzarella
2025-12-12 11:40             ` Melbin K Mathew
2025-12-12 12:02               ` Michael S. Tsirkin
2025-12-12 12:26               ` Stefano Garzarella
2025-12-14  6:38                 ` Melbin K Mathew
2025-12-15 14:15                   ` Stefano Garzarella
2025-12-11 13:15 ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251211080251-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eperezma@redhat.com \
    --cc=horms@kernel.org \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlbnkm1@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.