From: Will Deacon <will@kernel.org>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: linux-kernel@vger.kernel.org, "Keir Fraser" <keirf@google.com>,
"Steven Moreland" <smoreland@google.com>,
"Frederick Mayle" <fmayle@google.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
netdev@vger.kernel.org, virtualization@lists.linux.dev
Subject: Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
Date: Mon, 30 Jun 2025 15:20:57 +0100 [thread overview]
Message-ID: <aGKdSVJTjg_vi-12@willie-the-truck> (raw)
In-Reply-To: <orht2imwke5xhnmeewxrbey3xbn2ivjzujksqnrtfe3cjtgrg2@6ls6dyexnkvc>
On Fri, Jun 27, 2025 at 12:45:45PM +0200, Stefano Garzarella wrote:
> On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
> > When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
> > calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
> > receive data. Unfortunately, these are always linear allocations and can
> > therefore result in significant pressure on kmalloc() considering that
> > the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
> > VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
> > allocation for each packet.
> >
> > Rework the vsock SKB allocation so that, for sizes with page order
> > greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
> > instead with the packet header in the SKB and the receive data in the
> > fragments.
> >
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> > drivers/vhost/vsock.c | 15 +++++++++------
> > include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
> > 2 files changed, 34 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 66a0f060770e..cfa4e1bcf367 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
> >
> > len = iov_length(vq->iov, out);
> >
> > - if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> > + if (len < VIRTIO_VSOCK_SKB_HEADROOM ||
>
> Why moving this check here?
I moved it here because virtio_vsock_alloc_skb_with_frags() does:
+ size -= VIRTIO_VSOCK_SKB_HEADROOM;
+ return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+ size, mask);
and so having the check in __virtio_vsock_alloc_skb_with_frags() looks
strange as, by then, it really only applies to the linear case. It also
feels weird to me to have the upper-bound of the length checked by the
caller but the lower-bound checked in the callee. I certainly find it
easier to reason about if they're in the same place.
Additionally, the lower-bound check is only needed by the vhost receive
code, as the transmit path uses virtio_vsock_alloc_skb(), which never
passes a size smaller than VIRTIO_VSOCK_SKB_HEADROOM.
Given all that, moving it to the one place that needs it seemed like the
best option. What do you think?
> > + len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> > return NULL;
> >
> > /* len contains both payload and hdr */
> > - skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
> > + if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> > + skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
> > + else
> > + skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>
> Can we do this directly in virtio_vsock_alloc_skb() so we don't need
> to duplicate code on virtio/vhost code?
We can, but then I think we should do something different for the
rx_fill() path -- it feels fragile to rely on that using small-enough
buffers to guarantee linear allocations. How about I:
1. Add virtio_vsock_alloc_linear_skb(), which always performs a linear
allocation.
2. Change virtio_vsock_alloc_skb() to use nonlinear SKBs for sizes
greater than SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)
3. Use virtio_vsock_alloc_linear_skb() to fill the guest RX buffers
4. Use virtio_vsock_alloc_skb() for everything else
If you like the idea, I'll rework the series along those lines.
Diff below... (see end of mail)
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 67ffb64325ef..8f9fa1cab32a 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
> > {
> > u32 len;
> >
> > + DEBUG_NET_WARN_ON_ONCE(skb->len);
>
> Should we mention in the commit message?
Sure, I'll add something. The nonlinear path doesn't accumulate skb->len
as the fragments are attached, so this is a debug check to ensure that
skb->len hasn't been touched between allocation and here.
> > len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
> >
> > - if (len > 0)
>
> Why removing this check?
I think it's redundant: len is a u32, so we're basically just checking
to see if it's non-zero. All the callers have already checked for this
but, even if they didn't, skb_put(skb, 0) is harmless afaict.
Will
--->8
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 3799c0aeeec5..a6cd72a32f63 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -349,11 +349,7 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
return NULL;
/* len contains both payload and hdr */
- if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
- skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
- else
- skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
-
+ skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
if (!skb)
return NULL;
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0e265921be03..ed5eab46e3dc 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -79,16 +79,19 @@ __virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
}
static inline struct sk_buff *
-virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
+virtio_vsock_alloc_linear_skb(unsigned int size, gfp_t mask)
{
- size -= VIRTIO_VSOCK_SKB_HEADROOM;
- return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
- size, mask);
+ return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
}
static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
{
- return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
+ if (size <= SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+ return virtio_vsock_alloc_linear_skb(size, mask);
+
+ size -= VIRTIO_VSOCK_SKB_HEADROOM;
+ return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+ size, mask);
}
static inline void
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 4ae714397ca3..8c9ca0cb0d4e 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -321,7 +321,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
vq = vsock->vqs[VSOCK_VQ_RX];
do {
- skb = virtio_vsock_alloc_skb(total_len, GFP_KERNEL);
+ skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
if (!skb)
break;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 424eb69e84f9..f74677c3511e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -262,11 +262,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
if (!zcopy)
skb_len += payload_len;
- if (skb_len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
- skb = virtio_vsock_alloc_skb_with_frags(skb_len, GFP_KERNEL);
- else
- skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
-
+ skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
if (!skb)
return NULL;