public inbox for netdev@vger.kernel.org
* [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
@ 2026-03-10 18:31 Vishwanath Seshagiri
  2026-03-13  7:51 ` Jason Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Vishwanath Seshagiri @ 2026-03-10 18:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Xuan Zhuo, Eugenio Pérez, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, David Wei,
	Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

Use page_pool for RX buffer allocation in mergeable and small buffer
modes to enable page recycling and avoid repeated page allocator calls.
skb_mark_for_recycle() enables page reuse in the network stack.

Big packets mode is unchanged because it uses page->private for linked
list chaining of multiple pages per buffer, which conflicts with
page_pool's internal use of page->private.

Implement conditional DMA premapping using virtqueue_dma_dev():
- When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool
  handling DMA mapping, submit via virtqueue_add_inbuf_premapped()
- When NULL (VDUSE, direct physical): page_pool handles allocation only,
  submit via virtqueue_add_inbuf_ctx()

This preserves the DMA premapping optimization from commit 31f3cd4e5756b
("virtio-net: rq submits premapped per-buffer") while adding page_pool
support as a prerequisite for future zero-copy features (devmem TCP,
io_uring ZCRX).

Page pools are created in probe and destroyed in remove (not open/close),
following existing driver behavior where RX buffers remain in virtqueues
across interface state changes.

Signed-off-by: Vishwanath Seshagiri <vishs@meta.com>
---
Changes in v11:
- add_recvbuf_small: encode alloc_len and xdp_headroom in ctx via
  mergeable_len_to_ctx() so receive_small() recovers the actual buflen
  via mergeable_ctx_to_truesize() (Michael S. Tsirkin)
- receive_small_build_skb, receive_small_xdp: accept buflen parameter
  instead of recomputing it, to use the actual allocation size
- v10:
  https://lore.kernel.org/virtualization/9752a952-195d-4da3-bc7a-5a4a1f2fd2ca@meta.com/

Changes in v10:
- add_recvbuf_small: use alloc_len to avoid clobbering len; the v9
  feedback was about truesize under-accounting, not variable naming,
  and was previously misunderstood
- v9:
  https://lore.kernel.org/virtualization/20260302041005.1627210-1-vishs@meta.com/

Changes in v9:
- Fix virtnet_skb_append_frag() for XSK callers (Michael S. Tsirkin)
- v8:
  https://lore.kernel.org/virtualization/e824c5a3-cfe0-4d11-958f-c3ec82d11d37@meta.com/

Changes in v8:
- Remove virtnet_no_page_pool() helper, replace with direct !rq->page_pool
  checks or inlined conditions (Xuan Zhuo)
- Extract virtnet_rq_submit() helper to consolidate DMA/non-DMA buffer
  submission in add_recvbuf_small() and add_recvbuf_mergeable()
- Add skb_mark_for_recycle(nskb) for overflow frag_list skbs in
  virtnet_skb_append_frag() to ensure page_pool pages are returned to
  the pool instead of freed via put_page()
- Rebase on net-next (kzalloc_objs API)
- v7:
  https://lore.kernel.org/virtualization/20260210014305.3236342-1-vishs@meta.com/

Changes in v7:
- Replace virtnet_put_page() helper with direct page_pool_put_page()
  calls (Xuan Zhuo)
- Add virtnet_no_page_pool() helper to consolidate big_packets mode check
  (Michael S. Tsirkin)
- Add DMA sync_for_cpu for subsequent buffers in xdp_linearize_page() when
  use_page_pool_dma is set (Michael S. Tsirkin)
- Remove unused pp_params.dev assignment in non-DMA path
- Add page pool recreation in virtnet_restore_up() for freeze/restore
  support (Chris Mason)
- v6:
  https://lore.kernel.org/virtualization/20260208175410.1910001-1-vishs@meta.com/

Changes in v6:
- Drop page_pool_frag_offset_add() helper and switch to page_pool_alloc_va();
  page_pool_alloc_netmem() already handles internal fragmentation
  (Jakub Kicinski)
- v5:
  https://lore.kernel.org/virtualization/20260206002715.1885869-1-vishs@meta.com/

Benchmark results:

Configuration: pktgen TX -> tap -> vhost-net | virtio-net RX -> XDP_DROP

Small packets (64 bytes, mrg_rxbuf=off):
  1Q:   853,493 ->   868,923 pps  (+1.8%)
  2Q: 1,655,793 -> 1,696,707 pps  (+2.5%)
  4Q: 3,143,375 -> 3,302,511 pps  (+5.1%)
  8Q: 6,082,590 -> 6,156,894 pps  (+1.2%)

Mergeable RX (64 bytes):
  1Q:   766,168 ->   814,493 pps  (+6.3%)
  2Q: 1,384,871 -> 1,670,639 pps (+20.6%)
  4Q: 2,773,081 -> 3,080,574 pps (+11.1%)
  8Q: 5,600,615 -> 6,043,891 pps  (+7.9%)

Mergeable RX (1500 bytes):
  1Q:   741,579 ->   785,442 pps  (+5.9%)
  2Q: 1,310,043 -> 1,534,554 pps (+17.1%)
  4Q: 2,748,700 -> 2,890,582 pps  (+5.2%)
  8Q: 5,348,589 -> 5,618,664 pps  (+5.0%)

 drivers/net/Kconfig      |   1 +
 drivers/net/virtio_net.c | 497 ++++++++++++++++++++-------------------
 2 files changed, 251 insertions(+), 247 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 17108c359216..b2fd90466bab 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -452,6 +452,7 @@ config VIRTIO_NET
 	depends on VIRTIO
 	select NET_FAILOVER
 	select DIMLIB
+	select PAGE_POOL
 	help
 	  This is the virtual network driver for virtio.  It can be used with
 	  QEMU based VMMs (like KVM or Xen).  Say Y or M.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 72d6a9c6a5a2..a85d75a7f539 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include <net/netdev_rx_queue.h>
 #include <net/netdev_queues.h>
 #include <net/xdp_sock_drv.h>
+#include <net/page_pool/helpers.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -290,14 +291,6 @@ struct virtnet_interrupt_coalesce {
 	u32 max_usecs;
 };
 
-/* The dma information of pages allocated at a time. */
-struct virtnet_rq_dma {
-	dma_addr_t addr;
-	u32 ref;
-	u16 len;
-	u16 need_sync;
-};
-
 /* Internal representation of a send virtqueue */
 struct send_queue {
 	/* Virtqueue associated with this send _queue */
@@ -356,8 +349,10 @@ struct receive_queue {
 	/* Average packet length for mergeable receive buffers. */
 	struct ewma_pkt_len mrg_avg_pkt_len;
 
-	/* Page frag for packet buffer allocation. */
-	struct page_frag alloc_frag;
+	struct page_pool *page_pool;
+
+	/* True if page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
+	bool use_page_pool_dma;
 
 	/* RX: fragments + linear part + virtio header */
 	struct scatterlist sg[MAX_SKB_FRAGS + 2];
@@ -370,9 +365,6 @@ struct receive_queue {
 
 	struct xdp_rxq_info xdp_rxq;
 
-	/* Record the last dma info to free after new pages is allocated. */
-	struct virtnet_rq_dma *last_dma;
-
 	struct xsk_buff_pool *xsk_pool;
 
 	/* xdp rxq used by xsk */
@@ -521,11 +513,14 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
 			       struct virtnet_rq_stats *stats);
 static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
 				 struct sk_buff *skb, u8 flags);
-static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
+					       struct sk_buff *head_skb,
 					       struct sk_buff *curr_skb,
 					       struct page *page, void *buf,
 					       int len, int truesize);
 static void virtnet_xsk_completed(struct send_queue *sq, int num);
+static void free_unused_bufs(struct virtnet_info *vi);
+static void virtnet_del_vqs(struct virtnet_info *vi);
 
 enum virtnet_xmit_type {
 	VIRTNET_XMIT_TYPE_SKB,
@@ -709,12 +704,10 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 static void virtnet_rq_free_buf(struct virtnet_info *vi,
 				struct receive_queue *rq, void *buf)
 {
-	if (vi->mergeable_rx_bufs)
-		put_page(virt_to_head_page(buf));
-	else if (vi->big_packets)
+	if (!rq->page_pool)
 		give_pages(rq, buf);
 	else
-		put_page(virt_to_head_page(buf));
+		page_pool_put_page(rq->page_pool, virt_to_head_page(buf), -1, false);
 }
 
 static void enable_rx_mode_work(struct virtnet_info *vi)
@@ -876,10 +869,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		skb = virtnet_build_skb(buf, truesize, p - buf, len);
 		if (unlikely(!skb))
 			return NULL;
+		/* Big packets mode chains pages via page->private, which is
+		 * incompatible with the way page_pool uses page->private.
+		 * Currently, big packets mode doesn't use page pools.
+		 */
+		if (!rq->page_pool) {
+			page = (struct page *)page->private;
+			if (page)
+				give_pages(rq, page);
+		}
 
-		page = (struct page *)page->private;
-		if (page)
-			give_pages(rq, page);
 		goto ok;
 	}
 
@@ -925,133 +924,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	hdr = skb_vnet_common_hdr(skb);
 	memcpy(hdr, hdr_p, hdr_len);
 	if (page_to_free)
-		put_page(page_to_free);
+		page_pool_put_page(rq->page_pool, page_to_free, -1, true);
 
 	return skb;
 }
 
-static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
-{
-	struct virtnet_info *vi = rq->vq->vdev->priv;
-	struct page *page = virt_to_head_page(buf);
-	struct virtnet_rq_dma *dma;
-	void *head;
-	int offset;
-
-	BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
-
-	head = page_address(page);
-
-	dma = head;
-
-	--dma->ref;
-
-	if (dma->need_sync && len) {
-		offset = buf - (head + sizeof(*dma));
-
-		virtqueue_map_sync_single_range_for_cpu(rq->vq, dma->addr,
-							offset, len,
-							DMA_FROM_DEVICE);
-	}
-
-	if (dma->ref)
-		return;
-
-	virtqueue_unmap_single_attrs(rq->vq, dma->addr, dma->len,
-				     DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
-	put_page(page);
-}
-
 static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 {
-	struct virtnet_info *vi = rq->vq->vdev->priv;
-	void *buf;
-
-	BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
+	BUG_ON(!rq->page_pool);
 
-	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf)
-		virtnet_rq_unmap(rq, buf, *len);
-
-	return buf;
-}
-
-static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
-{
-	struct virtnet_info *vi = rq->vq->vdev->priv;
-	struct virtnet_rq_dma *dma;
-	dma_addr_t addr;
-	u32 offset;
-	void *head;
-
-	BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
-
-	head = page_address(rq->alloc_frag.page);
-
-	offset = buf - head;
-
-	dma = head;
-
-	addr = dma->addr - sizeof(*dma) + offset;
-
-	sg_init_table(rq->sg, 1);
-	sg_fill_dma(rq->sg, addr, len);
-}
-
-static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
-{
-	struct page_frag *alloc_frag = &rq->alloc_frag;
-	struct virtnet_info *vi = rq->vq->vdev->priv;
-	struct virtnet_rq_dma *dma;
-	void *buf, *head;
-	dma_addr_t addr;
-
-	BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
-
-	head = page_address(alloc_frag->page);
-
-	dma = head;
-
-	/* new pages */
-	if (!alloc_frag->offset) {
-		if (rq->last_dma) {
-			/* Now, the new page is allocated, the last dma
-			 * will not be used. So the dma can be unmapped
-			 * if the ref is 0.
-			 */
-			virtnet_rq_unmap(rq, rq->last_dma, 0);
-			rq->last_dma = NULL;
-		}
-
-		dma->len = alloc_frag->size - sizeof(*dma);
-
-		addr = virtqueue_map_single_attrs(rq->vq, dma + 1,
-						  dma->len, DMA_FROM_DEVICE, 0);
-		if (virtqueue_map_mapping_error(rq->vq, addr))
-			return NULL;
-
-		dma->addr = addr;
-		dma->need_sync = virtqueue_map_need_sync(rq->vq, addr);
-
-		/* Add a reference to dma to prevent the entire dma from
-		 * being released during error handling. This reference
-		 * will be freed after the pages are no longer used.
-		 */
-		get_page(alloc_frag->page);
-		dma->ref = 1;
-		alloc_frag->offset = sizeof(*dma);
-
-		rq->last_dma = dma;
-	}
-
-	++dma->ref;
-
-	buf = head + alloc_frag->offset;
-
-	get_page(alloc_frag->page);
-	alloc_frag->offset += size;
-
-	return buf;
+	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
 }
 
 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
@@ -1067,9 +949,6 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 		return;
 	}
 
-	if (!vi->big_packets || vi->mergeable_rx_bufs)
-		virtnet_rq_unmap(rq, buf, 0);
-
 	virtnet_rq_free_buf(vi, rq, buf);
 }
 
@@ -1335,7 +1214,7 @@ static int xsk_append_merge_buffer(struct virtnet_info *vi,
 
 		truesize = len;
 
-		curr_skb  = virtnet_skb_append_frag(head_skb, curr_skb, page,
+		curr_skb  = virtnet_skb_append_frag(rq, head_skb, curr_skb, page,
 						    buf, len, truesize);
 		if (!curr_skb) {
 			put_page(page);
@@ -1771,7 +1650,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	return ret;
 }
 
-static void put_xdp_frags(struct xdp_buff *xdp)
+static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp)
 {
 	struct skb_shared_info *shinfo;
 	struct page *xdp_page;
@@ -1781,7 +1660,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
 		shinfo = xdp_get_shared_info_from_buff(xdp);
 		for (i = 0; i < shinfo->nr_frags; i++) {
 			xdp_page = skb_frag_page(&shinfo->frags[i]);
-			put_page(xdp_page);
+			page_pool_put_page(rq->page_pool, xdp_page, -1, true);
 		}
 	}
 }
@@ -1873,7 +1752,7 @@ static struct page *xdp_linearize_page(struct net_device *dev,
 	if (page_off + *len + tailroom > PAGE_SIZE)
 		return NULL;
 
-	page = alloc_page(GFP_ATOMIC);
+	page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
 	if (!page)
 		return NULL;
 
@@ -1896,8 +1775,12 @@ static struct page *xdp_linearize_page(struct net_device *dev,
 		p = virt_to_head_page(buf);
 		off = buf - page_address(p);
 
+		if (rq->use_page_pool_dma)
+			page_pool_dma_sync_for_cpu(rq->page_pool, p,
+						   off, buflen);
+
 		if (check_mergeable_len(dev, ctx, buflen)) {
-			put_page(p);
+			page_pool_put_page(rq->page_pool, p, -1, true);
 			goto err_buf;
 		}
 
@@ -1905,38 +1788,36 @@ static struct page *xdp_linearize_page(struct net_device *dev,
 		 * is sending packet larger than the MTU.
 		 */
 		if ((page_off + buflen + tailroom) > PAGE_SIZE) {
-			put_page(p);
+			page_pool_put_page(rq->page_pool, p, -1, true);
 			goto err_buf;
 		}
 
 		memcpy(page_address(page) + page_off,
 		       page_address(p) + off, buflen);
 		page_off += buflen;
-		put_page(p);
+		page_pool_put_page(rq->page_pool, p, -1, true);
 	}
 
 	/* Headroom does not contribute to packet length */
 	*len = page_off - XDP_PACKET_HEADROOM;
 	return page;
 err_buf:
-	__free_pages(page, 0);
+	page_pool_put_page(rq->page_pool, page, -1, true);
 	return NULL;
 }
 
 static struct sk_buff *receive_small_build_skb(struct virtnet_info *vi,
 					       unsigned int xdp_headroom,
 					       void *buf,
-					       unsigned int len)
+					       unsigned int len,
+					       unsigned int buflen)
 {
 	unsigned int header_offset;
 	unsigned int headroom;
-	unsigned int buflen;
 	struct sk_buff *skb;
 
 	header_offset = VIRTNET_RX_PAD + xdp_headroom;
 	headroom = vi->hdr_len + header_offset;
-	buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	skb = virtnet_build_skb(buf, buflen, headroom, len);
 	if (unlikely(!skb))
@@ -1955,6 +1836,7 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 					 void *buf,
 					 unsigned int xdp_headroom,
 					 unsigned int len,
+					 unsigned int buflen,
 					 unsigned int *xdp_xmit,
 					 struct virtnet_rq_stats *stats)
 {
@@ -1963,7 +1845,6 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
 	struct page *page = virt_to_head_page(buf);
 	struct page *xdp_page;
-	unsigned int buflen;
 	struct xdp_buff xdp;
 	struct sk_buff *skb;
 	unsigned int metasize = 0;
@@ -1976,9 +1857,6 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 	if (unlikely(hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
 		goto err_xdp;
 
-	buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-
 	if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
 		int offset = buf - page_address(page) + header_offset;
 		unsigned int tlen = len + vi->hdr_len;
@@ -1996,7 +1874,7 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 			goto err_xdp;
 
 		buf = page_address(xdp_page);
-		put_page(page);
+		page_pool_put_page(rq->page_pool, page, -1, true);
 		page = xdp_page;
 	}
 
@@ -2028,13 +1906,15 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 	if (metasize)
 		skb_metadata_set(skb, metasize);
 
+	skb_mark_for_recycle(skb);
+
 	return skb;
 
 err_xdp:
 	u64_stats_inc(&stats->xdp_drops);
 err:
 	u64_stats_inc(&stats->drops);
-	put_page(page);
+	page_pool_put_page(rq->page_pool, page, -1, true);
 xdp_xmit:
 	return NULL;
 }
@@ -2047,7 +1927,8 @@ static struct sk_buff *receive_small(struct net_device *dev,
 				     unsigned int *xdp_xmit,
 				     struct virtnet_rq_stats *stats)
 {
-	unsigned int xdp_headroom = (unsigned long)ctx;
+	unsigned int xdp_headroom = mergeable_ctx_to_headroom(ctx);
+	unsigned int buflen = mergeable_ctx_to_truesize(ctx);
 	struct page *page = virt_to_head_page(buf);
 	struct sk_buff *skb;
 
@@ -2056,6 +1937,13 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	 */
 	buf -= VIRTNET_RX_PAD + xdp_headroom;
 
+	if (rq->use_page_pool_dma) {
+		int offset = buf - page_address(page) +
+			     VIRTNET_RX_PAD + xdp_headroom;
+
+		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
+	}
+
 	len -= vi->hdr_len;
 	u64_stats_add(&stats->bytes, len);
 
@@ -2073,21 +1961,23 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		xdp_prog = rcu_dereference(rq->xdp_prog);
 		if (xdp_prog) {
 			skb = receive_small_xdp(dev, vi, rq, xdp_prog, buf,
-						xdp_headroom, len, xdp_xmit,
-						stats);
+						xdp_headroom, len, buflen,
+						xdp_xmit, stats);
 			rcu_read_unlock();
 			return skb;
 		}
 		rcu_read_unlock();
 	}
 
-	skb = receive_small_build_skb(vi, xdp_headroom, buf, len);
-	if (likely(skb))
+	skb = receive_small_build_skb(vi, xdp_headroom, buf, len, buflen);
+	if (likely(skb)) {
+		skb_mark_for_recycle(skb);
 		return skb;
+	}
 
 err:
 	u64_stats_inc(&stats->drops);
-	put_page(page);
+	page_pool_put_page(rq->page_pool, page, -1, true);
 	return NULL;
 }
 
@@ -2142,7 +2032,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
 		}
 		u64_stats_add(&stats->bytes, len);
 		page = virt_to_head_page(buf);
-		put_page(page);
+		page_pool_put_page(rq->page_pool, page, -1, true);
 	}
 }
 
@@ -2252,8 +2142,12 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 		page = virt_to_head_page(buf);
 		offset = buf - page_address(page);
 
+		if (rq->use_page_pool_dma)
+			page_pool_dma_sync_for_cpu(rq->page_pool, page,
+						   offset, len);
+
 		if (check_mergeable_len(dev, ctx, len)) {
-			put_page(page);
+			page_pool_put_page(rq->page_pool, page, -1, true);
 			goto err;
 		}
 
@@ -2272,7 +2166,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 	return 0;
 
 err:
-	put_xdp_frags(xdp);
+	put_xdp_frags(rq, xdp);
 	return -EINVAL;
 }
 
@@ -2337,7 +2231,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
 		if (*len + xdp_room > PAGE_SIZE)
 			return NULL;
 
-		xdp_page = alloc_page(GFP_ATOMIC);
+		xdp_page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
 		if (!xdp_page)
 			return NULL;
 
@@ -2347,7 +2241,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
 
 	*frame_sz = PAGE_SIZE;
 
-	put_page(*page);
+	page_pool_put_page(rq->page_pool, *page, -1, true);
 
 	*page = xdp_page;
 
@@ -2393,6 +2287,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
 		head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
 		if (unlikely(!head_skb))
 			break;
+
+		skb_mark_for_recycle(head_skb);
 		return head_skb;
 
 	case XDP_TX:
@@ -2403,10 +2299,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
 		break;
 	}
 
-	put_xdp_frags(&xdp);
+	put_xdp_frags(rq, &xdp);
 
 err_xdp:
-	put_page(page);
+	page_pool_put_page(rq->page_pool, page, -1, true);
 	mergeable_buf_free(rq, num_buf, dev, stats);
 
 	u64_stats_inc(&stats->xdp_drops);
@@ -2414,7 +2310,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
 	return NULL;
 }
 
-static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
+					       struct sk_buff *head_skb,
 					       struct sk_buff *curr_skb,
 					       struct page *page, void *buf,
 					       int len, int truesize)
@@ -2429,6 +2326,9 @@ static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
 		if (unlikely(!nskb))
 			return NULL;
 
+		if (head_skb->pp_recycle)
+			skb_mark_for_recycle(nskb);
+
 		if (curr_skb == head_skb)
 			skb_shinfo(curr_skb)->frag_list = nskb;
 		else
@@ -2446,7 +2346,10 @@ static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
 
 	offset = buf - page_address(page);
 	if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
-		put_page(page);
+		if (head_skb->pp_recycle)
+			page_pool_put_page(rq->page_pool, page, -1, true);
+		else
+			put_page(page);
 		skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
 				     len, truesize);
 	} else {
@@ -2475,6 +2378,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
 
 	head_skb = NULL;
+
+	if (rq->use_page_pool_dma)
+		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
+
 	u64_stats_add(&stats->bytes, len - vi->hdr_len);
 
 	if (check_mergeable_len(dev, ctx, len))
@@ -2499,6 +2406,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 	if (unlikely(!curr_skb))
 		goto err_skb;
+
+	skb_mark_for_recycle(head_skb);
 	while (--num_buf) {
 		buf = virtnet_rq_get_buf(rq, &len, &ctx);
 		if (unlikely(!buf)) {
@@ -2513,11 +2422,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		u64_stats_add(&stats->bytes, len);
 		page = virt_to_head_page(buf);
 
+		if (rq->use_page_pool_dma) {
+			offset = buf - page_address(page);
+			page_pool_dma_sync_for_cpu(rq->page_pool, page,
+						   offset, len);
+		}
+
 		if (check_mergeable_len(dev, ctx, len))
 			goto err_skb;
 
 		truesize = mergeable_ctx_to_truesize(ctx);
-		curr_skb  = virtnet_skb_append_frag(head_skb, curr_skb, page,
+		curr_skb  = virtnet_skb_append_frag(rq, head_skb, curr_skb, page,
 						    buf, len, truesize);
 		if (!curr_skb)
 			goto err_skb;
@@ -2527,7 +2442,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	return head_skb;
 
 err_skb:
-	put_page(page);
+	page_pool_put_page(rq->page_pool, page, -1, true);
 	mergeable_buf_free(rq, num_buf, dev, stats);
 
 err_buf:
@@ -2658,40 +2573,54 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 	virtnet_receive_done(vi, rq, skb, flags);
 }
 
-/* Unlike mergeable buffers, all buffers are allocated to the
- * same size, except for the headroom. For this reason we do
- * not need to use  mergeable_len_to_ctx here - it is enough
- * to store the headroom as the context ignoring the truesize.
+static int virtnet_rq_submit(struct receive_queue *rq, char *buf,
+			     int len, void *ctx, gfp_t gfp)
+{
+	if (rq->use_page_pool_dma) {
+		struct page *page = virt_to_head_page(buf);
+		dma_addr_t addr = page_pool_get_dma_addr(page) +
+				  (buf - (char *)page_address(page));
+
+		sg_init_table(rq->sg, 1);
+		sg_fill_dma(rq->sg, addr, len);
+		return virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1,
+						     buf, ctx, gfp);
+	}
+
+	sg_init_one(rq->sg, buf, len);
+	return virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+}
+
+/* With page_pool, the actual allocation may exceed the requested size
+ * when the remaining page fragment can't fit another buffer. Encode
+ * the actual allocation size in ctx so build_skb() gets the correct
+ * buflen for truesize accounting.
  */
 static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 			     gfp_t gfp)
 {
-	char *buf;
 	unsigned int xdp_headroom = virtnet_get_headroom(vi);
-	void *ctx = (void *)(unsigned long)xdp_headroom;
-	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
+	unsigned int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
+	unsigned int alloc_len;
+	char *buf;
+	void *ctx;
 	int err;
 
 	len = SKB_DATA_ALIGN(len) +
 	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
-		return -ENOMEM;
-
-	buf = virtnet_rq_alloc(rq, len, gfp);
+	alloc_len = len;
+	buf = page_pool_alloc_va(rq->page_pool, &alloc_len, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
 
 	buf += VIRTNET_RX_PAD + xdp_headroom;
 
-	virtnet_rq_init_one_sg(rq, buf, vi->hdr_len + GOOD_PACKET_LEN);
-
-	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, buf, ctx, gfp);
-	if (err < 0) {
-		virtnet_rq_unmap(rq, buf, 0);
-		put_page(virt_to_head_page(buf));
-	}
+	ctx = mergeable_len_to_ctx(alloc_len, xdp_headroom);
+	err = virtnet_rq_submit(rq, buf, vi->hdr_len + GOOD_PACKET_LEN, ctx, gfp);
 
+	if (err < 0)
+		page_pool_put_page(rq->page_pool, virt_to_head_page(buf), -1, false);
 	return err;
 }
 
@@ -2764,13 +2693,12 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
 static int add_recvbuf_mergeable(struct virtnet_info *vi,
 				 struct receive_queue *rq, gfp_t gfp)
 {
-	struct page_frag *alloc_frag = &rq->alloc_frag;
 	unsigned int headroom = virtnet_get_headroom(vi);
 	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
 	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
-	unsigned int len, hole;
-	void *ctx;
+	unsigned int len, alloc_len;
 	char *buf;
+	void *ctx;
 	int err;
 
 	/* Extra tailroom is needed to satisfy XDP's assumption. This
@@ -2779,39 +2707,22 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	 */
 	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
 
-	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
-		return -ENOMEM;
-
-	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
-		len -= sizeof(struct virtnet_rq_dma);
-
-	buf = virtnet_rq_alloc(rq, len + room, gfp);
+	alloc_len = len + room;
+	buf = page_pool_alloc_va(rq->page_pool, &alloc_len, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
 
 	buf += headroom; /* advance address leaving hole at front of pkt */
-	hole = alloc_frag->size - alloc_frag->offset;
-	if (hole < len + room) {
-		/* To avoid internal fragmentation, if there is very likely not
-		 * enough space for another buffer, add the remaining space to
-		 * the current buffer.
-		 * XDP core assumes that frame_size of xdp_buff and the length
-		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
-		 */
-		if (!headroom)
-			len += hole;
-		alloc_frag->offset += hole;
-	}
 
-	virtnet_rq_init_one_sg(rq, buf, len);
+	if (!headroom)
+		len = alloc_len - room;
 
 	ctx = mergeable_len_to_ctx(len + room, headroom);
-	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, buf, ctx, gfp);
-	if (err < 0) {
-		virtnet_rq_unmap(rq, buf, 0);
-		put_page(virt_to_head_page(buf));
-	}
 
+	err = virtnet_rq_submit(rq, buf, len, ctx, gfp);
+
+	if (err < 0)
+		page_pool_put_page(rq->page_pool, virt_to_head_page(buf), -1, false);
 	return err;
 }
 
@@ -2963,7 +2874,7 @@ static int virtnet_receive_packets(struct virtnet_info *vi,
 	int packets = 0;
 	void *buf;
 
-	if (!vi->big_packets || vi->mergeable_rx_bufs) {
+	if (rq->page_pool) {
 		void *ctx;
 		while (packets < budget &&
 		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
@@ -3128,7 +3039,10 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
 		return err;
 
 	err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
-					 MEM_TYPE_PAGE_SHARED, NULL);
+					 vi->rq[qp_index].page_pool ?
+						MEM_TYPE_PAGE_POOL :
+						MEM_TYPE_PAGE_SHARED,
+					 vi->rq[qp_index].page_pool);
 	if (err < 0)
 		goto err_xdp_reg_mem_model;
 
@@ -3168,6 +3082,82 @@ static void virtnet_update_settings(struct virtnet_info *vi)
 		vi->duplex = duplex;
 }
 
+static int virtnet_create_page_pools(struct virtnet_info *vi)
+{
+	int i, err;
+
+	if (vi->big_packets && !vi->mergeable_rx_bufs)
+		return 0;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct receive_queue *rq = &vi->rq[i];
+		struct page_pool_params pp_params = { 0 };
+		struct device *dma_dev;
+
+		if (rq->page_pool)
+			continue;
+
+		if (rq->xsk_pool)
+			continue;
+
+		pp_params.order = 0;
+		pp_params.pool_size = virtqueue_get_vring_size(rq->vq);
+		pp_params.nid = dev_to_node(vi->vdev->dev.parent);
+		pp_params.netdev = vi->dev;
+		pp_params.napi = &rq->napi;
+
+		/* Use page_pool DMA mapping if backend supports DMA API.
+		 * DMA_SYNC_DEV is needed for non-coherent archs on recycle.
+		 */
+		dma_dev = virtqueue_dma_dev(rq->vq);
+		if (dma_dev) {
+			pp_params.dev = dma_dev;
+			pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+			pp_params.dma_dir = DMA_FROM_DEVICE;
+			pp_params.max_len = PAGE_SIZE;
+			pp_params.offset = 0;
+			rq->use_page_pool_dma = true;
+		} else {
+			/* No DMA API (e.g., VDUSE): page_pool for allocation only. */
+			pp_params.flags = 0;
+			rq->use_page_pool_dma = false;
+		}
+
+		rq->page_pool = page_pool_create(&pp_params);
+		if (IS_ERR(rq->page_pool)) {
+			err = PTR_ERR(rq->page_pool);
+			rq->page_pool = NULL;
+			goto err_cleanup;
+		}
+	}
+	return 0;
+
+err_cleanup:
+	while (--i >= 0) {
+		struct receive_queue *rq = &vi->rq[i];
+
+		if (rq->page_pool) {
+			page_pool_destroy(rq->page_pool);
+			rq->page_pool = NULL;
+		}
+	}
+	return err;
+}
+
+static void virtnet_destroy_page_pools(struct virtnet_info *vi)
+{
+	int i;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct receive_queue *rq = &vi->rq[i];
+
+		if (rq->page_pool) {
+			page_pool_destroy(rq->page_pool);
+			rq->page_pool = NULL;
+		}
+	}
+}
+
 static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -5715,6 +5705,10 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	if (err)
 		return err;
 
+	err = virtnet_create_page_pools(vi);
+	if (err)
+		goto err_del_vqs;
+
 	virtio_device_ready(vdev);
 
 	enable_rx_mode_work(vi);
@@ -5724,12 +5718,24 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 		err = virtnet_open(vi->dev);
 		rtnl_unlock();
 		if (err)
-			return err;
+			goto err_destroy_pools;
 	}
 
 	netif_tx_lock_bh(vi->dev);
 	netif_device_attach(vi->dev);
 	netif_tx_unlock_bh(vi->dev);
+	return 0;
+
+err_destroy_pools:
+	virtio_reset_device(vdev);
+	free_unused_bufs(vi);
+	virtnet_destroy_page_pools(vi);
+	virtnet_del_vqs(vi);
+	return err;
+
+err_del_vqs:
+	virtio_reset_device(vdev);
+	virtnet_del_vqs(vi);
 	return err;
 }
 
@@ -5857,7 +5863,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
 	/* In big_packets mode, xdp cannot work, so there is no need to
 	 * initialize xsk of rq.
 	 */
-	if (vi->big_packets && !vi->mergeable_rx_bufs)
+	if (!vi->rq[qid].page_pool)
 		return -ENOENT;
 
 	if (qid >= vi->curr_queue_pairs)
@@ -6287,17 +6293,6 @@ static void free_receive_bufs(struct virtnet_info *vi)
 	rtnl_unlock();
 }
 
-static void free_receive_page_frags(struct virtnet_info *vi)
-{
-	int i;
-	for (i = 0; i < vi->max_queue_pairs; i++)
-		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].last_dma)
-				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
-			put_page(vi->rq[i].alloc_frag.page);
-		}
-}
-
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
@@ -6401,7 +6396,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 	vqs_info = kzalloc_objs(*vqs_info, total_vqs);
 	if (!vqs_info)
 		goto err_vqs_info;
-	if (!vi->big_packets || vi->mergeable_rx_bufs) {
+	if (vi->mergeable_rx_bufs || !vi->big_packets) {
 		ctx = kzalloc_objs(*ctx, total_vqs);
 		if (!ctx)
 			goto err_ctx;
@@ -6441,10 +6436,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
 		vi->sq[i].vq = vqs[txq2vq(i)];
 	}
-
 	/* run here: ret == 0. */
 
-
 err_find:
 	kfree(ctx);
 err_ctx:
@@ -6945,6 +6938,14 @@ static int virtnet_probe(struct virtio_device *vdev)
 			goto free;
 	}
 
+	/* Create page pools for receive queues.
+	 * Page pools are created at probe time so they can be used
+	 * with premapped DMA addresses throughout the device lifetime.
+	 */
+	err = virtnet_create_page_pools(vi);
+	if (err)
+		goto free_irq_moder;
+
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
 		dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
@@ -6958,7 +6959,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->failover = net_failover_create(vi->dev);
 		if (IS_ERR(vi->failover)) {
 			err = PTR_ERR(vi->failover);
-			goto free_vqs;
+			goto free_page_pools;
 		}
 	}
 
@@ -7075,9 +7076,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 	unregister_netdev(dev);
 free_failover:
 	net_failover_destroy(vi->failover);
-free_vqs:
+free_page_pools:
+	virtnet_destroy_page_pools(vi);
+free_irq_moder:
+	virtnet_free_irq_moder(vi);
 	virtio_reset_device(vdev);
-	free_receive_page_frags(vi);
 	virtnet_del_vqs(vi);
 free:
 	free_netdev(dev);
@@ -7102,7 +7105,7 @@ static void remove_vq_common(struct virtnet_info *vi)
 
 	free_receive_bufs(vi);
 
-	free_receive_page_frags(vi);
+	virtnet_destroy_page_pools(vi);
 
 	virtnet_del_vqs(vi);
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-10 18:31 [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation Vishwanath Seshagiri
@ 2026-03-13  7:51 ` Jason Wang
  2026-03-13  9:26   ` Vishwanath Seshagiri
  2026-03-13 16:50   ` Vishwanath Seshagiri
  2026-03-16  9:56 ` Michael S. Tsirkin
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Jason Wang @ 2026-03-13  7:51 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Michael S . Tsirkin, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On Wed, Mar 11, 2026 at 2:31 AM Vishwanath Seshagiri <vishs@meta.com> wrote:
>
> Use page_pool for RX buffer allocation in mergeable and small buffer
> modes to enable page recycling and avoid repeated page allocator calls.
> skb_mark_for_recycle() enables page reuse in the network stack.
>
> Big packets mode is unchanged because it uses page->private for linked
> list chaining of multiple pages per buffer, which conflicts with
> page_pool's internal use of page->private.
>
> Implement conditional DMA premapping using virtqueue_dma_dev():
> - When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool
>   handling DMA mapping, submit via virtqueue_add_inbuf_premapped()
> - When NULL (VDUSE, direct physical): page_pool handles allocation only,
>   submit via virtqueue_add_inbuf_ctx()
>
> This preserves the DMA premapping optimization from commit 31f3cd4e5756b
> ("virtio-net: rq submits premapped per-buffer") while adding page_pool
> support as a prerequisite for future zero-copy features (devmem TCP,
> io_uring ZCRX).
>
> Page pools are created in probe and destroyed in remove (not open/close),
> following existing driver behavior where RX buffers remain in virtqueues
> across interface state changes.
>
> Signed-off-by: Vishwanath Seshagiri <vishs@meta.com>
> ---
> Changes in v11:
> - add_recvbuf_small: encode alloc_len and xdp_headroom in ctx via
>   mergeable_len_to_ctx() so receive_small() recovers the actual buflen
>   via mergeable_ctx_to_truesize() (Michael S. Tsirkin)
> - receive_small_build_skb, receive_small_xdp: accept buflen parameter
>   instead of recomputing it, to use the actual allocation size
> - v10:
>   https://lore.kernel.org/virtualization/9752a952-195d-4da3-bc7a-5a4a1f2fd2ca@meta.com/
>
> Changes in v10:
> - add_recvbuf_small: use alloc_len to avoid clobbering len; v9 feedback
>   was about truesize under-accounting, not variable naming — misunderstood
>   the comment in v9
> - v9:
>   https://lore.kernel.org/virtualization/20260302041005.1627210-1-vishs@meta.com/
>
> Changes in v9:
> - Fix virtnet_skb_append_frag() for XSK callers (Michael S. Tsirkin)
> - v8:
>   https://lore.kernel.org/virtualization/e824c5a3-cfe0-4d11-958f-c3ec82d11d37@meta.com/
>
> Changes in v8:
> - Remove virtnet_no_page_pool() helper, replace with direct !rq->page_pool
>   checks or inlined conditions (Xuan Zhuo)
> - Extract virtnet_rq_submit() helper to consolidate DMA/non-DMA buffer
>   submission in add_recvbuf_small() and add_recvbuf_mergeable()
> - Add skb_mark_for_recycle(nskb) for overflow frag_list skbs in
>   virtnet_skb_append_frag() to ensure page_pool pages are returned to
>   the pool instead of freed via put_page()
> - Rebase on net-next (kzalloc_objs API)
> - v7:
>   https://lore.kernel.org/virtualization/20260210014305.3236342-1-vishs@meta.com/
>
> Changes in v7:
> - Replace virtnet_put_page() helper with direct page_pool_put_page()
>   calls (Xuan Zhuo)
> - Add virtnet_no_page_pool() helper to consolidate big_packets mode check
>   (Michael S. Tsirkin)
> - Add DMA sync_for_cpu for subsequent buffers in xdp_linearize_page() when
>   use_page_pool_dma is set (Michael S. Tsirkin)
> - Remove unused pp_params.dev assignment in non-DMA path
> - Add page pool recreation in virtnet_restore_up() for freeze/restore
>   support (Chris Mason's review prompt)
> - v6:
>   https://lore.kernel.org/virtualization/20260208175410.1910001-1-vishs@meta.com/
>
> Changes in v6:
> - Drop page_pool_frag_offset_add() helper and switch to page_pool_alloc_va();
>   page_pool_alloc_netmem() already handles internal fragmentation on its own
>   (Jakub Kicinski)
> - v5:
>   https://lore.kernel.org/virtualization/20260206002715.1885869-1-vishs@meta.com/
>
> Benchmark results:
>
> Configuration: pktgen TX -> tap -> vhost-net | virtio-net RX -> XDP_DROP
>
> Small packets (64 bytes, mrg_rxbuf=off):
>   1Q:  853,493 -> 868,923 pps  (+1.8%)
>   2Q: 1,655,793 -> 1,696,707 pps (+2.5%)
>   4Q: 3,143,375 -> 3,302,511 pps (+5.1%)
>   8Q: 6,082,590 -> 6,156,894 pps (+1.2%)
>
> Mergeable RX (64 bytes):
>   1Q:   766,168 ->   814,493 pps  (+6.3%)
>   2Q: 1,384,871 -> 1,670,639 pps (+20.6%)
>   4Q: 2,773,081 -> 3,080,574 pps (+11.1%)
>   8Q: 5,600,615 -> 6,043,891 pps  (+7.9%)
>
> Mergeable RX (1500 bytes):
>   1Q:   741,579 ->   785,442 pps  (+5.9%)
>   2Q: 1,310,043 -> 1,534,554 pps (+17.1%)
>   4Q: 2,748,700 -> 2,890,582 pps  (+5.2%)
>   8Q: 5,348,589 -> 5,618,664 pps  (+5.0%)
>
>  drivers/net/Kconfig      |   1 +
>  drivers/net/virtio_net.c | 497 ++++++++++++++++++++-------------------
>  2 files changed, 251 insertions(+), 247 deletions(-)
>
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 17108c359216..b2fd90466bab 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -452,6 +452,7 @@ config VIRTIO_NET
>         depends on VIRTIO
>         select NET_FAILOVER
>         select DIMLIB
> +       select PAGE_POOL
>         help
>           This is the virtual network driver for virtio.  It can be used with
>           QEMU based VMMs (like KVM or Xen).  Say Y or M.
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 72d6a9c6a5a2..a85d75a7f539 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,6 +26,7 @@
>  #include <net/netdev_rx_queue.h>
>  #include <net/netdev_queues.h>
>  #include <net/xdp_sock_drv.h>
> +#include <net/page_pool/helpers.h>
>
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
> @@ -290,14 +291,6 @@ struct virtnet_interrupt_coalesce {
>         u32 max_usecs;
>  };
>
> -/* The dma information of pages allocated at a time. */
> -struct virtnet_rq_dma {
> -       dma_addr_t addr;
> -       u32 ref;
> -       u16 len;
> -       u16 need_sync;
> -};
> -
>  /* Internal representation of a send virtqueue */
>  struct send_queue {
>         /* Virtqueue associated with this send _queue */
> @@ -356,8 +349,10 @@ struct receive_queue {
>         /* Average packet length for mergeable receive buffers. */
>         struct ewma_pkt_len mrg_avg_pkt_len;
>
> -       /* Page frag for packet buffer allocation. */
> -       struct page_frag alloc_frag;
> +       struct page_pool *page_pool;
> +
> +       /* True if page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
> +       bool use_page_pool_dma;
>
>         /* RX: fragments + linear part + virtio header */
>         struct scatterlist sg[MAX_SKB_FRAGS + 2];
> @@ -370,9 +365,6 @@ struct receive_queue {
>
>         struct xdp_rxq_info xdp_rxq;
>
> -       /* Record the last dma info to free after new pages is allocated. */
> -       struct virtnet_rq_dma *last_dma;
> -
>         struct xsk_buff_pool *xsk_pool;
>
>         /* xdp rxq used by xsk */
> @@ -521,11 +513,14 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
>                                struct virtnet_rq_stats *stats);
>  static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
>                                  struct sk_buff *skb, u8 flags);
> -static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
> +static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
> +                                              struct sk_buff *head_skb,
>                                                struct sk_buff *curr_skb,
>                                                struct page *page, void *buf,
>                                                int len, int truesize);
>  static void virtnet_xsk_completed(struct send_queue *sq, int num);
> +static void free_unused_bufs(struct virtnet_info *vi);
> +static void virtnet_del_vqs(struct virtnet_info *vi);
>
>  enum virtnet_xmit_type {
>         VIRTNET_XMIT_TYPE_SKB,
> @@ -709,12 +704,10 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>  static void virtnet_rq_free_buf(struct virtnet_info *vi,
>                                 struct receive_queue *rq, void *buf)
>  {
> -       if (vi->mergeable_rx_bufs)
> -               put_page(virt_to_head_page(buf));
> -       else if (vi->big_packets)
> +       if (!rq->page_pool)
>                 give_pages(rq, buf);
>         else
> -               put_page(virt_to_head_page(buf));
> +               page_pool_put_page(rq->page_pool, virt_to_head_page(buf), -1, false);
>  }
>
>  static void enable_rx_mode_work(struct virtnet_info *vi)
> @@ -876,10 +869,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>                 skb = virtnet_build_skb(buf, truesize, p - buf, len);
>                 if (unlikely(!skb))
>                         return NULL;
> +               /* Big packets mode chains pages via page->private, which is
> +                * incompatible with the way page_pool uses page->private.
> +                * Currently, big packets mode doesn't use page pools.
> +                */
> +               if (!rq->page_pool) {
> +                       page = (struct page *)page->private;
> +                       if (page)
> +                               give_pages(rq, page);
> +               }
>
> -               page = (struct page *)page->private;
> -               if (page)
> -                       give_pages(rq, page);
>                 goto ok;
>         }
>
> @@ -925,133 +924,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         hdr = skb_vnet_common_hdr(skb);
>         memcpy(hdr, hdr_p, hdr_len);
>         if (page_to_free)
> -               put_page(page_to_free);
> +               page_pool_put_page(rq->page_pool, page_to_free, -1, true);
>
>         return skb;
>  }
>
> -static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
> -{
> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> -       struct page *page = virt_to_head_page(buf);
> -       struct virtnet_rq_dma *dma;
> -       void *head;
> -       int offset;
> -
> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> -
> -       head = page_address(page);
> -
> -       dma = head;
> -
> -       --dma->ref;
> -
> -       if (dma->need_sync && len) {
> -               offset = buf - (head + sizeof(*dma));
> -
> -               virtqueue_map_sync_single_range_for_cpu(rq->vq, dma->addr,
> -                                                       offset, len,
> -                                                       DMA_FROM_DEVICE);
> -       }
> -
> -       if (dma->ref)
> -               return;
> -
> -       virtqueue_unmap_single_attrs(rq->vq, dma->addr, dma->len,
> -                                    DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
> -       put_page(page);
> -}
> -
>  static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
>  {
> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> -       void *buf;
> -
> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> +       BUG_ON(!rq->page_pool);
>
> -       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> -       if (buf)
> -               virtnet_rq_unmap(rq, buf, *len);
> -
> -       return buf;
> -}
> -
> -static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
> -{
> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> -       struct virtnet_rq_dma *dma;
> -       dma_addr_t addr;
> -       u32 offset;
> -       void *head;
> -
> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> -
> -       head = page_address(rq->alloc_frag.page);
> -
> -       offset = buf - head;
> -
> -       dma = head;
> -
> -       addr = dma->addr - sizeof(*dma) + offset;
> -
> -       sg_init_table(rq->sg, 1);
> -       sg_fill_dma(rq->sg, addr, len);
> -}
> -
> -static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> -{
> -       struct page_frag *alloc_frag = &rq->alloc_frag;
> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> -       struct virtnet_rq_dma *dma;
> -       void *buf, *head;
> -       dma_addr_t addr;
> -
> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> -
> -       head = page_address(alloc_frag->page);
> -
> -       dma = head;
> -
> -       /* new pages */
> -       if (!alloc_frag->offset) {
> -               if (rq->last_dma) {
> -                       /* Now, the new page is allocated, the last dma
> -                        * will not be used. So the dma can be unmapped
> -                        * if the ref is 0.
> -                        */
> -                       virtnet_rq_unmap(rq, rq->last_dma, 0);
> -                       rq->last_dma = NULL;
> -               }
> -
> -               dma->len = alloc_frag->size - sizeof(*dma);
> -
> -               addr = virtqueue_map_single_attrs(rq->vq, dma + 1,
> -                                                 dma->len, DMA_FROM_DEVICE, 0);
> -               if (virtqueue_map_mapping_error(rq->vq, addr))
> -                       return NULL;
> -
> -               dma->addr = addr;
> -               dma->need_sync = virtqueue_map_need_sync(rq->vq, addr);
> -
> -               /* Add a reference to dma to prevent the entire dma from
> -                * being released during error handling. This reference
> -                * will be freed after the pages are no longer used.
> -                */
> -               get_page(alloc_frag->page);
> -               dma->ref = 1;
> -               alloc_frag->offset = sizeof(*dma);
> -
> -               rq->last_dma = dma;
> -       }
> -
> -       ++dma->ref;
> -
> -       buf = head + alloc_frag->offset;
> -
> -       get_page(alloc_frag->page);
> -       alloc_frag->offset += size;
> -
> -       return buf;
> +       return virtqueue_get_buf_ctx(rq->vq, len, ctx);
>  }
>
>  static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> @@ -1067,9 +949,6 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>                 return;
>         }
>
> -       if (!vi->big_packets || vi->mergeable_rx_bufs)
> -               virtnet_rq_unmap(rq, buf, 0);
> -
>         virtnet_rq_free_buf(vi, rq, buf);
>  }
>
> @@ -1335,7 +1214,7 @@ static int xsk_append_merge_buffer(struct virtnet_info *vi,
>
>                 truesize = len;
>
> -               curr_skb  = virtnet_skb_append_frag(head_skb, curr_skb, page,
> +               curr_skb  = virtnet_skb_append_frag(rq, head_skb, curr_skb, page,
>                                                     buf, len, truesize);
>                 if (!curr_skb) {
>                         put_page(page);
> @@ -1771,7 +1650,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>         return ret;
>  }
>
> -static void put_xdp_frags(struct xdp_buff *xdp)
> +static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp)
>  {
>         struct skb_shared_info *shinfo;
>         struct page *xdp_page;
> @@ -1781,7 +1660,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
>                 shinfo = xdp_get_shared_info_from_buff(xdp);
>                 for (i = 0; i < shinfo->nr_frags; i++) {
>                         xdp_page = skb_frag_page(&shinfo->frags[i]);
> -                       put_page(xdp_page);
> +                       page_pool_put_page(rq->page_pool, xdp_page, -1, true);
>                 }
>         }
>  }
> @@ -1873,7 +1752,7 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>         if (page_off + *len + tailroom > PAGE_SIZE)
>                 return NULL;
>
> -       page = alloc_page(GFP_ATOMIC);
> +       page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
>         if (!page)
>                 return NULL;
>
> @@ -1896,8 +1775,12 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>                 p = virt_to_head_page(buf);
>                 off = buf - page_address(p);
>
> +               if (rq->use_page_pool_dma)
> +                       page_pool_dma_sync_for_cpu(rq->page_pool, p,
> +                                                  off, buflen);

Interesting. I think we need a patch for -stable to sync for cpu as
well (and probably the XDP_TX path).


> +
>                 if (check_mergeable_len(dev, ctx, buflen)) {
> -                       put_page(p);
> +                       page_pool_put_page(rq->page_pool, p, -1, true);
>                         goto err_buf;
>                 }
>
> @@ -1905,38 +1788,36 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>                  * is sending packet larger than the MTU.
>                  */
>                 if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> -                       put_page(p);
> +                       page_pool_put_page(rq->page_pool, p, -1, true);
>                         goto err_buf;
>                 }
>
>                 memcpy(page_address(page) + page_off,
>                        page_address(p) + off, buflen);
>                 page_off += buflen;
> -               put_page(p);
> +               page_pool_put_page(rq->page_pool, p, -1, true);
>         }
>
>         /* Headroom does not contribute to packet length */
>         *len = page_off - XDP_PACKET_HEADROOM;
>         return page;
>  err_buf:
> -       __free_pages(page, 0);
> +       page_pool_put_page(rq->page_pool, page, -1, true);
>         return NULL;
>  }
>
>  static struct sk_buff *receive_small_build_skb(struct virtnet_info *vi,
>                                                unsigned int xdp_headroom,
>                                                void *buf,
> -                                              unsigned int len)
> +                                              unsigned int len,
> +                                              unsigned int buflen)
>  {
>         unsigned int header_offset;
>         unsigned int headroom;
> -       unsigned int buflen;
>         struct sk_buff *skb;
>
>         header_offset = VIRTNET_RX_PAD + xdp_headroom;
>         headroom = vi->hdr_len + header_offset;
> -       buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> -               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>

Any reason for removing this?

The rest looks fine.

Thanks


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-13  7:51 ` Jason Wang
@ 2026-03-13  9:26   ` Vishwanath Seshagiri
  2026-03-16  7:41     ` Jason Wang
  2026-03-13 16:50   ` Vishwanath Seshagiri
  1 sibling, 1 reply; 22+ messages in thread
From: Vishwanath Seshagiri @ 2026-03-13  9:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On 3/13/26 1:21 PM, Jason Wang wrote:
> On Wed, Mar 11, 2026 at 2:31 AM Vishwanath Seshagiri <vishs@meta.com> wrote:
>>
>> Use page_pool for RX buffer allocation in mergeable and small buffer
>> modes to enable page recycling and avoid repeated page allocator calls.
>> skb_mark_for_recycle() enables page reuse in the network stack.
>>
>> Big packets mode is unchanged because it uses page->private for linked
>> list chaining of multiple pages per buffer, which conflicts with
>> page_pool's internal use of page->private.
>>
>> Implement conditional DMA premapping using virtqueue_dma_dev():
>> - When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool
>>    handling DMA mapping, submit via virtqueue_add_inbuf_premapped()
>> - When NULL (VDUSE, direct physical): page_pool handles allocation only,
>>    submit via virtqueue_add_inbuf_ctx()
>>
>> This preserves the DMA premapping optimization from commit 31f3cd4e5756b
>> ("virtio-net: rq submits premapped per-buffer") while adding page_pool
>> support as a prerequisite for future zero-copy features (devmem TCP,
>> io_uring ZCRX).
>>
>> Page pools are created in probe and destroyed in remove (not open/close),
>> following existing driver behavior where RX buffers remain in virtqueues
>> across interface state changes.
>>
>> Signed-off-by: Vishwanath Seshagiri <vishs@meta.com>
>> ---
>> Changes in v11:
>> - add_recvbuf_small: encode alloc_len and xdp_headroom in ctx via
>>    mergeable_len_to_ctx() so receive_small() recovers the actual buflen
>>    via mergeable_ctx_to_truesize() (Michael S. Tsirkin)
>> - receive_small_build_skb, receive_small_xdp: accept buflen parameter
>>    instead of recomputing it, to use the actual allocation size
>> - v10:
>>    https://lore.kernel.org/virtualization/9752a952-195d-4da3-bc7a-5a4a1f2fd2ca@meta.com/
>>
>> Changes in v10:
>> - add_recvbuf_small: use alloc_len to avoid clobbering len; v9 feedback
>>    was about truesize under-accounting, not variable naming — misunderstood
>>    the comment in v9
>> - v9:
>>    https://lore.kernel.org/virtualization/20260302041005.1627210-1-vishs@meta.com/
>>
>> Changes in v9:
>> - Fix virtnet_skb_append_frag() for XSK callers (Michael S. Tsirkin)
>> - v8:
>>    https://lore.kernel.org/virtualization/e824c5a3-cfe0-4d11-958f-c3ec82d11d37@meta.com/
>>
>> Changes in v8:
>> - Remove virtnet_no_page_pool() helper, replace with direct !rq->page_pool
>>    checks or inlined conditions (Xuan Zhuo)
>> - Extract virtnet_rq_submit() helper to consolidate DMA/non-DMA buffer
>>    submission in add_recvbuf_small() and add_recvbuf_mergeable()
>> - Add skb_mark_for_recycle(nskb) for overflow frag_list skbs in
>>    virtnet_skb_append_frag() to ensure page_pool pages are returned to
>>    the pool instead of freed via put_page()
>> - Rebase on net-next (kzalloc_objs API)
>> - v7:
>>    https://lore.kernel.org/virtualization/20260210014305.3236342-1-vishs@meta.com/
>>
>> Changes in v7:
>> - Replace virtnet_put_page() helper with direct page_pool_put_page()
>>    calls (Xuan Zhuo)
>> - Add virtnet_no_page_pool() helper to consolidate big_packets mode check
>>    (Michael S. Tsirkin)
>> - Add DMA sync_for_cpu for subsequent buffers in xdp_linearize_page() when
>>    use_page_pool_dma is set (Michael S. Tsirkin)
>> - Remove unused pp_params.dev assignment in non-DMA path
>> - Add page pool recreation in virtnet_restore_up() for freeze/restore
>>    support (Chris Mason's review prompt)
>> - v6:
>>    https://lore.kernel.org/virtualization/20260208175410.1910001-1-vishs@meta.com/
>>
>> Changes in v6:
>> - Drop page_pool_frag_offset_add() helper and switch to page_pool_alloc_va();
>>    page_pool_alloc_netmem() already handles internal fragmentation on its own
>>    (Jakub Kicinski)
>> - v5:
>>    https://lore.kernel.org/virtualization/20260206002715.1885869-1-vishs@meta.com/
>>
>> Benchmark results:
>>
>> Configuration: pktgen TX -> tap -> vhost-net | virtio-net RX -> XDP_DROP
>>
>> Small packets (64 bytes, mrg_rxbuf=off):
>>    1Q:  853,493 -> 868,923 pps  (+1.8%)
>>    2Q: 1,655,793 -> 1,696,707 pps (+2.5%)
>>    4Q: 3,143,375 -> 3,302,511 pps (+5.1%)
>>    8Q: 6,082,590 -> 6,156,894 pps (+1.2%)
>>
>> Mergeable RX (64 bytes):
>>    1Q:   766,168 ->   814,493 pps  (+6.3%)
>>    2Q: 1,384,871 -> 1,670,639 pps (+20.6%)
>>    4Q: 2,773,081 -> 3,080,574 pps (+11.1%)
>>    8Q: 5,600,615 -> 6,043,891 pps  (+7.9%)
>>
>> Mergeable RX (1500 bytes):
>>    1Q:   741,579 ->   785,442 pps  (+5.9%)
>>    2Q: 1,310,043 -> 1,534,554 pps (+17.1%)
>>    4Q: 2,748,700 -> 2,890,582 pps  (+5.2%)
>>    8Q: 5,348,589 -> 5,618,664 pps  (+5.0%)
>>
>>   drivers/net/Kconfig      |   1 +
>>   drivers/net/virtio_net.c | 497 ++++++++++++++++++++-------------------
>>   2 files changed, 251 insertions(+), 247 deletions(-)
>>
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index 17108c359216..b2fd90466bab 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -452,6 +452,7 @@ config VIRTIO_NET
>>          depends on VIRTIO
>>          select NET_FAILOVER
>>          select DIMLIB
>> +       select PAGE_POOL
>>          help
>>            This is the virtual network driver for virtio.  It can be used with
>>            QEMU based VMMs (like KVM or Xen).  Say Y or M.
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 72d6a9c6a5a2..a85d75a7f539 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -26,6 +26,7 @@
>>   #include <net/netdev_rx_queue.h>
>>   #include <net/netdev_queues.h>
>>   #include <net/xdp_sock_drv.h>
>> +#include <net/page_pool/helpers.h>
>>
>>   static int napi_weight = NAPI_POLL_WEIGHT;
>>   module_param(napi_weight, int, 0444);
>> @@ -290,14 +291,6 @@ struct virtnet_interrupt_coalesce {
>>          u32 max_usecs;
>>   };
>>
>> -/* The dma information of pages allocated at a time. */
>> -struct virtnet_rq_dma {
>> -       dma_addr_t addr;
>> -       u32 ref;
>> -       u16 len;
>> -       u16 need_sync;
>> -};
>> -
>>   /* Internal representation of a send virtqueue */
>>   struct send_queue {
>>          /* Virtqueue associated with this send _queue */
>> @@ -356,8 +349,10 @@ struct receive_queue {
>>          /* Average packet length for mergeable receive buffers. */
>>          struct ewma_pkt_len mrg_avg_pkt_len;
>>
>> -       /* Page frag for packet buffer allocation. */
>> -       struct page_frag alloc_frag;
>> +       struct page_pool *page_pool;
>> +
>> +       /* True if page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
>> +       bool use_page_pool_dma;
>>
>>          /* RX: fragments + linear part + virtio header */
>>          struct scatterlist sg[MAX_SKB_FRAGS + 2];
>> @@ -370,9 +365,6 @@ struct receive_queue {
>>
>>          struct xdp_rxq_info xdp_rxq;
>>
>> -       /* Record the last dma info to free after new pages is allocated. */
>> -       struct virtnet_rq_dma *last_dma;
>> -
>>          struct xsk_buff_pool *xsk_pool;
>>
>>          /* xdp rxq used by xsk */
>> @@ -521,11 +513,14 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
>>                                 struct virtnet_rq_stats *stats);
>>   static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
>>                                   struct sk_buff *skb, u8 flags);
>> -static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
>> +static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
>> +                                              struct sk_buff *head_skb,
>>                                                 struct sk_buff *curr_skb,
>>                                                 struct page *page, void *buf,
>>                                                 int len, int truesize);
>>   static void virtnet_xsk_completed(struct send_queue *sq, int num);
>> +static void free_unused_bufs(struct virtnet_info *vi);
>> +static void virtnet_del_vqs(struct virtnet_info *vi);
>>
>>   enum virtnet_xmit_type {
>>          VIRTNET_XMIT_TYPE_SKB,
>> @@ -709,12 +704,10 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>>   static void virtnet_rq_free_buf(struct virtnet_info *vi,
>>                                  struct receive_queue *rq, void *buf)
>>   {
>> -       if (vi->mergeable_rx_bufs)
>> -               put_page(virt_to_head_page(buf));
>> -       else if (vi->big_packets)
>> +       if (!rq->page_pool)
>>                  give_pages(rq, buf);
>>          else
>> -               put_page(virt_to_head_page(buf));
>> +               page_pool_put_page(rq->page_pool, virt_to_head_page(buf), -1, false);
>>   }
>>
>>   static void enable_rx_mode_work(struct virtnet_info *vi)
>> @@ -876,10 +869,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>                  skb = virtnet_build_skb(buf, truesize, p - buf, len);
>>                  if (unlikely(!skb))
>>                          return NULL;
>> +               /* Big packets mode chains pages via page->private, which is
>> +                * incompatible with the way page_pool uses page->private.
>> +                * Currently, big packets mode doesn't use page pools.
>> +                */
>> +               if (!rq->page_pool) {
>> +                       page = (struct page *)page->private;
>> +                       if (page)
>> +                               give_pages(rq, page);
>> +               }
>>
>> -               page = (struct page *)page->private;
>> -               if (page)
>> -                       give_pages(rq, page);
>>                  goto ok;
>>          }
>>
>> @@ -925,133 +924,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>          hdr = skb_vnet_common_hdr(skb);
>>          memcpy(hdr, hdr_p, hdr_len);
>>          if (page_to_free)
>> -               put_page(page_to_free);
>> +               page_pool_put_page(rq->page_pool, page_to_free, -1, true);
>>
>>          return skb;
>>   }
>>
>> -static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
>> -{
>> -       struct virtnet_info *vi = rq->vq->vdev->priv;
>> -       struct page *page = virt_to_head_page(buf);
>> -       struct virtnet_rq_dma *dma;
>> -       void *head;
>> -       int offset;
>> -
>> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
>> -
>> -       head = page_address(page);
>> -
>> -       dma = head;
>> -
>> -       --dma->ref;
>> -
>> -       if (dma->need_sync && len) {
>> -               offset = buf - (head + sizeof(*dma));
>> -
>> -               virtqueue_map_sync_single_range_for_cpu(rq->vq, dma->addr,
>> -                                                       offset, len,
>> -                                                       DMA_FROM_DEVICE);
>> -       }
>> -
>> -       if (dma->ref)
>> -               return;
>> -
>> -       virtqueue_unmap_single_attrs(rq->vq, dma->addr, dma->len,
>> -                                    DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
>> -       put_page(page);
>> -}
>> -
>>   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
>>   {
>> -       struct virtnet_info *vi = rq->vq->vdev->priv;
>> -       void *buf;
>> -
>> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
>> +       BUG_ON(!rq->page_pool);
>>
>> -       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
>> -       if (buf)
>> -               virtnet_rq_unmap(rq, buf, *len);
>> -
>> -       return buf;
>> -}
>> -
>> -static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
>> -{
>> -       struct virtnet_info *vi = rq->vq->vdev->priv;
>> -       struct virtnet_rq_dma *dma;
>> -       dma_addr_t addr;
>> -       u32 offset;
>> -       void *head;
>> -
>> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
>> -
>> -       head = page_address(rq->alloc_frag.page);
>> -
>> -       offset = buf - head;
>> -
>> -       dma = head;
>> -
>> -       addr = dma->addr - sizeof(*dma) + offset;
>> -
>> -       sg_init_table(rq->sg, 1);
>> -       sg_fill_dma(rq->sg, addr, len);
>> -}
>> -
>> -static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>> -{
>> -       struct page_frag *alloc_frag = &rq->alloc_frag;
>> -       struct virtnet_info *vi = rq->vq->vdev->priv;
>> -       struct virtnet_rq_dma *dma;
>> -       void *buf, *head;
>> -       dma_addr_t addr;
>> -
>> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
>> -
>> -       head = page_address(alloc_frag->page);
>> -
>> -       dma = head;
>> -
>> -       /* new pages */
>> -       if (!alloc_frag->offset) {
>> -               if (rq->last_dma) {
>> -                       /* Now, the new page is allocated, the last dma
>> -                        * will not be used. So the dma can be unmapped
>> -                        * if the ref is 0.
>> -                        */
>> -                       virtnet_rq_unmap(rq, rq->last_dma, 0);
>> -                       rq->last_dma = NULL;
>> -               }
>> -
>> -               dma->len = alloc_frag->size - sizeof(*dma);
>> -
>> -               addr = virtqueue_map_single_attrs(rq->vq, dma + 1,
>> -                                                 dma->len, DMA_FROM_DEVICE, 0);
>> -               if (virtqueue_map_mapping_error(rq->vq, addr))
>> -                       return NULL;
>> -
>> -               dma->addr = addr;
>> -               dma->need_sync = virtqueue_map_need_sync(rq->vq, addr);
>> -
>> -               /* Add a reference to dma to prevent the entire dma from
>> -                * being released during error handling. This reference
>> -                * will be freed after the pages are no longer used.
>> -                */
>> -               get_page(alloc_frag->page);
>> -               dma->ref = 1;
>> -               alloc_frag->offset = sizeof(*dma);
>> -
>> -               rq->last_dma = dma;
>> -       }
>> -
>> -       ++dma->ref;
>> -
>> -       buf = head + alloc_frag->offset;
>> -
>> -       get_page(alloc_frag->page);
>> -       alloc_frag->offset += size;
>> -
>> -       return buf;
>> +       return virtqueue_get_buf_ctx(rq->vq, len, ctx);
>>   }
>>
>>   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>> @@ -1067,9 +949,6 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>>                  return;
>>          }
>>
>> -       if (!vi->big_packets || vi->mergeable_rx_bufs)
>> -               virtnet_rq_unmap(rq, buf, 0);
>> -
>>          virtnet_rq_free_buf(vi, rq, buf);
>>   }
>>
>> @@ -1335,7 +1214,7 @@ static int xsk_append_merge_buffer(struct virtnet_info *vi,
>>
>>                  truesize = len;
>>
>> -               curr_skb  = virtnet_skb_append_frag(head_skb, curr_skb, page,
>> +               curr_skb  = virtnet_skb_append_frag(rq, head_skb, curr_skb, page,
>>                                                      buf, len, truesize);
>>                  if (!curr_skb) {
>>                          put_page(page);
>> @@ -1771,7 +1650,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>>          return ret;
>>   }
>>
>> -static void put_xdp_frags(struct xdp_buff *xdp)
>> +static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp)
>>   {
>>          struct skb_shared_info *shinfo;
>>          struct page *xdp_page;
>> @@ -1781,7 +1660,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
>>                  shinfo = xdp_get_shared_info_from_buff(xdp);
>>                  for (i = 0; i < shinfo->nr_frags; i++) {
>>                          xdp_page = skb_frag_page(&shinfo->frags[i]);
>> -                       put_page(xdp_page);
>> +                       page_pool_put_page(rq->page_pool, xdp_page, -1, true);
>>                  }
>>          }
>>   }
>> @@ -1873,7 +1752,7 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>>          if (page_off + *len + tailroom > PAGE_SIZE)
>>                  return NULL;
>>
>> -       page = alloc_page(GFP_ATOMIC);
>> +       page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
>>          if (!page)
>>                  return NULL;
>>
>> @@ -1896,8 +1775,12 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>>                  p = virt_to_head_page(buf);
>>                  off = buf - page_address(p);
>>
>> +               if (rq->use_page_pool_dma)
>> +                       page_pool_dma_sync_for_cpu(rq->page_pool, p,
>> +                                                  off, buflen);
> 
> Interesting, I think we need a patch for -stable to sync for cpu as
> well (and probably the XDP_TX path).
> 
> 
>> +
>>                  if (check_mergeable_len(dev, ctx, buflen)) {
>> -                       put_page(p);
>> +                       page_pool_put_page(rq->page_pool, p, -1, true);
>>                          goto err_buf;
>>                  }
>>
>> @@ -1905,38 +1788,36 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>>                   * is sending packet larger than the MTU.
>>                   */
>>                  if ((page_off + buflen + tailroom) > PAGE_SIZE) {
>> -                       put_page(p);
>> +                       page_pool_put_page(rq->page_pool, p, -1, true);
>>                          goto err_buf;
>>                  }
>>
>>                  memcpy(page_address(page) + page_off,
>>                         page_address(p) + off, buflen);
>>                  page_off += buflen;
>> -               put_page(p);
>> +               page_pool_put_page(rq->page_pool, p, -1, true);
>>          }
>>
>>          /* Headroom does not contribute to packet length */
>>          *len = page_off - XDP_PACKET_HEADROOM;
>>          return page;
>>   err_buf:
>> -       __free_pages(page, 0);
>> +       page_pool_put_page(rq->page_pool, page, -1, true);
>>          return NULL;
>>   }
>>
>>   static struct sk_buff *receive_small_build_skb(struct virtnet_info *vi,
>>                                                 unsigned int xdp_headroom,
>>                                                 void *buf,
>> -                                              unsigned int len)
>> +                                              unsigned int len,
>> +                                              unsigned int buflen)
>>   {
>>          unsigned int header_offset;
>>          unsigned int headroom;
>> -       unsigned int buflen;
>>          struct sk_buff *skb;
>>
>>          header_offset = VIRTNET_RX_PAD + xdp_headroom;
>>          headroom = vi->hdr_len + header_offset;
>> -       buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>> -               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>
> 
> Any reason for removing this?

page_pool_alloc_va() can return a larger allocation than requested,
since it appends the remaining fragment space to the buffer to avoid
truesize underestimation (see the comment in page_pool_alloc_netmem()
in helpers.h). The old hardcoded computation always produced the
requested ~512 bytes, ignoring any extra space page_pool handed back,
so build_skb() would set skb->truesize too low. To pass the real size
through, add_recvbuf_small() encodes alloc_len and xdp_headroom into
ctx via mergeable_len_to_ctx(alloc_len, xdp_headroom); on the receive
side, receive_small() recovers the length with
mergeable_ctx_to_truesize(ctx) and passes it as the buflen parameter
to receive_small_build_skb().

> 
> The rest looks fine.
> 
> Thanks
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-13  7:51 ` Jason Wang
  2026-03-13  9:26   ` Vishwanath Seshagiri
@ 2026-03-13 16:50   ` Vishwanath Seshagiri
  2026-03-16  7:35     ` Jason Wang
  1 sibling, 1 reply; 22+ messages in thread
From: Vishwanath Seshagiri @ 2026-03-13 16:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team



On 3/13/26 1:21 PM, Jason Wang wrote:
>> -static void put_xdp_frags(struct xdp_buff *xdp)
>> +static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp)
>>   {
>>          struct skb_shared_info *shinfo;
>>          struct page *xdp_page;
>> @@ -1781,7 +1660,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
>>                  shinfo = xdp_get_shared_info_from_buff(xdp);
>>                  for (i = 0; i < shinfo->nr_frags; i++) {
>>                          xdp_page = skb_frag_page(&shinfo->frags[i]);
>> -                       put_page(xdp_page);
>> +                       page_pool_put_page(rq->page_pool, xdp_page, -1, true);
>>                  }
>>          }
>>   }
>> @@ -1873,7 +1752,7 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>>          if (page_off + *len + tailroom > PAGE_SIZE)
>>                  return NULL;
>>
>> -       page = alloc_page(GFP_ATOMIC);
>> +       page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
>>          if (!page)
>>                  return NULL;
>>
>> @@ -1896,8 +1775,12 @@ static struct page *xdp_linearize_page(struct net_device *dev,
>>                  p = virt_to_head_page(buf);
>>                  off = buf - page_address(p);
>>
>> +               if (rq->use_page_pool_dma)
>> +                       page_pool_dma_sync_for_cpu(rq->page_pool, p,
>> +                                                  off, buflen);
> 
> Interesting, I think we need a patch for -stable to sync for cpu as
> well (and probably the XDP_TX path).

In the old code, I believe the sync was handled implicitly inside
virtnet_rq_get_buf() -> virtnet_rq_unmap() before the data was
accessed. Could you point me at the specific path you're concerned
about for -stable? I want to make sure I understand the issue
correctly before sending a fix.

> 
> 
>> +
>>                  if (check_mergeable_len(dev, ctx, buflen)) {
>> -                       put_page(p);
>> +                       page_pool_put_page(rq->page_pool, p, -1, true);
>>                          goto err_buf;
>>                  }

Sorry for missing that comment in the earlier email.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-13 16:50   ` Vishwanath Seshagiri
@ 2026-03-16  7:35     ` Jason Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2026-03-16  7:35 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Michael S . Tsirkin, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On Sat, Mar 14, 2026 at 12:51 AM Vishwanath Seshagiri <vishs@meta.com> wrote:
>
>
>
> On 3/13/26 1:21 PM, Jason Wang wrote:
> >> -static void put_xdp_frags(struct xdp_buff *xdp)
> >> +static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp)
> >>   {
> >>          struct skb_shared_info *shinfo;
> >>          struct page *xdp_page;
> >> @@ -1781,7 +1660,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> >>                  shinfo = xdp_get_shared_info_from_buff(xdp);
> >>                  for (i = 0; i < shinfo->nr_frags; i++) {
> >>                          xdp_page = skb_frag_page(&shinfo->frags[i]);
> >> -                       put_page(xdp_page);
> >> +                       page_pool_put_page(rq->page_pool, xdp_page, -1, true);
> >>                  }
> >>          }
> >>   }
> >> @@ -1873,7 +1752,7 @@ static struct page *xdp_linearize_page(struct net_device *dev,
> >>          if (page_off + *len + tailroom > PAGE_SIZE)
> >>                  return NULL;
> >>
> >> -       page = alloc_page(GFP_ATOMIC);
> >> +       page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
> >>          if (!page)
> >>                  return NULL;
> >>
> >> @@ -1896,8 +1775,12 @@ static struct page *xdp_linearize_page(struct net_device *dev,
> >>                  p = virt_to_head_page(buf);
> >>                  off = buf - page_address(p);
> >>
> >> +               if (rq->use_page_pool_dma)
> >> +                       page_pool_dma_sync_for_cpu(rq->page_pool, p,
> >> +                                                  off, buflen);
> >
> > Interesting, I think we need a patch for -stable to sync for cpu as
> > well (and probably the XDP_TX path).
>
> In the old code, I believed the sync was handled implicitly inside
> virtnet_rq_get_buf() -> virtnet_rq_unmap() before data was accessed.
> Could you point me at the specific path you're concerned about for
> -stable? I want to make sure I understand the issue correctly before
> sending a fix.

You're right; the code seems fine.

(Note that when big_packets is true but a small buffer is used, we go
through virtqueue_get_buf(), so the unmap is done via detach_buf().)

>
> >
> >
> >> +
> >>                  if (check_mergeable_len(dev, ctx, buflen)) {
> >> -                       put_page(p);
> >> +                       page_pool_put_page(rq->page_pool, p, -1, true);
> >>                          goto err_buf;
> >>                  }
>
> Sorry for missing that comment in the earlier email

No worries.

Thanks

>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-13  9:26   ` Vishwanath Seshagiri
@ 2026-03-16  7:41     ` Jason Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2026-03-16  7:41 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Michael S . Tsirkin, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On Fri, Mar 13, 2026 at 5:27 PM Vishwanath Seshagiri <vishs@meta.com> wrote:
>
> On 3/13/26 1:21 PM, Jason Wang wrote:
> > On Wed, Mar 11, 2026 at 2:31 AM Vishwanath Seshagiri <vishs@meta.com> wrote:
> >>
> >> Use page_pool for RX buffer allocation in mergeable and small buffer
> >> modes to enable page recycling and avoid repeated page allocator calls.
> >> skb_mark_for_recycle() enables page reuse in the network stack.
> >>
> >> Big packets mode is unchanged because it uses page->private for linked
> >> list chaining of multiple pages per buffer, which conflicts with
> >> page_pool's internal use of page->private.
> >>
> >> Implement conditional DMA premapping using virtqueue_dma_dev():
> >> - When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool
> >>    handling DMA mapping, submit via virtqueue_add_inbuf_premapped()
> >> - When NULL (VDUSE, direct physical): page_pool handles allocation only,
> >>    submit via virtqueue_add_inbuf_ctx()
> >>
> >> This preserves the DMA premapping optimization from commit 31f3cd4e5756b
> >> ("virtio-net: rq submits premapped per-buffer") while adding page_pool
> >> support as a prerequisite for future zero-copy features (devmem TCP,
> >> io_uring ZCRX).
> >>
> >> Page pools are created in probe and destroyed in remove (not open/close),
> >> following existing driver behavior where RX buffers remain in virtqueues
> >> across interface state changes.
> >>
> >> Signed-off-by: Vishwanath Seshagiri <vishs@meta.com>
> >> ---
> >> Changes in v11:
> >> - add_recvbuf_small: encode alloc_len and xdp_headroom in ctx via
> >>    mergeable_len_to_ctx() so receive_small() recovers the actual buflen
> >>    via mergeable_ctx_to_truesize() (Michael S. Tsirkin)
> >> - receive_small_build_skb, receive_small_xdp: accept buflen parameter
> >>    instead of recomputing it, to use the actual allocation size
> >> - v10:
> >>    https://lore.kernel.org/virtualization/9752a952-195d-4da3-bc7a-5a4a1f2fd2ca@meta.com/
> >>
> >> Changes in v10:
> >> - add_recvbuf_small: use alloc_len to avoid clobbering len; v9 feedback
> >>    was about truesize under-accounting, not variable naming — misunderstood
> >>    the comment in v9
> >> - v9:
> >>    https://lore.kernel.org/virtualization/20260302041005.1627210-1-vishs@meta.com/
> >>
> >> Changes in v9:
> >> - Fix virtnet_skb_append_frag() for XSK callers (Michael S. Tsirkin)
> >> - v8:
> >>    https://lore.kernel.org/virtualization/e824c5a3-cfe0-4d11-958f-c3ec82d11d37@meta.com/
> >>
> >> Changes in v8:
> >> - Remove virtnet_no_page_pool() helper, replace with direct !rq->page_pool
> >>    checks or inlined conditions (Xuan Zhuo)
> >> - Extract virtnet_rq_submit() helper to consolidate DMA/non-DMA buffer
> >>    submission in add_recvbuf_small() and add_recvbuf_mergeable()
> >> - Add skb_mark_for_recycle(nskb) for overflow frag_list skbs in
> >>    virtnet_skb_append_frag() to ensure page_pool pages are returned to
> >>    the pool instead of freed via put_page()
> >> - Rebase on net-next (kzalloc_objs API)
> >> - v7:
> >>    https://lore.kernel.org/virtualization/20260210014305.3236342-1-vishs@meta.com/
> >>
> >> Changes in v7:
> >> - Replace virtnet_put_page() helper with direct page_pool_put_page()
> >>    calls (Xuan Zhuo)
> >> - Add virtnet_no_page_pool() helper to consolidate big_packets mode check
> >>    (Michael S. Tsirkin)
> >> - Add DMA sync_for_cpu for subsequent buffers in xdp_linearize_page() when
> >>    use_page_pool_dma is set (Michael S. Tsirkin)
> >> - Remove unused pp_params.dev assignment in non-DMA path
> >> - Add page pool recreation in virtnet_restore_up() for freeze/restore support (Chris Mason's
> >> Review Prompt)
> >> - v6:
> >>    https://lore.kernel.org/virtualization/20260208175410.1910001-1-vishs@meta.com/
> >>
> >> Changes in v6:
> >> - Drop page_pool_frag_offset_add() helper and switch to page_pool_alloc_va();
> >>    page_pool_alloc_netmem() already handles internal fragmentation internally
> >>    (Jakub Kicinski)
> >> - v5:
> >>    https://lore.kernel.org/virtualization/20260206002715.1885869-1-vishs@meta.com/
> >>
> >> Benchmark results:
> >>
> >> Configuration: pktgen TX -> tap -> vhost-net | virtio-net RX -> XDP_DROP
> >>
> >> Small packets (64 bytes, mrg_rxbuf=off):
> >>    1Q:  853,493 -> 868,923 pps  (+1.8%)
> >>    2Q: 1,655,793 -> 1,696,707 pps (+2.5%)
> >>    4Q: 3,143,375 -> 3,302,511 pps (+5.1%)
> >>    8Q: 6,082,590 -> 6,156,894 pps (+1.2%)
> >>
> >> Mergeable RX (64 bytes):
> >>    1Q:   766,168 ->   814,493 pps  (+6.3%)
> >>    2Q: 1,384,871 -> 1,670,639 pps (+20.6%)
> >>    4Q: 2,773,081 -> 3,080,574 pps (+11.1%)
> >>    8Q: 5,600,615 -> 6,043,891 pps  (+7.9%)
> >>
> >> Mergeable RX (1500 bytes):
> >>    1Q:   741,579 ->   785,442 pps  (+5.9%)
> >>    2Q: 1,310,043 -> 1,534,554 pps (+17.1%)
> >>    4Q: 2,748,700 -> 2,890,582 pps  (+5.2%)
> >>    8Q: 5,348,589 -> 5,618,664 pps  (+5.0%)
> >>
> >>   drivers/net/Kconfig      |   1 +
> >>   drivers/net/virtio_net.c | 497 ++++++++++++++++++++-------------------
> >>   2 files changed, 251 insertions(+), 247 deletions(-)
> >>
> >> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> >> index 17108c359216..b2fd90466bab 100644
> >> --- a/drivers/net/Kconfig
> >> +++ b/drivers/net/Kconfig
> >> @@ -452,6 +452,7 @@ config VIRTIO_NET
> >>          depends on VIRTIO
> >>          select NET_FAILOVER
> >>          select DIMLIB
> >> +       select PAGE_POOL
> >>          help
> >>            This is the virtual network driver for virtio.  It can be used with
> >>            QEMU based VMMs (like KVM or Xen).  Say Y or M.
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 72d6a9c6a5a2..a85d75a7f539 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -26,6 +26,7 @@
> >>   #include <net/netdev_rx_queue.h>
> >>   #include <net/netdev_queues.h>
> >>   #include <net/xdp_sock_drv.h>
> >> +#include <net/page_pool/helpers.h>
> >>
> >>   static int napi_weight = NAPI_POLL_WEIGHT;
> >>   module_param(napi_weight, int, 0444);
> >> @@ -290,14 +291,6 @@ struct virtnet_interrupt_coalesce {
> >>          u32 max_usecs;
> >>   };
> >>
> >> -/* The dma information of pages allocated at a time. */
> >> -struct virtnet_rq_dma {
> >> -       dma_addr_t addr;
> >> -       u32 ref;
> >> -       u16 len;
> >> -       u16 need_sync;
> >> -};
> >> -
> >>   /* Internal representation of a send virtqueue */
> >>   struct send_queue {
> >>          /* Virtqueue associated with this send _queue */
> >> @@ -356,8 +349,10 @@ struct receive_queue {
> >>          /* Average packet length for mergeable receive buffers. */
> >>          struct ewma_pkt_len mrg_avg_pkt_len;
> >>
> >> -       /* Page frag for packet buffer allocation. */
> >> -       struct page_frag alloc_frag;
> >> +       struct page_pool *page_pool;
> >> +
> >> +       /* True if page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
> >> +       bool use_page_pool_dma;
> >>
> >>          /* RX: fragments + linear part + virtio header */
> >>          struct scatterlist sg[MAX_SKB_FRAGS + 2];
> >> @@ -370,9 +365,6 @@ struct receive_queue {
> >>
> >>          struct xdp_rxq_info xdp_rxq;
> >>
> >> -       /* Record the last dma info to free after new pages is allocated. */
> >> -       struct virtnet_rq_dma *last_dma;
> >> -
> >>          struct xsk_buff_pool *xsk_pool;
> >>
> >>          /* xdp rxq used by xsk */
> >> @@ -521,11 +513,14 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> >>                                 struct virtnet_rq_stats *stats);
> >>   static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
> >>                                   struct sk_buff *skb, u8 flags);
> >> -static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
> >> +static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
> >> +                                              struct sk_buff *head_skb,
> >>                                                 struct sk_buff *curr_skb,
> >>                                                 struct page *page, void *buf,
> >>                                                 int len, int truesize);
> >>   static void virtnet_xsk_completed(struct send_queue *sq, int num);
> >> +static void free_unused_bufs(struct virtnet_info *vi);
> >> +static void virtnet_del_vqs(struct virtnet_info *vi);
> >>
> >>   enum virtnet_xmit_type {
> >>          VIRTNET_XMIT_TYPE_SKB,
> >> @@ -709,12 +704,10 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> >>   static void virtnet_rq_free_buf(struct virtnet_info *vi,
> >>                                  struct receive_queue *rq, void *buf)
> >>   {
> >> -       if (vi->mergeable_rx_bufs)
> >> -               put_page(virt_to_head_page(buf));
> >> -       else if (vi->big_packets)
> >> +       if (!rq->page_pool)
> >>                  give_pages(rq, buf);
> >>          else
> >> -               put_page(virt_to_head_page(buf));
> >> +               page_pool_put_page(rq->page_pool, virt_to_head_page(buf), -1, false);
> >>   }
> >>
> >>   static void enable_rx_mode_work(struct virtnet_info *vi)
> >> @@ -876,10 +869,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >>                  skb = virtnet_build_skb(buf, truesize, p - buf, len);
> >>                  if (unlikely(!skb))
> >>                          return NULL;
> >> +               /* Big packets mode chains pages via page->private, which is
> >> +                * incompatible with the way page_pool uses page->private.
> >> +                * Currently, big packets mode doesn't use page pools.
> >> +                */
> >> +               if (!rq->page_pool) {
> >> +                       page = (struct page *)page->private;
> >> +                       if (page)
> >> +                               give_pages(rq, page);
> >> +               }
> >>
> >> -               page = (struct page *)page->private;
> >> -               if (page)
> >> -                       give_pages(rq, page);
> >>                  goto ok;
> >>          }
> >>
> >> @@ -925,133 +924,16 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >>          hdr = skb_vnet_common_hdr(skb);
> >>          memcpy(hdr, hdr_p, hdr_len);
> >>          if (page_to_free)
> >> -               put_page(page_to_free);
> >> +               page_pool_put_page(rq->page_pool, page_to_free, -1, true);
> >>
> >>          return skb;
> >>   }
> >>
> >> -static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
> >> -{
> >> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> >> -       struct page *page = virt_to_head_page(buf);
> >> -       struct virtnet_rq_dma *dma;
> >> -       void *head;
> >> -       int offset;
> >> -
> >> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> >> -
> >> -       head = page_address(page);
> >> -
> >> -       dma = head;
> >> -
> >> -       --dma->ref;
> >> -
> >> -       if (dma->need_sync && len) {
> >> -               offset = buf - (head + sizeof(*dma));
> >> -
> >> -               virtqueue_map_sync_single_range_for_cpu(rq->vq, dma->addr,
> >> -                                                       offset, len,
> >> -                                                       DMA_FROM_DEVICE);
> >> -       }
> >> -
> >> -       if (dma->ref)
> >> -               return;
> >> -
> >> -       virtqueue_unmap_single_attrs(rq->vq, dma->addr, dma->len,
> >> -                                    DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
> >> -       put_page(page);
> >> -}
> >> -
> >>   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> >>   {
> >> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> >> -       void *buf;
> >> -
> >> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> >> +       BUG_ON(!rq->page_pool);
> >>
> >> -       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> >> -       if (buf)
> >> -               virtnet_rq_unmap(rq, buf, *len);
> >> -
> >> -       return buf;
> >> -}
> >> -
> >> -static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
> >> -{
> >> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> >> -       struct virtnet_rq_dma *dma;
> >> -       dma_addr_t addr;
> >> -       u32 offset;
> >> -       void *head;
> >> -
> >> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> >> -
> >> -       head = page_address(rq->alloc_frag.page);
> >> -
> >> -       offset = buf - head;
> >> -
> >> -       dma = head;
> >> -
> >> -       addr = dma->addr - sizeof(*dma) + offset;
> >> -
> >> -       sg_init_table(rq->sg, 1);
> >> -       sg_fill_dma(rq->sg, addr, len);
> >> -}
> >> -
> >> -static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >> -{
> >> -       struct page_frag *alloc_frag = &rq->alloc_frag;
> >> -       struct virtnet_info *vi = rq->vq->vdev->priv;
> >> -       struct virtnet_rq_dma *dma;
> >> -       void *buf, *head;
> >> -       dma_addr_t addr;
> >> -
> >> -       BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs);
> >> -
> >> -       head = page_address(alloc_frag->page);
> >> -
> >> -       dma = head;
> >> -
> >> -       /* new pages */
> >> -       if (!alloc_frag->offset) {
> >> -               if (rq->last_dma) {
> >> -                       /* Now, the new page is allocated, the last dma
> >> -                        * will not be used. So the dma can be unmapped
> >> -                        * if the ref is 0.
> >> -                        */
> >> -                       virtnet_rq_unmap(rq, rq->last_dma, 0);
> >> -                       rq->last_dma = NULL;
> >> -               }
> >> -
> >> -               dma->len = alloc_frag->size - sizeof(*dma);
> >> -
> >> -               addr = virtqueue_map_single_attrs(rq->vq, dma + 1,
> >> -                                                 dma->len, DMA_FROM_DEVICE, 0);
> >> -               if (virtqueue_map_mapping_error(rq->vq, addr))
> >> -                       return NULL;
> >> -
> >> -               dma->addr = addr;
> >> -               dma->need_sync = virtqueue_map_need_sync(rq->vq, addr);
> >> -
> >> -               /* Add a reference to dma to prevent the entire dma from
> >> -                * being released during error handling. This reference
> >> -                * will be freed after the pages are no longer used.
> >> -                */
> >> -               get_page(alloc_frag->page);
> >> -               dma->ref = 1;
> >> -               alloc_frag->offset = sizeof(*dma);
> >> -
> >> -               rq->last_dma = dma;
> >> -       }
> >> -
> >> -       ++dma->ref;
> >> -
> >> -       buf = head + alloc_frag->offset;
> >> -
> >> -       get_page(alloc_frag->page);
> >> -       alloc_frag->offset += size;
> >> -
> >> -       return buf;
> >> +       return virtqueue_get_buf_ctx(rq->vq, len, ctx);
> >>   }
> >>
> >>   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> >> @@ -1067,9 +949,6 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> >>                  return;
> >>          }
> >>
> >> -       if (!vi->big_packets || vi->mergeable_rx_bufs)
> >> -               virtnet_rq_unmap(rq, buf, 0);
> >> -
> >>          virtnet_rq_free_buf(vi, rq, buf);
> >>   }
> >>
> >> @@ -1335,7 +1214,7 @@ static int xsk_append_merge_buffer(struct virtnet_info *vi,
> >>
> >>                  truesize = len;
> >>
> >> -               curr_skb  = virtnet_skb_append_frag(head_skb, curr_skb, page,
> >> +               curr_skb  = virtnet_skb_append_frag(rq, head_skb, curr_skb, page,
> >>                                                      buf, len, truesize);
> >>                  if (!curr_skb) {
> >>                          put_page(page);
> >> @@ -1771,7 +1650,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> >>          return ret;
> >>   }
> >>
> >> -static void put_xdp_frags(struct xdp_buff *xdp)
> >> +static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp)
> >>   {
> >>          struct skb_shared_info *shinfo;
> >>          struct page *xdp_page;
> >> @@ -1781,7 +1660,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> >>                  shinfo = xdp_get_shared_info_from_buff(xdp);
> >>                  for (i = 0; i < shinfo->nr_frags; i++) {
> >>                          xdp_page = skb_frag_page(&shinfo->frags[i]);
> >> -                       put_page(xdp_page);
> >> +                       page_pool_put_page(rq->page_pool, xdp_page, -1, true);
> >>                  }
> >>          }
> >>   }
> >> @@ -1873,7 +1752,7 @@ static struct page *xdp_linearize_page(struct net_device *dev,
> >>          if (page_off + *len + tailroom > PAGE_SIZE)
> >>                  return NULL;
> >>
> >> -       page = alloc_page(GFP_ATOMIC);
> >> +       page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
> >>          if (!page)
> >>                  return NULL;
> >>
> >> @@ -1896,8 +1775,12 @@ static struct page *xdp_linearize_page(struct net_device *dev,
> >>                  p = virt_to_head_page(buf);
> >>                  off = buf - page_address(p);
> >>
> >> +               if (rq->use_page_pool_dma)
> >> +                       page_pool_dma_sync_for_cpu(rq->page_pool, p,
> >> +                                                  off, buflen);
> >
> > Interesting, I think we need a patch for -stable to sync for cpu as
> > well (and probably the XDP_TX path).
> >
> >
> >> +
> >>                  if (check_mergeable_len(dev, ctx, buflen)) {
> >> -                       put_page(p);
> >> +                       page_pool_put_page(rq->page_pool, p, -1, true);
> >>                          goto err_buf;
> >>                  }
> >>
> >> @@ -1905,38 +1788,36 @@ static struct page *xdp_linearize_page(struct net_device *dev,
> >>                   * is sending packet larger than the MTU.
> >>                   */
> >>                  if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> >> -                       put_page(p);
> >> +                       page_pool_put_page(rq->page_pool, p, -1, true);
> >>                          goto err_buf;
> >>                  }
> >>
> >>                  memcpy(page_address(page) + page_off,
> >>                         page_address(p) + off, buflen);
> >>                  page_off += buflen;
> >> -               put_page(p);
> >> +               page_pool_put_page(rq->page_pool, p, -1, true);
> >>          }
> >>
> >>          /* Headroom does not contribute to packet length */
> >>          *len = page_off - XDP_PACKET_HEADROOM;
> >>          return page;
> >>   err_buf:
> >> -       __free_pages(page, 0);
> >> +       page_pool_put_page(rq->page_pool, page, -1, true);
> >>          return NULL;
> >>   }
> >>
> >>   static struct sk_buff *receive_small_build_skb(struct virtnet_info *vi,
> >>                                                 unsigned int xdp_headroom,
> >>                                                 void *buf,
> >> -                                              unsigned int len)
> >> +                                              unsigned int len,
> >> +                                              unsigned int buflen)
> >>   {
> >>          unsigned int header_offset;
> >>          unsigned int headroom;
> >> -       unsigned int buflen;
> >>          struct sk_buff *skb;
> >>
> >>          header_offset = VIRTNET_RX_PAD + xdp_headroom;
> >>          headroom = vi->hdr_len + header_offset;
> >> -       buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> >> -               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >>
> >
> > Any reason for removing this?
>
> page_pool_alloc_va() can return a larger allocation than requested as it
> appends remaining fragment space to avoid truesize underestimation
> (comment in page_pool_alloc_netmem() in helpers.h). The old hardcoded
> computation would always produce the requested ~512 bytes, ignoring any
> extra space page_pool gave us, so build_skb() would set skb->truesize
> too low. To pass the real size through: add_recvbuf_small() encodes
> alloc_len and xdp_headroom into ctx via mergeable_len_to_ctx(alloc_len,
> xdp_headroom). On the receive side, receive_small() extracts it with
> mergeable_ctx_to_truesize(ctx) and passes it as the buflen parameter to
> receive_small_build_skb().
>

Right.

So

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-10 18:31 [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation Vishwanath Seshagiri
  2026-03-13  7:51 ` Jason Wang
@ 2026-03-16  9:56 ` Michael S. Tsirkin
  2026-03-16 10:43   ` Michael S. Tsirkin
  2026-03-17  2:30 ` patchwork-bot+netdevbpf
  2026-03-23 15:01 ` Omar Elghoul
  3 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-16  9:56 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Jason Wang, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On Tue, Mar 10, 2026 at 11:31:04AM -0700, Vishwanath Seshagiri wrote:
> @@ -5857,7 +5863,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
>  	/* In big_packets mode, xdp cannot work, so there is no need to
>  	 * initialize xsk of rq.
>  	 */
> -	if (vi->big_packets && !vi->mergeable_rx_bufs)
> +	if (!vi->rq[qid].page_pool)
>  		return -ENOENT;
>  
>  	if (qid >= vi->curr_queue_pairs)



It seems that a qid that exceeds curr_queue_pairs would previously get
-EINVAL and now gets -ENOENT. Maybe reorder the checks:

        if (qid >= vi->curr_queue_pairs)
                return -EINVAL;

        /* In big_packets mode, xdp cannot work, so there is no need to
         * initialize xsk of rq.
         */ 
       if (!vi->rq[qid].page_pool)
               return -ENOENT;


Alternatively I think we can completely drop this chunk: we already seem
to init page_pool at all times except for big packets mode, so the
current code is fine I think.


-- 
MST


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-16  9:56 ` Michael S. Tsirkin
@ 2026-03-16 10:43   ` Michael S. Tsirkin
  2026-03-16 11:57     ` Vishwanath Seshagiri
  0 siblings, 1 reply; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-16 10:43 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Jason Wang, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On Mon, Mar 16, 2026 at 05:56:20AM -0400, Michael S. Tsirkin wrote:
> On Tue, Mar 10, 2026 at 11:31:04AM -0700, Vishwanath Seshagiri wrote:
> > @@ -5857,7 +5863,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
> >  	/* In big_packets mode, xdp cannot work, so there is no need to
> >  	 * initialize xsk of rq.
> >  	 */
> > -	if (vi->big_packets && !vi->mergeable_rx_bufs)
> > +	if (!vi->rq[qid].page_pool)
> >  		return -ENOENT;
> >  
> >  	if (qid >= vi->curr_queue_pairs)
> 
> 
> 
> It seems that a qid that exceeds curr_queue_pairs would previously get
> -EINVAL and now gets -ENOENT.

Or maybe this if (qid >= vi->curr_queue_pairs) is dead code?
I looked at it some more and I can't find a path where this
triggers.

> Maybe reorder the checks:
> 
>         if (qid >= vi->curr_queue_pairs)
>                 return -EINVAL;
> 
>         /* In big_packets mode, xdp cannot work, so there is no need to
>          * initialize xsk of rq.
>          */ 
>        if (!vi->rq[qid].page_pool)
>                return -ENOENT;
> 
> 
> Alternatively I think we can completely drop this chunk: we already seem
> to init page_pool at all times except for big packets mode, so the
> current code is fine I think.
> 
> 
> -- 
> MST


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-16 10:43   ` Michael S. Tsirkin
@ 2026-03-16 11:57     ` Vishwanath Seshagiri
  2026-03-16 12:04       ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Vishwanath Seshagiri @ 2026-03-16 11:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On 3/16/26 4:13 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 16, 2026 at 05:56:20AM -0400, Michael S. Tsirkin wrote:
>> On Tue, Mar 10, 2026 at 11:31:04AM -0700, Vishwanath Seshagiri wrote:
>>> @@ -5857,7 +5863,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
>>>   	/* In big_packets mode, xdp cannot work, so there is no need to
>>>   	 * initialize xsk of rq.
>>>   	 */
>>> -	if (vi->big_packets && !vi->mergeable_rx_bufs)
>>> +	if (!vi->rq[qid].page_pool)
>>>   		return -ENOENT;
>>>   
>>>   	if (qid >= vi->curr_queue_pairs)
>>
>>
>>
>> It seems that a qid that exceeds curr_queue_pairs would previously get
>> -EINVAL and now gets -ENOENT.
> 
> Or maybe this if (qid >= vi->curr_queue_pairs) is dead code?
> I looked at it some more and I can't find a path where this
> triggers.
> 
>> Maybe reorder the checks:
>>
>>          if (qid >= vi->curr_queue_pairs)
>>                  return -EINVAL;
>>
>>          /* In big_packets mode, xdp cannot work, so there is no need to
>>           * initialize xsk of rq.
>>           */
>>         if (!vi->rq[qid].page_pool)
>>                 return -ENOENT;
>>
>>
>> Alternatively I think we can completely drop this chunk: we already seem
>> to init page_pool at all times except for big packets mode, so the
>> current code is fine I think.

Yes, I agree qid >= curr_queue_pairs appears to be dead code.
xsk_reg_pool_at_qid() in the XSK core already validates
queue_id < max(real_num_rx_queues, real_num_tx_queues) before
ndo_bpf is called, and real_num_rx_queues == curr_queue_pairs is an
invariant in virtio_net. Both paths hold rtnl_lock so no race is
possible. That said, I'll adopt your suggested reorder for v12 to ensure
vi->rq[qid] isn't accessed before a bounds check, even if the check
is currently redundant.

>>
>>
>> -- 
>> MST
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-16 11:57     ` Vishwanath Seshagiri
@ 2026-03-16 12:04       ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-16 12:04 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Jason Wang, Xuan Zhuo, Eugenio Pérez, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wei, Matteo Croce, Ilias Apalodimas, netdev, virtualization,
	linux-kernel, kernel-team

On Mon, Mar 16, 2026 at 05:27:57PM +0530, Vishwanath Seshagiri wrote:
> On 3/16/26 4:13 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 16, 2026 at 05:56:20AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Mar 10, 2026 at 11:31:04AM -0700, Vishwanath Seshagiri wrote:
> > > > @@ -5857,7 +5863,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
> > > >   	/* In big_packets mode, xdp cannot work, so there is no need to
> > > >   	 * initialize xsk of rq.
> > > >   	 */
> > > > -	if (vi->big_packets && !vi->mergeable_rx_bufs)
> > > > +	if (!vi->rq[qid].page_pool)
> > > >   		return -ENOENT;
> > > >   	if (qid >= vi->curr_queue_pairs)
> > > 
> > > 
> > > 
> > > It seems that a qid that exceeds curr_queue_pairs would previously get
> > > -EINVAL and now gets -ENOENT.
> > 
> > Or maybe this if (qid >= vi->curr_queue_pairs) is dead code?
> > I looked at it some more and I can't find a path where this
> > triggers.
> > 
> > > Maybe reorder the checks:
> > > 
> > >          if (qid >= vi->curr_queue_pairs)
> > >                  return -EINVAL;
> > > 
> > >          /* In big_packets mode, xdp cannot work, so there is no need to
> > >           * initialize xsk of rq.
> > >           */
> > >         if (!vi->rq[qid].page_pool)
> > >                 return -ENOENT;
> > > 
> > > 
> > > Alternatively I think we can completely drop this chunk: we already seem
> > > to init page_pool at all times except for big packets mode, so the
> > > current code is fine I think.
> 
> Yes, I agree qid >= curr_queue_pairs appears to be dead code.
> xsk_reg_pool_at_qid() in the XSK core already validates
> queue_id < max(real_num_rx_queues, real_num_tx_queues) before
> ndo_bpf is called, and real_num_rx_queues == curr_queue_pairs is an
> invariant in virtio_net. Both paths hold rtnl_lock so no race is
> possible. That said, I'll adopt your suggested reorder for v12 to ensure
> vi->rq[qid] isn't accessed before a bounds check, even if the check
> is currently redundant.

I think we can just take this as-is, and do more cleanups on top.
Acked-by: Michael S. Tsirkin <mst@redhat.com>


> > > 
> > > 
> > > -- 
> > > MST
> > 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-10 18:31 [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation Vishwanath Seshagiri
  2026-03-13  7:51 ` Jason Wang
  2026-03-16  9:56 ` Michael S. Tsirkin
@ 2026-03-17  2:30 ` patchwork-bot+netdevbpf
  2026-03-23 15:01 ` Omar Elghoul
  3 siblings, 0 replies; 22+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-17  2:30 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, dw, technoboy85, ilias.apalodimas, netdev,
	virtualization, linux-kernel, kernel-team

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 10 Mar 2026 11:31:04 -0700 you wrote:
> Use page_pool for RX buffer allocation in mergeable and small buffer
> modes to enable page recycling and avoid repeated page allocator calls.
> skb_mark_for_recycle() enables page reuse in the network stack.
> 
> Big packets mode is unchanged because it uses page->private for linked
> list chaining of multiple pages per buffer, which conflicts with
> page_pool's internal use of page->private.
> 
> [...]

Here is the summary with links:
  - [net-next,v11] virtio_net: add page_pool support for buffer allocation
    https://git.kernel.org/netdev/net-next/c/24fbd3967f3f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-10 18:31 [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation Vishwanath Seshagiri
                   ` (2 preceding siblings ...)
  2026-03-17  2:30 ` patchwork-bot+netdevbpf
@ 2026-03-23 15:01 ` Omar Elghoul
  2026-03-23 15:01   ` Omar Elghoul
  2026-03-23 15:52   ` Michael S. Tsirkin
  3 siblings, 2 replies; 22+ messages in thread
From: Omar Elghoul @ 2026-03-23 15:01 UTC (permalink / raw)
  To: vishs
  Cc: andrew+netdev, davem, dw, edumazet, eperezma, ilias.apalodimas,
	jasowang, kernel-team, kuba, linux-kernel, mst, netdev, pabeni,
	technoboy85, virtualization, xuanzhuo, oelghoul

Hi,

I've been testing linux-next (tags later than 03/17) and hit new issues in
virtio-net on s390x. I bisected and found this patch to be the first bad
commit.

The issue seems to be reproducible only when running in Secure Execution.
When tested in a KVM guest, virtio-net performance appears greatly reduced,
and the dmesg output shows many instances of the following error messages.

Partial relevant logs
=====================
[   49.332028] macvtap0: bad gso: type: 0, size: 0, flags 1 tunnel 0 tnl csum 0
[   74.365668] macvtap0: bad gso: type: 2e, size: 27948, flags 0 tunnel 0 tnl csum 0
[  403.302168] macvtap0: bad csum: flags: 2, gso_type: 23 rx_tnl_csum 0
[  403.302271] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
[  403.302279] macvtap0: bad csum: flags: 2, gso_type: e1 rx_tnl_csum 0
[  403.309492] macvtap0: bad csum: flags: 2, gso_type: 4c rx_tnl_csum 0
[  403.317029] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0

Steps to reproduce
==================
1. Boot a Linux guest implementing this patch under QEMU/KVM (*) with SE
   enabled and a virtio-net-ccw device attached.
2. Run dmesg. The error message is usually already present at boot time,
   but if not, it can be reproduced by creating any network traffic.

(*) This patch was not tested in a non-KVM hypervisor environment.

I've further confirmed that reverting this patch onto its parent commit
resolves the issue. Please let me know if you'd like me to test a fix or if
you would need more information.

Thanks in advance.

Best,
Omar

^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 15:01 ` Omar Elghoul
  2026-03-23 15:01   ` Omar Elghoul
@ 2026-03-23 15:52   ` Michael S. Tsirkin
  2026-03-23 16:54     ` Omar Elghoul
  2026-03-23 16:58     ` Michael S. Tsirkin
  1 sibling, 2 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-23 15:52 UTC (permalink / raw)
  To: Omar Elghoul
  Cc: vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
> Hi,
> 
> I've been testing linux-next (tags later than 03/17) and hit new issues in
> virtio-net on s390x. I bisected the issue, and I found this patch to be the
> first buggy commit.
> 
> The issue seems to only be reproducible when running in Secure Execution.
> Tested in a KVM guest, the virtio-net performance appears greatly reduced,
> and the dmesg output shows many instances of the following error messages.
> 
> Partial relevant logs
> =====================
> [   49.332028] macvtap0: bad gso: type: 0, size: 0, flags 1 tunnel 0 tnl csum 0
> [   74.365668] macvtap0: bad gso: type: 2e, size: 27948, flags 0 tunnel 0 tnl csum 0
> [  403.302168] macvtap0: bad csum: flags: 2, gso_type: 23 rx_tnl_csum 0
> [  403.302271] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
> [  403.302279] macvtap0: bad csum: flags: 2, gso_type: e1 rx_tnl_csum 0
> [  403.309492] macvtap0: bad csum: flags: 2, gso_type: 4c rx_tnl_csum 0
> [  403.317029] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
> 
> Steps to reproduce
> ==================
> 1. Boot a Linux guest implementing this patch under QEMU/KVM (*) with SE
>    enabled and a virtio-net-ccw device attached.
> 2. Run dmesg. The error message is usually already present at boot time,
>    but if not, it can be reproduced by creating any network traffic.
> 
> (*) This patch was not tested in a non-KVM hypervisor environment.
> 
> I've further confirmed that reverting this patch onto its parent commit
> resolves the issue. Please let me know if you'd like me to test a fix or if
> you would need more information.
> 
> Thanks in advance.
> 
> Best,
> Omar

Well... I am not sure how I missed it. Obvious in hindsight:

static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
                        void *buf, unsigned int len, void **ctx,
                        unsigned int *xdp_xmit,
                        struct virtnet_rq_stats *stats)
{
        struct net_device *dev = vi->dev;
        struct sk_buff *skb;
        u8 flags;
                
        if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
                pr_debug("%s: short packet %i\n", dev->name, len);
                DEV_STATS_INC(dev, rx_length_errors);
                virtnet_rq_free_buf(vi, rq, buf);
                return;
        }
        
        /* About the flags below:
         * 1. Save the flags early, as the XDP program might overwrite them.
         * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
         * stay valid after XDP processing.
         * 2. XDP doesn't work with partially checksummed packets (refer to
         * virtnet_xdp_set()), so packets marked as
         * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
         */
                
        if (vi->mergeable_rx_bufs) {
                flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
                skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
                                        stats);
        } else if (vi->big_packets) {
                void *p = page_address((struct page *)buf);
                
                flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
                skb = receive_big(dev, vi, rq, buf, len, stats);
        } else {
                flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
                skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
        }       


So we are reading the header before the DMA sync, which happens inside
receive_mergeable() and friends:

static struct sk_buff *receive_mergeable(struct net_device *dev,
                                         struct virtnet_info *vi,
                                         struct receive_queue *rq,
                                         void *buf,
                                         void *ctx,
                                         unsigned int len,
                                         unsigned int *xdp_xmit,
                                         struct virtnet_rq_stats *stats)
{               
        struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
        int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
        struct page *page = virt_to_head_page(buf);
        int offset = buf - page_address(page);
        struct sk_buff *head_skb, *curr_skb;     
        unsigned int truesize = mergeable_ctx_to_truesize(ctx);
        unsigned int headroom = mergeable_ctx_to_headroom(ctx);
                
        head_skb = NULL;
                
        if (rq->use_page_pool_dma)
                page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
                


Just as a test, the below should fix it (compile-tested only), but the real
fix is more complex since we need to be careful to avoid expensive syncing
twice.


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 97035b49bae7..57b4f5954bed 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 {
+	void *buf;
+
 	BUG_ON(!rq->page_pool);
 
-	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
+	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
+	if (buf && rq->use_page_pool_dma && *len) {
+		struct page *page = virt_to_head_page(buf);
+		int offset = buf - page_address(page);
+
+		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
+	}
+
+	return buf;
 }
 
 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)




-- 
MST


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 15:52   ` Michael S. Tsirkin
@ 2026-03-23 16:54     ` Omar Elghoul
  2026-03-23 17:10       ` Michael S. Tsirkin
  2026-03-23 16:58     ` Michael S. Tsirkin
  1 sibling, 1 reply; 22+ messages in thread
From: Omar Elghoul @ 2026-03-23 16:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On 3/23/26 11:52 AM, Michael S. Tsirkin wrote:
> On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
>> [...]
> Well... I am not sure how I missed it. Obvious in hindsight:
>
> static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>                          void *buf, unsigned int len, void **ctx,
>                          unsigned int *xdp_xmit,
>                          struct virtnet_rq_stats *stats)
> {
>          struct net_device *dev = vi->dev;
>          struct sk_buff *skb;
>          u8 flags;
>                  
>          if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
>                  pr_debug("%s: short packet %i\n", dev->name, len);
>                  DEV_STATS_INC(dev, rx_length_errors);
>                  virtnet_rq_free_buf(vi, rq, buf);
>                  return;
>          }
>          
>          /* About the flags below:
>           * 1. Save the flags early, as the XDP program might overwrite them.
>           * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>           * stay valid after XDP processing.
>           * 2. XDP doesn't work with partially checksummed packets (refer to
>           * virtnet_xdp_set()), so packets marked as
>           * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
>           */
>                  
>          if (vi->mergeable_rx_bufs) {
>                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>                  skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
>                                          stats);
>          } else if (vi->big_packets) {
>                  void *p = page_address((struct page *)buf);
>                  
>                  flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
>                  skb = receive_big(dev, vi, rq, buf, len, stats);
>          } else {
>                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>                  skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
>          }
>
>
> So we are reading the header before the DMA sync, which happens inside
> receive_mergeable() and friends:
Thank you for your analysis and explanation.
>
> static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                           struct virtnet_info *vi,
>                                           struct receive_queue *rq,
>                                           void *buf,
>                                           void *ctx,
>                                           unsigned int len,
>                                           unsigned int *xdp_xmit,
>                                           struct virtnet_rq_stats *stats)
> {
>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>          int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>          struct page *page = virt_to_head_page(buf);
>          int offset = buf - page_address(page);
>          struct sk_buff *head_skb, *curr_skb;
>          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>                  
>          head_skb = NULL;
>                  
>          if (rq->use_page_pool_dma)
>                  page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>                  
>
>
> Just as a test, the below should fix it (compile-tested only), but the real
> fix is more complex since we need to be careful to avoid expensive syncing
> twice.

I applied your patch and tested it on my system. With this change, I 
could not reproduce the same error anymore. I would be happy to test a 
proper fix once you have one.

>
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 97035b49bae7..57b4f5954bed 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   
>   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
>   {
> +	void *buf;
> +
>   	BUG_ON(!rq->page_pool);
>   
> -	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
> +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> +	if (buf && rq->use_page_pool_dma && *len) {
> +		struct page *page = virt_to_head_page(buf);
> +		int offset = buf - page_address(page);
> +
> +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
> +	}
> +
> +	return buf;
>   }
>   
>   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 15:52   ` Michael S. Tsirkin
  2026-03-23 16:54     ` Omar Elghoul
@ 2026-03-23 16:58     ` Michael S. Tsirkin
  2026-03-23 17:09       ` Omar Elghoul
                         ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-23 16:58 UTC (permalink / raw)
  To: Omar Elghoul
  Cc: vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On Mon, Mar 23, 2026 at 11:52:34AM -0400, Michael S. Tsirkin wrote:
> On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
> > Hi,
> > 
> > I've been testing linux-next (tags later than 03/17) and hit new issues in
> > virtio-net on s390x. I bisected the issue, and I found this patch to be the
> > first buggy commit.
> > 
> > The issue seems to only be reproducible when running in Secure Execution.
> > Tested in a KVM guest, the virtio-net performance appears greatly reduced,
> > and the dmesg output shows many instances of the following error messages.
> > 
> > Partial relevant logs
> > =====================
> > [   49.332028] macvtap0: bad gso: type: 0, size: 0, flags 1 tunnel 0 tnl csum 0
> > [   74.365668] macvtap0: bad gso: type: 2e, size: 27948, flags 0 tunnel 0 tnl csum 0
> > [  403.302168] macvtap0: bad csum: flags: 2, gso_type: 23 rx_tnl_csum 0
> > [  403.302271] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
> > [  403.302279] macvtap0: bad csum: flags: 2, gso_type: e1 rx_tnl_csum 0
> > [  403.309492] macvtap0: bad csum: flags: 2, gso_type: 4c rx_tnl_csum 0
> > [  403.317029] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
> > 
> > Steps to reproduce
> > ==================
> > 1. Boot a Linux guest implementing this patch under QEMU/KVM (*) with SE
> >    enabled and a virtio-net-ccw device attached.
> > 2. Run dmesg. The error message is usually already present at boot time,
> >    but if not, it can be reproduced by creating any network traffic.
> > 
> > (*) This patch was not tested in a non-KVM hypervisor environment.
> > 
> > I've further confirmed that reverting this patch onto its parent commit
> > resolves the issue. Please let me know if you'd like me to test a fix or if
> > you would need more information.
> > 
> > Thanks in advance.
> > 
> > Best,
> > Omar
> 
> Well... I am not sure how I missed it. Obvious in hindsight:
> 
> static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>                         void *buf, unsigned int len, void **ctx,
>                         unsigned int *xdp_xmit,
>                         struct virtnet_rq_stats *stats)
> {
>         struct net_device *dev = vi->dev;
>         struct sk_buff *skb;
>         u8 flags;
>                 
>         if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
>                 pr_debug("%s: short packet %i\n", dev->name, len);
>                 DEV_STATS_INC(dev, rx_length_errors);
>                 virtnet_rq_free_buf(vi, rq, buf);
>                 return;
>         }
>         
>         /* About the flags below:
>          * 1. Save the flags early, as the XDP program might overwrite them.
>          * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>          * stay valid after XDP processing.
>          * 2. XDP doesn't work with partially checksummed packets (refer to
>          * virtnet_xdp_set()), so packets marked as
>          * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
>          */
>                 
>         if (vi->mergeable_rx_bufs) {
>                 flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>                 skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
>                                         stats);
>         } else if (vi->big_packets) {
>                 void *p = page_address((struct page *)buf);
>                 
>                 flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
>                 skb = receive_big(dev, vi, rq, buf, len, stats);
>         } else {
>                 flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>                 skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
>         }       
> 
> 
> So we are reading the header, before dma sync, which is within
> receive_mergeable and friends:
> 
> static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                          struct virtnet_info *vi,
>                                          struct receive_queue *rq,
>                                          void *buf,
>                                          void *ctx,
>                                          unsigned int len,
>                                          unsigned int *xdp_xmit,
>                                          struct virtnet_rq_stats *stats)
> {               
>         struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>         int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>         struct page *page = virt_to_head_page(buf);
>         int offset = buf - page_address(page);
>         struct sk_buff *head_skb, *curr_skb;     
>         unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>         unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>                 
>         head_skb = NULL;
>                 
>         if (rq->use_page_pool_dma)
>                 page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>                 
> 
> 
> Just as a test, the below should fix it (compiled only), but the real
> fix is more complex since we need to be careful to avoid expensive syncing
> twice.
> 
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 97035b49bae7..57b4f5954bed 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  
>  static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
>  {
> +	void *buf;
> +
>  	BUG_ON(!rq->page_pool);
>  
> -	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
> +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> +	if (buf && rq->use_page_pool_dma && *len) {
> +		struct page *page = virt_to_head_page(buf);
> +		int offset = buf - page_address(page);
> +
> +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
> +	}
> +
> +	return buf;
>  }
>  
>  static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> 
> 
> 
> 
> -- 
> MST

or maybe like this:

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 97035b49bae7..835f52651006 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1956,13 +1956,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	 */
 	buf -= VIRTNET_RX_PAD + xdp_headroom;
 
-	if (rq->use_page_pool_dma) {
-		int offset = buf - page_address(page) +
-			     VIRTNET_RX_PAD + xdp_headroom;
-
-		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
-	}
-
 	len -= vi->hdr_len;
 	u64_stats_add(&stats->bytes, len);
 
@@ -2398,9 +2391,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 	head_skb = NULL;
 
-	if (rq->use_page_pool_dma)
-		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
-
 	u64_stats_add(&stats->bytes, len - vi->hdr_len);
 
 	if (check_mergeable_len(dev, ctx, len))
@@ -2563,6 +2553,13 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 		return;
 	}
 
+	if (rq->use_page_pool_dma) {
+		struct page *page = virt_to_head_page(buf);
+		int offset = buf - page_address(page);
+
+		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
+	}
+
 	/* About the flags below:
 	 * 1. Save the flags early, as the XDP program might overwrite them.
 	 * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 16:58     ` Michael S. Tsirkin
@ 2026-03-23 17:09       ` Omar Elghoul
  2026-03-23 17:50         ` Vishwanath Seshagiri
  2026-03-24  0:34       ` Jason Wang
  2026-03-24  8:20       ` Aithal, Srikanth
  2 siblings, 1 reply; 22+ messages in thread
From: Omar Elghoul @ 2026-03-23 17:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On 3/23/26 12:58 PM, Michael S. Tsirkin wrote:

> On Mon, Mar 23, 2026 at 11:52:34AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
>>> [...]
>> Well... I am not sure how I missed it. Obvious in hindsight:
>>
>> static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>>                          void *buf, unsigned int len, void **ctx,
>>                          unsigned int *xdp_xmit,
>>                          struct virtnet_rq_stats *stats)
>> {
>>          struct net_device *dev = vi->dev;
>>          struct sk_buff *skb;
>>          u8 flags;
>>                  
>>          if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
>>                  pr_debug("%s: short packet %i\n", dev->name, len);
>>                  DEV_STATS_INC(dev, rx_length_errors);
>>                  virtnet_rq_free_buf(vi, rq, buf);
>>                  return;
>>          }
>>          
>>          /* About the flags below:
>>           * 1. Save the flags early, as the XDP program might overwrite them.
>>           * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>>           * stay valid after XDP processing.
>>           * 2. XDP doesn't work with partially checksummed packets (refer to
>>           * virtnet_xdp_set()), so packets marked as
>>           * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
>>           */
>>                  
>>          if (vi->mergeable_rx_bufs) {
>>                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>>                  skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
>>                                          stats);
>>          } else if (vi->big_packets) {
>>                  void *p = page_address((struct page *)buf);
>>                  
>>                  flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
>>                  skb = receive_big(dev, vi, rq, buf, len, stats);
>>          } else {
>>                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>>                  skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
>>          }
>>
>>
>> So we are reading the header, before dma sync, which is within
>> receive_mergeable and friends:
>>
>> static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                           struct virtnet_info *vi,
>>                                           struct receive_queue *rq,
>>                                           void *buf,
>>                                           void *ctx,
>>                                           unsigned int len,
>>                                           unsigned int *xdp_xmit,
>>                                           struct virtnet_rq_stats *stats)
>> {
>>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>          int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>          struct page *page = virt_to_head_page(buf);
>>          int offset = buf - page_address(page);
>>          struct sk_buff *head_skb, *curr_skb;
>>          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>                  
>>          head_skb = NULL;
>>                  
>>          if (rq->use_page_pool_dma)
>>                  page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>>                  
>>
>>
>> Just as a test, the below should fix it (compiled only), but the real
>> fix is more complex since we need to be careful to avoid expensive syncing
>> twice.
>>
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 97035b49bae7..57b4f5954bed 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>   
>>   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
>>   {
>> +	void *buf;
>> +
>>   	BUG_ON(!rq->page_pool);
>>   
>> -	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
>> +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
>> +	if (buf && rq->use_page_pool_dma && *len) {
>> +		struct page *page = virt_to_head_page(buf);
>> +		int offset = buf - page_address(page);
>> +
>> +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
>> +	}
>> +
>> +	return buf;
>>   }
>>   
>>   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>>
>>
>>
>>
>> -- 
>> MST
> or maybe like this:
Both of these patches resolve the issue on my end.
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 97035b49bae7..835f52651006 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1956,13 +1956,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	 */
>   	buf -= VIRTNET_RX_PAD + xdp_headroom;
>   
> -	if (rq->use_page_pool_dma) {
> -		int offset = buf - page_address(page) +
> -			     VIRTNET_RX_PAD + xdp_headroom;
> -
> -		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> -	}
> -
>   	len -= vi->hdr_len;
>   	u64_stats_add(&stats->bytes, len);
>   
> @@ -2398,9 +2391,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   
>   	head_skb = NULL;
>   
> -	if (rq->use_page_pool_dma)
> -		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> -
>   	u64_stats_add(&stats->bytes, len - vi->hdr_len);
>   
>   	if (check_mergeable_len(dev, ctx, len))
> @@ -2563,6 +2553,13 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>   		return;
>   	}
>   
> +	if (rq->use_page_pool_dma) {
> +		struct page *page = virt_to_head_page(buf);
> +		int offset = buf - page_address(page);
> +
> +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> +	}
> +
>   	/* About the flags below:
>   	 * 1. Save the flags early, as the XDP program might overwrite them.
>   	 * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 16:54     ` Omar Elghoul
@ 2026-03-23 17:10       ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-23 17:10 UTC (permalink / raw)
  To: Omar Elghoul
  Cc: vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On Mon, Mar 23, 2026 at 12:54:03PM -0400, Omar Elghoul wrote:
> On 3/23/26 11:52 AM, Michael S. Tsirkin wrote:
> > On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
> > > [...]
> > Well... I am not sure how I missed it. Obvious in hindsight:
> > 
> > static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >                          void *buf, unsigned int len, void **ctx,
> >                          unsigned int *xdp_xmit,
> >                          struct virtnet_rq_stats *stats)
> > {
> >          struct net_device *dev = vi->dev;
> >          struct sk_buff *skb;
> >          u8 flags;
> >          if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> >                  pr_debug("%s: short packet %i\n", dev->name, len);
> >                  DEV_STATS_INC(dev, rx_length_errors);
> >                  virtnet_rq_free_buf(vi, rq, buf);
> >                  return;
> >          }
> >          /* About the flags below:
> >           * 1. Save the flags early, as the XDP program might overwrite them.
> >           * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
> >           * stay valid after XDP processing.
> >           * 2. XDP doesn't work with partially checksummed packets (refer to
> >           * virtnet_xdp_set()), so packets marked as
> >           * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
> >           */
> >          if (vi->mergeable_rx_bufs) {
> >                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
> >                  skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
> >                                          stats);
> >          } else if (vi->big_packets) {
> >                  void *p = page_address((struct page *)buf);
> >                  flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
> >                  skb = receive_big(dev, vi, rq, buf, len, stats);
> >          } else {
> >                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
> >                  skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
> >          }
> > 
> > 
> > So we are reading the header, before dma sync, which is within
> > receive_mergeable and friends:
> Thank you for your analysis and explanation.
> > 
> > static struct sk_buff *receive_mergeable(struct net_device *dev,
> >                                           struct virtnet_info *vi,
> >                                           struct receive_queue *rq,
> >                                           void *buf,
> >                                           void *ctx,
> >                                           unsigned int len,
> >                                           unsigned int *xdp_xmit,
> >                                           struct virtnet_rq_stats *stats)
> > {
> >          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> >          int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >          struct page *page = virt_to_head_page(buf);
> >          int offset = buf - page_address(page);
> >          struct sk_buff *head_skb, *curr_skb;
> >          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
> >          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> >          head_skb = NULL;
> >          if (rq->use_page_pool_dma)
> >                  page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> > 
> > 
> > Just as a test, the below should fix it (compiled only), but the real
> > fix is more complex since we need to be careful to avoid expensive syncing
> > twice.
> 
> I applied your patch and tested it on my system. With this change, I could
> not reproduce the same error anymore. I would be happy to test a proper fix
> once you have one.
> 
> > 
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 97035b49bae7..57b4f5954bed 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> >   {
> > +	void *buf;
> > +
> >   	BUG_ON(!rq->page_pool);
> > -	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +	if (buf && rq->use_page_pool_dma && *len) {
> > +		struct page *page = virt_to_head_page(buf);
> > +		int offset = buf - page_address(page);
> > +
> > +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
> > +	}
> > +
> > +	return buf;
> >   }
> >   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> > 
> > 
> > 
> > 

just sent


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 17:09       ` Omar Elghoul
@ 2026-03-23 17:50         ` Vishwanath Seshagiri
  2026-03-23 23:37           ` Michael S. Tsirkin
  0 siblings, 1 reply; 22+ messages in thread
From: Vishwanath Seshagiri @ 2026-03-23 17:50 UTC (permalink / raw)
  To: Omar Elghoul, Michael S. Tsirkin
  Cc: andrew+netdev, davem, dw, edumazet, eperezma, ilias.apalodimas,
	jasowang, kernel-team, kuba, linux-kernel, netdev, pabeni,
	technoboy85, virtualization, xuanzhuo



On 3/23/26 10:39 PM, Omar Elghoul wrote:
> On 3/23/26 12:58 PM, Michael S. Tsirkin wrote:
> 
>> On Mon, Mar 23, 2026 at 11:52:34AM -0400, Michael S. Tsirkin wrote:
>>> On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
>>>> [...]
>>> Well... I am not sure how I missed it. Obvious in hindsight:
>>>
>>> static void receive_buf(struct virtnet_info *vi, struct receive_queue 
>>> *rq,
>>>                          void *buf, unsigned int len, void **ctx,
>>>                          unsigned int *xdp_xmit,
>>>                          struct virtnet_rq_stats *stats)
>>> {
>>>          struct net_device *dev = vi->dev;
>>>          struct sk_buff *skb;
>>>          u8 flags;
>>>          if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
>>>                  pr_debug("%s: short packet %i\n", dev->name, len);
>>>                  DEV_STATS_INC(dev, rx_length_errors);
>>>                  virtnet_rq_free_buf(vi, rq, buf);
>>>                  return;
>>>          }
>>>          /* About the flags below:
>>>           * 1. Save the flags early, as the XDP program might 
>>> overwrite them.
>>>           * These flags ensure packets marked as 
>>> VIRTIO_NET_HDR_F_DATA_VALID
>>>           * stay valid after XDP processing.
>>>           * 2. XDP doesn't work with partially checksummed packets 
>>> (refer to
>>>           * virtnet_xdp_set()), so packets marked as
>>>           * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP 
>>> processing.
>>>           */
>>>          if (vi->mergeable_rx_bufs) {
>>>                  flags = ((struct virtio_net_common_hdr *)buf)- 
>>> >hdr.flags;
>>>                  skb = receive_mergeable(dev, vi, rq, buf, ctx, len, 
>>> xdp_xmit,
>>>                                          stats);
>>>          } else if (vi->big_packets) {
>>>                  void *p = page_address((struct page *)buf);
>>>                  flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
>>>                  skb = receive_big(dev, vi, rq, buf, len, stats);
>>>          } else {
>>>                  flags = ((struct virtio_net_common_hdr *)buf)- 
>>> >hdr.flags;
>>>                  skb = receive_small(dev, vi, rq, buf, ctx, len, 
>>> xdp_xmit, stats);
>>>          }
>>>
>>>
>>> So we are reading the header, before dma sync, which is within
>>> receive_mergeable and friends:
>>>
>>> static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>                                           struct virtnet_info *vi,
>>>                                           struct receive_queue *rq,
>>>                                           void *buf,
>>>                                           void *ctx,
>>>                                           unsigned int len,
>>>                                           unsigned int *xdp_xmit,
>>>                                           struct virtnet_rq_stats 
>>> *stats)
>>> {
>>>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>>          int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>>          struct page *page = virt_to_head_page(buf);
>>>          int offset = buf - page_address(page);
>>>          struct sk_buff *head_skb, *curr_skb;
>>>          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>>          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>>          head_skb = NULL;
>>>          if (rq->use_page_pool_dma)
>>>                  page_pool_dma_sync_for_cpu(rq->page_pool, page, 
>>> offset, len);
>>>
>>>
>>> Just as a test, the below should fix it (compiled only), but the real
>>> fix is more complex since we need to be careful to avoid expensive 
>>> syncing
>>> twice.
>>>
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 97035b49bae7..57b4f5954bed 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct 
>>> virtnet_info *vi,
>>>   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, 
>>> void **ctx)
>>>   {
>>> +    void *buf;
>>> +
>>>       BUG_ON(!rq->page_pool);
>>> -    return virtqueue_get_buf_ctx(rq->vq, len, ctx);
>>> +    buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
>>> +    if (buf && rq->use_page_pool_dma && *len) {
>>> +        struct page *page = virt_to_head_page(buf);
>>> +        int offset = buf - page_address(page);
>>> +
>>> +        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
>>> +    }
>>> +
>>> +    return buf;
>>>   }
>>>   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>>>
>>>
>>>
>>>
>>> -- 
>>> MST
>> or maybe like this:
> Both of these patches resolve the issue on my end.
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 97035b49bae7..835f52651006 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1956,13 +1956,6 @@ static struct sk_buff *receive_small(struct 
>> net_device *dev,
>>        */
>>       buf -= VIRTNET_RX_PAD + xdp_headroom;
>> -    if (rq->use_page_pool_dma) {
>> -        int offset = buf - page_address(page) +
>> -                 VIRTNET_RX_PAD + xdp_headroom;
>> -
>> -        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>> -    }
>> -
>>       len -= vi->hdr_len;
>>       u64_stats_add(&stats->bytes, len);
>> @@ -2398,9 +2391,6 @@ static struct sk_buff *receive_mergeable(struct 
>> net_device *dev,
>>       head_skb = NULL;
>> -    if (rq->use_page_pool_dma)
>> -        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>> -
>>       u64_stats_add(&stats->bytes, len - vi->hdr_len);
>>       if (check_mergeable_len(dev, ctx, len))
>> @@ -2563,6 +2553,13 @@ static void receive_buf(struct virtnet_info 
>> *vi, struct receive_queue *rq,
>>           return;
>>       }
>> +    if (rq->use_page_pool_dma) {
>> +        struct page *page = virt_to_head_page(buf);
>> +        int offset = buf - page_address(page);
>> +
>> +        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>> +    }
>> +
>>       /* About the flags below:
>>        * 1. Save the flags early, as the XDP program might overwrite 
>> them.
>>        * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>>

Hello Omar. Thank you for the bisect, and apologies for breaking the
functionality. Thank you, Michael, for the patch; I will be more careful
about DMA sync issues next time. It looks like the bug is fixed; is there
anything else needed from my end?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 17:50         ` Vishwanath Seshagiri
@ 2026-03-23 23:37           ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2026-03-23 23:37 UTC (permalink / raw)
  To: Vishwanath Seshagiri
  Cc: Omar Elghoul, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On Mon, Mar 23, 2026 at 01:50:29PM -0400, Vishwanath Seshagiri wrote:
> 
> 
> On 3/23/26 10:39 PM, Omar Elghoul wrote:
> > On 3/23/26 12:58 PM, Michael S. Tsirkin wrote:
> > 
> > > On Mon, Mar 23, 2026 at 11:52:34AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
> > > > > [...]
> > > > Well... I am not sure how I missed it. Obvious in hindsight:
> > > > 
> > > > static void receive_buf(struct virtnet_info *vi, struct
> > > > receive_queue *rq,
> > > >                          void *buf, unsigned int len, void **ctx,
> > > >                          unsigned int *xdp_xmit,
> > > >                          struct virtnet_rq_stats *stats)
> > > > {
> > > >          struct net_device *dev = vi->dev;
> > > >          struct sk_buff *skb;
> > > >          u8 flags;
> > > >          if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> > > >                  pr_debug("%s: short packet %i\n", dev->name, len);
> > > >                  DEV_STATS_INC(dev, rx_length_errors);
> > > >                  virtnet_rq_free_buf(vi, rq, buf);
> > > >                  return;
> > > >          }
> > > >          /* About the flags below:
> > > >           * 1. Save the flags early, as the XDP program might
> > > > overwrite them.
> > > >           * These flags ensure packets marked as
> > > > VIRTIO_NET_HDR_F_DATA_VALID
> > > >           * stay valid after XDP processing.
> > > >           * 2. XDP doesn't work with partially checksummed
> > > > packets (refer to
> > > >           * virtnet_xdp_set()), so packets marked as
> > > >           * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP
> > > > processing.
> > > >           */
> > > >          if (vi->mergeable_rx_bufs) {
> > > >                  flags = ((struct virtio_net_common_hdr *)buf)-
> > > > >hdr.flags;
> > > >                  skb = receive_mergeable(dev, vi, rq, buf, ctx,
> > > > len, xdp_xmit,
> > > >                                          stats);
> > > >          } else if (vi->big_packets) {
> > > >                  void *p = page_address((struct page *)buf);
> > > >                  flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
> > > >                  skb = receive_big(dev, vi, rq, buf, len, stats);
> > > >          } else {
> > > >                  flags = ((struct virtio_net_common_hdr *)buf)-
> > > > >hdr.flags;
> > > >                  skb = receive_small(dev, vi, rq, buf, ctx, len,
> > > > xdp_xmit, stats);
> > > >          }
> > > > 
> > > > 
> > > > So we are reading the header, before dma sync, which is within
> > > > receive_mergeable and friends:
> > > > 
> > > > static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >                                           struct virtnet_info *vi,
> > > >                                           struct receive_queue *rq,
> > > >                                           void *buf,
> > > >                                           void *ctx,
> > > >                                           unsigned int len,
> > > >                                           unsigned int *xdp_xmit,
> > > >                                           struct
> > > > virtnet_rq_stats *stats)
> > > > {
> > > >          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > > >          int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > > >          struct page *page = virt_to_head_page(buf);
> > > >          int offset = buf - page_address(page);
> > > >          struct sk_buff *head_skb, *curr_skb;
> > > >          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
> > > >          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> > > >          head_skb = NULL;
> > > >          if (rq->use_page_pool_dma)
> > > >                  page_pool_dma_sync_for_cpu(rq->page_pool, page,
> > > > offset, len);
> > > > 
> > > > 
> > > > Just as a test, the below should fix it (compiled only), but the real
> > > > fix is more complex since we need to be careful to avoid
> > > > expensive syncing
> > > > twice.
> > > > 
> > > > 
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 97035b49bae7..57b4f5954bed 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct
> > > > virtnet_info *vi,
> > > >   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32
> > > > *len, void **ctx)
> > > >   {
> > > > +    void *buf;
> > > > +
> > > >       BUG_ON(!rq->page_pool);
> > > > -    return virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > +    buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > +    if (buf && rq->use_page_pool_dma && *len) {
> > > > +        struct page *page = virt_to_head_page(buf);
> > > > +        int offset = buf - page_address(page);
> > > > +
> > > > +        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
> > > > +    }
> > > > +
> > > > +    return buf;
> > > >   }
> > > >   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > MST
> > > or maybe like this:
> > Both of these patches resolve the issue on my end.
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 97035b49bae7..835f52651006 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1956,13 +1956,6 @@ static struct sk_buff *receive_small(struct
> > > net_device *dev,
> > >        */
> > >       buf -= VIRTNET_RX_PAD + xdp_headroom;
> > > -    if (rq->use_page_pool_dma) {
> > > -        int offset = buf - page_address(page) +
> > > -                 VIRTNET_RX_PAD + xdp_headroom;
> > > -
> > > -        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> > > -    }
> > > -
> > >       len -= vi->hdr_len;
> > >       u64_stats_add(&stats->bytes, len);
> > > @@ -2398,9 +2391,6 @@ static struct sk_buff
> > > *receive_mergeable(struct net_device *dev,
> > >       head_skb = NULL;
> > > -    if (rq->use_page_pool_dma)
> > > -        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> > > -
> > >       u64_stats_add(&stats->bytes, len - vi->hdr_len);
> > >       if (check_mergeable_len(dev, ctx, len))
> > > @@ -2563,6 +2553,13 @@ static void receive_buf(struct virtnet_info
> > > *vi, struct receive_queue *rq,
> > >           return;
> > >       }
> > > +    if (rq->use_page_pool_dma) {
> > > +        struct page *page = virt_to_head_page(buf);
> > > +        int offset = buf - page_address(page);
> > > +
> > > +        page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> > > +    }
> > > +
> > >       /* About the flags below:
> > >        * 1. Save the flags early, as the XDP program might overwrite them.
> > >        * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
> > > 
> 
> Hello Omar. Thank you for the bisect, apologies for breaking the
> functionality. Thank you Michael for the patch, I will be careful next
> time about DMA sync issues. Looks like the bug is fixed, is there
> anything else needed from my end?
> 


Would be great to have you test the patch on your end and report
the performance. Thanks!

-- 
MST


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 16:58     ` Michael S. Tsirkin
  2026-03-23 17:09       ` Omar Elghoul
@ 2026-03-24  0:34       ` Jason Wang
  2026-03-24  8:20       ` Aithal, Srikanth
  2 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2026-03-24  0:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Omar Elghoul, vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, kernel-team, kuba, linux-kernel, netdev, pabeni,
	technoboy85, virtualization, xuanzhuo

On Tue, Mar 24, 2026 at 12:58 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Mar 23, 2026 at 11:52:34AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
> > > Hi,
> > >
> > > I've been testing linux-next (tags later than 03/17) and hit new issues in
> > > virtio-net on s390x. I bisected the issue, and I found this patch to be the
> > > first buggy commit.
> > >
> > > The issue seems to only be reproducible when running in Secure Execution.
> > > Tested in a KVM guest, the virtio-net performance appears greatly reduced,
> > > and the dmesg output shows many instances of the following error messages.
> > >
> > > Partial relevant logs
> > > =====================
> > > [   49.332028] macvtap0: bad gso: type: 0, size: 0, flags 1 tunnel 0 tnl csum 0
> > > [   74.365668] macvtap0: bad gso: type: 2e, size: 27948, flags 0 tunnel 0 tnl csum 0
> > > [  403.302168] macvtap0: bad csum: flags: 2, gso_type: 23 rx_tnl_csum 0
> > > [  403.302271] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
> > > [  403.302279] macvtap0: bad csum: flags: 2, gso_type: e1 rx_tnl_csum 0
> > > [  403.309492] macvtap0: bad csum: flags: 2, gso_type: 4c rx_tnl_csum 0
> > > [  403.317029] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
> > >
> > > Steps to reproduce
> > > ==================
> > > 1. Boot a Linux guest implementing this patch under QEMU/KVM (*) with SE
> > >    enabled and a virtio-net-ccw device attached.
> > > 2. Run dmesg. The error message is usually already present at boot time,
> > >    but if not, it can be reproduced by creating any network traffic.
> > >
> > > (*) This patch was not tested in a non-KVM hypervisor environment.
> > >
> > > I've further confirmed that reverting this patch onto its parent commit
> > > resolves the issue. Please let me know if you'd like me to test a fix or if
> > > you would need more information.
> > >
> > > Thanks in advance.
> > >
> > > Best,
> > > Omar
> >
> > Well... I am not sure how I missed it. Obvious in hindsight:
> >
> > static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >                         void *buf, unsigned int len, void **ctx,
> >                         unsigned int *xdp_xmit,
> >                         struct virtnet_rq_stats *stats)
> > {
> >         struct net_device *dev = vi->dev;
> >         struct sk_buff *skb;
> >         u8 flags;
> >
> >         if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> >                 pr_debug("%s: short packet %i\n", dev->name, len);
> >                 DEV_STATS_INC(dev, rx_length_errors);
> >                 virtnet_rq_free_buf(vi, rq, buf);
> >                 return;
> >         }
> >
> >         /* About the flags below:
> >          * 1. Save the flags early, as the XDP program might overwrite them.
> >          * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
> >          * stay valid after XDP processing.
> >          * 2. XDP doesn't work with partially checksummed packets (refer to
> >          * virtnet_xdp_set()), so packets marked as
> >          * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
> >          */
> >
> >         if (vi->mergeable_rx_bufs) {
> >                 flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
> >                 skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
> >                                         stats);
> >         } else if (vi->big_packets) {
> >                 void *p = page_address((struct page *)buf);
> >
> >                 flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
> >                 skb = receive_big(dev, vi, rq, buf, len, stats);
> >         } else {
> >                 flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
> >                 skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
> >         }
> >
> >
> > So we are reading the header, before dma sync, which is within
> > receive_mergeable and friends:
> >
> > static struct sk_buff *receive_mergeable(struct net_device *dev,
> >                                          struct virtnet_info *vi,
> >                                          struct receive_queue *rq,
> >                                          void *buf,
> >                                          void *ctx,
> >                                          unsigned int len,
> >                                          unsigned int *xdp_xmit,
> >                                          struct virtnet_rq_stats *stats)
> > {
> >         struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> >         int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >         struct page *page = virt_to_head_page(buf);
> >         int offset = buf - page_address(page);
> >         struct sk_buff *head_skb, *curr_skb;
> >         unsigned int truesize = mergeable_ctx_to_truesize(ctx);
> >         unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> >
> >         head_skb = NULL;
> >
> >         if (rq->use_page_pool_dma)
> >                 page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> >
> >
> >
> > Just as a test, the below should fix it (compiled only), but the real
> > fix is more complex since we need to be careful to avoid expensive syncing
> > twice.
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 97035b49bae7..57b4f5954bed 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >
> >  static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> >  {
> > +     void *buf;
> > +
> >       BUG_ON(!rq->page_pool);
> >
> > -     return virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +     buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +     if (buf && rq->use_page_pool_dma && *len) {
> > +             struct page *page = virt_to_head_page(buf);
> > +             int offset = buf - page_address(page);
> > +
> > +             page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
> > +     }
> > +
> > +     return buf;
> >  }
> >
> >  static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> >
> >
> >
> >
> > --
> > MST
>
> or maybe like this:
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 97035b49bae7..835f52651006 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1956,13 +1956,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>          */
>         buf -= VIRTNET_RX_PAD + xdp_headroom;
>
> -       if (rq->use_page_pool_dma) {
> -               int offset = buf - page_address(page) +
> -                            VIRTNET_RX_PAD + xdp_headroom;
> -
> -               page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> -       }
> -
>         len -= vi->hdr_len;
>         u64_stats_add(&stats->bytes, len);
>
> @@ -2398,9 +2391,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>
>         head_skb = NULL;
>
> -       if (rq->use_page_pool_dma)
> -               page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> -
>         u64_stats_add(&stats->bytes, len - vi->hdr_len);
>
>         if (check_mergeable_len(dev, ctx, len))
> @@ -2563,6 +2553,13 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>                 return;
>         }
>
> +       if (rq->use_page_pool_dma) {
> +               struct page *page = virt_to_head_page(buf);
> +               int offset = buf - page_address(page);
> +
> +               page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> +       }
> +
>         /* About the flags below:
>          * 1. Save the flags early, as the XDP program might overwrite them.
>          * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>

This looks better.

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation
  2026-03-23 16:58     ` Michael S. Tsirkin
  2026-03-23 17:09       ` Omar Elghoul
  2026-03-24  0:34       ` Jason Wang
@ 2026-03-24  8:20       ` Aithal, Srikanth
  2 siblings, 0 replies; 22+ messages in thread
From: Aithal, Srikanth @ 2026-03-24  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, Omar Elghoul
  Cc: vishs, andrew+netdev, davem, dw, edumazet, eperezma,
	ilias.apalodimas, jasowang, kernel-team, kuba, linux-kernel,
	netdev, pabeni, technoboy85, virtualization, xuanzhuo

On 3/23/2026 10:28 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 23, 2026 at 11:52:34AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Mar 23, 2026 at 11:01:31AM -0400, Omar Elghoul wrote:
>>> Hi,
>>>
>>> I've been testing linux-next (tags later than 03/17) and hit new issues in
>>> virtio-net on s390x. I bisected the issue, and I found this patch to be the
>>> first buggy commit.
>>>
>>> The issue seems to only be reproducible when running in Secure Execution.
>>> Tested in a KVM guest, the virtio-net performance appears greatly reduced,
>>> and the dmesg output shows many instances of the following error messages.
>>>
>>> Partial relevant logs
>>> =====================
>>> [   49.332028] macvtap0: bad gso: type: 0, size: 0, flags 1 tunnel 0 tnl csum 0
>>> [   74.365668] macvtap0: bad gso: type: 2e, size: 27948, flags 0 tunnel 0 tnl csum 0
>>> [  403.302168] macvtap0: bad csum: flags: 2, gso_type: 23 rx_tnl_csum 0
>>> [  403.302271] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
>>> [  403.302279] macvtap0: bad csum: flags: 2, gso_type: e1 rx_tnl_csum 0
>>> [  403.309492] macvtap0: bad csum: flags: 2, gso_type: 4c rx_tnl_csum 0
>>> [  403.317029] macvtap0: bad csum: flags: 2, gso_type: e0 rx_tnl_csum 0
>>>
>>> Steps to reproduce
>>> ==================
>>> 1. Boot a Linux guest implementing this patch under QEMU/KVM (*) with SE
>>>     enabled and a virtio-net-ccw device attached.
>>> 2. Run dmesg. The error message is usually already present at boot time,
>>>     but if not, it can be reproduced by creating any network traffic.
>>>
>>> (*) This patch was not tested in a non-KVM hypervisor environment.
>>>
>>> I've further confirmed that reverting this patch onto its parent commit
>>> resolves the issue. Please let me know if you'd like me to test a fix or if
>>> you would need more information.
>>>
>>> Thanks in advance.
>>>
>>> Best,
>>> Omar
>>
>> Well... I am not sure how I missed it. Obvious in hindsight:
>>
>> static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>>                          void *buf, unsigned int len, void **ctx,
>>                          unsigned int *xdp_xmit,
>>                          struct virtnet_rq_stats *stats)
>> {
>>          struct net_device *dev = vi->dev;
>>          struct sk_buff *skb;
>>          u8 flags;
>>                  
>>          if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
>>                  pr_debug("%s: short packet %i\n", dev->name, len);
>>                  DEV_STATS_INC(dev, rx_length_errors);
>>                  virtnet_rq_free_buf(vi, rq, buf);
>>                  return;
>>          }
>>          
>>          /* About the flags below:
>>           * 1. Save the flags early, as the XDP program might overwrite them.
>>           * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
>>           * stay valid after XDP processing.
>>           * 2. XDP doesn't work with partially checksummed packets (refer to
>>           * virtnet_xdp_set()), so packets marked as
>>           * VIRTIO_NET_HDR_F_NEEDS_CSUM get dropped during XDP processing.
>>           */
>>                  
>>          if (vi->mergeable_rx_bufs) {
>>                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>>                  skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
>>                                          stats);
>>          } else if (vi->big_packets) {
>>                  void *p = page_address((struct page *)buf);
>>                  
>>                  flags = ((struct virtio_net_common_hdr *)p)->hdr.flags;
>>                  skb = receive_big(dev, vi, rq, buf, len, stats);
>>          } else {
>>                  flags = ((struct virtio_net_common_hdr *)buf)->hdr.flags;
>>                  skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
>>          }
>>
>>
>> So we are reading the header, before dma sync, which is within
>> receive_mergeable and friends:
>>
>> static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                           struct virtnet_info *vi,
>>                                           struct receive_queue *rq,
>>                                           void *buf,
>>                                           void *ctx,
>>                                           unsigned int len,
>>                                           unsigned int *xdp_xmit,
>>                                           struct virtnet_rq_stats *stats)
>> {
>>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>          int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>          struct page *page = virt_to_head_page(buf);
>>          int offset = buf - page_address(page);
>>          struct sk_buff *head_skb, *curr_skb;
>>          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>                  
>>          head_skb = NULL;
>>                  
>>          if (rq->use_page_pool_dma)
>>                  page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
>>                  
>>
>>
>> Just as a test, the below should fix it (compiled only), but the real
>> fix is more complex since we need to be careful to avoid expensive syncing
>> twice.
>>
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 97035b49bae7..57b4f5954bed 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -931,9 +931,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>   
>>   static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
>>   {
>> +	void *buf;
>> +
>>   	BUG_ON(!rq->page_pool);
>>   
>> -	return virtqueue_get_buf_ctx(rq->vq, len, ctx);
>> +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
>> +	if (buf && rq->use_page_pool_dma && *len) {
>> +		struct page *page = virt_to_head_page(buf);
>> +		int offset = buf - page_address(page);
>> +
>> +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, *len);
>> +	}
>> +
>> +	return buf;
>>   }
>>   
>>   static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>>
>>
>>
>>
>> -- 
>> MST
> 
> or maybe like this:
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 97035b49bae7..835f52651006 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1956,13 +1956,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	 */
>   	buf -= VIRTNET_RX_PAD + xdp_headroom;
>   
> -	if (rq->use_page_pool_dma) {
> -		int offset = buf - page_address(page) +
> -			     VIRTNET_RX_PAD + xdp_headroom;
> -
> -		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> -	}
> -
>   	len -= vi->hdr_len;
>   	u64_stats_add(&stats->bytes, len);
>   
> @@ -2398,9 +2391,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   
>   	head_skb = NULL;
>   
> -	if (rq->use_page_pool_dma)
> -		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> -
>   	u64_stats_add(&stats->bytes, len - vi->hdr_len);
>   
>   	if (check_mergeable_len(dev, ctx, len))
> @@ -2563,6 +2553,13 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>   		return;
>   	}
>   
> +	if (rq->use_page_pool_dma) {
> +		struct page *page = virt_to_head_page(buf);
> +		int offset = buf - page_address(page);
> +
> +		page_pool_dma_sync_for_cpu(rq->page_pool, page, offset, len);
> +	}
> +
>   	/* About the flags below:
>   	 * 1. Save the flags early, as the XDP program might overwrite them.
>   	 * These flags ensure packets marked as VIRTIO_NET_HDR_F_DATA_VALID
> 

I see the same virtio_net regression introduced by commit 24fbd3967f3f 
(“virtio_net: add page_pool support for buffer allocation”) on AMD EPYC 
(KVM/QEMU secure guests with virtio-net), in addition to s390x as in 
Omar Elghoul’s report [1].

The failure mode matches: broken guest networking until that commit is 
reverted or worked around.

I built and tested both of Michael Tsirkin’s proposed fixes in this 
thread [2] [3]; each resolves the issue on my AMD EPYC setup.

[1] 
https://lore.kernel.org/all/20260323150136.14452-1-oelghoul@linux.ibm.com/
[2] 
https://lore.kernel.org/all/20260323114313-mutt-send-email-mst@kernel.org/
[3] 
https://lore.kernel.org/all/20260323122728-mutt-send-email-mst@kernel.org/#t

Tested-by: Srikanth Aithal <sraithal@amd.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-03-24  8:20 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-10 18:31 [PATCH net-next v11] virtio_net: add page_pool support for buffer allocation Vishwanath Seshagiri
2026-03-13  7:51 ` Jason Wang
2026-03-13  9:26   ` Vishwanath Seshagiri
2026-03-16  7:41     ` Jason Wang
2026-03-13 16:50   ` Vishwanath Seshagiri
2026-03-16  7:35     ` Jason Wang
2026-03-16  9:56 ` Michael S. Tsirkin
2026-03-16 10:43   ` Michael S. Tsirkin
2026-03-16 11:57     ` Vishwanath Seshagiri
2026-03-16 12:04       ` Michael S. Tsirkin
2026-03-17  2:30 ` patchwork-bot+netdevbpf
2026-03-23 15:01 ` Omar Elghoul
2026-03-23 15:01   ` Omar Elghoul
2026-03-23 15:52   ` Michael S. Tsirkin
2026-03-23 16:54     ` Omar Elghoul
2026-03-23 17:10       ` Michael S. Tsirkin
2026-03-23 16:58     ` Michael S. Tsirkin
2026-03-23 17:09       ` Omar Elghoul
2026-03-23 17:50         ` Vishwanath Seshagiri
2026-03-23 23:37           ` Michael S. Tsirkin
2026-03-24  0:34       ` Jason Wang
2026-03-24  8:20       ` Aithal, Srikanth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox