From: Vishwanath Seshagiri <vishs@meta.com>
To: "Michael S . Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>
Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, "David Wei" <dw@davidwei.uk>,
	"Matteo Croce" <technoboy85@gmail.com>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	netdev@vger.kernel.org, virtualization@lists.linux.dev,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH net-next v5 0/2] virtio_net: add page_pool support
Date: Thu, 5 Feb 2026 16:27:13 -0800
Message-ID: <20260206002715.1885869-1-vishs@meta.com>

Introduce page_pool support in the virtio_net driver to enable page
recycling in RX buffer allocation and avoid repeated page allocator
calls. This applies to the mergeable and small buffer modes.

Beyond performance improvements, this patch is a prerequisite for
enabling memory provider-based zero-copy features in virtio_net,
specifically devmem TCP and io_uring ZCRX, which require drivers to
use page_pool for buffer management.

The implementation preserves the DMA premapping optimization introduced
in commit 31f3cd4e5756 ("virtio-net: rq submits premapped per-buffer")
by conditionally using PP_FLAG_DMA_MAP when the virtio backend supports
the standard DMA API (vhost, virtio-pci), and falling back to
allocation-only mode for backends with custom DMA mechanisms (VDUSE).
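
The flag selection described above can be modelled in user space
roughly as follows. This is a minimal sketch, not the driver code:
struct mock_vq, mock_dma_dev() and pick_pp_flags() are hypothetical
stand-ins (the driver itself queries virtqueue_dma_dev()), and the flag
value is defined locally for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel flag; the value is illustrative only. */
#define MOCK_PP_FLAG_DMA_MAP (1u << 0)

/*
 * Hypothetical model of a virtqueue: dma_dev is NULL when the backend
 * uses its own DMA mechanism (e.g. VDUSE).
 */
struct mock_vq {
	const void *dma_dev;
};

/* Stand-in for virtqueue_dma_dev(). */
static const void *mock_dma_dev(const struct mock_vq *vq)
{
	return vq->dma_dev;
}

/* Return the page_pool flags the receive queue would request. */
static unsigned int pick_pp_flags(const struct mock_vq *vq)
{
	assert(vq != NULL);

	if (mock_dma_dev(vq))
		return MOCK_PP_FLAG_DMA_MAP;	/* pool maps pages for DMA */

	return 0;				/* allocation-only mode */
}
```

In this model a vhost or virtio-pci queue would carry a non-NULL
dma_dev and get the DMA-mapping flag, while a VDUSE queue would get no
flags and keep handling DMA itself.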

================================================================================
                      VIRTIO-NET PAGE POOL BENCHMARK RESULTS
================================================================================

CONFIGURATION
-------------
- Host: pktgen TX -> tap interface -> vhost-net
- Guest: virtio-net RX -> XDP_DROP
- Packet sizes: 64 bytes (small buffers); 64 and 1500 bytes (mergeable
  receive buffers)

SMALL PACKETS (64 bytes)
==================================================
Queues | Base (pps)  | Page Pool (pps) | Improvement | Base (Gb/s) | PP (Gb/s)
-------|-------------|-----------------|-------------|-------------|-----------
  1Q   |    853,493  |      868,923    |   +1.8%     |    0.44     | 0.44
  2Q   |  1,655,793  |    1,696,707    |   +2.5%     |    0.85     | 0.87
  4Q   |  3,143,375  |    3,302,511    |   +5.1%     |    1.61     | 1.69
  8Q   |  6,082,590  |    6,156,894    |   +1.2%     |    3.11     | 3.15


RECEIVE MERGEABLE (64 bytes)
======================================================
Queues | Base (pps)  | Page Pool (pps) | Improvement | Base (Gb/s) | PP (Gb/s)
-------|-------------|-----------------|-------------|-------------|-----------
  1Q   |    766,168  |      814,493    |   +6.3%     |    0.39     | 0.42
  2Q   |  1,384,871  |    1,670,639    |  +20.6%     |    0.71     | 0.86
  4Q   |  2,773,081  |    3,080,574    |  +11.1%     |    1.42     | 1.58
  8Q   |  5,600,615  |    6,043,891    |   +7.9%     |    2.87     | 3.10


RECEIVE MERGEABLE (1500 bytes)
========================================================
Queues | Base (pps)  | Page Pool (pps) | Improvement | Base (Gb/s) | PP (Gb/s)
-------|-------------|-----------------|-------------|-------------|-----------
  1Q   |    741,579  |      785,442    |   +5.9%     |    8.90     | 9.43
  2Q   |  1,310,043  |    1,534,554    |  +17.1%     |   15.72     | 18.41
  4Q   |  2,748,700  |    2,890,582    |   +5.2%     |   32.98     | 34.69
  8Q   |  5,348,589  |    5,618,664    |   +5.0%     |   64.18     | 67.42

The page_pool implementation shows a performance improvement in every
configuration tested, by eliminating the per-packet overhead of
allocating and freeing memory. When running the benchmarks, I also
noticed that throughput with page_pool was more stable from run to run
than the baseline, where the variability came from accessing the
free_list to get the next set of pages.
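
For readers who want to re-derive the table columns, the arithmetic can
be sketched as below. The helper names are hypothetical, and the Gb/s
formula assumes that only the quoted packet bytes are counted (no
framing overhead), which is what the figures in the tables appear to
be consistent with.

```c
#include <assert.h>

/* Percentage change of the page_pool run relative to the base run. */
static double improvement_pct(double base_pps, double pp_pps)
{
	assert(base_pps > 0.0);
	return (pp_pps - base_pps) / base_pps * 100.0;
}

/*
 * Throughput in Gb/s for a given packet rate and packet size.
 * Assumption: only the packet bytes themselves are counted.
 */
static double pps_to_gbps(double pps, unsigned int pkt_bytes)
{
	return pps * pkt_bytes * 8.0 / 1e9;
}
```

For the 1Q small-packet row, improvement_pct(853493, 868923) comes to
roughly 1.8 and pps_to_gbps(853493, 64) to roughly 0.44, in line with
the table.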


Changes in v5
=============

Addressing reviewer feedback from v4:

- Add page_pool_frag_offset_add() helper to page_pool API to advance
  fragment offset when drivers extend buffers to consume unused page
  space (hole optimization). (Michael S. Tsirkin)
- Unify big_packets condition checks and add an explanatory comment
  (Michael S. Tsirkin)
- Add page_pool_dma_sync_for_cpu() calls in receive paths before
  reading buffer data when using PP_FLAG_DMA_MAP
  (Michael S. Tsirkin)
- Remove virtnet_rq_unmap() and free_receive_page_frags() entirely,
  replacing with page_pool lifecycle management (Jason Wang)
- Drop the selftests patch from the series
- v4 link: https://lore.kernel.org/virtualization/20260204193617.1200752-1-vishs@meta.com/ 
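
The hole optimization mentioned in the first item above can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions: struct mock_frag_pool, mock_frag_offset_add() and
mock_alloc_buf() are hypothetical names, and the real
page_pool_frag_offset_add() signature is defined in patch 1/2, not
here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a page being carved into fragments. */
struct mock_frag_pool {
	size_t page_size;	/* total bytes in the page */
	size_t offset;		/* next free byte within the page */
};

/* Model of advancing the fragment offset by extra consumed bytes. */
static void mock_frag_offset_add(struct mock_frag_pool *p, size_t bytes)
{
	assert(p->offset + bytes <= p->page_size);
	p->offset += bytes;
}

/*
 * Carve buf_len bytes from the page. If the space left afterwards (the
 * "hole") is too small for another buffer of the same size, extend this
 * buffer to cover it. Returns the usable buffer length.
 */
static size_t mock_alloc_buf(struct mock_frag_pool *p, size_t buf_len)
{
	size_t hole;

	assert(p->offset + buf_len <= p->page_size);
	p->offset += buf_len;

	hole = p->page_size - p->offset;
	if (hole < buf_len) {
		mock_frag_offset_add(p, hole);
		buf_len += hole;
	}
	return buf_len;
}
```

With a 4096-byte page and 1536-byte buffers, the first buffer in this
model keeps its requested size, while the second absorbs the trailing
1024 bytes that could not hold a third buffer.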

Changes in v4
=============

Addressing reviewer feedback from v3:

- Remove unnecessary !rq->page_pool check in page_to_skb()
- Reorder put_xdp_frags() parameters
- Remove unnecessary xdp_page = NULL initialization in receive_small_xdp()
- Move big_packets mode check outside the loop in virtnet_create_page_pools()
  for efficiency
- Remove unrelated whitespace changes
- v3 link: https://lore.kernel.org/virtualization/20260203231021.1331392-1-vishs@meta.com/

Changes in v3
=============

Addressing reviewer feedback from v2:

- Fix CI null-ptr-deref crash: use max_queue_pairs instead of
  curr_queue_pairs in virtnet_create_page_pools() to ensure page pools
  are created for all queues (Jason Wang, Jakub Kicinski)
- Preserve big_packets mode page->private chaining in page_to_skb() with
  conditional checks (Jason Wang)
- Use page_pool_alloc_pages() in xdp_linearize_page() and
  mergeable_xdp_get_buf() to eliminate xdp_page tracking logic and simplify
  skb_mark_for_recycle() calls (Jason Wang)
- Add page_pool_page_is_pp() check in virtnet_put_page() to safely
  handle both page_pool and non-page_pool pages (Michael S. Tsirkin)
- Remove unrelated rx_mode_work_enabled changes (Jason Wang)
- Selftest: use random ephemeral port instead of hardcoded port to avoid
  conflicts when running tests in parallel (Michael S. Tsirkin)
- v2 link: https://lore.kernel.org/virtualization/20260128212031.1431746-1-vishs@meta.com/
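
The max_queue_pairs fix in the first item above can be pictured with a
toy model. Everything here is hypothetical (struct mock_dev, the
has_pool array, both helpers); it only shows why pools are set up for
every queue the device can expose rather than for the queues active at
creation time.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_QP 8

/* Hypothetical stand-in for the per-device queue state. */
struct mock_dev {
	unsigned int max_queue_pairs;	/* queues the device can expose */
	unsigned int curr_queue_pairs;	/* queues currently in use */
	bool has_pool[MOCK_MAX_QP];	/* models rq[i] having a page pool */
};

/* Create a pool for every possible queue, not just the active ones. */
static void mock_create_page_pools(struct mock_dev *dev)
{
	unsigned int i;

	assert(dev->max_queue_pairs <= MOCK_MAX_QP);
	for (i = 0; i < dev->max_queue_pairs; i++)
		dev->has_pool[i] = true;
}

/* Check that every active queue has a pool to allocate from. */
static bool mock_all_active_have_pools(const struct mock_dev *dev)
{
	unsigned int i;

	for (i = 0; i < dev->curr_queue_pairs; i++)
		if (!dev->has_pool[i])
			return false;
	return true;
}
```

If the loop stopped at curr_queue_pairs instead, raising the active
queue count later would leave the newly enabled queues without a pool
in this model.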

Changes in v2
=============

Addressing reviewer feedback from v1:

- Add "select PAGE_POOL" to Kconfig (Jason Wang)
- Move page pool creation from ndo_open to probe for device lifetime
  management (Xuan Zhuo, Jason Wang)
- Implement conditional DMA strategy using virtqueue_dma_dev():
  - When non-NULL: use PP_FLAG_DMA_MAP for page_pool-managed DMA
    premapping
  - When NULL (VDUSE): page_pool handles allocation only
- Use page_pool_get_dma_addr() + virtqueue_add_inbuf_premapped() to
  preserve DMA premapping optimization from commit 31f3cd4e5756
  ("virtio-net: rq submits premapped per-buffer") (Jason Wang)
- Remove dual allocation code paths - page_pool now always used for
  small/mergeable modes (Jason Wang)
- Remove unused virtnet_rq_alloc/virtnet_rq_init_one_sg functions
- Add comprehensive performance data (Michael S. Tsirkin)
- v1 link:
  https://lore.kernel.org/virtualization/20260106221924.123856-1-vishs@meta.com/

Vishwanath Seshagiri (2):
  page_pool: add page_pool_frag_offset_add() helper
  virtio_net: add page_pool support for buffer allocation

 drivers/net/Kconfig             |   1 +
 drivers/net/virtio_net.c        | 430 ++++++++++++++++----------------
 include/net/page_pool/helpers.h |  20 ++
 3 files changed, 241 insertions(+), 210 deletions(-)

-- 
2.47.3


Thread overview: 5+ messages
2026-02-06  0:27 Vishwanath Seshagiri [this message]
2026-02-06  0:27 ` [PATCH net-next v5 1/2] page_pool: add page_pool_frag_offset_add() helper Vishwanath Seshagiri
2026-02-06  0:27 ` [PATCH net-next v5 2/2] virtio_net: add page_pool support for buffer allocation Vishwanath Seshagiri
2026-02-07  4:41   ` Jakub Kicinski
2026-02-07  5:25     ` Vishwanath Seshagiri
