From: Pavel Begunkov <asml.silence@gmail.com>
To: netdev@vger.kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>, Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Donald Hunter <donald.hunter@gmail.com>,
Michael Chan <michael.chan@broadcom.com>,
Pavan Chebbi <pavan.chebbi@broadcom.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Joshua Washington <joshwash@google.com>,
Harshitha Ramamurthy <hramamurthy@google.com>,
Jian Shen <shenjian15@huawei.com>,
Salil Mehta <salil.mehta@huawei.com>,
Jijie Shao <shaojijie@huawei.com>,
Sunil Goutham <sgoutham@marvell.com>,
Geetha sowjanya <gakula@marvell.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
hariprasad <hkelam@marvell.com>,
Bharat Bhushan <bbhushan2@marvell.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Alexander Duyck <alexanderduyck@fb.com>,
kernel-team@meta.com,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Joe Damato <joe@dama.to>, David Wei <dw@davidwei.uk>,
Willem de Bruijn <willemb@google.com>,
Mina Almasry <almasrymina@google.com>,
Pavel Begunkov <asml.silence@gmail.com>,
Breno Leitao <leitao@debian.org>,
Dragos Tatulea <dtatulea@nvidia.com>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-rdma@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>
Subject: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
Date: Mon, 13 Oct 2025 15:54:02 +0100
Message-ID: <cover.1760364551.git.asml.silence@gmail.com>
Add support for per-queue rx buffer length configuration based on [2]
and basic infrastructure for using it in memory providers like
io_uring/zcrx. Note that this series only includes the net/ patches;
the zcrx patches are left out to be merged separately. Large rx buffers
can be beneficial with hw-gro enabled cards that can coalesce traffic,
which reduces the number of frags traversing the network stack and
results in larger contiguous chunks of data given to userspace.
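To make the frag-count argument concrete, here is a rough sketch of how
many rx buffers (frags) a single coalesced packet spans at a given
buffer size. The 64 KiB aggregate size and the helper name
frags_per_packet() are assumptions for illustration, not values taken
from the series.

```python
def frags_per_packet(payload_len: int, rx_buf_len: int) -> int:
    """Return how many rx buffers are needed to hold payload_len bytes."""
    if payload_len < 0 or rx_buf_len <= 0:
        raise ValueError("payload_len must be >= 0 and rx_buf_len must be > 0")
    # Ceiling division without floating point.
    return -(-payload_len // rx_buf_len)


# Assumed size of one hw-gro coalesced packet; real sizes depend on the NIC
# and the traffic.
GRO_AGGREGATE = 64 * 1024

if __name__ == "__main__":
    for buf_len in (4096, 32768):
        frags = frags_per_packet(GRO_AGGREGATE, buf_len)
        print(f"rx_buf_len={buf_len}: {frags} frags per {GRO_AGGREGATE} byte packet")
```

Under that assumption a coalesced packet needs 16 frags with 4K buffers
and 2 frags with 32K buffers.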
Benchmarks with zcrx [2+3] show up to ~30% improvement in CPU util.
E.g. comparison for 4K vs 32K buffers with a 200Gbit NIC, napi and
userspace pinned to the same CPU:
packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57
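As a rough cross-check of the ~30% figure, the sketch below sums the
busy columns of the two CPU samples above and compares them. Counting
%iowait as not busy, and reading the first sample as the 4K run and the
second as the 32K run, are assumptions made here; the original does not
say how the figure was derived.

```python
COLUMNS = ("usr", "nice", "sys", "iowait", "irq", "soft", "idle")
BUSY_COLUMNS = ("usr", "nice", "sys", "irq", "soft")

# Values copied from the two samples above, in order of appearance.
sample_first = dict(zip(COLUMNS, (1.53, 0.00, 27.78, 2.72, 1.31, 66.45, 0.22)))
sample_second = dict(zip(COLUMNS, (0.69, 0.00, 8.26, 31.65, 1.83, 57.00, 0.57)))


def busy_percent(sample: dict) -> float:
    """Sum the columns where the CPU is doing work (excludes iowait and idle)."""
    return sum(sample[col] for col in BUSY_COLUMNS)


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before'."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    first = busy_percent(sample_first)
    second = busy_percent(sample_second)
    print(f"busy: {first:.2f}% vs {second:.2f}%")
    print(f"relative reduction: {relative_reduction(first, second):.1f}%")
```

With these inputs the busy share drops from about 97.1% to about 67.8%,
i.e. roughly 30% lower, at nearly identical throughput.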
netdev + zcrx changes:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v4
Per queue configuration series:
[2] https://lore.kernel.org/all/20250421222827.283737-1-kuba@kernel.org/
Liburing example:
[3] https://github.com/isilence/liburing.git zcrx/rx-buf-len
---
The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787:
Linux 6.18-rc1 (2025-10-12 13:42:36 -0700)
are available in the Git repository at:
https://github.com/isilence/linux.git tags/net-for-6.19-queue-rx-buf-len
for you to fetch changes up to bc5737ba2a1e5586408cd0398b2db0f218ed3e89:
net: validate driver supports passed qcfg params (2025-10-13 10:04:05 +0100)
v4: - Update fbnic qops
    - Propagate max buf len for hns3
    - Use configured buf size in __bnxt_alloc_rx_netmem
    - Minor stylistic changes
v3: https://lore.kernel.org/all/cover.1755499375.git.asml.silence@gmail.com/
    - Rebased, excluded zcrx specific patches
    - Set agg_size_fac to 1 on warning
v2: https://lore.kernel.org/all/cover.1754657711.git.asml.silence@gmail.com/
    - Add MAX_PAGE_ORDER check on pp init
    - Applied comments rewording
    - Adjust pp.max_len based on order
    - Patch up mlx5 queue callbacks after rebase
    - Minor ->queue_mgmt_ops refactoring
    - Rebased to account for both fill level and agg_size_fac
    - Pass the provider's buf length in struct pp_memory_provider_params
      and apply it in __netdev_queue_config().
    - Use ->supported_ring_params to validate driver support of set
      qcfg parameters.
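Several changelog items above concern the page pool allocation order
(the MAX_PAGE_ORDER check on init and adjusting pp.max_len based on
order). As an illustration of the arithmetic only, the sketch below maps
an rx buffer length to a page order and rejects orders above a limit.
PAGE_SIZE = 4096, MAX_PAGE_ORDER = 10 and both function names are
assumptions for the example; this is a Python model, not the kernel code
from the series.

```python
PAGE_SIZE = 4096       # assumed; architecture dependent
MAX_PAGE_ORDER = 10    # assumed; depends on kernel configuration


def page_order(size: int) -> int:
    """Smallest order such that PAGE_SIZE << order >= size."""
    if size <= 0:
        raise ValueError("size must be positive")
    pages = -(-size // PAGE_SIZE)      # round up to whole pages
    return (pages - 1).bit_length()    # round up to a power of two


def pool_params(rx_buf_len: int) -> tuple:
    """Return (order, max_len) for a buffer size, or raise if too large."""
    order = page_order(rx_buf_len)
    if order > MAX_PAGE_ORDER:
        raise ValueError(f"order {order} exceeds MAX_PAGE_ORDER {MAX_PAGE_ORDER}")
    # max_len follows the order so a buffer can span the whole allocation.
    return order, PAGE_SIZE << order


if __name__ == "__main__":
    for buf_len in (4096, 32768):
        print(buf_len, pool_params(buf_len))
```

Under these assumptions a 4K buffer maps to order 0 and a 32K buffer to
order 3, while a request needing more than 4 MiB would be rejected.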
Jakub Kicinski (20):
docs: ethtool: document that rx_buf_len must control payload lengths
net: ethtool: report max value for rx-buf-len
net: use zero value to restore rx_buf_len to default
net: clarify the meaning of netdev_config members
net: add rx_buf_len to netdev config
eth: bnxt: read the page size from the adapter struct
eth: bnxt: set page pool page order based on rx_page_size
eth: bnxt: support setting size of agg buffers via ethtool
net: move netdev_config manipulation to dedicated helpers
net: reduce indent of struct netdev_queue_mgmt_ops members
net: allocate per-queue config structs and pass them thru the queue
API
net: pass extack to netdev_rx_queue_restart()
net: add queue config validation callback
eth: bnxt: always set the queue mgmt ops
eth: bnxt: store the rx buf size per queue
eth: bnxt: adjust the fill level of agg queues with larger buffers
netdev: add support for setting rx-buf-len per queue
net: wipe the setting of deactived queues
eth: bnxt: use queue op config validate
eth: bnxt: support per queue configuration of rx-buf-len
Pavel Begunkov (4):
net: page_pool: sanitise allocation order
net: hns3: net: use zero to restore rx_buf_len to default
net: let pp memory provider to specify rx buf len
net: validate driver supports passed qcfg params
Documentation/netlink/specs/ethtool.yaml | 4 +
Documentation/netlink/specs/netdev.yaml | 15 ++
Documentation/networking/ethtool-netlink.rst | 7 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 148 +++++++++++---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +-
.../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 9 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 +-
drivers/net/ethernet/google/gve/gve_main.c | 9 +-
.../ethernet/hisilicon/hns3/hns3_ethtool.c | 10 +-
.../marvell/octeontx2/nic/otx2_ethtool.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 10 +-
drivers/net/ethernet/meta/fbnic/fbnic_txrx.c | 8 +-
drivers/net/netdevsim/netdev.c | 8 +-
include/linux/ethtool.h | 3 +
include/net/netdev_queues.h | 88 +++++++--
include/net/netdev_rx_queue.h | 3 +-
include/net/netlink.h | 19 ++
include/net/page_pool/types.h | 1 +
.../uapi/linux/ethtool_netlink_generated.h | 1 +
include/uapi/linux/netdev.h | 2 +
net/core/Makefile | 1 +
net/core/dev.c | 12 +-
net/core/dev.h | 15 ++
net/core/netdev-genl-gen.c | 15 ++
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 92 +++++++++
net/core/netdev_config.c | 183 ++++++++++++++++++
net/core/netdev_rx_queue.c | 22 ++-
net/core/page_pool.c | 3 +
net/ethtool/common.c | 4 +-
net/ethtool/netlink.c | 14 +-
net/ethtool/rings.c | 14 +-
tools/include/uapi/linux/netdev.h | 2 +
34 files changed, 650 insertions(+), 92 deletions(-)
create mode 100644 net/core/netdev_config.c
--
2.49.0