* [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
@ 2025-10-13 14:54 Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 01/24] net: page_pool: sanitise allocation order Pavel Begunkov
                   ` (25 more replies)
  0 siblings, 26 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
Add support for per-queue rx buffer length configuration based on [2]
and basic infrastructure for using it in memory providers like
io_uring/zcrx. Note that this series only includes net/ patches and
leaves zcrx to be merged separately. Large rx buffers can be beneficial
with hw-gro enabled cards that can coalesce traffic, which reduces the
number of frags traversing the network stack and results in larger
contiguous chunks of data handed to userspace.
Benchmarks with zcrx [2+3] show up to ~30% improvement in CPU util.
E.g. comparison for 4K vs 32K buffers with a 200Gbit NIC, napi and
userspace pinned to the same CPU:
packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57
netdev + zcrx changes:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v4
Per queue configuration series:
[2] https://lore.kernel.org/all/20250421222827.283737-1-kuba@kernel.org/
Liburing example:
[3] https://github.com/isilence/liburing.git zcrx/rx-buf-len
---
The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787:
  Linux 6.18-rc1 (2025-10-12 13:42:36 -0700)
are available in the Git repository at:
  https://github.com/isilence/linux.git tags/net-for-6.19-queue-rx-buf-len
for you to fetch changes up to bc5737ba2a1e5586408cd0398b2db0f218ed3e89:
  net: validate driver supports passed qcfg params (2025-10-13 10:04:05 +0100)
v4: - Update fbnic qops
    - Propagate max buf len for hns3
    - Use configured buf size in __bnxt_alloc_rx_netmem
    - Minor stylistic changes
v3: https://lore.kernel.org/all/cover.1755499375.git.asml.silence@gmail.com/
    - Rebased, excluded zcrx specific patches
    - Set agg_size_fac to 1 on warning
v2: https://lore.kernel.org/all/cover.1754657711.git.asml.silence@gmail.com/
    - Add MAX_PAGE_ORDER check on pp init
    - Applied comments rewording
    - Adjust pp.max_len based on order
    - Patch up mlx5 queue callbacks after rebase
    - Minor ->queue_mgmt_ops refactoring
    - Rebased to account for both fill level and agg_size_fac
    - Pass the provider's buf length in struct pp_memory_provider_params
      and apply it in __netdev_queue_config().
    - Use ->supported_ring_params to validate driver support of the
      qcfg parameters being set.
Jakub Kicinski (20):
  docs: ethtool: document that rx_buf_len must control payload lengths
  net: ethtool: report max value for rx-buf-len
  net: use zero value to restore rx_buf_len to default
  net: clarify the meaning of netdev_config members
  net: add rx_buf_len to netdev config
  eth: bnxt: read the page size from the adapter struct
  eth: bnxt: set page pool page order based on rx_page_size
  eth: bnxt: support setting size of agg buffers via ethtool
  net: move netdev_config manipulation to dedicated helpers
  net: reduce indent of struct netdev_queue_mgmt_ops members
  net: allocate per-queue config structs and pass them thru the queue
    API
  net: pass extack to netdev_rx_queue_restart()
  net: add queue config validation callback
  eth: bnxt: always set the queue mgmt ops
  eth: bnxt: store the rx buf size per queue
  eth: bnxt: adjust the fill level of agg queues with larger buffers
  netdev: add support for setting rx-buf-len per queue
  net: wipe the setting of deactivated queues
  eth: bnxt: use queue op config validate
  eth: bnxt: support per queue configuration of rx-buf-len
Pavel Begunkov (4):
  net: page_pool: sanitise allocation order
  net: hns3: net: use zero to restore rx_buf_len to default
  net: let pp memory provider to specify rx buf len
  net: validate driver supports passed qcfg params
 Documentation/netlink/specs/ethtool.yaml      |   4 +
 Documentation/netlink/specs/netdev.yaml       |  15 ++
 Documentation/networking/ethtool-netlink.rst  |   7 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 148 +++++++++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   5 +-
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   9 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +-
 drivers/net/ethernet/google/gve/gve_main.c    |   9 +-
 .../ethernet/hisilicon/hns3/hns3_ethtool.c    |  10 +-
 .../marvell/octeontx2/nic/otx2_ethtool.c      |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  10 +-
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c  |   8 +-
 drivers/net/netdevsim/netdev.c                |   8 +-
 include/linux/ethtool.h                       |   3 +
 include/net/netdev_queues.h                   |  88 +++++++--
 include/net/netdev_rx_queue.h                 |   3 +-
 include/net/netlink.h                         |  19 ++
 include/net/page_pool/types.h                 |   1 +
 .../uapi/linux/ethtool_netlink_generated.h    |   1 +
 include/uapi/linux/netdev.h                   |   2 +
 net/core/Makefile                             |   1 +
 net/core/dev.c                                |  12 +-
 net/core/dev.h                                |  15 ++
 net/core/netdev-genl-gen.c                    |  15 ++
 net/core/netdev-genl-gen.h                    |   1 +
 net/core/netdev-genl.c                        |  92 +++++++++
 net/core/netdev_config.c                      | 183 ++++++++++++++++++
 net/core/netdev_rx_queue.c                    |  22 ++-
 net/core/page_pool.c                          |   3 +
 net/ethtool/common.c                          |   4 +-
 net/ethtool/netlink.c                         |  14 +-
 net/ethtool/rings.c                           |  14 +-
 tools/include/uapi/linux/netdev.h             |   2 +
 34 files changed, 650 insertions(+), 92 deletions(-)
 create mode 100644 net/core/netdev_config.c
-- 
2.49.0
^ permalink raw reply	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 01/24] net: page_pool: sanitise allocation order
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 02/24] docs: ethtool: document that rx_buf_len must control payload lengths Pavel Begunkov
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
We're going to give more control over rx buffer sizes to user space, and
since we can't always rely on driver validation, let's sanitise it in
page_pool_init() as well. Note that we only need to reject over
MAX_PAGE_ORDER allocations for normal page pools, as current memory
providers don't need to use the buddy allocator and must check the order
on init.
Suggested-by: Stanislav Fomichev <stfomichev@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
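[Editor's note: a minimal userspace sketch of the check added below.
MAX_PAGE_ORDER comes from the kernel config; the value of 10 and the
function name here are assumptions for illustration only.]

```c
#include <assert.h>
#include <errno.h>

/* Assumed value for illustration; the real constant is config-dependent. */
#define MAX_PAGE_ORDER 10

/* Return 0 if a plain page pool may use this allocation order,
 * -EINVAL otherwise. Pools backed by a memory provider skip the
 * check: providers don't use the buddy allocator and validate the
 * order themselves at init. */
static int pp_sanitise_order(unsigned int order, int has_mem_provider)
{
	if (!has_mem_provider && order > MAX_PAGE_ORDER)
		return -EINVAL;
	return 0;
}
```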
 net/core/page_pool.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a5edec485f1..635c77e8050b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -301,6 +301,9 @@ static int page_pool_init(struct page_pool *pool,
 		}
 
 		static_branch_inc(&page_pool_mem_providers);
+	} else if (pool->p.order > MAX_PAGE_ORDER) {
+		err = -EINVAL;
+		goto free_ptr_ring;
 	}
 
 	return 0;
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 02/24] docs: ethtool: document that rx_buf_len must control payload lengths
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 01/24] net: page_pool: sanitise allocation order Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 03/24] net: ethtool: report max value for rx-buf-len Pavel Begunkov
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Document the semantics of the rx_buf_len ethtool ring param.
Clarify its meaning in the case of HDS, where the driver may have
two separate buffer pools.
The various zero-copy TCP Rx schemes suffer from memory management
overhead. Specifically applications aren't too impressed with the
number of 4kB buffers they have to juggle. Zero-copy TCP makes most
sense with larger memory transfers so using 16kB or 32kB buffers
(with the help of HW-GRO) feels more natural.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 Documentation/networking/ethtool-netlink.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index b270886c5f5d..392a359a9cab 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -966,7 +966,6 @@ Kernel checks that requested ring sizes do not exceed limits reported by
 driver. Driver may impose additional constraints and may not support all
 attributes.
 
-
 ``ETHTOOL_A_RINGS_CQE_SIZE`` specifies the completion queue event size.
 Completion queue events (CQE) are the events posted by NIC to indicate the
 completion status of a packet when the packet is sent (like send success or
@@ -980,6 +979,11 @@ completion queue size can be adjusted in the driver if CQE size is modified.
 header / data split feature. If a received packet size is larger than this
 threshold value, header and data will be split.
 
+``ETHTOOL_A_RINGS_RX_BUF_LEN`` controls the size of the buffers driver
+uses to receive packets. If the device uses different buffer pools for
+headers and payload (due to HDS, HW-GRO etc.) this setting must
+control the size of the payload buffers.
+
 CHANNELS_GET
 ============
 
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 03/24] net: ethtool: report max value for rx-buf-len
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 01/24] net: page_pool: sanitise allocation order Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 02/24] docs: ethtool: document that rx_buf_len must control payload lengths Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 04/24] net: use zero value to restore rx_buf_len to default Pavel Begunkov
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Unlike most of our APIs, the rx-buf-len param does not have an associated
max value. In theory the user could set this value arbitrarily high, but
in practice most NICs have limits due to the width of the length fields
in their descriptors.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
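[Editor's note: a sketch of the validation ethnl_set_rings() gains in
this patch, mirroring the `rx_buf_len > rx_buf_len_max` condition from
the diff below. The function name is illustrative, not the kernel's.]

```c
#include <assert.h>

/* Accept a requested rx-buf-len only if it does not exceed the max
 * the driver reported via kernel_ethtool_ringparam.rx_buf_len_max. */
static int rx_buf_len_in_range(unsigned int rx_buf_len,
			       unsigned int rx_buf_len_max)
{
	/* mirrors: kernel_ringparam.rx_buf_len > kernel_ringparam.rx_buf_len_max */
	return rx_buf_len <= rx_buf_len_max;
}
```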
 Documentation/netlink/specs/ethtool.yaml                  | 4 ++++
 Documentation/networking/ethtool-netlink.rst              | 1 +
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 3 ++-
 include/linux/ethtool.h                                   | 2 ++
 include/uapi/linux/ethtool_netlink_generated.h            | 1 +
 net/ethtool/rings.c                                       | 5 +++++
 6 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 6a0fb1974513..68e2b63ba970 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -452,6 +452,9 @@ attribute-sets:
       -
         name: hds-thresh-max
         type: u32
+      -
+        name: rx-buf-len-max
+        type: u32
 
   -
     name: mm-stat
@@ -2078,6 +2081,7 @@ operations:
             - rx-jumbo
             - tx
             - rx-buf-len
+            - rx-buf-len-max
             - tcp-data-split
             - cqe-size
             - tx-push
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 392a359a9cab..d96a6292f37b 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -902,6 +902,7 @@ Kernel response contents:
   ``ETHTOOL_A_RINGS_RX_JUMBO``              u32     size of RX jumbo ring
   ``ETHTOOL_A_RINGS_TX``                    u32     size of TX ring
   ``ETHTOOL_A_RINGS_RX_BUF_LEN``            u32     size of buffers on the ring
+  ``ETHTOOL_A_RINGS_RX_BUF_LEN_MAX``        u32     max size of rx buffers
   ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT``        u8      TCP header / data split
   ``ETHTOOL_A_RINGS_CQE_SIZE``              u32     Size of TX/RX CQE
   ``ETHTOOL_A_RINGS_TX_PUSH``               u8      flag of TX Push mode
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index b90e23dc49de..19bcf52330d4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -377,6 +377,7 @@ static void otx2_get_ringparam(struct net_device *netdev,
 	ring->tx_max_pending = Q_COUNT(Q_SIZE_MAX);
 	ring->tx_pending = qs->sqe_cnt ? qs->sqe_cnt : Q_COUNT(Q_SIZE_4K);
 	kernel_ring->rx_buf_len = pfvf->hw.rbuf_len;
+	kernel_ring->rx_buf_len_max = 32768;
 	kernel_ring->cqe_size = pfvf->hw.xqe_size;
 }
 
@@ -399,7 +400,7 @@ static int otx2_set_ringparam(struct net_device *netdev,
 	/* Hardware supports max size of 32k for a receive buffer
 	 * and 1536 is typical ethernet frame size.
 	 */
-	if (rx_buf_len && (rx_buf_len < 1536 || rx_buf_len > 32768)) {
+	if (rx_buf_len && rx_buf_len < 1536) {
 		netdev_err(netdev,
 			   "Receive buffer range is 1536 - 32768");
 		return -EINVAL;
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index c2d8b4ec62eb..26ef5ffdc435 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -77,6 +77,7 @@ enum {
 /**
  * struct kernel_ethtool_ringparam - RX/TX ring configuration
  * @rx_buf_len: Current length of buffers on the rx ring.
+ * @rx_buf_len_max: Max length of buffers on the rx ring.
  * @tcp_data_split: Scatter packet headers and data to separate buffers
  * @tx_push: The flag of tx push mode
  * @rx_push: The flag of rx push mode
@@ -89,6 +90,7 @@ enum {
  */
 struct kernel_ethtool_ringparam {
 	u32	rx_buf_len;
+	u32	rx_buf_len_max;
 	u8	tcp_data_split;
 	u8	tx_push;
 	u8	rx_push;
diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h
index 0e8ac0d974e2..ae59d17bd7f2 100644
--- a/include/uapi/linux/ethtool_netlink_generated.h
+++ b/include/uapi/linux/ethtool_netlink_generated.h
@@ -192,6 +192,7 @@ enum {
 	ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX,
 	ETHTOOL_A_RINGS_HDS_THRESH,
 	ETHTOOL_A_RINGS_HDS_THRESH_MAX,
+	ETHTOOL_A_RINGS_RX_BUF_LEN_MAX,
 
 	__ETHTOOL_A_RINGS_CNT,
 	ETHTOOL_A_RINGS_MAX = (__ETHTOOL_A_RINGS_CNT - 1)
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index aeedd5ec6b8c..5e872ceab5dd 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -105,6 +105,9 @@ static int rings_fill_reply(struct sk_buff *skb,
 			  ringparam->tx_pending)))  ||
 	    (kr->rx_buf_len &&
 	     (nla_put_u32(skb, ETHTOOL_A_RINGS_RX_BUF_LEN, kr->rx_buf_len))) ||
+	    (kr->rx_buf_len_max &&
+	     (nla_put_u32(skb, ETHTOOL_A_RINGS_RX_BUF_LEN_MAX,
+			  kr->rx_buf_len_max))) ||
 	    (kr->tcp_data_split &&
 	     (nla_put_u8(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT,
 			 kr->tcp_data_split))) ||
@@ -281,6 +284,8 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
 		err_attr = tb[ETHTOOL_A_RINGS_TX];
 	else if (kernel_ringparam.hds_thresh > kernel_ringparam.hds_thresh_max)
 		err_attr = tb[ETHTOOL_A_RINGS_HDS_THRESH];
+	else if (kernel_ringparam.rx_buf_len > kernel_ringparam.rx_buf_len_max)
+		err_attr = tb[ETHTOOL_A_RINGS_RX_BUF_LEN];
 	else
 		err_attr = NULL;
 	if (err_attr) {
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 04/24] net: use zero value to restore rx_buf_len to default
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (2 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 03/24] net: ethtool: report max value for rx-buf-len Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 05/24] net: hns3: net: use zero " Pavel Begunkov
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Distinguish between rx_buf_len being driver default vs user config.
Use 0 as a special value meaning "unset" or "restore driver default".
This will be necessary later on to configure it per-queue, but
the ability to restore defaults may be useful in itself.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
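[Editor's note: a sketch of the "0 restores driver default" convention
this patch introduces: the core keeps the raw user value, and the driver
substitutes its own default when it sees 0, as otx2_set_ringparam() now
does with OTX2_DEFAULT_RBUF_LEN. The default of 2048 and the function
name are assumptions for illustration.]

```c
#include <assert.h>

/* Illustrative driver default; real drivers pick their own. */
#define DRV_DEFAULT_RX_BUF_LEN 2048

/* Resolve the effective buffer length: 0 means "unset", so fall back
 * to the driver default; any other value is an explicit user choice. */
static unsigned int resolve_rx_buf_len(unsigned int user_val)
{
	return user_val ? user_val : DRV_DEFAULT_RX_BUF_LEN;
}
```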
 Documentation/networking/ethtool-netlink.rst              | 2 +-
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c | 3 +++
 include/linux/ethtool.h                                   | 1 +
 net/ethtool/rings.c                                       | 2 +-
 4 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index d96a6292f37b..41d4d81a86d1 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -983,7 +983,7 @@ threshold value, header and data will be split.
 ``ETHTOOL_A_RINGS_RX_BUF_LEN`` controls the size of the buffers driver
 uses to receive packets. If the device uses different buffer pools for
 headers and payload (due to HDS, HW-GRO etc.) this setting must
-control the size of the payload buffers.
+control the size of the payload buffers. Setting to 0 restores driver default.
 
 CHANNELS_GET
 ============
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index 19bcf52330d4..ada6244445da 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -397,6 +397,9 @@ static int otx2_set_ringparam(struct net_device *netdev,
 	if (ring->rx_mini_pending || ring->rx_jumbo_pending)
 		return -EINVAL;
 
+	if (!rx_buf_len)
+		rx_buf_len = OTX2_DEFAULT_RBUF_LEN;
+
 	/* Hardware supports max size of 32k for a receive buffer
 	 * and 1536 is typical ethernet frame size.
 	 */
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 26ef5ffdc435..0e6023df3ee9 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -77,6 +77,7 @@ enum {
 /**
  * struct kernel_ethtool_ringparam - RX/TX ring configuration
  * @rx_buf_len: Current length of buffers on the rx ring.
+ *		Setting to 0 means reset to driver default.
  * @rx_buf_len_max: Max length of buffers on the rx ring.
  * @tcp_data_split: Scatter packet headers and data to separate buffers
  * @tx_push: The flag of tx push mode
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index 5e872ceab5dd..628546a1827b 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -139,7 +139,7 @@ const struct nla_policy ethnl_rings_set_policy[] = {
 	[ETHTOOL_A_RINGS_RX_MINI]		= { .type = NLA_U32 },
 	[ETHTOOL_A_RINGS_RX_JUMBO]		= { .type = NLA_U32 },
 	[ETHTOOL_A_RINGS_TX]			= { .type = NLA_U32 },
-	[ETHTOOL_A_RINGS_RX_BUF_LEN]            = NLA_POLICY_MIN(NLA_U32, 1),
+	[ETHTOOL_A_RINGS_RX_BUF_LEN]            = { .type = NLA_U32 },
 	[ETHTOOL_A_RINGS_TCP_DATA_SPLIT]	=
 		NLA_POLICY_MAX(NLA_U8, ETHTOOL_TCP_DATA_SPLIT_ENABLED),
 	[ETHTOOL_A_RINGS_CQE_SIZE]		= NLA_POLICY_MIN(NLA_U32, 1),
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 05/24] net: hns3: net: use zero to restore rx_buf_len to default
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (3 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 04/24] net: use zero value to restore rx_buf_len to default Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members Pavel Begunkov
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
As in the previous commit, restore the default rx_buf_len value if the
user passes 0. Also initialise rx_buf_len_max.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index a5eefa28454c..3d3acc2b9402 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -10,6 +10,9 @@
 #include "hns3_enet.h"
 #include "hns3_ethtool.h"
 
+#define RX_BUF_LEN_2K 2048
+#define RX_BUF_LEN_4K 4096
+
 /* tqp related stats */
 #define HNS3_TQP_STAT(_string, _member)	{			\
 	.stats_string = _string,				\
@@ -684,6 +687,7 @@ static void hns3_get_ringparam(struct net_device *netdev,
 	param->tx_pending = priv->ring[0].desc_num;
 	param->rx_pending = priv->ring[rx_queue_index].desc_num;
 	kernel_param->rx_buf_len = priv->ring[rx_queue_index].buf_size;
+	kernel_param->rx_buf_len_max = RX_BUF_LEN_4K;
 	kernel_param->tx_push = test_bit(HNS3_NIC_STATE_TX_PUSH_ENABLE,
 					 &priv->state);
 }
@@ -1113,9 +1117,6 @@ static int hns3_check_ringparam(struct net_device *ndev,
 				struct ethtool_ringparam *param,
 				struct kernel_ethtool_ringparam *kernel_param)
 {
-#define RX_BUF_LEN_2K 2048
-#define RX_BUF_LEN_4K 4096
-
 	struct hns3_nic_priv *priv = netdev_priv(ndev);
 
 	if (hns3_nic_resetting(ndev) || !priv->ring) {
@@ -1127,6 +1128,9 @@ static int hns3_check_ringparam(struct net_device *ndev,
 	if (param->rx_mini_pending || param->rx_jumbo_pending)
 		return -EINVAL;
 
+	if (!kernel_param->rx_buf_len)
+		kernel_param->rx_buf_len = RX_BUF_LEN_2K;
+
 	if (kernel_param->rx_buf_len != RX_BUF_LEN_2K &&
 	    kernel_param->rx_buf_len != RX_BUF_LEN_4K) {
 		netdev_err(ndev, "Rx buf len only support 2048 and 4096\n");
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (4 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 05/24] net: hns3: net: use zero " Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 17:12   ` Randy Dunlap
  2025-10-13 14:54 ` [PATCH net-next v4 07/24] net: add rx_buf_len to netdev config Pavel Begunkov
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
hds_thresh and hds_config are both inside struct netdev_config
but have quite different semantics. hds_config is the user config
with ternary semantics (on/off/unset). hds_thresh is a straight
up value, populated by the driver at init and only modified by
user space. We don't expect the drivers to have to pick a special
hds_thresh value based on other configuration.
The two approaches have different advantages and downsides.
hds_thresh ("direct value") gives core easy access to current
device settings, but there's no way to express whether the value
comes from the user. It also requires initialization by the driver.
hds_config ("user config values") tells us what user wanted, but
doesn't give us the current value in the core.
Try to explain this a bit in the comments, so that we make a conscious
choice about which semantics we expect for new values.
Move the init inside ethtool_ringparam_get_cfg() to reflect the semantics.
Commit 216a61d33c07 ("net: ethtool: fix ethtool_ringparam_get_cfg()
returns a hds_thresh value always as 0.") added the setting for the
benefit of netdevsim which doesn't touch the value at all on get.
Again, this is just to clarify the intention and shouldn't cause any
functional change.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[pavel: applied clarification on relationship b/w HDS thresh and config]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
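[Editor's note: a sketch contrasting the two member semantics the commit
message describes. hds_config is a "user config value" with
on/off/unset states, so the driver decides only while it is unset;
hds_thresh is a "direct value" the driver seeds once at init. The enum
mirrors the ETHTOOL_TCP_DATA_SPLIT_* ordering but is an assumption
here, as is the default-on choice.]

```c
#include <assert.h>

/* Ternary user config: 0 = unset, driver decides. */
enum { HDS_UNSET, HDS_DISABLED, HDS_ENABLED };

/* Effective HDS state: obey the user when the value is set, otherwise
 * the driver is free to pick (this sketch defaults to enabled). */
static int hds_effective(unsigned char hds_config)
{
	if (hds_config != HDS_UNSET)
		return hds_config == HDS_ENABLED;
	return 1; /* driver's own choice while unset */
}
```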
 include/net/netdev_queues.h | 20 ++++++++++++++++++--
 net/ethtool/common.c        |  3 ++-
 2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index cd00e0406cf4..9d5dde36c2e5 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -6,11 +6,27 @@
 
 /**
  * struct netdev_config - queue-related configuration for a netdev
- * @hds_thresh:		HDS Threshold value.
- * @hds_config:		HDS value from userspace.
  */
 struct netdev_config {
+	/* Direct value
+	 *
+	 * Driver default is expected to be fixed, and set in this struct
+	 * at init. From that point on user may change the value. There is
+	 * no explicit way to "unset" / restore driver default. Used only
+	 * when @hds_config is set.
+	 */
+	/** @hds_thresh: HDS Threshold value (ETHTOOL_A_RINGS_HDS_THRESH).
+	 */
 	u32	hds_thresh;
+
+	/* User config values
+	 *
+	 * Contain user configuration. If "set" driver must obey.
+	 * If "unset" driver is free to decide, and may change its choice
+	 * as other parameters change.
+	 */
+	/** @hds_config: HDS enabled (ETHTOOL_A_RINGS_TCP_DATA_SPLIT).
+	 */
 	u8	hds_config;
 };
 
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 55223ebc2a7e..eeb257d9ab48 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -902,12 +902,13 @@ void ethtool_ringparam_get_cfg(struct net_device *dev,
 	memset(param, 0, sizeof(*param));
 	memset(kparam, 0, sizeof(*kparam));
 
+	kparam->hds_thresh = dev->cfg->hds_thresh;
+
 	param->cmd = ETHTOOL_GRINGPARAM;
 	dev->ethtool_ops->get_ringparam(dev, param, kparam, extack);
 
 	/* Driver gives us current state, we want to return current config */
 	kparam->tcp_data_split = dev->cfg->hds_config;
-	kparam->hds_thresh = dev->cfg->hds_thresh;
 }
 
 static void ethtool_init_tsinfo(struct kernel_ethtool_ts_info *info)
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 07/24] net: add rx_buf_len to netdev config
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (5 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 08/24] eth: bnxt: read the page size from the adapter struct Pavel Begunkov
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Add rx_buf_len to the configuration maintained by the core.
Use "three-state" semantics, where 0 means "driver default".
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/netdev_queues.h | 4 ++++
 net/ethtool/common.c        | 1 +
 net/ethtool/rings.c         | 2 ++
 3 files changed, 7 insertions(+)
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 9d5dde36c2e5..31559f2711de 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -25,6 +25,10 @@ struct netdev_config {
 	 * If "unset" driver is free to decide, and may change its choice
 	 * as other parameters change.
 	 */
+	/** @rx_buf_len: Size of buffers on the Rx ring
+	 *		 (ETHTOOL_A_RINGS_RX_BUF_LEN).
+	 */
+	u32	rx_buf_len;
 	/** @hds_config: HDS enabled (ETHTOOL_A_RINGS_TCP_DATA_SPLIT).
 	 */
 	u8	hds_config;
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index eeb257d9ab48..2f05359d9782 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -909,6 +909,7 @@ void ethtool_ringparam_get_cfg(struct net_device *dev,
 
 	/* Driver gives us current state, we want to return current config */
 	kparam->tcp_data_split = dev->cfg->hds_config;
+	kparam->rx_buf_len = dev->cfg->rx_buf_len;
 }
 
 static void ethtool_init_tsinfo(struct kernel_ethtool_ts_info *info)
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index 628546a1827b..6a74e7e4064e 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -41,6 +41,7 @@ static int rings_prepare_data(const struct ethnl_req_info *req_base,
 		return ret;
 
 	data->kernel_ringparam.tcp_data_split = dev->cfg->hds_config;
+	data->kernel_ringparam.rx_buf_len = dev->cfg->rx_buf_len;
 	data->kernel_ringparam.hds_thresh = dev->cfg->hds_thresh;
 
 	dev->ethtool_ops->get_ringparam(dev, &data->ringparam,
@@ -302,6 +303,7 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
 		return -EINVAL;
 	}
 
+	dev->cfg_pending->rx_buf_len = kernel_ringparam.rx_buf_len;
 	dev->cfg_pending->hds_config = kernel_ringparam.tcp_data_split;
 	dev->cfg_pending->hds_thresh = kernel_ringparam.hds_thresh;
 
-- 
2.49.0
* [PATCH net-next v4 08/24] eth: bnxt: read the page size from the adapter struct
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
From: Jakub Kicinski <kuba@kernel.org>
Switch from using a constant to storing the BNXT_RX_PAGE_SIZE
inside struct bnxt. This will allow configuring the page size
at runtime in subsequent patches.
The MSS size calculation for older chips continues to use the constant.
I'm intending to support the configuration only on more recent HW;
it looks like setting this per queue won't work on older chips,
and per-queue configuration is the ultimate goal.
This patch should not change the current behavior, as the value
read from the struct will always be BNXT_RX_PAGE_SIZE at this stage.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[pavel: place const on right side in comparisons]
[pavel: update __bnxt_alloc_rx_netmem's size check]
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 32 ++++++++++---------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  4 +--
 3 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 3fc33b1b4dfb..13286f4a2fa7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -905,7 +905,7 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
 
 static bool bnxt_separate_head_pool(struct bnxt_rx_ring_info *rxr)
 {
-	return rxr->need_head_pool || PAGE_SIZE > BNXT_RX_PAGE_SIZE;
+	return rxr->need_head_pool || rxr->bnapi->bp->rx_page_size < PAGE_SIZE;
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
@@ -915,9 +915,9 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
 {
 	struct page *page;
 
-	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) {
+	if (bp->rx_page_size < PAGE_SIZE) {
 		page = page_pool_dev_alloc_frag(rxr->page_pool, offset,
-						BNXT_RX_PAGE_SIZE);
+						bp->rx_page_size);
 	} else {
 		page = page_pool_dev_alloc_pages(rxr->page_pool);
 		*offset = 0;
@@ -936,8 +936,9 @@ static netmem_ref __bnxt_alloc_rx_netmem(struct bnxt *bp, dma_addr_t *mapping,
 {
 	netmem_ref netmem;
 
-	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) {
-		netmem = page_pool_alloc_frag_netmem(rxr->page_pool, offset, BNXT_RX_PAGE_SIZE, gfp);
+	if (bp->rx_page_size < PAGE_SIZE) {
+		netmem = page_pool_alloc_frag_netmem(rxr->page_pool, offset,
+						     bp->rx_page_size, gfp);
 	} else {
 		netmem = page_pool_alloc_netmems(rxr->page_pool, gfp);
 		*offset = 0;
@@ -1155,9 +1156,9 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
 		return NULL;
 	}
 	dma_addr -= bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
+	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, bp->rx_page_size,
 				bp->rx_dir);
-	skb = napi_build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE);
+	skb = napi_build_skb(data_ptr - bp->rx_offset, bp->rx_page_size);
 	if (!skb) {
 		page_pool_recycle_direct(rxr->page_pool, page);
 		return NULL;
@@ -1189,7 +1190,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 		return NULL;
 	}
 	dma_addr -= bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
+	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, bp->rx_page_size,
 				bp->rx_dir);
 
 	if (unlikely(!payload))
@@ -1203,7 +1204,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 
 	skb_mark_for_recycle(skb);
 	off = (void *)data_ptr - page_address(page);
-	skb_add_rx_frag(skb, 0, page, off, len, BNXT_RX_PAGE_SIZE);
+	skb_add_rx_frag(skb, 0, page, off, len, bp->rx_page_size);
 	memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN,
 	       payload + NET_IP_ALIGN);
 
@@ -1288,7 +1289,7 @@ static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
 		if (skb) {
 			skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
 					       cons_rx_buf->offset,
-					       frag_len, BNXT_RX_PAGE_SIZE);
+					       frag_len, bp->rx_page_size);
 		} else {
 			skb_frag_t *frag = &shinfo->frags[i];
 
@@ -1313,7 +1314,7 @@ static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
 			if (skb) {
 				skb->len -= frag_len;
 				skb->data_len -= frag_len;
-				skb->truesize -= BNXT_RX_PAGE_SIZE;
+				skb->truesize -= bp->rx_page_size;
 			}
 
 			--shinfo->nr_frags;
@@ -1328,7 +1329,7 @@ static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
 		}
 
 		page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem, 0,
-						  BNXT_RX_PAGE_SIZE);
+						  bp->rx_page_size);
 
 		total_frag_len += frag_len;
 		prod = NEXT_RX_AGG(prod);
@@ -4478,7 +4479,7 @@ static void bnxt_init_one_rx_agg_ring_rxbd(struct bnxt *bp,
 	ring = &rxr->rx_agg_ring_struct;
 	ring->fw_ring_id = INVALID_HW_RING_ID;
 	if ((bp->flags & BNXT_FLAG_AGG_RINGS)) {
-		type = ((u32)BNXT_RX_PAGE_SIZE << RX_BD_LEN_SHIFT) |
+		type = ((u32)bp->rx_page_size << RX_BD_LEN_SHIFT) |
 			RX_BD_TYPE_RX_AGG_BD | RX_BD_FLAGS_SOP;
 
 		bnxt_init_rxbd_pages(ring, type);
@@ -4740,7 +4741,7 @@ void bnxt_set_ring_params(struct bnxt *bp)
 	bp->rx_agg_nr_pages = 0;
 
 	if (bp->flags & BNXT_FLAG_TPA || bp->flags & BNXT_FLAG_HDS)
-		agg_factor = min_t(u32, 4, 65536 / BNXT_RX_PAGE_SIZE);
+		agg_factor = min_t(u32, 4, 65536 / bp->rx_page_size);
 
 	bp->flags &= ~BNXT_FLAG_JUMBO;
 	if (rx_space > PAGE_SIZE && !(bp->flags & BNXT_FLAG_NO_AGG_RINGS)) {
@@ -7054,7 +7055,7 @@ static void bnxt_set_rx_ring_params_p5(struct bnxt *bp, u32 ring_type,
 	if (ring_type == HWRM_RING_ALLOC_AGG) {
 		req->ring_type = RING_ALLOC_REQ_RING_TYPE_RX_AGG;
 		req->rx_ring_id = cpu_to_le16(grp_info->rx_fw_ring_id);
-		req->rx_buf_size = cpu_to_le16(BNXT_RX_PAGE_SIZE);
+		req->rx_buf_size = cpu_to_le16(bp->rx_page_size);
 		enables |= RING_ALLOC_REQ_ENABLES_RX_RING_ID_VALID;
 	} else {
 		req->rx_buf_size = cpu_to_le16(bp->rx_buf_use_size);
@@ -16631,6 +16632,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	bp = netdev_priv(dev);
 	bp->board_idx = ent->driver_data;
 	bp->msg_enable = BNXT_DEF_MSG_ENABLE;
+	bp->rx_page_size = BNXT_RX_PAGE_SIZE;
 	bnxt_set_max_func_irqs(bp, max_irqs);
 
 	if (bnxt_vf_pciid(bp->board_idx))
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 741b2d854789..bbf4ff49ac0f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2361,6 +2361,7 @@ struct bnxt {
 	u16			max_tpa;
 	u32			rx_buf_size;
 	u32			rx_buf_use_size;	/* useable size */
+	u16			rx_page_size;
 	u16			rx_offset;
 	u16			rx_dma_offset;
 	enum dma_data_direction	rx_dir;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 3e77a96e5a3e..c23c04007136 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -183,7 +183,7 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 			u16 cons, u8 *data_ptr, unsigned int len,
 			struct xdp_buff *xdp)
 {
-	u32 buflen = BNXT_RX_PAGE_SIZE;
+	u32 buflen = bp->rx_page_size;
 	struct bnxt_sw_rx_bd *rx_buf;
 	struct pci_dev *pdev;
 	dma_addr_t mapping;
@@ -469,7 +469,7 @@ bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb, u8 num_frags,
 		return NULL;
 
 	xdp_update_skb_frags_info(skb, num_frags, sinfo->xdp_frags_size,
-				  BNXT_RX_PAGE_SIZE * num_frags,
+				  bp->rx_page_size * num_frags,
 				  xdp_buff_get_skb_flags(xdp));
 	return skb;
 }
-- 
2.49.0
* [PATCH net-next v4 09/24] eth: bnxt: set page pool page order based on rx_page_size
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
From: Jakub Kicinski <kuba@kernel.org>
If the user decides to increase the buffer size for the agg ring,
we need to ask the page pool for higher-order pages.
There is no need to use larger pages for header frags;
if the user increases the size of agg ring buffers, switch
to a separate header page automatically.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[pavel: adjust max_len]
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 13286f4a2fa7..5c57b2a5c51c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3829,11 +3829,13 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	pp.pool_size = bp->rx_agg_ring_size / agg_size_fac;
 	if (BNXT_RX_PAGE_MODE(bp))
 		pp.pool_size += bp->rx_ring_size / rx_size_fac;
+
+	pp.order = get_order(bp->rx_page_size);
 	pp.nid = numa_node;
 	pp.netdev = bp->dev;
 	pp.dev = &bp->pdev->dev;
 	pp.dma_dir = bp->rx_dir;
-	pp.max_len = PAGE_SIZE;
+	pp.max_len = PAGE_SIZE << pp.order;
 	pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
 		   PP_FLAG_ALLOW_UNREADABLE_NETMEM;
 	pp.queue_idx = rxr->bnapi->index;
@@ -3844,7 +3846,10 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	rxr->page_pool = pool;
 
 	rxr->need_head_pool = page_pool_is_unreadable(pool);
+	rxr->need_head_pool |= !!pp.order;
 	if (bnxt_separate_head_pool(rxr)) {
+		pp.order = 0;
+		pp.max_len = PAGE_SIZE;
 		pp.pool_size = min(bp->rx_ring_size / rx_size_fac, 1024);
 		pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 		pool = page_pool_create(&pp);
-- 
2.49.0
* [PATCH net-next v4 10/24] eth: bnxt: support setting size of agg buffers via ethtool
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
From: Jakub Kicinski <kuba@kernel.org>
bnxt seems to be able to aggregate data up to 32kB without any issue.
The driver is already capable of doing this for systems with
higher-order pages. While for systems with 4k pages we historically
preferred to stick to small buffers because they are easier to
allocate, the zero-copy APIs remove the allocation problem: the ZC
memory is pre-allocated and of fixed size.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  3 ++-
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 21 ++++++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index bbf4ff49ac0f..3abe59e9b021 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -758,7 +758,8 @@ struct nqe_cn {
 #define BNXT_RX_PAGE_SHIFT PAGE_SHIFT
 #endif
 
-#define BNXT_RX_PAGE_SIZE (1 << BNXT_RX_PAGE_SHIFT)
+#define BNXT_MAX_RX_PAGE_SIZE	(1 << 15)
+#define BNXT_RX_PAGE_SIZE	(1 << BNXT_RX_PAGE_SHIFT)
 
 #define BNXT_MAX_MTU		9500
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 41686a6f84b5..7b5b9781262d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -835,6 +835,8 @@ static void bnxt_get_ringparam(struct net_device *dev,
 	ering->rx_jumbo_pending = bp->rx_agg_ring_size;
 	ering->tx_pending = bp->tx_ring_size;
 
+	kernel_ering->rx_buf_len_max = BNXT_MAX_RX_PAGE_SIZE;
+	kernel_ering->rx_buf_len = bp->rx_page_size;
 	kernel_ering->hds_thresh_max = BNXT_HDS_THRESHOLD_MAX;
 }
 
@@ -862,6 +864,21 @@ static int bnxt_set_ringparam(struct net_device *dev,
 		return -EINVAL;
 	}
 
+	if (!kernel_ering->rx_buf_len)	/* Zero means restore default */
+		kernel_ering->rx_buf_len = BNXT_RX_PAGE_SIZE;
+
+	if (kernel_ering->rx_buf_len != bp->rx_page_size &&
+	    !(bp->flags & BNXT_FLAG_CHIP_P5_PLUS)) {
+		NL_SET_ERR_MSG_MOD(extack, "changing rx-buf-len not supported");
+		return -EINVAL;
+	}
+	if (!is_power_of_2(kernel_ering->rx_buf_len) ||
+	    kernel_ering->rx_buf_len < BNXT_RX_PAGE_SIZE ||
+	    kernel_ering->rx_buf_len > BNXT_MAX_RX_PAGE_SIZE) {
+		NL_SET_ERR_MSG_MOD(extack, "rx-buf-len out of range, or not power of 2");
+		return -ERANGE;
+	}
+
 	if (netif_running(dev))
 		bnxt_close_nic(bp, false, false);
 
@@ -874,6 +891,7 @@ static int bnxt_set_ringparam(struct net_device *dev,
 
 	bp->rx_ring_size = ering->rx_pending;
 	bp->tx_ring_size = ering->tx_pending;
+	bp->rx_page_size = kernel_ering->rx_buf_len;
 	bnxt_set_ring_params(bp);
 
 	if (netif_running(dev))
@@ -5577,7 +5595,8 @@ const struct ethtool_ops bnxt_ethtool_ops = {
 				     ETHTOOL_COALESCE_STATS_BLOCK_USECS |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
 				     ETHTOOL_COALESCE_USE_CQE,
-	.supported_ring_params	= ETHTOOL_RING_USE_TCP_DATA_SPLIT |
+	.supported_ring_params	= ETHTOOL_RING_USE_RX_BUF_LEN |
+				  ETHTOOL_RING_USE_TCP_DATA_SPLIT |
 				  ETHTOOL_RING_USE_HDS_THRS,
 	.get_link_ksettings	= bnxt_get_link_ksettings,
 	.set_link_ksettings	= bnxt_set_link_ksettings,
-- 
2.49.0
* [PATCH net-next v4 11/24] net: move netdev_config manipulation to dedicated helpers
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
From: Jakub Kicinski <kuba@kernel.org>
netdev_config manipulation will become slightly more complicated
soon, and we will need to call it from ethtool as well as the queue
API. Encapsulate the logic into helper functions.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/Makefile        |  1 +
 net/core/dev.c           |  7 ++-----
 net/core/dev.h           |  5 +++++
 net/core/netdev_config.c | 43 ++++++++++++++++++++++++++++++++++++++++
 net/ethtool/netlink.c    | 14 ++++++-------
 5 files changed, 57 insertions(+), 13 deletions(-)
 create mode 100644 net/core/netdev_config.c
diff --git a/net/core/Makefile b/net/core/Makefile
index 9ef2099c5426..9f1f08ff585f 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -21,6 +21,7 @@ obj-y += net-sysfs.o
 obj-y += hotdata.o
 obj-y += netdev_rx_queue.o
 obj-y += netdev_queues.o
+obj-y += netdev_config.o
 obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
 obj-$(CONFIG_PROC_FS) += net-procfs.o
 obj-$(CONFIG_NET_PKTGEN) += pktgen.o
diff --git a/net/core/dev.c b/net/core/dev.c
index a64cef2c537e..5f92425dfdbd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11973,10 +11973,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (!dev->ethtool)
 		goto free_all;
 
-	dev->cfg = kzalloc(sizeof(*dev->cfg), GFP_KERNEL_ACCOUNT);
-	if (!dev->cfg)
+	if (netdev_alloc_config(dev))
 		goto free_all;
-	dev->cfg_pending = dev->cfg;
 
 	dev->num_napi_configs = maxqs;
 	napi_config_sz = array_size(maxqs, sizeof(*dev->napi_config));
@@ -12047,8 +12045,7 @@ void free_netdev(struct net_device *dev)
 		return;
 	}
 
-	WARN_ON(dev->cfg != dev->cfg_pending);
-	kfree(dev->cfg);
+	netdev_free_config(dev);
 	kfree(dev->ethtool);
 	netif_free_tx_queues(dev);
 	netif_free_rx_queues(dev);
diff --git a/net/core/dev.h b/net/core/dev.h
index 900880e8b5b4..1ec0b836c652 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -92,6 +92,11 @@ extern struct rw_semaphore dev_addr_sem;
 extern struct list_head net_todo_list;
 void netdev_run_todo(void);
 
+int netdev_alloc_config(struct net_device *dev);
+void __netdev_free_config(struct netdev_config *cfg);
+void netdev_free_config(struct net_device *dev);
+int netdev_reconfig_start(struct net_device *dev);
+
 /* netdev management, shared between various uAPI entry points */
 struct netdev_name_node {
 	struct hlist_node hlist;
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
new file mode 100644
index 000000000000..270b7f10a192
--- /dev/null
+++ b/net/core/netdev_config.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/netdevice.h>
+#include <net/netdev_queues.h>
+
+#include "dev.h"
+
+int netdev_alloc_config(struct net_device *dev)
+{
+	struct netdev_config *cfg;
+
+	cfg = kzalloc(sizeof(*dev->cfg), GFP_KERNEL_ACCOUNT);
+	if (!cfg)
+		return -ENOMEM;
+
+	dev->cfg = cfg;
+	dev->cfg_pending = cfg;
+	return 0;
+}
+
+void __netdev_free_config(struct netdev_config *cfg)
+{
+	kfree(cfg);
+}
+
+void netdev_free_config(struct net_device *dev)
+{
+	WARN_ON(dev->cfg != dev->cfg_pending);
+	__netdev_free_config(dev->cfg);
+}
+
+int netdev_reconfig_start(struct net_device *dev)
+{
+	struct netdev_config *cfg;
+
+	WARN_ON(dev->cfg != dev->cfg_pending);
+	cfg = kmemdup(dev->cfg, sizeof(*dev->cfg), GFP_KERNEL_ACCOUNT);
+	if (!cfg)
+		return -ENOMEM;
+
+	dev->cfg_pending = cfg;
+	return 0;
+}
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 2f813f25f07e..d376d3043177 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -6,6 +6,7 @@
 #include <linux/ethtool_netlink.h>
 #include <linux/phy_link_topology.h>
 #include <linux/pm_runtime.h>
+#include "../core/dev.h"
 #include "netlink.h"
 #include "module_fw.h"
 
@@ -906,12 +907,9 @@ static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info)
 
 	rtnl_lock();
 	netdev_lock_ops(dev);
-	dev->cfg_pending = kmemdup(dev->cfg, sizeof(*dev->cfg),
-				   GFP_KERNEL_ACCOUNT);
-	if (!dev->cfg_pending) {
-		ret = -ENOMEM;
-		goto out_tie_cfg;
-	}
+	ret = netdev_reconfig_start(dev);
+	if (ret)
+		goto out_unlock;
 
 	ret = ethnl_ops_begin(dev);
 	if (ret < 0)
@@ -930,9 +928,9 @@ static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info)
 out_ops:
 	ethnl_ops_complete(dev);
 out_free_cfg:
-	kfree(dev->cfg_pending);
-out_tie_cfg:
+	__netdev_free_config(dev->cfg_pending);
 	dev->cfg_pending = dev->cfg;
+out_unlock:
 	netdev_unlock_ops(dev);
 	rtnl_unlock();
 out_dev:
-- 
2.49.0
* [PATCH net-next v4 12/24] net: reduce indent of struct netdev_queue_mgmt_ops members
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
From: Jakub Kicinski <kuba@kernel.org>
Trivial change, reduce the indent. I think the original was copied
from real NDOs. It's unnecessarily deep and makes passing struct
args problematic.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/netdev_queues.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 31559f2711de..b7c9895cd4b2 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -155,20 +155,20 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  * be called for an interface which is open.
  */
 struct netdev_queue_mgmt_ops {
-	size_t			ndo_queue_mem_size;
-	int			(*ndo_queue_mem_alloc)(struct net_device *dev,
-						       void *per_queue_mem,
-						       int idx);
-	void			(*ndo_queue_mem_free)(struct net_device *dev,
-						      void *per_queue_mem);
-	int			(*ndo_queue_start)(struct net_device *dev,
-						   void *per_queue_mem,
-						   int idx);
-	int			(*ndo_queue_stop)(struct net_device *dev,
-						  void *per_queue_mem,
-						  int idx);
-	struct device *		(*ndo_queue_get_dma_dev)(struct net_device *dev,
-							 int idx);
+	size_t	ndo_queue_mem_size;
+	int	(*ndo_queue_mem_alloc)(struct net_device *dev,
+				       void *per_queue_mem,
+				       int idx);
+	void	(*ndo_queue_mem_free)(struct net_device *dev,
+				      void *per_queue_mem);
+	int	(*ndo_queue_start)(struct net_device *dev,
+				   void *per_queue_mem,
+				   int idx);
+	int	(*ndo_queue_stop)(struct net_device *dev,
+				  void *per_queue_mem,
+				  int idx);
+	struct device *	(*ndo_queue_get_dma_dev)(struct net_device *dev,
+						 int idx);
 };
 
 bool netif_rxq_has_unreadable_mp(struct net_device *dev, int idx);
-- 
2.49.0
* [PATCH net-next v4 13/24] net: allocate per-queue config structs and pass them thru the queue API
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
From: Jakub Kicinski <kuba@kernel.org>
Create an array of config structs to store per-queue config.
Pass these structs in the queue API. Drivers can also retrieve
the config for a single queue by calling netdev_queue_config()
directly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[pavel: patch up mlx callbacks with unused qcfg]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  8 ++-
 drivers/net/ethernet/google/gve/gve_main.c    |  9 ++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 10 ++--
 drivers/net/ethernet/meta/fbnic/fbnic_txrx.c  |  8 ++-
 drivers/net/netdevsim/netdev.c                |  6 +-
 include/net/netdev_queues.h                   | 19 ++++++
 net/core/dev.h                                |  3 +
 net/core/netdev_config.c                      | 58 +++++++++++++++++++
 net/core/netdev_rx_queue.c                    | 11 +++-
 9 files changed, 116 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5c57b2a5c51c..7d7a9d5bc566 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -15882,7 +15882,9 @@ static const struct netdev_stat_ops bnxt_stat_ops = {
 	.get_base_stats		= bnxt_get_base_stats,
 };
 
-static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
+static int bnxt_queue_mem_alloc(struct net_device *dev,
+				struct netdev_queue_config *qcfg,
+				void *qmem, int idx)
 {
 	struct bnxt_rx_ring_info *rxr, *clone;
 	struct bnxt *bp = netdev_priv(dev);
@@ -16048,7 +16050,9 @@ static void bnxt_copy_rx_ring(struct bnxt *bp,
 	dst->rx_agg_bmap = src->rx_agg_bmap;
 }
 
-static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
+static int bnxt_queue_start(struct net_device *dev,
+			    struct netdev_queue_config *qcfg,
+			    void *qmem, int idx)
 {
 	struct bnxt *bp = netdev_priv(dev);
 	struct bnxt_rx_ring_info *rxr, *clone;
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 1be1b1ef31ee..a49c28f3667b 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2580,8 +2580,9 @@ static void gve_rx_queue_mem_free(struct net_device *dev, void *per_q_mem)
 		gve_rx_free_ring_dqo(priv, gve_per_q_mem, &cfg);
 }
 
-static int gve_rx_queue_mem_alloc(struct net_device *dev, void *per_q_mem,
-				  int idx)
+static int gve_rx_queue_mem_alloc(struct net_device *dev,
+				  struct netdev_queue_config *qcfg,
+				  void *per_q_mem, int idx)
 {
 	struct gve_priv *priv = netdev_priv(dev);
 	struct gve_rx_alloc_rings_cfg cfg = {0};
@@ -2602,7 +2603,9 @@ static int gve_rx_queue_mem_alloc(struct net_device *dev, void *per_q_mem,
 	return err;
 }
 
-static int gve_rx_queue_start(struct net_device *dev, void *per_q_mem, int idx)
+static int gve_rx_queue_start(struct net_device *dev,
+			      struct netdev_queue_config *qcfg,
+			      void *per_q_mem, int idx)
 {
 	struct gve_priv *priv = netdev_priv(dev);
 	struct gve_rx_ring *gve_per_q_mem;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a56825921c23..75757abeb3dd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5562,8 +5562,9 @@ struct mlx5_qmgmt_data {
 	struct mlx5e_channel_param cparam;
 };
 
-static int mlx5e_queue_mem_alloc(struct net_device *dev, void *newq,
-				 int queue_index)
+static int mlx5e_queue_mem_alloc(struct net_device *dev,
+				 struct netdev_queue_config *qcfg,
+				 void *newq, int queue_index)
 {
 	struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq;
 	struct mlx5e_priv *priv = netdev_priv(dev);
@@ -5624,8 +5625,9 @@ static int mlx5e_queue_stop(struct net_device *dev, void *oldq, int queue_index)
 	return 0;
 }
 
-static int mlx5e_queue_start(struct net_device *dev, void *newq,
-			     int queue_index)
+static int mlx5e_queue_start(struct net_device *dev,
+			     struct netdev_queue_config *qcfg,
+			     void *newq, int queue_index)
 {
 	struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq;
 	struct mlx5e_priv *priv = netdev_priv(dev);
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
index b1e8ce89870f..8854c496f1dd 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_txrx.c
@@ -2789,7 +2789,9 @@ void fbnic_napi_depletion_check(struct net_device *netdev)
 	fbnic_wrfl(fbd);
 }
 
-static int fbnic_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
+static int fbnic_queue_mem_alloc(struct net_device *dev,
+				 struct netdev_queue_config *qcfg,
+				 void *qmem, int idx)
 {
 	struct fbnic_net *fbn = netdev_priv(dev);
 	const struct fbnic_q_triad *real;
@@ -2841,7 +2843,9 @@ static void __fbnic_nv_restart(struct fbnic_net *fbn,
 		netif_wake_subqueue(fbn->netdev, nv->qt[i].sub0.q_idx);
 }
 
-static int fbnic_queue_start(struct net_device *dev, void *qmem, int idx)
+static int fbnic_queue_start(struct net_device *dev,
+			     struct netdev_queue_config *qcfg,
+			     void *qmem, int idx)
 {
 	struct fbnic_net *fbn = netdev_priv(dev);
 	struct fbnic_napi_vector *nv;
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index ebc3833e95b4..032ef17dcf61 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -750,7 +750,8 @@ struct nsim_queue_mem {
 };
 
 static int
-nsim_queue_mem_alloc(struct net_device *dev, void *per_queue_mem, int idx)
+nsim_queue_mem_alloc(struct net_device *dev, struct netdev_queue_config *qcfg,
+		     void *per_queue_mem, int idx)
 {
 	struct nsim_queue_mem *qmem = per_queue_mem;
 	struct netdevsim *ns = netdev_priv(dev);
@@ -799,7 +800,8 @@ static void nsim_queue_mem_free(struct net_device *dev, void *per_queue_mem)
 }
 
 static int
-nsim_queue_start(struct net_device *dev, void *per_queue_mem, int idx)
+nsim_queue_start(struct net_device *dev, struct netdev_queue_config *qcfg,
+		 void *per_queue_mem, int idx)
 {
 	struct nsim_queue_mem *qmem = per_queue_mem;
 	struct netdevsim *ns = netdev_priv(dev);
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index b7c9895cd4b2..a7b325307029 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -32,6 +32,13 @@ struct netdev_config {
 	/** @hds_config: HDS enabled (ETHTOOL_A_RINGS_TCP_DATA_SPLIT).
 	 */
 	u8	hds_config;
+
+	/** @qcfg: per-queue configuration */
+	struct netdev_queue_config *qcfg;
+};
+
+/* Same semantics as fields in struct netdev_config */
+struct netdev_queue_config {
 };
 
 /* See the netdev.yaml spec for definition of each statistic */
@@ -136,6 +143,10 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  *
  * @ndo_queue_mem_size: Size of the struct that describes a queue's memory.
  *
+ * @ndo_queue_cfg_defaults: (Optional) Populate queue config struct with
+ *			defaults. Queue config structs are passed to this
+ *			helper before the user-requested settings are applied.
+ *
  * @ndo_queue_mem_alloc: Allocate memory for an RX queue at the specified index.
  *			 The new memory is written at the specified address.
  *
@@ -156,12 +167,17 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  */
 struct netdev_queue_mgmt_ops {
 	size_t	ndo_queue_mem_size;
+	void	(*ndo_queue_cfg_defaults)(struct net_device *dev,
+					  int idx,
+					  struct netdev_queue_config *qcfg);
 	int	(*ndo_queue_mem_alloc)(struct net_device *dev,
+				       struct netdev_queue_config *qcfg,
 				       void *per_queue_mem,
 				       int idx);
 	void	(*ndo_queue_mem_free)(struct net_device *dev,
 				      void *per_queue_mem);
 	int	(*ndo_queue_start)(struct net_device *dev,
+				   struct netdev_queue_config *qcfg,
 				   void *per_queue_mem,
 				   int idx);
 	int	(*ndo_queue_stop)(struct net_device *dev,
@@ -173,6 +189,9 @@ struct netdev_queue_mgmt_ops {
 
 bool netif_rxq_has_unreadable_mp(struct net_device *dev, int idx);
 
+void netdev_queue_config(struct net_device *dev, int rxq,
+			 struct netdev_queue_config *qcfg);
+
 /**
  * DOC: Lockless queue stopping / waking helpers.
  *
diff --git a/net/core/dev.h b/net/core/dev.h
index 1ec0b836c652..a2d6a181b9b0 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -9,6 +9,7 @@
 #include <net/netdev_lock.h>
 
 struct net;
+struct netdev_queue_config;
 struct netlink_ext_ack;
 struct cpumask;
 
@@ -96,6 +97,8 @@ int netdev_alloc_config(struct net_device *dev);
 void __netdev_free_config(struct netdev_config *cfg);
 void netdev_free_config(struct net_device *dev);
 int netdev_reconfig_start(struct net_device *dev);
+void __netdev_queue_config(struct net_device *dev, int rxq,
+			   struct netdev_queue_config *qcfg, bool pending);
 
 /* netdev management, shared between various uAPI entry points */
 struct netdev_name_node {
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
index 270b7f10a192..bad2d53522f0 100644
--- a/net/core/netdev_config.c
+++ b/net/core/netdev_config.c
@@ -8,18 +8,29 @@
 int netdev_alloc_config(struct net_device *dev)
 {
 	struct netdev_config *cfg;
+	unsigned int maxqs;
 
 	cfg = kzalloc(sizeof(*dev->cfg), GFP_KERNEL_ACCOUNT);
 	if (!cfg)
 		return -ENOMEM;
 
+	maxqs = max(dev->num_rx_queues, dev->num_tx_queues);
+	cfg->qcfg = kcalloc(maxqs, sizeof(*cfg->qcfg), GFP_KERNEL_ACCOUNT);
+	if (!cfg->qcfg)
+		goto err_free_cfg;
+
 	dev->cfg = cfg;
 	dev->cfg_pending = cfg;
 	return 0;
+
+err_free_cfg:
+	kfree(cfg);
+	return -ENOMEM;
 }
 
 void __netdev_free_config(struct netdev_config *cfg)
 {
+	kfree(cfg->qcfg);
 	kfree(cfg);
 }
 
@@ -32,12 +43,59 @@ void netdev_free_config(struct net_device *dev)
 int netdev_reconfig_start(struct net_device *dev)
 {
 	struct netdev_config *cfg;
+	unsigned int maxqs;
 
 	WARN_ON(dev->cfg != dev->cfg_pending);
 	cfg = kmemdup(dev->cfg, sizeof(*dev->cfg), GFP_KERNEL_ACCOUNT);
 	if (!cfg)
 		return -ENOMEM;
 
+	maxqs = max(dev->num_rx_queues, dev->num_tx_queues);
+	cfg->qcfg = kmemdup_array(dev->cfg->qcfg, maxqs, sizeof(*cfg->qcfg),
+				  GFP_KERNEL_ACCOUNT);
+	if (!cfg->qcfg)
+		goto err_free_cfg;
+
 	dev->cfg_pending = cfg;
 	return 0;
+
+err_free_cfg:
+	kfree(cfg);
+	return -ENOMEM;
+}
+
+void __netdev_queue_config(struct net_device *dev, int rxq,
+			   struct netdev_queue_config *qcfg, bool pending)
+{
+	memset(qcfg, 0, sizeof(*qcfg));
+
+	/* Get defaults from the driver, in case user config not set */
+	if (dev->queue_mgmt_ops->ndo_queue_cfg_defaults)
+		dev->queue_mgmt_ops->ndo_queue_cfg_defaults(dev, rxq, qcfg);
+}
+
+/**
+ * netdev_queue_config() - get configuration for a given queue
+ * @dev:  net_device instance
+ * @rxq:  index of the queue of interest
+ * @qcfg: queue configuration struct (output)
+ *
+ * Render the configuration for a given queue. This helper should be used
+ * by drivers which support queue configuration to retrieve config for
+ * a particular queue.
+ *
+ * @qcfg is an output parameter and is always fully initialized by this
+ * function. Some values may not be set by the user, drivers may either
+ * deal with the "unset" values in @qcfg, or provide the callback
+ * to populate defaults in queue_management_ops.
+ *
+ * Note that this helper returns pending config, as it is expected that
+ * "old" queues are retained until config is successful so they can
+ * be restored directly without asking for the config.
+ */
+void netdev_queue_config(struct net_device *dev, int rxq,
+			 struct netdev_queue_config *qcfg)
+{
+	__netdev_queue_config(dev, rxq, qcfg, true);
 }
+EXPORT_SYMBOL(netdev_queue_config);
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index c7d9341b7630..f6a07fccebd1 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -7,6 +7,7 @@
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/memory_provider.h>
 
+#include "dev.h"
 #include "page_pool_priv.h"
 
 /* See also page_pool_is_unreadable() */
@@ -22,6 +23,7 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
 {
 	struct netdev_rx_queue *rxq = __netif_get_rx_queue(dev, rxq_idx);
 	const struct netdev_queue_mgmt_ops *qops = dev->queue_mgmt_ops;
+	struct netdev_queue_config qcfg;
 	void *new_mem, *old_mem;
 	int err;
 
@@ -41,7 +43,9 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
 		goto err_free_new_mem;
 	}
 
-	err = qops->ndo_queue_mem_alloc(dev, new_mem, rxq_idx);
+	netdev_queue_config(dev, rxq_idx, &qcfg);
+
+	err = qops->ndo_queue_mem_alloc(dev, &qcfg, new_mem, rxq_idx);
 	if (err)
 		goto err_free_old_mem;
 
@@ -54,7 +58,7 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
 		if (err)
 			goto err_free_new_queue_mem;
 
-		err = qops->ndo_queue_start(dev, new_mem, rxq_idx);
+		err = qops->ndo_queue_start(dev, &qcfg, new_mem, rxq_idx);
 		if (err)
 			goto err_start_queue;
 	} else {
@@ -69,6 +73,7 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
 	return 0;
 
 err_start_queue:
+	__netdev_queue_config(dev, rxq_idx, &qcfg, false);
 	/* Restarting the queue with old_mem should be successful as we haven't
 	 * changed any of the queue configuration, and there is not much we can
 	 * do to recover from a failure here.
@@ -76,7 +81,7 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
 	 * WARN if we fail to recover the old rx queue, and at least free
 	 * old_mem so we don't also leak that.
 	 */
-	if (qops->ndo_queue_start(dev, old_mem, rxq_idx)) {
+	if (qops->ndo_queue_start(dev, &qcfg, old_mem, rxq_idx)) {
 		WARN(1,
 		     "Failed to restart old queue in error path. RX queue %d may be unhealthy.",
 		     rxq_idx);
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
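[Editor's note: to illustrate the shape of the defaults flow added by the patch above, here is a small standalone userspace model — the struct and function names mirror the patch, but this is a sketch, not kernel code, and the rx_buf_len field is a hypothetical stand-in for fields later patches add.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the (for now empty) per-queue config struct; the
 * rx_buf_len field is a hypothetical placeholder (0 == unset). */
struct qcfg_model {
	unsigned int rx_buf_len;
};

struct dev_model {
	/* optional driver hook, mirrors ndo_queue_cfg_defaults */
	void (*cfg_defaults)(struct dev_model *dev, int idx,
			     struct qcfg_model *qcfg);
};

/* Mirrors __netdev_queue_config(): zero the output struct, then let
 * the driver fill in its defaults if it provides the hook. */
static void queue_config_model(struct dev_model *dev, int rxq,
			       struct qcfg_model *qcfg)
{
	memset(qcfg, 0, sizeof(*qcfg));
	if (dev->cfg_defaults)
		dev->cfg_defaults(dev, rxq, qcfg);
}

/* Example driver hook: populate a default buffer length */
static void demo_defaults(struct dev_model *dev, int idx,
			  struct qcfg_model *qcfg)
{
	(void)dev;
	(void)idx;
	qcfg->rx_buf_len = 4096;
}
```

Either way the output struct is fully initialized, which is why the kernel helper can promise drivers a rendered config even when no defaults callback is registered.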
* [PATCH net-next v4 14/24] net: pass extack to netdev_rx_queue_restart()
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (12 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 13/24] net: allocate per-queue config structs and pass them thru the queue API Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 15/24] net: add queue config validation callback Pavel Begunkov
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Pass extack to netdev_rx_queue_restart(). A subsequent change will need it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 drivers/net/netdevsim/netdev.c            | 2 +-
 include/net/netdev_rx_queue.h             | 3 ++-
 net/core/netdev_rx_queue.c                | 7 ++++---
 4 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7d7a9d5bc566..61e5c866d946 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -11570,7 +11570,7 @@ static void bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
 
 	netdev_lock(irq->bp->dev);
 	if (netif_running(irq->bp->dev)) {
-		err = netdev_rx_queue_restart(irq->bp->dev, irq->ring_nr);
+		err = netdev_rx_queue_restart(irq->bp->dev, irq->ring_nr, NULL);
 		if (err)
 			netdev_err(irq->bp->dev,
 				   "RX queue restart failed: err=%d\n", err);
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 032ef17dcf61..649822af352e 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -886,7 +886,7 @@ nsim_qreset_write(struct file *file, const char __user *data,
 	}
 
 	ns->rq_reset_mode = mode;
-	ret = netdev_rx_queue_restart(ns->netdev, queue);
+	ret = netdev_rx_queue_restart(ns->netdev, queue, NULL);
 	ns->rq_reset_mode = 0;
 	if (ret)
 		goto exit_unlock;
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index 8cdcd138b33f..a7def1f94823 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -56,6 +56,7 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
 	return index;
 }
 
-int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
+int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq,
+			    struct netlink_ext_ack *extack);
 
 #endif
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index f6a07fccebd1..16db850aafd7 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -19,7 +19,8 @@ bool netif_rxq_has_unreadable_mp(struct net_device *dev, int idx)
 }
 EXPORT_SYMBOL(netif_rxq_has_unreadable_mp);
 
-int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
+int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx,
+			    struct netlink_ext_ack *extack)
 {
 	struct netdev_rx_queue *rxq = __netif_get_rx_queue(dev, rxq_idx);
 	const struct netdev_queue_mgmt_ops *qops = dev->queue_mgmt_ops;
@@ -143,7 +144,7 @@ int __net_mp_open_rxq(struct net_device *dev, unsigned int rxq_idx,
 #endif
 
 	rxq->mp_params = *p;
-	ret = netdev_rx_queue_restart(dev, rxq_idx);
+	ret = netdev_rx_queue_restart(dev, rxq_idx, extack);
 	if (ret) {
 		rxq->mp_params.mp_ops = NULL;
 		rxq->mp_params.mp_priv = NULL;
@@ -186,7 +187,7 @@ void __net_mp_close_rxq(struct net_device *dev, unsigned int ifq_idx,
 
 	rxq->mp_params.mp_ops = NULL;
 	rxq->mp_params.mp_priv = NULL;
-	err = netdev_rx_queue_restart(dev, ifq_idx);
+	err = netdev_rx_queue_restart(dev, ifq_idx, NULL);
 	WARN_ON(err && err != -ENETDOWN);
 }
 
-- 
2.49.0
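[Editor's note: the point of threading extack through the restart path is that a failure deep in the call chain can report a human-readable message to the netlink caller, while a NULL extack (as in the bnxt and netdevsim callers above) simply drops the message. A standalone model of that pattern, with illustrative names:]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct netlink_ext_ack */
struct extack_model {
	const char *msg;
};

/* Mirrors the NL_SET_ERR_MSG() convention: setting a message must
 * tolerate a NULL extack, since not every caller has one. */
static void set_err_msg(struct extack_model *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* A restart-like helper that fails and explains why */
static int restart_model(int queue_ok, struct extack_model *extack)
{
	if (!queue_ok) {
		set_err_msg(extack, "queue config not supported");
		return -1; /* -EINVAL-style failure */
	}
	return 0;
}
```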
* [PATCH net-next v4 15/24] net: add queue config validation callback
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (13 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 14/24] net: pass extack to netdev_rx_queue_restart() Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 16/24] eth: bnxt: always set the queue mgmt ops Pavel Begunkov
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
I imagine (tm) that as the number of per-queue configuration
options grows, some of them may conflict for certain drivers.
While the drivers can obviously do all the validation locally,
doing so is fairly inconvenient as the config is fed to drivers
piecemeal via different ops (for different params and NIC-wide
vs per-queue).
Add a centralized callback for validating the queue config
in queue ops. The callback gets invoked before each queue restart
and when ring params are modified.
For NIC-wide changes the callback gets invoked for each active
(or to-be active) queue, and additionally with a negative queue
index for the NIC-wide defaults. The NIC-wide check is needed in
case all queues have an active override when the NIC-wide setting
is changed to an unsupported one. Alternatively we could check
the settings when new queues are enabled (in the channel API),
but accepting invalid config is a bad idea. Users may expect
that resetting a queue override will always work.
The "trick" of passing a negative index is a bit ugly, we may
want to revisit if it causes confusion and bugs. Existing drivers
don't care about the index so it "just works".
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/netdev_queues.h | 12 ++++++++++++
 net/core/dev.h              |  2 ++
 net/core/netdev_config.c    | 20 ++++++++++++++++++++
 net/core/netdev_rx_queue.c  |  6 ++++++
 net/ethtool/rings.c         |  5 +++++
 5 files changed, 45 insertions(+)
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index a7b325307029..532f60ee1a66 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -147,6 +147,14 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  *			defaults. Queue config structs are passed to this
  *			helper before the user-requested settings are applied.
  *
+ * @ndo_queue_cfg_validate: (Optional) Check if queue config is supported.
+ *			Called when configuration affecting a queue may be
+ *			changing, either due to NIC-wide config, or config
+ *			scoped to the queue at a specified index.
+ *			When NIC-wide config is changed the callback will
+ *			be invoked for all queues, and in addition to that
+ *			with a negative queue index for the base settings.
+ *
  * @ndo_queue_mem_alloc: Allocate memory for an RX queue at the specified index.
  *			 The new memory is written at the specified address.
  *
@@ -170,6 +178,10 @@ struct netdev_queue_mgmt_ops {
 	void	(*ndo_queue_cfg_defaults)(struct net_device *dev,
 					  int idx,
 					  struct netdev_queue_config *qcfg);
+	int	(*ndo_queue_cfg_validate)(struct net_device *dev,
+					  int idx,
+					  struct netdev_queue_config *qcfg,
+					  struct netlink_ext_ack *extack);
 	int	(*ndo_queue_mem_alloc)(struct net_device *dev,
 				       struct netdev_queue_config *qcfg,
 				       void *per_queue_mem,
diff --git a/net/core/dev.h b/net/core/dev.h
index a2d6a181b9b0..a203b63198e7 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -99,6 +99,8 @@ void netdev_free_config(struct net_device *dev);
 int netdev_reconfig_start(struct net_device *dev);
 void __netdev_queue_config(struct net_device *dev, int rxq,
 			   struct netdev_queue_config *qcfg, bool pending);
+int netdev_queue_config_revalidate(struct net_device *dev,
+				   struct netlink_ext_ack *extack);
 
 /* netdev management, shared between various uAPI entry points */
 struct netdev_name_node {
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
index bad2d53522f0..fc700b77e4eb 100644
--- a/net/core/netdev_config.c
+++ b/net/core/netdev_config.c
@@ -99,3 +99,23 @@ void netdev_queue_config(struct net_device *dev, int rxq,
 	__netdev_queue_config(dev, rxq, qcfg, true);
 }
 EXPORT_SYMBOL(netdev_queue_config);
+
+int netdev_queue_config_revalidate(struct net_device *dev,
+				   struct netlink_ext_ack *extack)
+{
+	const struct netdev_queue_mgmt_ops *qops = dev->queue_mgmt_ops;
+	struct netdev_queue_config qcfg;
+	int i, err;
+
+	if (!qops || !qops->ndo_queue_cfg_validate)
+		return 0;
+
+	for (i = -1; i < (int)dev->real_num_rx_queues; i++) {
+		netdev_queue_config(dev, i, &qcfg);
+		err = qops->ndo_queue_cfg_validate(dev, i, &qcfg, extack);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index 16db850aafd7..5ae375a072a1 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -46,6 +46,12 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx,
 
 	netdev_queue_config(dev, rxq_idx, &qcfg);
 
+	if (qops->ndo_queue_cfg_validate) {
+		err = qops->ndo_queue_cfg_validate(dev, rxq_idx, &qcfg, extack);
+		if (err)
+			goto err_free_old_mem;
+	}
+
 	err = qops->ndo_queue_mem_alloc(dev, &qcfg, new_mem, rxq_idx);
 	if (err)
 		goto err_free_old_mem;
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index 6a74e7e4064e..7884d10c090f 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -4,6 +4,7 @@
 
 #include "netlink.h"
 #include "common.h"
+#include "../core/dev.h"
 
 struct rings_req_info {
 	struct ethnl_req_info		base;
@@ -307,6 +308,10 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
 	dev->cfg_pending->hds_config = kernel_ringparam.tcp_data_split;
 	dev->cfg_pending->hds_thresh = kernel_ringparam.hds_thresh;
 
+	ret = netdev_queue_config_revalidate(dev, info->extack);
+	if (ret)
+		return ret;
+
 	ret = dev->ethtool_ops->set_ringparam(dev, &ringparam,
 					      &kernel_ringparam, info->extack);
 	return ret < 0 ? ret : 1;
-- 
2.49.0
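[Editor's note: the negative-index trick in netdev_queue_config_revalidate() can be modelled in isolation: index -1 validates the NIC-wide base settings, indexes 0..n-1 validate each active queue's rendered config, and the first failure aborts the loop. A standalone sketch under assumed semantics — the buffer-length cap and the validator below are hypothetical:]

```c
#include <assert.h>

#define NUM_QUEUES 4
#define MAX_LEN    8192

/* Hypothetical per-queue overrides; 0 means "use the base setting" */
static unsigned int base_len = 4096;
static unsigned int override_len[NUM_QUEUES];

/* Mirrors ndo_queue_cfg_validate(): idx == -1 checks the base
 * (NIC-wide) settings, idx >= 0 checks one queue's rendered config */
static int validate(int idx)
{
	unsigned int len = base_len;

	if (idx >= 0 && override_len[idx])
		len = override_len[idx];
	return len <= MAX_LEN ? 0 : -1;
}

/* Mirrors netdev_queue_config_revalidate(): base first, then each
 * active queue; bail out on the first error */
static int revalidate_all(void)
{
	int i, err;

	for (i = -1; i < NUM_QUEUES; i++) {
		err = validate(i);
		if (err)
			return err;
	}
	return 0;
}
```

The last case in the test below is exactly the scenario the commit message describes: every queue carries a valid override, yet the change to the base setting is still rejected via the idx == -1 call.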
* [PATCH net-next v4 16/24] eth: bnxt: always set the queue mgmt ops
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (14 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 15/24] net: add queue config validation callback Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 17/24] eth: bnxt: store the rx buf size per queue Pavel Begunkov
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Core provides a centralized callback for validating per-queue settings
but the callback is part of the queue management ops. Having the ops
conditionally set complicates the parts of the driver which could
otherwise lean on the core to feed it the correct settings.
Always set the queue ops, but provide no restart-related callbacks if
queue ops are not supported by the device. This should maintain the
current behavior; the check in netdev_rx_queue_restart() looks both
at the ops struct and at the individual ops.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[pavel: reflow mgmt ops assignment]
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 61e5c866d946..bd06171cc86c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -16187,6 +16187,9 @@ static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
 	.ndo_queue_stop		= bnxt_queue_stop,
 };
 
+static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops_unsupp = {
+};
+
 static void bnxt_remove_one(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
@@ -16840,6 +16843,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	if (BNXT_SUPPORTS_NTUPLE_VNIC(bp))
 		bp->rss_cap |= BNXT_RSS_CAP_MULTI_RSS_CTX;
+
+	dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
 	if (BNXT_SUPPORTS_QUEUE_API(bp))
 		dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
 	dev->request_ops_lock = true;
-- 
2.49.0
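[Editor's note: the reason an empty ops struct preserves behavior is that callers guard on both the ops pointer and the individual callback, so a present-but-empty struct and a NULL pointer take the same path. A standalone model of that double guard; the names are illustrative:]

```c
#include <assert.h>
#include <stddef.h>

struct qmgmt_ops_model {
	int (*queue_restart)(int idx);
};

struct dev_model {
	const struct qmgmt_ops_model *qops;
};

/* Mirrors the check in netdev_rx_queue_restart(): the ops struct and
 * the individual callback must both be present */
static int can_restart(const struct dev_model *dev)
{
	return dev->qops && dev->qops->queue_restart;
}

/* Like bnxt_queue_mgmt_ops_unsupp: present but with no callbacks */
static const struct qmgmt_ops_model ops_unsupp = { NULL };

static int restart_impl(int idx)
{
	(void)idx;
	return 0;
}

static const struct qmgmt_ops_model ops_full = {
	.queue_restart = restart_impl,
};
```

This is what lets the driver unconditionally set queue_mgmt_ops and lean on the core for config rendering, while restart support remains gated on the device's capabilities.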
* [PATCH net-next v4 17/24] eth: bnxt: store the rx buf size per queue
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (15 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 16/24] eth: bnxt: always set the queue mgmt ops Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 18/24] eth: bnxt: adjust the fill level of agg queues with larger buffers Pavel Begunkov
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
In normal operation only a subset of queues is configured for
zero-copy. Since zero-copy is the main use case for larger buffer
sizes, we need to configure the size per queue.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 50 ++++++++++---------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  6 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |  2 +-
 4 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index bd06171cc86c..e4dba91332ae 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -905,7 +905,7 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
 
 static bool bnxt_separate_head_pool(struct bnxt_rx_ring_info *rxr)
 {
-	return rxr->need_head_pool || rxr->bnapi->bp->rx_page_size < PAGE_SIZE;
+	return rxr->need_head_pool || rxr->rx_page_size < PAGE_SIZE;
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
@@ -915,9 +915,9 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
 {
 	struct page *page;
 
-	if (bp->rx_page_size < PAGE_SIZE) {
+	if (rxr->rx_page_size < PAGE_SIZE) {
 		page = page_pool_dev_alloc_frag(rxr->page_pool, offset,
-						bp->rx_page_size);
+						rxr->rx_page_size);
 	} else {
 		page = page_pool_dev_alloc_pages(rxr->page_pool);
 		*offset = 0;
@@ -936,9 +936,9 @@ static netmem_ref __bnxt_alloc_rx_netmem(struct bnxt *bp, dma_addr_t *mapping,
 {
 	netmem_ref netmem;
 
-	if (bp->rx_page_size < PAGE_SIZE) {
+	if (rxr->rx_page_size < PAGE_SIZE) {
 		netmem = page_pool_alloc_frag_netmem(rxr->page_pool, offset,
-						     bp->rx_page_size, gfp);
+						     rxr->rx_page_size, gfp);
 	} else {
 		netmem = page_pool_alloc_netmems(rxr->page_pool, gfp);
 		*offset = 0;
@@ -1156,9 +1156,9 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
 		return NULL;
 	}
 	dma_addr -= bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, bp->rx_page_size,
+	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, rxr->rx_page_size,
 				bp->rx_dir);
-	skb = napi_build_skb(data_ptr - bp->rx_offset, bp->rx_page_size);
+	skb = napi_build_skb(data_ptr - bp->rx_offset, rxr->rx_page_size);
 	if (!skb) {
 		page_pool_recycle_direct(rxr->page_pool, page);
 		return NULL;
@@ -1190,7 +1190,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 		return NULL;
 	}
 	dma_addr -= bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, bp->rx_page_size,
+	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, rxr->rx_page_size,
 				bp->rx_dir);
 
 	if (unlikely(!payload))
@@ -1204,7 +1204,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 
 	skb_mark_for_recycle(skb);
 	off = (void *)data_ptr - page_address(page);
-	skb_add_rx_frag(skb, 0, page, off, len, bp->rx_page_size);
+	skb_add_rx_frag(skb, 0, page, off, len, rxr->rx_page_size);
 	memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN,
 	       payload + NET_IP_ALIGN);
 
@@ -1289,7 +1289,7 @@ static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
 		if (skb) {
 			skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
 					       cons_rx_buf->offset,
-					       frag_len, bp->rx_page_size);
+					       frag_len, rxr->rx_page_size);
 		} else {
 			skb_frag_t *frag = &shinfo->frags[i];
 
@@ -1314,7 +1314,7 @@ static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
 			if (skb) {
 				skb->len -= frag_len;
 				skb->data_len -= frag_len;
-				skb->truesize -= bp->rx_page_size;
+				skb->truesize -= rxr->rx_page_size;
 			}
 
 			--shinfo->nr_frags;
@@ -1329,7 +1329,7 @@ static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
 		}
 
 		page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem, 0,
-						  bp->rx_page_size);
+						  rxr->rx_page_size);
 
 		total_frag_len += frag_len;
 		prod = NEXT_RX_AGG(prod);
@@ -2282,8 +2282,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 			if (!skb)
 				goto oom_next_rx;
 		} else {
-			skb = bnxt_xdp_build_skb(bp, skb, agg_bufs,
-						 rxr->page_pool, &xdp);
+			skb = bnxt_xdp_build_skb(bp, skb, agg_bufs, rxr, &xdp);
 			if (!skb) {
 				/* we should be able to free the old skb here */
 				bnxt_xdp_buff_frags_free(rxr, &xdp);
@@ -3830,7 +3829,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	if (BNXT_RX_PAGE_MODE(bp))
 		pp.pool_size += bp->rx_ring_size / rx_size_fac;
 
-	pp.order = get_order(bp->rx_page_size);
+	pp.order = get_order(rxr->rx_page_size);
 	pp.nid = numa_node;
 	pp.netdev = bp->dev;
 	pp.dev = &bp->pdev->dev;
@@ -4325,6 +4324,8 @@ static void bnxt_init_ring_struct(struct bnxt *bp)
 		if (!rxr)
 			goto skip_rx;
 
+		rxr->rx_page_size = bp->rx_page_size;
+
 		ring = &rxr->rx_ring_struct;
 		rmem = &ring->ring_mem;
 		rmem->nr_pages = bp->rx_nr_pages;
@@ -4484,7 +4485,7 @@ static void bnxt_init_one_rx_agg_ring_rxbd(struct bnxt *bp,
 	ring = &rxr->rx_agg_ring_struct;
 	ring->fw_ring_id = INVALID_HW_RING_ID;
 	if ((bp->flags & BNXT_FLAG_AGG_RINGS)) {
-		type = ((u32)bp->rx_page_size << RX_BD_LEN_SHIFT) |
+		type = ((u32)rxr->rx_page_size << RX_BD_LEN_SHIFT) |
 			RX_BD_TYPE_RX_AGG_BD | RX_BD_FLAGS_SOP;
 
 		bnxt_init_rxbd_pages(ring, type);
@@ -7051,6 +7052,7 @@ static void bnxt_hwrm_ring_grp_free(struct bnxt *bp)
 
 static void bnxt_set_rx_ring_params_p5(struct bnxt *bp, u32 ring_type,
 				       struct hwrm_ring_alloc_input *req,
+				       struct bnxt_rx_ring_info *rxr,
 				       struct bnxt_ring_struct *ring)
 {
 	struct bnxt_ring_grp_info *grp_info = &bp->grp_info[ring->grp_idx];
@@ -7060,7 +7062,7 @@ static void bnxt_set_rx_ring_params_p5(struct bnxt *bp, u32 ring_type,
 	if (ring_type == HWRM_RING_ALLOC_AGG) {
 		req->ring_type = RING_ALLOC_REQ_RING_TYPE_RX_AGG;
 		req->rx_ring_id = cpu_to_le16(grp_info->rx_fw_ring_id);
-		req->rx_buf_size = cpu_to_le16(bp->rx_page_size);
+		req->rx_buf_size = cpu_to_le16(rxr->rx_page_size);
 		enables |= RING_ALLOC_REQ_ENABLES_RX_RING_ID_VALID;
 	} else {
 		req->rx_buf_size = cpu_to_le16(bp->rx_buf_use_size);
@@ -7074,6 +7076,7 @@ static void bnxt_set_rx_ring_params_p5(struct bnxt *bp, u32 ring_type,
 }
 
 static int hwrm_ring_alloc_send_msg(struct bnxt *bp,
+				    struct bnxt_rx_ring_info *rxr,
 				    struct bnxt_ring_struct *ring,
 				    u32 ring_type, u32 map_index)
 {
@@ -7130,7 +7133,8 @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp,
 			      cpu_to_le32(bp->rx_ring_mask + 1) :
 			      cpu_to_le32(bp->rx_agg_ring_mask + 1);
 		if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
-			bnxt_set_rx_ring_params_p5(bp, ring_type, req, ring);
+			bnxt_set_rx_ring_params_p5(bp, ring_type, req,
+						   rxr, ring);
 		break;
 	case HWRM_RING_ALLOC_CMPL:
 		req->ring_type = RING_ALLOC_REQ_RING_TYPE_L2_CMPL;
@@ -7278,7 +7282,7 @@ static int bnxt_hwrm_rx_ring_alloc(struct bnxt *bp,
 	u32 map_idx = bnapi->index;
 	int rc;
 
-	rc = hwrm_ring_alloc_send_msg(bp, ring, type, map_idx);
+	rc = hwrm_ring_alloc_send_msg(bp, rxr, ring, type, map_idx);
 	if (rc)
 		return rc;
 
@@ -7298,7 +7302,7 @@ static int bnxt_hwrm_rx_agg_ring_alloc(struct bnxt *bp,
 	int rc;
 
 	map_idx = grp_idx + bp->rx_nr_rings;
-	rc = hwrm_ring_alloc_send_msg(bp, ring, type, map_idx);
+	rc = hwrm_ring_alloc_send_msg(bp, rxr, ring, type, map_idx);
 	if (rc)
 		return rc;
 
@@ -7322,7 +7326,7 @@ static int bnxt_hwrm_cp_ring_alloc_p5(struct bnxt *bp,
 
 	ring = &cpr->cp_ring_struct;
 	ring->handle = BNXT_SET_NQ_HDL(cpr);
-	rc = hwrm_ring_alloc_send_msg(bp, ring, type, map_idx);
+	rc = hwrm_ring_alloc_send_msg(bp, NULL, ring, type, map_idx);
 	if (rc)
 		return rc;
 	bnxt_set_db(bp, &cpr->cp_db, type, map_idx, ring->fw_ring_id);
@@ -7337,7 +7341,7 @@ static int bnxt_hwrm_tx_ring_alloc(struct bnxt *bp,
 	const u32 type = HWRM_RING_ALLOC_TX;
 	int rc;
 
-	rc = hwrm_ring_alloc_send_msg(bp, ring, type, tx_idx);
+	rc = hwrm_ring_alloc_send_msg(bp, NULL, ring, type, tx_idx);
 	if (rc)
 		return rc;
 	bnxt_set_db(bp, &txr->tx_db, type, tx_idx, ring->fw_ring_id);
@@ -7363,7 +7367,7 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
 
 		vector = bp->irq_tbl[map_idx].vector;
 		disable_irq_nosync(vector);
-		rc = hwrm_ring_alloc_send_msg(bp, ring, type, map_idx);
+		rc = hwrm_ring_alloc_send_msg(bp, NULL, ring, type, map_idx);
 		if (rc) {
 			enable_irq(vector);
 			goto err_out;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3abe59e9b021..c8931de76de3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1107,6 +1107,7 @@ struct bnxt_rx_ring_info {
 
 	unsigned long		*rx_agg_bmap;
 	u16			rx_agg_bmap_size;
+	u16			rx_page_size;
 	bool                    need_head_pool;
 
 	dma_addr_t		rx_desc_mapping[MAX_RX_PAGES];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index c23c04007136..619235b151a4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -183,7 +183,7 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 			u16 cons, u8 *data_ptr, unsigned int len,
 			struct xdp_buff *xdp)
 {
-	u32 buflen = bp->rx_page_size;
+	u32 buflen = rxr->rx_page_size;
 	struct bnxt_sw_rx_bd *rx_buf;
 	struct pci_dev *pdev;
 	dma_addr_t mapping;
@@ -461,7 +461,7 @@ int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 
 struct sk_buff *
 bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb, u8 num_frags,
-		   struct page_pool *pool, struct xdp_buff *xdp)
+		   struct bnxt_rx_ring_info *rxr, struct xdp_buff *xdp)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 
@@ -469,7 +469,7 @@ bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb, u8 num_frags,
 		return NULL;
 
 	xdp_update_skb_frags_info(skb, num_frags, sinfo->xdp_frags_size,
-				  bp->rx_page_size * num_frags,
+				  rxr->rx_page_size * num_frags,
 				  xdp_buff_get_skb_flags(xdp));
 	return skb;
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
index 220285e190fc..8933a0dec09a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
@@ -32,6 +32,6 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
 			      struct xdp_buff *xdp);
 struct sk_buff *bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb,
-				   u8 num_frags, struct page_pool *pool,
+				   u8 num_frags, struct bnxt_rx_ring_info *rxr,
 				   struct xdp_buff *xdp);
 #endif
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 18/24] eth: bnxt: adjust the fill level of agg queues with larger buffers
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (16 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 17/24] eth: bnxt: store the rx buf size per queue Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 19/24] netdev: add support for setting rx-buf-len per queue Pavel Begunkov
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
The driver tries to provision more agg buffers than header buffers
since multiple agg segments can reuse the same header. The calculation
/ heuristic tries to provide enough pages for 65k of data for each header
(or 4 frags per header if the result is too big). This calculation is
currently global to the adapter. If we increase the buffer sizes 8x
we don't want 8x the amount of memory sitting on the rings.
Luckily, we don't have to fill the rings completely; adjust
the fill level dynamically when a particular queue has buffers
larger than the global size.
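
The capping heuristic can be sketched as follows (illustrative Python
with made-up ring values; bnxt_rx_agg_ring_fill_level() in the diff is
the real implementation):

```python
def agg_ring_fill_level(rx_agg_ring_size, queue_page_size, global_page_size):
    """Sketch of bnxt_rx_agg_ring_fill_level(): when a queue uses
    buffers larger than the adapter-wide size, fill proportionally
    less of its agg ring so per-queue memory stays roughly uniform."""
    fill_level = rx_agg_ring_size
    if queue_page_size > global_page_size:
        fill_level //= queue_page_size // global_page_size
    return fill_level

# Illustrative values: a 2048-entry agg ring with 32K buffers on one
# queue while the adapter default is 4K -> fill only 1/8 of the ring.
print(agg_ring_fill_level(2048, 32768, 4096))  # 256
print(agg_ring_fill_level(2048, 4096, 4096))   # 2048
```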
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[pavel: rebase on top of agg_size_fac, assert agg_size_fac]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 28 +++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index e4dba91332ae..1741aeffee55 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3816,16 +3816,34 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 	}
 }
 
+static int bnxt_rx_agg_ring_fill_level(struct bnxt *bp,
+				       struct bnxt_rx_ring_info *rxr)
+{
+	/* User may have chosen larger than default rx_page_size,
+	 * we keep the ring sizes uniform and also want uniform amount
+	 * of bytes consumed per ring, so cap how much of the rings we fill.
+	 */
+	int fill_level = bp->rx_agg_ring_size;
+
+	if (rxr->rx_page_size > bp->rx_page_size)
+		fill_level /= rxr->rx_page_size / bp->rx_page_size;
+
+	return fill_level;
+}
+
 static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 				   struct bnxt_rx_ring_info *rxr,
 				   int numa_node)
 {
-	const unsigned int agg_size_fac = PAGE_SIZE / BNXT_RX_PAGE_SIZE;
+	unsigned int agg_size_fac = rxr->rx_page_size / BNXT_RX_PAGE_SIZE;
 	const unsigned int rx_size_fac = PAGE_SIZE / SZ_4K;
 	struct page_pool_params pp = { 0 };
 	struct page_pool *pool;
 
-	pp.pool_size = bp->rx_agg_ring_size / agg_size_fac;
+	if (WARN_ON_ONCE(agg_size_fac == 0))
+		agg_size_fac = 1;
+
+	pp.pool_size = bnxt_rx_agg_ring_fill_level(bp, rxr) / agg_size_fac;
 	if (BNXT_RX_PAGE_MODE(bp))
 		pp.pool_size += bp->rx_ring_size / rx_size_fac;
 
@@ -4403,11 +4421,13 @@ static void bnxt_alloc_one_rx_ring_netmem(struct bnxt *bp,
 					  struct bnxt_rx_ring_info *rxr,
 					  int ring_nr)
 {
+	int fill_level, i;
 	u32 prod;
-	int i;
+
+	fill_level = bnxt_rx_agg_ring_fill_level(bp, rxr);
 
 	prod = rxr->rx_agg_prod;
-	for (i = 0; i < bp->rx_agg_ring_size; i++) {
+	for (i = 0; i < fill_level; i++) {
 		if (bnxt_alloc_rx_netmem(bp, rxr, prod, GFP_KERNEL)) {
 			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d pages only\n",
 				    ring_nr, i, bp->rx_agg_ring_size);
-- 
2.49.0
* [PATCH net-next v4 19/24] netdev: add support for setting rx-buf-len per queue
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
From: Jakub Kicinski <kuba@kernel.org>
Zero-copy APIs increase the cost of buffer management. They also extend
this cost to user space applications which may be used to dealing with
much larger buffers. Allow setting rx-buf-len per queue; devices with
HW-GRO support can commonly fill buffers up to 32k (or rather 64k - 1,
but that's not a power of 2).
The implementation adds a new option to the netdev netlink, rather
than ethtool. The NIC-wide setting lives in ethtool ringparams so
one could argue that we should be extending the ethtool API.
OTOH netdev API is where we already have queue-get, and it's how
zero-copy applications bind memory providers.
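
The resulting precedence for a queue's rx-buf-len (driver default,
overridden by the NIC-wide ethtool setting, overridden by the per-queue
netlink setting) can be sketched as follows, in illustrative Python
mirroring the layering in __netdev_queue_config(), where zero/None
means "not set":

```python
def resolve_rx_buf_len(driver_default, dev_rx_buf_len=None,
                       queue_rx_buf_len=None):
    """Sketch of __netdev_queue_config()'s layering: start from the
    driver's default, apply the device-wide setting if set, then the
    per-queue setting if set."""
    val = driver_default
    if dev_rx_buf_len:
        val = dev_rx_buf_len
    if queue_rx_buf_len:
        val = queue_rx_buf_len
    return val

print(resolve_rx_buf_len(4096))               # 4096, driver default
print(resolve_rx_buf_len(4096, 8192))         # 8192, NIC-wide ethtool
print(resolve_rx_buf_len(4096, 8192, 32768))  # 32768, per-queue wins
```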
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 Documentation/netlink/specs/netdev.yaml | 15 ++++
 include/net/netdev_queues.h             |  5 ++
 include/net/netlink.h                   | 19 +++++
 include/uapi/linux/netdev.h             |  2 +
 net/core/netdev-genl-gen.c              | 15 ++++
 net/core/netdev-genl-gen.h              |  1 +
 net/core/netdev-genl.c                  | 92 +++++++++++++++++++++++++
 net/core/netdev_config.c                | 16 +++++
 tools/include/uapi/linux/netdev.h       |  2 +
 9 files changed, 167 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index e00d3fa1c152..fabae13f45e8 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -338,6 +338,10 @@ attribute-sets:
         doc: XSK information for this queue, if any.
         type: nest
         nested-attributes: xsk-info
+      -
+        name: rx-buf-len
+        doc: Per-queue configuration of ETHTOOL_A_RINGS_RX_BUF_LEN.
+        type: u32
   -
     name: qstats
     doc: |
@@ -771,6 +775,17 @@ operations:
         reply:
           attributes:
             - id
+    -
+      name: queue-set
+      doc: Set per-queue configurable options.
+      attribute-set: queue
+      do:
+        request:
+          attributes:
+            - ifindex
+            - type
+            - id
+            - rx-buf-len
 
 kernel-family:
   headers: ["net/netdev_netlink.h"]
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 532f60ee1a66..4b59ca9d5a4b 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -39,6 +39,7 @@ struct netdev_config {
 
 /* Same semantics as fields in struct netdev_config */
 struct netdev_queue_config {
+	u32	rx_buf_len;
 };
 
 /* See the netdev.yaml spec for definition of each statistic */
@@ -141,6 +142,8 @@ void netdev_stat_queue_sum(struct net_device *netdev,
 /**
  * struct netdev_queue_mgmt_ops - netdev ops for queue management
  *
+ * @supported_ring_params: ring params supported per queue (ETHTOOL_RING_USE_*).
+ *
  * @ndo_queue_mem_size: Size of the struct that describes a queue's memory.
  *
  * @ndo_queue_cfg_defaults: (Optional) Populate queue config struct with
@@ -174,6 +177,8 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  * be called for an interface which is open.
  */
 struct netdev_queue_mgmt_ops {
+	u32     supported_ring_params;
+
 	size_t	ndo_queue_mem_size;
 	void	(*ndo_queue_cfg_defaults)(struct net_device *dev,
 					  int idx,
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 1a8356ca4b78..29989ad81ddd 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -2200,6 +2200,25 @@ static inline struct nla_bitfield32 nla_get_bitfield32(const struct nlattr *nla)
 	return tmp;
 }
 
+/**
+ * nla_update_u32() - update u32 value from NLA_U32 attribute
+ * @dst:  value to update
+ * @attr: netlink attribute with new value or null
+ *
+ * Copy the u32 value from NLA_U32 netlink attribute @attr into variable
+ * pointed to by @dst; do nothing if @attr is null.
+ *
+ * Return: true if this function changed the value of @dst, otherwise false.
+ */
+static inline bool nla_update_u32(u32 *dst, const struct nlattr *attr)
+{
+	u32 old_val = *dst;
+
+	if (attr)
+		*dst = nla_get_u32(attr);
+	return *dst != old_val;
+}
+
 /**
  * nla_memdup - duplicate attribute memory (kmemdup)
  * @src: netlink attribute to duplicate from
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 48eb49aa03d4..820f89b67a72 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -158,6 +158,7 @@ enum {
 	NETDEV_A_QUEUE_DMABUF,
 	NETDEV_A_QUEUE_IO_URING,
 	NETDEV_A_QUEUE_XSK,
+	NETDEV_A_QUEUE_RX_BUF_LEN,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
@@ -226,6 +227,7 @@ enum {
 	NETDEV_CMD_BIND_RX,
 	NETDEV_CMD_NAPI_SET,
 	NETDEV_CMD_BIND_TX,
+	NETDEV_CMD_QUEUE_SET,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index e9a2a6f26cb7..d053306a3af8 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -106,6 +106,14 @@ static const struct nla_policy netdev_bind_tx_nl_policy[NETDEV_A_DMABUF_FD + 1]
 	[NETDEV_A_DMABUF_FD] = { .type = NLA_U32, },
 };
 
+/* NETDEV_CMD_QUEUE_SET - do */
+static const struct nla_policy netdev_queue_set_nl_policy[NETDEV_A_QUEUE_RX_BUF_LEN + 1] = {
+	[NETDEV_A_QUEUE_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
+	[NETDEV_A_QUEUE_TYPE] = NLA_POLICY_MAX(NLA_U32, 1),
+	[NETDEV_A_QUEUE_ID] = { .type = NLA_U32, },
+	[NETDEV_A_QUEUE_RX_BUF_LEN] = { .type = NLA_U32, },
+};
+
 /* Ops table for netdev */
 static const struct genl_split_ops netdev_nl_ops[] = {
 	{
@@ -204,6 +212,13 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.maxattr	= NETDEV_A_DMABUF_FD,
 		.flags		= GENL_CMD_CAP_DO,
 	},
+	{
+		.cmd		= NETDEV_CMD_QUEUE_SET,
+		.doit		= netdev_nl_queue_set_doit,
+		.policy		= netdev_queue_set_nl_policy,
+		.maxattr	= NETDEV_A_QUEUE_RX_BUF_LEN,
+		.flags		= GENL_CMD_CAP_DO,
+	},
 };
 
 static const struct genl_multicast_group netdev_nl_mcgrps[] = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index cf3fad74511f..b7f5e5d9fca9 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -35,6 +35,7 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
 int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_napi_set_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info);
+int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info);
 
 enum {
 	NETDEV_NLGRP_MGMT,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 470fabbeacd9..61529b600c87 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -386,6 +386,30 @@ static int nla_put_napi_id(struct sk_buff *skb, const struct napi_struct *napi)
 	return 0;
 }
 
+static int
+netdev_nl_queue_fill_cfg(struct sk_buff *rsp, struct net_device *netdev,
+			 u32 q_idx, u32 q_type)
+{
+	struct netdev_queue_config *qcfg;
+
+	if (!netdev_need_ops_lock(netdev))
+		return 0;
+
+	qcfg = &netdev->cfg->qcfg[q_idx];
+	switch (q_type) {
+	case NETDEV_QUEUE_TYPE_RX:
+		if (qcfg->rx_buf_len &&
+		    nla_put_u32(rsp, NETDEV_A_QUEUE_RX_BUF_LEN,
+				qcfg->rx_buf_len))
+			return -EMSGSIZE;
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
 static int
 netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 			 u32 q_idx, u32 q_type, const struct genl_info *info)
@@ -433,6 +457,9 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 		break;
 	}
 
+	if (netdev_nl_queue_fill_cfg(rsp, netdev, q_idx, q_type))
+		goto nla_put_failure;
+
 	genlmsg_end(rsp, hdr);
 
 	return 0;
@@ -572,6 +599,71 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	return err;
 }
 
+int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr * const *tb = info->attrs;
+	struct netdev_queue_config *qcfg;
+	u32 q_id, q_type, ifindex;
+	struct net_device *netdev;
+	bool mod;
+	int ret;
+
+	if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_ID) ||
+	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_TYPE) ||
+	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_IFINDEX))
+		return -EINVAL;
+
+	q_id = nla_get_u32(tb[NETDEV_A_QUEUE_ID]);
+	q_type = nla_get_u32(tb[NETDEV_A_QUEUE_TYPE]);
+	ifindex = nla_get_u32(tb[NETDEV_A_QUEUE_IFINDEX]);
+
+	if (q_type != NETDEV_QUEUE_TYPE_RX) {
+		/* Only Rx params exist right now */
+		NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_TYPE]);
+		return -EINVAL;
+	}
+
+	ret = 0;
+	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
+	if (!netdev || !netif_device_present(netdev))
+		ret = -ENODEV;
+	else if (!netdev->queue_mgmt_ops)
+		ret = -EOPNOTSUPP;
+	if (ret) {
+		NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_IFINDEX]);
+		goto exit_unlock;
+	}
+
+	ret = netdev_nl_queue_validate(netdev, q_id, q_type);
+	if (ret) {
+		NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_ID]);
+		goto exit_unlock;
+	}
+
+	ret = netdev_reconfig_start(netdev);
+	if (ret)
+		goto exit_unlock;
+
+	qcfg = &netdev->cfg_pending->qcfg[q_id];
+	mod = nla_update_u32(&qcfg->rx_buf_len, tb[NETDEV_A_QUEUE_RX_BUF_LEN]);
+	if (!mod)
+		goto exit_free_cfg;
+
+	ret = netdev_rx_queue_restart(netdev, q_id, info->extack);
+	if (ret)
+		goto exit_free_cfg;
+
+	swap(netdev->cfg, netdev->cfg_pending);
+
+exit_free_cfg:
+	__netdev_free_config(netdev->cfg_pending);
+	netdev->cfg_pending = netdev->cfg;
+exit_unlock:
+	if (netdev)
+		netdev_unlock(netdev);
+	return ret;
+}
+
 #define NETDEV_STAT_NOT_SET		(~0ULL)
 
 static void netdev_nl_stats_add(void *_sum, const void *_add, size_t size)
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
index fc700b77e4eb..ede02b77470e 100644
--- a/net/core/netdev_config.c
+++ b/net/core/netdev_config.c
@@ -67,11 +67,27 @@ int netdev_reconfig_start(struct net_device *dev)
 void __netdev_queue_config(struct net_device *dev, int rxq,
 			   struct netdev_queue_config *qcfg, bool pending)
 {
+	const struct netdev_config *cfg;
+
+	cfg = pending ? dev->cfg_pending : dev->cfg;
+
 	memset(qcfg, 0, sizeof(*qcfg));
 
 	/* Get defaults from the driver, in case user config not set */
 	if (dev->queue_mgmt_ops->ndo_queue_cfg_defaults)
 		dev->queue_mgmt_ops->ndo_queue_cfg_defaults(dev, rxq, qcfg);
+
+	/* Set config based on device-level settings */
+	if (cfg->rx_buf_len)
+		qcfg->rx_buf_len = cfg->rx_buf_len;
+
+	/* Set config dedicated to this queue */
+	if (rxq >= 0) {
+		const struct netdev_queue_config *user_cfg = &cfg->qcfg[rxq];
+
+		if (user_cfg->rx_buf_len)
+			qcfg->rx_buf_len = user_cfg->rx_buf_len;
+	}
 }
 
 /**
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 48eb49aa03d4..820f89b67a72 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -158,6 +158,7 @@ enum {
 	NETDEV_A_QUEUE_DMABUF,
 	NETDEV_A_QUEUE_IO_URING,
 	NETDEV_A_QUEUE_XSK,
+	NETDEV_A_QUEUE_RX_BUF_LEN,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
@@ -226,6 +227,7 @@ enum {
 	NETDEV_CMD_BIND_RX,
 	NETDEV_CMD_NAPI_SET,
 	NETDEV_CMD_BIND_TX,
+	NETDEV_CMD_QUEUE_SET,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
-- 
2.49.0
* [PATCH net-next v4 20/24] net: wipe the settings of deactivated queues
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
From: Jakub Kicinski <kuba@kernel.org>
Clear out all settings of deactivated queues when the user changes
the number of channels. We already perform similar cleanup
for shapers.
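
The cleanup amounts to zeroing the per-queue config slots past the new
queue count, roughly as in this illustrative Python sketch of
netdev_queue_config_update_cnt():

```python
def queue_config_update_cnt(qcfg, new_rxq):
    """Sketch of netdev_queue_config_update_cnt(): wipe the settings
    of queues deactivated by shrinking the real rx queue count, so
    stale values don't reappear if the count is later increased."""
    for i in range(new_rxq, len(qcfg)):
        qcfg[i].clear()

cfg = [{"rx_buf_len": 32768}, {"rx_buf_len": 8192}, {"rx_buf_len": 4096}]
queue_config_update_cnt(cfg, 1)  # user drops to a single rx queue
print(cfg)  # [{'rx_buf_len': 32768}, {}, {}]
```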
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/dev.c           |  5 +++++
 net/core/dev.h           |  2 ++
 net/core/netdev_config.c | 13 +++++++++++++
 3 files changed, 20 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 5f92425dfdbd..b253e7e29ffa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3188,6 +3188,8 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
 		if (dev->num_tc)
 			netif_setup_tc(dev, txq);
 
+		netdev_queue_config_update_cnt(dev, txq,
+					       dev->real_num_rx_queues);
 		net_shaper_set_real_num_tx_queues(dev, txq);
 
 		dev_qdisc_change_real_num_tx(dev, txq);
@@ -3233,6 +3235,9 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
 						  rxq);
 		if (rc)
 			return rc;
+
+		netdev_queue_config_update_cnt(dev, dev->real_num_tx_queues,
+					       rxq);
 	}
 
 	dev->real_num_rx_queues = rxq;
diff --git a/net/core/dev.h b/net/core/dev.h
index a203b63198e7..63192dbb1895 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -101,6 +101,8 @@ void __netdev_queue_config(struct net_device *dev, int rxq,
 			   struct netdev_queue_config *qcfg, bool pending);
 int netdev_queue_config_revalidate(struct net_device *dev,
 				   struct netlink_ext_ack *extack);
+void netdev_queue_config_update_cnt(struct net_device *dev, unsigned int txq,
+				    unsigned int rxq);
 
 /* netdev management, shared between various uAPI entry points */
 struct netdev_name_node {
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
index ede02b77470e..c5ae39e76f40 100644
--- a/net/core/netdev_config.c
+++ b/net/core/netdev_config.c
@@ -64,6 +64,19 @@ int netdev_reconfig_start(struct net_device *dev)
 	return -ENOMEM;
 }
 
+void netdev_queue_config_update_cnt(struct net_device *dev, unsigned int txq,
+				    unsigned int rxq)
+{
+	size_t len;
+
+	if (rxq < dev->real_num_rx_queues) {
+		len = (dev->real_num_rx_queues - rxq) * sizeof(*dev->cfg->qcfg);
+
+		memset(&dev->cfg->qcfg[rxq], 0, len);
+		memset(&dev->cfg_pending->qcfg[rxq], 0, len);
+	}
+}
+
 void __netdev_queue_config(struct net_device *dev, int rxq,
 			   struct netdev_queue_config *qcfg, bool pending)
 {
-- 
2.49.0
* [PATCH net-next v4 21/24] eth: bnxt: use queue op config validate
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
From: Jakub Kicinski <kuba@kernel.org>
Move the rx-buf-len config validation to the queue ops.
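
The moved checks boil down to a power-of-two test plus a range test; a
sketch in illustrative Python, with 4K and 32K standing in for
BNXT_RX_PAGE_SIZE and BNXT_MAX_RX_PAGE_SIZE:

```python
def validate_rx_buf_len(val, min_len=4096, max_len=32768):
    """Sketch of bnxt_queue_cfg_validate()'s checks on rx-buf-len."""
    if val <= 0 or val & (val - 1):  # mirrors is_power_of_2()
        raise ValueError("rx-buf-len is not power of 2")
    if not min_len <= val <= max_len:
        raise ValueError("rx-buf-len out of range")

validate_rx_buf_len(8192)  # a valid power-of-two in range: no error
for bad in (6000, 2048, 65536):
    try:
        validate_rx_buf_len(bad)
    except ValueError as e:
        print(bad, "->", e)
```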
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 40 +++++++++++++++++++
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ------
 2 files changed, 40 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1741aeffee55..ea95a06ae62b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -16203,8 +16203,46 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	return 0;
 }
 
+static int
+bnxt_queue_cfg_validate(struct net_device *dev, int idx,
+			struct netdev_queue_config *qcfg,
+			struct netlink_ext_ack *extack)
+{
+	struct bnxt *bp = netdev_priv(dev);
+
+	/* Older chips need MSS calc so rx_buf_len is not supported,
+	 * but we don't set queue ops for them so we should never get here.
+	 */
+	if (qcfg->rx_buf_len != bp->rx_page_size &&
+	    !(bp->flags & BNXT_FLAG_CHIP_P5_PLUS)) {
+		NL_SET_ERR_MSG_MOD(extack, "changing rx-buf-len not supported");
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(qcfg->rx_buf_len)) {
+		NL_SET_ERR_MSG_MOD(extack, "rx-buf-len is not power of 2");
+		return -ERANGE;
+	}
+	if (qcfg->rx_buf_len < BNXT_RX_PAGE_SIZE ||
+	    qcfg->rx_buf_len > BNXT_MAX_RX_PAGE_SIZE) {
+		NL_SET_ERR_MSG_MOD(extack, "rx-buf-len out of range");
+		return -ERANGE;
+	}
+	return 0;
+}
+
+static void
+bnxt_queue_cfg_defaults(struct net_device *dev, int idx,
+			struct netdev_queue_config *qcfg)
+{
+	qcfg->rx_buf_len	= BNXT_RX_PAGE_SIZE;
+}
+
 static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
 	.ndo_queue_mem_size	= sizeof(struct bnxt_rx_ring_info),
+
+	.ndo_queue_cfg_defaults	= bnxt_queue_cfg_defaults,
+	.ndo_queue_cfg_validate = bnxt_queue_cfg_validate,
 	.ndo_queue_mem_alloc	= bnxt_queue_mem_alloc,
 	.ndo_queue_mem_free	= bnxt_queue_mem_free,
 	.ndo_queue_start	= bnxt_queue_start,
@@ -16212,6 +16250,8 @@ static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
 };
 
 static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops_unsupp = {
+	.ndo_queue_cfg_defaults	= bnxt_queue_cfg_defaults,
+	.ndo_queue_cfg_validate = bnxt_queue_cfg_validate,
 };
 
 static void bnxt_remove_one(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 7b5b9781262d..07bdf37421ce 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -867,18 +867,6 @@ static int bnxt_set_ringparam(struct net_device *dev,
 	if (!kernel_ering->rx_buf_len)	/* Zero means restore default */
 		kernel_ering->rx_buf_len = BNXT_RX_PAGE_SIZE;
 
-	if (kernel_ering->rx_buf_len != bp->rx_page_size &&
-	    !(bp->flags & BNXT_FLAG_CHIP_P5_PLUS)) {
-		NL_SET_ERR_MSG_MOD(extack, "changing rx-buf-len not supported");
-		return -EINVAL;
-	}
-	if (!is_power_of_2(kernel_ering->rx_buf_len) ||
-	    kernel_ering->rx_buf_len < BNXT_RX_PAGE_SIZE ||
-	    kernel_ering->rx_buf_len > BNXT_MAX_RX_PAGE_SIZE) {
-		NL_SET_ERR_MSG_MOD(extack, "rx-buf-len out of range, or not power of 2");
-		return -ERANGE;
-	}
-
 	if (netif_running(dev))
 		bnxt_close_nic(bp, false, false);
 
-- 
2.49.0
^ permalink raw reply related	[flat|nested] 37+ messages in thread
* [PATCH net-next v4 22/24] eth: bnxt: support per queue configuration of rx-buf-len
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (20 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 21/24] eth: bnxt: use queue op config validate Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 23/24] net: let pp memory provider to specify rx buf len Pavel Begunkov
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
From: Jakub Kicinski <kuba@kernel.org>
Now that rx_buf_len is stored and validated per queue, allow it
to be set differently for different queues. Instead of copying
the device-level setting for each queue, ask the core for the config
via netdev_queue_config().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ea95a06ae62b..a734b18e47c4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4320,6 +4320,7 @@ static void bnxt_init_ring_struct(struct bnxt *bp)
 
 	for (i = 0; i < bp->cp_nr_rings; i++) {
 		struct bnxt_napi *bnapi = bp->bnapi[i];
+		struct netdev_queue_config qcfg;
 		struct bnxt_ring_mem_info *rmem;
 		struct bnxt_cp_ring_info *cpr;
 		struct bnxt_rx_ring_info *rxr;
@@ -4342,7 +4343,8 @@ static void bnxt_init_ring_struct(struct bnxt *bp)
 		if (!rxr)
 			goto skip_rx;
 
-		rxr->rx_page_size = bp->rx_page_size;
+		netdev_queue_config(bp->dev, i,	&qcfg);
+		rxr->rx_page_size = qcfg.rx_buf_len;
 
 		ring = &rxr->rx_ring_struct;
 		rmem = &ring->ring_mem;
@@ -15928,6 +15930,7 @@ static int bnxt_queue_mem_alloc(struct net_device *dev,
 	clone->rx_agg_prod = 0;
 	clone->rx_sw_agg_prod = 0;
 	clone->rx_next_cons = 0;
+	clone->rx_page_size = qcfg->rx_buf_len;
 	clone->need_head_pool = false;
 
 	rc = bnxt_alloc_rx_page_pool(bp, clone, rxr->page_pool->p.nid);
@@ -16032,6 +16035,8 @@ static void bnxt_copy_rx_ring(struct bnxt *bp,
 	src_ring = &src->rx_ring_struct;
 	src_rmem = &src_ring->ring_mem;
 
+	dst->rx_page_size = src->rx_page_size;
+
 	WARN_ON(dst_rmem->nr_pages != src_rmem->nr_pages);
 	WARN_ON(dst_rmem->page_size != src_rmem->page_size);
 	WARN_ON(dst_rmem->flags != src_rmem->flags);
@@ -16239,6 +16244,7 @@ bnxt_queue_cfg_defaults(struct net_device *dev, int idx,
 }
 
 static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
+	.supported_ring_params	= ETHTOOL_RING_USE_RX_BUF_LEN,
 	.ndo_queue_mem_size	= sizeof(struct bnxt_rx_ring_info),
 
 	.ndo_queue_cfg_defaults	= bnxt_queue_cfg_defaults,
-- 
2.49.0
* [PATCH net-next v4 23/24] net: let pp memory provider to specify rx buf len
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (21 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 22/24] eth: bnxt: support per queue configuration of rx-buf-len Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 14:54 ` [PATCH net-next v4 24/24] net: validate driver supports passed qcfg params Pavel Begunkov
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
Allow memory providers to configure rx queues with a specific receive
buffer length. Pass it in struct pp_memory_provider_params, which is
copied into the queue, and make __netdev_queue_config() check whether
it's present and apply it to the configuration. This way the configured
length persists across queue restarts, and is automatically removed
once the memory provider is detached.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/page_pool/types.h |  1 +
 net/core/netdev_config.c      | 15 +++++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1509a536cb85..be74e4aec7b5 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -161,6 +161,7 @@ struct memory_provider_ops;
 struct pp_memory_provider_params {
 	void *mp_priv;
 	const struct memory_provider_ops *mp_ops;
+	u32 rx_buf_len;
 };
 
 struct page_pool {
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
index c5ae39e76f40..2c9b06f94e01 100644
--- a/net/core/netdev_config.c
+++ b/net/core/netdev_config.c
@@ -2,6 +2,7 @@
 
 #include <linux/netdevice.h>
 #include <net/netdev_queues.h>
+#include <net/netdev_rx_queue.h>
 
 #include "dev.h"
 
@@ -77,7 +78,7 @@ void netdev_queue_config_update_cnt(struct net_device *dev, unsigned int txq,
 	}
 }
 
-void __netdev_queue_config(struct net_device *dev, int rxq,
+void __netdev_queue_config(struct net_device *dev, int rxq_idx,
 			   struct netdev_queue_config *qcfg, bool pending)
 {
 	const struct netdev_config *cfg;
@@ -88,18 +89,24 @@ void __netdev_queue_config(struct net_device *dev, int rxq,
 
 	/* Get defaults from the driver, in case user config not set */
 	if (dev->queue_mgmt_ops->ndo_queue_cfg_defaults)
-		dev->queue_mgmt_ops->ndo_queue_cfg_defaults(dev, rxq, qcfg);
+		dev->queue_mgmt_ops->ndo_queue_cfg_defaults(dev, rxq_idx, qcfg);
 
 	/* Set config based on device-level settings */
 	if (cfg->rx_buf_len)
 		qcfg->rx_buf_len = cfg->rx_buf_len;
 
 	/* Set config dedicated to this queue */
-	if (rxq >= 0) {
-		const struct netdev_queue_config *user_cfg = &cfg->qcfg[rxq];
+	if (rxq_idx >= 0) {
+		const struct netdev_queue_config *user_cfg;
+		struct netdev_rx_queue *rxq;
 
+		user_cfg = &cfg->qcfg[rxq_idx];
 		if (user_cfg->rx_buf_len)
 			qcfg->rx_buf_len = user_cfg->rx_buf_len;
+
+		rxq = __netif_get_rx_queue(dev, rxq_idx);
+		if (rxq->mp_params.mp_ops && rxq->mp_params.rx_buf_len)
+			qcfg->rx_buf_len = rxq->mp_params.rx_buf_len;
 	}
 }
 
-- 
2.49.0
* [PATCH net-next v4 24/24] net: validate driver supports passed qcfg params
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (22 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 23/24] net: let pp memory provider to specify rx buf len Pavel Begunkov
@ 2025-10-13 14:54 ` Pavel Begunkov
  2025-10-13 15:03 ` [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
  2025-10-13 17:54 ` Jakub Kicinski
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 14:54 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Pavel Begunkov, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
When we pass a qcfg to a driver, make sure it supports the set
parameters by checking it against ->supported_ring_params.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/dev.h             |  3 +++
 net/core/netdev_config.c   | 26 ++++++++++++++++++++++++++
 net/core/netdev_rx_queue.c |  8 +++-----
 3 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.h b/net/core/dev.h
index 63192dbb1895..96eae1c51328 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -103,6 +103,9 @@ int netdev_queue_config_revalidate(struct net_device *dev,
 				   struct netlink_ext_ack *extack);
 void netdev_queue_config_update_cnt(struct net_device *dev, unsigned int txq,
 				    unsigned int rxq);
+int netdev_queue_config_validate(struct net_device *dev, int rxq_idx,
+				 struct netdev_queue_config *qcfg,
+				 struct netlink_ext_ack *extack);
 
 /* netdev management, shared between various uAPI entry points */
 struct netdev_name_node {
diff --git a/net/core/netdev_config.c b/net/core/netdev_config.c
index 2c9b06f94e01..99e64d942d44 100644
--- a/net/core/netdev_config.c
+++ b/net/core/netdev_config.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include <linux/netdevice.h>
+#include <linux/ethtool.h>
 #include <net/netdev_queues.h>
 #include <net/netdev_rx_queue.h>
 
@@ -136,6 +137,31 @@ void netdev_queue_config(struct net_device *dev, int rxq,
 }
 EXPORT_SYMBOL(netdev_queue_config);
 
+int netdev_queue_config_validate(struct net_device *dev, int rxq_idx,
+				 struct netdev_queue_config *qcfg,
+				 struct netlink_ext_ack *extack)
+{
+	const struct netdev_queue_mgmt_ops *qops = dev->queue_mgmt_ops;
+	int err;
+
+	if (WARN_ON_ONCE(!qops))
+		return -EINVAL;
+
+	if (!(qops->supported_ring_params & ETHTOOL_RING_USE_RX_BUF_LEN) &&
+	    qcfg->rx_buf_len &&
+	    qcfg->rx_buf_len != dev->cfg_pending->rx_buf_len) {
+		NL_SET_ERR_MSG_MOD(extack, "changing rx-buf-len not supported");
+		return -EINVAL;
+	}
+
+	if (qops->ndo_queue_cfg_validate) {
+		err = qops->ndo_queue_cfg_validate(dev, rxq_idx, qcfg, extack);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
 int netdev_queue_config_revalidate(struct net_device *dev,
 				   struct netlink_ext_ack *extack)
 {
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index 5ae375a072a1..a157964cf60d 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -46,11 +46,9 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx,
 
 	netdev_queue_config(dev, rxq_idx, &qcfg);
 
-	if (qops->ndo_queue_cfg_validate) {
-		err = qops->ndo_queue_cfg_validate(dev, rxq_idx, &qcfg, extack);
-		if (err)
-			goto err_free_old_mem;
-	}
+	err = netdev_queue_config_validate(dev, rxq_idx, &qcfg, extack);
+	if (err)
+		goto err_free_old_mem;
 
 	err = qops->ndo_queue_mem_alloc(dev, &qcfg, new_mem, rxq_idx);
 	if (err)
-- 
2.49.0
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (23 preceding siblings ...)
  2025-10-13 14:54 ` [PATCH net-next v4 24/24] net: validate driver supports passed qcfg params Pavel Begunkov
@ 2025-10-13 15:03 ` Pavel Begunkov
  2025-10-13 17:54 ` Jakub Kicinski
  25 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-13 15:03 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Breno Leitao, Dragos Tatulea, linux-kernel,
	linux-doc, linux-rdma, Jonathan Corbet, io-uring
On 10/13/25 15:54, Pavel Begunkov wrote:
Forgot to CC io_uring
> Add support for per-queue rx buffer length configuration based on [2]
> and basic infrastructure for using it in memory providers like
> io_uring/zcrx. Note, it only includes net/ patches and leaves out
> zcrx to be merged separately. Large rx buffers can be beneficial with
> hw-gro enabled cards that can coalesce traffic, which reduces the
> number of frags traversing the network stack, resulting in larger
> contiguous chunks of data given to the userspace.
Same note as last time: it's not great that this is over the 15 patch
limit, but I don't see a good way to shrink it considering that the
original series [2] is 22 patches long, and I'll need to pull it
into the io_uring tree afterwards. Please let me know if there is a
strong feeling about that, and/or what the preferred way would be.
-- 
Pavel Begunkov
* Re: [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members
  2025-10-13 14:54 ` [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members Pavel Begunkov
@ 2025-10-13 17:12   ` Randy Dunlap
  2025-10-14 12:53     ` Pavel Begunkov
  0 siblings, 1 reply; 37+ messages in thread
From: Randy Dunlap @ 2025-10-13 17:12 UTC (permalink / raw)
  To: Pavel Begunkov, netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Breno Leitao, Dragos Tatulea, linux-kernel,
	linux-doc, linux-rdma, Jonathan Corbet
Hi,
On 10/13/25 7:54 AM, Pavel Begunkov wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> 
> hds_thresh and hds_config are both inside struct netdev_config
> but have quite different semantics. hds_config is the user config
> with ternary semantics (on/off/unset). hds_thresh is a straight
> up value, populated by the driver at init and only modified by
> user space. We don't expect the drivers to have to pick a special
> hds_thresh value based on other configuration.
> 
> The two approaches have different advantages and downsides.
> hds_thresh ("direct value") gives core easy access to current
> device settings, but there's no way to express whether the value
> comes from the user. It also requires the initialization by
> the driver.
> 
> hds_config ("user config values") tells us what user wanted, but
> doesn't give us the current value in the core.
> 
> Try to explain this a bit in the comments, so that we make a conscious
> choice for new values which semantics we expect.
> 
> Move the init inside ethtool_ringparam_get_cfg() to reflect the semantics.
> Commit 216a61d33c07 ("net: ethtool: fix ethtool_ringparam_get_cfg()
> returns a hds_thresh value always as 0.") added the setting for the
> benefit of netdevsim which doesn't touch the value at all on get.
> Again, this is just to clarify the intention, shouldn't cause any
> functional change.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> [pavel: applied clarification on relationship b/w HDS thresh and config]
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/net/netdev_queues.h | 20 ++++++++++++++++++--
>  net/ethtool/common.c        |  3 ++-
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index cd00e0406cf4..9d5dde36c2e5 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -6,11 +6,27 @@
>  
>  /**
>   * struct netdev_config - queue-related configuration for a netdev
> - * @hds_thresh:		HDS Threshold value.
> - * @hds_config:		HDS value from userspace.
>   */
>  struct netdev_config {
> +	/* Direct value
> +	 *
> +	 * Driver default is expected to be fixed, and set in this struct
> +	 * at init. From that point on user may change the value. There is
> +	 * no explicit way to "unset" / restore driver default. Used only
> +	 * when @hds_config is set.
> +	 */
> +	/** @hds_thresh: HDS Threshold value (ETHTOOL_A_RINGS_HDS_THRESH).
> +	 */
>  	u32	hds_thresh;
> +
> +	/* User config values
> +	 *
> +	 * Contain user configuration. If "set" driver must obey.
> +	 * If "unset" driver is free to decide, and may change its choice
> +	 * as other parameters change.
> +	 */
> +	/** @hds_config: HDS enabled (ETHTOOL_A_RINGS_TCP_DATA_SPLIT).
> +	 */
>  	u8	hds_config;
>  };
kernel-doc comments should begin with
/**
on a separate line. This will prevent warnings like these new ones:
Warning: include/net/netdev_queues.h:36 struct member 'hds_thresh' not described in 'netdev_config'
Warning: include/net/netdev_queues.h:36 struct member 'hds_config' not described in 'netdev_config'
Warning: include/net/netdev_queues.h:36 struct member 'rx_buf_len' not described in 'netdev_config'
(but there are 4 variables that this applies to. I don't know why kernel-doc.py
only reported 3 of them.)
-- 
~Randy
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
                   ` (24 preceding siblings ...)
  2025-10-13 15:03 ` [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
@ 2025-10-13 17:54 ` Jakub Kicinski
  2025-10-14  4:41   ` Mina Almasry
  2025-10-14 12:46   ` Pavel Begunkov
  25 siblings, 2 replies; 37+ messages in thread
From: Jakub Kicinski @ 2025-10-13 17:54 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: netdev, Andrew Lunn, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Breno Leitao, Dragos Tatulea, linux-kernel,
	linux-doc, linux-rdma, Jonathan Corbet
On Mon, 13 Oct 2025 15:54:02 +0100 Pavel Begunkov wrote:
> Jakub Kicinski (20):
>   docs: ethtool: document that rx_buf_len must control payload lengths
>   net: ethtool: report max value for rx-buf-len
>   net: use zero value to restore rx_buf_len to default
>   net: clarify the meaning of netdev_config members
>   net: add rx_buf_len to netdev config
>   eth: bnxt: read the page size from the adapter struct
>   eth: bnxt: set page pool page order based on rx_page_size
>   eth: bnxt: support setting size of agg buffers via ethtool
>   net: move netdev_config manipulation to dedicated helpers
>   net: reduce indent of struct netdev_queue_mgmt_ops members
>   net: allocate per-queue config structs and pass them thru the queue
>     API
>   net: pass extack to netdev_rx_queue_restart()
>   net: add queue config validation callback
>   eth: bnxt: always set the queue mgmt ops
>   eth: bnxt: store the rx buf size per queue
>   eth: bnxt: adjust the fill level of agg queues with larger buffers
>   netdev: add support for setting rx-buf-len per queue
>   net: wipe the setting of deactived queues
>   eth: bnxt: use queue op config validate
>   eth: bnxt: support per queue configuration of rx-buf-len
I'd like to rework these a little bit.
On reflection I don't like the single size control.
Please hold off.
Also what's the resolution for the maintainers entry / cross posting?
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-13 17:54 ` Jakub Kicinski
@ 2025-10-14  4:41   ` Mina Almasry
  2025-10-14 12:50     ` Pavel Begunkov
  2025-10-15  1:41     ` Jakub Kicinski
  2025-10-14 12:46   ` Pavel Begunkov
  1 sibling, 2 replies; 37+ messages in thread
From: Mina Almasry @ 2025-10-14  4:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pavel Begunkov, netdev, Andrew Lunn, davem, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Michael Chan,
	Pavan Chebbi, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Joshua Washington, Harshitha Ramamurthy,
	Jian Shen, Salil Mehta, Jijie Shao, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Bharat Bhushan,
	Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Alexander Duyck, kernel-team, Ilias Apalodimas, Joe Damato,
	David Wei, Willem de Bruijn, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
On Mon, Oct 13, 2025 at 10:54 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 13 Oct 2025 15:54:02 +0100 Pavel Begunkov wrote:
> > Jakub Kicinski (20):
> >   docs: ethtool: document that rx_buf_len must control payload lengths
> >   net: ethtool: report max value for rx-buf-len
> >   net: use zero value to restore rx_buf_len to default
> >   net: clarify the meaning of netdev_config members
> >   net: add rx_buf_len to netdev config
> >   eth: bnxt: read the page size from the adapter struct
> >   eth: bnxt: set page pool page order based on rx_page_size
> >   eth: bnxt: support setting size of agg buffers via ethtool
> >   net: move netdev_config manipulation to dedicated helpers
> >   net: reduce indent of struct netdev_queue_mgmt_ops members
> >   net: allocate per-queue config structs and pass them thru the queue
> >     API
> >   net: pass extack to netdev_rx_queue_restart()
> >   net: add queue config validation callback
> >   eth: bnxt: always set the queue mgmt ops
> >   eth: bnxt: store the rx buf size per queue
> >   eth: bnxt: adjust the fill level of agg queues with larger buffers
> >   netdev: add support for setting rx-buf-len per queue
> >   net: wipe the setting of deactived queues
> >   eth: bnxt: use queue op config validate
> >   eth: bnxt: support per queue configuration of rx-buf-len
>
> I'd like to rework these a little bit.
> On reflection I don't like the single size control.
> Please hold off.
>
FWIW when I last looked at this I didn't like that the size control
seemed to control the size of the allocations made from the pp, but
not the size actually posted to the NIC.
I.e. in the scenario where the driver fragments each pp buffer into 2,
and the user asks for 8K rx-buf-len, the size actually posted to the
NIC would be 4K (8K / 2 for 2 fragments).
Not sure how much of a concern this really is. I thought it would be
great if somehow rx-buf-len controlled the buffer sizes actually
posted to the NIC, because that's what ultimately matters, no? (It ends
up being the size of the incoming frags.) Or does that not matter for
some reason I'm missing?
-- 
Thanks,
Mina
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-13 17:54 ` Jakub Kicinski
  2025-10-14  4:41   ` Mina Almasry
@ 2025-10-14 12:46   ` Pavel Begunkov
  1 sibling, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-14 12:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Andrew Lunn, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Breno Leitao, Dragos Tatulea, linux-kernel,
	linux-doc, linux-rdma, Jonathan Corbet, io-uring
On 10/13/25 18:54, Jakub Kicinski wrote:
> On Mon, 13 Oct 2025 15:54:02 +0100 Pavel Begunkov wrote:
>> Jakub Kicinski (20):
>>    docs: ethtool: document that rx_buf_len must control payload lengths
>>    net: ethtool: report max value for rx-buf-len
>>    net: use zero value to restore rx_buf_len to default
>>    net: clarify the meaning of netdev_config members
>>    net: add rx_buf_len to netdev config
>>    eth: bnxt: read the page size from the adapter struct
>>    eth: bnxt: set page pool page order based on rx_page_size
>>    eth: bnxt: support setting size of agg buffers via ethtool
>>    net: move netdev_config manipulation to dedicated helpers
>>    net: reduce indent of struct netdev_queue_mgmt_ops members
>>    net: allocate per-queue config structs and pass them thru the queue
>>      API
>>    net: pass extack to netdev_rx_queue_restart()
>>    net: add queue config validation callback
>>    eth: bnxt: always set the queue mgmt ops
>>    eth: bnxt: store the rx buf size per queue
>>    eth: bnxt: adjust the fill level of agg queues with larger buffers
>>    netdev: add support for setting rx-buf-len per queue
>>    net: wipe the setting of deactived queues
>>    eth: bnxt: use queue op config validate
>>    eth: bnxt: support per queue configuration of rx-buf-len
> 
> I'd like to rework these a little bit.
> On reflection I don't like the single size control.
> Please hold off.
I think that would be quite unproductive considering that this series
has been around for 3 months already with no forward progress, and the
API was posted 6 months ago. I have a better idea: I'll shrink it down
by removing all unnecessary parts, which makes it much simpler and
should disentangle the effort from the ethtool bits, as Stan once
suggested. I've also been bothered for some time by it growing to 24
patches, so it'll help with that as well. And it'll be a good base to
put all the netlink configuration bits on top if necessary.
> Also what's the resolution for the maintainers entry / cross posting?
I'm pretty much interested as well :) I've been CC'ing netdev as a
gesture of goodwill, despite you blocking an unrelated series because
of a rule you made up and retrospectively applied, and belittling my
work afterwards. It doesn't seem that you're content with it either,
evidently from you blocking it again. I'm very curious what that's all
about. And since you're unwilling to deal with the series, maybe you'll
let other maintainers handle it?
-- 
Pavel Begunkov
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-14  4:41   ` Mina Almasry
@ 2025-10-14 12:50     ` Pavel Begunkov
  2025-10-15  1:41     ` Jakub Kicinski
  1 sibling, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-14 12:50 UTC (permalink / raw)
  To: Mina Almasry, Jakub Kicinski
  Cc: netdev, Andrew Lunn, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Breno Leitao, Dragos Tatulea, linux-kernel, linux-doc, linux-rdma,
	Jonathan Corbet
On 10/14/25 05:41, Mina Almasry wrote:
> On Mon, Oct 13, 2025 at 10:54 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Mon, 13 Oct 2025 15:54:02 +0100 Pavel Begunkov wrote:
>>> Jakub Kicinski (20):
>>>    docs: ethtool: document that rx_buf_len must control payload lengths
>>>    net: ethtool: report max value for rx-buf-len
>>>    net: use zero value to restore rx_buf_len to default
>>>    net: clarify the meaning of netdev_config members
>>>    net: add rx_buf_len to netdev config
>>>    eth: bnxt: read the page size from the adapter struct
>>>    eth: bnxt: set page pool page order based on rx_page_size
>>>    eth: bnxt: support setting size of agg buffers via ethtool
>>>    net: move netdev_config manipulation to dedicated helpers
>>>    net: reduce indent of struct netdev_queue_mgmt_ops members
>>>    net: allocate per-queue config structs and pass them thru the queue
>>>      API
>>>    net: pass extack to netdev_rx_queue_restart()
>>>    net: add queue config validation callback
>>>    eth: bnxt: always set the queue mgmt ops
>>>    eth: bnxt: store the rx buf size per queue
>>>    eth: bnxt: adjust the fill level of agg queues with larger buffers
>>>    netdev: add support for setting rx-buf-len per queue
>>>    net: wipe the setting of deactived queues
>>>    eth: bnxt: use queue op config validate
>>>    eth: bnxt: support per queue configuration of rx-buf-len
>>
>> I'd like to rework these a little bit.
>> On reflection I don't like the single size control.
>> Please hold off.
>>
> 
> FWIW when I last looked at this I didn't like that the size control
> seemed to control the size of the allocations made from the pp, but
> not the size actually posted to the NIC.
> 
> I.e. in the scenario where the driver fragments each pp buffer into 2,
> and the user asks for 8K rx-buf-len, the size actually posted to the
> NIC would have actually been 4K (8K / 2 for 2 fragments).
> 
> Not sure how much of a concern this really is. I thought it would be
> great if somehow rx-buf-len controlled the buffer sizes actually
> posted to the NIC, because that what ultimately matters, no (it ends
> up being the size of the incoming frags)? Or does that not matter for
> some reason I'm missing?
Maybe we should just make a rule that if the hardware doesn't support
the given size, the qops should fail. Ultimately, though, userspace
should be able to handle it either way, as GRO packing is not 100%
reliable.
-- 
Pavel Begunkov
* Re: [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members
  2025-10-13 17:12   ` Randy Dunlap
@ 2025-10-14 12:53     ` Pavel Begunkov
  0 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2025-10-14 12:53 UTC (permalink / raw)
  To: Randy Dunlap, netdev
  Cc: Andrew Lunn, Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni,
	Simon Horman, Donald Hunter, Michael Chan, Pavan Chebbi,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Joshua Washington, Harshitha Ramamurthy, Jian Shen, Salil Mehta,
	Jijie Shao, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Bharat Bhushan, Saeed Mahameed, Tariq Toukan,
	Mark Bloch, Leon Romanovsky, Alexander Duyck, kernel-team,
	Ilias Apalodimas, Joe Damato, David Wei, Willem de Bruijn,
	Mina Almasry, Breno Leitao, Dragos Tatulea, linux-kernel,
	linux-doc, linux-rdma, Jonathan Corbet
On 10/13/25 18:12, Randy Dunlap wrote:
> Hi,
> 
> On 10/13/25 7:54 AM, Pavel Begunkov wrote:
>> From: Jakub Kicinski <kuba@kernel.org>
>>
>> hds_thresh and hds_config are both inside struct netdev_config
>> but have quite different semantics. hds_config is the user config
>> with ternary semantics (on/off/unset). hds_thresh is a straight
>> up value, populated by the driver at init and only modified by
>> user space. We don't expect the drivers to have to pick a special
>> hds_thresh value based on other configuration.
>>
>> The two approaches have different advantages and downsides.
>> hds_thresh ("direct value") gives core easy access to current
>> device settings, but there's no way to express whether the value
>> comes from the user. It also requires the initialization by
>> the driver.
>>
>> hds_config ("user config values") tells us what user wanted, but
>> doesn't give us the current value in the core.
>>
>> Try to explain this a bit in the comments, so that we make a conscious
>> choice, for new values, about which semantics we expect.
>>
>> Move the init inside ethtool_ringparam_get_cfg() to reflect the semantics.
>> Commit 216a61d33c07 ("net: ethtool: fix ethtool_ringparam_get_cfg()
>> returns a hds_thresh value always as 0.") added the setting for the
>> benefit of netdevsim which doesn't touch the value at all on get.
>> Again, this is just to clarify the intention, shouldn't cause any
>> functional change.
>>
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> [pavel: applied clarification on relationship b/w HDS thresh and config]
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/net/netdev_queues.h | 20 ++++++++++++++++++--
>>   net/ethtool/common.c        |  3 ++-
>>   2 files changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
>> index cd00e0406cf4..9d5dde36c2e5 100644
>> --- a/include/net/netdev_queues.h
>> +++ b/include/net/netdev_queues.h
>> @@ -6,11 +6,27 @@
>>   
>>   /**
>>    * struct netdev_config - queue-related configuration for a netdev
>> - * @hds_thresh:		HDS Threshold value.
>> - * @hds_config:		HDS value from userspace.
>>    */
>>   struct netdev_config {
>> +	/* Direct value
>> +	 *
>> +	 * Driver default is expected to be fixed, and set in this struct
>> +	 * at init. From that point on user may change the value. There is
>> +	 * no explicit way to "unset" / restore driver default. Used only
>> +	 * when @hds_config is set.
>> +	 */
>> +	/** @hds_thresh: HDS Threshold value (ETHTOOL_A_RINGS_HDS_THRESH).
>> +	 */
>>   	u32	hds_thresh;
>> +
>> +	/* User config values
>> +	 *
>> +	 * Contain user configuration. If "set" driver must obey.
>> +	 * If "unset" driver is free to decide, and may change its choice
>> +	 * as other parameters change.
>> +	 */
>> +	/** @hds_config: HDS enabled (ETHTOOL_A_RINGS_TCP_DATA_SPLIT).
>> +	 */
>>   	u8	hds_config;
>>   };
> 
> kernel-doc comments should being with
> /**
> on a separate line. This will prevent warnings like these new ones:
> 
> Warning: include/net/netdev_queues.h:36 struct member 'hds_thresh' not described in 'netdev_config'
> Warning: include/net/netdev_queues.h:36 struct member 'hds_config' not described in 'netdev_config'
> Warning: include/net/netdev_queues.h:36 struct member 'rx_buf_len' not described in 'netdev_config'
> 
> (but there are 4 variables that this applies to. I don't know why kernel-doc.py
> only reported 3 of them.)
Thanks for taking a look! I was a bit reluctant to change it
too much since it's not authored by me. I'll be moving in the
direction of removing the patch completely.
-- 
Pavel Begunkov
^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-14  4:41   ` Mina Almasry
  2025-10-14 12:50     ` Pavel Begunkov
@ 2025-10-15  1:41     ` Jakub Kicinski
  2025-10-15 17:44       ` Mina Almasry
  1 sibling, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2025-10-15  1:41 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Pavel Begunkov, netdev, Andrew Lunn, davem, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Michael Chan,
	Pavan Chebbi, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Joshua Washington, Harshitha Ramamurthy,
	Jian Shen, Salil Mehta, Jijie Shao, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Bharat Bhushan,
	Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Alexander Duyck, kernel-team, Ilias Apalodimas, Joe Damato,
	David Wei, Willem de Bruijn, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
On Mon, 13 Oct 2025 21:41:38 -0700 Mina Almasry wrote:
> > I'd like to rework these a little bit.
> > On reflection I don't like the single size control.
> > Please hold off.
>
> FWIW when I last looked at this I didn't like that the size control
> seemed to control the size of the allocations made from the pp, but
> not the size actually posted to the NIC.
> 
> I.e. in the scenario where the driver fragments each pp buffer into 2,
> and the user asks for 8K rx-buf-len, the size actually posted to the
> NIC would have actually been 4K (8K / 2 for 2 fragments).
> 
> Not sure how much of a concern this really is. I thought it would be
> great if somehow rx-buf-len controlled the buffer sizes actually
> posted to the NIC, because that what ultimately matters, no (it ends
> up being the size of the incoming frags)? Or does that not matter for
> some reason I'm missing?
I spent a couple of hours trying to write up my thoughts but I still haven't
finished 😅️ I'll send the full thing tomorrow.
You may have looked at hns3, is that right? It bumps the page pool order
by 1 so that it can fit two allocations into each page. I'm guessing
it's a remnant of "page flipping". The other current user of rx-buf-len
(otx2) doesn't do that - it uses simple page_order(rx_buf_len), AFAICT.
If that's what you mean - I'd chalk the hns3 behavior to "historical
reasons", it can probably be straightened out today to everyone's
benefit.
I wanted to reply already (before I present my "full case" :)) because
my thinking started slipping in the opposite direction of being
concerned about "buffer sizes actually posted to the NIC". 
Say the NIC packs packet payloads into buffers like this:
          1          2     3      
packets:  xxxxxxxxx  yyyy  zzzzzzz
buffers:  [xxxx] [xxxx] [x|yyy] [y|zzz] [zzzz]
Hope the diagram makes sense, each [....] is 4k, headers went elsewhere.
Say the user filled the page pool with 16k buffers, and the driver split
them up into 4k chunks. HW packed the payloads into those 4k chunks,
and GRO reformed them back into just 2 skb frags. Do we really care
about the buffer size on the HW fill ring being 4kB? Isn't what the user
cares about that they saw 2 frags, not 5?
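The arithmetic behind that frag count can be sketched as follows. The sketch
assumes the hypothetical sizes from the diagram (16kB page-pool buffers carved
sequentially into 4kB fill-ring chunks) and that GRO merges every run of
physically contiguous chunks; `frags_after_gro` is an illustrative helper,
not kernel code.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* If fill-ring chunks are carved back to back from page-pool buffers,
 * then after GRO merges adjacent chunks the frag count is bounded by
 * how many pp buffers the payload spans, not by the chunk size. */
static unsigned int frags_after_gro(unsigned int nr_chunks,
				    unsigned int chunk_sz,
				    unsigned int pp_buf_sz)
{
	return DIV_ROUND_UP(nr_chunks * chunk_sz, pp_buf_sz);
}
```

Five 4kB chunks span 20kB, i.e. two 16kB pp buffers, hence 2 frags; with
4kB pp buffers the same payload would stay at 5 frags.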
^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-15  1:41     ` Jakub Kicinski
@ 2025-10-15 17:44       ` Mina Almasry
  2025-10-17  1:40         ` Jakub Kicinski
  0 siblings, 1 reply; 37+ messages in thread
From: Mina Almasry @ 2025-10-15 17:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pavel Begunkov, netdev, Andrew Lunn, davem, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Michael Chan,
	Pavan Chebbi, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Joshua Washington, Harshitha Ramamurthy,
	Jian Shen, Salil Mehta, Jijie Shao, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Bharat Bhushan,
	Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Alexander Duyck, kernel-team, Ilias Apalodimas, Joe Damato,
	David Wei, Willem de Bruijn, Breno Leitao, Dragos Tatulea,
	linux-kernel, linux-doc, linux-rdma, Jonathan Corbet
On Tue, Oct 14, 2025 at 6:41 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 13 Oct 2025 21:41:38 -0700 Mina Almasry wrote:
> > > I'd like to rework these a little bit.
> > > On reflection I don't like the single size control.
> > > Please hold off.
> >
> > FWIW when I last looked at this I didn't like that the size control
> > seemed to control the size of the allocations made from the pp, but
> > not the size actually posted to the NIC.
> >
> > I.e. in the scenario where the driver fragments each pp buffer into 2,
> > and the user asks for 8K rx-buf-len, the size actually posted to the
> > NIC would have actually been 4K (8K / 2 for 2 fragments).
> >
> > Not sure how much of a concern this really is. I thought it would be
> > great if somehow rx-buf-len controlled the buffer sizes actually
> > posted to the NIC, because that what ultimately matters, no (it ends
> > up being the size of the incoming frags)? Or does that not matter for
> > some reason I'm missing?
>
> I spent a couple of hours trying to write up my thoughts but I still haven't
> finished 😅️ I'll send the full thing tomorrow.
>
> You may have looked at hns3 is that right? It bumps the page pool order
> by 1 so that it can fit two allocations into each page. I'm guessing
> it's a remnant of "page flipping". The other current user of rx-buf-len
> (otx2) doesn't do that - it uses simple page_order(rx_buf_len), AFAICT.
> If that's what you mean - I'd chalk the hns3 behavior to "historical
> reasons", it can probably be straightened out today to everyone's
> benefit.
>
> I wanted to reply already (before I present my "full case" :)) because
> my thinking started slipping in the opposite direction of being
> concerned about "buffer sizes actually posted to the NIC".
> Say the NIC packs packet payloads into buffers like this:
>
>           1          2     3
> packets:  xxxxxxxxx  yyyy  zzzzzzz
> buffers:  [xxxx] [xxxx] [x|yyy] [y|zzz] [zzzz]
>
> Hope the diagram makes sense, each [....] is 4k, headers went elsewhere.
>
> If the user filled in the page pool with 16k buffers, and driver split
> it up into 4k chunks. HW packed the payloads into those 4k chunks,
> and GRO reformed them back into just 2 skb frags. Do we really care
> about the buffer size on the HW fill ring being 4kB ? Isn't what user
> cares about that they saw 2 frags not 5 ?
I think what you're saying is what I was trying to say, but you said
it more eloquently and technically correct. I'm not familiar with the
GRO packing you're referring to, so I just assumed the 'buffer sizes
actually posted to the NIC' are the 'buffer sizes we end up seeing in
the skb frags'.
I guess what I'm trying to say in a different way, is: there are lots
of buffer sizes in the rx path, AFAICT, at least:
1. The size of the allocated netmems from the pp.
2. The size of the buffers posted to the NIC (which will be different
from #1 if the page_pool_fragment_netmem or some other trick like
hns3).
3. The size of the frags that end up in the skb (which will be
different from #2 for GRO/other things I don't fully understand).
...and I'm not sure what rx-buf-len should actually configure. My
thinking is that it probably should configure #3, since that is what
the user cares about, I agree with that.
IIRC when I last looked at this a few weeks ago, I think as written
this patch series makes rx-buf-len actually configure #1.
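In the fragmenting case, the relation between #1 and #2 is simple arithmetic,
sketched below with the hypothetical numbers from the 8K example earlier in
the thread (`posted_buf_len` is an illustrative name, not an API):

```c
#include <assert.h>

/* #2 (buffer size posted to the NIC) falls out of #1 (pp allocation
 * size) when the driver fragments each pp buffer into N pieces. */
static unsigned int posted_buf_len(unsigned int pp_alloc_len,
				   unsigned int frags_per_buf)
{
	return pp_alloc_len / frags_per_buf;
}
```

So a user asking for 8K via #1, on a driver that fragments by 2, ends up
with 4K buffers on the fill ring — the mismatch described above.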
-- 
Thanks,
Mina
^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-15 17:44       ` Mina Almasry
@ 2025-10-17  1:40         ` Jakub Kicinski
  2025-10-22 13:17           ` Dragos Tatulea
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2025-10-17  1:40 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Pavel Begunkov, netdev, Andrew Lunn, davem, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Michael Chan,
	Pavan Chebbi, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Joshua Washington, Harshitha Ramamurthy,
	Jian Shen, Salil Mehta, Jijie Shao, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Bharat Bhushan,
	Saeed Mahameed, Tariq Toukan, Mark Bloch, Alexander Duyck,
	kernel-team, Ilias Apalodimas, Joe Damato, David Wei,
	Willem de Bruijn, Breno Leitao, Dragos Tatulea, linux-kernel,
	linux-doc, Jonathan Corbet
On Wed, 15 Oct 2025 10:44:19 -0700 Mina Almasry wrote:
> I think what you're saying is what I was trying to say, but you said
> it more eloquently and genetically correct. I'm not familiar with the
> GRO packing you're referring to so. I just assumed the 'buffer sizes
> actually posted to the NIC' are the 'buffer sizes we end up seeing in
> the skb frags'.
I don't think that code path exists today, buffers posted are frags
in the skb. But that's easily fixable.
> I guess what I'm trying to say in a different way, is: there are lots
> of buffer sizes in the rx path, AFAICT, at least:
> 
> 1. The size of the allocated netmems from the pp.
> 2. The size of the buffers posted to the NIC (which will be different
> from #1 if the page_pool_fragment_netmem or some other trick like
> hns3).
> 3. The size of the frags that end up in the skb (which will be
> different from #2 for GRO/other things I don't fully understand).
> 
> ...and I'm not sure what rx-buf-len should actually configure. My
> thinking is that it probably should configure #3, since that is what
> the user cares about, I agree with that.
> 
> IIRC when I last looked at this a few weeks ago, I think as written
> this patch series makes rx-buf-len actually configure #1.
#1 or #2. #1 for otx2. For the RFC bnxt implementation they were
equivalent. But hns3's reading would be that it's #2.
From user PoV neither #1 nor #2 is particularly meaningful.
Assuming driver can fragment - #1 only configures memory accounting
blocks. #2 configures buffers passed to the HW, but some HW can pack
payloads into a single buf to save memory. Which means that if previous
frame was small and ate some of a page, subsequent large frame of
size M may not fit into a single buf of size X, even if M < X.
So I think the full set of parameters we should define would be
what you defined as #1 and #2. And on top of that we need some kind of
min alignment enforcement. David Wei mentioned that one of his main use
cases is ZC of a buffer which is then sent to storage, which has strict
alignment requirements. And some NICs will internally fragment the
page.. Maybe let's define the expected device behavior..
Device models
=============
Assume we receive 2 5kB packets, "x" means bytes from first packet,
"y" means bytes from the second packet.
A. Basic-scatter
----------------
Packet uses one or more buffers, so 1:n mapping between packets and
buffers.
                       unused space
                      v
 1kB      [xx] [xx] [x ] [yy] [yy] [y ]
16kB      [xxxxx            ] [yyyyy             ]
B. Multi-packet
---------------
The configurations above are still possible, but we can configure 
the device to place multiple packets in a large page:
 
                 unused space
                v
16kB, 2kB [xxxxx |yyyyy |...] [..................]
      ^
      alignment / stride
We can probably assume that this model always comes with alignment
cause DMA'ing frames at odd offsets is a bad idea. And also note
that packets smaller than alignment can get scattered to multiple
bufs.
C. Multi-packet HW-GRO
----------------------
For completeness, I guess. We need a third packet here. Assume x-packet
and z-packet are from the same flow and GRO session, y-packet is not.
(Good?) HW-GRO gives us out of order placement and hopefully in this
case we do want to pack:
16kB, 2kB [xxxxxzzzzz |.......] [xxxxx.............]
                     ^
      alignment / stride
End of sidebar. I think / hope these are all practical buffer layouts
we need to care about.
What does user care about? Presumably three things:
 a) efficiency of memory use (larger pages == more chance of low fill)
 b) max size of a buffer (larger buffer = fewer iovecs to pass around)
 c) alignment
I don't think we can make these map 1:1 to any of the knobs we discussed
at the start. (b) is really neither #1 (if driver fragments) nor #2 (if
SW GRO can glue back together).
We could simply let the user control #1 - basically user control
overrides the places where driver would previously use PAGE_SIZE.
I think this is what Stan suggested long ago as well.
But I wonder if user still needs to know #2 (rx-buf-len) because
practically speaking, setting page size >4x the size of rx-buf-len
is likely a lot more fragmentation for little extra aggregation.. ?
Tho, admittedly I think user only needs to know max-rx-buf-len
not necessarily set it.
The last knob is alignment / reuse. For allowing multiple packets in
one buffer we probably need to distinguish these cases to cater to
sufficiently clever adapters:
 - previous and next packets are from the same flow and
   - within one GRO session
   - previous had PSH set (or closed the GRO for another reason,
     this is to allow realigning the buffer on GRO session close)
  or
   - the device doesn't know further distinctions / HW-GRO
 - previous and next are from different flows
And the actions (for each case separately) are one of:
 - no reuse allowed (release buffer = -1?)
 - reuse but must align (align to = N)
 - reuse don't align (pack = 0)
So to restate do we need:
 - "page order" control
 - max-rx-buf-len
 - 4 alignment knobs?
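One way to restate that set of knobs as a config struct — purely illustrative,
none of these field names exist in the kernel, and the -1/0/N values follow
the encoding proposed just above:

```c
#include <assert.h>

/* Per the proposed encoding:
 *   -1 -> no reuse allowed (release the buffer)
 *    0 -> reuse, don't align (pack)
 *    N -> reuse, but align the next packet to N bytes
 */
struct rx_align_knobs {
	int same_flow_in_gro;	/* same flow, within one GRO session */
	int same_flow_closed;	/* same flow, previous closed the session (PSH etc.) */
	int same_flow_unknown;	/* same flow, device can't tell further / HW-GRO */
	int other_flow;		/* previous and next from different flows */
};

struct rx_buf_cfg {
	unsigned int page_order;	/* #1: pp allocation size control */
	unsigned int max_rx_buf_len;	/* #2: read-only from the user's PoV */
	struct rx_align_knobs align;
};

/* Sanity check: an alignment knob is -1, 0, or a power-of-two stride. */
static int align_knob_valid(int v)
{
	return v == -1 || v == 0 || (v > 0 && (v & (v - 1)) == 0);
}
```

Hardware supporting only a subset would reject (or ignore, per policy) the
cases it cannot distinguish, as discussed below for mlx5.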
Corner cases
============
I. Non-power of 2 buffer sizes
------------------------------
Looks like multiple devices are limited by width of length fields,
making max buffer size something like 32kB - 1 or 64kB - 1.
Should we allow applications to configure the buffer to 
    power of 2 - alignment 
? It will probably annoy the page pool code a bit. I guess for now
we should just make sure that uAPI doesn't bake in the idea that
buffers are always power of 2.
II. Fractional page sizes
-------------------------
If the HW has max-rx-buf-len of 16k or 32k, and PAGE_SIZE is 64k
should we support chunking devmem/iouring into less than a PAGE_SIZE?
^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-17  1:40         ` Jakub Kicinski
@ 2025-10-22 13:17           ` Dragos Tatulea
  2025-10-23  0:09             ` Jakub Kicinski
  0 siblings, 1 reply; 37+ messages in thread
From: Dragos Tatulea @ 2025-10-22 13:17 UTC (permalink / raw)
  To: Jakub Kicinski, Mina Almasry
  Cc: Pavel Begunkov, netdev, Andrew Lunn, davem, Eric Dumazet,
	Paolo Abeni, Simon Horman, Donald Hunter, Michael Chan,
	Pavan Chebbi, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Joshua Washington, Harshitha Ramamurthy,
	Jian Shen, Salil Mehta, Jijie Shao, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Bharat Bhushan,
	Saeed Mahameed, Tariq Toukan, Mark Bloch, Alexander Duyck,
	kernel-team, Ilias Apalodimas, Joe Damato, David Wei,
	Willem de Bruijn, Breno Leitao, linux-kernel, linux-doc,
	Jonathan Corbet
Sorry for the late reply, I didn't see the discussion here.
On Thu, Oct 16, 2025 at 06:40:31PM -0700, Jakub Kicinski wrote:
> On Wed, 15 Oct 2025 10:44:19 -0700 Mina Almasry wrote:
> > I think what you're saying is what I was trying to say, but you said
> > it more eloquently and genetically correct. I'm not familiar with the
> > GRO packing you're referring to so. I just assumed the 'buffer sizes
> > actually posted to the NIC' are the 'buffer sizes we end up seeing in
> > the skb frags'.
> 
> I don't think that code path exists today, buffers posted are frags
> in the skb. But that's easily fixable.
> 
> > I guess what I'm trying to say in a different way, is: there are lots
> > of buffer sizes in the rx path, AFAICT, at least:
> > 
> > 1. The size of the allocated netmems from the pp.
> > 2. The size of the buffers posted to the NIC (which will be different
> > from #1 if the page_pool_fragment_netmem or some other trick like
> > hns3).
> > 3. The size of the frags that end up in the skb (which will be
> > different from #2 for GRO/other things I don't fully understand).
> > 
> > ...and I'm not sure what rx-buf-len should actually configure. My
> > thinking is that it probably should configure #3, since that is what
> > the user cares about, I agree with that.
> > 
> > IIRC when I last looked at this a few weeks ago, I think as written
> > this patch series makes rx-buf-len actually configure #1.
> 
> #1 or #2. #1 for otx2. For the RFC bnxt implementation they were
> equivalent. But hns3's reading would be that it's #2.
> 
> From user PoV neither #1 nor #2 is particularly meaningful.
> Assuming driver can fragment - #1 only configures memory accounting
> blocks. #2 configures buffers passed to the HW, but some HW can pack
> payloads into a single buf to save memory. Which means that if previous
> frame was small and ate some of a page, subsequent large frame of
> size M may not fit into a single buf of size X, even if M < X.
> 
> So I think the full set of parameters we should define would be
> what you defined as #1 and #2. And on top of that we need some kind of
> min alignment enforcement. David Wei mentioned that one of his main use
> cases is ZC of a buffer which is then sent to storage, which has strict
> alignment requirements. And some NICs will internally fragment the
> page.. Maybe let's define the expected device behavior..
> 
> Device models
> =============
> Assume we receive 2 5kB packets, "x" means bytes from first packet,
> "y" means bytes from the second packet.
> 
> A. Basic-scatter
> ----------------
> Packet uses one or more buffers, so 1:n mapping between packets and
> buffers.
>                        unused space
>                       v
>  1kB      [xx] [xx] [x ] [yy] [yy] [y ]
> 16kB      [xxxxx            ] [yyyyy             ]
> 
> B. Multi-packet
> ---------------
> The configurations above are still possible, but we can configure 
> the device to place multiple packets in a large page:
>  
>                  unused space
>                 v
> 16kB, 2kB [xxxxx |yyyyy |...] [..................]
>       ^
>       alignment / stride
> 
> We can probably assume that this model always comes with alignment
> cause DMA'ing frames at odd offsets is a bad idea. And also note
> that packets smaller that alignment can get scattered to multiple
> bufs.
> 
> C. Multi-packet HW-GRO
> ----------------------
> For completeness, I guess. We need a third packet here. Assume x-packet
> and z-packet are from the same flow and GRO session, y-packet is not.
> (Good?) HW-GRO gives us out of order placement and hopefully in this
> case we do want to pack:
> 
> 16kB, 2kB [xxxxxzzzzz |.......] [xxxxx.............]
>                      ^
>       alignment / stride
> 
                                   ^^^^^
                                   is this y?
Not sure I understand this last representation: if x and z are 5kB
packets each and the stride size is 2kB, they should occupy 5 strides:
16kB, 2kB [xx|xx|xz|zz|zz|.......] [yy|yy|y |............]
I think I understand the point, just making sure that I got it straight.
Did I?
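The stride counts in question can be checked with a little arithmetic: 2kB
strides, two 5kB packets in one GRO session. The "packed" assumption here is
the one under discussion in the thread, not a statement about any specific NIC.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Packed: payloads of one GRO session laid out back to back,
 * so a stride can be shared across packet boundaries. */
static unsigned int strides_packed(unsigned int total_len, unsigned int stride)
{
	return DIV_ROUND_UP(total_len, stride);
}

/* Aligned: every packet starts on a fresh stride boundary. */
static unsigned int strides_aligned(unsigned int len_a, unsigned int len_b,
				    unsigned int stride)
{
	return DIV_ROUND_UP(len_a, stride) + DIV_ROUND_UP(len_b, stride);
}
```

Two 5kB packets packed together take 10kB, i.e. 5 strides of 2kB; aligned to
stride boundaries they would take 3 + 3 = 6 strides.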
> 
> End of sidebar. I think / hope these are all practical buffer layouts
> we need to care about.
> 
> 
> What does user care about? Presumably three things:
>  a) efficiency of memory use (larger pages == more chance of low fill)
>  b) max size of a buffer (larger buffer = fewer iovecs to pass around)
>  c) alignment
> I don't think we can make these map 1:1 to any of the knobs we discussed
> at the start. (b) is really neither #1 (if driver fragments) nor #2 (if
> SW GRO can glue back together).
> 
> We could simply let the user control #1 - basically user control
> overrides the places where driver would previously use PAGE_SIZE.
> I think this is what Stan suggested long ago as well.
>
> But I wonder if user still needs to know #2 (rx-buf-len) because
> practically speaking, setting page size >4x the size of rx-buf-len
> is likely a lot more fragmentation for little extra aggregation.. ?
So how would rx-buf-len be configured then? Who gets to decide if not the
user: the driver or the kernel?
I don't understand what you mean by "setting page size >4x the size of
rx-buf-len". I thought it was the other way around: rx-buf-len is an
order of page size. Or am I stuck in the mindset of the old proposal?
> Tho, admittedly I think user only needs to know max-rx-buf-len
> not necessarily set it.
> 
> The last knob is alignment / reuse. For allowing multiple packets in
> one buffer we probably need to distinguish these cases to cater to
> sufficiently clever adapters:
>  - previous and next packets are from the same flow and
>    - within one GRO session
>    - previous had PSH set (or closed the GRO for another reason,
>      this is to allow realigning the buffer on GRO session close)
>   or
>    - the device doesn't know further distinctions / HW-GRO
>  - previous and next are from different flows
> And the actions (for each case separately) are one of:
>  - no reuse allowed (release buffer = -1?)
>  - reuse but must align (align to = N)
>  - reuse don't align (pack = 0)
> 
I am assuming that different HW will support a subset of these
actions and/or they will apply differently in each case (hence the 4
knobs?).
For example, in mlx5 the actions would work only for the second case
(at the end of a GRO session).
> So to restate do we need:
>  - "page order" control
>  - max-rx-buf-len
>  - 4 alignment knobs?
>
We do need at least 1 alignment knob.
> Corner cases
> ============
> I. Non-power of 2 buffer sizes
> ------------------------------
> Looks like multiple devices are limited by width of length fields,
> making max buffer size something like 32kB - 1 or 64kB - 1.
> Should we allow applications to configure the buffer to 
> 
>     power of 2 - alignment 
> 
> ? It will probably annoy the page pool code a bit. I guess for now
> we should just make sure that uAPI doesn't bake in the idea that
> buffers are always power of 2.
What if the hardware uses a log scheme to represent the buffer
length? Then it would still need to align down to the next power of 2?
> 
> II. Fractional page sizes
> -------------------------
> If the HW has max-rx-buf-len of 16k or 32k, and PAGE_SIZE is 64k
> should we support hunking devmem/iouring into less than a PAGE_SIZE?
Thanks,
Dragos
^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers
  2025-10-22 13:17           ` Dragos Tatulea
@ 2025-10-23  0:09             ` Jakub Kicinski
  0 siblings, 0 replies; 37+ messages in thread
From: Jakub Kicinski @ 2025-10-23  0:09 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Mina Almasry, Pavel Begunkov, netdev, Andrew Lunn, davem,
	Eric Dumazet, Paolo Abeni, Simon Horman, Donald Hunter,
	Michael Chan, Pavan Chebbi, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Joshua Washington,
	Harshitha Ramamurthy, Jian Shen, Salil Mehta, Jijie Shao,
	Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep, hariprasad,
	Bharat Bhushan, Saeed Mahameed, Tariq Toukan, Mark Bloch,
	Alexander Duyck, kernel-team, Ilias Apalodimas, Joe Damato,
	David Wei, Willem de Bruijn, Breno Leitao, linux-kernel,
	linux-doc, Jonathan Corbet
On Wed, 22 Oct 2025 13:17:43 +0000 Dragos Tatulea wrote:
> On Thu, Oct 16, 2025 at 06:40:31PM -0700, Jakub Kicinski wrote:
> > On Wed, 15 Oct 2025 10:44:19 -0700 Mina Almasry wrote:  
> > > I think what you're saying is what I was trying to say, but you said
> > > it more eloquently and genetically correct. I'm not familiar with the
> > > GRO packing you're referring to so. I just assumed the 'buffer sizes
> > > actually posted to the NIC' are the 'buffer sizes we end up seeing in
> > > the skb frags'.  
> > 
> > I don't think that code path exists today, buffers posted are frags
> > in the skb. But that's easily fixable.
> >   
> > > I guess what I'm trying to say in a different way, is: there are lots
> > > of buffer sizes in the rx path, AFAICT, at least:
> > > 
> > > 1. The size of the allocated netmems from the pp.
> > > 2. The size of the buffers posted to the NIC (which will be different
> > > from #1 if the page_pool_fragment_netmem or some other trick like
> > > hns3).
> > > 3. The size of the frags that end up in the skb (which will be
> > > different from #2 for GRO/other things I don't fully understand).
> > > 
> > > ...and I'm not sure what rx-buf-len should actually configure. My
> > > thinking is that it probably should configure #3, since that is what
> > > the user cares about, I agree with that.
> > > 
> > > IIRC when I last looked at this a few weeks ago, I think as written
> > > this patch series makes rx-buf-len actually configure #1.  
> > 
> > #1 or #2. #1 for otx2. For the RFC bnxt implementation they were
> > equivalent. But hns3's reading would be that it's #2.
> > 
> > From user PoV neither #1 nor #2 is particularly meaningful.
> > Assuming driver can fragment - #1 only configures memory accounting
> > blocks. #2 configures buffers passed to the HW, but some HW can pack
> > payloads into a single buf to save memory. Which means that if previous
> > frame was small and ate some of a page, subsequent large frame of
> > size M may not fit into a single buf of size X, even if M < X.
> > 
> > So I think the full set of parameters we should define would be
> > what you defined as #1 and #2. And on top of that we need some kind of
> > min alignment enforcement. David Wei mentioned that one of his main use
> > cases is ZC of a buffer which is then sent to storage, which has strict
> > alignment requirements. And some NICs will internally fragment the
> > page.. Maybe let's define the expected device behavior..
> > 
> > Device models
> > =============
> > Assume we receive 2 5kB packets, "x" means bytes from first packet,
> > "y" means bytes from the second packet.
> > 
> > A. Basic-scatter
> > ----------------
> > Packet uses one or more buffers, so 1:n mapping between packets and
> > buffers.
> >                        unused space
> >                       v
> >  1kB      [xx] [xx] [x ] [yy] [yy] [y ]
> > 16kB      [xxxxx            ] [yyyyy             ]
> > 
> > B. Multi-packet
> > ---------------
> > The configurations above are still possible, but we can configure 
> > the device to place multiple packets in a large page:
> >  
> >                  unused space
> >                 v
> > 16kB, 2kB [xxxxx |yyyyy |...] [..................]
> >       ^
> >       alignment / stride
> > 
> > We can probably assume that this model always comes with alignment
> > cause DMA'ing frames at odd offsets is a bad idea. And also note
> > that packets smaller that alignment can get scattered to multiple
> > bufs.
> > 
> > C. Multi-packet HW-GRO
> > ----------------------
> > For completeness, I guess. We need a third packet here. Assume x-packet
> > and z-packet are from the same flow and GRO session, y-packet is not.
> > (Good?) HW-GRO gives us out of order placement and hopefully in this
> > case we do want to pack:
> > 
> > 16kB, 2kB [xxxxxzzzzz |.......] [xxxxx.............]
> >                      ^
> >       alignment / stride
> >   
>                                    ^^^^^
>                                    is this y?
Yes, my bad
> Not sure I understand this last representation: if x and z are 5kB
> packets each and the stride size is 2kB, they should occupy 5 strides:
> 
> 16kB, 2kB [xx|xx|xz|zz|zz|.......] [yy|yy|y |............]
> 
> I think I understand the point, just making sure that I got it straight.
> Did I?
Yes, that's right. I was trying to (poorly) express that the alignment
is not:
  16kB, 2kB [xx|xx|x |zz|zz|z |.....] [yy|yy|y |............]
IOW that HW-GRO is expected to pack (at least by default).
> > End of sidebar. I think / hope these are all practical buffer layouts
> > we need to care about.
> > 
> > 
> > What does user care about? Presumably three things:
> >  a) efficiency of memory use (larger pages == more chance of low fill)
> >  b) max size of a buffer (larger buffer = fewer iovecs to pass around)
> >  c) alignment
> > I don't think we can make these map 1:1 to any of the knobs we discussed
> > at the start. (b) is really neither #1 (if driver fragments) nor #2 (if
> > SW GRO can glue back together).
> > 
> > We could simply let the user control #1 - basically user control
> > overrides the places where driver would previously use PAGE_SIZE.
> > I think this is what Stan suggested long ago as well.
> >
> > But I wonder if user still needs to know #2 (rx-buf-len) because
> > practically speaking, setting page size >4x the size of rx-buf-len
> > is likely a lot more fragmentation for little extra aggregation.. ?  
> So how would rx-buf-len be configured then? Who gets to decide if not the
> user: the driver or the kernel?
Driver.
> I don't understand what you mean by "setting page size >4x the size of
> rx-buf-len". I thought it was the other way around: rx-buf-len is an
> order of page size. Or am I stuck in the mindset of the old proposal?
Yes, rx-buf-len is a good match for model A (basic-scatter).
But in other models it becomes a bit difficult to define the exact
semantics. 
> > Tho, admittedly I think user only needs to know max-rx-buf-len
> > not necessarily set it.
> > 
> > The last knob is alignment / reuse. For allowing multiple packets in
> > one buffer we probably need to distinguish these cases to cater to
> > sufficiently clever adapters:
> >  - previous and next packets are from the same flow and
> >    - within one GRO session
> >    - previous had PSH set (or closed the GRO for another reason,
> >      this is to allow realigning the buffer on GRO session close)
> >   or
> >    - the device doesn't know further distinctions / HW-GRO
> >  - previous and next are from different flows
> > And the actions (for each case separately) are one of:
> >  - no reuse allowed (release buffer = -1?)
> >  - reuse but must align (align to = N)
> >  - reuse don't align (pack = 0)
> >   
> I am assuming that different HW will support a subset of these
> actions and/or they will apply differently in each case (hence the 4
> knobs?).
Yup! All the knobs are optional, and we can define extra ones if use
cases come up and HW can support them. Hopefully we can get selftests
to validate that devices behave as configured.
> For example, in mlx5 the actions would work only for the second case
> (at the end of a GRO session).
Nice, I think that's the most useful one.
Not sure whether we should define all 4 from the start, or just
document them as a "plan", what the implicit default is expected 
to be (e.g. HW-GRO packs within a session) and then add as HW/users 
come around.
> > So to restate do we need:
> >  - "page order" control
> >  - max-rx-buf-len
> >  - 4 alignment knobs?
> >  
> We do need at least 1 alignment knob.
> 
> > Corner cases
> > ============
> > I. Non-power of 2 buffer sizes
> > ------------------------------
> > Looks like multiple devices are limited by width of length fields,
> > making max buffer size something like 32kB - 1 or 64kB - 1.
> > Should we allow applications to configure the buffer to 
> > 
> >     power of 2 - alignment 
> > 
> > ? It will probably annoy the page pool code a bit. I guess for now
> > we should just make sure that uAPI doesn't bake in the idea that
> > buffers are always power of 2.  
> What if the hardware uses a log scheme to represent the buffer
> length? Then it would still need to align down to the next power of 2?
Yes, that's fine, pow-of-2 should obviously work. I was trying to say
that we shouldn't _require_ power-of-2 in uAPI, because devices that
are limited to (power-of-2 - 1) would strand ~half of the max length.
> > II. Fractional page sizes
> > -------------------------
> > If the HW has max-rx-buf-len of 16k or 32k, and PAGE_SIZE is 64k
> > should we support chunking devmem/iouring into less than a PAGE_SIZE?  
Thanks for the comments!
Thread overview: 37+ messages
2025-10-13 14:54 [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 01/24] net: page_pool: sanitise allocation order Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 02/24] docs: ethtool: document that rx_buf_len must control payload lengths Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 03/24] net: ethtool: report max value for rx-buf-len Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 04/24] net: use zero value to restore rx_buf_len to default Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 05/24] net: hns3: net: use zero " Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 06/24] net: clarify the meaning of netdev_config members Pavel Begunkov
2025-10-13 17:12   ` Randy Dunlap
2025-10-14 12:53     ` Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 07/24] net: add rx_buf_len to netdev config Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 08/24] eth: bnxt: read the page size from the adapter struct Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 09/24] eth: bnxt: set page pool page order based on rx_page_size Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 10/24] eth: bnxt: support setting size of agg buffers via ethtool Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 11/24] net: move netdev_config manipulation to dedicated helpers Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 12/24] net: reduce indent of struct netdev_queue_mgmt_ops members Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 13/24] net: allocate per-queue config structs and pass them thru the queue API Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 14/24] net: pass extack to netdev_rx_queue_restart() Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 15/24] net: add queue config validation callback Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 16/24] eth: bnxt: always set the queue mgmt ops Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 17/24] eth: bnxt: store the rx buf size per queue Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 18/24] eth: bnxt: adjust the fill level of agg queues with larger buffers Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 19/24] netdev: add support for setting rx-buf-len per queue Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 20/24] net: wipe the setting of deactived queues Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 21/24] eth: bnxt: use queue op config validate Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 22/24] eth: bnxt: support per queue configuration of rx-buf-len Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 23/24] net: let pp memory provider to specify rx buf len Pavel Begunkov
2025-10-13 14:54 ` [PATCH net-next v4 24/24] net: validate driver supports passed qcfg params Pavel Begunkov
2025-10-13 15:03 ` [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers Pavel Begunkov
2025-10-13 17:54 ` Jakub Kicinski
2025-10-14  4:41   ` Mina Almasry
2025-10-14 12:50     ` Pavel Begunkov
2025-10-15  1:41     ` Jakub Kicinski
2025-10-15 17:44       ` Mina Almasry
2025-10-17  1:40         ` Jakub Kicinski
2025-10-22 13:17           ` Dragos Tatulea
2025-10-23  0:09             ` Jakub Kicinski
2025-10-14 12:46   ` Pavel Begunkov