netdev.vger.kernel.org archive mirror
* [RFC net-next 00/22] net: per-queue rx-buf-len configuration
@ 2025-04-21 22:28 Jakub Kicinski
From: Jakub Kicinski @ 2025-04-21 22:28 UTC
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	sdf, almasrymina, dw, asml.silence, ap420073, jdamato, dtatulea,
	michael.chan, Jakub Kicinski

Add support for per-queue rx-buf-len configuration.

I'm sending this as RFC because I'd like to ponder the uAPI side
a little longer but it's good enough for people to work on
the memory provider side and support in other drivers.

The direct motivation for the series is that zero-copy Rx queues would
like to use larger Rx buffers. Most modern high-speed NICs support HW-GRO,
and can coalesce payloads into pages much larger than the MTU.
Enabling larger buffers globally is a bit precarious as it exposes us
to potentially very inefficient memory use. Also allocating large
buffers may not be easy or cheap under load. Zero-copy queues service
only select traffic and have pre-allocated memory so the concerns don't
apply as much.

The per-queue config has to address 3 problems:
- user API
- driver API
- memory provider API

For user API the main question is whether we expose the config via
ethtool or netdev nl. I picked the latter - via queue GET/SET, rather
than extending the ethtool RINGS_GET API. I worry slightly that queue
GET/SET will turn into a monster like SETLINK. OTOH the only per-queue
setting we have in ethtool which does not go via RINGS_SET is
IRQ coalescing.

My goal for the driver API was to avoid complexity in the drivers.
The queue management API has gained two ops, responsible for preparing
configuration for a given queue, and validating whether the config
is supported. The validation is used both for NIC-wide and per-queue
changes. Queue alloc/start ops have a new "config" argument which
contains the current config for a given queue (we use queue restart
to apply per-queue settings). Outside of queue reset paths drivers
can call netdev_queue_config() which returns the config for an arbitrary
queue. In short, I expect it to be used during ndo_open.
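
For illustration only (this is not code from the series), the driver-facing
flow described above can be modelled in user space roughly as follows. The
names Device, QueueConfig, set_queue_rx_buf_len and example_validate, as well
as the size limits, are hypothetical; only netdev_queue_config() and ndo_open
come from the text. Treating zero as "drop the per-queue override" is an
assumption extrapolated from patch 03, which does this for the NIC-wide value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class QueueConfig:
    """Effective settings handed to one queue (hypothetical model)."""
    rx_buf_len: int


@dataclass
class NetdevConfig:
    """NIC-wide value plus optional per-queue overrides."""
    rx_buf_len: int
    per_queue: Dict[int, int] = field(default_factory=dict)


class Device:
    def __init__(self, num_queues: int, default_len: int,
                 validate: Callable[[QueueConfig], None]):
        self.num_queues = num_queues
        self.cfg = NetdevConfig(rx_buf_len=default_len)
        self.validate = validate              # driver callback, raises ValueError
        self.running: Dict[int, QueueConfig] = {}  # config applied to each queue


def queue_config(dev: Device, qid: int) -> QueueConfig:
    """Stand-in for netdev_queue_config(): config for an arbitrary queue."""
    if not 0 <= qid < dev.num_queues:
        raise IndexError(f"no such queue: {qid}")
    return QueueConfig(dev.cfg.per_queue.get(qid, dev.cfg.rx_buf_len))


def open_device(dev: Device) -> None:
    """Stand-in for ndo_open: every queue starts with its resolved config."""
    for qid in range(dev.num_queues):
        dev.running[qid] = queue_config(dev, qid)


def set_queue_rx_buf_len(dev: Device, qid: int, value: int) -> None:
    """Prepare a candidate config, validate it, then 'restart' the queue."""
    queue_config(dev, qid)                    # bounds check only
    overrides = dict(dev.cfg.per_queue)
    if value == 0:
        overrides.pop(qid, None)              # assumed: zero restores the default
    else:
        overrides[qid] = value
    candidate = QueueConfig(overrides.get(qid, dev.cfg.rx_buf_len))
    dev.validate(candidate)                   # may raise; nothing committed yet
    dev.cfg.per_queue = overrides
    dev.running[qid] = candidate              # models the queue restart


def example_validate(cfg: QueueConfig) -> None:
    """Hypothetical driver limits: power of two between 4 KiB and 64 KiB."""
    n = cfg.rx_buf_len
    if n < 4096 or n > 65536 or n & (n - 1):
        raise ValueError(f"unsupported rx_buf_len {n}")
```

The point of the model is the ordering: the candidate config is built and
validated before anything is committed, so a rejected value leaves both the
stored config and the running queue untouched.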

In the core I extended struct netdev_config with per-queue settings.
All in all this isn't too far from what was there in my "queue API
prototype" a few years ago. One thing I was hoping to support but
haven't gotten to is providing the settings at the RSS context level.
Zero-copy users often depend on RSS for load spreading. It'd be more
convenient for them to provide the settings per RSS context.
We may be better off converting the QUEUE_SET netlink op to CONFIG_SET
and accepting multiple "scopes" (queue, rss context)?

Memory provider API is a bit tricky. Initially I wasn't sure whether
the buffer size should be an MP attribute or a device attribute.
IOW whether it's the device that should be telling the MP what page
size it wants, or the MP telling the device what page size it has.
In some ways the latter is more flexible, but the implementation
gets hairy rather quickly. Drivers expect to know their parameters
early in the init process, while page pools are allocated relatively late.
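
As a rough illustration of the "device tells the MP what page size it wants"
direction, patch 07 sets the page pool page order based on rx_page_size. The
arithmetic might look like the sketch below. This is not the bnxt code; the
4 KiB base page size, the function name page_order and the power-of-two
restriction are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed base page size; differs between architectures


def page_order(rx_page_size: int, page_size: int = PAGE_SIZE) -> int:
    """Return n such that rx_page_size == page_size << n.

    Rejects sizes that are not a power-of-two multiple of the base page,
    since an allocation order can only describe such sizes.
    """
    if rx_page_size < page_size or rx_page_size % page_size:
        raise ValueError(f"{rx_page_size} is not a multiple of {page_size}")
    pages = rx_page_size // page_size
    if pages & (pages - 1):
        raise ValueError(f"{rx_page_size} is not a power-of-two multiple")
    return pages.bit_length() - 1
```

With these assumptions a 32 KiB Rx buffer corresponds to order 3, and the
driver can compute that as soon as it knows its own config, without waiting
for the page pool to exist.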

Jakub Kicinski (22):
  docs: ethtool: document that rx_buf_len must control payload lengths
  net: ethtool: report max value for rx-buf-len
  net: use zero value to restore rx_buf_len to default
  net: clarify the meaning of netdev_config members
  net: add rx_buf_len to netdev config
  eth: bnxt: read the page size from the adapter struct
  eth: bnxt: set page pool page order based on rx_page_size
  eth: bnxt: support setting size of agg buffers via ethtool
  net: move netdev_config manipulation to dedicated helpers
  net: reduce indent of struct netdev_queue_mgmt_ops members
  net: allocate per-queue config structs and pass them thru the queue
    API
  net: pass extack to netdev_rx_queue_restart()
  net: add queue config validation callback
  eth: bnxt: always set the queue mgmt ops
  eth: bnxt: store the rx buf size per queue
  eth: bnxt: adjust the fill level of agg queues with larger buffers
  netdev: add support for setting rx-buf-len per queue
  net: wipe the setting of deactived queues
  eth: bnxt: use queue op config validate
  eth: bnxt: support per queue configuration of rx-buf-len
  selftests: drv-net: add helper/wrapper for bpftrace
  selftests: drv-net: add test for rx-buf-len

 Documentation/netlink/specs/ethtool.yaml      |   4 +
 Documentation/netlink/specs/netdev.yaml       |  15 +
 Documentation/networking/ethtool-netlink.rst  |   7 +-
 net/core/Makefile                             |   2 +-
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   5 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +-
 include/linux/ethtool.h                       |   3 +
 include/net/netdev_queues.h                   |  83 ++++-
 include/net/netdev_rx_queue.h                 |   3 +-
 include/net/netlink.h                         |  19 ++
 .../uapi/linux/ethtool_netlink_generated.h    |   1 +
 include/uapi/linux/netdev.h                   |   2 +
 net/core/dev.h                                |  12 +
 net/core/netdev-genl-gen.h                    |   1 +
 tools/include/uapi/linux/netdev.h             |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 135 ++++++--
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   9 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   6 +-
 drivers/net/ethernet/google/gve/gve_main.c    |   9 +-
 .../marvell/octeontx2/nic/otx2_ethtool.c      |   6 +-
 drivers/net/netdevsim/netdev.c                |   8 +-
 net/core/dev.c                                |  12 +-
 net/core/netdev-genl-gen.c                    |  15 +
 net/core/netdev-genl.c                        |  92 ++++++
 net/core/netdev_config.c                      | 150 +++++++++
 net/core/netdev_rx_queue.c                    |  24 +-
 net/ethtool/common.c                          |   4 +-
 net/ethtool/netlink.c                         |  14 +-
 net/ethtool/rings.c                           |  14 +-
 .../selftests/drivers/net/hw/rx_buf_len.py    | 299 ++++++++++++++++++
 tools/testing/selftests/net/lib/py/utils.py   |  33 ++
 32 files changed, 913 insertions(+), 79 deletions(-)
 create mode 100644 net/core/netdev_config.c
 create mode 100755 tools/testing/selftests/drivers/net/hw/rx_buf_len.py

-- 
2.49.0



Thread overview: 66+ messages
2025-04-21 22:28 [RFC net-next 00/22] net: per-queue rx-buf-len configuration Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 01/22] docs: ethtool: document that rx_buf_len must control payload lengths Jakub Kicinski
2025-04-22 16:19   ` David Wei
2025-04-22 19:48   ` Joe Damato
2025-04-23 20:08   ` Mina Almasry
2025-04-25 22:50     ` Jakub Kicinski
2025-04-25 23:20       ` Joe Damato
2025-04-21 22:28 ` [RFC net-next 02/22] net: ethtool: report max value for rx-buf-len Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 03/22] net: use zero value to restore rx_buf_len to default Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 04/22] net: clarify the meaning of netdev_config members Jakub Kicinski
2025-04-22 19:57   ` Joe Damato
2025-04-21 22:28 ` [RFC net-next 05/22] net: add rx_buf_len to netdev config Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 06/22] eth: bnxt: read the page size from the adapter struct Jakub Kicinski
2025-04-23 20:35   ` Mina Almasry
2025-04-25 22:51     ` Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 07/22] eth: bnxt: set page pool page order based on rx_page_size Jakub Kicinski
2025-04-22 15:32   ` Stanislav Fomichev
2025-04-22 15:52     ` Jakub Kicinski
2025-04-22 17:27       ` Stanislav Fomichev
2025-04-21 22:28 ` [RFC net-next 08/22] eth: bnxt: support setting size of agg buffers via ethtool Jakub Kicinski
2025-04-23 21:00   ` Mina Almasry
2025-04-25 22:58     ` Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 09/22] net: move netdev_config manipulation to dedicated helpers Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 10/22] net: reduce indent of struct netdev_queue_mgmt_ops members Jakub Kicinski
2025-04-23 21:04   ` Mina Almasry
2025-04-21 22:28 ` [RFC net-next 11/22] net: allocate per-queue config structs and pass them thru the queue API Jakub Kicinski
2025-04-23 21:17   ` Mina Almasry
2025-04-25 23:24     ` Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 12/22] net: pass extack to netdev_rx_queue_restart() Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 13/22] net: add queue config validation callback Jakub Kicinski
2025-04-22 15:49   ` Stanislav Fomichev
2025-04-22 20:16   ` Joe Damato
2025-04-21 22:28 ` [RFC net-next 14/22] eth: bnxt: always set the queue mgmt ops Jakub Kicinski
2025-04-22 15:50   ` Stanislav Fomichev
2025-04-22 20:18   ` Joe Damato
2025-04-21 22:28 ` [RFC net-next 15/22] eth: bnxt: store the rx buf size per queue Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 16/22] eth: bnxt: adjust the fill level of agg queues with larger buffers Jakub Kicinski
2025-04-22 16:13   ` Stanislav Fomichev
2025-04-21 22:28 ` [RFC net-next 17/22] netdev: add support for setting rx-buf-len per queue Jakub Kicinski
2025-04-22 16:15   ` Stanislav Fomichev
2025-04-25 23:41     ` Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 18/22] net: wipe the setting of deactived queues Jakub Kicinski
2025-04-22 16:21   ` Stanislav Fomichev
2025-04-25 23:42     ` Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 19/22] eth: bnxt: use queue op config validate Jakub Kicinski
2025-04-23 10:00   ` Dragos Tatulea
2025-04-23 13:46     ` Jakub Kicinski
2025-04-23 14:24       ` Dragos Tatulea
2025-04-23 15:33         ` Jakub Kicinski
2025-06-12 11:56   ` Dragos Tatulea
2025-06-12 14:10     ` Jakub Kicinski
2025-06-12 15:52       ` Dragos Tatulea
2025-06-12 22:30         ` Jakub Kicinski
2025-06-13 19:02           ` Dragos Tatulea
2025-06-13 23:16             ` Jakub Kicinski
2025-06-17 12:36               ` Dragos Tatulea
2025-04-21 22:28 ` [RFC net-next 20/22] eth: bnxt: support per queue configuration of rx-buf-len Jakub Kicinski
2025-04-21 22:28 ` [RFC net-next 21/22] selftests: drv-net: add helper/wrapper for bpftrace Jakub Kicinski
2025-04-22 16:36   ` Stanislav Fomichev
2025-04-22 16:39     ` Stanislav Fomichev
2025-06-25 12:23   ` Breno Leitao
2025-04-21 22:28 ` [RFC net-next 22/22] selftests: drv-net: add test for rx-buf-len Jakub Kicinski
2025-04-22 17:06   ` Stanislav Fomichev
2025-04-25 23:52     ` Jakub Kicinski
2025-04-23 20:02 ` [RFC net-next 00/22] net: per-queue rx-buf-len configuration Mina Almasry
2025-04-25 23:55   ` Jakub Kicinski
