From: Joe Damato <jdamato@fastly.com>
To: netdev@vger.kernel.org
Cc: mkarsten@uwaterloo.ca, kuba@kernel.org, skhawaja@google.com,
sdf@fomichev.me, bjorn@rivosinc.com, amritha.nambiar@intel.com,
sridhar.samudrala@intel.com, Joe Damato <jdamato@fastly.com>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Breno Leitao <leitao@debian.org>,
Daniel Jurgens <danielj@nvidia.com>,
David Ahern <dsahern@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Donald Hunter <donald.hunter@gmail.com>,
Eric Dumazet <edumazet@google.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Jiri Pirko <jiri@resnulli.us>,
Johannes Berg <johannes.berg@intel.com>,
Jonathan Corbet <corbet@lwn.net>,
Kory Maincent <kory.maincent@bootlin.com>,
Leon Romanovsky <leon@kernel.org>,
linux-doc@vger.kernel.org (open list:DOCUMENTATION),
linux-kernel@vger.kernel.org (open list),
linux-rdma@vger.kernel.org (open list:MELLANOX MLX4 core VPI driver),
Lorenzo Bianconi <lorenzo@kernel.org>,
Michael Chan <michael.chan@broadcom.com>,
Paolo Abeni <pabeni@redhat.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Tariq Toukan <tariqt@nvidia.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Subject: [RFC net-next v3 0/9] Add support for per-NAPI config via netlink
Date: Thu, 12 Sep 2024 10:07:08 +0000
Message-ID: <20240912100738.16567-1-jdamato@fastly.com>
Greetings:
Welcome to RFC v3.
This implementation allocates an array of "struct napi_config" in
net_device and each NAPI instance is assigned an index into the config
array.
Per-NAPI settings like:
- NAPI ID
- gro_flush_timeout
- defer_hard_irqs
are persisted in napi_config and restored on napi_disable/napi_enable
respectively.
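As a rough illustration of the save/restore idea described above, here is a toy Python model. It is not the kernel code: the class names ToyNetDevice, NapiConfig and Napi and their fields are hypothetical stand-ins, and the model assumes settings are copied into the config slot on disable and copied back on enable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NapiConfig:
    """One slot in the per-device array (toy stand-in for struct napi_config)."""
    napi_id: Optional[int] = None
    gro_flush_timeout: int = 0
    defer_hard_irqs: int = 0


@dataclass
class Napi:
    """A live NAPI instance; `index` selects its slot in the config array."""
    index: int
    napi_id: Optional[int] = None
    gro_flush_timeout: int = 0
    defer_hard_irqs: int = 0


class ToyNetDevice:
    """Hypothetical model of a net_device that owns the config array."""

    def __init__(self, num_slots: int) -> None:
        self.napi_config: List[NapiConfig] = [
            NapiConfig() for _ in range(num_slots)
        ]

    def napi_disable(self, napi: Napi) -> None:
        # Persist the live settings into the slot this NAPI is indexed to.
        cfg = self.napi_config[napi.index]
        cfg.napi_id = napi.napi_id
        cfg.gro_flush_timeout = napi.gro_flush_timeout
        cfg.defer_hard_irqs = napi.defer_hard_irqs

    def napi_enable(self, napi: Napi) -> None:
        # Restore whatever was last saved in the slot.
        cfg = self.napi_config[napi.index]
        if cfg.napi_id is not None:
            napi.napi_id = cfg.napi_id
        napi.gro_flush_timeout = cfg.gro_flush_timeout
        napi.defer_hard_irqs = cfg.defer_hard_irqs


dev = ToyNetDevice(num_slots=2)
old = Napi(index=1, napi_id=345, gro_flush_timeout=11111, defer_hard_irqs=111)
dev.napi_disable(old)   # e.g. the driver tears the queue down

new = Napi(index=1)     # a fresh instance bound to the same slot
dev.napi_enable(new)
print(new)
```

The point of the index is that a freshly created NAPI instance only needs to know which slot it maps to in order to recover the earlier ID and settings.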
To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
- mlx5 which is the basis of the examples below
- mlx4 which has TX only NAPIs, just to highlight that case. I have
only compile tested this patch; I don't have this hardware.
- bnxt which I have only compile tested. I don't have this
hardware.
NOTE: I only tested this on mlx5; I have no access to the other hardware
for which I provided patches. Hopefully other folks can help test :)
This iteration seems to persist NAPI IDs and settings even when resizing
queues, see below, so I think maybe this is getting close to where we
want to land?
Here's an example of how it works on my mlx5:
# start with 2 queues
$ ethtool -l eth4 | grep Combined | tail -1
Combined: 2
First, output the current NAPI settings:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now, set the global sysfs parameters:
$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'
Output current NAPI settings again:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now set NAPI 345, addressed by its NAPI ID, to specific values:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set \
--json='{"id": 345,
"defer-hard-irqs": 111,
"gro-flush-timeout": 11111}'
None
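For scripting the same request, a small Python sketch along these lines could build the cli.py argument list. The helper name napi_set_argv is hypothetical; only the flags and attribute names shown in the command above are used, and the paths assume the script is run from the top of a kernel tree.

```python
import json
from typing import List, Optional

CLI = "./tools/net/ynl/cli.py"
SPEC = "Documentation/netlink/specs/netdev.yaml"


def napi_set_argv(napi_id: int,
                  defer_hard_irqs: Optional[int] = None,
                  gro_flush_timeout: Optional[int] = None) -> List[str]:
    """Build the argv for a napi-set request on one NAPI ID.

    Only attributes that are explicitly given are put in the payload.
    The check is `is not None` rather than truthiness because 0 is a
    meaningful value to set.
    """
    payload = {"id": napi_id}
    if defer_hard_irqs is not None:
        payload["defer-hard-irqs"] = defer_hard_irqs
    if gro_flush_timeout is not None:
        payload["gro-flush-timeout"] = gro_flush_timeout
    return [CLI, "--spec", SPEC, "--do", "napi-set",
            "--json", json.dumps(payload)]


argv = napi_set_argv(345, defer_hard_irqs=111, gro_flush_timeout=11111)
print(argv)
# To actually send it (needs root and a kernel with this series applied):
#   import subprocess; subprocess.run(["sudo"] + argv, check=True)
```

Passing the payload through json.dumps avoids hand-quoting the JSON on the shell command line.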
Now output current NAPI settings again to ensure only NAPI ID 345
changed:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 11111,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
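The "only NAPI ID 345 changed" check can also be done mechanically. The sketch below assumes the two napi-get dumps have been captured as text; since cli.py prints Python literal syntax, ast.literal_eval can parse them. The function names are hypothetical.

```python
import ast
from typing import Dict, Tuple


def index_by_id(dump_text: str) -> Dict[int, dict]:
    """Parse a printed napi-get dump and key the entries by NAPI ID."""
    return {entry["id"]: entry for entry in ast.literal_eval(dump_text)}


def changed_napis(before: str, after: str) -> Dict[int, Dict[str, Tuple]]:
    """Return {napi_id: {attr: (old, new)}} for NAPIs present in both dumps."""
    old, new = index_by_id(before), index_by_id(after)
    changes = {}
    for napi_id in sorted(old.keys() & new.keys()):
        diff = {
            key: (old[napi_id].get(key), new[napi_id].get(key))
            for key in old[napi_id]
            if old[napi_id].get(key) != new[napi_id].get(key)
        }
        if diff:
            changes[napi_id] = diff
    return changes


before = """[{'defer-hard-irqs': 100, 'gro-flush-timeout': 20000,
              'id': 345, 'ifindex': 7, 'irq': 527},
             {'defer-hard-irqs': 100, 'gro-flush-timeout': 20000,
              'id': 344, 'ifindex': 7, 'irq': 327}]"""
after = """[{'defer-hard-irqs': 111, 'gro-flush-timeout': 11111,
             'id': 345, 'ifindex': 7, 'irq': 527},
            {'defer-hard-irqs': 100, 'gro-flush-timeout': 20000,
             'id': 344, 'ifindex': 7, 'irq': 327}]"""

print(changed_napis(before, after))
```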
Now, increase gro-flush-timeout only:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 345,
"gro-flush-timeout": 44444}'
None
Now output the current NAPI settings once more:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 44444,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now set NAPI ID 345 to have gro_flush_timeout of 0:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 345,
"gro-flush-timeout": 0}'
None
Check that NAPI ID 345 has a value of 0:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Change the queue count, ensuring that NAPI ID 345 retains its settings:
$ sudo ethtool -L eth4 combined 4
Check that the new queues have the system wide settings but that NAPI ID
345 remains unchanged:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 347,
'ifindex': 7,
'irq': 529},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
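A quick way to confirm that every NAPI except 345 carries the system-wide values is to compare each entry against the sysfs settings. This is a minimal sketch with a hypothetical helper name; the dump is the output above as already-parsed Python data.

```python
from typing import Dict, List


def non_default_napis(dump: List[dict],
                      defaults: Dict[str, int]) -> Dict[int, Dict[str, int]]:
    """Return {napi_id: {attr: value}} for entries that differ from defaults."""
    odd = {}
    for entry in dump:
        diff = {key: entry.get(key)
                for key, value in defaults.items()
                if entry.get(key) != value}
        if diff:
            odd[entry["id"]] = diff
    return odd


# Values written to sysfs earlier in the walkthrough.
sysfs = {"defer-hard-irqs": 100, "gro-flush-timeout": 20000}

dump = [
    {'defer-hard-irqs': 100, 'gro-flush-timeout': 20000,
     'id': 347, 'ifindex': 7, 'irq': 529},
    {'defer-hard-irqs': 100, 'gro-flush-timeout': 20000,
     'id': 346, 'ifindex': 7, 'irq': 528},
    {'defer-hard-irqs': 111, 'gro-flush-timeout': 0,
     'id': 345, 'ifindex': 7, 'irq': 527},
    {'defer-hard-irqs': 100, 'gro-flush-timeout': 20000,
     'id': 344, 'ifindex': 7, 'irq': 327},
]

print(non_default_napis(dump, sysfs))
```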
Now reduce the queue count below where NAPI ID 345 is indexed:
$ sudo ethtool -L eth4 combined 1
Check the output:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Re-increase the queue count to ensure NAPI ID 345 is re-assigned the same
values:
$ sudo ethtool -L eth4 combined 2
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Create new queues to ensure the sysfs globals are used for the new NAPIs
but that NAPI ID 345 is unchanged:
$ sudo ethtool -L eth4 combined 8
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:
$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'
Check that worked:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 344,
'ifindex': 7,
'irq': 327}]
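The behaviour seen across the whole walkthrough can be summarized with a toy Python model. It is only a sketch that reproduces the observed output, not the kernel logic. ToyDevice and its methods are hypothetical, NAPI IDs are pre-assigned to slots for simplicity, and it assumes that a sysfs write overwrites every config slot, active or not.

```python
from typing import Dict, List, Optional

Entry = Dict[str, int]


class ToyDevice:
    """Hypothetical device with one config slot per possible queue."""

    def __init__(self, max_queues: int, first_napi_id: int) -> None:
        # Simplification: every slot gets its NAPI ID up front.
        self.slots: List[Entry] = [
            {"id": first_napi_id + i,
             "defer-hard-irqs": 0,
             "gro-flush-timeout": 0}
            for i in range(max_queues)
        ]
        self.active = 0

    def set_queues(self, count: int) -> None:
        """Like 'ethtool -L ... combined N': slots beyond N stay stored."""
        if not 0 < count <= len(self.slots):
            raise ValueError("queue count out of range")
        self.active = count

    def write_sysfs(self, gro_flush_timeout: Optional[int] = None,
                    defer_hard_irqs: Optional[int] = None) -> None:
        # Assumption: the device-wide write overwrites every slot.
        for slot in self.slots:
            if gro_flush_timeout is not None:
                slot["gro-flush-timeout"] = gro_flush_timeout
            if defer_hard_irqs is not None:
                slot["defer-hard-irqs"] = defer_hard_irqs

    def napi_set(self, napi_id: int,
                 gro_flush_timeout: Optional[int] = None,
                 defer_hard_irqs: Optional[int] = None) -> None:
        """Per-NAPI write: touches only the slot of an active NAPI."""
        for slot in self.slots[:self.active]:
            if slot["id"] == napi_id:
                if gro_flush_timeout is not None:
                    slot["gro-flush-timeout"] = gro_flush_timeout
                if defer_hard_irqs is not None:
                    slot["defer-hard-irqs"] = defer_hard_irqs
                return
        raise KeyError(napi_id)

    def dump(self) -> List[Entry]:
        # Highest ID first, matching the order of the output above.
        return [dict(s) for s in reversed(self.slots[:self.active])]


dev = ToyDevice(max_queues=8, first_napi_id=344)
dev.set_queues(2)
dev.write_sysfs(gro_flush_timeout=20000, defer_hard_irqs=100)
dev.napi_set(345, gro_flush_timeout=0, defer_hard_irqs=111)
dev.set_queues(1)    # NAPI 345 disappears from the dump
dev.set_queues(2)    # and comes back with its settings intact
print(dev.dump())
dev.write_sysfs(gro_flush_timeout=33333, defer_hard_irqs=222)
print(dev.dump())    # the sysfs write overrides the per-NAPI values
```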
Thanks,
Joe
rfcv3:
- Renamed napi_storage to napi_config
- Reordered patches
- Added defer_hard_irqs and gro_flush_timeout to napi_struct
- Attempt to save and restore settings on napi_disable/napi_enable
- Removed weight as a parameter to netif_napi_add_storage
- Updated driver patches to no longer pass in weight
rfcv2:
- Almost total rewrite from v1
Joe Damato (9):
net: napi: Make napi_defer_hard_irqs per-NAPI
netdev-genl: Dump napi_defer_hard_irqs
net: napi: Make gro_flush_timeout per-NAPI
netdev-genl: Dump gro_flush_timeout
net: napi: Add napi_config
netdev-genl: Support setting per-NAPI config values
bnxt: Add support for napi storage
mlx5: Add support for napi storage
mlx4: Add support for napi storage to RX CQs
Documentation/netlink/specs/netdev.yaml | 25 ++++++
.../networking/net_cachelines/net_device.rst | 5 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +-
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
include/linux/netdevice.h | 40 ++++++++-
include/uapi/linux/netdev.h | 3 +
net/core/dev.c | 90 +++++++++++++++----
net/core/dev.h | 87 ++++++++++++++++++
net/core/net-sysfs.c | 4 +-
net/core/netdev-genl-gen.c | 14 +++
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 55 ++++++++++++
tools/include/uapi/linux/netdev.h | 3 +
14 files changed, 310 insertions(+), 25 deletions(-)
--
2.25.1