netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC net-next v2 0/9] Add support for per-NAPI config via netlink
@ 2024-09-08 16:06 Joe Damato
  2024-09-08 16:06 ` [RFC net-next v2 1/9] net: napi: Add napi_storage Joe Damato
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Joe Damato @ 2024-09-08 16:06 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, kuba, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, Joe Damato, Alexander Lobakin, Breno Leitao,
	Daniel Jurgens, David Ahern, David S. Miller, Donald Hunter,
	Eric Dumazet, Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Kory Maincent, Larysa Zaremba, Leon Romanovsky,
	open list:DOCUMENTATION, open list,
	open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
	Michael Chan, Paolo Abeni, Saeed Mahameed,
	Sebastian Andrzej Siewior, Tariq Toukan, Xuan Zhuo

Greetings:

Welcome to v2, converted to RFC.... which is definitely incorrect, but
hopefully can serve as a basis for discussion to get to something
better.

This implementation allocates "struct napi_storage" and each NAPI
instance is assigned an index into the storage array.

It seemed like this storage area should persist even if different HW
queues are created, but should be cleared when queue counts are resized
(ethtool -L).

What I did is flat out incorrect: memset the struct to 0
on napi_enable.

I am not totally clear if I understand the part of the previous
conversation about mapping rings to NAPIs and so on, but I wanted to
make sure the rest of the implementation was starting to vaguely look
like what was discussed in the previous thread.

To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
  - mlx5 which is the basis of the examples below
  - mlx4 which has TX only NAPIs, just to highlight that case. I have
    only compile tested this patch; I don't have this hardware.
  - bnxt which I have only compiled tested.

Zeroing on napi_enable is incorrect because, at the very least, it
breaks sysfs (the sysfs settings should be inherited). It's not clear
to me how we achieve persistence with the zero-ing unless we assume the
drivers are changed somehow? IIRC, there was a suggestion in the
previous thread to memset the napi_storage to 0 on queue resize, but
I've definitely gotten the nuance in what was desired wrong.

Anyway, sending what I have before iterating further to see if this is
even remotely the right direction before going too deep down this path.

I hope that's OK.

Here's an example of how it works on my mlx5 as is:

# start with 4 queues

$ ethtool -l eth4 | grep Combined | tail -1
Combined:	4

First, output the current NAPI settings:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now, set the global sysfs parameters:

$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'

Output current NAPI settings again:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now set NAPI ID 927, via its ifindex and index to specific values:

$ sudo ./tools/net/ynl/cli.py \
          --spec Documentation/netlink/specs/netdev.yaml \
          --do napi-set \
          --json='{"ifindex": 7, "index": 2,
                   "defer-hard-irqs": 111,
                   "gro-flush-timeout": 11111}'
None

Now output current NAPI settings again to ensure only 927 changed:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 11111,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now, increase gro-flush-timeout only:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"ifindex": 7, "index": 2,
                              "gro-flush-timeout": 44444}
None

Now output the current NAPI settings once more:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 44444,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now set NAPI ID 927, via its ifindex and index, to have
gro_flush_timeout of 0:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"ifindex": 7, "index": 2,
                              "gro-flush-timeout": 0}'
None

Check that NAPI ID 927 has a value of 0:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:

$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'

Check that worked:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Resizing the queues (ethtool -L) resets everything to 0, which is wrong
because it breaks sysfs and it breaks the persistence goal.

I hope though that his code can still be discussed to ensure that I am
moving in the correct direction toward solving these issues before going
too far down the rabbit hole :)

Thanks,
Joe

Joe Damato (9):
  net: napi: Add napi_storage
  netdev-genl: Export NAPI index
  net: napi: Make napi_defer_hard_irqs per-NAPI
  netdev-genl: Dump napi_defer_hard_irqs
  net: napi: Make gro_flush_timeout per-NAPI
  netdev-genl: Support setting per-NAPI config values
  bnxt: Add support for napi storage
  mlx5: Add support for napi storage
  mlx4: Add support for napi storage to RX CQs

 Documentation/netlink/specs/netdev.yaml       | 35 ++++++++
 .../networking/net_cachelines/net_device.rst  |  3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  3 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c    |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 include/linux/netdevice.h                     | 38 ++++++++-
 include/uapi/linux/netdev.h                   |  4 +
 net/core/dev.c                                | 36 +++++---
 net/core/dev.h                                | 83 +++++++++++++++++++
 net/core/net-sysfs.c                          |  4 +-
 net/core/netdev-genl-gen.c                    | 15 ++++
 net/core/netdev-genl-gen.h                    |  1 +
 net/core/netdev-genl.c                        | 65 +++++++++++++++
 tools/include/uapi/linux/netdev.h             |  3 +
 14 files changed, 276 insertions(+), 19 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-09-11  7:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-08 16:06 [RFC net-next v2 0/9] Add support for per-NAPI config via netlink Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 1/9] net: napi: Add napi_storage Joe Damato
2024-09-08 20:49   ` Joe Damato
2024-09-09 22:37     ` Stanislav Fomichev
2024-09-10  6:16       ` Joe Damato
2024-09-10 14:56         ` Jakub Kicinski
2024-09-09 23:40   ` Jakub Kicinski
2024-09-10  6:13     ` Joe Damato
2024-09-10 14:52       ` Jakub Kicinski
2024-09-10 16:10         ` Joe Damato
2024-09-11  0:10           ` Jakub Kicinski
2024-09-11  7:47             ` Joe Damato
2024-09-11  7:51     ` Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 2/9] netdev-genl: Export NAPI index Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 3/9] net: napi: Make napi_defer_hard_irqs per-NAPI Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 4/9] netdev-genl: Dump napi_defer_hard_irqs Joe Damato
2024-09-08 20:36   ` Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 5/9] net: napi: Make gro_flush_timeout per-NAPI Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 6/9] netdev-genl: Support setting per-NAPI config values Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 7/9] bnxt: Add support for napi storage Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 8/9] mlx5: " Joe Damato
2024-09-08 16:06 ` [RFC net-next v2 9/9] mlx4: Add support for napi storage to RX CQs Joe Damato

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).