* [RFC net-next v4 0/9] Add support for per-NAPI config via netlink
@ 2024-10-01 23:52 Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 1/9] net: napi: Make napi_defer_hard_irqs per-NAPI Joe Damato
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	Alexander Lobakin, Breno Leitao, Daniel Jurgens, David Ahern,
	David S. Miller, Donald Hunter, Eric Dumazet,
	moderated list:INTEL ETHERNET DRIVERS, Jakub Kicinski,
	Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Kory Maincent, Leon Romanovsky,
	open list:DOCUMENTATION, open list,
	open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
	Michael Chan, Mina Almasry, Paolo Abeni, Przemek Kitszel,
	Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
	Tony Nguyen, Xuan Zhuo

Greetings:

Welcome to RFC v4.

Significant changes have been made since RFC v3 [1]; please see the
changelog below for details.

A couple of important call-outs for reviewers in this revision:

  1. idpf embeds a napi_struct in an internal data structure and
     includes an assertion on the size of napi_struct. The maintainers
     have stated that anyone touching napi_struct should update the
     assertion [2], so I've done this in patch 3.

     Even though the assertion has been updated, I've given the
     cacheline placement of napi_struct within idpf's internals no
     thought or consideration.

     I'd appreciate other opinions on this; I think idpf should be
     fixed. It seems unreasonable that anyone changing the size of a
     struct in the core should need to think about cachelines in idpf.

  2. This revision seems to work (see below for a full walkthrough).
     Is this the behavior we want? Am I missing some use case or
     behavioral requirement other folks need?

  3. Re a previous point made by Stanislav regarding "taking over a
     NAPI ID" when the channel count changes: mlx5 seems to call
     napi_disable followed by netif_napi_del for the old queues and
     then calls napi_enable for the new ones. In this RFC, NAPI ID
     generation is deferred to napi_enable. This means we won't end up
     with two identical NAPI IDs in the hash at the same time (I am
     pretty sure); see the sketch after this list.

     Can we assume all drivers will napi_disable the old queues before
     napi_enable the new ones? If yes, we might not need to worry about
     a NAPI ID takeover function.
 
  4. I removed the WARN_ON_ONCE that (I think?) Jakub previously
     suggested adding to alloc_netdev_mqs (WARN_ON_ONCE(txqs != rxqs);)
     because it triggered on every kernel boot with my mlx5 NIC.

  5. I left the "maxqs = max(txqs, rxqs);" in alloc_netdev_mqs despite
     finding it a bit strange that we might end up short some number of
     NAPI configs; most people seem to be in favor of this approach, so
     I've kept it.

I'd appreciate thoughts from reviewers on the above items, if at all
possible. Once those are addressed, modulo any feedback on the
code/whitespace wrapping I still need to do, I think this is close to
an official submission.
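
To make item 3 concrete, here is a minimal sketch of the
reconfiguration ordering this RFC assumes. The names (old, new, poll,
idx) are hypothetical; the helpers are the ones added in patch 5:

	/* Tear down the old queue: napi_disable saves the per-NAPI
	 * settings (including the NAPI ID) into the napi_config slot
	 * and removes the NAPI from the hash.
	 */
	napi_disable(&old->napi);
	netif_napi_del(&old->napi);

	/* Bring up the new queue on the same index: napi_enable finds
	 * the saved NAPI ID in the config and re-adds it to the hash,
	 * so the same ID is never in the hash twice.
	 */
	netif_napi_add_config(dev, &new->napi, poll, idx);
	napi_enable(&new->napi);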

Now, on to the implementation:

This implementation allocates an array of "struct napi_config" in
net_device and each NAPI instance is assigned an index into the config
array.

Per-NAPI settings like:
  - NAPI ID
  - gro_flush_timeout
  - defer_hard_irqs

are persisted in napi_config: saved in napi_disable and restored in
napi_enable.
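
For reference, this is the config area patch 5 adds (copied verbatim
from the patch below):

	struct napi_config {
		u64 gro_flush_timeout;
		u32 defer_hard_irqs;
		unsigned int napi_id;
	};

The index into the array is chosen by the driver, so a given queue maps
to the same slot (and thus the same settings and NAPI ID) across
enable/disable cycles.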

To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
  - mlx5 which is the basis of the examples below
  - mlx4 which has TX-only NAPIs, just to highlight that case. I have
    only compile-tested this patch; I don't have this hardware.
  - bnxt which I have only compile-tested. I don't have this
    hardware.

NOTE: I only tested this on mlx5; I have no access to the other hardware
for which I provided patches. Hopefully other folks can help test :)

This iteration seems to persist NAPI IDs and settings even when
resizing queues (see below), so I think this is getting close to where
we want to land in terms of functionality.

Here's how it works when I test it on my system:

# start with 2 queues

$ ethtool -l eth4 | grep Combined | tail -1
Combined:	2

First, output the current NAPI settings:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now, set the global sysfs parameters:

$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'

Output current NAPI settings again:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now set NAPI ID 345 to specific values, addressing it by its NAPI ID:

$ sudo ./tools/net/ynl/cli.py \
          --spec Documentation/netlink/specs/netdev.yaml \
          --do napi-set \
          --json='{"id": 345,
                   "defer-hard-irqs": 111,
                   "gro-flush-timeout": 11111}'
None

Now output current NAPI settings again to ensure only NAPI ID 345
changed:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 11111,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now, increase gro-flush-timeout only:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"id": 345,
                              "gro-flush-timeout": 44444}'
None

Now output the current NAPI settings once more:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 44444,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now set NAPI ID 345 to have a gro_flush_timeout of 0:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"id": 345,
                              "gro-flush-timeout": 0}'
None

Check that NAPI ID 345 now has a gro-flush-timeout of 0:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Change the queue count, ensuring that NAPI ID 345 retains its settings:

$ sudo ethtool -L eth4 combined 4

Check that the new queues have the system-wide settings but that NAPI
ID 345 remains unchanged:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 347,
  'ifindex': 7,
  'irq': 529},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now reduce the queue count below where NAPI ID 345 is indexed:

$ sudo ethtool -L eth4 combined 1

Check the output:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Re-increase the queue count to ensure NAPI ID 345 is re-assigned the same
values:

$ sudo ethtool -L eth4 combined 2

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Create new queues to ensure the sysfs globals are used for the new NAPIs
but that NAPI ID 345 is unchanged:

$ sudo ethtool -L eth4 combined 8

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[...]
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Last, but not least, let's write the sysfs parameters to ensure all
NAPIs are updated:

$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'

Check that worked:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[...]
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Thanks,
Joe

[1]: https://lore.kernel.org/netdev/20240912100738.16567-1-jdamato@fastly.com/
[2]: https://lore.kernel.org/netdev/20240925180017.82891-1-jdamato@fastly.com/T/#m56b743bd16304a626848b14f90cecb661f464b74 

rfcv4:
  - Updated commit messages of most patches
  - Renamed netif_napi_add_storage to netif_napi_add_config in patch 5
  - Added a NULL check in netdev_set_defer_hard_irqs and
    netdev_set_gro_flush_timeout for netdev->napi_config in patch 5
  - Removed the WARN_ON_ONCE suggested in an earlier revision
    in alloc_netdev_mqs from patch 5; it triggers every time on my mlx5
    machine at boot and needlessly spams the log
  - Added a locking adjustment suggested by Stanislav during review of
    patch 6 to protect napi_id; the adjustment is in patch 5
  - Removed napi_hash_del from netif_napi_del in patch 5. netif_napi_del
    calls __netif_napi_del which itself calls napi_hash_del. The
    original code thus resulted in two napi_hash_del calls, which is
    incorrect.
  - Removed the napi_hash_add from netif_napi_add_weight in patch 5.
    NAPIs are added to the hash when napi_enable is called, instead.
  - Moved the napi_restore_config to the top of napi_enable in patch 5.
  - Simplified the logic in __netif_napi_del and removed its
    napi_hash_del call; NAPIs are now removed from the hash in
    napi_disable.
  - Fixed merge conflicts in patch 6 so it applies cleanly

rfcv3:
  - Renamed napi_storage to napi_config
  - Reordered patches
  - Added defer_hard_irqs and gro_flush_timeout to napi_struct
  - Attempt to save and restore settings on napi_disable/napi_enable
  - Removed weight as a parameter to netif_napi_add_storage
  - Updated driver patches to no longer pass in weight

rfcv2:
  - Almost total rewrite from v1

Joe Damato (9):
  net: napi: Make napi_defer_hard_irqs per-NAPI
  netdev-genl: Dump napi_defer_hard_irqs
  net: napi: Make gro_flush_timeout per-NAPI
  netdev-genl: Dump gro_flush_timeout
  net: napi: Add napi_config
  netdev-genl: Support setting per-NAPI config values
  bnxt: Add support for persistent NAPI config
  mlx5: Add support for persistent NAPI config
  mlx4: Add support for persistent NAPI config to RX CQs

 Documentation/netlink/specs/netdev.yaml       | 25 +++++
 .../networking/net_cachelines/net_device.rst  |  5 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  3 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c    |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 include/linux/netdevice.h                     | 38 +++++++-
 include/uapi/linux/netdev.h                   |  3 +
 net/core/dev.c                                | 95 ++++++++++++++++---
 net/core/dev.h                                | 90 ++++++++++++++++++
 net/core/net-sysfs.c                          |  4 +-
 net/core/netdev-genl-gen.c                    | 14 +++
 net/core/netdev-genl-gen.h                    |  1 +
 net/core/netdev-genl.c                        | 57 +++++++++++
 tools/include/uapi/linux/netdev.h             |  3 +
 15 files changed, 320 insertions(+), 25 deletions(-)

-- 
2.25.1



* [RFC net-next v4 1/9] net: napi: Make napi_defer_hard_irqs per-NAPI
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-08 22:08   ` Jakub Kicinski
  2024-10-01 23:52 ` [RFC net-next v4 2/9] netdev-genl: Dump napi_defer_hard_irqs Joe Damato
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Jiri Pirko, Sebastian Andrzej Siewior,
	Lorenzo Bianconi, Kory Maincent, Johannes Berg, Breno Leitao,
	Alexander Lobakin, open list:DOCUMENTATION, open list

Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.

The existing sysfs parameter is respected; writes to sysfs will write
to all NAPI structs for the device and to the net_device
napi_defer_hard_irqs field. Reads from sysfs return the net_device
field.

The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 .../networking/net_cachelines/net_device.rst  |  2 +-
 include/linux/netdevice.h                     |  3 +-
 net/core/dev.c                                | 10 +++---
 net/core/dev.h                                | 36 +++++++++++++++++++
 net/core/net-sysfs.c                          |  2 +-
 5 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index 22b07c814f4a..eeeb7c925ec5 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -99,7 +99,6 @@ unsigned_int                        num_rx_queues
 unsigned_int                        real_num_rx_queues      -                   read_mostly         get_rps_cpu
 struct_bpf_prog*                    xdp_prog                -                   read_mostly         netif_elide_gro()
 unsigned_long                       gro_flush_timeout       -                   read_mostly         napi_complete_done
-u32                                 napi_defer_hard_irqs    -                   read_mostly         napi_complete_done
 unsigned_int                        gro_max_size            -                   read_mostly         skb_gro_receive
 unsigned_int                        gro_ipv4_max_size       -                   read_mostly         skb_gro_receive
 rx_handler_func_t*                  rx_handler              read_mostly         -                   __netif_receive_skb_core
@@ -183,3 +182,4 @@ struct_devlink_port*                devlink_port
 struct_dpll_pin*                    dpll_pin                                                        
 struct hlist_head                   page_pools
 struct dim_irq_moder*               irq_moder
+u32                                 napi_defer_hard_irqs
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e87b5e488325..55764efc5c93 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -377,6 +377,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	int			irq;
+	u32			defer_hard_irqs;
 };
 
 enum {
@@ -2075,7 +2076,6 @@ struct net_device {
 	unsigned int		real_num_rx_queues;
 	struct netdev_rx_queue	*_rx;
 	unsigned long		gro_flush_timeout;
-	u32			napi_defer_hard_irqs;
 	unsigned int		gro_max_size;
 	unsigned int		gro_ipv4_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
@@ -2398,6 +2398,7 @@ struct net_device {
 
 	/** @irq_moder: dim parameters used if IS_ENABLED(CONFIG_DIMLIB). */
 	struct dim_irq_moder	*irq_moder;
+	u32			napi_defer_hard_irqs;
 
 	u8			priv[] ____cacheline_aligned
 				       __counted_by(priv_len);
diff --git a/net/core/dev.c b/net/core/dev.c
index cd479f5f22f6..748739958d2a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6227,7 +6227,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 	if (work_done) {
 		if (n->gro_bitmask)
 			timeout = READ_ONCE(n->dev->gro_flush_timeout);
-		n->defer_hard_irqs_count = READ_ONCE(n->dev->napi_defer_hard_irqs);
+		n->defer_hard_irqs_count = napi_get_defer_hard_irqs(n);
 	}
 	if (n->defer_hard_irqs_count > 0) {
 		n->defer_hard_irqs_count--;
@@ -6365,7 +6365,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 
 	if (flags & NAPI_F_PREFER_BUSY_POLL) {
-		napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
+		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
 		timeout = READ_ONCE(napi->dev->gro_flush_timeout);
 		if (napi->defer_hard_irqs_count && timeout) {
 			hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
@@ -6647,6 +6647,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	INIT_HLIST_NODE(&napi->napi_hash_node);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
+	napi_set_defer_hard_irqs(napi, READ_ONCE(dev->napi_defer_hard_irqs));
 	init_gro_hash(napi);
 	napi->skb = NULL;
 	INIT_LIST_HEAD(&napi->rx_list);
@@ -11053,7 +11054,7 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev)
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 		dev->gro_flush_timeout = 20000;
-		dev->napi_defer_hard_irqs = 1;
+		netdev_set_defer_hard_irqs(dev, 1);
 	}
 }
 EXPORT_SYMBOL_GPL(netdev_sw_irq_coalesce_default_on);
@@ -11991,7 +11992,6 @@ static void __init net_dev_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, real_num_rx_queues);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, _rx);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, gro_flush_timeout);
-	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, napi_defer_hard_irqs);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, gro_max_size);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, gro_ipv4_max_size);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, rx_handler);
@@ -12003,7 +12003,7 @@ static void __init net_dev_struct_check(void)
 #ifdef CONFIG_NET_XGRESS
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, tcx_ingress);
 #endif
-	CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_rx, 104);
+	CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_rx, 100);
 }
 
 /*
diff --git a/net/core/dev.h b/net/core/dev.h
index 5654325c5b71..b3792219879b 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -138,6 +138,42 @@ static inline void netif_set_gro_ipv4_max_size(struct net_device *dev,
 	WRITE_ONCE(dev->gro_ipv4_max_size, size);
 }
 
+/**
+ * napi_get_defer_hard_irqs - get the NAPI's defer_hard_irqs
+ * @n: napi struct to get the defer_hard_irqs field from
+ *
+ * Return: the per-NAPI value of the defer_hard_irqs field.
+ */
+static inline u32 napi_get_defer_hard_irqs(const struct napi_struct *n)
+{
+	return READ_ONCE(n->defer_hard_irqs);
+}
+
+/**
+ * napi_set_defer_hard_irqs - set the defer_hard_irqs for a napi
+ * @n: napi_struct to set the defer_hard_irqs field
+ * @defer: the value the field should be set to
+ */
+static inline void napi_set_defer_hard_irqs(struct napi_struct *n, u32 defer)
+{
+	WRITE_ONCE(n->defer_hard_irqs, defer);
+}
+
+/**
+ * netdev_set_defer_hard_irqs - set defer_hard_irqs for all NAPIs of a netdev
+ * @netdev: the net_device for which all NAPIs will have defer_hard_irqs set
+ * @defer: the defer_hard_irqs value to set
+ */
+static inline void netdev_set_defer_hard_irqs(struct net_device *netdev,
+					      u32 defer)
+{
+	struct napi_struct *napi;
+
+	WRITE_ONCE(netdev->napi_defer_hard_irqs, defer);
+	list_for_each_entry(napi, &netdev->napi_list, dev_list)
+		napi_set_defer_hard_irqs(napi, defer);
+}
+
 int rps_cpumask_housekeeping(struct cpumask *mask);
 
 #if defined(CONFIG_DEBUG_NET) && defined(CONFIG_BPF_SYSCALL)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 05cf5347f25e..25125f356a15 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -429,7 +429,7 @@ static int change_napi_defer_hard_irqs(struct net_device *dev, unsigned long val
 	if (val > S32_MAX)
 		return -ERANGE;
 
-	WRITE_ONCE(dev->napi_defer_hard_irqs, val);
+	netdev_set_defer_hard_irqs(dev, (u32)val);
 	return 0;
 }
 
-- 
2.25.1



* [RFC net-next v4 2/9] netdev-genl: Dump napi_defer_hard_irqs
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 1/9] net: napi: Make napi_defer_hard_irqs per-NAPI Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI Joe Damato
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Jesper Dangaard Brouer, Mina Almasry, Xuan Zhuo,
	open list

Support dumping defer_hard_irqs for a NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 Documentation/netlink/specs/netdev.yaml | 8 ++++++++
 include/uapi/linux/netdev.h             | 1 +
 net/core/netdev-genl.c                  | 6 ++++++
 tools/include/uapi/linux/netdev.h       | 1 +
 4 files changed, 16 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 08412c279297..585e87ec3c16 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -248,6 +248,13 @@ attribute-sets:
              threaded mode. If NAPI is not in threaded mode (i.e. uses normal
              softirq context), the attribute will be absent.
         type: u32
+      -
+        name: defer-hard-irqs
+        doc: The number of consecutive empty polls before IRQ deferral ends
+             and hardware IRQs are re-enabled.
+        type: u32
+        checks:
+          max: s32-max
   -
     name: queue
     attributes:
@@ -636,6 +643,7 @@ operations:
             - ifindex
             - irq
             - pid
+            - defer-hard-irqs
       dump:
         request:
           attributes:
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 7c308f04e7a0..13dc0b027e86 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -122,6 +122,7 @@ enum {
 	NETDEV_A_NAPI_ID,
 	NETDEV_A_NAPI_IRQ,
 	NETDEV_A_NAPI_PID,
+	NETDEV_A_NAPI_DEFER_HARD_IRQS,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 1cb954f2d39e..de9bd76f43f8 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -161,6 +161,7 @@ static int
 netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
 			const struct genl_info *info)
 {
+	u32 napi_defer_hard_irqs;
 	void *hdr;
 	pid_t pid;
 
@@ -189,6 +190,11 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
 			goto nla_put_failure;
 	}
 
+	napi_defer_hard_irqs = napi_get_defer_hard_irqs(napi);
+	if (nla_put_s32(rsp, NETDEV_A_NAPI_DEFER_HARD_IRQS,
+			napi_defer_hard_irqs))
+		goto nla_put_failure;
+
 	genlmsg_end(rsp, hdr);
 
 	return 0;
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 7c308f04e7a0..13dc0b027e86 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -122,6 +122,7 @@ enum {
 	NETDEV_A_NAPI_ID,
 	NETDEV_A_NAPI_IRQ,
 	NETDEV_A_NAPI_PID,
+	NETDEV_A_NAPI_DEFER_HARD_IRQS,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
-- 
2.25.1



* [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 1/9] net: napi: Make napi_defer_hard_irqs per-NAPI Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 2/9] netdev-genl: Dump napi_defer_hard_irqs Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-08 18:22   ` Joe Damato
  2024-10-08 22:10   ` Jakub Kicinski
  2024-10-01 23:52 ` [RFC net-next v4 4/9] netdev-genl: Dump gro_flush_timeout Joe Damato
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Tony Nguyen, Przemek Kitszel, Jiri Pirko,
	Sebastian Andrzej Siewior, Lorenzo Bianconi, David Ahern,
	Kory Maincent, Johannes Berg, Breno Leitao, Alexander Lobakin,
	open list:DOCUMENTATION, open list,
	moderated list:INTEL ETHERNET DRIVERS

Allow per-NAPI gro_flush_timeout setting.

The existing sysfs parameter is respected; writes to sysfs will write
to all NAPI structs for the device and to the net_device
gro_flush_timeout field. Reads from sysfs return the net_device field.

The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.

Note that idpf embeds a napi_struct in its internals and asserts on
the size of that structure. Since this change increases the
napi_struct size from 400 to 416 bytes (according to pahole on my
system), I've increased the assertion in idpf by 16 bytes. No
attention was paid to the cacheline placement of idpf internals as a
result of this change.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 .../networking/net_cachelines/net_device.rst  |  2 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.h   |  2 +-
 include/linux/netdevice.h                     |  3 +-
 net/core/dev.c                                | 12 +++---
 net/core/dev.h                                | 40 +++++++++++++++++++
 net/core/net-sysfs.c                          |  2 +-
 6 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index eeeb7c925ec5..3d02ae79c850 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -98,7 +98,6 @@ struct_netdev_queue*                _rx                     read_mostly
 unsigned_int                        num_rx_queues                                                   
 unsigned_int                        real_num_rx_queues      -                   read_mostly         get_rps_cpu
 struct_bpf_prog*                    xdp_prog                -                   read_mostly         netif_elide_gro()
-unsigned_long                       gro_flush_timeout       -                   read_mostly         napi_complete_done
 unsigned_int                        gro_max_size            -                   read_mostly         skb_gro_receive
 unsigned_int                        gro_ipv4_max_size       -                   read_mostly         skb_gro_receive
 rx_handler_func_t*                  rx_handler              read_mostly         -                   __netif_receive_skb_core
@@ -182,4 +181,5 @@ struct_devlink_port*                devlink_port
 struct_dpll_pin*                    dpll_pin                                                        
 struct hlist_head                   page_pools
 struct dim_irq_moder*               irq_moder
+unsigned_long                       gro_flush_timeout
 u32                                 napi_defer_hard_irqs
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index f0537826f840..fcdf73486d46 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -438,7 +438,7 @@ struct idpf_q_vector {
 	__cacheline_group_end_aligned(cold);
 };
 libeth_cacheline_set_assert(struct idpf_q_vector, 112,
-			    424 + 2 * sizeof(struct dim),
+			    440 + 2 * sizeof(struct dim),
 			    8 + sizeof(cpumask_var_t));
 
 struct idpf_rx_queue_stats {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 55764efc5c93..33897edd16c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -377,6 +377,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	int			irq;
+	unsigned long		gro_flush_timeout;
 	u32			defer_hard_irqs;
 };
 
@@ -2075,7 +2076,6 @@ struct net_device {
 	int			ifindex;
 	unsigned int		real_num_rx_queues;
 	struct netdev_rx_queue	*_rx;
-	unsigned long		gro_flush_timeout;
 	unsigned int		gro_max_size;
 	unsigned int		gro_ipv4_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
@@ -2398,6 +2398,7 @@ struct net_device {
 
 	/** @irq_moder: dim parameters used if IS_ENABLED(CONFIG_DIMLIB). */
 	struct dim_irq_moder	*irq_moder;
+	unsigned long		gro_flush_timeout;
 	u32			napi_defer_hard_irqs;
 
 	u8			priv[] ____cacheline_aligned
diff --git a/net/core/dev.c b/net/core/dev.c
index 748739958d2a..056ed44f766f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6226,12 +6226,12 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 
 	if (work_done) {
 		if (n->gro_bitmask)
-			timeout = READ_ONCE(n->dev->gro_flush_timeout);
+			timeout = napi_get_gro_flush_timeout(n);
 		n->defer_hard_irqs_count = napi_get_defer_hard_irqs(n);
 	}
 	if (n->defer_hard_irqs_count > 0) {
 		n->defer_hard_irqs_count--;
-		timeout = READ_ONCE(n->dev->gro_flush_timeout);
+		timeout = napi_get_gro_flush_timeout(n);
 		if (timeout)
 			ret = false;
 	}
@@ -6366,7 +6366,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 
 	if (flags & NAPI_F_PREFER_BUSY_POLL) {
 		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
-		timeout = READ_ONCE(napi->dev->gro_flush_timeout);
+		timeout = napi_get_gro_flush_timeout(napi);
 		if (napi->defer_hard_irqs_count && timeout) {
 			hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
 			skip_schedule = true;
@@ -6648,6 +6648,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
 	napi_set_defer_hard_irqs(napi, READ_ONCE(dev->napi_defer_hard_irqs));
+	napi_set_gro_flush_timeout(napi, READ_ONCE(dev->gro_flush_timeout));
 	init_gro_hash(napi);
 	napi->skb = NULL;
 	INIT_LIST_HEAD(&napi->rx_list);
@@ -11053,7 +11054,7 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev)
 	WARN_ON(dev->reg_state == NETREG_REGISTERED);
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
-		dev->gro_flush_timeout = 20000;
+		netdev_set_gro_flush_timeout(dev, 20000);
 		netdev_set_defer_hard_irqs(dev, 1);
 	}
 }
@@ -11991,7 +11992,6 @@ static void __init net_dev_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, ifindex);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, real_num_rx_queues);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, _rx);
-	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, gro_flush_timeout);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, gro_max_size);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, gro_ipv4_max_size);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, rx_handler);
@@ -12003,7 +12003,7 @@ static void __init net_dev_struct_check(void)
 #ifdef CONFIG_NET_XGRESS
 	CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, tcx_ingress);
 #endif
-	CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_rx, 100);
+	CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_rx, 92);
 }
 
 /*
diff --git a/net/core/dev.h b/net/core/dev.h
index b3792219879b..26e598aa56c3 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -174,6 +174,46 @@ static inline void netdev_set_defer_hard_irqs(struct net_device *netdev,
 		napi_set_defer_hard_irqs(napi, defer);
 }
 
+/**
+ * napi_get_gro_flush_timeout - get the gro_flush_timeout
+ * @n: napi struct to get the gro_flush_timeout from
+ *
+ * Return: the per-NAPI value of the gro_flush_timeout field.
+ */
+static inline unsigned long
+napi_get_gro_flush_timeout(const struct napi_struct *n)
+{
+	return READ_ONCE(n->gro_flush_timeout);
+}
+
+/**
+ * napi_set_gro_flush_timeout - set the gro_flush_timeout for a napi
+ * @n: napi struct to set the gro_flush_timeout
+ * @timeout: timeout value to set
+ *
+ * napi_set_gro_flush_timeout sets the per-NAPI gro_flush_timeout
+ */
+static inline void napi_set_gro_flush_timeout(struct napi_struct *n,
+					      unsigned long timeout)
+{
+	WRITE_ONCE(n->gro_flush_timeout, timeout);
+}
+
+/**
+ * netdev_set_gro_flush_timeout - set gro_flush_timeout of a netdev's NAPIs
+ * @netdev: the net_device for which all NAPIs will have gro_flush_timeout set
+ * @timeout: the timeout value to set
+ */
+static inline void netdev_set_gro_flush_timeout(struct net_device *netdev,
+						unsigned long timeout)
+{
+	struct napi_struct *napi;
+
+	WRITE_ONCE(netdev->gro_flush_timeout, timeout);
+	list_for_each_entry(napi, &netdev->napi_list, dev_list)
+		napi_set_gro_flush_timeout(napi, timeout);
+}
+
 int rps_cpumask_housekeeping(struct cpumask *mask);
 
 #if defined(CONFIG_DEBUG_NET) && defined(CONFIG_BPF_SYSCALL)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 25125f356a15..2d9afc6e2161 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -409,7 +409,7 @@ NETDEVICE_SHOW_RW(tx_queue_len, fmt_dec);
 
 static int change_gro_flush_timeout(struct net_device *dev, unsigned long val)
 {
-	WRITE_ONCE(dev->gro_flush_timeout, val);
+	netdev_set_gro_flush_timeout(dev, val);
 	return 0;
 }
 
-- 
2.25.1



* [RFC net-next v4 4/9] netdev-genl: Dump gro_flush_timeout
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (2 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 5/9] net: napi: Add napi_config Joe Damato
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	Donald Hunter, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni, Jesper Dangaard Brouer, Mina Almasry, Xuan Zhuo,
	Daniel Jurgens, open list

Support dumping gro_flush_timeout for a NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 Documentation/netlink/specs/netdev.yaml | 6 ++++++
 include/uapi/linux/netdev.h             | 1 +
 net/core/netdev-genl.c                  | 6 ++++++
 tools/include/uapi/linux/netdev.h       | 1 +
 4 files changed, 14 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 585e87ec3c16..bf13613eaa0d 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -255,6 +255,11 @@ attribute-sets:
         type: u32
         checks:
           max: s32-max
+      -
+        name: gro-flush-timeout
+        doc: The timeout, in nanoseconds, after which the NAPI watchdog
+             timer fires and schedules NAPI processing.
+        type: uint
   -
     name: queue
     attributes:
@@ -644,6 +649,7 @@ operations:
             - irq
             - pid
             - defer-hard-irqs
+            - gro-flush-timeout
       dump:
         request:
           attributes:
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 13dc0b027e86..cacd33359c76 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -123,6 +123,7 @@ enum {
 	NETDEV_A_NAPI_IRQ,
 	NETDEV_A_NAPI_PID,
 	NETDEV_A_NAPI_DEFER_HARD_IRQS,
+	NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index de9bd76f43f8..64e5e4cee60d 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -161,6 +161,7 @@ static int
 netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
 			const struct genl_info *info)
 {
+	unsigned long gro_flush_timeout;
 	u32 napi_defer_hard_irqs;
 	void *hdr;
 	pid_t pid;
@@ -195,6 +196,11 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
 			napi_defer_hard_irqs))
 		goto nla_put_failure;
 
+	gro_flush_timeout = napi_get_gro_flush_timeout(napi);
+	if (nla_put_uint(rsp, NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
+			 gro_flush_timeout))
+		goto nla_put_failure;
+
 	genlmsg_end(rsp, hdr);
 
 	return 0;
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 13dc0b027e86..cacd33359c76 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -123,6 +123,7 @@ enum {
 	NETDEV_A_NAPI_IRQ,
 	NETDEV_A_NAPI_PID,
 	NETDEV_A_NAPI_DEFER_HARD_IRQS,
+	NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
-- 
2.25.1



* [RFC net-next v4 5/9] net: napi: Add napi_config
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (3 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 4/9] netdev-genl: Dump gro_flush_timeout Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-08 18:19   ` Joe Damato
  2024-10-08 22:17   ` Jakub Kicinski
  2024-10-01 23:52 ` [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values Joe Damato
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Jiri Pirko, Sebastian Andrzej Siewior,
	Lorenzo Bianconi, Johannes Berg, open list:DOCUMENTATION,
	open list

Add a persistent per-NAPI config area to the core. Drivers opt in to
persistent config for a NAPI by passing an index when calling
netif_napi_add_config.

napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).

Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI ID, gro_flush_timeout, and defer_hard_irqs.

Per-NAPI settings are saved in napi_disable and restored in napi_enable.
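
As an example of the opt-in, the mlx5 patch later in this series
(patch 8) converts its netif_napi_add call like so:

	-	netif_napi_add(netdev, &c->napi, mlx5e_napi_poll);
	+	netif_napi_add_config(netdev, &c->napi, mlx5e_napi_poll, ix);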

Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 .../networking/net_cachelines/net_device.rst  |  1 +
 include/linux/netdevice.h                     | 32 ++++++++
 net/core/dev.c                                | 79 +++++++++++++++++--
 net/core/dev.h                                | 14 ++++
 4 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index 3d02ae79c850..11d659051f5e 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -183,3 +183,4 @@ struct hlist_head                   page_pools
 struct dim_irq_moder*               irq_moder
 unsigned_long                       gro_flush_timeout
 u32                                 napi_defer_hard_irqs
+struct napi_config*                 napi_config
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 33897edd16c8..51cff55e7ab8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -342,6 +342,15 @@ struct gro_list {
  */
 #define GRO_HASH_BUCKETS	8
 
+/*
+ * Structure for per-NAPI config
+ */
+struct napi_config {
+	u64 gro_flush_timeout;
+	u32 defer_hard_irqs;
+	unsigned int napi_id;
+};
+
 /*
  * Structure for NAPI scheduling similar to tasklet but with weighting
  */
@@ -379,6 +388,8 @@ struct napi_struct {
 	int			irq;
 	unsigned long		gro_flush_timeout;
 	u32			defer_hard_irqs;
+	int			index;
+	struct napi_config	*config;
 };
 
 enum {
@@ -2011,6 +2022,9 @@ enum netdev_reg_state {
  *	@dpll_pin: Pointer to the SyncE source pin of a DPLL subsystem,
  *		   where the clock is recovered.
  *
+ *	@napi_config: An array of napi_config structures containing per-NAPI
+ *		      settings.
+ *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
  */
@@ -2400,6 +2414,7 @@ struct net_device {
 	struct dim_irq_moder	*irq_moder;
 	unsigned long		gro_flush_timeout;
 	u32			napi_defer_hard_irqs;
+	struct napi_config	*napi_config;
 
 	u8			priv[] ____cacheline_aligned
 				       __counted_by(priv_len);
@@ -2650,6 +2665,23 @@ netif_napi_add_tx_weight(struct net_device *dev,
 	netif_napi_add_weight(dev, napi, poll, weight);
 }
 
+/**
+ * netif_napi_add_config - initialize a NAPI context with persistent config
+ * @dev: network device
+ * @napi: NAPI context
+ * @poll: polling function
+ * @index: the NAPI index
+ */
+static inline void
+netif_napi_add_config(struct net_device *dev, struct napi_struct *napi,
+		      int (*poll)(struct napi_struct *, int), int index)
+{
+	napi->index = index;
+	napi->config = &dev->napi_config[index];
+	netif_napi_add_weight(dev, napi, poll, NAPI_POLL_WEIGHT);
+}
+
 /**
  * netif_napi_add_tx() - initialize a NAPI context to be used for Tx only
  * @dev:  network device
diff --git a/net/core/dev.c b/net/core/dev.c
index 056ed44f766f..9acb6db19200 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6499,6 +6499,22 @@ EXPORT_SYMBOL(napi_busy_loop);
 
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
+static void __napi_hash_add_with_id(struct napi_struct *napi,
+				    unsigned int napi_id)
+{
+	napi->napi_id = napi_id;
+	hlist_add_head_rcu(&napi->napi_hash_node,
+			   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+}
+
+static void napi_hash_add_with_id(struct napi_struct *napi,
+				  unsigned int napi_id)
+{
+	spin_lock(&napi_hash_lock);
+	__napi_hash_add_with_id(napi, napi_id);
+	spin_unlock(&napi_hash_lock);
+}
+
 static void napi_hash_add(struct napi_struct *napi)
 {
 	if (test_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state))
@@ -6511,10 +6527,8 @@ static void napi_hash_add(struct napi_struct *napi)
 		if (unlikely(++napi_gen_id < MIN_NAPI_ID))
 			napi_gen_id = MIN_NAPI_ID;
 	} while (napi_by_id(napi_gen_id));
-	napi->napi_id = napi_gen_id;
 
-	hlist_add_head_rcu(&napi->napi_hash_node,
-			   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+	__napi_hash_add_with_id(napi, napi_gen_id);
 
 	spin_unlock(&napi_hash_lock);
 }
@@ -6637,6 +6651,28 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 }
 EXPORT_SYMBOL(netif_queue_set_napi);
 
+static void napi_restore_config(struct napi_struct *n)
+{
+	n->defer_hard_irqs = n->config->defer_hard_irqs;
+	n->gro_flush_timeout = n->config->gro_flush_timeout;
+	/* a NAPI ID might be stored in the config, if so use it. if not, use
+	 * napi_hash_add to generate one for us. It will be saved to the config
+	 * in napi_disable.
+	 */
+	if (n->config->napi_id)
+		napi_hash_add_with_id(n, n->config->napi_id);
+	else
+		napi_hash_add(n);
+}
+
+static void napi_save_config(struct napi_struct *n)
+{
+	n->config->defer_hard_irqs = n->defer_hard_irqs;
+	n->config->gro_flush_timeout = n->gro_flush_timeout;
+	n->config->napi_id = n->napi_id;
+	napi_hash_del(n);
+}
+
 void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 			   int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6647,8 +6683,6 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	INIT_HLIST_NODE(&napi->napi_hash_node);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
-	napi_set_defer_hard_irqs(napi, READ_ONCE(dev->napi_defer_hard_irqs));
-	napi_set_gro_flush_timeout(napi, READ_ONCE(dev->gro_flush_timeout));
 	init_gro_hash(napi);
 	napi->skb = NULL;
 	INIT_LIST_HEAD(&napi->rx_list);
@@ -6666,7 +6700,13 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	set_bit(NAPI_STATE_SCHED, &napi->state);
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
-	napi_hash_add(napi);
+
+	/* default settings from sysfs are applied to all NAPIs. any per-NAPI
+	 * configuration will be loaded in napi_enable
+	 */
+	napi_set_defer_hard_irqs(napi, READ_ONCE(dev->napi_defer_hard_irqs));
+	napi_set_gro_flush_timeout(napi, READ_ONCE(dev->gro_flush_timeout));
+
 	napi_get_frags_check(napi);
 	/* Create kthread for this napi if dev->threaded is set.
 	 * Clear dev->threaded if kthread creation failed so that
@@ -6698,6 +6738,11 @@ void napi_disable(struct napi_struct *n)
 
 	hrtimer_cancel(&n->timer);
 
+	if (n->config)
+		napi_save_config(n);
+	else
+		napi_hash_del(n);
+
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
@@ -6713,6 +6758,11 @@ void napi_enable(struct napi_struct *n)
 {
 	unsigned long new, val = READ_ONCE(n->state);
 
+	if (n->config)
+		napi_restore_config(n);
+	else
+		napi_hash_add(n);
+
 	do {
 		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val));
 
@@ -6742,7 +6792,11 @@ void __netif_napi_del(struct napi_struct *napi)
 	if (!test_and_clear_bit(NAPI_STATE_LISTED, &napi->state))
 		return;
 
-	napi_hash_del(napi);
+	if (napi->config) {
+		napi->index = -1;
+		napi->config = NULL;
+	}
+
 	list_del_rcu(&napi->dev_list);
 	napi_free_frags(napi);
 
@@ -11079,6 +11133,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		unsigned int txqs, unsigned int rxqs)
 {
 	struct net_device *dev;
+	size_t napi_config_sz;
+	unsigned int maxqs;
 
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
@@ -11092,6 +11148,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
+	maxqs = max(txqs, rxqs);
+
 	dev = kvzalloc(struct_size(dev, priv, sizeof_priv),
 		       GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
 	if (!dev)
@@ -11166,6 +11224,11 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (!dev->ethtool)
 		goto free_all;
 
+	napi_config_sz = array_size(maxqs, sizeof(*dev->napi_config));
+	dev->napi_config = kvzalloc(napi_config_sz, GFP_KERNEL_ACCOUNT);
+	if (!dev->napi_config)
+		goto free_all;
+
 	strscpy(dev->name, name);
 	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
@@ -11227,6 +11290,8 @@ void free_netdev(struct net_device *dev)
 	list_for_each_entry_safe(p, n, &dev->napi_list, dev_list)
 		netif_napi_del(p);
 
+	kvfree(dev->napi_config);
+
 	ref_tracker_dir_exit(&dev->refcnt_tracker);
 #ifdef CONFIG_PCPU_DEV_REFCNT
 	free_percpu(dev->pcpu_refcnt);
diff --git a/net/core/dev.h b/net/core/dev.h
index 26e598aa56c3..7365fa0ffdc7 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -167,11 +167,18 @@ static inline void napi_set_defer_hard_irqs(struct napi_struct *n, u32 defer)
 static inline void netdev_set_defer_hard_irqs(struct net_device *netdev,
 					      u32 defer)
 {
+	unsigned int count = max(netdev->num_rx_queues,
+				 netdev->num_tx_queues);
 	struct napi_struct *napi;
+	int i;
 
 	WRITE_ONCE(netdev->napi_defer_hard_irqs, defer);
 	list_for_each_entry(napi, &netdev->napi_list, dev_list)
 		napi_set_defer_hard_irqs(napi, defer);
+
+	if (netdev->napi_config)
+		for (i = 0; i < count; i++)
+			netdev->napi_config[i].defer_hard_irqs = defer;
 }
 
 /**
@@ -207,11 +214,18 @@ static inline void napi_set_gro_flush_timeout(struct napi_struct *n,
 static inline void netdev_set_gro_flush_timeout(struct net_device *netdev,
 						unsigned long timeout)
 {
+	unsigned int count = max(netdev->num_rx_queues,
+				 netdev->num_tx_queues);
 	struct napi_struct *napi;
+	int i;
 
 	WRITE_ONCE(netdev->gro_flush_timeout, timeout);
 	list_for_each_entry(napi, &netdev->napi_list, dev_list)
 		napi_set_gro_flush_timeout(napi, timeout);
+
+	if (netdev->napi_config)
+		for (i = 0; i < count; i++)
+			netdev->napi_config[i].gro_flush_timeout = timeout;
 }
 
 int rps_cpumask_housekeeping(struct cpumask *mask);
-- 
2.25.1



* [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (4 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 5/9] net: napi: Add napi_config Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-08 18:20   ` Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 7/9] bnxt: Add support for persistent NAPI config Joe Damato
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Donald Hunter, Jesper Dangaard Brouer, Mina Almasry, Xuan Zhuo,
	Daniel Jurgens, open list

Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 Documentation/netlink/specs/netdev.yaml | 11 ++++++
 include/uapi/linux/netdev.h             |  1 +
 net/core/netdev-genl-gen.c              | 14 ++++++++
 net/core/netdev-genl-gen.h              |  1 +
 net/core/netdev-genl.c                  | 45 +++++++++++++++++++++++++
 tools/include/uapi/linux/netdev.h       |  1 +
 6 files changed, 73 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index bf13613eaa0d..7f8d2489c68c 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -690,6 +690,17 @@ operations:
         reply:
           attributes:
             - id
+    -
+      name: napi-set
+      doc: Set configurable NAPI instance settings.
+      attribute-set: napi
+      flags: [ admin-perm ]
+      do:
+        request:
+          attributes:
+            - napi-id
+            - defer-hard-irqs
+            - gro-flush-timeout
 
 kernel-family:
   headers: [ "linux/list.h"]
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index cacd33359c76..e3ebb49f60d2 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -201,6 +201,7 @@ enum {
 	NETDEV_CMD_NAPI_GET,
 	NETDEV_CMD_QSTATS_GET,
 	NETDEV_CMD_BIND_RX,
+	NETDEV_CMD_NAPI_SET,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index b28424ae06d5..901c6f65b735 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -87,6 +87,13 @@ static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1]
 	[NETDEV_A_DMABUF_QUEUES] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy),
 };
 
+/* NETDEV_CMD_NAPI_SET - set */
+static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT + 1] = {
+	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
+	[NETDEV_A_NAPI_DEFER_HARD_IRQS] = { .type = NLA_S32 },
+	[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT },
+};
+
 /* Ops table for netdev */
 static const struct genl_split_ops netdev_nl_ops[] = {
 	{
@@ -171,6 +178,13 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.maxattr	= NETDEV_A_DMABUF_FD,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
+	{
+		.cmd		= NETDEV_CMD_NAPI_SET,
+		.doit		= netdev_nl_napi_set_doit,
+		.policy		= netdev_napi_set_nl_policy,
+		.maxattr	= NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
 };
 
 static const struct genl_multicast_group netdev_nl_mcgrps[] = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index 8cda334fd042..85e6d7c95ada 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -30,6 +30,7 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb,
 			       struct netlink_callback *cb);
 int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int netdev_nl_napi_set_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
 				struct netlink_callback *cb);
 int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 64e5e4cee60d..59523318d620 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -303,6 +303,51 @@ int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	return err;
 }
 
+static int
+netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
+{
+	u64 gro_flush_timeout = 0;
+	u32 defer = 0;
+
+	if (info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]) {
+		defer = nla_get_u32(info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]);
+		napi_set_defer_hard_irqs(napi, defer);
+	}
+
+	if (info->attrs[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT]) {
+		gro_flush_timeout = nla_get_uint(info->attrs[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT]);
+		napi_set_gro_flush_timeout(napi, gro_flush_timeout);
+	}
+
+	return 0;
+}
+
+int netdev_nl_napi_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct napi_struct *napi;
+	unsigned int napi_id;
+	int err;
+
+	if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_NAPI_ID))
+		return -EINVAL;
+
+	napi_id = nla_get_u32(info->attrs[NETDEV_A_NAPI_ID]);
+
+	rtnl_lock();
+
+	napi = napi_by_id(napi_id);
+	if (napi) {
+		err = netdev_nl_napi_set_config(napi, info);
+	} else {
+		NL_SET_BAD_ATTR(info->extack, info->attrs[NETDEV_A_NAPI_ID]);
+		err = -ENOENT;
+	}
+
+	rtnl_unlock();
+
+	return err;
+}
+
 static int
 netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 			 u32 q_idx, u32 q_type, const struct genl_info *info)
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index cacd33359c76..e3ebb49f60d2 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -201,6 +201,7 @@ enum {
 	NETDEV_CMD_NAPI_GET,
 	NETDEV_CMD_QSTATS_GET,
 	NETDEV_CMD_BIND_RX,
+	NETDEV_CMD_NAPI_SET,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
-- 
2.25.1



* [RFC net-next v4 7/9] bnxt: Add support for persistent NAPI config
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (5 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 8/9] mlx5: " Joe Damato
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	Michael Chan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, open list

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6e422e24750a..f5da2dace982 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10986,7 +10986,8 @@ static void bnxt_init_napi(struct bnxt *bp)
 		cp_nr_rings--;
 	for (i = 0; i < cp_nr_rings; i++) {
 		bnapi = bp->bnapi[i];
-		netif_napi_add(bp->dev, &bnapi->napi, poll_fn);
+		netif_napi_add_config(bp->dev, &bnapi->napi, poll_fn,
+				      bnapi->index);
 	}
 	if (BNXT_CHIP_TYPE_NITRO_A0(bp)) {
 		bnapi = bp->bnapi[cp_nr_rings];
-- 
2.25.1


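For context when reading this and the following two driver patches:
netif_napi_add_config() is added by patch 5 and is not shown here.
Judging purely from the call sites, it has roughly the following
shape; the config-linking line is an assumption about the
implementation (the field names are hypothetical), not a quote of it:

	/* Sketch: like netif_napi_add(), but also binds the NAPI to the
	 * persistent per-index config slot allocated in alloc_netdev_mqs().
	 */
	static inline void
	netif_napi_add_config(struct net_device *dev, struct napi_struct *napi,
			      int (*poll)(struct napi_struct *, int), int index)
	{
		napi->config = &dev->napi_config[index];	/* assumed field */
		netif_napi_add(dev, napi, poll);
	}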

* [RFC net-next v4 8/9] mlx5: Add support for persistent NAPI config
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (6 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 7/9] bnxt: Add support for persistent NAPI config Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-01 23:52 ` [RFC net-next v4 9/9] mlx4: Add support for persistent NAPI config to RX CQs Joe Damato
  2024-10-03 23:29 ` [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Stanislav Fomichev
  9 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	Saeed Mahameed, Tariq Toukan, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	open list:MELLANOX MLX5 core VPI driver, open list

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a5659c0c4236..09ab7ac07c29 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2697,7 +2697,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	c->aff_mask = irq_get_effective_affinity_mask(irq);
 	c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
 
-	netif_napi_add(netdev, &c->napi, mlx5e_napi_poll);
+	netif_napi_add_config(netdev, &c->napi, mlx5e_napi_poll, ix);
 	netif_napi_set_irq(&c->napi, irq);
 
 	err = mlx5e_open_queues(c, params, cparam);
-- 
2.25.1



* [RFC net-next v4 9/9] mlx4: Add support for persistent NAPI config to RX CQs
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (7 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 8/9] mlx5: " Joe Damato
@ 2024-10-01 23:52 ` Joe Damato
  2024-10-03 23:29 ` [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Stanislav Fomichev
  9 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-01 23:52 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Joe Damato,
	Tariq Toukan, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, open list:MELLANOX MLX4 core VPI driver, open list

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing RX CQ NAPIs.

Presently, struct napi_config only carries two fields, both of which
are used on the RX side, so there is no need to support TX CQs yet.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 461cc2c79c71..0e92956e84cf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -156,7 +156,8 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 		break;
 	case RX:
 		cq->mcq.comp = mlx4_en_rx_irq;
-		netif_napi_add(cq->dev, &cq->napi, mlx4_en_poll_rx_cq);
+		netif_napi_add_config(cq->dev, &cq->napi, mlx4_en_poll_rx_cq,
+				      cq_idx);
 		netif_napi_set_irq(&cq->napi, irq);
 		napi_enable(&cq->napi);
 		netif_queue_set_napi(cq->dev, cq_idx, NETDEV_QUEUE_TYPE_RX, &cq->napi);
-- 
2.25.1



* Re: [RFC net-next v4 0/9] Add support for per-NAPI config via netlink
  2024-10-01 23:52 [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Joe Damato
                   ` (8 preceding siblings ...)
  2024-10-01 23:52 ` [RFC net-next v4 9/9] mlx4: Add support for persistent NAPI config to RX CQs Joe Damato
@ 2024-10-03 23:29 ` Stanislav Fomichev
  2024-10-03 23:53   ` Joe Damato
  9 siblings, 1 reply; 25+ messages in thread
From: Stanislav Fomichev @ 2024-10-03 23:29 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Alexander Lobakin,
	Breno Leitao, Daniel Jurgens, David Ahern, David S. Miller,
	Donald Hunter, Eric Dumazet,
	moderated list:INTEL ETHERNET DRIVERS, Jakub Kicinski,
	Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Kory Maincent, Leon Romanovsky,
	open list:DOCUMENTATION, open list,
	open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
	Michael Chan, Mina Almasry, Paolo Abeni, Przemek Kitszel,
	Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
	Tony Nguyen, Xuan Zhuo

On 10/01, Joe Damato wrote:
> Greetings:
> 
> Welcome to RFC v4.
> 
> Very important and significant changes have been made since RFC v3 [1],
> please see the changelog below for details.
> 
> A couple important call outs for this revision for reviewers:
> 
>   1. idpf embeds a napi_struct in an internal data structure and
>      includes an assertion on the size of napi_struct. The maintainers
>      have stated that they think anyone touching napi_struct should update
>      the assertion [2], so I've done this in patch 3. 
> 
>      Even though the assertion has been updated, I've given the
>      cacheline placement of napi_struct within idpf's internals no
>      thought or consideration.
> 
>      Would appreciate other opinions on this; I think idpf should be
>      fixed. It seems unreasonable to me that anyone changing the size of
>      a struct in the core should need to think about cachelines in idpf.

[..]

>   2. This revision seems to work (see below for a full walk through). Is
>      this the behavior we want? Am I missing some use case or some
>      behavioral thing other folks need?

The walk through looks good!


>   3. Re a previous point made by Stanislav regarding "taking over a NAPI
>      ID" when the channel count changes: mlx5 seems to call napi_disable
>      followed by netif_napi_del for the old queues and then calls
>      napi_enable for the new ones. In this RFC, the NAPI ID generation
>      is deferred to napi_enable. This means we won't end up with two of
>      the same NAPI IDs added to the hash at the same time (I am pretty
>      sure).


[..]

>      Can we assume all drivers will napi_disable the old queues before
>      napi_enable the new ones? If yes, we might not need to worry about
>      a NAPI ID takeover function.

With the explicit driver opt-in via netif_napi_add_config, this
shouldn't matter? When somebody gets to converting the drivers that
don't follow this common pattern they'll have to solve the takeover
part :-)


* Re: [RFC net-next v4 0/9] Add support for per-NAPI config via netlink
  2024-10-03 23:29 ` [RFC net-next v4 0/9] Add support for per-NAPI config via netlink Stanislav Fomichev
@ 2024-10-03 23:53   ` Joe Damato
  2024-10-04  2:33     ` Joe Damato
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Damato @ 2024-10-03 23:53 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, Alexander Lobakin,
	Breno Leitao, Daniel Jurgens, David Ahern, David S. Miller,
	Donald Hunter, Eric Dumazet,
	moderated list:INTEL ETHERNET DRIVERS, Jakub Kicinski,
	Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Kory Maincent, Leon Romanovsky,
	open list:DOCUMENTATION, open list,
	open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
	Michael Chan, Mina Almasry, Paolo Abeni, Przemek Kitszel,
	Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
	Tony Nguyen, Xuan Zhuo

On Thu, Oct 03, 2024 at 04:29:37PM -0700, Stanislav Fomichev wrote:
> On 10/01, Joe Damato wrote:

[...]
 
> >   2. This revision seems to work (see below for a full walk through). Is
> >      this the behavior we want? Am I missing some use case or some
> >      behavioral thing other folks need?
> 
> The walk through looks good!

Thanks for taking a look.

> >   3. Re a previous point made by Stanislav regarding "taking over a NAPI
> >      ID" when the channel count changes: mlx5 seems to call napi_disable
> >      followed by netif_napi_del for the old queues and then calls
> >      napi_enable for the new ones. In this RFC, the NAPI ID generation
> >      is deferred to napi_enable. This means we won't end up with two of
> >      the same NAPI IDs added to the hash at the same time (I am pretty
> >      sure).
> 
> 
> [..]
> 
> >      Can we assume all drivers will napi_disable the old queues before
> >      napi_enable the new ones? If yes, we might not need to worry about
> >      a NAPI ID takeover function.
> 
> With the explicit driver opt-in via netif_napi_add_config, this
> shouldn't matter? When somebody gets to converting the drivers that
> don't follow this common pattern they'll have to solve the takeover
> part :-)

That is true; that's a good point. I'll let the RFC hang out on the
list for another day or two just to give Jakub time to catch up on
his mails ;) but if you all agree... this might be ready to be
resent as a PATCH instead of an RFC.


* Re: [RFC net-next v4 0/9] Add support for per-NAPI config via netlink
  2024-10-03 23:53   ` Joe Damato
@ 2024-10-04  2:33     ` Joe Damato
  2024-10-04 16:22       ` Stanislav Fomichev
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Damato @ 2024-10-04  2:33 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, mkarsten, skhawaja, sdf, bjorn,
	amritha.nambiar, sridhar.samudrala, willemdebruijn.kernel,
	Alexander Lobakin, Breno Leitao, Daniel Jurgens, David Ahern,
	David S. Miller, Donald Hunter, Eric Dumazet,
	moderated list:INTEL ETHERNET DRIVERS, Jakub Kicinski,
	Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Kory Maincent, Leon Romanovsky,
	open list:DOCUMENTATION, open list,
	open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
	Michael Chan, Mina Almasry, Paolo Abeni, Przemek Kitszel,
	Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
	Tony Nguyen, Xuan Zhuo

On Thu, Oct 03, 2024 at 04:53:13PM -0700, Joe Damato wrote:
> On Thu, Oct 03, 2024 at 04:29:37PM -0700, Stanislav Fomichev wrote:
> > On 10/01, Joe Damato wrote:
> 
> [...]
>  
> > >   2. This revision seems to work (see below for a full walk through). Is
> > >      this the behavior we want? Am I missing some use case or some
> > >      behavioral thing other folks need?
> > 
> > The walk through looks good!
> 
> Thanks for taking a look.
> 
> > >   3. Re a previous point made by Stanislav regarding "taking over a NAPI
> > >      ID" when the channel count changes: mlx5 seems to call napi_disable
> > >      followed by netif_napi_del for the old queues and then calls
> > >      napi_enable for the new ones. In this RFC, the NAPI ID generation
> > >      is deferred to napi_enable. This means we won't end up with two of
> > >      the same NAPI IDs added to the hash at the same time (I am pretty
> > >      sure).
> > 
> > 
> > [..]
> > 
> > >      Can we assume all drivers will napi_disable the old queues before
> > >      napi_enable the new ones? If yes, we might not need to worry about
> > >      a NAPI ID takeover function.
> > 
> > With the explicit driver opt-in via netif_napi_add_config, this
> > shouldn't matter? When somebody gets to converting the drivers that
> > don't follow this common pattern they'll have to solve the takeover
> > part :-)
> 
> That is true; that's a good point.

Actually, sorry, that isn't strictly true. NAPI ID generation is
moved to napi_enable for all drivers; the IDs just are (or are not)
persisted depending on whether the driver opted in to add_config or
not.

So, the change does affect all drivers. NAPI IDs won't be generated
and added to the hash until napi_enable, and they will be removed
from the hash in napi_disable... even if you didn't opt in to having
storage.

Opting in to storage via netif_napi_add_config just means that your
NAPI IDs (and other settings) will be persistent.

Sorry about my confusion when replying earlier.

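To make the lifecycle described above concrete, the flow is roughly
the following (simplified sketch; the helper names and call sites are
inferred from this thread, not quoted from the patch):

	void napi_enable(struct napi_struct *n)
	{
		/* The NAPI ID is generated here now (or restored from
		 * napi_config for drivers that opted in) and added to
		 * the hash, instead of at netif_napi_add() time.
		 */
		napi_hash_add(n);
	}

	void napi_disable(struct napi_struct *n)
	{
		/* The ID always leaves the hash here; drivers using
		 * netif_napi_add_config() additionally have the ID and
		 * other settings saved so the next enable restores them.
		 */
		napi_hash_del(n);
	}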

* Re: [RFC net-next v4 0/9] Add support for per-NAPI config via netlink
  2024-10-04  2:33     ` Joe Damato
@ 2024-10-04 16:22       ` Stanislav Fomichev
  0 siblings, 0 replies; 25+ messages in thread
From: Stanislav Fomichev @ 2024-10-04 16:22 UTC (permalink / raw)
  To: Joe Damato, netdev, mkarsten, skhawaja, sdf, bjorn,
	amritha.nambiar, sridhar.samudrala, willemdebruijn.kernel,
	Alexander Lobakin, Breno Leitao, Daniel Jurgens, David Ahern,
	David S. Miller, Donald Hunter, Eric Dumazet,
	moderated list:INTEL ETHERNET DRIVERS, Jakub Kicinski,
	Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Kory Maincent, Leon Romanovsky,
	open list:DOCUMENTATION, open list,
	open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
	Michael Chan, Mina Almasry, Paolo Abeni, Przemek Kitszel,
	Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
	Tony Nguyen, Xuan Zhuo

On 10/03, Joe Damato wrote:
> On Thu, Oct 03, 2024 at 04:53:13PM -0700, Joe Damato wrote:
> > On Thu, Oct 03, 2024 at 04:29:37PM -0700, Stanislav Fomichev wrote:
> > > On 10/01, Joe Damato wrote:
> > 
> > [...]
> >  
> > > >   2. This revision seems to work (see below for a full walk through). Is
> > > >      this the behavior we want? Am I missing some use case or some
> > > >      behavioral thing other folks need?
> > > 
> > > The walk through looks good!
> > 
> > Thanks for taking a look.
> > 
> > > >   3. Re a previous point made by Stanislav regarding "taking over a NAPI
> > > >      ID" when the channel count changes: mlx5 seems to call napi_disable
> > > >      followed by netif_napi_del for the old queues and then calls
> > > >      napi_enable for the new ones. In this RFC, the NAPI ID generation
> > > >      is deferred to napi_enable. This means we won't end up with two of
> > > >      the same NAPI IDs added to the hash at the same time (I am pretty
> > > >      sure).
> > > 
> > > 
> > > [..]
> > > 
> > > >      Can we assume all drivers will napi_disable the old queues before
> > > >      napi_enable the new ones? If yes, we might not need to worry about
> > > >      a NAPI ID takeover function.
> > > 
> > > With the explicit driver opt-in via netif_napi_add_config, this
> > > shouldn't matter? When somebody gets to converting the drivers that
> > > don't follow this common pattern they'll have to solve the takeover
> > > part :-)
> > 
> > That is true; that's a good point.
> 
> Actually, sorry, that isn't strictly true. NAPI ID generation is
> moved to napi_enable for all drivers; the IDs just are (or are not)
> persisted depending on whether the driver opted in to add_config or
> not.
> 
> So, the change does affect all drivers. NAPI IDs won't be generated
> and added to the hash until napi_enable, and they will be removed
> from the hash in napi_disable... even if you didn't opt in to having
> storage.
> 
> Opting in to storage via netif_napi_add_config just means that your
> NAPI IDs (and other settings) will be persistent.
> 
> Sorry about my confusion when replying earlier.

AFAIA, all control operations (ethtool or similar ones via netlink)
should grab the rtnl lock. So as long as both enable/disable happen
under rtnl (and in my mind they should), I don't think there are going
to be any user-visible side-effects of your change. But I might be
wrong; let's see if others can come up with some corner cases..


* Re: [RFC net-next v4 5/9] net: napi: Add napi_config
  2024-10-01 23:52 ` [RFC net-next v4 5/9] net: napi: Add napi_config Joe Damato
@ 2024-10-08 18:19   ` Joe Damato
  2024-10-08 22:17   ` Jakub Kicinski
  1 sibling, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-08 18:19 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
	Johannes Berg, open list:DOCUMENTATION, open list

On Tue, Oct 01, 2024 at 11:52:36PM +0000, Joe Damato wrote:
> Add a persistent NAPI config area for NAPI configuration to the core.
> Drivers opt-in to setting the persistent config for a NAPI by passing an
> index when calling netif_napi_add_config.
> 
> napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
> (after the NAPIs are deleted).
> 
> Drivers which call netif_napi_add_config will have persistent per-NAPI
> settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.
> 
> Per-NAPI settings are saved in napi_disable and restored in napi_enable.
> 
> Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
> Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  .../networking/net_cachelines/net_device.rst  |  1 +
>  include/linux/netdevice.h                     | 32 ++++++++
>  net/core/dev.c                                | 79 +++++++++++++++++--
>  net/core/dev.h                                | 14 ++++
>  4 files changed, 119 insertions(+), 7 deletions(-)

[...]
  
> +/**
> + * netif_napi_add_config - initialize a NAPI context with persistent config
> + * @dev: network device
> + * @napi: NAPI context
> + * @poll: polling function
> + * @weight: the poll weight of this NAPI

For anyone following along, I noticed this unnecessary bit of kdoc
left in from a previous revision and will remove it when submitting
v5.

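For readers without the full patch in front of them, the commit
message above implies a persistent per-index blob along these lines
(a sketch; the field names are assumed from the settings being
persisted, and the real definition is in the patch itself):

	/* One entry per queue index, allocated in alloc_netdev_mqs()
	 * and freed in free_netdev(), after the NAPIs are deleted.
	 */
	struct napi_config {
		unsigned long	gro_flush_timeout;
		u32		defer_hard_irqs;
		unsigned int	napi_id;
	};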

* Re: [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values
  2024-10-01 23:52 ` [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values Joe Damato
@ 2024-10-08 18:20   ` Joe Damato
  2024-10-08 22:19     ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Damato @ 2024-10-08 18:20 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Donald Hunter,
	Jesper Dangaard Brouer, Mina Almasry, Xuan Zhuo, Daniel Jurgens,
	open list

On Tue, Oct 01, 2024 at 11:52:37PM +0000, Joe Damato wrote:
> Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  Documentation/netlink/specs/netdev.yaml | 11 ++++++
>  include/uapi/linux/netdev.h             |  1 +
>  net/core/netdev-genl-gen.c              | 14 ++++++++
>  net/core/netdev-genl-gen.h              |  1 +
>  net/core/netdev-genl.c                  | 45 +++++++++++++++++++++++++
>  tools/include/uapi/linux/netdev.h       |  1 +
>  6 files changed, 73 insertions(+)

[...]

> diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
> index b28424ae06d5..901c6f65b735 100644
> --- a/net/core/netdev-genl-gen.c
> +++ b/net/core/netdev-genl-gen.c
> @@ -87,6 +87,13 @@ static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1]
>  	[NETDEV_A_DMABUF_QUEUES] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy),
>  };
>  
> +/* NETDEV_CMD_NAPI_SET - set */
> +static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT + 1] = {
> +	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
> +	[NETDEV_A_NAPI_DEFER_HARD_IRQS] = { .type = NLA_S32 },

Noticed this while re-reading the code; planning on changing this
from NLA_S32 to NLA_U32 for v5.

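After fixing the spec and regenerating, the policy entry quoted above
would plausibly come out as something like the following (a guess at
the YNL output for a u32 attribute with an s32-max check, not the
actual generated line):

	[NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_MAX(NLA_U32, S32_MAX),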

* Re: [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI
  2024-10-01 23:52 ` [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI Joe Damato
@ 2024-10-08 18:22   ` Joe Damato
  2024-10-08 22:10   ` Jakub Kicinski
  1 sibling, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-08 18:22 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Tony Nguyen, Przemek Kitszel, Jiri Pirko,
	Sebastian Andrzej Siewior, Lorenzo Bianconi, David Ahern,
	Kory Maincent, Johannes Berg, Breno Leitao, Alexander Lobakin,
	open list:DOCUMENTATION, open list,
	moderated list:INTEL ETHERNET DRIVERS

On Tue, Oct 01, 2024 at 11:52:34PM +0000, Joe Damato wrote:

[...]

> Note that idpf has embedded napi_struct in its internals and has
> established some series of asserts that involve the size of napi
> structure. Since this change increases the napi_struct size from 400 to
> 416 (according to pahole on my system), I've increased the assertion in
> idpf by 16 bytes. No attention whatsoever was paid to the cacheline
> placement of idpf internals as a result of this change.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  .../networking/net_cachelines/net_device.rst  |  2 +-
>  drivers/net/ethernet/intel/idpf/idpf_txrx.h   |  2 +-
>  include/linux/netdevice.h                     |  3 +-
>  net/core/dev.c                                | 12 +++---
>  net/core/dev.h                                | 40 +++++++++++++++++++
>  net/core/net-sysfs.c                          |  2 +-
>  6 files changed, 51 insertions(+), 10 deletions(-)

[...]

> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
> index f0537826f840..fcdf73486d46 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
> @@ -438,7 +438,7 @@ struct idpf_q_vector {
>  	__cacheline_group_end_aligned(cold);
>  };
>  libeth_cacheline_set_assert(struct idpf_q_vector, 112,
> -			    424 + 2 * sizeof(struct dim),
> +			    440 + 2 * sizeof(struct dim),
>  			    8 + sizeof(cpumask_var_t));
>  
>  struct idpf_rx_queue_stats {

Now that idpf was fixed separately [1], this will be removed in the
v5.

[1]: https://lore.kernel.org/netdev/20241004105407.73585-1-jdamato@fastly.com/


* Re: [RFC net-next v4 1/9] net: napi: Make napi_defer_hard_irqs per-NAPI
  2024-10-01 23:52 ` [RFC net-next v4 1/9] net: napi: Make napi_defer_hard_irqs per-NAPI Joe Damato
@ 2024-10-08 22:08   ` Jakub Kicinski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2024-10-08 22:08 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Jonathan Corbet, Jiri Pirko,
	Sebastian Andrzej Siewior, Lorenzo Bianconi, Kory Maincent,
	Johannes Berg, Breno Leitao, Alexander Lobakin,
	open list:DOCUMENTATION, open list

On Tue,  1 Oct 2024 23:52:32 +0000 Joe Damato wrote:
> @@ -377,6 +377,7 @@ struct napi_struct {
>  	struct list_head	dev_list;
>  	struct hlist_node	napi_hash_node;
>  	int			irq;
> +	u32			defer_hard_irqs;

This will be read on every unmasking, let's put it above 
the "/* control-path-only fields follow */" comment


* Re: [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI
  2024-10-01 23:52 ` [RFC net-next v4 3/9] net: napi: Make gro_flush_timeout per-NAPI Joe Damato
  2024-10-08 18:22   ` Joe Damato
@ 2024-10-08 22:10   ` Jakub Kicinski
  1 sibling, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2024-10-08 22:10 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Jonathan Corbet, Tony Nguyen,
	Przemek Kitszel, Jiri Pirko, Sebastian Andrzej Siewior,
	Lorenzo Bianconi, David Ahern, Kory Maincent, Johannes Berg,
	Breno Leitao, Alexander Lobakin, open list:DOCUMENTATION,
	open list, moderated list:INTEL ETHERNET DRIVERS

On Tue,  1 Oct 2024 23:52:34 +0000 Joe Damato wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 55764efc5c93..33897edd16c8 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -377,6 +377,7 @@ struct napi_struct {
>  	struct list_head	dev_list;
>  	struct hlist_node	napi_hash_node;
>  	int			irq;
> +	unsigned long		gro_flush_timeout;
>  	u32			defer_hard_irqs;
>  };

Same story

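The point in both replies is the same: these two fields are read in
the datapath, so they belong with the hot members rather than after
them. A sketch of the suggested layout (surrounding members abridged
from the hunks quoted above):

	struct napi_struct {
		/* ... datapath fields ... */
		unsigned long		gro_flush_timeout;	/* read per poll */
		u32			defer_hard_irqs;	/* read on unmasking */

		/* control-path-only fields follow */
		struct list_head	dev_list;
		struct hlist_node	napi_hash_node;
		int			irq;
	};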

* Re: [RFC net-next v4 5/9] net: napi: Add napi_config
  2024-10-01 23:52 ` [RFC net-next v4 5/9] net: napi: Add napi_config Joe Damato
  2024-10-08 18:19   ` Joe Damato
@ 2024-10-08 22:17   ` Jakub Kicinski
  2024-10-08 22:28     ` Joe Damato
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2024-10-08 22:17 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Jonathan Corbet, Jiri Pirko,
	Sebastian Andrzej Siewior, Lorenzo Bianconi, Johannes Berg,
	open list:DOCUMENTATION, open list

On Tue,  1 Oct 2024 23:52:36 +0000 Joe Damato wrote:
>  static inline void netdev_set_defer_hard_irqs(struct net_device *netdev,
>  					      u32 defer)
>  {
> +	unsigned int count = max(netdev->num_rx_queues,
> +				 netdev->num_tx_queues);
>  	struct napi_struct *napi;
> +	int i;
>  
>  	WRITE_ONCE(netdev->napi_defer_hard_irqs, defer);
>  	list_for_each_entry(napi, &netdev->napi_list, dev_list)
>  		napi_set_defer_hard_irqs(napi, defer);
> +
> +	if (netdev->napi_config)

Could this ever be NULL ?

> +		for (i = 0; i < count; i++)
> +			netdev->napi_config[i].defer_hard_irqs = defer;
>  }


* Re: [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values
  2024-10-08 18:20   ` Joe Damato
@ 2024-10-08 22:19     ` Jakub Kicinski
  2024-10-08 23:00       ` Joe Damato
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2024-10-08 22:19 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Donald Hunter, Jesper Dangaard Brouer,
	Mina Almasry, Xuan Zhuo, Daniel Jurgens, open list

On Tue, 8 Oct 2024 11:20:47 -0700 Joe Damato wrote:
> Noticed this while re-reading the code; planning on changing this
> from NLA_S32 to NLA_U32 for v5.

Make sure you edit the spec, not the output. Looks like there may be 
a problem here (napi-id vs id in the attributes).

Make sure you run: ./tools/net/ynl/ynl-regen.sh -f
and the tree is clean afterwards


* Re: [RFC net-next v4 5/9] net: napi: Add napi_config
  2024-10-08 22:17   ` Jakub Kicinski
@ 2024-10-08 22:28     ` Joe Damato
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-08 22:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Jonathan Corbet, Jiri Pirko,
	Sebastian Andrzej Siewior, Lorenzo Bianconi, Johannes Berg,
	open list:DOCUMENTATION, open list

On Tue, Oct 08, 2024 at 03:17:01PM -0700, Jakub Kicinski wrote:
> On Tue,  1 Oct 2024 23:52:36 +0000 Joe Damato wrote:
> >  static inline void netdev_set_defer_hard_irqs(struct net_device *netdev,
> >  					      u32 defer)
> >  {
> > +	unsigned int count = max(netdev->num_rx_queues,
> > +				 netdev->num_tx_queues);
> >  	struct napi_struct *napi;
> > +	int i;
> >  
> >  	WRITE_ONCE(netdev->napi_defer_hard_irqs, defer);
> >  	list_for_each_entry(napi, &netdev->napi_list, dev_list)
> >  		napi_set_defer_hard_irqs(napi, defer);
> > +
> > +	if (netdev->napi_config)
> 
> Could this ever be NULL ?

Ah, good catch. I think this was an artifact from a previous
revision where it could have been NULL. But I'll double check.

In the current proposed implementation, however, I don't think it
can be NULL as it is always allocated.

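Given that answer, the NULL check can presumably be dropped in v5, at
which point the helper quoted above reduces to something like this
(a sketch based on the quoted hunk, assuming napi_config is always
allocated in alloc_netdev_mqs()):

	static inline void netdev_set_defer_hard_irqs(struct net_device *netdev,
						      u32 defer)
	{
		unsigned int count = max(netdev->num_rx_queues,
					 netdev->num_tx_queues);
		struct napi_struct *napi;
		int i;

		/* Device-wide value, plus every live NAPI instance. */
		WRITE_ONCE(netdev->napi_defer_hard_irqs, defer);
		list_for_each_entry(napi, &netdev->napi_list, dev_list)
			napi_set_defer_hard_irqs(napi, defer);

		/* napi_config is always allocated, so no NULL check. */
		for (i = 0; i < count; i++)
			netdev->napi_config[i].defer_hard_irqs = defer;
	}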

* Re: [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values
  2024-10-08 22:19     ` Jakub Kicinski
@ 2024-10-08 23:00       ` Joe Damato
  2024-10-08 23:19         ` Jakub Kicinski
  0 siblings, 1 reply; 25+ messages in thread
From: Joe Damato @ 2024-10-08 23:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Donald Hunter, Jesper Dangaard Brouer,
	Mina Almasry, Xuan Zhuo, Daniel Jurgens, open list

On Tue, Oct 08, 2024 at 03:19:34PM -0700, Jakub Kicinski wrote:
> On Tue, 8 Oct 2024 11:20:47 -0700 Joe Damato wrote:
> > Noticed this while re-reading the code; planning on changing this
> > from NLA_S32 to NLA_U32 for v5.
> 
> Make sure you edit the spec, not the output. Looks like there may be 
> a problem here (napi-id vs id in the attributes).

I'm not sure I follow this part, sorry if I'm just missing something
here.

I was referring to NETDEV_A_NAPI_DEFER_HARD_IRQS which in RFCv4 is
listed as NLA_S32 (in this patch):

static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT + 1] = {
     [NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
     [NETDEV_A_NAPI_DEFER_HARD_IRQS] = { .type = NLA_S32 },

However, in the yaml spec (patch 2/9):

+      -
+        name: defer-hard-irqs
+        doc: The number of consecutive empty polls before IRQ deferral ends
+             and hardware IRQs are re-enabled.
+        type: u32
+        checks:
+          max: s32-max

So the type is u32 but with a "checks" to match what happens now in
sysfs.

That's why I mentioned changing NLA_S32 to NLA_U32.

Am I missing something? Not sure what you meant by "napi-id vs id"?

> Make sure you run: ./tools/net/ynl/ynl-regen.sh -f
> and the tree is clean afterwards

OK, will do.


* Re: [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values
  2024-10-08 23:00       ` Joe Damato
@ 2024-10-08 23:19         ` Jakub Kicinski
  2024-10-08 23:57           ` Joe Damato
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2024-10-08 23:19 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Donald Hunter, Jesper Dangaard Brouer,
	Mina Almasry, Xuan Zhuo, Daniel Jurgens, open list

On Tue, 8 Oct 2024 16:00:41 -0700 Joe Damato wrote:
> > Make sure you edit the spec, not the output. Looks like there may be 
> > a problem here (napi-id vs id in the attributes).  
> 
> I'm not sure I follow this part, sorry if I'm just missing something
> here.
> 
> I was referring to NETDEV_A_NAPI_DEFER_HARD_IRQS which in RFCv4 is
> listed as NLA_S32 (in this patch):
> 
> static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT + 1] = {
>      [NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
>      [NETDEV_A_NAPI_DEFER_HARD_IRQS] = { .type = NLA_S32 },
> 
> However, in the yaml spec (patch 2/9):
> 
> +      -
> +        name: defer-hard-irqs
> +        doc: The number of consecutive empty polls before IRQ deferral ends
> +             and hardware IRQs are re-enabled.
> +        type: u32
> +        checks:
> +          max: s32-max
> 
> So the type is u32 but with a "checks" to match what happens now in
> sysfs.
> 
> That's why I mentioned changing NLA_S32 to NLA_U32.
> 
> Am I missing something?

YNL will generate the correct code for you - the right type
and the right range validation. Run the command below to see.

> Not sure what you meant by "napi-id vs id"?

I can't apply the series now, but when it was posted the YNL code
generation failed here complaining about napi-id not existing in
the attribute set in which it is used. In the napi attribute set
the NAPI ID is called just "id", not "napi-id".

> > Make sure you run: ./tools/net/ynl/ynl-regen.sh -f
> > and the tree is clean afterwards  
> 
> OK, will do.


* Re: [RFC net-next v4 6/9] netdev-genl: Support setting per-NAPI config values
  2024-10-08 23:19         ` Jakub Kicinski
@ 2024-10-08 23:57           ` Joe Damato
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Damato @ 2024-10-08 23:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, David S. Miller,
	Eric Dumazet, Paolo Abeni, Donald Hunter, Jesper Dangaard Brouer,
	Mina Almasry, Xuan Zhuo, Daniel Jurgens, open list

On Tue, Oct 08, 2024 at 04:19:19PM -0700, Jakub Kicinski wrote:
> On Tue, 8 Oct 2024 16:00:41 -0700 Joe Damato wrote:
> > > Make sure you edit the spec, not the output. Looks like there may be 
> > > a problem here (napi-id vs id in the attributes).  
> > 
> > I'm not sure I follow this part, sorry if I'm just missing something
> > here.
> > 
> > I was referring to NETDEV_A_NAPI_DEFER_HARD_IRQS which in RFCv4 is
> > listed as NLA_S32 (in this patch):
> > 
> > static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT + 1] = {
> >      [NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
> >      [NETDEV_A_NAPI_DEFER_HARD_IRQS] = { .type = NLA_S32 },
> > 
> > However, in the yaml spec (patch 2/9):
> > 
> > +      -
> > +        name: defer-hard-irqs
> > +        doc: The number of consecutive empty polls before IRQ deferral ends
> > +             and hardware IRQs are re-enabled.
> > +        type: u32
> > +        checks:
> > +          max: s32-max
> > 
> > So the type is u32 but with a "checks" to match what happens now in
> > sysfs.
> > 
> > That's why I mentioned changing NLA_S32 to NLA_U32.
> > 
> > Am I missing something?
> 
> YNL will generate the correct code for you - the right type
> and the right range validation. Run the command below to see.
> 
> > Not sure what you meant by "napi-id vs id" ?
> 
> I can't apply the series now, but when it was posted the YNL code
> generation failed here complaining about napi-id not existing in
> the attribute set in which it is used. In the napi attribute set
> the NAPI ID is called just "id", not "napi-id".

Ah, I see what you mean now. It should have been obvious, but the
-gen* files are, uh, auto-generated ;)

And yes, I see now that the attribute set names it "id", so I've
fixed it; the command now runs clean, and I'll include the generated
output this time in v5.


