* [PATCH net-next v9 0/6] net: napi: add CPU affinity to napi->config
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki
Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
after reset. However, since there can be only one IRQ affinity notifier
per IRQ, registering IRQ notifiers in drivers conflicts with the aRFS
rmap management in the core (which also registers its own IRQ affinity
notifiers); see the sketch below.
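To illustrate the conflict (a sketch only, using the existing kernel
APIs; q_vector is a placeholder name):

	/* aRFS: irq_cpu_rmap_add() registers an affinity notifier
	 * for the IRQ internally.
	 */
	err = irq_cpu_rmap_add(netdev->rx_cpu_rmap, irq);

	/* Driver: registering a second notifier to remember the
	 * user-set affinity silently replaces the rmap's notifier,
	 * since an IRQ can hold only one notifier at a time.
	 */
	irq_set_affinity_notifier(irq, &q_vector->affinity_notify);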
Move the IRQ affinity management to the napi struct. This way we can have
a unified IRQ notifier that re-applies the user-set affinity and also
manages the aRFS rmaps.
The first patch moves the aRFS rmap management to the core. It also adds
the IRQ affinity mask to napi_config and re-applies the mask after reset.
Patches 2, 4 and 5 use the new API in the ena, ice and idpf drivers.
ice does not always delete the NAPIs before releasing the IRQs. The third
patch makes sure the driver clears the IRQ number along with the queue
when the NAPIs are disabled. Without this, the later patches in this
series would free the IRQ before releasing the IRQ notifier (which
generates warnings).
Tested on ice and idpf.
V9:
- Merge all core changes into the first patch, followed by the driver
changes. (Jakub)
- Refactor netif_napi_set_irq_locked() to show the differences between
irq_affinity_auto only vs. rx_cpu_rmap_auto = true. (Jakub)
- Move default affinity setting to netif_set_affinity_auto (Jakub).
- Add a py selftest for IRQ affinity. (Jakub)
- Remove the bnxt changes since bnxt recently added TPH support (commit
c214410c47d6 ("bnxt_en: Add TPH support in BNXT driver")), which requires
the driver to have a custom IRQ affinity function. For the core to
support this, we may need to extend the API in this series to allow
drivers to provide their own callbacks when calling netif_napi_set_irq().
V8:
- https://lore.kernel.org/netdev/20250211210657.428439-1-ahmed.zaki@intel.com/
- Add a new patch in "ice" that releases the IRQs and their notifiers
when clearing the NAPI queues (please see the third paragraph above).
- Add a new NAPI flag "NAPI_STATE_HAS_NOTIFIER" that simplifies the
code for IRQ notifier detection (Patch 2).
- Move the IRQ notifier auto-removal to napi_delete() instead of
napi_disable(). This is the reason for the new ice patch. (Jakub)
- Add a WARN_ON_ONCE(!napi->config) in netif_napi_set_irq_locked().
This would detect drivers that asked for irq_affinity_auto but did
not use napi_add_config(). (Patch 3) (Joe)
- Rename netif_enable_irq_affinity() to netif_set_affinity_auto()
(Patch 3) (Jakub).
V7:
- https://lore.kernel.org/netdev/20250204220622.156061-1-ahmed.zaki@intel.com/
- P1: add documentation for netif_enable_cpu_rmap()
- P1: move a couple of "if (rx_cpu_rmap_auto)" from patch 1 to patch 2
where they are really needed.
- P1: remove a defensive "if (!rmap)"
- P1: In netif_disable_cpu_rmap(), remove the for loop that frees
notifiers since this is already done in napi_disable_locked().
Also rename it to netif_del_cpu_rmap().
- P1 and P2: simplify the if conditions in netif_napi_set_irq_locked()
- Other nits
V6:
- https://lore.kernel.org/netdev/20250118003335.155379-1-ahmed.zaki@intel.com/
- Modifications to have fewer #ifdef CONFIG_RFS_ACCEL guards
- Remove rmap entry in napi_disable
- Rebase on rc7 and use netif_napi_set_irq_locked()
- Assume IRQ can be -1 and free resources if an old valid IRQ was
associated with the napi. For this, I had to merge the first 2
patches to use the new rmap API.
V5:
- https://lore.kernel.org/netdev/20250113171042.158123-1-ahmed.zaki@intel.com/
- Add kernel doc for new netdev flags (Simon).
- Remove defensive (if !napi) check in napi_irq_cpu_rmap_add()
(patch 2) since caller is already dereferencing the pointer (Simon).
- Fix build error when CONFIG_RFS_ACCEL is not defined (patch 3).
v4:
- https://lore.kernel.org/netdev/20250109233107.17519-1-ahmed.zaki@intel.com/
- Better introduction in the cover letter.
- Fix Kernel build errors in ena_init_rx_cpu_rmap() (Patch 1)
- Fix kernel test robot warnings reported by Dan Carpenter:
https://lore.kernel.org/all/202501050625.nY1c97EX-lkp@intel.com/
- Remove unrelated empty line in patch 4 (Kalesh Anakkur Purayil)
- Fix a memleak (rmap was not freed) by calling cpu_rmap_put() in
netif_napi_affinity_release() (patch 2).
v3:
- https://lore.kernel.org/netdev/20250104004314.208259-1-ahmed.zaki@intel.com/
- Assign one cpu per mask starting from local NUMA node (Shay Drori).
- Keep the new aRFS and affinity flags per netdev (Jakub).
v2:
- https://lore.kernel.org/netdev/202412190454.nwvp3hU2-lkp@intel.com/T/
- Also move the ARFS IRQ affinity management from drivers to core. Via
netif_napi_set_irq(), drivers can ask the core to add the IRQ to the
ARFS rmap (already allocated by the driver).
RFC -> v1:
- https://lore.kernel.org/netdev/20241210002626.366878-1-ahmed.zaki@intel.com/
- move static inline affinity functions to net/core/dev.c
- add the new napi->irq_flags (patch 1)
- add code changes to bnxt, mlx4 and ice.
Ahmed Zaki (5):
net: move aRFS rmap management and CPU affinity to core
net: ena: use napi's aRFS rmap notifiers
ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues()
ice: use napi's irq affinity and rmap IRQ notifiers
idpf: use napi's irq affinity
Jakub Kicinski (1):
selftests: drv-net: add tests for napi IRQ affinity notifiers
Documentation/networking/scaling.rst | 6 +-
drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +----
drivers/net/ethernet/intel/ice/ice.h | 3 -
drivers/net/ethernet/intel/ice/ice_arfs.c | 33 +---
drivers/net/ethernet/intel/ice/ice_arfs.h | 2 -
drivers/net/ethernet/intel/ice/ice_base.c | 7 +-
drivers/net/ethernet/intel/ice/ice_lib.c | 16 +-
drivers/net/ethernet/intel/ice/ice_main.c | 47 +----
drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 +
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 22 +--
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 6 +-
include/linux/cpu_rmap.h | 1 +
include/linux/netdevice.h | 24 ++-
lib/cpu_rmap.c | 2 +-
net/core/dev.c | 169 ++++++++++++++++++
.../testing/selftests/drivers/net/hw/Makefile | 4 +
tools/testing/selftests/drivers/net/hw/irq.py | 99 ++++++++++
.../selftests/drivers/net/hw/xdp_dummy.bpf.c | 13 ++
.../selftests/drivers/net/lib/py/env.py | 8 +-
19 files changed, 343 insertions(+), 163 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/hw/irq.py
create mode 100644 tools/testing/selftests/drivers/net/hw/xdp_dummy.bpf.c
--
2.43.0
* [PATCH net-next v9 1/6] net: move aRFS rmap management and CPU affinity to core
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki
A common task for most drivers is to remember the user-set CPU affinity
of their IRQs. On each netdev reset, the driver should re-assign the
user's settings to the IRQs. Unify this task across all drivers by moving
the CPU affinity to napi->config.
However, to move the CPU affinity to the core, we also need to move the
aRFS rmap management, since aRFS uses its own IRQ notifiers.
For aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting
aRFS should set the flag via netif_enable_cpu_rmap(), and the core will
allocate and manage the aRFS rmap. Freeing the rmap is also done by the
core when the netdev is freed. For better IRQ affinity management, move
the IRQ rmap notifier inside the napi_struct and add new notify.notify
and notify.release callbacks: netif_napi_irq_notify() and
netif_napi_affinity_release().
Now that the aRFS rmap management is in the core, add a CPU affinity mask
to napi_config. To delegate the CPU affinity management to the core,
drivers must:
1 - set the new netdev flag "irq_affinity_auto":
netif_set_affinity_auto(netdev)
2 - create the napi with persistent config:
netif_napi_add_config()
3 - bind an IRQ to the napi instance: netif_napi_set_irq()
The core will then re-apply the user-set affinity to the napi's IRQ
after each reset; see the sketch after this list.
The default IRQ affinity mask is one CPU per vector, assigned starting
from the device's closest NUMA node.
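A minimal driver-side sketch of the opt-in sequence above (assuming one
napi per queue vector; qv, my_poll and num_vectors are placeholder names,
and most error handling is omitted):

	/* once at netdev setup time */
	if (netif_enable_cpu_rmap(netdev, num_vectors))	/* aRFS rmap */
		netdev_warn(netdev, "failed to enable cpu rmap\n");
	netif_set_affinity_auto(netdev);	/* core-managed affinity */

	/* per queue vector, e.g. on open/rebuild */
	netif_napi_add_config(netdev, &qv->napi, my_poll, qv->idx);
	netif_napi_set_irq(&qv->napi, qv->irq);	/* registers the notifier */

	/* on teardown, unbind before the IRQ is freed */
	netif_napi_set_irq(&qv->napi, -1);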
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
Documentation/networking/scaling.rst | 6 +-
include/linux/cpu_rmap.h | 1 +
include/linux/netdevice.h | 24 +++-
lib/cpu_rmap.c | 2 +-
net/core/dev.c | 169 +++++++++++++++++++++++++++
5 files changed, 195 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index 4eb50bcb9d42..e5d4d3ecb980 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -427,8 +427,10 @@ rps_dev_flow_table. The stack consults a CPU to hardware queue map which
is maintained by the NIC driver. This is an auto-generated reverse map of
the IRQ affinity table shown by /proc/interrupts. Drivers can use
functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
-to populate the map. For each CPU, the corresponding queue in the map is
-set to be one whose processing CPU is closest in cache locality.
+to populate the map. Alternatively, drivers can delegate the cpu_rmap
+management to the Kernel by calling netif_enable_cpu_rmap(). For each CPU,
+the corresponding queue in the map is set to be one whose processing CPU is
+closest in cache locality.
Accelerated RFS Configuration
diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
index 20b5729903d7..2fd7ba75362a 100644
--- a/include/linux/cpu_rmap.h
+++ b/include/linux/cpu_rmap.h
@@ -32,6 +32,7 @@ struct cpu_rmap {
#define CPU_RMAP_DIST_INF 0xffff
extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags);
+extern void cpu_rmap_get(struct cpu_rmap *rmap);
extern int cpu_rmap_put(struct cpu_rmap *rmap);
extern int cpu_rmap_add(struct cpu_rmap *rmap, void *obj);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9a387d456592..2094d3edda73 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,6 +352,7 @@ struct napi_config {
u64 gro_flush_timeout;
u64 irq_suspend_timeout;
u32 defer_hard_irqs;
+ cpumask_t affinity_mask;
unsigned int napi_id;
};
@@ -394,6 +395,8 @@ struct napi_struct {
struct list_head dev_list;
struct hlist_node napi_hash_node;
int irq;
+ struct irq_affinity_notify notify;
+ int napi_rmap_idx;
int index;
struct napi_config *config;
};
@@ -409,6 +412,7 @@ enum {
NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/
NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/
NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */
+ NAPI_STATE_HAS_NOTIFIER, /* Napi has an IRQ notifier */
};
enum {
@@ -422,6 +426,7 @@ enum {
NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED),
+ NAPIF_STATE_HAS_NOTIFIER = BIT(NAPI_STATE_HAS_NOTIFIER),
};
enum gro_result {
@@ -1989,6 +1994,15 @@ enum netdev_reg_state {
*
* @threaded: napi threaded mode is enabled
*
+ * @irq_affinity_auto: driver wants the core to store and re-assign the IRQ
+ * affinity. Set by netif_set_affinity_auto(), then
+ * the driver must create a persistent napi by
+ * netif_napi_add_config() and finally bind the napi to
+ * IRQ (via netif_napi_set_irq()).
+ *
+ * @rx_cpu_rmap_auto: driver wants the core to manage the ARFS rmap.
+ * Set by calling netif_enable_cpu_rmap().
+ *
* @see_all_hwtstamp_requests: device wants to see calls to
* ndo_hwtstamp_set() for all timestamp requests
* regardless of source, even if those aren't
@@ -2396,6 +2410,8 @@ struct net_device {
struct lock_class_key *qdisc_tx_busylock;
bool proto_down;
bool threaded;
+ bool irq_affinity_auto;
+ bool rx_cpu_rmap_auto;
/* priv_flags_slow, ungrouped to save space */
unsigned long see_all_hwtstamp_requests:1;
@@ -2724,10 +2740,7 @@ static inline void netdev_assert_locked_or_invisible(struct net_device *dev)
netdev_assert_locked(dev);
}
-static inline void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
-{
- napi->irq = irq;
-}
+void netif_napi_set_irq_locked(struct napi_struct *napi, int irq);
static inline void netif_napi_set_irq(struct napi_struct *napi, int irq)
{
@@ -2865,6 +2878,9 @@ static inline void netif_napi_del(struct napi_struct *napi)
synchronize_net();
}
+int netif_enable_cpu_rmap(struct net_device *dev, unsigned int num_irqs);
+void netif_set_affinity_auto(struct net_device *dev);
+
struct packet_type {
__be16 type; /* This is really htons(ether_type). */
bool ignore_outgoing;
diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
index 4c348670da31..f03d9be3f06b 100644
--- a/lib/cpu_rmap.c
+++ b/lib/cpu_rmap.c
@@ -73,7 +73,7 @@ static void cpu_rmap_release(struct kref *ref)
* cpu_rmap_get - internal helper to get new ref on a cpu_rmap
* @rmap: reverse-map allocated with alloc_cpu_rmap()
*/
-static inline void cpu_rmap_get(struct cpu_rmap *rmap)
+void cpu_rmap_get(struct cpu_rmap *rmap)
{
kref_get(&rmap->refcount);
}
diff --git a/net/core/dev.c b/net/core/dev.c
index 3f525278a871..b0ab0169f507 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6943,11 +6943,175 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
}
EXPORT_SYMBOL(netif_queue_set_napi);
+static void
+netif_napi_irq_notify(struct irq_affinity_notify *notify,
+ const cpumask_t *mask)
+{
+ struct napi_struct *napi =
+ container_of(notify, struct napi_struct, notify);
+#ifdef CONFIG_RFS_ACCEL
+ struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
+ int err;
+#endif
+
+ if (napi->config && napi->dev->irq_affinity_auto)
+ cpumask_copy(&napi->config->affinity_mask, mask);
+
+#ifdef CONFIG_RFS_ACCEL
+ if (napi->dev->rx_cpu_rmap_auto) {
+ err = cpu_rmap_update(rmap, napi->napi_rmap_idx, mask);
+ if (err)
+ netdev_warn(napi->dev, "RMAP update failed (%d)\n",
+ err);
+ }
+#endif
+}
+
+#ifdef CONFIG_RFS_ACCEL
+static void netif_napi_affinity_release(struct kref *ref)
+{
+ struct napi_struct *napi =
+ container_of(ref, struct napi_struct, notify.kref);
+ struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
+
+ netdev_assert_locked(napi->dev);
+ WARN_ON(test_and_clear_bit(NAPI_STATE_HAS_NOTIFIER,
+ &napi->state));
+
+ if (!napi->dev->rx_cpu_rmap_auto)
+ return;
+ rmap->obj[napi->napi_rmap_idx] = NULL;
+ napi->napi_rmap_idx = -1;
+ cpu_rmap_put(rmap);
+}
+
+int netif_enable_cpu_rmap(struct net_device *dev, unsigned int num_irqs)
+{
+ if (dev->rx_cpu_rmap_auto)
+ return 0;
+
+ dev->rx_cpu_rmap = alloc_irq_cpu_rmap(num_irqs);
+ if (!dev->rx_cpu_rmap)
+ return -ENOMEM;
+
+ dev->rx_cpu_rmap_auto = true;
+ return 0;
+}
+EXPORT_SYMBOL(netif_enable_cpu_rmap);
+
+static void netif_del_cpu_rmap(struct net_device *dev)
+{
+ struct cpu_rmap *rmap = dev->rx_cpu_rmap;
+
+ if (!dev->rx_cpu_rmap_auto)
+ return;
+
+ /* Free the rmap */
+ cpu_rmap_put(rmap);
+ dev->rx_cpu_rmap = NULL;
+ dev->rx_cpu_rmap_auto = false;
+}
+
+#else
+static void netif_napi_affinity_release(struct kref *ref)
+{
+}
+
+int netif_enable_cpu_rmap(struct net_device *dev, unsigned int num_irqs)
+{
+ return 0;
+}
+EXPORT_SYMBOL(netif_enable_cpu_rmap);
+
+static void netif_del_cpu_rmap(struct net_device *dev)
+{
+}
+#endif
+
+void netif_set_affinity_auto(struct net_device *dev)
+{
+ unsigned int i, maxqs, numa;
+
+ maxqs = max(dev->num_tx_queues, dev->num_rx_queues);
+ numa = dev_to_node(&dev->dev);
+
+ for (i = 0; i < maxqs; i++)
+ cpumask_set_cpu(cpumask_local_spread(i, numa),
+ &dev->napi_config[i].affinity_mask);
+
+ dev->irq_affinity_auto = true;
+}
+EXPORT_SYMBOL(netif_set_affinity_auto);
+
+void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
+{
+ int rc;
+
+ netdev_assert_locked_or_invisible(napi->dev);
+
+ if (napi->irq == irq)
+ return;
+
+ /* Remove existing resources */
+ if (test_and_clear_bit(NAPI_STATE_HAS_NOTIFIER, &napi->state))
+ irq_set_affinity_notifier(napi->irq, NULL);
+
+ napi->irq = irq;
+ if (irq < 0 ||
+ (!napi->dev->rx_cpu_rmap_auto && !napi->dev->irq_affinity_auto))
+ return;
+
+ /* Abort for buggy drivers */
+ if (napi->dev->irq_affinity_auto && WARN_ON_ONCE(!napi->config))
+ return;
+
+#ifdef CONFIG_RFS_ACCEL
+ if (napi->dev->rx_cpu_rmap_auto) {
+ rc = cpu_rmap_add(napi->dev->rx_cpu_rmap, napi);
+ if (rc < 0)
+ return;
+
+ cpu_rmap_get(napi->dev->rx_cpu_rmap);
+ napi->napi_rmap_idx = rc;
+ }
+#endif
+
+ /* Use core IRQ notifier */
+ napi->notify.notify = netif_napi_irq_notify;
+ napi->notify.release = netif_napi_affinity_release;
+ rc = irq_set_affinity_notifier(irq, &napi->notify);
+ if (rc) {
+ netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
+ rc);
+ goto put_rmap;
+ }
+
+ set_bit(NAPI_STATE_HAS_NOTIFIER, &napi->state);
+ return;
+
+put_rmap:
+#ifdef CONFIG_RFS_ACCEL
+ if (napi->dev->rx_cpu_rmap_auto) {
+ cpu_rmap_put(napi->dev->rx_cpu_rmap);
+ napi->dev->rx_cpu_rmap->obj[napi->napi_rmap_idx] = NULL;
+ napi->napi_rmap_idx = -1;
+ }
+#endif
+ napi->notify.notify = NULL;
+ napi->notify.release = NULL;
+}
+EXPORT_SYMBOL(netif_napi_set_irq_locked);
+
static void napi_restore_config(struct napi_struct *n)
{
n->defer_hard_irqs = n->config->defer_hard_irqs;
n->gro_flush_timeout = n->config->gro_flush_timeout;
n->irq_suspend_timeout = n->config->irq_suspend_timeout;
+
+ if (n->dev->irq_affinity_auto &&
+ test_bit(NAPI_STATE_HAS_NOTIFIER, &n->state))
+ irq_set_affinity(n->irq, &n->config->affinity_mask);
+
/* a NAPI ID might be stored in the config, if so use it. if not, use
* napi_hash_add to generate one for us.
*/
@@ -7168,6 +7332,9 @@ void __netif_napi_del_locked(struct napi_struct *napi)
/* Make sure NAPI is disabled (or was never enabled). */
WARN_ON(!test_bit(NAPI_STATE_SCHED, &napi->state));
+ if (test_and_clear_bit(NAPI_STATE_HAS_NOTIFIER, &napi->state))
+ irq_set_affinity_notifier(napi->irq, NULL);
+
if (napi->config) {
napi->index = -1;
napi->config = NULL;
@@ -11720,6 +11887,8 @@ void free_netdev(struct net_device *dev)
netdev_napi_exit(dev);
+ netif_del_cpu_rmap(dev);
+
ref_tracker_dir_exit(&dev->refcnt_tracker);
#ifdef CONFIG_PCPU_DEV_REFCNT
free_percpu(dev->pcpu_refcnt);
--
2.43.0
* [PATCH net-next v9 2/6] net: ena: use napi's aRFS rmap notifiers
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki, David Arinzon
Use the core's rmap notifiers and delete our own.
Acked-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +-------------------
1 file changed, 1 insertion(+), 42 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index c1295dfad0d0..6aab85a7c60a 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -5,9 +5,6 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#ifdef CONFIG_RFS_ACCEL
-#include <linux/cpu_rmap.h>
-#endif /* CONFIG_RFS_ACCEL */
#include <linux/ethtool.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -162,30 +159,6 @@ int ena_xmit_common(struct ena_adapter *adapter,
return 0;
}
-static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter)
-{
-#ifdef CONFIG_RFS_ACCEL
- u32 i;
- int rc;
-
- adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_io_queues);
- if (!adapter->netdev->rx_cpu_rmap)
- return -ENOMEM;
- for (i = 0; i < adapter->num_io_queues; i++) {
- int irq_idx = ENA_IO_IRQ_IDX(i);
-
- rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
- pci_irq_vector(adapter->pdev, irq_idx));
- if (rc) {
- free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
- adapter->netdev->rx_cpu_rmap = NULL;
- return rc;
- }
- }
-#endif /* CONFIG_RFS_ACCEL */
- return 0;
-}
-
static void ena_init_io_rings_common(struct ena_adapter *adapter,
struct ena_ring *ring, u16 qid)
{
@@ -1596,7 +1569,7 @@ static int ena_enable_msix(struct ena_adapter *adapter)
adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
}
- if (ena_init_rx_cpu_rmap(adapter))
+ if (netif_enable_cpu_rmap(adapter->netdev, adapter->num_io_queues))
netif_warn(adapter, probe, adapter->netdev,
"Failed to map IRQs to CPUs\n");
@@ -1742,13 +1715,6 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
struct ena_irq *irq;
int i;
-#ifdef CONFIG_RFS_ACCEL
- if (adapter->msix_vecs >= 1) {
- free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
- adapter->netdev->rx_cpu_rmap = NULL;
- }
-#endif /* CONFIG_RFS_ACCEL */
-
for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
irq = &adapter->irq_tbl[i];
irq_set_affinity_hint(irq->vector, NULL);
@@ -4131,13 +4097,6 @@ static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
ena_dev = adapter->ena_dev;
netdev = adapter->netdev;
-#ifdef CONFIG_RFS_ACCEL
- if ((adapter->msix_vecs >= 1) && (netdev->rx_cpu_rmap)) {
- free_irq_cpu_rmap(netdev->rx_cpu_rmap);
- netdev->rx_cpu_rmap = NULL;
- }
-
-#endif /* CONFIG_RFS_ACCEL */
/* Make sure timer and reset routine won't be called after
* freeing device resources.
*/
--
2.43.0
* [PATCH net-next v9 3/6] ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues()
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki
We set the NAPI's IRQ number in ice_vsi_set_napi_queues(). Clear the
NAPI's IRQ in ice_vsi_clear_napi_queues().
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 7f5b229cab05..ce30674abf8f 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2765,11 +2765,18 @@ void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
void ice_vsi_clear_napi_queues(struct ice_vsi *vsi)
{
struct net_device *netdev = vsi->netdev;
- int q_idx;
+ int q_idx, v_idx;
if (!netdev)
return;
+ /* Clear the NAPI's interrupt number */
+ ice_for_each_q_vector(vsi, v_idx) {
+ struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
+
+ netif_napi_set_irq(&q_vector->napi, -1);
+ }
+
ice_for_each_txq(vsi, q_idx)
netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX, NULL);
--
2.43.0
* [PATCH net-next v9 4/6] ice: use napi's irq affinity and rmap IRQ notifiers
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki
Delete the driver CPU affinity and aRFS rmap info, use the core's
API instead.
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 3 --
drivers/net/ethernet/intel/ice/ice_arfs.c | 33 +---------------
drivers/net/ethernet/intel/ice/ice_arfs.h | 2 -
drivers/net/ethernet/intel/ice/ice_base.c | 7 +---
drivers/net/ethernet/intel/ice/ice_lib.c | 7 ----
drivers/net/ethernet/intel/ice/ice_main.c | 47 ++---------------------
6 files changed, 6 insertions(+), 93 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index c9104b13e1d2..5ceeae664598 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -475,9 +475,6 @@ struct ice_q_vector {
struct ice_ring_container rx;
struct ice_ring_container tx;
- cpumask_t affinity_mask;
- struct irq_affinity_notify affinity_notify;
-
struct ice_channel *ch;
char name[ICE_INT_NAME_STR_LEN];
diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.c b/drivers/net/ethernet/intel/ice/ice_arfs.c
index 7cee365cc7d1..171cdec741c2 100644
--- a/drivers/net/ethernet/intel/ice/ice_arfs.c
+++ b/drivers/net/ethernet/intel/ice/ice_arfs.c
@@ -570,25 +570,6 @@ void ice_clear_arfs(struct ice_vsi *vsi)
vsi->arfs_fltr_cntrs = NULL;
}
-/**
- * ice_free_cpu_rx_rmap - free setup CPU reverse map
- * @vsi: the VSI to be forwarded to
- */
-void ice_free_cpu_rx_rmap(struct ice_vsi *vsi)
-{
- struct net_device *netdev;
-
- if (!vsi || vsi->type != ICE_VSI_PF)
- return;
-
- netdev = vsi->netdev;
- if (!netdev || !netdev->rx_cpu_rmap)
- return;
-
- free_irq_cpu_rmap(netdev->rx_cpu_rmap);
- netdev->rx_cpu_rmap = NULL;
-}
-
/**
* ice_set_cpu_rx_rmap - setup CPU reverse map for each queue
* @vsi: the VSI to be forwarded to
@@ -597,7 +578,6 @@ int ice_set_cpu_rx_rmap(struct ice_vsi *vsi)
{
struct net_device *netdev;
struct ice_pf *pf;
- int i;
if (!vsi || vsi->type != ICE_VSI_PF)
return 0;
@@ -610,18 +590,7 @@ int ice_set_cpu_rx_rmap(struct ice_vsi *vsi)
netdev_dbg(netdev, "Setup CPU RMAP: vsi type 0x%x, ifname %s, q_vectors %d\n",
vsi->type, netdev->name, vsi->num_q_vectors);
- netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(vsi->num_q_vectors);
- if (unlikely(!netdev->rx_cpu_rmap))
- return -EINVAL;
-
- ice_for_each_q_vector(vsi, i)
- if (irq_cpu_rmap_add(netdev->rx_cpu_rmap,
- vsi->q_vectors[i]->irq.virq)) {
- ice_free_cpu_rx_rmap(vsi);
- return -EINVAL;
- }
-
- return 0;
+ return netif_enable_cpu_rmap(netdev, vsi->num_q_vectors);
}
/**
diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.h b/drivers/net/ethernet/intel/ice/ice_arfs.h
index 9669ad9bf7b5..9706293128c3 100644
--- a/drivers/net/ethernet/intel/ice/ice_arfs.h
+++ b/drivers/net/ethernet/intel/ice/ice_arfs.h
@@ -45,7 +45,6 @@ int
ice_rx_flow_steer(struct net_device *netdev, const struct sk_buff *skb,
u16 rxq_idx, u32 flow_id);
void ice_clear_arfs(struct ice_vsi *vsi);
-void ice_free_cpu_rx_rmap(struct ice_vsi *vsi);
void ice_init_arfs(struct ice_vsi *vsi);
void ice_sync_arfs_fltrs(struct ice_pf *pf);
int ice_set_cpu_rx_rmap(struct ice_vsi *vsi);
@@ -56,7 +55,6 @@ ice_is_arfs_using_perfect_flow(struct ice_hw *hw,
enum ice_fltr_ptype flow_type);
#else
static inline void ice_clear_arfs(struct ice_vsi *vsi) { }
-static inline void ice_free_cpu_rx_rmap(struct ice_vsi *vsi) { }
static inline void ice_init_arfs(struct ice_vsi *vsi) { }
static inline void ice_sync_arfs_fltrs(struct ice_pf *pf) { }
static inline void ice_remove_arfs(struct ice_pf *pf) { }
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index b3234a55a253..6db4ad8fc70b 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -147,10 +147,6 @@ static int ice_vsi_alloc_q_vector(struct ice_vsi *vsi, u16 v_idx)
q_vector->reg_idx = q_vector->irq.index;
q_vector->vf_reg_idx = q_vector->irq.index;
- /* only set affinity_mask if the CPU is online */
- if (cpu_online(v_idx))
- cpumask_set_cpu(v_idx, &q_vector->affinity_mask);
-
/* This will not be called in the driver load path because the netdev
* will not be created yet. All other cases with register the NAPI
* handler here (i.e. resume, reset/rebuild, etc.)
@@ -276,7 +272,8 @@ static void ice_cfg_xps_tx_ring(struct ice_tx_ring *ring)
if (test_and_set_bit(ICE_TX_XPS_INIT_DONE, ring->xps_state))
return;
- netif_set_xps_queue(ring->netdev, &ring->q_vector->affinity_mask,
+ netif_set_xps_queue(ring->netdev,
+ &ring->q_vector->napi.config->affinity_mask,
ring->q_index);
}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index ce30674abf8f..715efd8a359f 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2592,7 +2592,6 @@ void ice_vsi_free_irq(struct ice_vsi *vsi)
return;
vsi->irqs_ready = false;
- ice_free_cpu_rx_rmap(vsi);
ice_for_each_q_vector(vsi, i) {
int irq_num;
@@ -2605,12 +2604,6 @@ void ice_vsi_free_irq(struct ice_vsi *vsi)
vsi->q_vectors[i]->num_ring_rx))
continue;
- /* clear the affinity notifier in the IRQ descriptor */
- if (!IS_ENABLED(CONFIG_RFS_ACCEL))
- irq_set_affinity_notifier(irq_num, NULL);
-
- /* clear the affinity_hint in the IRQ descriptor */
- irq_update_affinity_hint(irq_num, NULL);
synchronize_irq(irq_num);
devm_free_irq(ice_pf_to_dev(pf), irq_num, vsi->q_vectors[i]);
}
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b084839eb811..eff4afabeef6 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2527,34 +2527,6 @@ int ice_schedule_reset(struct ice_pf *pf, enum ice_reset_req reset)
return 0;
}
-/**
- * ice_irq_affinity_notify - Callback for affinity changes
- * @notify: context as to what irq was changed
- * @mask: the new affinity mask
- *
- * This is a callback function used by the irq_set_affinity_notifier function
- * so that we may register to receive changes to the irq affinity masks.
- */
-static void
-ice_irq_affinity_notify(struct irq_affinity_notify *notify,
- const cpumask_t *mask)
-{
- struct ice_q_vector *q_vector =
- container_of(notify, struct ice_q_vector, affinity_notify);
-
- cpumask_copy(&q_vector->affinity_mask, mask);
-}
-
-/**
- * ice_irq_affinity_release - Callback for affinity notifier release
- * @ref: internal core kernel usage
- *
- * This is a callback function used by the irq_set_affinity_notifier function
- * to inform the current notification subscriber that they will no longer
- * receive notifications.
- */
-static void ice_irq_affinity_release(struct kref __always_unused *ref) {}
-
/**
* ice_vsi_ena_irq - Enable IRQ for the given VSI
* @vsi: the VSI being configured
@@ -2618,19 +2590,6 @@ static int ice_vsi_req_irq_msix(struct ice_vsi *vsi, char *basename)
err);
goto free_q_irqs;
}
-
- /* register for affinity change notifications */
- if (!IS_ENABLED(CONFIG_RFS_ACCEL)) {
- struct irq_affinity_notify *affinity_notify;
-
- affinity_notify = &q_vector->affinity_notify;
- affinity_notify->notify = ice_irq_affinity_notify;
- affinity_notify->release = ice_irq_affinity_release;
- irq_set_affinity_notifier(irq_num, affinity_notify);
- }
-
- /* assign the mask for this irq */
- irq_update_affinity_hint(irq_num, &q_vector->affinity_mask);
}
err = ice_set_cpu_rx_rmap(vsi);
@@ -2646,9 +2605,6 @@ static int ice_vsi_req_irq_msix(struct ice_vsi *vsi, char *basename)
free_q_irqs:
while (vector--) {
irq_num = vsi->q_vectors[vector]->irq.virq;
- if (!IS_ENABLED(CONFIG_RFS_ACCEL))
- irq_set_affinity_notifier(irq_num, NULL);
- irq_update_affinity_hint(irq_num, NULL);
devm_free_irq(dev, irq_num, &vsi->q_vectors[vector]);
}
return err;
@@ -3675,6 +3631,9 @@ void ice_set_netdev_features(struct net_device *netdev)
*/
netdev->hw_features |= NETIF_F_RXFCS;
+ /* Allow core to manage IRQs affinity */
+ netif_set_affinity_auto(netdev);
+
netif_set_tso_max_size(netdev, ICE_MAX_TSO_SIZE);
}
--
2.43.0
* [PATCH net-next v9 5/6] idpf: use napi's irq affinity
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki
Delete the driver CPU affinity info and use the core's napi config
instead.
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 +
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 22 +++++++--------------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 6 ++----
3 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index a3d6b8f198a8..f3aea7bcdaa3 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -814,6 +814,7 @@ static int idpf_cfg_netdev(struct idpf_vport *vport)
netdev->hw_features |= dflt_features | offloads;
netdev->hw_enc_features |= dflt_features | offloads;
idpf_set_ethtool_ops(netdev);
+ netif_set_affinity_auto(netdev);
SET_NETDEV_DEV(netdev, &adapter->pdev->dev);
/* carrier off on init to avoid Tx hangs */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 2747dc69999a..672dfad1fb21 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -3551,8 +3551,6 @@ void idpf_vport_intr_rel(struct idpf_vport *vport)
q_vector->tx = NULL;
kfree(q_vector->rx);
q_vector->rx = NULL;
-
- free_cpumask_var(q_vector->affinity_mask);
}
kfree(vport->q_vectors);
@@ -3579,8 +3577,6 @@ static void idpf_vport_intr_rel_irq(struct idpf_vport *vport)
vidx = vport->q_vector_idxs[vector];
irq_num = adapter->msix_entries[vidx].vector;
- /* clear the affinity_mask in the IRQ descriptor */
- irq_set_affinity_hint(irq_num, NULL);
kfree(free_irq(irq_num, q_vector));
}
}
@@ -3768,8 +3764,6 @@ static int idpf_vport_intr_req_irq(struct idpf_vport *vport)
"Request_irq failed, error: %d\n", err);
goto free_q_irqs;
}
- /* assign the mask for this irq */
- irq_set_affinity_hint(irq_num, q_vector->affinity_mask);
}
return 0;
@@ -4181,7 +4175,8 @@ static int idpf_vport_intr_init_vec_idx(struct idpf_vport *vport)
static void idpf_vport_intr_napi_add_all(struct idpf_vport *vport)
{
int (*napi_poll)(struct napi_struct *napi, int budget);
- u16 v_idx;
+ u16 v_idx, qv_idx;
+ int irq_num;
if (idpf_is_queue_model_split(vport->txq_model))
napi_poll = idpf_vport_splitq_napi_poll;
@@ -4190,12 +4185,12 @@ static void idpf_vport_intr_napi_add_all(struct idpf_vport *vport)
for (v_idx = 0; v_idx < vport->num_q_vectors; v_idx++) {
struct idpf_q_vector *q_vector = &vport->q_vectors[v_idx];
+ qv_idx = vport->q_vector_idxs[v_idx];
+ irq_num = vport->adapter->msix_entries[qv_idx].vector;
- netif_napi_add(vport->netdev, &q_vector->napi, napi_poll);
-
- /* only set affinity_mask if the CPU is online */
- if (cpu_online(v_idx))
- cpumask_set_cpu(v_idx, q_vector->affinity_mask);
+ netif_napi_add_config(vport->netdev, &q_vector->napi,
+ napi_poll, v_idx);
+ netif_napi_set_irq(&q_vector->napi, irq_num);
}
}
@@ -4239,9 +4234,6 @@ int idpf_vport_intr_alloc(struct idpf_vport *vport)
q_vector->rx_intr_mode = IDPF_ITR_DYNAMIC;
q_vector->rx_itr_idx = VIRTCHNL2_ITR_IDX_0;
- if (!zalloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL))
- goto error;
-
q_vector->tx = kcalloc(txqs_per_vector, sizeof(*q_vector->tx),
GFP_KERNEL);
if (!q_vector->tx)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index cd9a20c9cc7f..b029f566e57c 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -382,7 +382,6 @@ struct idpf_intr_reg {
* @rx_intr_mode: Dynamic ITR or not
* @rx_itr_idx: RX ITR index
* @v_idx: Vector index
- * @affinity_mask: CPU affinity mask
*/
struct idpf_q_vector {
__cacheline_group_begin_aligned(read_mostly);
@@ -419,13 +418,12 @@ struct idpf_q_vector {
__cacheline_group_begin_aligned(cold);
u16 v_idx;
- cpumask_var_t affinity_mask;
__cacheline_group_end_aligned(cold);
};
libeth_cacheline_set_assert(struct idpf_q_vector, 120,
24 + sizeof(struct napi_struct) +
2 * sizeof(struct dim),
- 8 + sizeof(cpumask_var_t));
+ 8);
struct idpf_rx_queue_stats {
u64_stats_t packets;
@@ -921,7 +919,7 @@ static inline int idpf_q_vector_to_mem(const struct idpf_q_vector *q_vector)
if (!q_vector)
return NUMA_NO_NODE;
- cpu = cpumask_first(q_vector->affinity_mask);
+ cpu = cpumask_first(&q_vector->napi.config->affinity_mask);
return cpu < nr_cpu_ids ? cpu_to_mem(cpu) : NUMA_NO_NODE;
}
--
2.43.0
* [PATCH net-next v9 6/6] selftests: drv-net: add tests for napi IRQ affinity notifiers
From: Ahmed Zaki @ 2025-02-24 23:22 UTC
To: netdev
Cc: intel-wired-lan, andrew+netdev, edumazet, kuba, horms, pabeni,
davem, michael.chan, tariqt, anthony.l.nguyen, przemyslaw.kitszel,
jdamato, shayd, akpm, shayagr, Ahmed Zaki
From: Jakub Kicinski <kuba@kernel.org>
Add tests to check that the napi retains its IRQ after down/up, after
multiple changes in the number of rx queues, and after attaching and
detaching an XDP program.
Tested on ice and idpf:
# NETIF=<iface> tools/testing/selftests/drivers/net/hw/irq.py
KTAP version 1
1..4
ok 1 irq.check_irqs_reported
ok 2 irq.check_reconfig_queues
ok 3 irq.check_reconfig_xdp
ok 4 irq.check_down
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Tested-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
.../testing/selftests/drivers/net/hw/Makefile | 4 +
tools/testing/selftests/drivers/net/hw/irq.py | 99 +++++++++++++++++++
.../selftests/drivers/net/hw/xdp_dummy.bpf.c | 13 +++
.../selftests/drivers/net/lib/py/env.py | 8 +-
4 files changed, 123 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/drivers/net/hw/irq.py
create mode 100644 tools/testing/selftests/drivers/net/hw/xdp_dummy.bpf.c
diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index ae783e18be83..73e7b826a141 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -10,6 +10,7 @@ TEST_PROGS = \
ethtool_rmon.sh \
hw_stats_l3.sh \
hw_stats_l3_gre.sh \
+ irq.py \
loopback.sh \
nic_link_layer.py \
nic_performance.py \
@@ -33,9 +34,12 @@ TEST_INCLUDES := \
# YNL files, must be before "include ..lib.mk"
YNL_GEN_FILES := ncdevmem
TEST_GEN_FILES += $(YNL_GEN_FILES)
+TEST_GEN_FILES += $(patsubst %.c,%.o,$(wildcard *.bpf.c))
include ../../../lib.mk
# YNL build
YNL_GENS := ethtool netdev
include ../../../net/ynl.mk
+
+include ../../../net/bpf.mk
diff --git a/tools/testing/selftests/drivers/net/hw/irq.py b/tools/testing/selftests/drivers/net/hw/irq.py
new file mode 100755
index 000000000000..42ab98370245
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/irq.py
@@ -0,0 +1,99 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+from lib.py import ksft_run, ksft_exit
+from lib.py import ksft_ge, ksft_eq
+from lib.py import KsftSkipEx
+from lib.py import ksft_disruptive
+from lib.py import EthtoolFamily, NetdevFamily
+from lib.py import NetDrvEnv
+from lib.py import cmd, ip, defer
+
+
+def read_affinity(irq) -> str:
+    with open(f'/proc/irq/{irq}/smp_affinity', 'r') as fp:
+        return fp.read().lstrip("0,").strip()
+
+
+def write_affinity(irq, what) -> str:
+    if what != read_affinity(irq):
+        with open(f'/proc/irq/{irq}/smp_affinity', 'w') as fp:
+            fp.write(what)
+
+
+def check_irqs_reported(cfg) -> None:
+    """ Check that device reports IRQs for NAPI instances """
+    napis = cfg.netnl.napi_get({"ifindex": cfg.ifindex}, dump=True)
+    irqs = sum(['irq' in x for x in napis])
+
+    ksft_ge(irqs, 1)
+    ksft_eq(irqs, len(napis))
+
+
+def _check_reconfig(cfg, reconfig_cb) -> None:
+    napis = cfg.netnl.napi_get({"ifindex": cfg.ifindex}, dump=True)
+    for n in reversed(napis):
+        if 'irq' in n:
+            break
+    else:
+        raise KsftSkipEx(f"Device has no NAPI with IRQ attribute (#napis: {len(napis)})")
+
+    old = read_affinity(n['irq'])
+    # pick an affinity that's not the current one
+    new = "3" if old != "3" else "5"
+    write_affinity(n['irq'], new)
+    defer(write_affinity, n['irq'], old)
+
+    reconfig_cb(cfg)
+
+    ksft_eq(read_affinity(n['irq']), new, comment="IRQ affinity changed after reconfig")
+
+
+def check_reconfig_queues(cfg) -> None:
+    def reconfig(cfg) -> None:
+        channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
+        if channels['combined-count'] == 0:
+            rx_type = 'rx'
+        else:
+            rx_type = 'combined'
+        cur_queue_cnt = channels[f'{rx_type}-count']
+        max_queue_cnt = channels[f'{rx_type}-max']
+
+        cmd(f"ethtool -L {cfg.ifname} {rx_type} 1")
+        cmd(f"ethtool -L {cfg.ifname} {rx_type} {max_queue_cnt}")
+        cmd(f"ethtool -L {cfg.ifname} {rx_type} {cur_queue_cnt}")
+
+    _check_reconfig(cfg, reconfig)
+
+
+def check_reconfig_xdp(cfg) -> None:
+    def reconfig(cfg) -> None:
+        ip("link set dev %s xdp obj %s sec xdp" %
+           (cfg.ifname, cfg.rpath("xdp_dummy.bpf.o")))
+        ip("link set dev %s xdp off" % cfg.ifname)
+
+    _check_reconfig(cfg, reconfig)
+
+
+@ksft_disruptive
+def check_down(cfg) -> None:
+    def reconfig(cfg) -> None:
+        ip("link set dev %s down" % cfg.ifname)
+        ip("link set dev %s up" % cfg.ifname)
+
+    _check_reconfig(cfg, reconfig)
+
+
+def main() -> None:
+    with NetDrvEnv(__file__, nsim_test=False) as cfg:
+        cfg.ethnl = EthtoolFamily()
+        cfg.netnl = NetdevFamily()
+
+        ksft_run([check_irqs_reported, check_reconfig_queues,
+                  check_reconfig_xdp, check_down],
+                 args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tools/testing/selftests/drivers/net/hw/xdp_dummy.bpf.c b/tools/testing/selftests/drivers/net/hw/xdp_dummy.bpf.c
new file mode 100644
index 000000000000..d988b2e0cee8
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/xdp_dummy.bpf.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define KBUILD_MODNAME "xdp_dummy"
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp")
+int xdp_dummy_prog(struct xdp_md *ctx)
+{
+ return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 96b33b5ef9dd..fd4d674e6c72 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -58,14 +58,20 @@ class NetDrvEnv(NetDrvEnvBase):
"""
Class for a single NIC / host env, with no remote end
"""
- def __init__(self, src_path, **kwargs):
+ def __init__(self, src_path, nsim_test=None, **kwargs):
super().__init__(src_path)
self._ns = None
if 'NETIF' in self.env:
+ if nsim_test is True:
+ raise KsftXfailEx("Test only works on netdevsim")
+
self.dev = ip("-d link show dev " + self.env['NETIF'], json=True)[0]
else:
+ if nsim_test is False:
+ raise KsftXfailEx("Test does not work on netdevsim")
+
self._ns = NetdevSimDev(**kwargs)
self.dev = self._ns.nsims[0].dev
self.ifname = self.dev['ifname']
--
2.43.0
* Re: [PATCH net-next v9 0/6] net: napi: add CPU affinity to napi->config
From: patchwork-bot+netdevbpf @ 2025-02-27 4:10 UTC
To: Ahmed Zaki
Cc: netdev, intel-wired-lan, andrew+netdev, edumazet, kuba, horms,
pabeni, davem, michael.chan, tariqt, anthony.l.nguyen,
przemyslaw.kitszel, jdamato, shayd, akpm, shayagr
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 24 Feb 2025 16:22:21 -0700 you wrote:
> Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
> after reset. However, since there can be only one IRQ affinity notifier
> for each IRQ, registering IRQ notifiers conflicts with the ARFS rmap
> management in the core (which also registers separate IRQ affinity
> notifiers).
>
> Move the IRQ affinity management to the napi struct. This way we can have
> a unified IRQ notifier to re-apply the user-set affinity and also manage
> the ARFS rmaps.
>
> [...]
Here is the summary with links:
- [net-next,v9,1/6] net: move aRFS rmap management and CPU affinity to core
https://git.kernel.org/netdev/net-next/c/bd7c00605ee0
- [net-next,v9,2/6] net: ena: use napi's aRFS rmap notifiers
https://git.kernel.org/netdev/net-next/c/de340d8206bf
- [net-next,v9,3/6] ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues()
https://git.kernel.org/netdev/net-next/c/30b78ba3d4fe
- [net-next,v9,4/6] ice: use napi's irq affinity and rmap IRQ notifiers
https://git.kernel.org/netdev/net-next/c/4063af296762
- [net-next,v9,5/6] idpf: use napi's irq affinity
https://git.kernel.org/netdev/net-next/c/deab38f8f011
- [net-next,v9,6/6] selftests: drv-net: add tests for napi IRQ affinity notifiers
https://git.kernel.org/netdev/net-next/c/185646a8a0a8
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* RE: [PATCH net-next v9 2/6] net: ena: use napi's aRFS rmap notifiers
From: Arinzon, David @ 2025-03-03 17:11 UTC
To: Ahmed Zaki, netdev@vger.kernel.org
Cc: intel-wired-lan@lists.osuosl.org, andrew+netdev@lunn.ch,
edumazet@google.com, kuba@kernel.org, horms@kernel.org,
pabeni@redhat.com, davem@davemloft.net, michael.chan@broadcom.com,
tariqt@nvidia.com, anthony.l.nguyen@intel.com,
przemyslaw.kitszel@intel.com, jdamato@fastly.com,
shayd@nvidia.com, akpm@linux-foundation.org, Allen, Neil
> Use the core's rmap notifiers and delete our own.
>
> Acked-by: David Arinzon <darinzon@amazon.com>
> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
> ---
> drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +-------------------
> 1 file changed, 1 insertion(+), 42 deletions(-)
>
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index c1295dfad0d0..6aab85a7c60a 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -5,9 +5,6 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> -#ifdef CONFIG_RFS_ACCEL
> -#include <linux/cpu_rmap.h>
> -#endif /* CONFIG_RFS_ACCEL */
> #include <linux/ethtool.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> @@ -162,30 +159,6 @@ int ena_xmit_common(struct ena_adapter
> *adapter,
> return 0;
> }
>
> -static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter) -{ -#ifdef
> CONFIG_RFS_ACCEL
> - u32 i;
> - int rc;
> -
> - adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter-
> >num_io_queues);
> - if (!adapter->netdev->rx_cpu_rmap)
> - return -ENOMEM;
> - for (i = 0; i < adapter->num_io_queues; i++) {
> - int irq_idx = ENA_IO_IRQ_IDX(i);
> -
> - rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
> - pci_irq_vector(adapter->pdev, irq_idx));
> - if (rc) {
> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
> - adapter->netdev->rx_cpu_rmap = NULL;
> - return rc;
> - }
> - }
> -#endif /* CONFIG_RFS_ACCEL */
> - return 0;
> -}
> -
> static void ena_init_io_rings_common(struct ena_adapter *adapter,
> struct ena_ring *ring, u16 qid) { @@ -1596,7 +1569,7 @@
> static int ena_enable_msix(struct ena_adapter *adapter)
> adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
> }
>
> - if (ena_init_rx_cpu_rmap(adapter))
> + if (netif_enable_cpu_rmap(adapter->netdev,
> + adapter->num_io_queues))
> netif_warn(adapter, probe, adapter->netdev,
> "Failed to map IRQs to CPUs\n");
>
> @@ -1742,13 +1715,6 @@ static void ena_free_io_irq(struct ena_adapter
> *adapter)
> struct ena_irq *irq;
> int i;
>
> -#ifdef CONFIG_RFS_ACCEL
> - if (adapter->msix_vecs >= 1) {
> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
> - adapter->netdev->rx_cpu_rmap = NULL;
> - }
> -#endif /* CONFIG_RFS_ACCEL */
> -
> for (i = ENA_IO_IRQ_FIRST_IDX; i <
> ENA_MAX_MSIX_VEC(io_queue_count); i++) {
> irq = &adapter->irq_tbl[i];
> irq_set_affinity_hint(irq->vector, NULL); @@ -4131,13 +4097,6 @@
> static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
> ena_dev = adapter->ena_dev;
> netdev = adapter->netdev;
>
> -#ifdef CONFIG_RFS_ACCEL
> - if ((adapter->msix_vecs >= 1) && (netdev->rx_cpu_rmap)) {
> - free_irq_cpu_rmap(netdev->rx_cpu_rmap);
> - netdev->rx_cpu_rmap = NULL;
> - }
> -
> -#endif /* CONFIG_RFS_ACCEL */
> /* Make sure timer and reset routine won't be called after
> * freeing device resources.
> */
> --
> 2.43.0
Hi Ahmed,
After the merging of this patch, I see the below stack trace when the IRQs are freed.
It can be reproduced by unloading and loading the driver using `modprobe -r ena; modprobe ena` (it happens during unload).
Based on the patchset and the changes to the other drivers, I think there is a missing call to the function
that releases the affinity notifier (the WARN is in https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/kernel/irq/manage.c#n2031).
I saw that the Intel drivers in the patchset add a `netif_napi_set_irq(<napi>, -1);` call.
After adding the code snippet below, I no longer see the warning, but I want to understand whether it is the right call by design.
@@ -1716,8 +1716,11 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
int i;
for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
+ struct ena_napi *ena_napi;
+
irq = &adapter->irq_tbl[i];
irq_set_affinity_hint(irq->vector, NULL);
+ ena_napi = (struct ena_napi *)irq->data;
+ netif_napi_set_irq(&ena_napi->napi, -1);
free_irq(irq->vector, irq->data);
}
}
[ 484.544586] ? __warn+0x84/0x130
[ 484.544843] ? free_irq+0x5c/0x70
[ 484.545105] ? report_bug+0x18a/0x1a0
[ 484.545390] ? handle_bug+0x53/0x90
[ 484.545664] ? exc_invalid_op+0x14/0x70
[ 484.545959] ? asm_exc_invalid_op+0x16/0x20
[ 484.546279] ? free_irq+0x5c/0x70
[ 484.546545] ? free_irq+0x10/0x70
[ 484.546807] ena_free_io_irq+0x5f/0x70 [ena]
[ 484.547138] ena_down+0x250/0x3e0 [ena]
[ 484.547435] ena_destroy_device+0x118/0x150 [ena]
[ 484.547796] __ena_shutoff+0x5a/0xe0 [ena]
[ 484.548110] pci_device_remove+0x3b/0xb0
[ 484.548412] device_release_driver_internal+0x193/0x200
[ 484.548804] driver_detach+0x44/0x90
[ 484.549084] bus_remove_driver+0x69/0xf0
[ 484.549386] pci_unregister_driver+0x2a/0xb0
[ 484.549717] ena_cleanup+0xc/0x130 [ena]
[ 484.550021] __do_sys_delete_module.constprop.0+0x176/0x310
[ 484.550438] ? syscall_trace_enter+0xfb/0x1c0
[ 484.550782] do_syscall_64+0x5b/0x170
[ 484.551067] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Thanks,
David
* Re: [Intel-wired-lan] [PATCH net-next v9 2/6] net: ena: use napi's aRFS rmap notifiers
From: Ahmed Zaki @ 2025-03-04 22:37 UTC
To: Arinzon, David, netdev@vger.kernel.org
Cc: intel-wired-lan@lists.osuosl.org, andrew+netdev@lunn.ch,
edumazet@google.com, kuba@kernel.org, horms@kernel.org,
pabeni@redhat.com, davem@davemloft.net, michael.chan@broadcom.com,
tariqt@nvidia.com, anthony.l.nguyen@intel.com,
przemyslaw.kitszel@intel.com, jdamato@fastly.com,
shayd@nvidia.com, akpm@linux-foundation.org, Allen, Neil
[RE-SEND] I just realized I sent this only to iwl, sorry for spamming.
On 2025-03-03 10:11 a.m., Arinzon, David wrote:
>> Use the core's rmap notifiers and delete our own.
>>
>> Acked-by: David Arinzon <darinzon@amazon.com>
>> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
>> ---
>> drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +-------------------
>> 1 file changed, 1 insertion(+), 42 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
>> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
>> index c1295dfad0d0..6aab85a7c60a 100644
>> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
>> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
>> @@ -5,9 +5,6 @@
>>
>> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>
>> -#ifdef CONFIG_RFS_ACCEL
>> -#include <linux/cpu_rmap.h>
>> -#endif /* CONFIG_RFS_ACCEL */
>> #include <linux/ethtool.h>
>> #include <linux/kernel.h>
>> #include <linux/module.h>
>> @@ -162,30 +159,6 @@ int ena_xmit_common(struct ena_adapter
>> *adapter,
>> return 0;
>> }
>>
>> -static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter) -{ -#ifdef
>> CONFIG_RFS_ACCEL
>> - u32 i;
>> - int rc;
>> -
>> - adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter-
>>> num_io_queues);
>> - if (!adapter->netdev->rx_cpu_rmap)
>> - return -ENOMEM;
>> - for (i = 0; i < adapter->num_io_queues; i++) {
>> - int irq_idx = ENA_IO_IRQ_IDX(i);
>> -
>> - rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
>> - pci_irq_vector(adapter->pdev, irq_idx));
>> - if (rc) {
>> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
>> - adapter->netdev->rx_cpu_rmap = NULL;
>> - return rc;
>> - }
>> - }
>> -#endif /* CONFIG_RFS_ACCEL */
>> - return 0;
>> -}
>> -
>> static void ena_init_io_rings_common(struct ena_adapter *adapter,
>> struct ena_ring *ring, u16 qid) { @@ -1596,7 +1569,7 @@
>> static int ena_enable_msix(struct ena_adapter *adapter)
>> adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
>> }
>>
>> - if (ena_init_rx_cpu_rmap(adapter))
>> + if (netif_enable_cpu_rmap(adapter->netdev,
>> + adapter->num_io_queues))
>> netif_warn(adapter, probe, adapter->netdev,
>> "Failed to map IRQs to CPUs\n");
>>
>> @@ -1742,13 +1715,6 @@ static void ena_free_io_irq(struct ena_adapter
>> *adapter)
>> struct ena_irq *irq;
>> int i;
>>
>> -#ifdef CONFIG_RFS_ACCEL
>> - if (adapter->msix_vecs >= 1) {
>> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
>> - adapter->netdev->rx_cpu_rmap = NULL;
>> - }
>> -#endif /* CONFIG_RFS_ACCEL */
>> -
>> for (i = ENA_IO_IRQ_FIRST_IDX; i <
>> ENA_MAX_MSIX_VEC(io_queue_count); i++) {
>> irq = &adapter->irq_tbl[i];
>> irq_set_affinity_hint(irq->vector, NULL); @@ -4131,13 +4097,6 @@
>> static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
>> ena_dev = adapter->ena_dev;
>> netdev = adapter->netdev;
>>
>> -#ifdef CONFIG_RFS_ACCEL
>> - if ((adapter->msix_vecs >= 1) && (netdev->rx_cpu_rmap)) {
>> - free_irq_cpu_rmap(netdev->rx_cpu_rmap);
>> - netdev->rx_cpu_rmap = NULL;
>> - }
>> -
>> -#endif /* CONFIG_RFS_ACCEL */
>> /* Make sure timer and reset routine won't be called after
>> * freeing device resources.
>> */
>> --
>> 2.43.0
>
> Hi Ahmed,
>
> After this patch was merged, I see the stack trace below when the IRQs are freed.
> It can be reproduced by unloading and reloading the driver with `modprobe -r ena; modprobe ena` (it happens during unload).
>
> Based on the patchset and the changes to the other drivers, I think there's a missing call to the function
> that releases the affinity notifier (the WARN is at https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/kernel/irq/manage.c#n2031).
>
> I saw that in the Intel driver code in this patchset, a `netif_napi_set_irq(<napi>, -1);` call is added.
>
> After adding the snippet below, I no longer see the trace, but I want to understand whether it's the right call by design.
Yes, in ena_down() the IRQs are freed before the napis are deleted
(which is where the IRQ notifiers are released). Your snippet below is
fine (and is the better option, IMO), but you could also delete the
napis first and then free the IRQs; see the sketch after this
paragraph.
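For reference, that alternative ordering would look roughly like this.
It is only a sketch of the order of operations, not the actual ena
teardown path; the helper name and loop bounds are hypothetical:

	/* Alternative: delete the napis first, then free the IRQs.
	 * netif_napi_del() releases the IRQ affinity notifier that the
	 * core registered for the napi, so the later free_irq() no
	 * longer trips the WARN.
	 */
	static void ena_del_napis_sketch(struct ena_adapter *adapter)
	{
		int i;

		for (i = 0; i < adapter->num_io_queues; i++)
			netif_napi_del(&adapter->ena_napi[i].napi);
	}

	/* ... and later in the down path: */
	ena_del_napis_sketch(adapter);
	ena_free_io_irq(adapter);	/* free_irq() no longer warns */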
>
> @@ -1716,8 +1716,11 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
> int i;
>
> for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
> + struct ena_napi *ena_napi;
> +
> irq = &adapter->irq_tbl[i];
> irq_set_affinity_hint(irq->vector, NULL);
> + ena_napi = (struct ena_napi *)irq->data;
> + netif_napi_set_irq(&ena_napi->napi, -1);
> free_irq(irq->vector, irq->data);
> }
> }
>
> [ 484.544586] ? __warn+0x84/0x130
> [ 484.544843] ? free_irq+0x5c/0x70
> [ 484.545105] ? report_bug+0x18a/0x1a0
> [ 484.545390] ? handle_bug+0x53/0x90
> [ 484.545664] ? exc_invalid_op+0x14/0x70
> [ 484.545959] ? asm_exc_invalid_op+0x16/0x20
> [ 484.546279] ? free_irq+0x5c/0x70
> [ 484.546545] ? free_irq+0x10/0x70
> [ 484.546807] ena_free_io_irq+0x5f/0x70 [ena]
> [ 484.547138] ena_down+0x250/0x3e0 [ena]
> [ 484.547435] ena_destroy_device+0x118/0x150 [ena]
> [ 484.547796] __ena_shutoff+0x5a/0xe0 [ena]
> [ 484.548110] pci_device_remove+0x3b/0xb0
> [ 484.548412] device_release_driver_internal+0x193/0x200
> [ 484.548804] driver_detach+0x44/0x90
> [ 484.549084] bus_remove_driver+0x69/0xf0
> [ 484.549386] pci_unregister_driver+0x2a/0xb0
> [ 484.549717] ena_cleanup+0xc/0x130 [ena]
> [ 484.550021] __do_sys_delete_module.constprop.0+0x176/0x310
> [ 484.550438] ? syscall_trace_enter+0xfb/0x1c0
> [ 484.550782] do_syscall_64+0x5b/0x170
> [ 484.551067] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Thanks,
> David
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [Intel-wired-lan] [PATCH net-next v9 2/6] net: ena: use napi's aRFS rmap notifers
2025-03-04 22:37 ` [Intel-wired-lan] " Ahmed Zaki
@ 2025-03-05 6:33 ` Arinzon, David
2025-03-05 14:00 ` Ahmed Zaki
0 siblings, 1 reply; 12+ messages in thread
From: Arinzon, David @ 2025-03-05 6:33 UTC (permalink / raw)
To: Ahmed Zaki, netdev@vger.kernel.org, jdamato@fastly.com
Cc: intel-wired-lan@lists.osuosl.org, andrew+netdev@lunn.ch,
edumazet@google.com, kuba@kernel.org, horms@kernel.org,
pabeni@redhat.com, davem@davemloft.net, michael.chan@broadcom.com,
tariqt@nvidia.com, anthony.l.nguyen@intel.com,
przemyslaw.kitszel@intel.com, jdamato@fastly.com,
shayd@nvidia.com, akpm@linux-foundation.org, Allen, Neil
> [RE-SEND] I just realized I sent this only to iwl, sorry for spamming.
>
>
> On 2025-03-03 10:11 a.m., Arinzon, David wrote:
> >> Use the core's rmap notifiers and delete our own.
> >>
> >> Acked-by: David Arinzon <darinzon@amazon.com>
> >> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
> >> ---
> >> drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +-------------------
> >> 1 file changed, 1 insertion(+), 42 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> >> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> >> index c1295dfad0d0..6aab85a7c60a 100644
> >> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> >> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> >> @@ -5,9 +5,6 @@
> >>
> >> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >>
> >> -#ifdef CONFIG_RFS_ACCEL
> >> -#include <linux/cpu_rmap.h>
> >> -#endif /* CONFIG_RFS_ACCEL */
> >> #include <linux/ethtool.h>
> >> #include <linux/kernel.h>
> >> #include <linux/module.h>
> >> @@ -162,30 +159,6 @@ int ena_xmit_common(struct ena_adapter *adapter,
> >> return 0;
> >> }
> >>
> >> -static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter)
> >> -{
> >> -#ifdef CONFIG_RFS_ACCEL
> >> - u32 i;
> >> - int rc;
> >> -
> >> - adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_io_queues);
> >> - if (!adapter->netdev->rx_cpu_rmap)
> >> - return -ENOMEM;
> >> - for (i = 0; i < adapter->num_io_queues; i++) {
> >> - int irq_idx = ENA_IO_IRQ_IDX(i);
> >> -
> >> - rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
> >> - pci_irq_vector(adapter->pdev, irq_idx));
> >> - if (rc) {
> >> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
> >> - adapter->netdev->rx_cpu_rmap = NULL;
> >> - return rc;
> >> - }
> >> - }
> >> -#endif /* CONFIG_RFS_ACCEL */
> >> - return 0;
> >> -}
> >> -
> >> static void ena_init_io_rings_common(struct ena_adapter *adapter,
> >> struct ena_ring *ring, u16 qid)
> >> {
> >> @@ -1596,7 +1569,7 @@ static int ena_enable_msix(struct ena_adapter *adapter)
> >> adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
> >> }
> >>
> >> - if (ena_init_rx_cpu_rmap(adapter))
> >> + if (netif_enable_cpu_rmap(adapter->netdev,
> >> + adapter->num_io_queues))
> >> netif_warn(adapter, probe, adapter->netdev,
> >> "Failed to map IRQs to CPUs\n");
> >>
> >> @@ -1742,13 +1715,6 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
> >> struct ena_irq *irq;
> >> int i;
> >>
> >> -#ifdef CONFIG_RFS_ACCEL
> >> - if (adapter->msix_vecs >= 1) {
> >> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
> >> - adapter->netdev->rx_cpu_rmap = NULL;
> >> - }
> >> -#endif /* CONFIG_RFS_ACCEL */
> >> -
> >> for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
> >> irq = &adapter->irq_tbl[i];
> >> irq_set_affinity_hint(irq->vector, NULL);
> >> @@ -4131,13 +4097,6 @@ static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
> >> ena_dev = adapter->ena_dev;
> >> netdev = adapter->netdev;
> >>
> >> -#ifdef CONFIG_RFS_ACCEL
> >> - if ((adapter->msix_vecs >= 1) && (netdev->rx_cpu_rmap)) {
> >> - free_irq_cpu_rmap(netdev->rx_cpu_rmap);
> >> - netdev->rx_cpu_rmap = NULL;
> >> - }
> >> -
> >> -#endif /* CONFIG_RFS_ACCEL */
> >> /* Make sure timer and reset routine won't be called after
> >> * freeing device resources.
> >> */
> >> --
> >> 2.43.0
> >
> > Hi Ahmed,
> >
> > After this patch was merged, I see the stack trace below when the IRQs
> > are freed.
> > It can be reproduced by unloading and reloading the driver with
> > `modprobe -r ena; modprobe ena` (it happens during unload).
> >
> > Based on the patchset and the changes to the other drivers, I think
> > there's a missing call to the function that releases the affinity
> > notifier (the WARN is at
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/kernel/irq/manage.c#n2031).
> >
> > I saw that in the Intel driver code in this patchset, a
> > `netif_napi_set_irq(<napi>, -1);` call is added.
> >
> > After adding the snippet, I no longer see the trace, but I want to
> > understand whether it's the right call by design.
>
> Yes, in ena_down() the IRQs are freed before the napis are deleted
> (which is where the IRQ notifiers are released). Your snippet below is
> fine (and is the better option, IMO), but you could also delete the
> napis first and then free the IRQs.
>
>
Thanks for the clarification. Some book-keeping, since this change fixes the issue:
the need to use `netif_napi_set_irq` was introduced in https://lore.kernel.org/netdev/20241002001331.65444-2-jdamato@fastly.com/,
but, technically, there was no need to call it with -1 until this patch was introduced.
Is my understanding correct?
If so, then the fix belongs with this patch.
(Also adding Joe, who authored the patch mentioned above.)
> >
> > @@ -1716,8 +1716,11 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
> > int i;
> >
> > for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
> > + struct ena_napi *ena_napi;
> > +
> > irq = &adapter->irq_tbl[i];
> > irq_set_affinity_hint(irq->vector, NULL);
> > + ena_napi = (struct ena_napi *)irq->data;
> > + netif_napi_set_irq(&ena_napi->napi, -1);
> > free_irq(irq->vector, irq->data);
> > }
> > }
> >
> > [ 484.544586] ? __warn+0x84/0x130
> > [ 484.544843] ? free_irq+0x5c/0x70
> > [ 484.545105] ? report_bug+0x18a/0x1a0
> > [ 484.545390] ? handle_bug+0x53/0x90
> > [ 484.545664] ? exc_invalid_op+0x14/0x70
> > [ 484.545959] ? asm_exc_invalid_op+0x16/0x20
> > [ 484.546279] ? free_irq+0x5c/0x70
> > [ 484.546545] ? free_irq+0x10/0x70
> > [ 484.546807] ena_free_io_irq+0x5f/0x70 [ena]
> > [ 484.547138] ena_down+0x250/0x3e0 [ena]
> > [ 484.547435] ena_destroy_device+0x118/0x150 [ena]
> > [ 484.547796] __ena_shutoff+0x5a/0xe0 [ena]
> > [ 484.548110] pci_device_remove+0x3b/0xb0
> > [ 484.548412] device_release_driver_internal+0x193/0x200
> > [ 484.548804] driver_detach+0x44/0x90
> > [ 484.549084] bus_remove_driver+0x69/0xf0
> > [ 484.549386] pci_unregister_driver+0x2a/0xb0
> > [ 484.549717] ena_cleanup+0xc/0x130 [ena]
> > [ 484.550021] __do_sys_delete_module.constprop.0+0x176/0x310
> > [ 484.550438] ? syscall_trace_enter+0xfb/0x1c0
> > [ 484.550782] do_syscall_64+0x5b/0x170
> > [ 484.551067] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > Thanks,
> > David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Intel-wired-lan] [PATCH net-next v9 2/6] net: ena: use napi's aRFS rmap notifers
2025-03-05 6:33 ` Arinzon, David
@ 2025-03-05 14:00 ` Ahmed Zaki
0 siblings, 0 replies; 12+ messages in thread
From: Ahmed Zaki @ 2025-03-05 14:00 UTC (permalink / raw)
To: Arinzon, David, netdev@vger.kernel.org, jdamato@fastly.com
Cc: intel-wired-lan@lists.osuosl.org, andrew+netdev@lunn.ch,
edumazet@google.com, kuba@kernel.org, horms@kernel.org,
pabeni@redhat.com, davem@davemloft.net, michael.chan@broadcom.com,
tariqt@nvidia.com, anthony.l.nguyen@intel.com,
przemyslaw.kitszel@intel.com, shayd@nvidia.com,
akpm@linux-foundation.org, Allen, Neil
On 2025-03-04 11:33 p.m., Arinzon, David wrote:
>> [RE-SEND] I just realized I sent this only to iwl, sorry for spamming.
>>
>>
>> On 2025-03-03 10:11 a.m., Arinzon, David wrote:
>>>> Use the core's rmap notifiers and delete our own.
>>>>
>>>> Acked-by: David Arinzon <darinzon@amazon.com>
>>>> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
>>>> ---
>>>> drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +-------------------
>>>> 1 file changed, 1 insertion(+), 42 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
>>>> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
>>>> index c1295dfad0d0..6aab85a7c60a 100644
>>>> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
>>>> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
>>>> @@ -5,9 +5,6 @@
>>>>
>>>> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>>>
>>>> -#ifdef CONFIG_RFS_ACCEL
>>>> -#include <linux/cpu_rmap.h>
>>>> -#endif /* CONFIG_RFS_ACCEL */
>>>> #include <linux/ethtool.h>
>>>> #include <linux/kernel.h>
>>>> #include <linux/module.h>
>>>> @@ -162,30 +159,6 @@ int ena_xmit_common(struct ena_adapter *adapter,
>>>> return 0;
>>>> }
>>>>
>>>> -static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter)
>>>> -{
>>>> -#ifdef CONFIG_RFS_ACCEL
>>>> - u32 i;
>>>> - int rc;
>>>> -
>>>> - adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter->num_io_queues);
>>>> - if (!adapter->netdev->rx_cpu_rmap)
>>>> - return -ENOMEM;
>>>> - for (i = 0; i < adapter->num_io_queues; i++) {
>>>> - int irq_idx = ENA_IO_IRQ_IDX(i);
>>>> -
>>>> - rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
>>>> - pci_irq_vector(adapter->pdev, irq_idx));
>>>> - if (rc) {
>>>> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
>>>> - adapter->netdev->rx_cpu_rmap = NULL;
>>>> - return rc;
>>>> - }
>>>> - }
>>>> -#endif /* CONFIG_RFS_ACCEL */
>>>> - return 0;
>>>> -}
>>>> -
>>>> static void ena_init_io_rings_common(struct ena_adapter *adapter,
>>>> struct ena_ring *ring, u16 qid)
>>>> {
>>>> @@ -1596,7 +1569,7 @@ static int ena_enable_msix(struct ena_adapter *adapter)
>>>> adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
>>>> }
>>>>
>>>> - if (ena_init_rx_cpu_rmap(adapter))
>>>> + if (netif_enable_cpu_rmap(adapter->netdev,
>>>> + adapter->num_io_queues))
>>>> netif_warn(adapter, probe, adapter->netdev,
>>>> "Failed to map IRQs to CPUs\n");
>>>>
>>>> @@ -1742,13 +1715,6 @@ static void ena_free_io_irq(struct ena_adapter *adapter)
>>>> struct ena_irq *irq;
>>>> int i;
>>>>
>>>> -#ifdef CONFIG_RFS_ACCEL
>>>> - if (adapter->msix_vecs >= 1) {
>>>> - free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
>>>> - adapter->netdev->rx_cpu_rmap = NULL;
>>>> - }
>>>> -#endif /* CONFIG_RFS_ACCEL */
>>>> -
>>>> for (i = ENA_IO_IRQ_FIRST_IDX; i < ENA_MAX_MSIX_VEC(io_queue_count); i++) {
>>>> irq = &adapter->irq_tbl[i];
>>>> irq_set_affinity_hint(irq->vector, NULL);
>>>> @@ -4131,13 +4097,6 @@ static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
>>>> ena_dev = adapter->ena_dev;
>>>> netdev = adapter->netdev;
>>>>
>>>> -#ifdef CONFIG_RFS_ACCEL
>>>> - if ((adapter->msix_vecs >= 1) && (netdev->rx_cpu_rmap)) {
>>>> - free_irq_cpu_rmap(netdev->rx_cpu_rmap);
>>>> - netdev->rx_cpu_rmap = NULL;
>>>> - }
>>>> -
>>>> -#endif /* CONFIG_RFS_ACCEL */
>>>> /* Make sure timer and reset routine won't be called after
>>>> * freeing device resources.
>>>> */
>>>> --
>>>> 2.43.0
>>>
>>> Hi Ahmed,
>>>
>>> After this patch was merged, I see the stack trace below when the IRQs
>>> are freed.
>>> It can be reproduced by unloading and reloading the driver with
>>> `modprobe -r ena; modprobe ena` (it happens during unload).
>>>
>>> Based on the patchset and the changes to the other drivers, I think
>>> there's a missing call to the function that releases the affinity
>>> notifier (the WARN is at
>>> https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/kernel/irq/manage.c#n2031).
>>>
>>> I saw that in the Intel driver code in this patchset, a
>>> `netif_napi_set_irq(<napi>, -1);` call is added.
>>>
>>> After adding the snippet, I no longer see the trace, but I want to
>>> understand whether it's the right call by design.
>>
>> Yes, in ena_down() the IRQs are freed before the napis are deleted
>> (which is where the IRQ notifiers are released). Your snippet below is
>> fine (and is the better option, IMO), but you could also delete the
>> napis first and then free the IRQs.
>>
>>
>
> Thanks for the clarification. Some book-keeping, since this change fixes the issue:
> the need to use `netif_napi_set_irq` was introduced in https://lore.kernel.org/netdev/20241002001331.65444-2-jdamato@fastly.com/,
> but, technically, there was no need to call it with -1 until this patch was introduced.
> Is my understanding correct?
Correct. The new patch attaches resources (the IRQ notifiers) to the
napi instance, and these must be released before the IRQs are freed.
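To illustrate the constraint, here is a minimal sketch of the required
ordering, based on your snippet above (not a verbatim driver change):

	/* The core attaches an IRQ affinity notifier to the napi, and
	 * the WARN you linked fires if free_irq() runs while that
	 * notifier is still registered. Clearing the napi's IRQ
	 * detaches the notifier first.
	 */
	netif_napi_set_irq(&ena_napi->napi, -1);  /* release the notifier */
	free_irq(irq->vector, irq->data);         /* no longer warns */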
>
> If so, then the fix belongs with this patch.
>
> (Also adding Joe, who authored the patch mentioned above.)
>
I guess so, since there was no need to call set_irq(-1) previously.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-03-05 14:01 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2025-02-24 23:22 [PATCH net-next v9 0/6] net: napi: add CPU affinity to napi->config Ahmed Zaki
2025-02-24 23:22 ` [PATCH net-next v9 1/6] net: move aRFS rmap management and CPU affinity to core Ahmed Zaki
2025-02-24 23:22 ` [PATCH net-next v9 2/6] net: ena: use napi's aRFS rmap notifers Ahmed Zaki
2025-03-03 17:11 ` Arinzon, David
2025-03-04 22:37 ` [Intel-wired-lan] " Ahmed Zaki
2025-03-05 6:33 ` Arinzon, David
2025-03-05 14:00 ` Ahmed Zaki
2025-02-24 23:22 ` [PATCH net-next v9 3/6] ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues() Ahmed Zaki
2025-02-24 23:22 ` [PATCH net-next v9 4/6] ice: use napi's irq affinity and rmap IRQ notifiers Ahmed Zaki
2025-02-24 23:22 ` [PATCH net-next v9 5/6] idpf: use napi's irq affinity Ahmed Zaki
2025-02-24 23:22 ` [PATCH net-next v9 6/6] selftests: drv-net: add tests for napi IRQ affinity notifiers Ahmed Zaki
2025-02-27 4:10 ` [PATCH net-next v9 0/6] net: napi: add CPU affinity to napi->config patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).