* [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
@ 2024-08-12 14:56 Joe Damato
2024-08-12 14:56 ` [RFC net-next 2/6] mlx5: Use napi_affinity_no_change Joe Damato
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jakub Kicinski, Jeroen de Borst, Jiri Pirko, Leon Romanovsky,
open list, open list:MELLANOX MLX4 core VPI driver,
Lorenzo Bianconi, Paolo Abeni, Praveen Kaligineedi,
Przemek Kitszel, Saeed Mahameed, Sebastian Andrzej Siewior,
Shailend Chand, Tariq Toukan, Tony Nguyen, Willem de Bruijn,
Yishai Hadas, Ziwei Xiao
Greetings:
Several drivers make a check in their napi poll functions to determine
if the CPU affinity of the IRQ has changed. If it has, the napi poll
function returns a value less than the budget to force polling mode to
be disabled, so that it can be rescheduled on the correct CPU next time
the softirq is raised.
This code is repeated in at least 5 drivers that I found, but there
might be more I missed (please let me know and I'll fix them). IMHO,
it'd be nice to fix this in existing drivers and avoid future drivers
repeating the same pattern.
FWIW, it's possible that patches 4, 5, and 6 could be separated into
"fixes" for the type mismatches and then, separately, new code, but
that seemed like a lot of noise for the list and maybe unnecessary.
If I should first send fixes for 4, 5, and 6 and then send this cleanup
series after, let me know and I'll do that.
Sending as an RFC because:
- I wanted to see if this cleanup was desirable overall, and
- If so, do I need to send fixes for 4-6 first?
Thanks,
Joe
Joe Damato (6):
netdevice: Add napi_affinity_no_change
mlx5: Use napi_affinity_no_change
gve: Use napi_affinity_no_change
i40e: Use napi_affinity_no_change
iavf: Use napi_affinity_no_change
mlx4: Use napi_affinity_no_change
drivers/net/ethernet/google/gve/gve_main.c | 14 +-------------
drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 4 +---
drivers/net/ethernet/intel/iavf/iavf.h | 1 +
drivers/net/ethernet/intel/iavf/iavf_main.c | 4 +++-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 4 +---
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 6 ++++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 6 +-----
drivers/net/ethernet/mellanox/mlx4/eq.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 9 +--------
include/linux/mlx4/device.h | 2 +-
include/linux/netdevice.h | 8 ++++++++
net/core/dev.c | 14 ++++++++++++++
17 files changed, 42 insertions(+), 41 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread

* [RFC net-next 2/6] mlx5: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-12 14:56 ` [RFC net-next 6/6] mlx4: " Joe Damato
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
2 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
open list:MELLANOX MLX5 core VPI driver, open list
Use napi_affinity_no_change instead of mlx5's internal implementation,
simplifying and centralizing the logic.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 9 +--------
3 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7832f6b6c8a8..c288e3f45a06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -767,7 +767,7 @@ struct mlx5e_channel {
spinlock_t async_icosq_lock;
/* data path - accessed per napi poll */
- const struct cpumask *aff_mask;
+ unsigned int irq;
struct mlx5e_ch_stats *stats;
/* control */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6f686fabed44..7b217aafbdfd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2684,8 +2684,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
c->mkey_be = cpu_to_be32(mdev->mlx5e_res.hw_objs.mkey);
c->num_tc = mlx5e_get_dcb_num_tc(params);
c->xdp = !!params->xdp_prog;
+ c->irq = irq;
c->stats = &priv->channel_stats[ix]->ch;
- c->aff_mask = irq_get_effective_affinity_mask(irq);
c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
netif_napi_add(netdev, &c->napi, mlx5e_napi_poll);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 5873fde65c2e..ee453d019c3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -39,13 +39,6 @@
#include "en/xsk/tx.h"
#include "en_accel/ktls_txrx.h"
-static inline bool mlx5e_channel_no_affinity_change(struct mlx5e_channel *c)
-{
- int current_cpu = smp_processor_id();
-
- return cpumask_test_cpu(current_cpu, c->aff_mask);
-}
-
static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
{
struct mlx5e_sq_stats *stats = sq->stats;
@@ -201,7 +194,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
busy |= busy_xsk;
if (busy) {
- if (likely(mlx5e_channel_no_affinity_change(c))) {
+ if (likely(napi_affinity_no_change(c->irq))) {
work_done = budget;
goto out;
}
--
2.25.1
* [RFC net-next 6/6] mlx4: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
2024-08-12 14:56 ` [RFC net-next 2/6] mlx5: Use napi_affinity_no_change Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
2 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Tariq Toukan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Yishai Hadas,
open list:MELLANOX MLX4 core VPI driver, open list
Use napi_affinity_no_change instead of mlx4's internal implementation,
simplifying and centralizing the logic.
To support this, some types were changed; the previous types may have
been bugs:
- mlx4_eq_get_irq now returns unsigned int (since the field it
returns is a u16)
- the type of irq in mlx4_en_activate_cq is changed from int to
unsigned int
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 6 ++++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 6 +-----
drivers/net/ethernet/mellanox/mlx4/eq.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
include/linux/mlx4/device.h | 2 +-
5 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 461cc2c79c71..9c3e1c810412 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -90,9 +90,10 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
int cq_idx)
{
struct mlx4_en_dev *mdev = priv->mdev;
- int irq, err = 0;
- int timestamp_en = 0;
bool assigned_eq = false;
+ int timestamp_en = 0;
+ unsigned int irq;
+ int err = 0;
cq->dev = mdev->pndev[priv->port];
cq->mcq.set_ci_db = cq->wqres.db.db;
@@ -144,6 +145,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
goto free_eq;
cq->cq_idx = cq_idx;
+ cq->irq = irq;
cq->mcq.event = mlx4_en_cq_event;
switch (cq->type) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 15c57e9517e9..4c296327b75b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1022,14 +1022,10 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
/* If we used up all the quota - we're probably not done yet... */
if (done == budget || !clean_complete) {
- int cpu_curr;
-
/* in case we got here because of !clean_complete */
done = budget;
- cpu_curr = smp_processor_id();
-
- if (likely(cpumask_test_cpu(cpu_curr, cq->aff_mask)))
+ if (likely(napi_affinity_no_change(cq->irq)))
return budget;
/* Current cpu is not according to smp_irq_affinity -
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 9572a45f6143..feca4ea30750 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1539,7 +1539,7 @@ int mlx4_assign_eq(struct mlx4_dev *dev, u8 port, int *vector)
}
EXPORT_SYMBOL(mlx4_assign_eq);
-int mlx4_eq_get_irq(struct mlx4_dev *dev, int cq_vec)
+unsigned int mlx4_eq_get_irq(struct mlx4_dev *dev, int cq_vec)
{
struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 28b70dcc652e..1f9830b23c9c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -381,6 +381,7 @@ struct mlx4_en_cq {
const struct cpumask *aff_mask;
int cq_idx;
+ unsigned int irq;
};
struct mlx4_en_port_profile {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 27f42f713c89..1f2e9e954768 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1433,7 +1433,7 @@ int mlx4_assign_eq(struct mlx4_dev *dev, u8 port, int *vector);
void mlx4_release_eq(struct mlx4_dev *dev, int vec);
int mlx4_is_eq_shared(struct mlx4_dev *dev, int vector);
-int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec);
+unsigned int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec);
int mlx4_get_phys_port_id(struct mlx4_dev *dev);
int mlx4_wol_read(struct mlx4_dev *dev, u64 *config, int port);
--
2.25.1
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
2024-08-12 14:56 ` [RFC net-next 2/6] mlx5: Use napi_affinity_no_change Joe Damato
2024-08-12 14:56 ` [RFC net-next 6/6] mlx4: " Joe Damato
@ 2024-08-14 0:17 ` Jakub Kicinski
2024-08-14 7:14 ` Joe Damato
2 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-08-14 0:17 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> Several drivers make a check in their napi poll functions to determine
> if the CPU affinity of the IRQ has changed. If it has, the napi poll
> function returns a value less than the budget to force polling mode to
> be disabled, so that it can be rescheduled on the correct CPU next time
> the softirq is raised.
Any reason not to use the irq number already stored in napi_struct ?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
@ 2024-08-14 7:14 ` Joe Damato
2024-08-14 12:12 ` Joe Damato
0 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2024-08-14 7:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Tue, Aug 13, 2024 at 05:17:10PM -0700, Jakub Kicinski wrote:
> On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> > Several drivers make a check in their napi poll functions to determine
> > if the CPU affinity of the IRQ has changed. If it has, the napi poll
> > function returns a value less than the budget to force polling mode to
> > be disabled, so that it can be rescheduled on the correct CPU next time
> > the softirq is raised.
>
> Any reason not to use the irq number already stored in napi_struct ?
Thanks for taking a look.
IIUC, that's possible if i40e, iavf, and gve are updated to call
netif_napi_set_irq first, which I could certainly do.
But as Stanislav points out, I would be adding a call to
irq_get_effective_affinity_mask in the hot path where one did not
exist before for 4 of 5 drivers.
In that case, it might make more sense to introduce:
bool napi_affinity_no_change(const struct cpumask *aff_mask)
instead and the drivers which have a cached mask can pass it in and
gve can be updated later to cache it.
Not sure how crucial avoiding the irq_get_effective_affinity_mask
call is; I would guess maybe some driver owners would object to
adding a new call in the hot path where one didn't exist before.
What do you think?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 7:14 ` Joe Damato
@ 2024-08-14 12:12 ` Joe Damato
2024-08-14 15:09 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2024-08-14 12:12 UTC (permalink / raw)
To: Jakub Kicinski, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 08:14:48AM +0100, Joe Damato wrote:
> On Tue, Aug 13, 2024 at 05:17:10PM -0700, Jakub Kicinski wrote:
> > On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> > > Several drivers make a check in their napi poll functions to determine
> > > if the CPU affinity of the IRQ has changed. If it has, the napi poll
> > > function returns a value less than the budget to force polling mode to
> > > be disabled, so that it can be rescheduled on the correct CPU next time
> > > the softirq is raised.
> >
> > Any reason not to use the irq number already stored in napi_struct ?
>
> Thanks for taking a look.
>
> IIUC, that's possible if i40e, iavf, and gve are updated to call
> netif_napi_set_irq first, which I could certainly do.
>
> But as Stanislav points out, I would be adding a call to
> irq_get_effective_affinity_mask in the hot path where one did not
> exist before for 4 of 5 drivers.
>
> In that case, it might make more sense to introduce:
>
> bool napi_affinity_no_change(const struct cpumask *aff_mask)
>
> instead and the drivers which have a cached mask can pass it in and
> gve can be updated later to cache it.
>
> Not sure how crucial avoiding the irq_get_effective_affinity_mask
> call is; I would guess maybe some driver owners would object to
> adding a new call in the hot path where one didn't exist before.
>
> What do you think?
Actually... how about a slightly different approach, which caches
the affinity mask in the core?
0. Extend napi struct to have a struct cpumask * field
1. extend netif_napi_set_irq to:
a. store the IRQ number in the napi struct (as you suggested)
b. call irq_get_effective_affinity_mask to store the mask in the
napi struct
c. set up generic affinity_notify.notify and
affinity_notify.release callbacks to update the in core mask
when it changes
2. add napi_affinity_no_change which now takes a napi_struct
3. cleanup all 5 drivers:
a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
is needed, so I think this would be straight forward?)
b. remove all affinity_mask caching code in 4 of 5 drivers
c. update all 5 drivers to call napi_affinity_no_change in poll
Then ... anyone who adds support for netif_napi_set_irq to their
driver in the future gets automatic support in-core for
caching/updating of the mask? And in the future netdev-genl could
dump the mask since it's in-core?
I'll mess around with that locally to see how it looks, but let me
know if that sounds like a better overall approach.
- Joe
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 12:12 ` Joe Damato
@ 2024-08-14 15:09 ` Jakub Kicinski
2024-08-14 15:19 ` Joe Damato
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-08-14 15:09 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> Actually... how about a slightly different approach, which caches
> the affinity mask in the core?
I was gonna say :)
> 0. Extend napi struct to have a struct cpumask * field
>
> 1. extend netif_napi_set_irq to:
> a. store the IRQ number in the napi struct (as you suggested)
> b. call irq_get_effective_affinity_mask to store the mask in the
> napi struct
> c. set up generic affinity_notify.notify and
> affinity_notify.release callbacks to update the in core mask
> when it changes
This part I'm not an expert on.
> 2. add napi_affinity_no_change which now takes a napi_struct
>
> 3. cleanup all 5 drivers:
> a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
> is needed, so I think this would be straight forward?)
> b. remove all affinity_mask caching code in 4 of 5 drivers
> c. update all 5 drivers to call napi_affinity_no_change in poll
>
> Then ... anyone who adds support for netif_napi_set_irq to their
> driver in the future gets automatic support in-core for
> caching/updating of the mask? And in the future netdev-genl could
> dump the mask since it's in-core?
>
> I'll mess around with that locally to see how it looks, but let me
> know if that sounds like a better overall approach.
Could we even handle this directly as part of __napi_poll(),
once the driver gives core all of the relevant pieces of information ?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 15:09 ` Jakub Kicinski
@ 2024-08-14 15:19 ` Joe Damato
2024-08-14 16:03 ` Shay Drori
0 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2024-08-14 15:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> > Actually... how about a slightly different approach, which caches
> > the affinity mask in the core?
>
> I was gonna say :)
>
> > 0. Extend napi struct to have a struct cpumask * field
> >
> > 1. extend netif_napi_set_irq to:
> > a. store the IRQ number in the napi struct (as you suggested)
> > b. call irq_get_effective_affinity_mask to store the mask in the
> > napi struct
> > c. set up generic affinity_notify.notify and
> > affinity_notify.release callbacks to update the in core mask
> > when it changes
>
> This part I'm not an expert on.
>
> > 2. add napi_affinity_no_change which now takes a napi_struct
> >
> > 3. cleanup all 5 drivers:
> > a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
> > is needed, so I think this would be straight forward?)
> > b. remove all affinity_mask caching code in 4 of 5 drivers
> > c. update all 5 drivers to call napi_affinity_no_change in poll
> >
> > Then ... anyone who adds support for netif_napi_set_irq to their
> > driver in the future gets automatic support in-core for
> > caching/updating of the mask? And in the future netdev-genl could
> dump the mask since it's in-core?
> >
> > I'll mess around with that locally to see how it looks, but let me
> > know if that sounds like a better overall approach.
I ended up going with the approach laid out above; moving the IRQ
affinity mask updating code into the core (which adds that ability
to gve/mlx4/mlx5... it seems mlx4/5 cached the mask but didn't have
notifiers set up to update the cached copy?) and adding calls to
netif_napi_set_irq in i40e/iavf and deleting their custom notifier
code.
It's almost ready for rfcv2; I think this approach is probably
better?
> Could we even handle this directly as part of __napi_poll(),
> once the driver gives core all of the relevant pieces of information ?
I had been thinking the same thing, too, but it seems like at least
one driver (mlx5) counts the number of affinity changes to export as
a stat, so moving all of this to core would break that.
So, I may avoid attempting that for this series.
I'm still messing around with this but will send an rfcv2 in a bit.
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 15:19 ` Joe Damato
@ 2024-08-14 16:03 ` Shay Drori
2024-08-14 18:01 ` Joe Damato
0 siblings, 1 reply; 14+ messages in thread
From: Shay Drori @ 2024-08-14 16:03 UTC (permalink / raw)
To: Joe Damato, Jakub Kicinski, netdev, Daniel Borkmann,
David S. Miller, Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On 14/08/2024 18:19, Joe Damato wrote:
> On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
>> On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
>>> Actually... how about a slightly different approach, which caches
>>> the affinity mask in the core?
>>
>> I was gonna say :)
>>
>>> 0. Extend napi struct to have a struct cpumask * field
>>>
>>> 1. extend netif_napi_set_irq to:
>>> a. store the IRQ number in the napi struct (as you suggested)
>>> b. call irq_get_effective_affinity_mask to store the mask in the
>>> napi struct
>>> c. set up generic affinity_notify.notify and
>>> affinity_notify.release callbacks to update the in core mask
>>> when it changes
>>
>> This part I'm not an expert on.
Several net drivers (mlx5, mlx4, ice, ena, and more) use a feature
called ARFS (rmap)[1], and this feature uses the affinity notifier
mechanism.
Also, the affinity notifier infra supports only a single notifier per
IRQ.
Hence, your suggestion (1.c) will break the ARFS feature.
[1] see irq_cpu_rmap_add()
>>
>>> 2. add napi_affinity_no_change which now takes a napi_struct
>>>
>>> 3. cleanup all 5 drivers:
>>> a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
>>> is needed, so I think this would be straight forward?)
>>> b. remove all affinity_mask caching code in 4 of 5 drivers
>>> c. update all 5 drivers to call napi_affinity_no_change in poll
>>>
>>> Then ... anyone who adds support for netif_napi_set_irq to their
>>> driver in the future gets automatic support in-core for
>>> caching/updating of the mask? And in the future netdev-genl could
> > > dump the mask since it's in-core?
>>>
>>> I'll mess around with that locally to see how it looks, but let me
>>> know if that sounds like a better overall approach.
>
> I ended up going with the approach laid out above; moving the IRQ
> affinity mask updating code into the core (which adds that ability
> to gve/mlx4/mlx5... it seems mlx4/5 cached but didn't have notifiers
> setup to update the cached copy?)
This is probably due to what I wrote above.
> and adding calls to
> netif_napi_set_irq in i40e/iavf and deleting their custom notifier
> code.
>
> It's almost ready for rfcv2; I think this approach is probably
> better ?
>
>> Could we even handle this directly as part of __napi_poll(),
>> once the driver gives core all of the relevant pieces of information ?
>
> I had been thinking the same thing, too, but it seems like at least
> one driver (mlx5) counts the number of affinity changes to export as
> a stat, so moving all of this to core would break that.
>
> So, I may avoid attempting that for this series.
>
> I'm still messing around with this but will send an rfcv2 in a bit.
>
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 16:03 ` Shay Drori
@ 2024-08-14 18:01 ` Joe Damato
2024-08-15 0:20 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2024-08-14 18:01 UTC (permalink / raw)
To: Shay Drori
Cc: Jakub Kicinski, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 07:03:35PM +0300, Shay Drori wrote:
>
>
> On 14/08/2024 18:19, Joe Damato wrote:
> > On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
> > > On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> > > > Actually... how about a slightly different approach, which caches
> > > > the affinity mask in the core?
> > >
> > > I was gonna say :)
> > >
> > > > 0. Extend napi struct to have a struct cpumask * field
> > > >
> > > > 1. extend netif_napi_set_irq to:
> > > > a. store the IRQ number in the napi struct (as you suggested)
> > > > b. call irq_get_effective_affinity_mask to store the mask in the
> > > > napi struct
> > > > c. set up generic affinity_notify.notify and
> > > > affinity_notify.release callbacks to update the in core mask
> > > > when it changes
> > >
> > > This part I'm not an expert on.
>
> Several net drivers (mlx5, mlx4, ice, ena, and more) use a feature
> called ARFS (rmap)[1], and this feature uses the affinity notifier
> mechanism.
> Also, the affinity notifier infra supports only a single notifier per
> IRQ.
>
> Hence, your suggestion (1.c) will break the ARFS feature.
>
> [1] see irq_cpu_rmap_add()
Thanks for taking a look and your reply.
I did notice ARFS use by some drivers and figured that might be why
the notifiers were being used in some cases.
I guess the question comes down to whether adding a call to
irq_get_effective_affinity_mask in the hot path is a bad idea.
If it is, then the only option is to have the drivers pass in their
IRQ affinity masks, as Stanislav suggested, to avoid adding that
call to the hot path.
If not, then the IRQ from napi_struct can be used and the affinity
mask can be generated on every napi poll. i40e/gve/iavf would need
calls to netif_napi_set_irq to set the IRQ mapping, which seems to
be straightforward.
In both cases: the IRQ notifier stuff would be left as is so that it
wouldn't break ARFS.
I suspect that the preferred solution would be to avoid adding that
call to the hot path, right?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 18:01 ` Joe Damato
@ 2024-08-15 0:20 ` Jakub Kicinski
2024-08-15 10:22 ` Joe Damato
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-08-15 0:20 UTC (permalink / raw)
To: Joe Damato
Cc: Shay Drori, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> If it is, then the only option is to have the drivers pass in their
> IRQ affinity masks, as Stanislav suggested, to avoid adding that
> call to the hot path.
>
> If not, then the IRQ from napi_struct can be used and the affinity
> mask can be generated on every napi poll. i40e/gve/iavf would need
> calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> be straightforward.
It's a bit sad to have the generic solution blocked.
cpu_rmap_update() is exported. Maybe we can call it from our notifier?
rmap lives in struct net_device
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-15 0:20 ` Jakub Kicinski
@ 2024-08-15 10:22 ` Joe Damato
2024-08-20 6:40 ` Shay Drori
0 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2024-08-15 10:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Shay Drori, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> > If it is, then the only option is to have the drivers pass in their
> > IRQ affinity masks, as Stanislav suggested, to avoid adding that
> > call to the hot path.
> >
> > If not, then the IRQ from napi_struct can be used and the affinity
> > mask can be generated on every napi poll. i40e/gve/iavf would need
> > calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> > be straightforward.
>
> It's a bit sad to have the generic solution blocked.
> cpu_rmap_update() is exported. Maybe we can call it from our notifier?
> rmap lives in struct net_device
I agree on the sadness. I will take a look today.
I guess if we were being really ambitious, we'd try to move ARFS
stuff into the core (as RSS was moved into the core).
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-15 10:22 ` Joe Damato
@ 2024-08-20 6:40 ` Shay Drori
2024-08-20 8:33 ` Joe Damato
0 siblings, 1 reply; 14+ messages in thread
From: Shay Drori @ 2024-08-20 6:40 UTC (permalink / raw)
To: Joe Damato, Jakub Kicinski, netdev, Daniel Borkmann,
David S. Miller, Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao,
Thomas Gleixner
On 15/08/2024 13:22, Joe Damato wrote:
> On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
>> On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
>>> If it is, then the only option is to have the drivers pass in their
>>> IRQ affinity masks, as Stanislav suggested, to avoid adding that
>>> call to the hot path.
>>>
>>> If not, then the IRQ from napi_struct can be used and the affinity
>>> mask can be generated on every napi poll. i40e/gve/iavf would need
>>> calls to netif_napi_set_irq to set the IRQ mapping, which seems to
>>> be straightforward.
>>
>> It's a bit sad to have the generic solution blocked.
>> cpu_rmap_update() is exported. Maybe we can call it from our notifier?
>> rmap lives in struct net_device
>
> I agree on the sadness. I will take a look today.
>
> I guess if we were being really ambitious, we'd try to move ARFS
> stuff into the core (as RSS was moved into the core).
Sorry for the late reply. Maybe we can modify affinity notifier infra to
support more than a single notifier per IRQ.
@Thomas, do you know why only a single notifier per IRQ is supported?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-20 6:40 ` Shay Drori
@ 2024-08-20 8:33 ` Joe Damato
0 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2024-08-20 8:33 UTC (permalink / raw)
To: Shay Drori
Cc: Jakub Kicinski, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao,
Thomas Gleixner
On Tue, Aug 20, 2024 at 09:40:31AM +0300, Shay Drori wrote:
>
>
> On 15/08/2024 13:22, Joe Damato wrote:
> > On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
> > > On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> > > > If it is, then the only option is to have the drivers pass in their
> > > > IRQ affinity masks, as Stanislav suggested, to avoid adding that
> > > > call to the hot path.
> > > >
> > > > If not, then the IRQ from napi_struct can be used and the affinity
> > > > mask can be generated on every napi poll. i40e/gve/iavf would need
> > > > calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> > > > be straightforward.
> > >
> > > It's a bit sad to have the generic solution blocked.
> > > cpu_rmap_update() is exported. Maybe we can call it from our notifier?
> > > rmap lives in struct net_device
> >
> > I agree on the sadness. I will take a look today.
> >
> > I guess if we were being really ambitious, we'd try to move ARFS
> > stuff into the core (as RSS was moved into the core).
>
>
> Sorry for the late reply. Maybe we can modify affinity notifier infra to
> support more than a single notifier per IRQ.
> @Thomas, do you know why only a single notifier per IRQ is supported?
Sorry for the delayed response as well on my side; I've been in
between lots of different kernel RFCs :)
Jakub: the issue seems to be that the internals in lib/cpu_rmap.c
are needed to call cpu_rmap_update. It's probably possible to expose
them somehow so that a generic IRQ notifier could call
cpu_rmap_update, as you mentioned, but some rewiring is going to be
needed, I think.
I had a couple ideas for rewiring stuff, but I haven't had time to
context switch back on to this work as I've been busy with a few
other things (the IRQ suspension stuff and another mlx5 thing I have
yet to send upstream).
I hope to take another look at it this week, but I welcome any
suggestions from Shay/Thomas in the meantime.
- Joe
end of thread, other threads:[~2024-08-20 8:33 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
2024-08-12 14:56 ` [RFC net-next 2/6] mlx5: Use napi_affinity_no_change Joe Damato
2024-08-12 14:56 ` [RFC net-next 6/6] mlx4: " Joe Damato
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
2024-08-14 7:14 ` Joe Damato
2024-08-14 12:12 ` Joe Damato
2024-08-14 15:09 ` Jakub Kicinski
2024-08-14 15:19 ` Joe Damato
2024-08-14 16:03 ` Shay Drori
2024-08-14 18:01 ` Joe Damato
2024-08-15 0:20 ` Jakub Kicinski
2024-08-15 10:22 ` Joe Damato
2024-08-20 6:40 ` Shay Drori
2024-08-20 8:33 ` Joe Damato
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox