* [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-12 20:23 ` Stanislav Fomichev
2024-08-12 14:56 ` [RFC net-next 2/6] mlx5: Use napi_affinity_no_change Joe Damato
` (5 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jiri Pirko, Sebastian Andrzej Siewior,
Lorenzo Bianconi, Daniel Borkmann, open list
Several drivers have their own, very similar, implementations of
determining if IRQ affinity has changed. Create napi_affinity_no_change
to centralize this logic in the core.
This will be used in following commits for various drivers to eliminate
duplicated code.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
include/linux/netdevice.h | 8 ++++++++
net/core/dev.c | 14 ++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0ef3eaa23f4b..dc714a04b90a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -464,6 +464,14 @@ enum rx_handler_result {
typedef enum rx_handler_result rx_handler_result_t;
typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb);
+/**
+ * napi_affinity_no_change - determine if CPU affinity changed
+ * @irq: the IRQ whose affinity may have changed
+ *
+ * Return true if the CPU affinity has NOT changed, false otherwise.
+ */
+bool napi_affinity_no_change(unsigned int irq);
+
void __napi_schedule(struct napi_struct *n);
void __napi_schedule_irqoff(struct napi_struct *n);
diff --git a/net/core/dev.c b/net/core/dev.c
index 751d9b70e6ad..9c56ad49490c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -89,6 +89,7 @@
#include <linux/errno.h>
#include <linux/interrupt.h>
#include <linux/if_ether.h>
+#include <linux/irq.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
@@ -6210,6 +6211,19 @@ void __napi_schedule_irqoff(struct napi_struct *n)
}
EXPORT_SYMBOL(__napi_schedule_irqoff);
+bool napi_affinity_no_change(unsigned int irq)
+{
+ int cpu_curr = smp_processor_id();
+ const struct cpumask *aff_mask;
+
+ aff_mask = irq_get_effective_affinity_mask(irq);
+ if (unlikely(!aff_mask))
+ return true;
+
+ return cpumask_test_cpu(cpu_curr, aff_mask);
+}
+EXPORT_SYMBOL(napi_affinity_no_change);
+
bool napi_complete_done(struct napi_struct *n, int work_done)
{
unsigned long flags, val, new, timeout = 0;
--
2.25.1
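For context, a driver's poll routine would then use the new helper roughly as
sketched below. This is illustrative only: the example_* structure and helpers
are made up, and patches 2-6 in this series do the real conversions.

static int example_napi_poll(struct napi_struct *napi, int budget)
{
        struct example_ring *ring = container_of(napi, struct example_ring, napi);
        int work_done = example_clean_ring(ring, budget);

        if (work_done == budget) {
                /* Still busy: keep polling only while the IRQ still fires on
                 * this CPU; otherwise return less than budget so NAPI can be
                 * rescheduled on the CPU the IRQ now targets.
                 */
                if (likely(napi_affinity_no_change(ring->irq)))
                        return budget;
                work_done--;
        }

        if (napi_complete_done(napi, work_done))
                example_enable_irq(ring);

        return work_done;
}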
* Re: [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-12 14:56 ` [RFC net-next 1/6] netdevice: Add napi_affinity_no_change Joe Damato
@ 2024-08-12 20:23 ` Stanislav Fomichev
2024-08-12 21:08 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2024-08-12 20:23 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jiri Pirko, Sebastian Andrzej Siewior,
Lorenzo Bianconi, Daniel Borkmann, open list
On 08/12, Joe Damato wrote:
> Several drivers have their own, very similar, implementations of
> determining if IRQ affinity has changed. Create napi_affinity_no_change
> to centralize this logic in the core.
>
> This will be used in following commits for various drivers to eliminate
> duplicated code.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
> include/linux/netdevice.h | 8 ++++++++
> net/core/dev.c | 14 ++++++++++++++
> 2 files changed, 22 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0ef3eaa23f4b..dc714a04b90a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -464,6 +464,14 @@ enum rx_handler_result {
> typedef enum rx_handler_result rx_handler_result_t;
> typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb);
>
> +/**
> + * napi_affinity_no_change - determine if CPU affinity changed
> + * @irq: the IRQ whose affinity may have changed
> + *
> + * Return true if the CPU affinity has NOT changed, false otherwise.
> + */
> +bool napi_affinity_no_change(unsigned int irq);
> +
> void __napi_schedule(struct napi_struct *n);
> void __napi_schedule_irqoff(struct napi_struct *n);
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 751d9b70e6ad..9c56ad49490c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -89,6 +89,7 @@
> #include <linux/errno.h>
> #include <linux/interrupt.h>
> #include <linux/if_ether.h>
> +#include <linux/irq.h>
> #include <linux/netdevice.h>
> #include <linux/etherdevice.h>
> #include <linux/ethtool.h>
> @@ -6210,6 +6211,19 @@ void __napi_schedule_irqoff(struct napi_struct *n)
> }
> EXPORT_SYMBOL(__napi_schedule_irqoff);
>
> +bool napi_affinity_no_change(unsigned int irq)
> +{
> + int cpu_curr = smp_processor_id();
> + const struct cpumask *aff_mask;
> +
[..]
> + aff_mask = irq_get_effective_affinity_mask(irq);
Most drivers don't seem to call this on every napi_poll (and
cache the aff_mask somewhere instead). Should we try to keep this
out of the fast path as well?
* Re: [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-12 20:23 ` Stanislav Fomichev
@ 2024-08-12 21:08 ` Joe Damato
2024-08-12 22:36 ` Stanislav Fomichev
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-12 21:08 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jiri Pirko, Sebastian Andrzej Siewior,
Lorenzo Bianconi, Daniel Borkmann, open list
On Mon, Aug 12, 2024 at 01:23:27PM -0700, Stanislav Fomichev wrote:
> On 08/12, Joe Damato wrote:
> > Several drivers have their own, very similar, implementations of
> > determining if IRQ affinity has changed. Create napi_affinity_no_change
> > to centralize this logic in the core.
> >
> > This will be used in following commits for various drivers to eliminate
> > duplicated code.
> >
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> > include/linux/netdevice.h | 8 ++++++++
> > net/core/dev.c | 14 ++++++++++++++
> > 2 files changed, 22 insertions(+)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 0ef3eaa23f4b..dc714a04b90a 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -464,6 +464,14 @@ enum rx_handler_result {
> > typedef enum rx_handler_result rx_handler_result_t;
> > typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb);
> >
> > +/**
> > + * napi_affinity_no_change - determine if CPU affinity changed
> > + * @irq: the IRQ whose affinity may have changed
> > + *
> > + * Return true if the CPU affinity has NOT changed, false otherwise.
> > + */
> > +bool napi_affinity_no_change(unsigned int irq);
> > +
> > void __napi_schedule(struct napi_struct *n);
> > void __napi_schedule_irqoff(struct napi_struct *n);
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 751d9b70e6ad..9c56ad49490c 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -89,6 +89,7 @@
> > #include <linux/errno.h>
> > #include <linux/interrupt.h>
> > #include <linux/if_ether.h>
> > +#include <linux/irq.h>
> > #include <linux/netdevice.h>
> > #include <linux/etherdevice.h>
> > #include <linux/ethtool.h>
> > @@ -6210,6 +6211,19 @@ void __napi_schedule_irqoff(struct napi_struct *n)
> > }
> > EXPORT_SYMBOL(__napi_schedule_irqoff);
> >
> > +bool napi_affinity_no_change(unsigned int irq)
> > +{
> > + int cpu_curr = smp_processor_id();
> > + const struct cpumask *aff_mask;
> > +
>
> [..]
>
> > + aff_mask = irq_get_effective_affinity_mask(irq);
>
> Most drivers don't seem to call this on every napi_poll (and
> cache the aff_mask somewhere instead). Should we try to keep this
> out of the fast path as well?
Hm, I see what you mean. It looks like only gve calls it on every
poll, while the others use a cached value.
Maybe a better solution is to:
1. Have the helper take the cached affinity mask from the driver
and return true/false.
2. Update gve to cache the mask (like the other 4 are doing).
FWIW, it seems i40e added this code to solve a specific bug [1] and
I would assume other drivers either hit the same issue (or were
inspired by i40e).
In general: I think the logic is here to stay and other drivers may
do something similar in the future.
It'd be nice to have one helper instead of several different
copies/implementations.
[1]: https://patchwork.ozlabs.org/project/intel-wired-lan/patch/1473895479-23035-9-git-send-email-bimmy.pujari@intel.com/
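A rough sketch of option 1 above (an assumption about the shape of rfcv2, not
the actual patch): the helper would simply test the mask the caller already
caches, with no lookup in the poll path.

bool napi_affinity_no_change(const struct cpumask *aff_mask)
{
        if (unlikely(!aff_mask))
                return true;

        return cpumask_test_cpu(smp_processor_id(), aff_mask);
}

A driver like mlx5 would then keep its cached c->aff_mask and call
napi_affinity_no_change(c->aff_mask) from its poll loop.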
* Re: [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-12 21:08 ` Joe Damato
@ 2024-08-12 22:36 ` Stanislav Fomichev
2024-08-13 9:11 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2024-08-12 22:36 UTC (permalink / raw)
To: Joe Damato, netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jiri Pirko, Sebastian Andrzej Siewior,
Lorenzo Bianconi, Daniel Borkmann, open list
On 08/12, Joe Damato wrote:
> On Mon, Aug 12, 2024 at 01:23:27PM -0700, Stanislav Fomichev wrote:
> > On 08/12, Joe Damato wrote:
> > > Several drivers have their own, very similar, implementations of
> > > determining if IRQ affinity has changed. Create napi_affinity_no_change
> > > to centralize this logic in the core.
> > >
> > > This will be used in following commits for various drivers to eliminate
> > > duplicated code.
> > >
> > > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > > ---
> > > include/linux/netdevice.h | 8 ++++++++
> > > net/core/dev.c | 14 ++++++++++++++
> > > 2 files changed, 22 insertions(+)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 0ef3eaa23f4b..dc714a04b90a 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -464,6 +464,14 @@ enum rx_handler_result {
> > > typedef enum rx_handler_result rx_handler_result_t;
> > > typedef rx_handler_result_t rx_handler_func_t(struct sk_buff **pskb);
> > >
> > > +/**
> > > + * napi_affinity_no_change - determine if CPU affinity changed
> > > + * @irq: the IRQ whose affinity may have changed
> > > + *
> > > + * Return true if the CPU affinity has NOT changed, false otherwise.
> > > + */
> > > +bool napi_affinity_no_change(unsigned int irq);
> > > +
> > > void __napi_schedule(struct napi_struct *n);
> > > void __napi_schedule_irqoff(struct napi_struct *n);
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 751d9b70e6ad..9c56ad49490c 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -89,6 +89,7 @@
> > > #include <linux/errno.h>
> > > #include <linux/interrupt.h>
> > > #include <linux/if_ether.h>
> > > +#include <linux/irq.h>
> > > #include <linux/netdevice.h>
> > > #include <linux/etherdevice.h>
> > > #include <linux/ethtool.h>
> > > @@ -6210,6 +6211,19 @@ void __napi_schedule_irqoff(struct napi_struct *n)
> > > }
> > > EXPORT_SYMBOL(__napi_schedule_irqoff);
> > >
> > > +bool napi_affinity_no_change(unsigned int irq)
> > > +{
> > > + int cpu_curr = smp_processor_id();
> > > + const struct cpumask *aff_mask;
> > > +
> >
> > [..]
> >
> > > + aff_mask = irq_get_effective_affinity_mask(irq);
> >
> > Most drivers don't seem to call this on every napi_poll (and
> > cache the aff_mask somewhere instead). Should we try to keep this
> > out of the fast path as well?
>
> Hm, I see what you mean. It looks like only gve calls it on every
> poll, while the others use a cached value.
>
> Maybe a better solution is to:
> 1. Have the helper take the cached affinity mask from the driver
> and return true/false.
> 2. Update gve to cache the mask (like the other 4 are doing).
SG! GVE is definitely the outlier here.
> FWIW, it seems i40e added this code to solve a specific bug [1] and
> I would assume other drivers either hit the same issue (or were
> inspired by i40e).
>
> In general: I think the logic is here to stay and other drivers may
> do something similar in the future.
+1 on pushing this logic to the core if possible.
> It'd be nice to have one helper instead of several different
> copies/implementations.
>
> [1]: https://patchwork.ozlabs.org/project/intel-wired-lan/patch/1473895479-23035-9-git-send-email-bimmy.pujari@intel.com/
* Re: [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-12 22:36 ` Stanislav Fomichev
@ 2024-08-13 9:11 ` Joe Damato
2024-08-13 10:03 ` Joe Damato
2024-08-13 13:05 ` Simon Horman
0 siblings, 2 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-13 9:11 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jiri Pirko, Sebastian Andrzej Siewior,
Lorenzo Bianconi, Daniel Borkmann, open list
On Mon, Aug 12, 2024 at 03:36:42PM -0700, Stanislav Fomichev wrote:
> On 08/12, Joe Damato wrote:
> > On Mon, Aug 12, 2024 at 01:23:27PM -0700, Stanislav Fomichev wrote:
> > > On 08/12, Joe Damato wrote:
> > > > Several drivers have their own, very similar, implementations of
> > > > determining if IRQ affinity has changed. Create napi_affinity_no_change
> > > > to centralize this logic in the core.
> > > >
> > > > This will be used in following commits for various drivers to eliminate
> > > > duplicated code.
> > > >
[...]
> > > > +bool napi_affinity_no_change(unsigned int irq)
> > > > +{
> > > > + int cpu_curr = smp_processor_id();
> > > > + const struct cpumask *aff_mask;
> > > > +
> > >
> > > [..]
> > >
> > > > + aff_mask = irq_get_effective_affinity_mask(irq);
> > >
> > > Most drivers don't seem to call this on every napi_poll (and
> > > cache the aff_mask somewhere instead). Should we try to keep this
> > > out of the fast path as well?
> >
> > Hm, I see what you mean. It looks like only gve calls it on every
> > poll, while the others use a cached value.
> >
> > Maybe a better solution is to:
> > 1. Have the helper take the cached affinity mask from the driver
> > and return true/false.
> > 2. Update gve to cache the mask (like the other 4 are doing).
>
> SG! GVE is definitely the outlier here.
OK, I'll hack on that for rfcv2 and see what it looks like. Thanks
for the suggestion.
Hopefully the maintainers (or other folks) will chime in on whether
or not I should submit fixes for patches 4 - 6 for the type mismatch
stuff first or just handle it all together.
* Re: [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-13 9:11 ` Joe Damato
@ 2024-08-13 10:03 ` Joe Damato
2024-08-13 13:05 ` Simon Horman
1 sibling, 0 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-13 10:03 UTC (permalink / raw)
To: Stanislav Fomichev, netdev, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
open list
On Tue, Aug 13, 2024 at 10:11:09AM +0100, Joe Damato wrote:
> On Mon, Aug 12, 2024 at 03:36:42PM -0700, Stanislav Fomichev wrote:
> > On 08/12, Joe Damato wrote:
> > > On Mon, Aug 12, 2024 at 01:23:27PM -0700, Stanislav Fomichev wrote:
> > > > On 08/12, Joe Damato wrote:
> > > > > Several drivers have their own, very similar, implementations of
> > > > > determining if IRQ affinity has changed. Create napi_affinity_no_change
> > > > > to centralize this logic in the core.
> > > > >
> > > > > This will be used in following commits for various drivers to eliminate
> > > > > duplicated code.
> > > > >
>
> [...]
>
> > > > > +bool napi_affinity_no_change(unsigned int irq)
> > > > > +{
> > > > > + int cpu_curr = smp_processor_id();
> > > > > + const struct cpumask *aff_mask;
> > > > > +
> > > >
> > > > [..]
> > > >
> > > > > + aff_mask = irq_get_effective_affinity_mask(irq);
> > > >
> > > > Most drivers don't seem to call this on every napi_poll (and
> > > > cache the aff_mask somewhere instead). Should we try to keep this
> > > > out of the fast path as well?
> > >
> > > Hm, I see what you mean. It looks like only gve calls it on every
> > > poll, while the others use a cached value.
> > >
> > > Maybe a better solution is to:
> > > 1. Have the helper take the cached affinity mask from the driver
> > > and return true/false.
> > > 2. Update gve to cache the mask (like the other 4 are doing).
> >
> > SG! GVE is definitely the outlier here.
>
> OK, I'll hack on that for rfcv2 and see what it looks like. Thanks
> for the suggestion.
Yeah, I just did this for rfcv2 and it looks a lot nicer, with fewer
changes. Will hold off on sending an rfcv2 until the 48 hour timer
expires ;)
> Hopefully the maintainers (or other folks) will chime in on whether
> or not I should submit fixes for patches 4 - 6 for the type mismatch
> stuff first or just handle it all together.
* Re: [RFC net-next 1/6] netdevice: Add napi_affinity_no_change
2024-08-13 9:11 ` Joe Damato
2024-08-13 10:03 ` Joe Damato
@ 2024-08-13 13:05 ` Simon Horman
1 sibling, 0 replies; 27+ messages in thread
From: Simon Horman @ 2024-08-13 13:05 UTC (permalink / raw)
To: Joe Damato, Stanislav Fomichev, netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
open list
On Tue, Aug 13, 2024 at 10:11:09AM +0100, Joe Damato wrote:
> On Mon, Aug 12, 2024 at 03:36:42PM -0700, Stanislav Fomichev wrote:
> > On 08/12, Joe Damato wrote:
> > > On Mon, Aug 12, 2024 at 01:23:27PM -0700, Stanislav Fomichev wrote:
> > > > On 08/12, Joe Damato wrote:
> > > > > Several drivers have their own, very similar, implementations of
> > > > > determining if IRQ affinity has changed. Create napi_affinity_no_change
> > > > > to centralize this logic in the core.
> > > > >
> > > > > This will be used in following commits for various drivers to eliminate
> > > > > duplicated code.
> > > > >
>
> [...]
>
> > > > > +bool napi_affinity_no_change(unsigned int irq)
> > > > > +{
> > > > > + int cpu_curr = smp_processor_id();
> > > > > + const struct cpumask *aff_mask;
> > > > > +
> > > >
> > > > [..]
> > > >
> > > > > + aff_mask = irq_get_effective_affinity_mask(irq);
> > > >
> > > > Most drivers don't seem to call this on every napi_poll (and
> > > > cache the aff_mask somewhere instead). Should we try to keep this
> > > > out of the fast path as well?
> > >
> > > Hm, I see what you mean. It looks like only gve calls it on every
> > > poll, while the others use a cached value.
> > >
> > > Maybe a better solution is to:
> > > 1. Have the helper take the cached affinity mask from the driver
> > > and return true/false.
> > > 2. Update gve to cache the mask (like the other 4 are doing).
> >
> > SG! GVE is definitely the outlier here.
>
> OK, I'll hack on that for rfcv2 and see what it looks like. Thanks
> for the suggestion.
>
> Hopefully the maintainers (or other folks) will chime in on whether
> or not I should submit fixes for patches 4 - 6 for the type mismatch
> stuff first or just handle it all together.
<2c>
Patches 4 - 6 seem more like clean-ups than fixes to me: they aren't fixing
any bugs, are they? So I would just keep them as part of this patchset
unless it becomes unwieldy.
</2c>
In any case thanks for all your good work in this area.
* [RFC net-next 2/6] mlx5: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
2024-08-12 14:56 ` [RFC net-next 1/6] netdevice: Add napi_affinity_no_change Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-12 14:56 ` [RFC net-next 3/6] gve: " Joe Damato
` (4 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
open list:MELLANOX MLX5 core VPI driver, open list
Use napi_affinity_no_change instead of mlx5's internal implementation,
simplifying and centralizing the logic.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 9 +--------
3 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7832f6b6c8a8..c288e3f45a06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -767,7 +767,7 @@ struct mlx5e_channel {
spinlock_t async_icosq_lock;
/* data path - accessed per napi poll */
- const struct cpumask *aff_mask;
+ unsigned int irq;
struct mlx5e_ch_stats *stats;
/* control */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6f686fabed44..7b217aafbdfd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2684,8 +2684,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
c->mkey_be = cpu_to_be32(mdev->mlx5e_res.hw_objs.mkey);
c->num_tc = mlx5e_get_dcb_num_tc(params);
c->xdp = !!params->xdp_prog;
+ c->irq = irq;
c->stats = &priv->channel_stats[ix]->ch;
- c->aff_mask = irq_get_effective_affinity_mask(irq);
c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
netif_napi_add(netdev, &c->napi, mlx5e_napi_poll);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 5873fde65c2e..ee453d019c3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -39,13 +39,6 @@
#include "en/xsk/tx.h"
#include "en_accel/ktls_txrx.h"
-static inline bool mlx5e_channel_no_affinity_change(struct mlx5e_channel *c)
-{
- int current_cpu = smp_processor_id();
-
- return cpumask_test_cpu(current_cpu, c->aff_mask);
-}
-
static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
{
struct mlx5e_sq_stats *stats = sq->stats;
@@ -201,7 +194,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
busy |= busy_xsk;
if (busy) {
- if (likely(mlx5e_channel_no_affinity_change(c))) {
+ if (likely(napi_affinity_no_change(c->irq))) {
work_done = budget;
goto out;
}
--
2.25.1
* [RFC net-next 3/6] gve: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
2024-08-12 14:56 ` [RFC net-next 1/6] netdevice: Add napi_affinity_no_change Joe Damato
2024-08-12 14:56 ` [RFC net-next 2/6] mlx5: Use napi_affinity_no_change Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-13 18:55 ` Shailend Chand
2024-08-12 14:56 ` [RFC net-next 4/6] i40e: " Joe Damato
` (3 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Jeroen de Borst, Praveen Kaligineedi, Shailend Chand,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Harshitha Ramamurthy, Ziwei Xiao, open list
Use napi_affinity_no_change instead of gve's internal implementation,
simplifying and centralizing the logic.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 661566db68c8..ad5e85b8c6a5 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -298,18 +298,6 @@ static irqreturn_t gve_intr_dqo(int irq, void *arg)
return IRQ_HANDLED;
}
-static int gve_is_napi_on_home_cpu(struct gve_priv *priv, u32 irq)
-{
- int cpu_curr = smp_processor_id();
- const struct cpumask *aff_mask;
-
- aff_mask = irq_get_effective_affinity_mask(irq);
- if (unlikely(!aff_mask))
- return 1;
-
- return cpumask_test_cpu(cpu_curr, aff_mask);
-}
-
int gve_napi_poll(struct napi_struct *napi, int budget)
{
struct gve_notify_block *block;
@@ -383,7 +371,7 @@ int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
/* Reschedule by returning budget only if already on the correct
* cpu.
*/
- if (likely(gve_is_napi_on_home_cpu(priv, block->irq)))
+ if (likely(napi_affinity_no_change(block->irq)))
return budget;
/* If not on the cpu with which this queue's irq has affinity
--
2.25.1
* Re: [RFC net-next 3/6] gve: Use napi_affinity_no_change
2024-08-12 14:56 ` [RFC net-next 3/6] gve: " Joe Damato
@ 2024-08-13 18:55 ` Shailend Chand
2024-08-13 21:44 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Shailend Chand @ 2024-08-13 18:55 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, Jeroen de Borst, Praveen Kaligineedi, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
Harshitha Ramamurthy, Ziwei Xiao, open list
On Mon, Aug 12, 2024 at 7:57 AM Joe Damato <jdamato@fastly.com> wrote:
>
> Use napi_affinity_no_change instead of gve's internal implementation,
> simplifying and centralizing the logic.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
> drivers/net/ethernet/google/gve/gve_main.c | 14 +-------------
> 1 file changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index 661566db68c8..ad5e85b8c6a5 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -298,18 +298,6 @@ static irqreturn_t gve_intr_dqo(int irq, void *arg)
> return IRQ_HANDLED;
> }
>
> -static int gve_is_napi_on_home_cpu(struct gve_priv *priv, u32 irq)
> -{
> - int cpu_curr = smp_processor_id();
> - const struct cpumask *aff_mask;
> -
> - aff_mask = irq_get_effective_affinity_mask(irq);
> - if (unlikely(!aff_mask))
> - return 1;
> -
> - return cpumask_test_cpu(cpu_curr, aff_mask);
> -}
> -
> int gve_napi_poll(struct napi_struct *napi, int budget)
> {
> struct gve_notify_block *block;
> @@ -383,7 +371,7 @@ int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
> /* Reschedule by returning budget only if already on the correct
> * cpu.
> */
> - if (likely(gve_is_napi_on_home_cpu(priv, block->irq)))
> + if (likely(napi_affinity_no_change(block->irq)))
Nice to centralize this code! Evolving this to cache the affinity mask
like the other drivers would probably also require a means to update
the cache when the affinity changes.
> return budget;
>
> /* If not on the cpu with which this queue's irq has affinity
> --
> 2.25.1
>
* Re: [RFC net-next 3/6] gve: Use napi_affinity_no_change
2024-08-13 18:55 ` Shailend Chand
@ 2024-08-13 21:44 ` Joe Damato
2024-08-13 21:50 ` Shailend Chand
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-13 21:44 UTC (permalink / raw)
To: Shailend Chand
Cc: netdev, Jeroen de Borst, Praveen Kaligineedi, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
Harshitha Ramamurthy, Ziwei Xiao, open list
On Tue, Aug 13, 2024 at 11:55:31AM -0700, Shailend Chand wrote:
> On Mon, Aug 12, 2024 at 7:57 AM Joe Damato <jdamato@fastly.com> wrote:
> >
> > Use napi_affinity_no_change instead of gve's internal implementation,
> > simplifying and centralizing the logic.
> >
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> > drivers/net/ethernet/google/gve/gve_main.c | 14 +-------------
> > 1 file changed, 1 insertion(+), 13 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> > index 661566db68c8..ad5e85b8c6a5 100644
> > --- a/drivers/net/ethernet/google/gve/gve_main.c
> > +++ b/drivers/net/ethernet/google/gve/gve_main.c
> > @@ -298,18 +298,6 @@ static irqreturn_t gve_intr_dqo(int irq, void *arg)
> > return IRQ_HANDLED;
> > }
> >
> > -static int gve_is_napi_on_home_cpu(struct gve_priv *priv, u32 irq)
> > -{
> > - int cpu_curr = smp_processor_id();
> > - const struct cpumask *aff_mask;
> > -
> > - aff_mask = irq_get_effective_affinity_mask(irq);
> > - if (unlikely(!aff_mask))
> > - return 1;
> > -
> > - return cpumask_test_cpu(cpu_curr, aff_mask);
> > -}
> > -
> > int gve_napi_poll(struct napi_struct *napi, int budget)
> > {
> > struct gve_notify_block *block;
> > @@ -383,7 +371,7 @@ int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
> > /* Reschedule by returning budget only if already on the correct
> > * cpu.
> > */
> > - if (likely(gve_is_napi_on_home_cpu(priv, block->irq)))
> > + if (likely(napi_affinity_no_change(block->irq)))
>
> Nice to centralize this code! Evolving this to cache the affinity mask
> like the other drivers would probably also require a means to update
> the cache when the affinity changes.
Thanks for taking a look.
The gve driver already calls irq_get_effective_affinity_mask in the
hot path, so I'm planning on submitting a rfcv2 which will do this:
- if (likely(gve_is_napi_on_home_cpu(priv, block->irq)))
+ const struct cpumask *aff_mask =
+ irq_get_effective_affinity_mask(block->irq);
+
+ if (likely(napi_affinity_no_change(aff_mask)))
return budget;
with a change like that there'd be no behavioral change to gve since
it didn't cache before and still won't be caching after this change.
I think a change can be made to gve in a separate patch set to
support caching the affinity mask and does not need to be included
with this change.
What do you think?
* Re: [RFC net-next 3/6] gve: Use napi_affinity_no_change
2024-08-13 21:44 ` Joe Damato
@ 2024-08-13 21:50 ` Shailend Chand
0 siblings, 0 replies; 27+ messages in thread
From: Shailend Chand @ 2024-08-13 21:50 UTC (permalink / raw)
To: Joe Damato, Shailend Chand, netdev, Jeroen de Borst,
Praveen Kaligineedi, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
Harshitha Ramamurthy, Ziwei Xiao, open list
On Tue, Aug 13, 2024 at 2:44 PM Joe Damato <jdamato@fastly.com> wrote:
>
> On Tue, Aug 13, 2024 at 11:55:31AM -0700, Shailend Chand wrote:
> > On Mon, Aug 12, 2024 at 7:57 AM Joe Damato <jdamato@fastly.com> wrote:
> > >
> > > Use napi_affinity_no_change instead of gve's internal implementation,
> > > simplifying and centralizing the logic.
> > >
> > > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > > ---
> > > drivers/net/ethernet/google/gve/gve_main.c | 14 +-------------
> > > 1 file changed, 1 insertion(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> > > index 661566db68c8..ad5e85b8c6a5 100644
> > > --- a/drivers/net/ethernet/google/gve/gve_main.c
> > > +++ b/drivers/net/ethernet/google/gve/gve_main.c
> > > @@ -298,18 +298,6 @@ static irqreturn_t gve_intr_dqo(int irq, void *arg)
> > > return IRQ_HANDLED;
> > > }
> > >
> > > -static int gve_is_napi_on_home_cpu(struct gve_priv *priv, u32 irq)
> > > -{
> > > - int cpu_curr = smp_processor_id();
> > > - const struct cpumask *aff_mask;
> > > -
> > > - aff_mask = irq_get_effective_affinity_mask(irq);
> > > - if (unlikely(!aff_mask))
> > > - return 1;
> > > -
> > > - return cpumask_test_cpu(cpu_curr, aff_mask);
> > > -}
> > > -
> > > int gve_napi_poll(struct napi_struct *napi, int budget)
> > > {
> > > struct gve_notify_block *block;
> > > @@ -383,7 +371,7 @@ int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
> > > /* Reschedule by returning budget only if already on the correct
> > > * cpu.
> > > */
> > > - if (likely(gve_is_napi_on_home_cpu(priv, block->irq)))
> > > + if (likely(napi_affinity_no_change(block->irq)))
> >
> > Nice to centralize this code! Evolving this to cache the affinity mask
> > like the other drivers would probably also require a means to update
> > the cache when the affinity changes.
>
> Thanks for taking a look.
>
> The gve driver already calls irq_get_effective_affinity_mask in the
> hot path, so I'm planning on submitting a rfcv2 which will do this:
>
> - if (likely(gve_is_napi_on_home_cpu(priv, block->irq)))
> + const struct cpumask *aff_mask =
> + irq_get_effective_affinity_mask(block->irq);
> +
> + if (likely(napi_affinity_no_change(aff_mask)))
> return budget;
>
> with a change like that there'd be no behavioral change to gve since
> it didn't cache before and still won't be caching after this change.
>
> I think a change can be made to gve in a separate patch set to
> support caching the affinity mask and does not need to be included
> with this change.
>
> What do you think?
Sounds good!
* [RFC net-next 4/6] i40e: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
` (2 preceding siblings ...)
2024-08-12 14:56 ` [RFC net-next 3/6] gve: " Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-12 14:56 ` [RFC net-next 5/6] iavf: " Joe Damato
` (2 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Tony Nguyen, Przemek Kitszel, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni,
moderated list:INTEL ETHERNET DRIVERS, open list
Use napi_affinity_no_change instead of i40e's internal implementation,
simplifying and centralizing the logic. To facilitate this, fix the type
of irq_num to be unsigned int.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 4 +---
3 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index d546567e0286..326bb7ffab26 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -955,7 +955,7 @@ struct i40e_q_vector {
char name[I40E_INT_NAME_STR_LEN];
bool arm_wb_state;
bool in_busy_poll;
- int irq_num; /* IRQ assigned to this q_vector */
+ unsigned int irq_num; /* IRQ assigned to this q_vector */
} ____cacheline_internodealigned_in_smp;
/* lan device */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index cbcfada7b357..a120fdd87c3e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4137,10 +4137,10 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
int q_vectors = vsi->num_q_vectors;
struct i40e_pf *pf = vsi->back;
int base = vsi->base_vector;
+ unsigned int irq_num;
int rx_int_idx = 0;
int tx_int_idx = 0;
int vector, err;
- int irq_num;
int cpu;
for (vector = 0; vector < q_vectors; vector++) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index c006f716a3bd..8fc39cf4c5d3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2810,8 +2810,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
/* If work not completed, return budget and polling will return */
if (!clean_complete) {
- int cpu_id = smp_processor_id();
-
/* It is possible that the interrupt affinity has changed but,
* if the cpu is pegged at 100%, polling will never exit while
* traffic continues and the interrupt will be stuck on this
@@ -2819,7 +2817,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
* continue to poll, otherwise we must stop polling so the
* interrupt can move to the correct cpu.
*/
- if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
+ if (!napi_affinity_no_change(q_vector->irq_num)) {
/* Tell napi that we are done polling */
napi_complete_done(napi, work_done);
--
2.25.1
* [RFC net-next 5/6] iavf: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
` (3 preceding siblings ...)
2024-08-12 14:56 ` [RFC net-next 4/6] i40e: " Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-12 14:56 ` [RFC net-next 6/6] mlx4: " Joe Damato
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
6 siblings, 0 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Tony Nguyen, Przemek Kitszel, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni,
moderated list:INTEL ETHERNET DRIVERS, open list
Use napi_affinity_no_change instead of iavf's internal implementation,
simplifying and centralizing the logic. To support this, struct
iavf_q_vector has been extended to store the IRQ number, and irq_num's
type is changed to unsigned int.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/iavf/iavf.h | 1 +
drivers/net/ethernet/intel/iavf/iavf_main.c | 4 +++-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 4 +---
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 23a6557fc3db..b066bac6fa7c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -113,6 +113,7 @@ struct iavf_q_vector {
bool arm_wb_state;
cpumask_t affinity_mask;
struct irq_affinity_notify affinity_notify;
+ unsigned int irq_num;
};
/* Helper macros to switch between ints/sec and what the register uses.
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index ff11bafb3b4f..6c7733025ce3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -565,7 +565,8 @@ iavf_request_traffic_irqs(struct iavf_adapter *adapter, char *basename)
{
unsigned int vector, q_vectors;
unsigned int rx_int_idx = 0, tx_int_idx = 0;
- int irq_num, err;
+ unsigned int irq_num;
+ int err;
int cpu;
iavf_irq_disable(adapter);
@@ -601,6 +602,7 @@ iavf_request_traffic_irqs(struct iavf_adapter *adapter, char *basename)
"Request_irq failed, error: %d\n", err);
goto free_queue_irqs;
}
+ q_vector->irq_num = irq_num;
/* register for affinity change notifications */
q_vector->affinity_notify.notify = iavf_irq_affinity_notify;
q_vector->affinity_notify.release =
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 26b424fd6718..5eb78ac1f39d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1408,8 +1408,6 @@ int iavf_napi_poll(struct napi_struct *napi, int budget)
/* If work not completed, return budget and polling will return */
if (!clean_complete) {
- int cpu_id = smp_processor_id();
-
/* It is possible that the interrupt affinity has changed but,
* if the cpu is pegged at 100%, polling will never exit while
* traffic continues and the interrupt will be stuck on this
@@ -1417,7 +1415,7 @@ int iavf_napi_poll(struct napi_struct *napi, int budget)
* continue to poll, otherwise we must stop polling so the
* interrupt can move to the correct cpu.
*/
- if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
+ if (!napi_affinity_no_change(q_vector->irq_num)) {
/* Tell napi that we are done polling */
napi_complete_done(napi, work_done);
--
2.25.1
* [RFC net-next 6/6] mlx4: Use napi_affinity_no_change
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
` (4 preceding siblings ...)
2024-08-12 14:56 ` [RFC net-next 5/6] iavf: " Joe Damato
@ 2024-08-12 14:56 ` Joe Damato
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
6 siblings, 0 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-12 14:56 UTC (permalink / raw)
To: netdev
Cc: Joe Damato, Tariq Toukan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Yishai Hadas,
open list:MELLANOX MLX4 core VPI driver, open list
Use napi_affinity_no_change instead of mlx4's internal implementation,
simplifying and centralizing the logic.
To support this, some type changes were made, which might have been bugs:
- mlx4_eq_get_irq should return unsigned int (since the field it
returns is u16)
- fix the type of irq (from int to unsigned int) in mlx4_en_activate_cq
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 6 ++++--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 6 +-----
drivers/net/ethernet/mellanox/mlx4/eq.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
include/linux/mlx4/device.h | 2 +-
5 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 461cc2c79c71..9c3e1c810412 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -90,9 +90,10 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
int cq_idx)
{
struct mlx4_en_dev *mdev = priv->mdev;
- int irq, err = 0;
- int timestamp_en = 0;
bool assigned_eq = false;
+ int timestamp_en = 0;
+ unsigned int irq;
+ int err = 0;
cq->dev = mdev->pndev[priv->port];
cq->mcq.set_ci_db = cq->wqres.db.db;
@@ -144,6 +145,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
goto free_eq;
cq->cq_idx = cq_idx;
+ cq->irq = irq;
cq->mcq.event = mlx4_en_cq_event;
switch (cq->type) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 15c57e9517e9..4c296327b75b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1022,14 +1022,10 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
/* If we used up all the quota - we're probably not done yet... */
if (done == budget || !clean_complete) {
- int cpu_curr;
-
/* in case we got here because of !clean_complete */
done = budget;
- cpu_curr = smp_processor_id();
-
- if (likely(cpumask_test_cpu(cpu_curr, cq->aff_mask)))
+ if (likely(napi_affinity_no_change(cq->irq)))
return budget;
/* Current cpu is not according to smp_irq_affinity -
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 9572a45f6143..feca4ea30750 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1539,7 +1539,7 @@ int mlx4_assign_eq(struct mlx4_dev *dev, u8 port, int *vector)
}
EXPORT_SYMBOL(mlx4_assign_eq);
-int mlx4_eq_get_irq(struct mlx4_dev *dev, int cq_vec)
+unsigned int mlx4_eq_get_irq(struct mlx4_dev *dev, int cq_vec)
{
struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 28b70dcc652e..1f9830b23c9c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -381,6 +381,7 @@ struct mlx4_en_cq {
const struct cpumask *aff_mask;
int cq_idx;
+ unsigned int irq;
};
struct mlx4_en_port_profile {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 27f42f713c89..1f2e9e954768 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1433,7 +1433,7 @@ int mlx4_assign_eq(struct mlx4_dev *dev, u8 port, int *vector);
void mlx4_release_eq(struct mlx4_dev *dev, int vec);
int mlx4_is_eq_shared(struct mlx4_dev *dev, int vector);
-int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec);
+unsigned int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec);
int mlx4_get_phys_port_id(struct mlx4_dev *dev);
int mlx4_wol_read(struct mlx4_dev *dev, u64 *config, int port);
--
2.25.1
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-12 14:56 [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Joe Damato
` (5 preceding siblings ...)
2024-08-12 14:56 ` [RFC net-next 6/6] mlx4: " Joe Damato
@ 2024-08-14 0:17 ` Jakub Kicinski
2024-08-14 7:14 ` Joe Damato
6 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-08-14 0:17 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> Several drivers make a check in their napi poll functions to determine
> if the CPU affinity of the IRQ has changed. If it has, the napi poll
> function returns a value less than the budget to force polling mode to
> be disabled, so that it can be rescheduled on the correct CPU next time
> the softirq is raised.
Any reason not to use the irq number already stored in napi_struct ?
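For reference, napi_struct already carries the IRQ once a driver calls
netif_napi_set_irq(), so the poll-path check could be as small as the sketch
below (ring is a hypothetical driver structure; the helper keeps the irq-based
signature from patch 1):

        /* at IRQ/queue setup time */
        netif_napi_set_irq(&ring->napi, irq);

        /* in the poll function, when there is still work left */
        if (likely(napi_affinity_no_change(napi->irq)))
                return budget;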
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 0:17 ` [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers Jakub Kicinski
@ 2024-08-14 7:14 ` Joe Damato
2024-08-14 12:12 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-14 7:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Tue, Aug 13, 2024 at 05:17:10PM -0700, Jakub Kicinski wrote:
> On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> > Several drivers make a check in their napi poll functions to determine
> > if the CPU affinity of the IRQ has changed. If it has, the napi poll
> > function returns a value less than the budget to force polling mode to
> > be disabled, so that it can be rescheduled on the correct CPU next time
> > the softirq is raised.
>
> Any reason not to use the irq number already stored in napi_struct ?
Thanks for taking a look.
IIUC, that's possible if i40e, iavf, and gve are updated to call
netif_napi_set_irq first, which I could certainly do.
But as Stanislav points out, I would be adding a call to
irq_get_effective_affinity_mask in the hot path where one did not
exist before for 4 of 5 drivers.
In that case, it might make more sense to introduce:
bool napi_affinity_no_change(const struct cpumask *aff_mask)
instead and the drivers which have a cached mask can pass it in and
gve can be updated later to cache it.
Not sure how crucial avoiding the irq_get_effective_affinity_mask
call is; I would guess maybe some driver owners would object to
adding a new call in the hot path where one didn't exist before.
What do you think?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 7:14 ` Joe Damato
@ 2024-08-14 12:12 ` Joe Damato
2024-08-14 15:09 ` Jakub Kicinski
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-14 12:12 UTC (permalink / raw)
To: Jakub Kicinski, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 08:14:48AM +0100, Joe Damato wrote:
> On Tue, Aug 13, 2024 at 05:17:10PM -0700, Jakub Kicinski wrote:
> > On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> > > Several drivers make a check in their napi poll functions to determine
> > > if the CPU affinity of the IRQ has changed. If it has, the napi poll
> > > function returns a value less than the budget to force polling mode to
> > > be disabled, so that it can be rescheduled on the correct CPU next time
> > > the softirq is raised.
> >
> > Any reason not to use the irq number already stored in napi_struct ?
>
> Thanks for taking a look.
>
> IIUC, that's possible if i40e, iavf, and gve are updated to call
> netif_napi_set_irq first, which I could certainly do.
>
> But as Stanislav points out, I would be adding a call to
> irq_get_effective_affinity_mask in the hot path where one did not
> exist before for 4 of 5 drivers.
>
> In that case, it might make more sense to introduce:
>
> bool napi_affinity_no_change(const struct cpumask *aff_mask)
>
> instead and the drivers which have a cached mask can pass it in and
> gve can be updated later to cache it.
>
> Not sure how crucial avoiding the irq_get_effective_affinity_mask
> call is; I would guess maybe some driver owners would object to
> adding a new call in the hot path where one didn't exist before.
>
> What do you think?
Actually... how about a slightly different approach, which caches
the affinity mask in the core?
0. Extend napi struct to have a struct cpumask * field
1. extend netif_napi_set_irq to:
a. store the IRQ number in the napi struct (as you suggested)
b. call irq_get_effective_affinity_mask to store the mask in the
napi struct
c. set up generic affinity_notify.notify and
affinity_notify.release callbacks to update the in core mask
when it changes
2. add napi_affinity_no_change which now takes a napi_struct
3. cleanup all 5 drivers:
a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
is needed, so I think this would be straight forward?)
b. remove all affinity_mask caching code in 4 of 5 drivers
c. update all 5 drivers to call napi_affinity_no_change in poll
Then ... anyone who adds support for netif_napi_set_irq to their
driver in the future gets automatic support in-core for
caching/updating of the mask? And in the future netdev-genl could
dump the mask since it's in-core?
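Purely as an illustration of steps 0-2 (the field and helper names below are
assumptions, not the eventual patch), the core side could look roughly like:

/* napi_struct would gain a cached mask next to the existing ->irq, filled in
 * by netif_napi_set_irq() and kept current by a core affinity notifier; the
 * helper then takes the napi itself:
 */
static inline bool napi_affinity_no_change(const struct napi_struct *napi)
{
        /* napi->affinity_mask is the hypothetical cached copy */
        if (unlikely(!napi->affinity_mask))
                return true;

        return cpumask_test_cpu(smp_processor_id(), napi->affinity_mask);
}

Drivers would then only need netif_napi_set_irq() at setup plus
napi_affinity_no_change(napi) in their poll functions.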
I'll mess around with that locally to see how it looks, but let me
know if that sounds like a better overall approach.
- Joe
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 12:12 ` Joe Damato
@ 2024-08-14 15:09 ` Jakub Kicinski
2024-08-14 15:19 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-08-14 15:09 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> Actually... how about a slightly different approach, which caches
> the affinity mask in the core?
I was gonna say :)
> 0. Extend napi struct to have a struct cpumask * field
>
> 1. extend netif_napi_set_irq to:
> a. store the IRQ number in the napi struct (as you suggested)
> b. call irq_get_effective_affinity_mask to store the mask in the
> napi struct
> c. set up generic affinity_notify.notify and
> affinity_notify.release callbacks to update the in core mask
> when it changes
This part I'm not an expert on.
> 2. add napi_affinity_no_change which now takes a napi_struct
>
> 3. cleanup all 5 drivers:
> a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
> is needed, so I think this would be straight forward?)
> b. remove all affinity_mask caching code in 4 of 5 drivers
> c. update all 5 drivers to call napi_affinity_no_change in poll
>
> Then ... anyone who adds support for netif_napi_set_irq to their
> driver in the future gets automatic support in-core for
> caching/updating of the mask? And in the future netdev-genl could
> dump the mask since it's in-core?
>
> I'll mess around with that locally to see how it looks, but let me
> know if that sounds like a better overall approach.
Could we even handle this directly as part of __napi_poll(),
once the driver gives core all of the relevant pieces of information ?
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 15:09 ` Jakub Kicinski
@ 2024-08-14 15:19 ` Joe Damato
2024-08-14 16:03 ` Shay Drori
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-14 15:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Daniel Borkmann, David S. Miller, Eric Dumazet,
Harshitha Ramamurthy, moderated list:INTEL ETHERNET DRIVERS,
Jeroen de Borst, Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> > Actually... how about a slightly different approach, which caches
> > the affinity mask in the core?
>
> I was gonna say :)
>
> > 0. Extend napi struct to have a struct cpumask * field
> >
> > 1. extend netif_napi_set_irq to:
> > a. store the IRQ number in the napi struct (as you suggested)
> > b. call irq_get_effective_affinity_mask to store the mask in the
> > napi struct
> > c. set up generic affinity_notify.notify and
> > affinity_notify.release callbacks to update the in core mask
> > when it changes
>
> This part I'm not an export on.
>
> > 2. add napi_affinity_no_change which now takes a napi_struct
> >
> > 3. cleanup all 5 drivers:
> > a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
> > is needed, so I think this would be straight forward?)
> > b. remove all affinity_mask caching code in 4 of 5 drivers
> > c. update all 5 drivers to call napi_affinity_no_change in poll
> >
> > Then ... anyone who adds support for netif_napi_set_irq to their
> > driver in the future gets automatic support in-core for
> > caching/updating of the mask? And in the future netdev-genl could
> > dump the mask since it's in-core?
> >
> > I'll mess around with that locally to see how it looks, but let me
> > know if that sounds like a better overall approach.
I ended up going with the approach laid out above; moving the IRQ
affinity mask updating code into the core (which adds that ability
to gve/mlx4/mlx5... it seems mlx4/5 cached the mask but didn't have notifiers
set up to update the cached copy?) and adding calls to
netif_napi_set_irq in i40e/iavf and deleting their custom notifier
code.
It's almost ready for rfcv2; I think this approach is probably
better ?
> Could we even handle this directly as part of __napi_poll(),
> once the driver gives core all of the relevant pieces of information ?
I had been thinking the same thing, too, but it seems like at least
one driver (mlx5) counts the number of affinity changes to export as
a stat, so moving all of this to core would break that.
So, I may avoid attempting that for this series.
I'm still messing around with this but will send an rfcv2 in a bit.
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 15:19 ` Joe Damato
@ 2024-08-14 16:03 ` Shay Drori
2024-08-14 18:01 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Shay Drori @ 2024-08-14 16:03 UTC (permalink / raw)
To: Joe Damato, Jakub Kicinski, netdev, Daniel Borkmann,
David S. Miller, Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On 14/08/2024 18:19, Joe Damato wrote:
> On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
>> On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
>>> Actually... how about a slightly different approach, which caches
>>> the affinity mask in the core?
>>
>> I was gonna say :)
>>
>>> 0. Extend napi struct to have a struct cpumask * field
>>>
>>> 1. extend netif_napi_set_irq to:
>>> a. store the IRQ number in the napi struct (as you suggested)
>>> b. call irq_get_effective_affinity_mask to store the mask in the
>>> napi struct
>>> c. set up generic affinity_notify.notify and
>>> affinity_notify.release callbacks to update the in core mask
>>> when it changes
>>
>> This part I'm not an expert on.
Several net drivers (mlx5, mlx4, ice, ena and more) use a feature
called ARFS (rmap)[1], and this feature uses the affinity notifier
mechanism.
Also, the affinity notifier infra supports only a single notifier per
IRQ.
Hence, your suggestion (1.c) will break the ARFS feature.
[1] see irq_cpu_rmap_add()
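To make the conflict concrete, here is a simplified illustration (not
the exact kernel code; the napi-side notifier field is hypothetical):

static int example_register_both(struct net_device *netdev,
				 struct napi_struct *napi, int irq)
{
	int err;

	/* ARFS path: lib/cpu_rmap.c registers an affinity notifier so
	 * the rmap tracks where the IRQ currently lands.
	 */
	err = irq_cpu_rmap_add(netdev->rx_cpu_rmap, irq);
	if (err)
		return err;

	/* Suggestion 1.c would have the core register a second notifier
	 * to refresh napi's cached mask ("napi_affinity_notify" is an
	 * illustrative name, not an existing field). The IRQ descriptor
	 * holds a single notifier pointer, so this call displaces the
	 * rmap's notifier and ARFS stops seeing affinity updates.
	 */
	return irq_set_affinity_notifier(irq, &napi->napi_affinity_notify);
}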
>>
>>> 2. add napi_affinity_no_change which now takes a napi_struct
>>>
>>> 3. cleanup all 5 drivers:
>>> a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
>>> is needed, so I think this would be straight forward?)
>>> b. remove all affinity_mask caching code in 4 of 5 drivers
>>> c. update all 5 drivers to call napi_affinity_no_change in poll
>>>
>>> Then ... anyone who adds support for netif_napi_set_irq to their
>>> driver in the future gets automatic support in-core for
>>> caching/updating of the mask? And in the future netdev-genl could
>>> dump the mask since its in-core?
>>>
>>> I'll mess around with that locally to see how it looks, but let me
>>> know if that sounds like a better overall approach.
>
> I ended up going with the approach laid out above; moving the IRQ
> affinity mask updating code into the core (which adds that ability
> to gve/mlx4/mlx5... it seems mlx4/5 cached but didn't have notifiers
> set up to update the cached copy?)
This is probably due to what I wrote above.
> and adding calls to
> netif_napi_set_irq in i40e/iavf and deleting their custom notifier
> code.
>
> It's almost ready for rfcv2; I think this approach is probably
> better?
>
>> Could we even handle this directly as part of __napi_poll(),
>> once the driver gives core all of the relevant pieces of information ?
>
> I had been thinking the same thing, too, but it seems like at least
> one driver (mlx5) counts the number of affinity changes to export as
> a stat, so moving all of this to core would break that.
>
> So, I may avoid attempting that for this series.
>
> I'm still messing around with this but will send an rfcv2 in a bit.
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 16:03 ` Shay Drori
@ 2024-08-14 18:01 ` Joe Damato
2024-08-15 0:20 ` Jakub Kicinski
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-14 18:01 UTC (permalink / raw)
To: Shay Drori
Cc: Jakub Kicinski, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 07:03:35PM +0300, Shay Drori wrote:
>
>
> On 14/08/2024 18:19, Joe Damato wrote:
> > On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
> > > On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> > > > Actually... how about a slightly different approach, which caches
> > > > the affinity mask in the core?
> > >
> > > I was gonna say :)
> > >
> > > > 0. Extend napi struct to have a struct cpumask * field
> > > >
> > > > 1. extend netif_napi_set_irq to:
> > > > a. store the IRQ number in the napi struct (as you suggested)
> > > > b. call irq_get_effective_affinity_mask to store the mask in the
> > > > napi struct
> > > > c. set up generic affinity_notify.notify and
> > > > affinity_notify.release callbacks to update the in core mask
> > > > when it changes
> > >
> > > This part I'm not an expert on.
>
> Several net drivers (mlx5, mlx4, ice, ena and more) use a feature
> called ARFS (rmap)[1], and this feature uses the affinity notifier
> mechanism.
> Also, the affinity notifier infra supports only a single notifier per
> IRQ.
>
> Hence, your suggestion (1.c) will break the ARFS feature.
>
> [1] see irq_cpu_rmap_add()
Thanks for taking a look and your reply.
I did notice ARFS use by some drivers and figured that might be why
the notifiers were being used in some cases.
I guess the question comes down to whether adding a call to
irq_get_effective_affinity_mask in the hot path is a bad idea.
If it is, then the only option is to have the drivers pass in their
IRQ affinity masks, as Stanislav suggested, to avoid adding that
call to the hot path.
If not, then the IRQ from napi_struct can be used and the affinity
mask can be generated on every napi poll. i40e/gve/iavf would need
calls to netif_napi_set_irq to set the IRQ mapping, which seems to
be straightforward.
In both cases: the IRQ notifier stuff would be left as is so that it
wouldn't break ARFS.
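For the second option, the per-poll check would look roughly like the
sketch below (assuming napi->irq has been populated via
netif_napi_set_irq()); the irq_get_effective_affinity_mask() call is
the hot path cost in question:

static bool example_napi_affinity_no_change(struct napi_struct *napi)
{
	const struct cpumask *aff_mask;

	/* Resolve the effective affinity on every poll instead of
	 * relying on a cached copy kept up to date by a notifier.
	 */
	aff_mask = irq_get_effective_affinity_mask(napi->irq);
	if (!aff_mask)
		return true;

	return cpumask_test_cpu(smp_processor_id(), aff_mask);
}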
I suspect that the preferred solution would be to avoid adding that
call to the hot path, right?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-14 18:01 ` Joe Damato
@ 2024-08-15 0:20 ` Jakub Kicinski
2024-08-15 10:22 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2024-08-15 0:20 UTC (permalink / raw)
To: Joe Damato
Cc: Shay Drori, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> If it is, then the only option is to have the drivers pass in their
> IRQ affinity masks, as Stanislav suggested, to avoid adding that
> call to the hot path.
>
> If not, then the IRQ from napi_struct can be used and the affinity
> mask can be generated on every napi poll. i40e/gve/iavf would need
> calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> be straightforward.
It's a bit sad to have the generic solution blocked.
cpu_rmap_update() is exported. Maybe we can call it from our notifier?
rmap lives in struct net_device
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-15 0:20 ` Jakub Kicinski
@ 2024-08-15 10:22 ` Joe Damato
2024-08-20 6:40 ` Shay Drori
0 siblings, 1 reply; 27+ messages in thread
From: Joe Damato @ 2024-08-15 10:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Shay Drori, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao
On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> > If it is, then the only option is to have the drivers pass in their
> > IRQ affinity masks, as Stanislav suggested, to avoid adding that
> > call to the hot path.
> >
> > If not, then the IRQ from napi_struct can be used and the affinity
> > mask can be generated on every napi poll. i40e/gve/iavf would need
> > calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> > be straightforward.
>
> It's a bit sad to have the generic solution blocked.
> cpu_rmap_update() is exported. Maybe we can call it from our notifier?
> rmap lives in struct net_device
I agree on the sadness. I will take a look today.
I guess if we were being really ambitious, we'd try to move ARFS
stuff into the core (as RSS was moved into the core).
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-15 10:22 ` Joe Damato
@ 2024-08-20 6:40 ` Shay Drori
2024-08-20 8:33 ` Joe Damato
0 siblings, 1 reply; 27+ messages in thread
From: Shay Drori @ 2024-08-20 6:40 UTC (permalink / raw)
To: Joe Damato, Jakub Kicinski, netdev, Daniel Borkmann,
David S. Miller, Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao,
Thomas Gleixner
On 15/08/2024 13:22, Joe Damato wrote:
> On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
>> On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
>>> If it is, then the only option is to have the drivers pass in their
>>> IRQ affinity masks, as Stanislav suggested, to avoid adding that
>>> call to the hot path.
>>>
>>> If not, then the IRQ from napi_struct can be used and the affinity
>>> mask can be generated on every napi poll. i40e/gve/iavf would need
>>> calls to netif_napi_set_irq to set the IRQ mapping, which seems to
>>> be straightforward.
>>
>> It's a bit sad to have the generic solution blocked.
>> cpu_rmap_update() is exported. Maybe we can call it from our notifier?
>> rmap lives in struct net_device
>
> I agree on the sadness. I will take a look today.
>
> I guess if we were being really ambitious, we'd try to move ARFS
> stuff into the core (as RSS was moved into the core).
Sorry for the late reply. Maybe we can modify affinity notifier infra to
support more than a single notifier per IRQ.
@Thomas, do you know why only a single notifier per IRQ is supported?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC net-next 0/6] Cleanup IRQ affinity checks in several drivers
2024-08-20 6:40 ` Shay Drori
@ 2024-08-20 8:33 ` Joe Damato
0 siblings, 0 replies; 27+ messages in thread
From: Joe Damato @ 2024-08-20 8:33 UTC (permalink / raw)
To: Shay Drori
Cc: Jakub Kicinski, netdev, Daniel Borkmann, David S. Miller,
Eric Dumazet, Harshitha Ramamurthy,
moderated list:INTEL ETHERNET DRIVERS, Jeroen de Borst,
Jiri Pirko, Leon Romanovsky, open list,
open list:MELLANOX MLX4 core VPI driver, Lorenzo Bianconi,
Paolo Abeni, Praveen Kaligineedi, Przemek Kitszel, Saeed Mahameed,
Sebastian Andrzej Siewior, Shailend Chand, Tariq Toukan,
Tony Nguyen, Willem de Bruijn, Yishai Hadas, Ziwei Xiao,
Thomas Gleixner
On Tue, Aug 20, 2024 at 09:40:31AM +0300, Shay Drori wrote:
>
>
> On 15/08/2024 13:22, Joe Damato wrote:
> > On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
> > > On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> > > > If it is, then the only option is to have the drivers pass in their
> > > > IRQ affinity masks, as Stanislav suggested, to avoid adding that
> > > > call to the hot path.
> > > >
> > > > If not, then the IRQ from napi_struct can be used and the affinity
> > > > mask can be generated on every napi poll. i40e/gve/iavf would need
> > > > calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> > > > be straightforward.
> > >
> > > It's a bit sad to have the generic solution blocked.
> > > cpu_rmap_update() is exported. Maybe we can call it from our notifier?
> > > rmap lives in struct net_device
> >
> > I agree on the sadness. I will take a look today.
> >
> > I guess if we were being really ambitious, we'd try to move ARFS
> > stuff into the core (as RSS was moved into the core).
>
>
> Sorry for the late reply. Maybe we can modify affinity notifier infra to
> support more than a single notifier per IRQ.
> @Thomas, do you know why only a single notifier per IRQ is supported?
Sorry for the delayed response as well on my side; I've been in
between lots of different kernel RFCs :)
Jakub: the issue seems to be that the internals in lib/cpu_rmap.c
are needed to call cpu_rmap_update. It's probably possible to expose
them somehow so that a generic IRQ notifier could call
cpu_rmap_update, as you mentioned, but some rewiring is going to be
needed, I think.
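Very roughly, a combined core notifier might look like the sketch
below, assuming the rmap index for the IRQ were exposed somehow (today
it lives in the private struct irq_glue inside lib/cpu_rmap.c, which
is exactly the rewiring in question). All napi field names here are
hypothetical:

static void example_napi_affinity_notify(struct irq_affinity_notify *notify,
					 const cpumask_t *mask)
{
	struct napi_struct *napi = container_of(notify, struct napi_struct,
						affinity_notify);

	/* Keep the in-core cached copy fresh for the poll-time check. */
	cpumask_copy(&napi->irq_affinity_mask, mask);

#ifdef CONFIG_RFS_ACCEL
	/* Forward the update to ARFS so the rmap keeps working.
	 * napi->rmap_index does not exist today; it stands in for
	 * whatever lib/cpu_rmap.c would need to expose.
	 */
	if (napi->dev->rx_cpu_rmap)
		cpu_rmap_update(napi->dev->rx_cpu_rmap, napi->rmap_index,
				mask);
#endif
}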
I had a couple ideas for rewiring stuff, but I haven't had time to
context switch back on to this work as I've been busy with a few
other things (the IRQ suspension stuff and another mlx5 thing I have
yet to send upstream).
I hope to take another look at it this week, but I welcome any
suggestions from Shay/Thomas in the meantime.
- Joe
^ permalink raw reply [flat|nested] 27+ messages in thread