* [PATCH v5 0/3] sched, net: NUMA-aware CPU spreading interface
@ 2022-10-21 12:19 Valentin Schneider
From: Valentin Schneider @ 2022-10-21 12:19 UTC (permalink / raw)
To: netdev, linux-rdma, linux-kernel
Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Yury Norov, Andy Shevchenko,
Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Mel Gorman, Greg Kroah-Hartman,
Heiko Carstens, Tony Luck, Jonathan Cameron, Gal Pressman,
Tariq Toukan, Jesse Brandeburg

Hi folks,

Tariq pointed out in [1] that drivers allocating IRQ vectors would benefit
from having smarter NUMA-awareness (cpumask_local_spread() doesn't quite cut
it).

The proposed interface involved an array of CPUs and a temporary cpumask, and
being my difficult self what I'm proposing here is an interface that doesn't
require any temporary storage other than some stack variables (at the cost of
one wild macro).

[1]: https://lore.kernel.org/all/20220728191203.4055-1-tariqt@nvidia.com/

Revisions
=========
v4 -> v5
++++++++
o Rebased onto 6.1-rc1
o Ditched the CPU iterator, moved to a cpumask iterator (Yury)
v3 -> v4
++++++++
o Rebased on top of Yury's bitmap-for-next
o Added Tariq's mlx5e patch
o Made sched_numa_hop_mask() return cpu_online_mask for the NUMA_NO_NODE &&
hops=0 case
v2 -> v3
++++++++
o Added for_each_cpu_and() and for_each_cpu_andnot() tests (Yury)
o New patches to fix issues raised by running the above
o New patch to use for_each_cpu_andnot() in sched/core.c (Yury)
v1 -> v2
++++++++
o Split _find_next_bit() @invert into @invert1 and @invert2 (Yury)
o Rebase onto v6.0-rc1

Cheers,
Valentin

Tariq Toukan (1):
  net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
    hints

Valentin Schneider (2):
  sched/topology: Introduce sched_numa_hop_mask()
  sched/topology: Introduce for_each_numa_hop_mask()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++++++++--
 include/linux/topology.h                     | 32 ++++++++++++++++++++
 kernel/sched/topology.c                      | 31 +++++++++++++++++++
 3 files changed, 79 insertions(+), 2 deletions(-)

--
2.31.1
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v5 1/3] sched/topology: Introduce sched_numa_hop_mask()
From: Valentin Schneider @ 2022-10-21 12:19 UTC (permalink / raw)
To: netdev, linux-rdma, linux-kernel
Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Yury Norov, Andy Shevchenko,
Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Mel Gorman, Greg Kroah-Hartman,
Heiko Carstens, Tony Luck, Jonathan Cameron, Gal Pressman,
Tariq Toukan, Jesse Brandeburg

Tariq has pointed out that drivers allocating IRQ vectors would benefit
from having smarter NUMA-awareness - cpumask_local_spread() only knows
about the local node and everything outside is in the same bucket.

sched_domains_numa_masks is pretty much what we want to hand out (a cpumask
of CPUs reachable within a given distance budget), introduce
sched_numa_hop_mask() to export those cpumasks.

Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 include/linux/topology.h | 12 ++++++++++++
 kernel/sched/topology.c  | 31 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4564faafd0e12..3e91ae6d0ad58 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -245,5 +245,17 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+#ifdef CONFIG_NUMA
+extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
+#else
+static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
+{
+	if (node == NUMA_NO_NODE && !hops)
+		return cpu_online_mask;
+
+	return ERR_PTR(-EOPNOTSUPP);
+}
+#endif	/* CONFIG_NUMA */
+
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54ea..e3cb8cc375204 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2067,6 +2067,37 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
 	return found;
 }
 
+/**
+ * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
+ *                         @node
+ * @node: The node to count hops from.
+ * @hops: Include CPUs up to that many hops away. 0 means local node.
+ *
+ * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
+ * @node, an error value otherwise.
+ *
+ * Requires rcu_lock to be held. Returned cpumask is only valid within that
+ * read-side section, copy it if required beyond that.
+ *
+ * Note that not all hops are equal in distance; see sched_init_numa() for how
+ * distances and masks are handled.
+ * Also note that this is a reflection of sched_domains_numa_masks, which may change
+ * during the lifetime of the system (offline nodes are taken out of the masks).
+ */
+const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
+{
+	struct cpumask ***masks = rcu_dereference(sched_domains_numa_masks);
+
+	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
+		return ERR_PTR(-EINVAL);
+
+	if (!masks)
+		return ERR_PTR(-EBUSY);
+
+	return masks[hops][node];
+}
+EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
+
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
--
2.31.1
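
Editorial sketch (not part of the posted patch): the kernel-doc above notes that "not all hops are equal in distance". A small user-space model may make that concrete. It assumes, as a reading of sched_init_numa(), that level N of the mask table covers every node whose distance does not exceed the N-th smallest unique distance. The 4-node distance table, the two-CPUs-per-node layout and the names `node_cpus()`, `numa_levels()` and `hop_mask()` are all hypothetical, and a return value of 0 stands in for the kernel's error pointers.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES      4u
#define CPUS_PER_NODE 2u
#define MAX_LEVELS    (NR_NODES * NR_NODES)

/* Hypothetical symmetric distance table, in the style of `numactl -H`. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 11, 12, 32 },
	{ 11, 10, 12, 32 },
	{ 12, 12, 10, 32 },
	{ 32, 32, 32, 10 },
};

/* CPUs owned by @node as a bitmask; stand-in for cpumask_of_node(). */
static uint32_t node_cpus(unsigned int node)
{
	assert(node < NR_NODES);
	return ((1u << CPUS_PER_NODE) - 1u) << (node * CPUS_PER_NODE);
}

/* Fill @levels with the sorted unique distances; return how many there are. */
static unsigned int numa_levels(int levels[MAX_LEVELS])
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < NR_NODES; i++) {
		for (unsigned int j = 0; j < NR_NODES; j++) {
			int d = node_distance[i][j];
			unsigned int pos = 0;

			while (pos < n && levels[pos] < d)
				pos++;
			if (pos < n && levels[pos] == d)
				continue;	/* distance already recorded */
			for (unsigned int k = n; k > pos; k--)
				levels[k] = levels[k - 1];
			levels[pos] = d;
			n++;
		}
	}
	return n;
}

/*
 * Model of sched_numa_hop_mask(): CPUs of every node whose distance from
 * @node does not exceed the @hops-th smallest distance in the table.
 * Returns 0 where the kernel function would return an error pointer.
 */
static uint32_t hop_mask(unsigned int node, unsigned int hops)
{
	int levels[MAX_LEVELS];
	unsigned int nr_levels = numa_levels(levels);
	uint32_t mask = 0;

	if (node >= NR_NODES || hops >= nr_levels)
		return 0;

	for (unsigned int k = 0; k < NR_NODES; k++)
		if (node_distance[node][k] <= levels[hops])
			mask |= node_cpus(k);
	return mask;
}
```

In this model node 3 gains no CPUs at hops 1 and 2, because its nearest neighbour sits at distance 32. That is the situation the "not all hops are equal" note warns callers about: consecutive hop masks for a given node can be identical.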

* Re: [PATCH v5 1/3] sched/topology: Introduce sched_numa_hop_mask()
From: Yury Norov @ 2022-10-24 22:55 UTC (permalink / raw)
To: Valentin Schneider
Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
Leon Romanovsky, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Greg Kroah-Hartman, Heiko Carstens, Tony Luck,
Jonathan Cameron, Gal Pressman, Tariq Toukan, Jesse Brandeburg

On Fri, Oct 21, 2022 at 01:19:25PM +0100, Valentin Schneider wrote:
> Tariq has pointed out that drivers allocating IRQ vectors would benefit
> from having smarter NUMA-awareness - cpumask_local_spread() only knows
> about the local node and everything outside is in the same bucket.

Can you keep 1st-person references in a cover letter?

> sched_domains_numa_masks is pretty much what we want to hand out (a cpumask
> of CPUs reachable within a given distance budget), introduce
> sched_numa_hop_mask() to export those cpumasks.
>
> Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  include/linux/topology.h | 12 ++++++++++++
>  kernel/sched/topology.c  | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4564faafd0e12..3e91ae6d0ad58 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -245,5 +245,17 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
>  	return cpumask_of_node(cpu_to_node(cpu));
>  }
>
> +#ifdef CONFIG_NUMA
> +extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
> +#else
> +static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
> +{
> +	if (node == NUMA_NO_NODE && !hops)
> +		return cpu_online_mask;
> +
> +	return ERR_PTR(-EOPNOTSUPP);
> +}
> +#endif	/* CONFIG_NUMA */
> +
>
>  #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 8739c2a5a54ea..e3cb8cc375204 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2067,6 +2067,37 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
>  	return found;
>  }
>
> +/**
> + * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
> + *                         @node
> + * @node: The node to count hops from.
> + * @hops: Include CPUs up to that many hops away. 0 means local node.
> + *
> + * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
> + * @node, an error value otherwise.
> + *
> + * Requires rcu_lock to be held. Returned cpumask is only valid within that
> + * read-side section, copy it if required beyond that.
> + *
> + * Note that not all hops are equal in distance; see sched_init_numa() for how
> + * distances and masks are handled.
> + * Also note that this is a reflection of sched_domains_numa_masks, which may change
> + * during the lifetime of the system (offline nodes are taken out of the masks).
> + */
> +const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
> +{
> +	struct cpumask ***masks = rcu_dereference(sched_domains_numa_masks);
> +
> +	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
> +		return ERR_PTR(-EINVAL);

Can you dereference rcu things after sanity checks?

> +	if (!masks)
> +		return ERR_PTR(-EBUSY);
> +
> +	return masks[hops][node];
> +}
> +EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
> +
>  #endif /* CONFIG_NUMA */
>
>  static int __sdt_alloc(const struct cpumask *cpu_map)
> --
> 2.31.1

* [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()
From: Valentin Schneider @ 2022-10-21 12:19 UTC (permalink / raw)
To: netdev, linux-rdma, linux-kernel
Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Yury Norov, Andy Shevchenko,
Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Mel Gorman, Greg Kroah-Hartman,
Heiko Carstens, Tony Luck, Jonathan Cameron, Gal Pressman,
Tariq Toukan, Jesse Brandeburg

The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
reachable within a given distance budget, wrap the logic for iterating over
all (distance, mask) values inside an iterator macro.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 include/linux/topology.h | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 3e91ae6d0ad58..8185e12ec1ccc 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -246,16 +246,36 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 }
 
 #ifdef CONFIG_NUMA
-extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
+extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
-static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
+static inline const struct cpumask *
+sched_numa_hop_mask(unsigned int node, unsigned int hops)
 {
-	if (node == NUMA_NO_NODE && !hops)
-		return cpu_online_mask;
-
 	return ERR_PTR(-EOPNOTSUPP);
 }
 #endif	/* CONFIG_NUMA */
 
+/**
+ * for_each_numa_hop_mask - iterate over cpumasks of increasing NUMA distance
+ *                          from a given node.
+ * @mask: the iteration variable.
+ * @node: the NUMA node to start the search from.
+ *
+ * Requires rcu_lock to be held.
+ *
+ * Yields cpu_online_mask for @node == NUMA_NO_NODE.
+ */
+#define for_each_numa_hop_mask(mask, node)				     \
+	for (unsigned int __hops = 0;					     \
+	     /*								     \
+	      * Unsightly trickery required as we can't both initialize	     \
+	      * @mask and declare __hops in for()'s first clause	     \
+	      */							     \
+	     mask = __hops > 0 ? mask :					     \
+		    node == NUMA_NO_NODE ?				     \
+		    cpu_online_mask : sched_numa_hop_mask(node, 0),	     \
+	     !IS_ERR_OR_NULL(mask);					     \
+	     __hops++,							     \
+	     mask = sched_numa_hop_mask(node, __hops))
 
 #endif /* _LINUX_TOPOLOGY_H */
--
2.31.1
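
Editorial sketch (not part of the posted patch): the loop shape of the macro above can be tried out in user space. The model below keeps the same structure (counter declared in the first clause, iteration variable seeded through a comma expression in the condition) and shows one plausible way a caller might turn successive hop masks into a nearest-first CPU order, by taking only the CPUs that were not already in the previous mask. The mask table, `hop_mask()`, `for_each_hop_mask()`, `spread_order()` and `nth_cpu()` are hypothetical stand-ins; NULL plays the role of an error pointer and the NUMA_NO_NODE branch is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES  4u
#define NR_LEVELS 4u
#define NR_CPUS   8u

/* Hypothetical per-level, per-node masks; CPUs 2n and 2n+1 sit on node n. */
static const uint32_t hop_masks[NR_LEVELS][NR_NODES] = {
	{ 0x03, 0x0c, 0x30, 0xc0 },
	{ 0x0f, 0x0f, 0x30, 0xc0 },
	{ 0x3f, 0x3f, 0x3f, 0xc0 },
	{ 0xff, 0xff, 0xff, 0xff },
};

/* Stand-in for sched_numa_hop_mask(); NULL plays the role of ERR_PTR(). */
static const uint32_t *hop_mask(unsigned int node, unsigned int hops)
{
	if (node >= NR_NODES || hops >= NR_LEVELS)
		return NULL;
	return &hop_masks[hops][node];
}

/* Same loop shape as the patch: declare the counter, seed @mask in the test. */
#define for_each_hop_mask(mask, node)					\
	for (unsigned int hops_ = 0;					\
	     (mask) = hops_ > 0 ? (mask) : hop_mask((node), 0),		\
	     (mask) != NULL;						\
	     hops_++, (mask) = hop_mask((node), hops_))

/*
 * Write CPU ids to @out, nearest hop first, skipping CPUs already seen at a
 * closer hop. Returns the number of ids written (at most @max).
 */
static unsigned int spread_order(unsigned int node, unsigned int *out,
				 unsigned int max)
{
	const uint32_t *mask = NULL;
	uint32_t prev = 0;
	unsigned int n = 0;

	assert(out != NULL);

	for_each_hop_mask(mask, node) {
		uint32_t fresh = *mask & ~prev;	/* CPUs new at this hop */

		for (unsigned int cpu = 0; cpu < NR_CPUS && n < max; cpu++)
			if (fresh & (1u << cpu))
				out[n++] = cpu;
		prev = *mask;
	}
	return n;
}

/* Convenience wrapper: the @n-th CPU in spread order, or -1 if none. */
static int nth_cpu(unsigned int node, unsigned int n)
{
	unsigned int order[NR_CPUS];
	unsigned int count = spread_order(node, order, NR_CPUS);

	return n < count ? (int)order[n] : -1;
}
```

With this table, starting from node 3 the order is CPUs 6, 7 (local) followed by 0 to 5, while starting from node 2 it is 4, 5, then 0 to 3, then 6, 7. Hops that add no new CPUs simply contribute nothing.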

* Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()
From: Andy Shevchenko @ 2022-10-21 13:16 UTC (permalink / raw)
To: Valentin Schneider
Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
Leon Romanovsky, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Yury Norov, Rasmus Villemoes, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Greg Kroah-Hartman, Heiko Carstens, Tony Luck,
Jonathan Cameron, Gal Pressman, Tariq Toukan, Jesse Brandeburg

On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:
> The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
> reachable within a given distance budget, wrap the logic for iterating over
> all (distance, mask) values inside an iterator macro.

...

>  #ifdef CONFIG_NUMA
> -extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
> +extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
>  #else
> -static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
> +static inline const struct cpumask *
> +sched_numa_hop_mask(unsigned int node, unsigned int hops)
>  {
> -	if (node == NUMA_NO_NODE && !hops)
> -		return cpu_online_mask;
> -
>  	return ERR_PTR(-EOPNOTSUPP);
>  }
>  #endif	/* CONFIG_NUMA */

I didn't get how the above two changes are related to the 3rd one which
introduces a for_each type of macro.

If you need change int --> unsigned int, perhaps it can be done in a separate
patch.

The change inside inliner I dunno about. Not an expert.

...

> +#define for_each_numa_hop_mask(mask, node)				     \
> +	for (unsigned int __hops = 0;					     \
> +	     /*								     \
> +	      * Unsightly trickery required as we can't both initialize	     \
> +	      * @mask and declare __hops in for()'s first clause	     \
> +	      */							     \
> +	     mask = __hops > 0 ? mask :					     \
> +		    node == NUMA_NO_NODE ?				     \
> +		    cpu_online_mask : sched_numa_hop_mask(node, 0),	     \
> +	     !IS_ERR_OR_NULL(mask);					     \
> +	     __hops++,							     \
> +	     mask = sched_numa_hop_mask(node, __hops))

This can be unified with conditional, see for_each_gpio_desc_with_flag() as
example how.

--
With Best Regards,
Andy Shevchenko

* Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()
From: Andy Shevchenko @ 2022-10-21 13:34 UTC (permalink / raw)
To: Valentin Schneider
Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
Leon Romanovsky, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Yury Norov, Rasmus Villemoes, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Greg Kroah-Hartman, Heiko Carstens, Tony Luck,
Jonathan Cameron, Gal Pressman, Tariq Toukan, Jesse Brandeburg

On Fri, Oct 21, 2022 at 04:16:17PM +0300, Andy Shevchenko wrote:
> On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:

...

> > +#define for_each_numa_hop_mask(mask, node)				     \
> > +	for (unsigned int __hops = 0;					     \
> > +	     /*								     \
> > +	      * Unsightly trickery required as we can't both initialize	     \
> > +	      * @mask and declare __hops in for()'s first clause	     \
> > +	      */							     \
> > +	     mask = __hops > 0 ? mask :					     \
> > +		    node == NUMA_NO_NODE ?				     \
> > +		    cpu_online_mask : sched_numa_hop_mask(node, 0),	     \
> > +	     !IS_ERR_OR_NULL(mask);					     \
> >
> > +	     __hops++,							     \
> > +	     mask = sched_numa_hop_mask(node, __hops))
>
> This can be unified with conditional, see for_each_gpio_desc_with_flag() as
> example how.

Something like

  mask = (__hops || node != NUMA_NO_NODE) ? sched_numa_hop_mask(node, __hops) : cpu_online_mask

--
With Best Regards,
Andy Shevchenko

* Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()
From: Valentin Schneider @ 2022-10-21 14:06 UTC (permalink / raw)
To: Andy Shevchenko
Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
Leon Romanovsky, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Yury Norov, Rasmus Villemoes, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Greg Kroah-Hartman, Heiko Carstens, Tony Luck,
Jonathan Cameron, Gal Pressman, Tariq Toukan, Jesse Brandeburg

On 21/10/22 16:34, Andy Shevchenko wrote:
> On Fri, Oct 21, 2022 at 04:16:17PM +0300, Andy Shevchenko wrote:
>> On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:
>
> ...
>
>> > +#define for_each_numa_hop_mask(mask, node)				     \
>> > +	for (unsigned int __hops = 0;					     \
>> > +	     /*								     \
>> > +	      * Unsightly trickery required as we can't both initialize	     \
>> > +	      * @mask and declare __hops in for()'s first clause	     \
>> > +	      */							     \
>> > +	     mask = __hops > 0 ? mask :					     \
>> > +		    node == NUMA_NO_NODE ?				     \
>> > +		    cpu_online_mask : sched_numa_hop_mask(node, 0),	     \
>> > +	     !IS_ERR_OR_NULL(mask);					     \
>>
>> > +	     __hops++,							     \
>> > +	     mask = sched_numa_hop_mask(node, __hops))
>>
>> This can be unified with conditional, see for_each_gpio_desc_with_flag() as
>> example how.
>
> Something like
>
>   mask = (__hops || node != NUMA_NO_NODE) ? sched_numa_hop_mask(node, __hops) : cpu_online_mask
>

That does simplify things somewhat, thanks!

> --
> With Best Regards,
> Andy Shevchenko
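
Editorial sketch (not part of the thread): one way the unified conditional suggested above could be folded into the macro is to evaluate it in the loop condition on every pass, which leaves only the counter increment in the third clause. The user-space model below assumes that reading. `NO_NODE`, `online_mask`, `hop_mask()`, `for_each_hop_mask()`, `nr_hops()` and `last_mask()` are hypothetical stand-ins for NUMA_NO_NODE, cpu_online_mask, sched_numa_hop_mask() and the kernel macro; NULL plays the role of an error pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES  2
#define NR_LEVELS 2u
#define NO_NODE   (-1)	/* stand-in for NUMA_NO_NODE */

/* Stand-in for cpu_online_mask: four CPUs, all online. */
static const uint32_t online_mask = 0x0f;

/* Hypothetical hop masks: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const uint32_t hop_masks[NR_LEVELS][NR_NODES] = {
	{ 0x03, 0x0c },
	{ 0x0f, 0x0f },
};

/* Stand-in for sched_numa_hop_mask(); NULL plays the role of ERR_PTR(). */
static const uint32_t *hop_mask(int node, unsigned int hops)
{
	if (node < 0 || node >= NR_NODES || hops >= NR_LEVELS)
		return NULL;
	return &hop_masks[hops][node];
}

/*
 * Single conditional, re-evaluated on each pass: only the very first pass
 * with NO_NODE yields the online mask, everything else goes through
 * hop_mask(), which ends the loop by returning NULL.
 */
#define for_each_hop_mask(mask, node)					\
	for (unsigned int hops_ = 0;					\
	     (mask) = (hops_ || (node) != NO_NODE) ?			\
			hop_mask((node), hops_) : &online_mask,		\
	     (mask) != NULL;						\
	     hops_++)

/* Number of masks the iterator yields for @node. */
static unsigned int nr_hops(int node)
{
	const uint32_t *mask = NULL;
	unsigned int n = 0;

	for_each_hop_mask(mask, node)
		n++;
	assert(n <= NR_LEVELS);
	return n;
}

/* Last mask the iterator yields for @node, or 0 if it yields none. */
static uint32_t last_mask(int node)
{
	const uint32_t *mask = NULL;
	uint32_t last = 0;

	for_each_hop_mask(mask, node)
		last = *mask;
	return last;
}
```

In this model `NO_NODE` yields the online mask exactly once, because the second pass calls `hop_mask(NO_NODE, 1)`, which fails and terminates the loop. Whether the kernel macro should behave the same way depends on how sched_numa_hop_mask() treats an out-of-range node, which the model only approximates.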

* Re: [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask()
From: Valentin Schneider @ 2022-10-21 13:57 UTC (permalink / raw)
To: Andy Shevchenko
Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
Leon Romanovsky, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Yury Norov, Rasmus Villemoes, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Greg Kroah-Hartman, Heiko Carstens, Tony Luck,
Jonathan Cameron, Gal Pressman, Tariq Toukan, Jesse Brandeburg

On 21/10/22 16:16, Andy Shevchenko wrote:
> On Fri, Oct 21, 2022 at 01:19:26PM +0100, Valentin Schneider wrote:
>> The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
>> reachable within a given distance budget, wrap the logic for iterating over
>> all (distance, mask) values inside an iterator macro.
>
> ...
>
>>  #ifdef CONFIG_NUMA
>> -extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
>> +extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
>>  #else
>> -static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
>> +static inline const struct cpumask *
>> +sched_numa_hop_mask(unsigned int node, unsigned int hops)
>>  {
>> -	if (node == NUMA_NO_NODE && !hops)
>> -		return cpu_online_mask;
>> -
>>  	return ERR_PTR(-EOPNOTSUPP);
>>  }
>>  #endif	/* CONFIG_NUMA */
>
> I didn't get how the above two changes are related to the 3rd one which
> introduces a for_each type of macro.
>
> If you need change int --> unsigned int, perhaps it can be done in a separate
> patch.
>
> The change inside inliner I dunno about. Not an expert.
>

That's a rebase fail, this should all be in the first patch, my bad.

* [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
From: Valentin Schneider @ 2022-10-21 12:19 UTC (permalink / raw)
To: netdev, linux-rdma, linux-kernel
Cc: Tariq Toukan, Saeed Mahameed, Leon Romanovsky, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yury Norov,
Andy Shevchenko, Rasmus Villemoes, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
Greg Kroah-Hartman, Heiko Carstens, Tony Luck, Jonathan Cameron,
Gal Pressman, Jesse Brandeburg

From: Tariq Toukan <tariqt@nvidia.com>

In the IRQ affinity hints, replace the binary NUMA preference (local /
remote) with the improved for_each_numa_hop_mask() API that minds the
actual distances, so that remote NUMA nodes at a short distance are
preferred over farther ones.

This has significant performance implications when using NUMA-aware
allocated memory (follow [1] and derivatives for example).

[1]
drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
   int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));

Performance tests:

TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

+-------------------------+-----------+------------------+------------------+
|                         | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline                |   52.3    |       6.4 %      |      17.9 %      |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only |   52.6    |       5.2 %      |      18.5 %      |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only |   94.9    |      11.9 %      |      27.2 %      |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides   |   95.1    |       8.4 %      |      27.3 %      |
+-------------------------+-----------+------------------+------------------+

The bottleneck on the RX side is released and line rate is reached (~1.8x
speedup).
~30% less CPU util on TX.

* CPU util on active cores only.

Setup details (similar for both sides):

NIC: ConnectX6-DX dual port, 100 Gbps each.
Single port used in the tests.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        16
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               1
Model name:          AMD EPYC 7763 64-Core Processor
Stepping:            1
CPU MHz:             2594.804
BogoMIPS:            4890.73
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-7,128-135
NUMA node1 CPU(s):   8-15,136-143
NUMA node2 CPU(s):   16-23,144-151
NUMA node3 CPU(s):   24-31,152-159
NUMA node4 CPU(s):   32-39,160-167
NUMA node5 CPU(s):   40-47,168-175
NUMA node6 CPU(s):   48-55,176-183
NUMA node7 CPU(s):   56-63,184-191
NUMA node8 CPU(s):   64-71,192-199
NUMA node9 CPU(s):   72-79,200-207
NUMA node10 CPU(s):  80-87,208-215
NUMA node11 CPU(s):  88-95,216-223
NUMA node12 CPU(s):  96-103,224-231
NUMA node13 CPU(s):  104-111,232-239
NUMA node14 CPU(s):  112-119,240-247
NUMA node15 CPU(s):  120-127,248-255
..

$ numactl -H
..
node distances:
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  11  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  1:  11  10  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  2:  11  11  10  11  12  12  12  12  32  32  32  32  32  32  32  32
  3:  11  11  11  10  12  12  12  12  32  32  32  32  32  32  32  32
  4:  12  12  12  12  10  11  11  11  32  32  32  32  32  32  32  32
  5:  12  12  12  12  11  10  11  11  32  32  32  32  32  32  32  32
  6:  12  12  12  12  11  11  10  11  32  32  32  32  32  32  32  32
  7:  12  12  12  12  11  11  11  10  32  32  32  32  32  32  32  32
  8:  32  32  32  32  32  32  32  32  10  11  11  11  12  12  12  12
  9:  32  32  32  32  32  32  32  32  11  10  11  11  12  12  12  12
 10:  32  32  32  32  32  32  32  32  11  11  10  11  12  12  12  12
 11:  32  32  32  32  32  32  32  32  11  11  11  10  12  12  12  12
 12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  11  11
 13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  11  11
 14:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  10  11
 15:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  11  10

$ cat /sys/class/net/ens5f0/device/numa_node
14

Affinity hints (127 IRQs):
Before:
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010
352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020
353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040
354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200
357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400
358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800
359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000
360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000
362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000
363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000
365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000
366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000
368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000
369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000
370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000
371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000
372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000
373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000
374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000
375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000
376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000
378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000
379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000
380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000
381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000
382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000
383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000
384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000
385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000
386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000
387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000
388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000
389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000
390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000
391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000
392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000
393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000
394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000

After:
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344:
00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000 363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000 364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000 365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000 366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000 367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000 368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000 369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000 
370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000 371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000 372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000 373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000 374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000 375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000 376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000 377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000 378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000 379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000 380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000 381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000 382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000 383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 395: 
00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 
421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000 428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000 429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000 430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000 431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000 432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000 433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000 434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000 435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000 436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000 437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000 438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000 439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000 440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000 441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000 442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000 443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000 444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000 445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000 446: 
00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
[Tweaked API use]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a0242dc15741c..7acbeb3d51846 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
+	const struct cpumask *prev = cpu_none_mask;
+	const struct cpumask *mask;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
+	int cpu;
 	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
@@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 		ret = -ENOMEM;
 		goto free_irqs;
 	}
-	for (i = 0; i < ncomp_eqs; i++)
-		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
+
+	i = 0;
+	rcu_read_lock();
+	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
+		for_each_cpu_andnot(cpu, mask, prev) {
+			cpus[i] = cpu;
+			if (++i == ncomp_eqs)
+				goto spread_done;
+		}
+		prev = mask;
+	}
+spread_done:
+	rcu_read_unlock();
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
 	if (ret < 0)
-- 
2.31.1
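The affinity hints in this thread are cpumasks printed as comma-separated 32-bit hex words, most significant word first, which makes them hard to map back to CPU numbers by eye. The following is a minimal userspace sketch for decoding one such string. It is not part of the patch, and mask_first_cpu() is a hypothetical helper name chosen for illustration; it assumes each hint has a single CPU set and simply reports the lowest set bit.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a kernel API): return the lowest CPU number set
 * in a comma-separated hex cpumask string, or -1 if no bit is set or a word
 * fails to parse. Words are 32 bits wide, most significant word first.
 */
static int mask_first_cpu(const char *mask)
{
	const char *p;
	int nwords = 1;
	int first = -1;
	int word;

	assert(mask != NULL);

	/* The number of words is one more than the number of commas. */
	for (p = mask; *p; p++)
		if (*p == ',')
			nwords++;

	p = mask;
	for (word = nwords - 1; word >= 0; word--) {
		char *end;
		unsigned long v = strtoul(p, &end, 16);
		int bit;

		if (end == p)
			return -1;	/* not a hex word */

		/*
		 * Words are visited from high to low, so a later (lower)
		 * word with a set bit overrides an earlier result.
		 */
		for (bit = 0; bit < 32; bit++) {
			if (v & (1UL << bit)) {
				first = 32 * word + bit;
				break;
			}
		}

		if (*end != ',')
			break;
		p = end + 1;
	}
	return first;
}
```

Applied to the hint listed for IRQ 331, this gives CPU 112, which the lscpu output places in NUMA node 14, the node reported for the NIC. The "Before" hint for IRQ 347 decodes to CPU 0, on the other socket, while the "After" hint for the same IRQ decodes to CPU 96 in node 12.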
* Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
2022-10-21 12:19 ` [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Valentin Schneider
@ 2022-10-24 11:24 ` Tariq Toukan
2022-10-24 23:17 ` Yury Norov
1 sibling, 0 replies; 11+ messages in thread
From: Tariq Toukan @ 2022-10-24 11:24 UTC (permalink / raw)
To: Valentin Schneider, netdev, linux-rdma, linux-kernel
Cc: Tariq Toukan, Saeed Mahameed, Leon Romanovsky, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Yury Norov, Andy Shevchenko,
Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Mel Gorman, Greg Kroah-Hartman,
Heiko Carstens, Tony Luck, Jonathan Cameron, Gal Pressman,
Jesse Brandeburg

On 10/21/2022 3:19 PM, Valentin Schneider wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_cpu() API that minds the
> actual distances, so that remote NUMAs with short distance are preferred
> over farther ones.
>
> This has significant performance implications when using NUMA-aware
> allocated memory (follow [1] and derivatives for example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
>    int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
> +-------------------------+-----------+------------------+------------------+
>
> Bottleneck in RX side is released, reached linerate (~1.8x speedup).
> ~30% less cpu util on TX.
>
> * CPU util on active cores only.
>
> Setups details (similar for both sides):
>
> NIC: ConnectX6-DX dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  2
> Core(s) per socket:  64
> Socket(s):           2
> NUMA node(s):        16
> Vendor ID:           AuthenticAMD
> CPU family:          25
> Model:               1
> Model name:          AMD EPYC 7763 64-Core Processor
> Stepping:            1
> CPU MHz:             2594.804
> BogoMIPS:            4890.73
> Virtualization:      AMD-V
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-7,128-135
> NUMA node1 CPU(s):   8-15,136-143
> NUMA node2 CPU(s):   16-23,144-151
> NUMA node3 CPU(s):   24-31,152-159
> NUMA node4 CPU(s):   32-39,160-167
> NUMA node5 CPU(s):   40-47,168-175
> NUMA node6 CPU(s):   48-55,176-183
> NUMA node7 CPU(s):   56-63,184-191
> NUMA node8 CPU(s):   64-71,192-199
> NUMA node9 CPU(s):   72-79,200-207
> NUMA node10 CPU(s):  80-87,208-215
> NUMA node11 CPU(s):  88-95,216-223
> NUMA node12 CPU(s):  96-103,224-231
> NUMA node13 CPU(s):  104-111,232-239
> NUMA node14 CPU(s):  112-119,240-247
> NUMA node15 CPU(s):  120-127,248-255
> ..
...
>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> [Tweaked API use]

Thanks for your modification. It looks good to me.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>  static int comp_irqs_request(struct mlx5_core_dev *dev)
>  {
>  	struct mlx5_eq_table *table = dev->priv.eq_table;
> +	const struct cpumask *prev = cpu_none_mask;
> +	const struct cpumask *mask;
>  	int ncomp_eqs = table->num_comp_eqs;
>  	u16 *cpus;
>  	int ret;
> +	int cpu;
>  	int i;
>
>  	ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>  		ret = -ENOMEM;
>  		goto free_irqs;
>  	}
> -	for (i = 0; i < ncomp_eqs; i++)
> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> +	i = 0;
> +	rcu_read_lock();
> +	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> +		for_each_cpu_andnot(cpu, mask, prev) {
> +			cpus[i] = cpu;
> +			if (++i == ncomp_eqs)
> +				goto spread_done;
> +		}
> +		prev = mask;
> +	}
> +spread_done:
> +	rcu_read_unlock();
>  	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
>  	kfree(cpus);
>  	if (ret < 0)
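For anyone who wants to reason about the resulting order without booting a kernel, here is a rough userspace sketch of the idea behind the quoted loop. It does not use sched_numa_hop_mask() or any kernel cpumask code: spread_by_distance(), node_distance() and cpu_to_node() below are hypothetical stand-ins, and the topology is hard-coded from the lscpu and numactl -H output posted with the patch (16 nodes, 256 CPUs, distances 10/11/12/32). Instead of subtracting the previous hop mask, it walks the distinct distance levels in increasing order and, at each level, picks the CPUs in ascending order, which should visit CPUs in the same sequence on this particular machine.

```c
#include <assert.h>
#include <limits.h>

#define NR_NODES 16
#define NR_CPUS  256

/*
 * Assumed from the numactl -H table in this thread: 10 for the local node,
 * 11 within the same group of four nodes, 12 within the same socket (eight
 * nodes), 32 across sockets.
 */
static int node_distance(int a, int b)
{
	if (a == b)
		return 10;
	if (a / 4 == b / 4)
		return 11;
	if (a / 8 == b / 8)
		return 12;
	return 32;
}

/* Assumed from the lscpu listing: node n owns CPUs 8n..8n+7 and 128+8n..128+8n+7. */
static int cpu_to_node(int cpu)
{
	return (cpu % 128) / 8;
}

/*
 * Fill cpus[0..n-1] with CPU numbers ordered by increasing NUMA distance
 * from @node; CPUs at the same distance are taken in ascending order.
 * Returns the number of entries written.
 */
static int spread_by_distance(int node, int *cpus, int n)
{
	int filled = 0;
	int level = 0;	/* largest distance handled so far */

	assert(node >= 0 && node < NR_NODES);

	while (filled < n) {
		int next = INT_MAX;
		int other, cpu;

		/* Find the smallest distance strictly above the current level. */
		for (other = 0; other < NR_NODES; other++) {
			int d = node_distance(node, other);

			if (d > level && d < next)
				next = d;
		}
		if (next == INT_MAX)
			break;	/* every level has been visited */
		level = next;

		/* Take only the CPUs that this level newly adds. */
		for (cpu = 0; cpu < NR_CPUS && filled < n; cpu++)
			if (node_distance(node, cpu_to_node(cpu)) == level)
				cpus[filled++] = cpu;
	}
	return filled;
}
```

Calling spread_by_distance(14, cpus, 127) gives 112-119 and 240-247 first (node 14 itself), then the CPUs of nodes 12, 13 and 15, then those of nodes 8-11, and never reaches the other socket. That is the same progression as the "After" affinity hints in the commit message, whereas the "Before" list jumps to CPU 0 right after the local node.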
* Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
2022-10-21 12:19 ` [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Valentin Schneider
2022-10-24 11:24 ` Tariq Toukan
@ 2022-10-24 23:17 ` Yury Norov
1 sibling, 0 replies; 11+ messages in thread
From: Yury Norov @ 2022-10-24 23:17 UTC (permalink / raw)
To: Valentin Schneider
Cc: netdev, linux-rdma, linux-kernel, Tariq Toukan, Saeed Mahameed,
Leon Romanovsky, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Mel Gorman, Greg Kroah-Hartman, Heiko Carstens, Tony Luck,
Jonathan Cameron, Gal Pressman, Jesse Brandeburg

On Fri, Oct 21, 2022 at 01:19:27PM +0100, Valentin Schneider wrote:
> From: Tariq Toukan <tariqt@nvidia.com>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_cpu() API that minds the
> actual distances, so that remote NUMAs with short distance are preferred
> over farther ones.
>
> This has significant performance implications when using NUMA-aware
> allocated memory (follow [1] and derivatives for example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
>    int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
> +-------------------------+-----------+------------------+------------------+
>
> Bottleneck in RX side is released, reached linerate (~1.8x speedup).
> ~30% less cpu util on TX.
>
> * CPU util on active cores only.
>
> Setups details (similar for both sides):
>
> NIC: ConnectX6-DX dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  2
> Core(s) per socket:  64
> Socket(s):           2
> NUMA node(s):        16
> Vendor ID:           AuthenticAMD
> CPU family:          25
> Model:               1
> Model name:          AMD EPYC 7763 64-Core Processor
> Stepping:            1
> CPU MHz:             2594.804
> BogoMIPS:            4890.73
> Virtualization:      AMD-V
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-7,128-135
> NUMA node1 CPU(s):   8-15,136-143
> NUMA node2 CPU(s):   16-23,144-151
> NUMA node3 CPU(s):   24-31,152-159
> NUMA node4 CPU(s):   32-39,160-167
> NUMA node5 CPU(s):   40-47,168-175
> NUMA node6 CPU(s):   48-55,176-183
> NUMA node7 CPU(s):   56-63,184-191
> NUMA node8 CPU(s):   64-71,192-199
> NUMA node9 CPU(s):   72-79,200-207
> NUMA node10 CPU(s):  80-87,208-215
> NUMA node11 CPU(s):  88-95,216-223
> NUMA node12 CPU(s):  96-103,224-231
> NUMA node13 CPU(s):  104-111,232-239
> NUMA node14 CPU(s):  112-119,240-247
> NUMA node15 CPU(s):  120-127,248-255
> ..
>
> $ numactl -H
> ..
> node distances:
> node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
>   0:  10  11  11  11  12  12  12  12  32  32  32  32  32  32  32  32
>   1:  11  10  11  11  12  12  12  12  32  32  32  32  32  32  32  32
>   2:  11  11  10  11  12  12  12  12  32  32  32  32  32  32  32  32
>   3:  11  11  11  10  12  12  12  12  32  32  32  32  32  32  32  32
>   4:  12  12  12  12  10  11  11  11  32  32  32  32  32  32  32  32
>   5:  12  12  12  12  11  10  11  11  32  32  32  32  32  32  32  32
>   6:  12  12  12  12  11  11  10  11  32  32  32  32  32  32  32  32
>   7:  12  12  12  12  11  11  11  10  32  32  32  32  32  32  32  32
>   8:  32  32  32  32  32  32  32  32  10  11  11  11  12  12  12  12
>   9:  32  32  32  32  32  32  32  32  11  10  11  11  12  12  12  12
>  10:  32  32  32  32  32  32  32  32  11  11  10  11  12  12  12  12
>  11:  32  32  32  32  32  32  32  32  11  11  11  10  12  12  12  12
>  12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  11  11
>  13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  11  11
>  14:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  10  11
>  15:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  11  10
>
> $ cat /sys/class/net/ens5f0/device/numa_node
> 14
>
> Affinity hints (127 IRQs):
> Before:
> 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
> 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
> 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
> 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
> 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
> 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
> 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
> 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
> 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
> 343:
00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001 > 348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002 > 349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004 > 350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008 > 351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010 > 352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020 > 353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040 > 354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080 > 355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100 > 356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200 > 357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400 > 358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800 > 359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000 > 360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000 > 361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000 > 362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000 > 363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000 > 364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000 > 365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000 > 366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000 > 367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000 > 368: 
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000 > 369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000 > 370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000 > 371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000 > 372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000 > 373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000 > 374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000 > 375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000 > 376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000 > 377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000 > 378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000 > 379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000 > 380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000 > 381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000 > 382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000 > 383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000 > 384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000 > 385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000 > 386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000 > 387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000 > 388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000 > 389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000 > 390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000 > 391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000 > 392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000 > 393: 
00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000 > 394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000 > 395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000 > 396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000 > 397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000 > 398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000 > 399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000 > 400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000 > 401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000 > 402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000 > 403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000 > 404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000 > 405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000 > 406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000 > 407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000 > 408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000 > 409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000 > 410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000 > 411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 > 412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 > 413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 > 414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 > 415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 > 416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 > 417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 > 418: 
00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 > 419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 > 420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 > 421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 > 422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 > 423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 > 424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 > 425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 > 426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 > 427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 > 428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 > 429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 > 430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 > 431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 > 432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 > 433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 > 434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 > 435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 > 436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 > 437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 > 438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 > 439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 > 440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 > 441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 > 442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 > 443: 
00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 > 444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 > 445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 > 446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 > 447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 > 448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 > 449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 > 450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 > 451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 > 452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 > 453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 > 454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 > 455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 > 456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 > 457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 > > After: > 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000 > 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000 > 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000 > 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000 > 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000 > 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000 > 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000 > 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000 > 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 341: 
00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 > 348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 > 349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 > 350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 > 351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 > 352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 > 353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 > 354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 > 355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 > 356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 > 357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 > 358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 > 359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 > 360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 > 361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 > 362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000 > 363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000 > 364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000 > 365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000 > 366: 
00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000 > 367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000 > 368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000 > 369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000 > 370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000 > 371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 391: 
10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > 395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 > 396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 > 397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 > 398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 > 399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 > 400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 > 401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 > 402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 > 403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 > 404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 > 405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 > 406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 > 407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 > 408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 > 409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 > 410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 > 411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 > 412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 > 413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 > 414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 > 415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 > 416: 
00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 > 417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 > 418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 > 419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 > 420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 > 421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 > 422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 > 423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 > 424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 > 425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 > 426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 > 427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000 > 428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000 > 429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000 > 430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000 > 431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000 > 432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000 > 433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000 > 434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000 > 435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000 > 436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000 > 437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000 > 438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000 > 439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000 > 440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000 > 441: 
00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000
> 442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000
> 443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000
> 444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000
> 445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000
> 446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
> 447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
> 448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
> 449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
> 450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
> 451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
> 452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
> 453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
> 454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
> 455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
> 456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
> 457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000
>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> [Tweaked API use]
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>  static int comp_irqs_request(struct mlx5_core_dev *dev)
>  {
>  	struct mlx5_eq_table *table = dev->priv.eq_table;
> +	const struct cpumask *prev = cpu_none_mask;
> +	const struct cpumask *mask;
>  	int ncomp_eqs = table->num_comp_eqs;
>  	u16 *cpus;
>  	int ret;
> +	int cpu;
>  	int i;
>
>  	ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>  		ret = -ENOMEM;
>  		goto free_irqs;
>  	}
> -	for (i = 0; i < ncomp_eqs; i++)
> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> +	i = 0;
> +	rcu_read_lock();
> +	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> +		for_each_cpu_andnot(cpu, mask, prev) {
> +			cpus[i] = cpu;
> +			if (++i == ncomp_eqs)
> +				goto spread_done;
> +		}
> +		prev = mask;
> +	}

I think it was me who suggested splitting the for_each_numa_hop_cpu() from
v4 to for_each_cpu_andnot() and for_each_numa_hop_mask() in email from
Sep 25. So, for this part:

Suggested-by: Yury Norov <yury.norov@gmail.com>

I'm also glad to see that anonymous structure disappeared. Nice work.
For the series:

Reviewed-by: Yury Norov <yury.norov@gmail.com>

> +spread_done:
> +	rcu_read_unlock();
>  	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
>  	kfree(cpus);
>  	if (ret < 0)
> --
> 2.31.1
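
For readers following the loop in the quoted hunk, here is a rough userspace
sketch of the same "mask and-not prev" walk. It is not kernel code and not
what the series implements: a 64-bit word stands in for struct cpumask, and
the hop_masks[] array and the spread_by_hop() name are made up for
illustration. It assumes each hop mask is cumulative (every CPU within that
many hops), which is how the quoted loop treats the masks it iterates over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only: hop_masks[h] is assumed to hold every CPU within
 * h hops of the device's node, so each mask is a superset of the one
 * before it. Fills cpus[] nearest-hop first and returns how many slots
 * were filled (at most 'want').
 */
int spread_by_hop(const uint64_t *hop_masks, int nr_hops,
		  int *cpus, int want)
{
	uint64_t prev = 0;	/* plays the role of cpu_none_mask */
	int i = 0;

	assert(hop_masks != NULL && cpus != NULL);

	for (int h = 0; h < nr_hops && i < want; h++) {
		/* CPUs first reached at this hop: mask & ~prev */
		uint64_t fresh = hop_masks[h] & ~prev;

		for (int cpu = 0; cpu < 64 && i < want; cpu++) {
			if (fresh & (UINT64_C(1) << cpu))
				cpus[i++] = cpu;
		}
		prev = hop_masks[h];
	}
	return i;
}
```

With hop masks 0x0F (local node) and 0xFF (one hop out), asking for six CPUs
hands out the four local ones first and then the first two remote ones, which
is the ordering the "After:" affinity listing is meant to show.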
end of thread, other threads:[~2022-10-25 0:46 UTC | newest]

Thread overview: 11+ messages
2022-10-21 12:19 [PATCH v5 0/3] sched, net: NUMA-aware CPU spreading interface Valentin Schneider
2022-10-21 12:19 ` [PATCH v5 1/3] sched/topology: Introduce sched_numa_hop_mask() Valentin Schneider
2022-10-24 22:55 ` Yury Norov
2022-10-21 12:19 ` [PATCH v5 2/3] sched/topology: Introduce for_each_numa_hop_mask() Valentin Schneider
2022-10-21 13:16 ` Andy Shevchenko
2022-10-21 13:34 ` Andy Shevchenko
2022-10-21 14:06 ` Valentin Schneider
2022-10-21 13:57 ` Valentin Schneider
2022-10-21 12:19 ` [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Valentin Schneider
2022-10-24 11:24 ` Tariq Toukan
2022-10-24 23:17 ` Yury Norov