[PATCH v2 0/8] sched/topology: add for_each_numa

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-04-30 17:18 Yury Norov
  2023-04-30 17:18 ` [PATCH v3 1/8] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in kernel, and it's beneficial to create
a NUMA-aware version of the macro.

Recently added for_each_numa_hop_mask() works, but switching existing
codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar to
the for_each_cpu(). It allows to convert existing code to NUMA-aware as
simple as adding a hop iterator variable and passing it inside new macro.
for_each_numa_cpu() takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Melanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

        for_each_numa_hop_mask(cpus, prev, node) {
                for_each_cpu_andnot(cpu, cpus, prev) {
                        if (cnt++ == max_num)
                                goto out;
                        do_something(cpu);
                }
                prev = cpus;
        }

With the new macro, it has a more standard look, like this:

        for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Straight conversion of existing for_each_cpu() codebase to NUMA-aware
version with for_each_numa_hop_mask() is difficult because it doesn't
take a user-provided cpu mask, and eventually ends up with open-coded
double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
Consider the NUMA-ignorant example:

        cpumask_t cpus = get_mask();
        int cnt = 0, cpu;

        for_each_cpu(cpu, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Converting it to NUMA-aware version would be as simple as:

        cpumask_t cpus = get_mask();
        int node = get_node();
        int cnt = 0, hop, cpu;

        for_each_numa_cpu(cpu, hop, node, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

The latter looks more verbose and avoids from open-coding that annoying
double loop. Another advantage is that it works with a 'hop' parameter with
the clear meaning of NUMA distance, and doesn't make people not familiar
to enumerator internals bothering with current and previous masks machinery.

v2: https://lore.kernel.org/netdev/ZD3l6FBnUh9vTIGc@yury-ThinkPad/T/
v3:
 - fix sched_numa_find_{next,nth}_cpu() when CONFIG_NUMA is off to
   only traverse online CPUs;
 - don't export sched_domains_numa_levels for testing purposes. In
   the test, use for_each_node() macro;
 - extend the test for for_each_node();
 - in comments, mention that only online CPUs are traversed;
 - rebase on top of 6.3. 

Yury Norov (8):
  sched: fix sched_numa_find_nth_cpu() in non-NUMA case
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_{,online}_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched: drop for_each_numa_hop_mask()
  lib: test for_each_numa_cpus()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++---
 include/linux/find.h                         | 43 ++++++++++++
 include/linux/topology.h                     | 40 ++++++-----
 kernel/sched/topology.c                      | 53 ++++++++-------
 lib/cpumask.c                                |  7 +-
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 70 +++++++++++++++++++-
 7 files changed, 183 insertions(+), 58 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/8] sched: fix sched_numa_find_nth_cpu() in non-NUMA case
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 2/8] lib/find: add find_next_and_andnot_bit() Yury Norov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

When CONFIG_NUMA is enabled, sched_numa_find_nth_cpu() searches for a
CPU in sched_domains_numa_masks. The masks includes only online CPUs,
so effectively offline CPUs are skipped.

When CONFIG_NUMA is disabled, the fallback function should be consistent.

Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index fea32377f7c7..52f5850730b3 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -251,7 +251,7 @@ extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 {
-	return cpumask_nth(cpu, cpus);
+	return cpumask_nth_and(cpu, cpus, cpu_online_mask);
 }
 
 static inline const struct cpumask *
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/8] lib/find: add find_next_and_andnot_bit()
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-04-30 17:18 ` [PATCH v3 1/8] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 3/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Similarly to find_nth_and_andnot_bit(), find_next_and_andnot_bit() is
a convenient helper that allows traversing bitmaps without storing
intermediate results in a temporary bitmap.

In the following patches the function is used to implement NUMA-aware
CPUs enumeration.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       | 12 ++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..90b68d76c073 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -16,6 +16,9 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					const unsigned long *addr3, unsigned long nbits,
+					unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
 					 unsigned long start);
 extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
@@ -159,6 +162,40 @@ unsigned long find_next_or_bit(const unsigned long *addr1,
 }
 #endif
 
+#ifndef find_next_and_andnot_bit
+/**
+ * find_next_and_andnot_bit - find the next bit set in *addr1 and *addr2,
+ *			      excluding all the bits in *addr3
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @addr3: The third address to base the search on
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Return: the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static __always_inline
+unsigned long find_next_and_andnot_bit(const unsigned long *addr1,
+				   const unsigned long *addr2,
+				   const unsigned long *addr3,
+				   unsigned long size,
+				   unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & ~*addr3 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_and_andnot_bit(addr1, addr2, addr3, size, offset);
+}
+#endif
+
 #ifndef find_next_zero_bit
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
@@ -568,6 +605,12 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_and_andnot_bit(bit, addr1, addr2, addr3, size) \
+	for ((bit) = 0;									\
+	     (bit) = find_next_and_andnot_bit((addr1), (addr2), (addr3), (size), (bit)),\
+	     (bit) < (size);								\
+	     (bit)++)
+
 #define for_each_or_bit(bit, addr1, addr2, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_or_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..4403e00890b1 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -182,6 +182,18 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+#ifndef find_next_and_andnot_bit
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					const unsigned long *addr3,
+					unsigned long nbits,
+					unsigned long start)
+{
+	return FIND_NEXT_BIT(addr1[idx] & addr2[idx] & ~addr3[idx], /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_and_andnot_bit);
+#endif
+
 #ifndef find_next_or_bit
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-04-30 17:18 ` [PATCH v3 1/8] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
  2023-04-30 17:18 ` [PATCH v3 2/8] lib/find: add find_next_and_andnot_bit() Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 4/8] sched/topology: add for_each_numa_{,online}_cpu() macro Yury Norov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The function searches for a next CPU in a given cpumask according to
NUMA topology, so that it traverses CPUs per-hop.

If the CPU is the last CPU in a given hop, sched_numa_find_next_cpu()
switches to the next hop, and picks the first CPU from there, excluding
those already traversed.

Because only online CPUs are presented in the NUMA topology masks, offline
CPUs will be skipped even if presented in the 'cpus' mask provided in the
arguments.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 12 ++++++++++++
 kernel/sched/topology.c  | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 52f5850730b3..da92fea38585 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -245,8 +245,13 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+/*
+ * sched_numa_find_*_cpu() functions family traverses only accessible CPUs,
+ * i.e. those listed in cpu_online_mask.
+ */
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop);
 extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
@@ -254,6 +259,13 @@ static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, i
 	return cpumask_nth_and(cpu, cpus, cpu_online_mask);
 }
 
+static __always_inline
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
+{
+	return find_next_and_bit(cpumask_bits(cpus), cpumask_bits(cpu_online_mask),
+						small_cpumask_bits, cpu);
+}
+
 static inline const struct cpumask *
 sched_numa_hop_mask(unsigned int node, unsigned int hops)
 {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 051aaf65c749..fc163e4181e6 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2130,6 +2130,45 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
 
+/*
+ * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
+ * cpumask: cpumask to find a CPU from
+ * cpu: current CPU
+ * node: local node
+ * hop: (in/out) indicates distance order of current CPU to a local node
+ *
+ * The function searches for a next CPU at a given NUMA distance, indicated
+ * by hop, and if nothing found, tries to find CPUs at a greater distance,
+ * starting from the beginning.
+ *
+ * Return: cpu, or >= nr_cpu_ids when nothing found.
+ */
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
+{
+	unsigned long *cur, *prev;
+	struct cpumask ***masks;
+	unsigned int ret;
+
+	if (*hop >= sched_domains_numa_levels)
+		return nr_cpu_ids;
+
+	masks = rcu_dereference(sched_domains_numa_masks);
+	cur = cpumask_bits(masks[*hop][node]);
+	if (*hop == 0)
+		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
+	else {
+		prev = cpumask_bits(masks[*hop - 1][node]);
+		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
+	}
+
+	if (ret < nr_cpu_ids)
+		return ret;
+
+	*hop += 1;
+	return sched_numa_find_next_cpu(cpus, 0, node, hop);
+}
+EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
+
 /**
  * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
  *                         @node
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 4/8] sched/topology: add for_each_numa_{,online}_cpu() macro
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (2 preceding siblings ...)
  2023-04-30 17:18 ` [PATCH v3 3/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 5/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to
create a NUMA-aware version of the macro.

Recently added for_each_numa_hop_mask() works, but switching existing
codebase to using it is not an easy process.

New for_each_numa_cpu() is designed to be similar to the for_each_cpu().
It allows to convert existing code to NUMA-aware as simple as adding a
hop iterator variable and passing it inside new macro. for_each_numa_cpu()
takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Melanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

	for_each_numa_hop_mask(cpus, prev, node) {
 		for_each_cpu_andnot(cpu, cpus, prev) {
 			if (cnt++ == max_num)
 				goto out;
 			do_something(cpu);
 		}
		prev = cpus;
 	}

With the new macro, it would look like this:

	for_each_numa_online_cpu(cpu, hop, node) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

Straight conversion of existing for_each_cpu() codebase to NUMA-aware
version with for_each_numa_hop_mask() is difficult because it doesn't
take a user-provided cpu mask, and eventually ends up with open-coded
double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
Consider the NUMA-ignorant example:

	cpumask_t cpus = get_mask();
	int cnt = 0, cpu;

	for_each_cpu(cpu, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

Converting it to NUMA-aware version would be as simple as:

	cpumask_t cpus = get_mask();
	int node = get_node();
	int cnt = 0, hop, cpu;

	for_each_numa_cpu(cpu, hop, node, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

The latter looks more verbose and avoids from open-coding that annoying
double loop. Another advantage is that it works with a 'hop' parameter with
the clear meaning of NUMA distance, and doesn't make people not familiar
to enumerator internals bothering with current and previous masks machinery.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index da92fea38585..6ed01962878c 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -291,4 +291,23 @@ sched_numa_hop_mask(unsigned int node, unsigned int hops)
 	     !IS_ERR_OR_NULL(mask);					       \
 	     __hops++)

+/**
+ * for_each_numa_cpu - iterate over cpus in increasing order taking into account
+ *		       NUMA distances from a given node.
+ * @cpu: the (optionally unsigned) integer iterator
+ * @hop: the iterator variable, must be initialized to a desired minimal hop.
+ * @node: the NUMA node to start the search from.
+ * @mask: the cpumask pointer
+ *
+ * Requires rcu_lock to be held.
+ */
+#define for_each_numa_cpu(cpu, hop, node, mask)					\
+	for ((cpu) = 0, (hop) = 0;						\
+		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
+		(cpu) < nr_cpu_ids;						\
+		(cpu)++)
+
+#define for_each_numa_online_cpu(cpu, hop, node)				\
+	for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+
 #endif /* _LINUX_TOPOLOGY_H */
-- 
2.37.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 5/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (3 preceding siblings ...)
  2023-04-30 17:18 ` [PATCH v3 4/8] sched/topology: add for_each_numa_{,online}_cpu() macro Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 6/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_numa_online_cpu() is a more straightforward alternative to
for_each_numa_hop_mask() + for_each_cpu_andnot().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 38b32e98f3bd..d3511e45f121 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -817,12 +817,10 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
-	const struct cpumask *prev = cpu_none_mask;
-	const struct cpumask *mask;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
-	int cpu;
+	int cpu, hop;
 	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
@@ -844,15 +842,11 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 
 	i = 0;
 	rcu_read_lock();
-	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
-		for_each_cpu_andnot(cpu, mask, prev) {
-			cpus[i] = cpu;
-			if (++i == ncomp_eqs)
-				goto spread_done;
-		}
-		prev = mask;
+	for_each_numa_online_cpu(cpu, hop, dev->priv.numa_node) {
+		cpus[i] = cpu;
+		if (++i == ncomp_eqs)
+			break;
 	}
-spread_done:
 	rcu_read_unlock();
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 6/8] lib/cpumask: update comment to cpumask_local_spread()
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (4 preceding siblings ...)
  2023-04-30 17:18 ` [PATCH v3 5/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 7/8] sched: drop for_each_numa_hop_mask() Yury Norov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Now that we have a for_each_numa_online_cpu(), which is a more
straightforward replacement to the cpumask_local_spread() when it comes to
enumeration of CPUs with respect to NUMA topology, it's worth to update the
comment.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/cpumask.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index e7258836b60b..774966483ca9 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -127,11 +127,8 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
  *
  * There's a better alternative based on for_each()-like iterators:
  *
- *	for_each_numa_hop_mask(mask, node) {
- *		for_each_cpu_andnot(cpu, mask, prev)
- *			do_something(cpu);
- *		prev = mask;
- *	}
+ *	for_each_numa_online_cpu(cpu, hop, node)
+ *		do_something(cpu);
  *
  * It's simpler and more verbose than above. Complexity of iterator-based
  * enumeration is O(sched_domains_numa_levels * nr_cpu_ids), while
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 7/8] sched: drop for_each_numa_hop_mask()
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (5 preceding siblings ...)
  2023-04-30 17:18 ` [PATCH v3 6/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-04-30 17:18 ` [PATCH v3 8/8] lib: test for_each_numa_cpus() Yury Norov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Now that we have for_each_numa_cpu(), for_each_numa_hop_mask()
and all related code is a dead code. Drop it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 25 -------------------------
 kernel/sched/topology.c  | 32 --------------------------------
 2 files changed, 57 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 6ed01962878c..808b5dcf6e36 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -252,7 +252,6 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
 int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop);
-extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 {
@@ -265,32 +264,8 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 	return find_next_and_bit(cpumask_bits(cpus), cpumask_bits(cpu_online_mask),
 						small_cpumask_bits, cpu);
 }
-
-static inline const struct cpumask *
-sched_numa_hop_mask(unsigned int node, unsigned int hops)
-{
-	return ERR_PTR(-EOPNOTSUPP);
-}
 #endif	/* CONFIG_NUMA */
 
-/**
- * for_each_numa_hop_mask - iterate over cpumasks of increasing NUMA distance
- *                          from a given node.
- * @mask: the iteration variable.
- * @node: the NUMA node to start the search from.
- *
- * Requires rcu_lock to be held.
- *
- * Yields cpu_online_mask for @node == NUMA_NO_NODE.
- */
-#define for_each_numa_hop_mask(mask, node)				       \
-	for (unsigned int __hops = 0;					       \
-	     mask = (node != NUMA_NO_NODE || __hops) ?			       \
-		     sched_numa_hop_mask(node, __hops) :		       \
-		     cpu_online_mask,					       \
-	     !IS_ERR_OR_NULL(mask);					       \
-	     __hops++)
-
 /**
  * for_each_numa_cpu - iterate over cpus in increasing order taking into account
  *		       NUMA distances from a given node.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index fc163e4181e6..bb5ba2c5589a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2169,38 +2169,6 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
 
-/**
- * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
- *                         @node
- * @node: The node to count hops from.
- * @hops: Include CPUs up to that many hops away. 0 means local node.
- *
- * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
- * @node, an error value otherwise.
- *
- * Requires rcu_lock to be held. Returned cpumask is only valid within that
- * read-side section, copy it if required beyond that.
- *
- * Note that not all hops are equal in distance; see sched_init_numa() for how
- * distances and masks are handled.
- * Also note that this is a reflection of sched_domains_numa_masks, which may change
- * during the lifetime of the system (offline nodes are taken out of the masks).
- */
-const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
-{
-	struct cpumask ***masks;
-
-	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
-		return ERR_PTR(-EINVAL);
-
-	masks = rcu_dereference(sched_domains_numa_masks);
-	if (!masks)
-		return ERR_PTR(-EBUSY);
-
-	return masks[hops][node];
-}
-EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
-
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 8/8] lib: test for_each_numa_cpus()
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (6 preceding siblings ...)
  2023-04-30 17:18 ` [PATCH v3 7/8] sched: drop for_each_numa_hop_mask() Yury Norov
@ 2023-04-30 17:18 ` Yury Norov
  2023-07-22 15:47   ` Guenter Roeck
  2023-05-02 16:59 ` [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Valentin Schneider
  2023-05-31 15:43 ` Yury Norov
  9 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Test for_each_numa_cpus() output to ensure that:
 - all CPUs are picked from NUMA nodes with non-decreasing distances to the
   original node; 
 - only online CPUs are enumerated;
 - the macro enumerates each online CPUs only once;
 - enumeration order is consistent with cpumask_local_spread().

The latter is an implementation-defined behavior. If cpumask_local_spread()
or for_each_numa_cpu() will get changed in future, the subtest may need
to be adjusted or even removed, as appropriate.

It's useful now because some architectures don't implement numa_distance(),
and generic implementation only distinguishes local and remote nodes, which
doesn't allow to test the for_each_numa_cpu() properly.

Suggested-by: Valentin Schneider <vschneid@redhat.com> (for node_distance() test)
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 70 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd58..ac4fe621d37b 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/topology.h>
 #include <linux/uaccess.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
@@ -71,6 +72,16 @@ __check_eq_uint(const char *srcfile, unsigned int line,
 	return true;
 }
 
+static bool __init
+__check_ge_uint(const char *srcfile, unsigned int line,
+		const unsigned int exp_uint, unsigned int x)
+{
+	if (exp_uint >=  x)
+		return true;
+
+	pr_err("[%s:%u] expected >= %u, got %u\n", srcfile, line, exp_uint, x);
+	return false;
+}
 
 static bool __init
 __check_eq_bitmap(const char *srcfile, unsigned int line,
@@ -86,6 +97,18 @@ __check_eq_bitmap(const char *srcfile, unsigned int line,
 	return true;
 }
 
+static bool __init
+__check_eq_cpumask(const char *srcfile, unsigned int line,
+		  const struct cpumask *exp_cpumask, const struct cpumask *cpumask)
+{
+	if (cpumask_equal(exp_cpumask, cpumask))
+		return true;
+
+	pr_warn("[%s:%u] cpumasks contents differ: expected \"%*pbl\", got \"%*pbl\"\n",
+		srcfile, line, cpumask_pr_args(exp_cpumask), cpumask_pr_args(cpumask));
+	return false;
+}
+
 static bool __init
 __check_eq_pbl(const char *srcfile, unsigned int line,
 	       const char *expected_pbl,
@@ -173,11 +196,11 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	return eq;
 }
 
-#define __expect_eq(suffix, ...)					\
+#define __expect(suffix, ...)						\
 	({								\
 		int result = 0;						\
 		total_tests++;						\
-		if (!__check_eq_ ## suffix(__FILE__, __LINE__,		\
+		if (!__check_ ## suffix(__FILE__, __LINE__,		\
 					   ##__VA_ARGS__)) {		\
 			failed_tests++;					\
 			result = 1;					\
@@ -185,13 +208,19 @@ __check_eq_str(const char *srcfile, unsigned int line,
 		result;							\
 	})
 
+#define __expect_eq(suffix, ...)	__expect(eq_ ## suffix, ##__VA_ARGS__)
+#define __expect_ge(suffix, ...)	__expect(ge_ ## suffix, ##__VA_ARGS__)
+
 #define expect_eq_uint(...)		__expect_eq(uint, ##__VA_ARGS__)
 #define expect_eq_bitmap(...)		__expect_eq(bitmap, ##__VA_ARGS__)
+#define expect_eq_cpumask(...)		__expect_eq(cpumask, ##__VA_ARGS__)
 #define expect_eq_pbl(...)		__expect_eq(pbl, ##__VA_ARGS__)
 #define expect_eq_u32_array(...)	__expect_eq(u32_array, ##__VA_ARGS__)
 #define expect_eq_clump8(...)		__expect_eq(clump8, ##__VA_ARGS__)
 #define expect_eq_str(...)		__expect_eq(str, ##__VA_ARGS__)
 
+#define expect_ge_uint(...)		__expect_ge(uint, ##__VA_ARGS__)
+
 static void __init test_zero_clear(void)
 {
 	DECLARE_BITMAP(bmap, 1024);
@@ -751,6 +780,42 @@ static void __init test_for_each_set_bit_wrap(void)
 	}
 }
 
+static void __init test_for_each_numa_cpu(void)
+{
+	unsigned int node, cpu, hop;
+	cpumask_var_t mask;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
+		pr_err("Can't allocate cpumask. Skipping for_each_numa_cpu() test");
+		return;
+	}
+
+	for_each_node(node) {
+		unsigned int c = 0, dist, old_dist = node_distance(node, node);
+
+		cpumask_clear(mask);
+
+		rcu_read_lock();
+		for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
+			dist = node_distance(cpu_to_node(cpu), node);
+
+			/* Distance between nodes must never decrease */
+			expect_ge_uint(dist, old_dist);
+
+			/* Test for coherence with cpumask_local_spread() */
+			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+
+			cpumask_set_cpu(cpu, mask);
+			old_dist = dist;
+		}
+		rcu_read_unlock();
+
+		/* Each online CPU must be visited exactly once */
+		expect_eq_uint(c, num_online_cpus());
+		expect_eq_cpumask(mask, cpu_online_mask);
+	}
+}
+
 static void __init test_for_each_set_bit(void)
 {
 	DECLARE_BITMAP(orig, 500);
@@ -1237,6 +1302,7 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+	test_for_each_numa_cpu();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 8/8] lib: test for_each_numa_cpus()
  2023-04-30 17:18 ` [PATCH v3 8/8] lib: test for_each_numa_cpus() Yury Norov
@ 2023-07-22 15:47   ` Guenter Roeck
  2023-07-27  2:11     ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Guenter Roeck @ 2023-07-22 15:47 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

Hi,

On Sun, Apr 30, 2023 at 10:18:09AM -0700, Yury Norov wrote:
> Test for_each_numa_cpus() output to ensure that:
>  - all CPUs are picked from NUMA nodes with non-decreasing distances to the
>    original node; 
>  - only online CPUs are enumerated;
>  - the macro enumerates each online CPUs only once;
>  - enumeration order is consistent with cpumask_local_spread().
> 
> The latter is an implementation-defined behavior. If cpumask_local_spread()
> or for_each_numa_cpu() will get changed in future, the subtest may need
> to be adjusted or even removed, as appropriate.
> 
> It's useful now because some architectures don't implement numa_distance(),
> and generic implementation only distinguishes local and remote nodes, which
> doesn't allow to test the for_each_numa_cpu() properly.
> 

This patch results in a crash when testing sparc64 images with qemu.

[    4.178301] Unable to handle kernel NULL pointer dereference
[    4.178836] tsk->{mm,active_mm}->context = 0000000000000000
[    4.179280] tsk->{mm,active_mm}->pgd = fffff80000402000
[    4.179710]               \|/ ____ \|/
[    4.179710]               "@'/ .. \`@"
[    4.179710]               /_| \__/ |_\
[    4.179710]                  \__U_/
[    4.180307] swapper/0(1): Oops [#1]
[    4.181070] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.5.0-rc2+ #1
[    4.181720] TSTATE: 0000000011001600 TPC: 000000000094d800 TNPC: 000000000094d804 Y: 00000000    Tainted: G                 N
[    4.182324] TPC: <_find_next_and_bit+0x20/0xa0>
[    4.183136] g0: 0000000000530b90 g1: 0000000000000000 g2: 0000000000000000 g3: ffffffffffffffff
[    4.183611] g4: fffff80004200020 g5: fffff8001dc42000 g6: fffff80004168000 g7: 0000000000000000
[    4.184080] o0: 0000000000000001 o1: 0000000001a28190 o2: 0000000000000009 o3: 00000000020e9e28
[    4.184549] o4: 0000000000000200 o5: 0000000000000001 sp: fffff8000416af11 ret_pc: 0000000000f6529c
[    4.185020] RPC: <lock_is_held_type+0xbc/0x180>
[    4.185477] l0: 0000000001bbfa58 l1: 0000000000000000 l2: 00000000020ea228 l3: fffff80004200aa0
[    4.185950] l4: 81b8e1e5a4e0c637 l5: 000000000192b000 l6: 00000000023b3800 l7: 00000000020e9e28
[    4.186417] i0: 000000000192a3f8 i1: 0000000000000000 i2: 0000000000000001 i3: 0000000000000000
[    4.186885] i4: 0000000000000001 i5: fffff80004200aa0 i6: fffff8000416afc1 i7: 00000000004c79bc
[    4.187356] I7: <sched_numa_find_next_cpu+0x13c/0x180>
[    4.187821] Call Trace:
[    4.188274] [<00000000004c79bc>] sched_numa_find_next_cpu+0x13c/0x180
[    4.188762] [<0000000001b77c10>] test_for_each_numa_cpu+0x164/0x37c
[    4.189196] [<0000000001b7878c>] test_bitmap_init+0x964/0x9f4
[    4.189637] [<0000000000427f40>] do_one_initcall+0x60/0x340
[    4.190069] [<0000000001b56f34>] kernel_init_freeable+0x1bc/0x228
[    4.190496] [<0000000000f66aa4>] kernel_init+0x18/0x134
[    4.190911] [<00000000004060e8>] ret_from_fork+0x1c/0x2c
[    4.191326] [<0000000000000000>] 0x0
[    4.191827] Disabling lock debugging due to kernel taint
[    4.192363] Caller[00000000004c79bc]: sched_numa_find_next_cpu+0x13c/0x180
[    4.192825] Caller[0000000001b77c10]: test_for_each_numa_cpu+0x164/0x37c
[    4.193255] Caller[0000000001b7878c]: test_bitmap_init+0x964/0x9f4
[    4.193681] Caller[0000000000427f40]: do_one_initcall+0x60/0x340
[    4.194097] Caller[0000000001b56f34]: kernel_init_freeable+0x1bc/0x228
[    4.194516] Caller[0000000000f66aa4]: kernel_init+0x18/0x134
[    4.194922] Caller[00000000004060e8]: ret_from_fork+0x1c/0x2c
[    4.195326] Caller[0000000000000000]: 0x0
[    4.195728] Instruction DUMP:
[    4.195762]  8328b003
[    4.196237]  8728d01b
[    4.196600]  d05e0001
[    4.196956] <ce5e4001>
[    4.197311]  900a0007
[    4.197669]  900a0003
[    4.198024]  0ac20013
[    4.198379]  bb28b006
[    4.198733]  8400a001
[    4.199093]
[    4.199873] note: swapper/0[1] exited with preempt_count 1
[    4.200539] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

Bisect log attached.

Guenter

---
# bad: [ae867bc97b713121b2a7f5fcac68378a0774739b] Add linux-next specific files for 20230721
# good: [fdf0eaf11452d72945af31804e2a1048ee1b574c] Linux 6.5-rc2
git bisect start 'HEAD' 'v6.5-rc2'
# good: [f09bf8f6c8cbbff6f52523abcda88c86db72e31c] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
git bisect good f09bf8f6c8cbbff6f52523abcda88c86db72e31c
# good: [86374a6210aeebceb927204d80f9e65739134bc3] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
git bisect good 86374a6210aeebceb927204d80f9e65739134bc3
# good: [d588c93cae9e3dff15d125e755edcba5d842f41a] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
git bisect good d588c93cae9e3dff15d125e755edcba5d842f41a
# good: [3c3990d304820b12a07c77a6e091d6711b31f8e5] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire.git
git bisect good 3c3990d304820b12a07c77a6e091d6711b31f8e5
# good: [b80a945fabd7acc5984d421c73fa0b667195adb7] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
git bisect good b80a945fabd7acc5984d421c73fa0b667195adb7
# good: [22c343fad503564a2ef5c6aff1dcb1ec0640006e] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
git bisect good 22c343fad503564a2ef5c6aff1dcb1ec0640006e
# good: [bf05130eebc3265314f14c1314077f500a5c8d98] Merge branch 'mhi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi.git
git bisect good bf05130eebc3265314f14c1314077f500a5c8d98
# good: [18eea171e03cc2b30fe7c11d6e94521d905026f0] Merge branch 'rust-next' of https://github.com/Rust-for-Linux/linux.git
git bisect good 18eea171e03cc2b30fe7c11d6e94521d905026f0
# bad: [94b1547668965e1fde8bde3638845ab582b40034] lib: test for_each_numa_cpus()
git bisect bad 94b1547668965e1fde8bde3638845ab582b40034
# good: [310ae5d9d46b65fdbd18ac1e5bd03681fbc19ae8] sched/topology: introduce sched_numa_find_next_cpu()
git bisect good 310ae5d9d46b65fdbd18ac1e5bd03681fbc19ae8
# good: [a4be5fa84bb269886310f563e9095e8164f82c8c] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
git bisect good a4be5fa84bb269886310f563e9095e8164f82c8c
# good: [b9833b80d87030b0def7aeda88471ac7f6acd3cb] sched: drop for_each_numa_hop_mask()
git bisect good b9833b80d87030b0def7aeda88471ac7f6acd3cb
# first bad commit: [94b1547668965e1fde8bde3638845ab582b40034] lib: test for_each_numa_cpus()

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 8/8] lib: test for_each_numa_cpus()
  2023-07-22 15:47   ` Guenter Roeck
@ 2023-07-27  2:11     ` Yury Norov
  0 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-07-27  2:11 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Sat, Jul 22, 2023 at 08:47:16AM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Sun, Apr 30, 2023 at 10:18:09AM -0700, Yury Norov wrote:
> > Test for_each_numa_cpus() output to ensure that:
> >  - all CPUs are picked from NUMA nodes with non-decreasing distances to the
> >    original node; 
> >  - only online CPUs are enumerated;
> >  - the macro enumerates each online CPUs only once;
> >  - enumeration order is consistent with cpumask_local_spread().
> > 
> > The latter is an implementation-defined behavior. If cpumask_local_spread()
> > or for_each_numa_cpu() will get changed in future, the subtest may need
> > to be adjusted or even removed, as appropriate.
> > 
> > It's useful now because some architectures don't implement numa_distance(),
> > and generic implementation only distinguishes local and remote nodes, which
> > doesn't allow to test the for_each_numa_cpu() properly.
> > 
> 
> This patch results in a crash when testing sparc64 images with qemu.

Thanks Guenter for reporting. I'll remove the series until fixing the
issue.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (7 preceding siblings ...)
  2023-04-30 17:18 ` [PATCH v3 8/8] lib: test for_each_numa_cpus() Yury Norov
@ 2023-05-02 16:59 ` Valentin Schneider
  2023-05-02 21:58   ` Yury Norov
  2023-05-31 15:43 ` Yury Norov
  9 siblings, 1 reply; 21+ messages in thread
From: Valentin Schneider @ 2023-05-02 16:59 UTC (permalink / raw)
  To: Yury Norov, Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 30/04/23 10:18, Yury Norov wrote:
> for_each_cpu() is widely used in kernel, and it's beneficial to create
> a NUMA-aware version of the macro.
>
> Recently added for_each_numa_hop_mask() works, but switching existing
> codebase to it is not an easy process.
>
> This series adds for_each_numa_cpu(), which is designed to be similar to
> the for_each_cpu(). It allows to convert existing code to NUMA-aware as
> simple as adding a hop iterator variable and passing it inside new macro.
> for_each_numa_cpu() takes care of the rest.
>
> At the moment, we have 2 users of NUMA-aware enumerators. One is
> Melanox's in-tree driver, and another is Intel's in-review driver:
>
> https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/
>
> Both real-life examples follow the same pattern:
>
>         for_each_numa_hop_mask(cpus, prev, node) {
>                 for_each_cpu_andnot(cpu, cpus, prev) {
>                         if (cnt++ == max_num)
>                                 goto out;
>                         do_something(cpu);
>                 }
>                 prev = cpus;
>         }
>
> With the new macro, it has a more standard look, like this:
>
>         for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
>                 if (cnt++ == max_num)
>                         break;
>                 do_something(cpu);
>         }
>
> Straight conversion of existing for_each_cpu() codebase to NUMA-aware
> version with for_each_numa_hop_mask() is difficult because it doesn't
> take a user-provided cpu mask, and eventually ends up with open-coded
> double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
> Consider the NUMA-ignorant example:
>
>         cpumask_t cpus = get_mask();
>         int cnt = 0, cpu;
>
>         for_each_cpu(cpu, cpus) {
>                 if (cnt++ == max_num)
>                         break;
>                 do_something(cpu);
>         }
>
> Converting it to NUMA-aware version would be as simple as:
>
>         cpumask_t cpus = get_mask();
>         int node = get_node();
>         int cnt = 0, hop, cpu;
>
>         for_each_numa_cpu(cpu, hop, node, cpus) {
>                 if (cnt++ == max_num)
>                         break;
>                 do_something(cpu);
>         }
>
> The latter looks more verbose and avoids from open-coding that annoying
> double loop. Another advantage is that it works with a 'hop' parameter with
> the clear meaning of NUMA distance, and doesn't make people not familiar
> to enumerator internals bothering with current and previous masks machinery.
>

LGTM, I ran the tests on a few NUMA topologies and that all seems to behave
as expected. Thanks for working on this! 

Reviewed-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-02 16:59 ` [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Valentin Schneider
@ 2023-05-02 21:58   ` Yury Norov
  2023-05-03 10:00     ` Valentin Schneider
  0 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-05-02 21:58 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

> LGTM, I ran the tests on a few NUMA topologies and that all seems to behave
> as expected. Thanks for working on this!
>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>

Thank you Valentin. If you spent time testing the series, why
don't you add your Tested-by?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-02 21:58   ` Yury Norov
@ 2023-05-03 10:00     ` Valentin Schneider
  0 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2023-05-03 10:00 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 02/05/23 14:58, Yury Norov wrote:
>> LGTM, I ran the tests on a few NUMA topologies and that all seems to behave
>> as expected. Thanks for working on this!
>>
>> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
>
> Thank you Valentin. If you spent time testing the series, why
> don't you add your Tested-by?

Well, I only ran the test_bitmap stuff and checked the output of the
iterator then, I didn't get to test on actual hardware with a mellanox card
:-)

But yeah, I suppose that does count for the rest, so feel free to add to
all patches but #5:

Tested-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (8 preceding siblings ...)
  2023-05-02 16:59 ` [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Valentin Schneider
@ 2023-05-31 15:43 ` Yury Norov
  2023-05-31 17:01   ` Jakub Kicinski
  9 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-05-31 15:43 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

On Sun, Apr 30, 2023 at 10:18:01AM -0700, Yury Norov wrote:
> for_each_cpu() is widely used in kernel, and it's beneficial to create
> a NUMA-aware version of the macro.
> 
> Recently added for_each_numa_hop_mask() works, but switching existing
> codebase to it is not an easy process.
> 
> This series adds for_each_numa_cpu(), which is designed to be similar to
> the for_each_cpu(). It allows to convert existing code to NUMA-aware as
> simple as adding a hop iterator variable and passing it inside new macro.
> for_each_numa_cpu() takes care of the rest.

Hi Jakub,

Now that the series reviewed, can you consider taking it in sched
tree?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-31 15:43 ` Yury Norov
@ 2023-05-31 17:01   ` Jakub Kicinski
  2023-05-31 17:08     ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2023-05-31 17:01 UTC (permalink / raw)
  To: Yury Norov
  Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Wed, 31 May 2023 08:43:46 -0700 Yury Norov wrote:
> On Sun, Apr 30, 2023 at 10:18:01AM -0700, Yury Norov wrote:
> > for_each_cpu() is widely used in kernel, and it's beneficial to create
> > a NUMA-aware version of the macro.
> > 
> > Recently added for_each_numa_hop_mask() works, but switching existing
> > codebase to it is not an easy process.
> > 
> > This series adds for_each_numa_cpu(), which is designed to be similar to
> > the for_each_cpu(). It allows to convert existing code to NUMA-aware as
> > simple as adding a hop iterator variable and passing it inside new macro.
> > for_each_numa_cpu() takes care of the rest.  
> 
> Hi Jakub,
> 
> Now that the series reviewed, can you consider taking it in sched
> tree?

Do you mean someone else or did you mean the net-next tree?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-31 17:01   ` Jakub Kicinski
@ 2023-05-31 17:08     ` Yury Norov
  0 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-05-31 17:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Wed, May 31, 2023 at 10:01:25AM -0700, Jakub Kicinski wrote:
> On Wed, 31 May 2023 08:43:46 -0700 Yury Norov wrote:
> > On Sun, Apr 30, 2023 at 10:18:01AM -0700, Yury Norov wrote:
> > > for_each_cpu() is widely used in kernel, and it's beneficial to create
> > > a NUMA-aware version of the macro.
> > > 
> > > Recently added for_each_numa_hop_mask() works, but switching existing
> > > codebase to it is not an easy process.
> > > 
> > > This series adds for_each_numa_cpu(), which is designed to be similar to
> > > the for_each_cpu(). It allows to convert existing code to NUMA-aware as
> > > simple as adding a hop iterator variable and passing it inside new macro.
> > > for_each_numa_cpu() takes care of the rest.  
> > 
> > Hi Jakub,
> > 
> > Now that the series reviewed, can you consider taking it in sched
> > tree?
> 
> Do you mean someone else or did you mean the net-next tree?

Sorry, net-next.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-04-20  5:19 Yury Norov
  0 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in kernel, and it's beneficial to create
a NUMA-aware version of the macro.

Recently added for_each_numa_hop_mask() works, but switching existing
codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar to
the for_each_cpu(). It allows to convert existing code to NUMA-aware as
simple as adding a hop iterator variable and passing it inside new macro.
for_each_numa_cpu() takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Melanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

        for_each_numa_hop_mask(cpus, prev, node) {
                for_each_cpu_andnot(cpu, cpus, prev) {
                        if (cnt++ == max_num)
                                goto out;
                        do_something(cpu);
                }
                prev = cpus;
        }

With the new macro, it has a more standard look, like this:

        for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Straight conversion of existing for_each_cpu() codebase to NUMA-aware
version with for_each_numa_hop_mask() is difficult because it doesn't
take a user-provided cpu mask, and eventually ends up with open-coded
double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
Consider the NUMA-ignorant example:

        cpumask_t cpus = get_mask();
        int cnt = 0, cpu;

        for_each_cpu(cpu, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Converting it to NUMA-aware version would be as simple as:

        cpumask_t cpus = get_mask();
        int node = get_node();
        int cnt = 0, hop, cpu;

        for_each_numa_cpu(cpu, hop, node, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

The latter looks more verbose and avoids from open-coding that annoying
double loop. Another advantage is that it works with a 'hop' parameter with
the clear meaning of NUMA distance, and doesn't make people not familiar
to enumerator internals bothering with current and previous masks machinery.

v2:
 - repase on top of master;
 - cleanup comments and tweak them to comply with kernel-doc;
 - remove RFC from patch #8 as there's no objections.

Yury Norov (8):
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched/topology: export sched_domains_numa_levels
  lib: add test for for_each_numa_{cpu,hop_mask}()
  sched: drop for_each_numa_hop_mask()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++----
 include/linux/find.h                         | 43 ++++++++++++++
 include/linux/topology.h                     | 37 ++++++------
 kernel/sched/topology.c                      | 59 +++++++++++---------
 lib/cpumask.c                                |  7 +--
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 16 ++++++
 7 files changed, 133 insertions(+), 57 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-04-15  5:06 Yury Norov
  2023-04-17 11:11 ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-04-15  5:06 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in kernel, and it's beneficial to create
a NUMA-aware version of the macro.

Recently added for_each_numa_hop_mask() works, but switching existing
codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar to
the for_each_cpu(). It allows to convert existing code to NUMA-aware as
simple as adding a hop iterator variable and passing it inside new macro.
for_each_numa_cpu() takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Melanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

        for_each_numa_hop_mask(cpus, prev, node) {
                for_each_cpu_andnot(cpu, cpus, prev) {
                        if (cnt++ == max_num)
                                goto out;
                        do_something(cpu);
                }
                prev = cpus;
        }

With the new macro, it has a more standard look, like this:

        for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Straight conversion of existing for_each_cpu() codebase to NUMA-aware
version with for_each_numa_hop_mask() is difficult because it doesn't
take a user-provided cpu mask, and eventually ends up with open-coded
double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
Consider the NUMA-ignorant example:

        cpumask_t cpus = get_mask();
        int cnt = 0, cpu;

        for_each_cpu(cpu, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Converting it to NUMA-aware version would be as simple as:

        cpumask_t cpus = get_mask();
        int node = get_node();
        int cnt = 0, hop, cpu;

        for_each_numa_cpu(cpu, hop, node, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

The latter looks more verbose and avoids from open-coding that annoying
double loop. Another advantage is that it works with a 'hop' parameter with
the clear meaning of NUMA distance, and doesn't make people not familiar
to enumerator internals bothering with current and previous masks machinery.

v2:
 - repase on top of master;
 - cleanup comments and tweak them to comply with kernel-doc;
 - remove RFC from patch #8 as there's no objections.

Yury Norov (8):
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched/topology: export sched_domains_numa_levels
  lib: add test for for_each_numa_{cpu,hop_mask}()
  sched: drop for_each_numa_hop_mask()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++----
 include/linux/find.h                         | 43 ++++++++++++++
 include/linux/topology.h                     | 37 ++++++------
 kernel/sched/topology.c                      | 59 +++++++++++---------
 lib/cpumask.c                                |  7 +--
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 16 ++++++
 7 files changed, 133 insertions(+), 57 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-15  5:06 Yury Norov
@ 2023-04-17 11:11 ` Peter Zijlstra
  2023-04-18  0:51   ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2023-04-17 11:11 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song



Something went wrong with sending your patches; I only have 0/n.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-17 11:11 ` Peter Zijlstra
@ 2023-04-18  0:51   ` Yury Norov
  0 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-04-18  0:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Paolo Abeni, Andy Shevchenko, Rasmus Villemoes, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Mon, Apr 17, 2023 at 01:11:37PM +0200, Peter Zijlstra wrote:
> 
> 
> Something went wrong with sending your patches; I only have 0/n.

Indeed. Thanks for pointing. I'll resend shortly.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-27  2:11 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
2023-04-30 17:18 ` [PATCH v3 1/8] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
2023-04-30 17:18 ` [PATCH v3 2/8] lib/find: add find_next_and_andnot_bit() Yury Norov
2023-04-30 17:18 ` [PATCH v3 3/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
2023-04-30 17:18 ` [PATCH v3 4/8] sched/topology: add for_each_numa_{,online}_cpu() macro Yury Norov
2023-04-30 17:18 ` [PATCH v3 5/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
2023-04-30 17:18 ` [PATCH v3 6/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
2023-04-30 17:18 ` [PATCH v3 7/8] sched: drop for_each_numa_hop_mask() Yury Norov
2023-04-30 17:18 ` [PATCH v3 8/8] lib: test for_each_numa_cpus() Yury Norov
2023-07-22 15:47   ` Guenter Roeck
2023-07-27  2:11     ` Yury Norov
2023-05-02 16:59 ` [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Valentin Schneider
2023-05-02 21:58   ` Yury Norov
2023-05-03 10:00     ` Valentin Schneider
2023-05-31 15:43 ` Yury Norov
2023-05-31 17:01   ` Jakub Kicinski
2023-05-31 17:08     ` Yury Norov
  -- strict thread matches above, loose matches on Subject: below --
2023-04-20  5:19 Yury Norov
2023-04-15  5:06 Yury Norov
2023-04-17 11:11 ` Peter Zijlstra
2023-04-18  0:51   ` Yury Norov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).