* [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1)
@ 2023-12-28 20:09 Yury Norov
2023-12-28 20:09 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
` (8 more replies)
0 siblings, 9 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
Hi Andrew, Ming,
Now that we've got a couple more weeks, let's try to merge this series in
the upcoming merge window. In addition to addressing the v3 comments, in
v4 I added a few more patches that simplify the code further by using the
cleanup machinery.
Thanks,
Yury
--
The grp_spread_init_one() implementation is sub-optimal because it
traverses the bitmaps from the beginning on every pass, instead of
resuming from the position reached on the previous iteration.
Fix it and use the find_bit API where appropriate. While here, optimize
cpumask allocation and drop an unneeded cpumask_empty() call.
---
v1: https://lore.kernel.org/all/ZW5MI3rKQueLM0Bz@yury-ThinkPad/T/
v2: https://lore.kernel.org/lkml/ZXKNVRu3AfvjaFhK@fedora/T/
v3: https://lore.kernel.org/all/ZYnD4Bp8R9oIz19s@yury-ThinkPad/T/
v4:
- drop patch v3 #7 and add a comment in patch #4;
- patch #2: fix cpus_per_grp decrement order;
- patch #3: add NAK from Ming Lei;
- add patches #7-9 that simplify the code further.
Yury Norov (9):
cpumask: introduce for_each_cpu_and_from()
lib/group_cpus: optimize inner loop in grp_spread_init_one()
lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
lib/group_cpus: optimize outer loop in grp_spread_init_one()
lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on
allocation
lib/group_cpus: drop unneeded cpumask_empty() call in
__group_cpus_evenly()
cpumask: define cleanup function for cpumasks
lib/group_cpus: rework group_cpus_evenly()
lib/group_cpus: simplify group_cpus_evenly() for more
include/linux/cpumask.h | 14 ++++++
include/linux/find.h | 3 ++
lib/group_cpus.c | 104 +++++++++++++++++-----------------------
3 files changed, 62 insertions(+), 59 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/9] cpumask: introduce for_each_cpu_and_from()
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-28 20:09 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
` (7 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
Similarly to for_each_cpu_and(), introduce for_each_cpu_and_from(),
which is handy when one needs to traverse two cpumasks or bitmaps
starting from a given position.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/cpumask.h | 11 +++++++++++
include/linux/find.h | 3 +++
2 files changed, 14 insertions(+)
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..73ff2e0ef090 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -332,6 +332,17 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
#define for_each_cpu_and(cpu, mask1, mask2) \
for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
+/**
+ * for_each_cpu_and_from - iterate over every cpu in both masks starting from a given cpu
+ * @cpu: the (optionally unsigned) integer iterator
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_and_from(cpu, mask1, mask2) \
+ for_each_and_bit_from(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
+
/**
* for_each_cpu_andnot - iterate over every cpu present in one mask, excluding
* those present in another.
diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..dfd3d51ff590 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -563,6 +563,9 @@ unsigned long find_next_bit_le(const void *addr, unsigned
(bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
(bit)++)
+#define for_each_and_bit_from(bit, addr1, addr2, size) \
+ for (; (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size); (bit)++)
+
#define for_each_andnot_bit(bit, addr1, addr2, size) \
for ((bit) = 0; \
(bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one()
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
2023-12-28 20:09 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2024-01-02 0:59 ` Ming Lei
2023-12-28 20:09 ` [PATCH 3/9] lib/group_cpus: relax atomicity requirement " Yury Norov
` (6 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
The loop starts from the beginning every time we switch to the next
sibling mask. This is the Schlemiel the Painter's style of coding
because we know for sure that nmsk is clear up to the current CPU,
and we can just continue from the next CPU.
Also, we can do it nicer if we leverage the dedicated for_each()
iterator, and simplify the logic of clearing a bit in nmsk.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/group_cpus.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..063ed9ae1b8d 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,14 +30,14 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
/* If the cpu has siblings, use them first */
siblmsk = topology_sibling_cpumask(cpu);
- for (sibl = -1; cpus_per_grp > 0; ) {
- sibl = cpumask_next(sibl, siblmsk);
- if (sibl >= nr_cpu_ids)
- break;
- if (!cpumask_test_and_clear_cpu(sibl, nmsk))
- continue;
+ sibl = cpu + 1;
+
+ for_each_cpu_and_from(sibl, siblmsk, nmsk) {
+ if (cpus_per_grp-- == 0)
+ return;
+
+ cpumask_clear_cpu(sibl, nmsk);
cpumask_set_cpu(sibl, irqmsk);
- cpus_per_grp--;
}
}
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/9] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
2023-12-28 20:09 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
2023-12-28 20:09 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-29 18:39 ` Andrew Morton
2023-12-28 20:09 ` [PATCH 4/9] lib/group_cpus: optimize outer loop " Yury Norov
` (5 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
Because nmsk and irqmsk are stable, extra atomicity is not required.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
NAKed-by: Ming Lei <ming.lei@redhat.com>
---
lib/group_cpus.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 063ed9ae1b8d..0a8ac7cb1a5d 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
if (cpu >= nr_cpu_ids)
return;
- cpumask_clear_cpu(cpu, nmsk);
- cpumask_set_cpu(cpu, irqmsk);
+ __cpumask_clear_cpu(cpu, nmsk);
+ __cpumask_set_cpu(cpu, irqmsk);
cpus_per_grp--;
/* If the cpu has siblings, use them first */
@@ -36,8 +36,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
if (cpus_per_grp-- == 0)
return;
- cpumask_clear_cpu(sibl, nmsk);
- cpumask_set_cpu(sibl, irqmsk);
+ __cpumask_clear_cpu(sibl, nmsk);
+ __cpumask_set_cpu(sibl, irqmsk);
}
}
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/9] lib/group_cpus: optimize outer loop in grp_spread_init_one()
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
` (2 preceding siblings ...)
2023-12-28 20:09 ` [PATCH 3/9] lib/group_cpus: relax atomicity requirement " Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-28 20:09 ` [PATCH 5/9] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
` (4 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
Similarly to the inner loop, in the outer loop we can use the
for_each_cpu() macro and skip CPUs that have already been copied.
With this patch, the function becomes O(1), even though it's a
double loop.
While here, add a comment explaining why we can't merge the inner and
outer logic.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/group_cpus.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 0a8ac7cb1a5d..952aac9eaa81 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -17,16 +17,17 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
const struct cpumask *siblmsk;
int cpu, sibl;
- for ( ; cpus_per_grp > 0; ) {
- cpu = cpumask_first(nmsk);
-
- /* Should not happen, but I'm too lazy to think about it */
- if (cpu >= nr_cpu_ids)
+ for_each_cpu(cpu, nmsk) {
+ if (cpus_per_grp-- == 0)
return;
+ /*
+ * If a caller wants to spread IRQs on offline CPUs, we need to
+ * take care of it explicitly because those offline CPUs are not
+ * included in the siblings cpumask.
+ */
__cpumask_clear_cpu(cpu, nmsk);
__cpumask_set_cpu(cpu, irqmsk);
- cpus_per_grp--;
/* If the cpu has siblings, use them first */
siblmsk = topology_sibling_cpumask(cpu);
@@ -38,6 +39,7 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
__cpumask_clear_cpu(sibl, nmsk);
__cpumask_set_cpu(sibl, irqmsk);
+ cpu = sibl + 1;
}
}
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/9] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
` (3 preceding siblings ...)
2023-12-28 20:09 ` [PATCH 4/9] lib/group_cpus: optimize outer loop " Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-28 20:09 ` [PATCH 6/9] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
` (3 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
nmsk and npresmsk are both allocated with zalloc_cpumask_var(), but both
are fully initialized by copying later in the code, so they can be
allocated uninitialized with alloc_cpumask_var().
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/group_cpus.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 952aac9eaa81..72c308f8c322 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -354,10 +354,10 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
int ret = -ENOMEM;
struct cpumask *masks = NULL;
- if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+ if (!alloc_cpumask_var(&nmsk, GFP_KERNEL))
return NULL;
- if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
+ if (!alloc_cpumask_var(&npresmsk, GFP_KERNEL))
goto fail_nmsk;
node_to_cpumask = alloc_node_to_cpumask();
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/9] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly()
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
` (4 preceding siblings ...)
2023-12-28 20:09 ` [PATCH 5/9] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-28 20:09 ` [PATCH 7/9] cpumask: define cleanup function for cpumasks Yury Norov
` (2 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
The function is called twice. The first time it's called with
cpu_present_mask as a parameter, which can't be empty. The second time
it's called with a mask created with cpumask_andnot(), which returns
false if the result is an empty mask.
We can safely drop the redundant cpumask_empty() call from
__group_cpus_evenly() and save a few cycles.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
lib/group_cpus.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 72c308f8c322..b8c0c3ae2bbd 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -259,9 +259,6 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
nodemask_t nodemsk = NODE_MASK_NONE;
struct node_groups *node_groups;
- if (cpumask_empty(cpu_mask))
- return 0;
-
nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
/*
@@ -401,9 +398,14 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
curgrp = 0;
else
curgrp = nr_present;
- cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk);
- ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
- npresmsk, nmsk, masks);
+
+ if (cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk))
+ /* If npresmsk is not empty */
+ ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+ npresmsk, nmsk, masks);
+ else
+ ret = 0;
+
if (ret >= 0)
nr_others = ret;
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/9] cpumask: define cleanup function for cpumasks
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
` (5 preceding siblings ...)
2023-12-28 20:09 ` [PATCH 6/9] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-28 20:09 ` [PATCH 8/9] lib/group_cpus: rework group_cpus_evenly() Yury Norov
2023-12-28 20:09 ` [PATCH 9/9] lib/group_cpus: simplify group_cpus_evenly() for more Yury Norov
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
Now we can simplify code that allocates cpumasks for local needs.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/cpumask.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 73ff2e0ef090..f85515ebcf42 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -7,6 +7,7 @@
* set of CPUs in a system, one bit position per CPU number. In general,
* only nr_cpu_ids (<= NR_CPUS) bits are valid.
*/
+#include <linux/cleanup.h>
#include <linux/kernel.h>
#include <linux/threads.h>
#include <linux/bitmap.h>
@@ -988,6 +989,8 @@ static inline bool cpumask_available(cpumask_var_t mask)
}
#endif /* CONFIG_CPUMASK_OFFSTACK */
+DEFINE_FREE(free_cpumask_var, struct cpumask *, if (_T) free_cpumask_var(_T));
+
/* It's common to want to use cpu_all_mask in struct member initializers,
* so it has to refer to an address rather than a pointer. */
extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 8/9] lib/group_cpus: rework group_cpus_evenly()
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
` (6 preceding siblings ...)
2023-12-28 20:09 ` [PATCH 7/9] cpumask: define cleanup function for cpumasks Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
2023-12-28 20:09 ` [PATCH 9/9] lib/group_cpus: simplify group_cpus_evenly() for more Yury Norov
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
Leverage the cleanup machinery and drop most of the housekeeping code.
In particular, drop the unneeded 'ret' variable, which was erroneously
initialized to -ENOMEM.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/group_cpus.c | 79 +++++++++++++++---------------------------------
1 file changed, 25 insertions(+), 54 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index b8c0c3ae2bbd..b9ab32e00a79 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -76,6 +76,8 @@ static void free_node_to_cpumask(cpumask_var_t *masks)
kfree(masks);
}
+DEFINE_FREE(free_node_to_cpumask, cpumask_var_t *, if (_T) free_node_to_cpumask(_T));
+
static void build_node_to_cpumask(cpumask_var_t *masks)
{
int cpu;
@@ -345,26 +347,16 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
*/
struct cpumask *group_cpus_evenly(unsigned int numgrps)
{
- unsigned int curgrp = 0, nr_present = 0, nr_others = 0;
- cpumask_var_t *node_to_cpumask;
- cpumask_var_t nmsk, npresmsk;
- int ret = -ENOMEM;
- struct cpumask *masks = NULL;
-
- if (!alloc_cpumask_var(&nmsk, GFP_KERNEL))
+ cpumask_var_t *node_to_cpumask __free(free_node_to_cpumask) = alloc_node_to_cpumask();
+ struct cpumask *masks __free(kfree) = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
+ cpumask_var_t npresmsk __free(free_cpumask_var);
+ cpumask_var_t nmsk __free(free_cpumask_var);
+ unsigned int curgrp, nr_present, nr_others;
+
+ if (!masks || !node_to_cpumask || !alloc_cpumask_var(&nmsk, GFP_KERNEL)
+ || !alloc_cpumask_var(&npresmsk, GFP_KERNEL))
return NULL;
- if (!alloc_cpumask_var(&npresmsk, GFP_KERNEL))
- goto fail_nmsk;
-
- node_to_cpumask = alloc_node_to_cpumask();
- if (!node_to_cpumask)
- goto fail_npresmsk;
-
- masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
- if (!masks)
- goto fail_node_to_cpumask;
-
build_node_to_cpumask(node_to_cpumask);
/*
@@ -382,11 +374,15 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
cpumask_copy(npresmsk, data_race(cpu_present_mask));
/* grouping present CPUs first */
- ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
- npresmsk, nmsk, masks);
- if (ret < 0)
- goto fail_build_affinity;
- nr_present = ret;
+ nr_present = __group_cpus_evenly(0, numgrps, node_to_cpumask, npresmsk, nmsk, masks);
+ if (nr_present < 0)
+ return NULL;
+
+ /* If npresmsk is empty */
+ if (!cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk))
+ return_ptr(masks);
+
+ curgrp = nr_present < numgrps ? nr_present : 0;
/*
* Allocate non present CPUs starting from the next group to be
@@ -394,38 +390,13 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
* group space, assign the non present CPUs to the already
* allocated out groups.
*/
- if (nr_present >= numgrps)
- curgrp = 0;
- else
- curgrp = nr_present;
-
- if (cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk))
- /* If npresmsk is not empty */
- ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
- npresmsk, nmsk, masks);
- else
- ret = 0;
-
- if (ret >= 0)
- nr_others = ret;
-
- fail_build_affinity:
- if (ret >= 0)
- WARN_ON(nr_present + nr_others < numgrps);
-
- fail_node_to_cpumask:
- free_node_to_cpumask(node_to_cpumask);
-
- fail_npresmsk:
- free_cpumask_var(npresmsk);
-
- fail_nmsk:
- free_cpumask_var(nmsk);
- if (ret < 0) {
- kfree(masks);
+ nr_others = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+ npresmsk, nmsk, masks);
+ if (nr_others < 0)
return NULL;
- }
- return masks;
+
+ WARN_ON(nr_present + nr_others < numgrps);
+ return_ptr(masks);
}
#else /* CONFIG_SMP */
struct cpumask *group_cpus_evenly(unsigned int numgrps)
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 9/9] lib/group_cpus: simplify group_cpus_evenly() for more
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
` (7 preceding siblings ...)
2023-12-28 20:09 ` [PATCH 8/9] lib/group_cpus: rework group_cpus_evenly() Yury Norov
@ 2023-12-28 20:09 ` Yury Norov
8 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes
The nmsk parameter is only used in the helper function, so move its allocation there.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/group_cpus.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index b9ab32e00a79..3a0db0f51f09 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -253,13 +253,17 @@ static void alloc_nodes_groups(unsigned int numgrps,
static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
cpumask_var_t *node_to_cpumask,
const struct cpumask *cpu_mask,
- struct cpumask *nmsk, struct cpumask *masks)
+ struct cpumask *masks)
{
unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0;
unsigned int last_grp = numgrps;
unsigned int curgrp = startgrp;
nodemask_t nodemsk = NODE_MASK_NONE;
struct node_groups *node_groups;
+ cpumask_var_t nmsk __free(free_cpumask_var);
+
+ if (!alloc_cpumask_var(&nmsk, GFP_KERNEL))
+ return -ENOMEM;
nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
@@ -350,11 +354,9 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
cpumask_var_t *node_to_cpumask __free(free_node_to_cpumask) = alloc_node_to_cpumask();
struct cpumask *masks __free(kfree) = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL);
cpumask_var_t npresmsk __free(free_cpumask_var);
- cpumask_var_t nmsk __free(free_cpumask_var);
unsigned int curgrp, nr_present, nr_others;
- if (!masks || !node_to_cpumask || !alloc_cpumask_var(&nmsk, GFP_KERNEL)
- || !alloc_cpumask_var(&npresmsk, GFP_KERNEL))
+ if (!masks || !node_to_cpumask || !alloc_cpumask_var(&npresmsk, GFP_KERNEL))
return NULL;
build_node_to_cpumask(node_to_cpumask);
@@ -374,7 +376,7 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
cpumask_copy(npresmsk, data_race(cpu_present_mask));
/* grouping present CPUs first */
- nr_present = __group_cpus_evenly(0, numgrps, node_to_cpumask, npresmsk, nmsk, masks);
+ nr_present = __group_cpus_evenly(0, numgrps, node_to_cpumask, npresmsk, masks);
if (nr_present < 0)
return NULL;
@@ -390,8 +392,7 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
* group space, assign the non present CPUs to the already
* allocated out groups.
*/
- nr_others = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
- npresmsk, nmsk, masks);
+ nr_others = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, npresmsk, masks);
if (nr_others < 0)
return NULL;
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/9] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
2023-12-28 20:09 ` [PATCH 3/9] lib/group_cpus: relax atomicity requirement " Yury Norov
@ 2023-12-29 18:39 ` Andrew Morton
2023-12-29 18:45 ` Yury Norov
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2023-12-29 18:39 UTC (permalink / raw)
To: Yury Norov
Cc: Thomas Gleixner, Ming Lei, linux-kernel, Andy Shevchenko,
Rasmus Villemoes
On Thu, 28 Dec 2023 12:09:30 -0800 Yury Norov <yury.norov@gmail.com> wrote:
> Because nmsk and irqmsk are stable, extra atomicity is not required.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> NAKed-by: Ming Lei <ming.lei@redhat.com>
Well that's unusual. I suggest that the changelog at least describe the
objection, and its counterargument?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/9] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
2023-12-29 18:39 ` Andrew Morton
@ 2023-12-29 18:45 ` Yury Norov
0 siblings, 0 replies; 16+ messages in thread
From: Yury Norov @ 2023-12-29 18:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Thomas Gleixner, Ming Lei, linux-kernel, Andy Shevchenko,
Rasmus Villemoes
On Fri, Dec 29, 2023 at 10:39 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Thu, 28 Dec 2023 12:09:30 -0800 Yury Norov <yury.norov@gmail.com> wrote:
>
> > Because nmsk and irqmsk are stable, extra atomicity is not required.
> >
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > NAKed-by: Ming Lei <ming.lei@redhat.com>
>
> Well that's unusual. I suggest that the changelog at least describe the
> objection, and its counterargument?
Sorry, forgot to copy it from v3 discussion. Please find below:
> > > > I think this kind of change should be avoided, here the code is
> > > > absolutely in slow path, and we care code cleanness and readability
> > > > much more than the saved cycle from non atomicity.
> > >
> > > Atomic ops have special meaning and special function. This 'atomic' way
> > > of moving a bit from one bitmap to another looks completely non-trivial
> > > and puzzling to me.
> > >
> > > A sequence of atomic ops is not atomic itself. Normally it's a sign of
> > > a bug. But in this case, both masks are stable, and we don't need
> > > atomicity at all.
> >
> > Here we don't care the atomicity.
> >
> > >
> > > It's not about performance, it's about readability.
> >
> > __cpumask_clear_cpu() and __cpumask_set_cpu() are more like private
> > helper, and more hard to follow.
>
> No that's not true. Non-atomic version of the function is not a
> private helper of course.
>
> > [@linux]$ git grep -n -w -E "cpumask_clear_cpu|cpumask_set_cpu" ./ | wc
> > 674 2055 53954
> > [@linux]$ git grep -n -w -E "__cpumask_clear_cpu|__cpumask_set_cpu" ./ | wc
> > 21 74 1580
> >
> > I don't object to comment the current usage, but NAK for this change.
>
> No problem, I'll add you NAK.
You can add the following words meantime:
__cpumask_clear_cpu() and __cpumask_set_cpu() were added in commit 6c8557bdb28d
("smp, cpumask: Use non-atomic cpumask_{set,clear}_cpu()") for a fast code
path (smp_call_function_many()).
We have ~670 users of cpumask_clear_cpu & cpumask_set_cpu; lots of them
fall into the same category as group_cpus.c (atomicity not needed, not in a
fast code path), and needn't change to __cpumask_clear_cpu() and
__cpumask_set_cpu(). Otherwise, this change may encourage updating others to
the __cpumask_* version.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one()
2023-12-28 20:09 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
@ 2024-01-02 0:59 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2024-01-02 0:59 UTC (permalink / raw)
To: Yury Norov
Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
Rasmus Villemoes
On Thu, Dec 28, 2023 at 12:09:29PM -0800, Yury Norov wrote:
> The loop starts from the beginning every time we switch to the next
> sibling mask. This is the Schlemiel the Painter's style of coding
> because we know for sure that nmsk is clear up to the current CPU,
> and we can just continue from the next CPU.
>
> Also, we can do it nicer if leverage the dedicated for_each() iterator,
> and simplify the logic of clearing a bit in nmsk.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
> lib/group_cpus.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc..063ed9ae1b8d 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -30,14 +30,14 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>
> /* If the cpu has siblings, use them first */
> siblmsk = topology_sibling_cpumask(cpu);
> - for (sibl = -1; cpus_per_grp > 0; ) {
> - sibl = cpumask_next(sibl, siblmsk);
> - if (sibl >= nr_cpu_ids)
> - break;
> - if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> - continue;
> + sibl = cpu + 1;
> +
> + for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> + if (cpus_per_grp-- == 0)
> + return;
> +
> + cpumask_clear_cpu(sibl, nmsk);
> cpumask_set_cpu(sibl, irqmsk);
> - cpus_per_grp--;
Again, here it is simpler to use for_each_cpu_and() directly; see the previous
comment:
https://lore.kernel.org/lkml/ZXgsDcM21H%2F2BTck@fedora/
Meanwhile, patch 1 isn't needed; here follows a simpler proposal:
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..564d8e817f65 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,14 +30,11 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
/* If the cpu has siblings, use them first */
siblmsk = topology_sibling_cpumask(cpu);
- for (sibl = -1; cpus_per_grp > 0; ) {
- sibl = cpumask_next(sibl, siblmsk);
- if (sibl >= nr_cpu_ids)
- break;
- if (!cpumask_test_and_clear_cpu(sibl, nmsk))
- continue;
+ for_each_cpu_and(sibl, siblmsk, nmsk) {
+ cpumask_clear_cpu(sibl, nmsk);
cpumask_set_cpu(sibl, irqmsk);
- cpus_per_grp--;
+ if (--cpus_per_grp == 0)
+ return;
}
}
}
Thanks,
Ming
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one()
2024-01-20 2:50 [PATCH v5 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
@ 2024-01-20 2:50 ` Yury Norov
2024-01-20 3:17 ` Ming Lei
0 siblings, 1 reply; 16+ messages in thread
From: Yury Norov @ 2024-01-20 2:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
Cc: Yury Norov, Andy Shevchenko, Breno Leitao, Nathan Chancellor,
Rasmus Villemoes, Zi Yan
The loop starts from the beginning every time we switch to the next
sibling mask. This is the Schlemiel the Painter's style of coding
because we know for sure that nmsk is clear up to the current CPU, and we
can just continue from the next CPU.
Also, we can do it nicer if we leverage the dedicated for_each() iterator,
and simplify the logic of clearing a bit in nmsk.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/group_cpus.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..063ed9ae1b8d 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,14 +30,14 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
/* If the cpu has siblings, use them first */
siblmsk = topology_sibling_cpumask(cpu);
- for (sibl = -1; cpus_per_grp > 0; ) {
- sibl = cpumask_next(sibl, siblmsk);
- if (sibl >= nr_cpu_ids)
- break;
- if (!cpumask_test_and_clear_cpu(sibl, nmsk))
- continue;
+ sibl = cpu + 1;
+
+ for_each_cpu_and_from(sibl, siblmsk, nmsk) {
+ if (cpus_per_grp-- == 0)
+ return;
+
+ cpumask_clear_cpu(sibl, nmsk);
cpumask_set_cpu(sibl, irqmsk);
- cpus_per_grp--;
}
}
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one()
2024-01-20 2:50 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
@ 2024-01-20 3:17 ` Ming Lei
2024-01-20 7:03 ` Ming Lei
0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2024-01-20 3:17 UTC (permalink / raw)
To: Yury Norov
Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
Breno Leitao, Nathan Chancellor, Rasmus Villemoes, Zi Yan,
ming.lei
On Fri, Jan 19, 2024 at 06:50:46PM -0800, Yury Norov wrote:
> The loop starts from the beginning every time we switch to the next
> sibling mask. This is Schlemiel the Painter's style of coding
> because we know for sure that nmsk is clear up to the current CPU, and
> we can just continue from the next CPU.
>
> Also, we can do it nicer if we leverage the dedicated for_each()
> iterator, and simplify the logic of clearing a bit in nmsk.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
> lib/group_cpus.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc..063ed9ae1b8d 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -30,14 +30,14 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>
> /* If the cpu has siblings, use them first */
> siblmsk = topology_sibling_cpumask(cpu);
> - for (sibl = -1; cpus_per_grp > 0; ) {
> - sibl = cpumask_next(sibl, siblmsk);
> - if (sibl >= nr_cpu_ids)
> - break;
> - if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> - continue;
> + sibl = cpu + 1;
No, it is silly to let 'sibl' point to 'cpu + 1', because we just
want to iterate over 'siblmsk & nmsk', which has nothing to do with
the next cpu ('cpu + 1').
> +
> + for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> + if (cpus_per_grp-- == 0)
> + return;
> +
> + cpumask_clear_cpu(sibl, nmsk);
> cpumask_set_cpu(sibl, irqmsk);
> - cpus_per_grp--;
Andrew, please replace the 1st two patches with the following one:
From 7a983ee5e1b4f05e5ae26c025dffd801b909e2f3 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Sat, 20 Jan 2024 11:07:26 +0800
Subject: [PATCH] lib/group_cpus.c: simplify grp_spread_init_one()
What the inner loop needs to do is iterate over `siblmsk & nmsk`, clear
each cpu in 'nmsk' and set it in 'irqmsk'.
Clean it up by using for_each_cpu_and().
This is based on Yury Norov's patch, which needs one extra
for_each_cpu_and_from() iterator that is really not necessary.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
lib/group_cpus.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..564d8e817f65 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,14 +30,11 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
/* If the cpu has siblings, use them first */
siblmsk = topology_sibling_cpumask(cpu);
- for (sibl = -1; cpus_per_grp > 0; ) {
- sibl = cpumask_next(sibl, siblmsk);
- if (sibl >= nr_cpu_ids)
- break;
- if (!cpumask_test_and_clear_cpu(sibl, nmsk))
- continue;
+ for_each_cpu_and(sibl, siblmsk, nmsk) {
+ cpumask_clear_cpu(sibl, nmsk);
cpumask_set_cpu(sibl, irqmsk);
- cpus_per_grp--;
+ if (--cpus_per_grp == 0)
+ return;
}
}
}
--
2.42.0
Thanks,
Ming
* Re: [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one()
2024-01-20 3:17 ` Ming Lei
@ 2024-01-20 7:03 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2024-01-20 7:03 UTC (permalink / raw)
To: Yury Norov
Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
Breno Leitao, Nathan Chancellor, Rasmus Villemoes, Zi Yan
On Sat, Jan 20, 2024 at 11:17:00AM +0800, Ming Lei wrote:
> On Fri, Jan 19, 2024 at 06:50:46PM -0800, Yury Norov wrote:
> > The loop starts from the beginning every time we switch to the next
> > sibling mask. This is Schlemiel the Painter's style of coding
> > because we know for sure that nmsk is clear up to the current CPU, and
> > we can just continue from the next CPU.
> >
> > Also, we can do it nicer if we leverage the dedicated for_each()
> > iterator, and simplify the logic of clearing a bit in nmsk.
> >
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> > lib/group_cpus.c | 14 +++++++-------
> > 1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index ee272c4cefcc..063ed9ae1b8d 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -30,14 +30,14 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >
> > /* If the cpu has siblings, use them first */
> > siblmsk = topology_sibling_cpumask(cpu);
> > - for (sibl = -1; cpus_per_grp > 0; ) {
> > - sibl = cpumask_next(sibl, siblmsk);
> > - if (sibl >= nr_cpu_ids)
> > - break;
> > - if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > - continue;
> > + sibl = cpu + 1;
>
> No, it is silly to let 'sibl' point to 'cpu + 1', because we just
> want to iterate over 'siblmsk & nmsk', which has nothing to do with
> the next cpu ('cpu + 1').
>
> > +
> > + for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > + if (cpus_per_grp-- == 0)
> > + return;
> > +
> > + cpumask_clear_cpu(sibl, nmsk);
> > cpumask_set_cpu(sibl, irqmsk);
> > - cpus_per_grp--;
>
> Andrew, please replace the 1st two patches with the following one:
>
> From 7a983ee5e1b4f05e5ae26c025dffd801b909e2f3 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@redhat.com>
> Date: Sat, 20 Jan 2024 11:07:26 +0800
> Subject: [PATCH] lib/group_cpus.c: simplify grp_spread_init_one()
>
> What the inner loop needs to do is iterate over `siblmsk & nmsk`, clear
> each cpu in 'nmsk' and set it in 'irqmsk'.
>
> Clean it up by using for_each_cpu_and().
>
> This is based on Yury Norov's patch, which needs one extra
> for_each_cpu_and_from() iterator that is really not necessary.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> lib/group_cpus.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc..564d8e817f65 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -30,14 +30,11 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>
> /* If the cpu has siblings, use them first */
> siblmsk = topology_sibling_cpumask(cpu);
> - for (sibl = -1; cpus_per_grp > 0; ) {
> - sibl = cpumask_next(sibl, siblmsk);
> - if (sibl >= nr_cpu_ids)
> - break;
> - if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> - continue;
> + for_each_cpu_and(sibl, siblmsk, nmsk) {
> + cpumask_clear_cpu(sibl, nmsk);
> cpumask_set_cpu(sibl, irqmsk);
> - cpus_per_grp--;
> + if (--cpus_per_grp == 0)
> + return;
The iterator variable of 'nmsk' is updated inside the loop, and it is
still tricky, so please ignore this patch. I just sent a formal & revised one:
https://lkml.org/lkml/2024/1/20/43
Thanks,
Ming
end of thread, other threads:[~2024-01-20 7:03 UTC | newest]
Thread overview: 16+ messages
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
2023-12-28 20:09 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
2023-12-28 20:09 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
2024-01-02 0:59 ` Ming Lei
2023-12-28 20:09 ` [PATCH 3/9] lib/group_cpus: relax atomicity requirement " Yury Norov
2023-12-29 18:39 ` Andrew Morton
2023-12-29 18:45 ` Yury Norov
2023-12-28 20:09 ` [PATCH 4/9] lib/group_cpus: optimize outer loop " Yury Norov
2023-12-28 20:09 ` [PATCH 5/9] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
2023-12-28 20:09 ` [PATCH 6/9] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
2023-12-28 20:09 ` [PATCH 7/9] cpumask: define cleanup function for cpumasks Yury Norov
2023-12-28 20:09 ` [PATCH 8/9] lib/group_cpus: rework group_cpus_evenly() Yury Norov
2023-12-28 20:09 ` [PATCH 9/9] lib/group_cpus: simplify group_cpus_evenly() for more Yury Norov
-- strict thread matches above, loose matches on Subject: below --
2024-01-20 2:50 [PATCH v5 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
2024-01-20 2:50 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
2024-01-20 3:17 ` Ming Lei
2024-01-20 7:03 ` Ming Lei