public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1)
@ 2023-11-01 22:58 Yury Norov
  2023-11-01 22:58 ` [PATCH 1/4] cpumask: introduce for_each_cpu_and_from() Yury Norov
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Yury Norov @ 2023-11-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The current grp_spread_init_one() implementation is sub-optimal because
it traverses the bitmaps from the beginning on every iteration, instead
of resuming from the position reached on the previous one.

Fix that, and use the find_bit API where appropriate.

Yury Norov (4):
  cpumask: introduce for_each_cpu_and_from()
  lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  lib/group_cpus: optimize inner loop in grp_spread_init_one()
  lib/group_cpus: optimize outer loop in grp_spread_init_one()

 include/linux/cpumask.h | 11 +++++++++++
 include/linux/find.h    |  3 +++
 lib/group_cpus.c        | 29 ++++++++++++-----------------
 3 files changed, 26 insertions(+), 17 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] cpumask: introduce for_each_cpu_and_from()
  2023-11-01 22:58 [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
@ 2023-11-01 22:58 ` Yury Norov
  2023-11-01 22:58 ` [PATCH 2/4] lib/group_cpus: relax atomicity requirement in grp_spread_init_one() Yury Norov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Yury Norov @ 2023-11-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

Similarly to for_each_cpu_and(), introduce for_each_cpu_and_from(),
which is handy when one needs to traverse two cpumasks or bitmaps
starting from a given position.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/cpumask.h | 11 +++++++++++
 include/linux/find.h    |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f10fb87d49db..fc9f9d86229a 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -332,6 +332,17 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
 #define for_each_cpu_and(cpu, mask1, mask2)				\
 	for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
 
+/**
+ * for_each_cpu_and_from - iterate over every cpu in both masks starting from a given cpu
+ * @cpu: the (optionally unsigned) integer iterator
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_and_from(cpu, mask1, mask2)				\
+	for_each_and_bit_from(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
+
 /**
  * for_each_cpu_andnot - iterate over every cpu present in one mask, excluding
  *			 those present in another.
diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..dfd3d51ff590 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -563,6 +563,9 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_and_bit_from(bit, addr1, addr2, size) \
+	for (; (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size); (bit)++)
+
 #define for_each_andnot_bit(bit, addr1, addr2, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-11-01 22:58 [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
  2023-11-01 22:58 ` [PATCH 1/4] cpumask: introduce for_each_cpu_and_from() Yury Norov
@ 2023-11-01 22:58 ` Yury Norov
  2023-11-01 22:58 ` [PATCH 3/4] lib/group_cpus: optimize inner loop " Yury Norov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Yury Norov @ 2023-11-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The function is called with cpus_read_lock() held, so all the masks
involved are stable, and the extra atomicity of the locked bitops is
not required.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index aa3f6815bb12..b1e7d9cc9da9 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 		if (cpu >= nr_cpu_ids)
 			return;
 
-		cpumask_clear_cpu(cpu, nmsk);
-		cpumask_set_cpu(cpu, irqmsk);
+		__cpumask_clear_cpu(cpu, nmsk);
+		__cpumask_set_cpu(cpu, irqmsk);
 		cpus_per_grp--;
 
 		/* If the cpu has siblings, use them first */
@@ -34,9 +34,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 			sibl = cpumask_next(sibl, siblmsk);
 			if (sibl >= nr_cpu_ids)
 				break;
-			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
-				continue;
-			cpumask_set_cpu(sibl, irqmsk);
+			__cpumask_clear_cpu(sibl, nmsk);
+			__cpumask_set_cpu(sibl, irqmsk);
 			cpus_per_grp--;
 		}
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] lib/group_cpus: optimize inner loop in grp_spread_init_one()
  2023-11-01 22:58 [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
  2023-11-01 22:58 ` [PATCH 1/4] cpumask: introduce for_each_cpu_and_from() Yury Norov
  2023-11-01 22:58 ` [PATCH 2/4] lib/group_cpus: relax atomicity requirement in grp_spread_init_one() Yury Norov
@ 2023-11-01 22:58 ` Yury Norov
  2023-11-01 22:58 ` [PATCH 4/4] lib/group_cpus: optimize outer " Yury Norov
  2023-12-04 22:01 ` [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
  4 siblings, 0 replies; 7+ messages in thread
From: Yury Norov @ 2023-11-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The inner loop starts from the beginning of the sibling mask every time
we switch to the next one. This is Schlemiel the Painter's style of
coding, because we know for sure that nmsk is clear up to the current
CPU, and we can simply continue from the next one.

Also, we can do it more neatly by leveraging the dedicated for_each()
iterator.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index b1e7d9cc9da9..8afe0c71d204 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,13 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 
 		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
-		for (sibl = -1; cpus_per_grp > 0; ) {
-			sibl = cpumask_next(sibl, siblmsk);
-			if (sibl >= nr_cpu_ids)
-				break;
+		sibl = cpu + 1;
+
+		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
 			__cpumask_clear_cpu(sibl, nmsk);
 			__cpumask_set_cpu(sibl, irqmsk);
-			cpus_per_grp--;
+			if (cpus_per_grp-- == 0)
+				return;
 		}
 	}
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] lib/group_cpus: optimize outer loop in grp_spread_init_one()
  2023-11-01 22:58 [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
                   ` (2 preceding siblings ...)
  2023-11-01 22:58 ` [PATCH 3/4] lib/group_cpus: optimize inner loop " Yury Norov
@ 2023-11-01 22:58 ` Yury Norov
  2023-12-04 22:01 ` [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
  4 siblings, 0 replies; 7+ messages in thread
From: Yury Norov @ 2023-11-01 22:58 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

Similarly to the inner loop, the outer loop can use the for_each_cpu()
macro and skip CPUs that have already been assigned.

With this patch, the function becomes O(1) per assigned CPU: despite
being a double loop, each CPU is visited at most once.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 8afe0c71d204..2442b38cbf37 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -17,16 +17,11 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 	const struct cpumask *siblmsk;
 	int cpu, sibl;
 
-	for ( ; cpus_per_grp > 0; ) {
-		cpu = cpumask_first(nmsk);
-
-		/* Should not happen, but I'm too lazy to think about it */
-		if (cpu >= nr_cpu_ids)
-			return;
-
+	for_each_cpu(cpu, nmsk) {
 		__cpumask_clear_cpu(cpu, nmsk);
 		__cpumask_set_cpu(cpu, irqmsk);
-		cpus_per_grp--;
+		if (cpus_per_grp-- == 0)
+			return;
 
 		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
@@ -37,6 +32,7 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 			__cpumask_set_cpu(sibl, irqmsk);
 			if (cpus_per_grp-- == 0)
 				return;
+			cpu = sibl + 1;
 		}
 	}
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1)
  2023-11-01 22:58 [PATCH 0/4] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
                   ` (3 preceding siblings ...)
  2023-11-01 22:58 ` [PATCH 4/4] lib/group_cpus: optimize outer " Yury Norov
@ 2023-12-04 22:01 ` Yury Norov
  4 siblings, 0 replies; 7+ messages in thread
From: Yury Norov @ 2023-12-04 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel; +Cc: Andy Shevchenko, Rasmus Villemoes

Ping?

On Wed, Nov 01, 2023 at 03:58:16PM -0700, Yury Norov wrote:
> grp_spread_init_one() implementation is sub-optimal now because it
> traverses bitmaps from the beginning, instead of picking from the
> previous iteration.
> 
> Fix it and use find_bit API where appropriate.
> 
> Yury Norov (4):
>   cpumask: introduce for_each_cpu_and_from()
>   lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
>   lib/group_cpus: optimize inner loop in grp_spread_init_one()
>   lib/group_cpus: optimize outer loop in grp_spread_init_one()
> 
>  include/linux/cpumask.h | 11 +++++++++++
>  include/linux/find.h    |  3 +++
>  lib/group_cpus.c        | 29 ++++++++++++-----------------
>  3 files changed, 26 insertions(+), 17 deletions(-)
> 
> -- 
> 2.39.2

^ permalink raw reply	[flat|nested] 7+ messages in thread
