public inbox for linux-kernel@vger.kernel.org
* [PATCH v5 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1)
@ 2024-01-20  2:50 Yury Norov
  2024-01-20  2:50 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Yury Norov @ 2024-01-20  2:50 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Breno Leitao, Nathan Chancellor,
	Rasmus Villemoes, Zi Yan

The grp_spread_init_one() implementation is sub-optimal because it
traverses bitmaps from the beginning on every iteration, instead of
resuming from where the previous iteration stopped.

Fix it and use the find_bit API where appropriate. While here, optimize
cpumask allocation and drop an unneeded cpumask_empty() call.
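
To illustrate the idea of resuming a bitmap scan rather than restarting
it, here is a minimal userspace sketch. It is not the code from this
series: next_set_bit() and pick_bits() are hypothetical names, and the
bit-by-bit loop is a simplified stand-in for the kernel's find_bit
helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for a "find next set bit" helper: returns the
 * index of the first set bit at or after 'start', or 'nbits' if none.
 */
static size_t next_set_bit(const unsigned long *map, size_t nbits,
			   size_t start)
{
	for (size_t i = start; i < nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	return nbits;
}

/*
 * Pick up to 'want' set bits from 'map'. Each search resumes right
 * after the previous hit, so no bit is visited more than once.
 * Stores the picked indices in 'out' and returns how many were found.
 */
static size_t pick_bits(const unsigned long *map, size_t nbits,
			size_t want, size_t *out)
{
	size_t n = 0;
	size_t pos = 0;

	assert(map && out);

	while (n < want) {
		pos = next_set_bit(map, nbits, pos);
		if (pos >= nbits)
			break;		/* bitmap exhausted */
		out[n++] = pos++;	/* resume after this bit */
	}
	return n;
}
```

Restarting every search from bit 0 would revisit the already-consumed
prefix each time; carrying 'pos' across iterations avoids that.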

---
v1: https://lore.kernel.org/all/ZW5MI3rKQueLM0Bz@yury-ThinkPad/T/
v2: https://lore.kernel.org/lkml/ZXKNVRu3AfvjaFhK@fedora/T/
v3: https://lore.kernel.org/lkml/20231212042108.682072-7-yury.norov@gmail.com/T/
v4: https://lore.kernel.org/lkml/20231228200936.2475595-1-yury.norov@gmail.com/T/
v5: add CPUMASK_NULL macro and use it to initialize cpumask_var_t
    variables properly.

On cpumask_var_t initialization issue:

The idea of having different types behind the same typedef has been
considered nasty for quite a while. See a comment in include/linux/cpumask.h
for example.

Now that I'm trying to adapt the kernel cleanup machinery to cpumasks, it
reveals another disadvantage of this approach: there's no way to initialize
a cpumask_var_t variable at declaration time, which is required by the
cleanup implementation.

To fix that, in v5 I added a CPUMASK_NULL macro as a workaround.
CPUMASK_NULL would also be useful for those converting the existing
codebase to use cleanup variables.
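
As a rough userspace illustration of why a cleanup variable needs a
value at declaration time, consider the sketch below. It uses the
GCC/Clang cleanup attribute with a plain heap pointer. MASK_NULL,
free_mask() and use_mask() are hypothetical names, and nothing here
reproduces the actual CPUMASK_NULL definition from patch 7.

```c
#include <assert.h>
#include <stdlib.h>

static int frees_seen;	/* counts cleanup calls, for illustration only */

/* Cleanup handler: gets a pointer to the variable leaving scope. */
static void free_mask(unsigned long **maskp)
{
	frees_seen++;
	free(*maskp);	/* free(NULL) is a no-op */
}

#define MASK_NULL NULL	/* hypothetical analogue of CPUMASK_NULL */

static int use_mask(int fail_early)
{
	/*
	 * The handler runs on every exit from this scope, including the
	 * early return below, so the variable must hold a well-defined
	 * value from the moment it is declared.
	 */
	unsigned long *mask __attribute__((cleanup(free_mask))) = MASK_NULL;

	if (fail_early)
		return -1;	/* handler sees mask == NULL */

	mask = calloc(4, sizeof(*mask));
	if (!mask)
		return -2;

	mask[0] = 1UL;
	return (int)mask[0];	/* value is read before the handler frees */
}
```

With CONFIG_CPUMASK_OFFSTACK=n, cpumask_var_t is a one-element array
rather than a pointer, so a plain "= NULL" initializer as above would not
compile for it. That is the gap a dedicated macro has to cover.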

In the long term, it would be better to drop CPUMASK_OFFSTACK entirely.
Moreover, it's used only on Power and x86 machines if NR_CPUS >= 8K (unless
people enable it explicitly, and nobody bothers doing that in real life).
But that requires some more discussion with the Power and x86 people...

Meanwhile, I'm going to submit a patchset that deprecates cpumask_var_t,
and adds a new set of allocators which would support initialization at
declaration time.

Yury Norov (9):
  cpumask: introduce for_each_cpu_and_from()
  lib/group_cpus: optimize inner loop in grp_spread_init_one()
  lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  lib/group_cpus: optimize outer loop in grp_spread_init_one()
  lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on
    allocation
  lib/group_cpus: drop unneeded cpumask_empty() call in
    __group_cpus_evenly()
  cpumask: define cleanup function for cpumasks
  lib/group_cpus: rework group_cpus_evenly()
  lib/group_cpus: simplify group_cpus_evenly() for more

 include/linux/cpumask.h |  16 ++++++
 include/linux/find.h    |   3 ++
 lib/group_cpus.c        | 110 ++++++++++++++++------------------------
 3 files changed, 62 insertions(+), 67 deletions(-)

-- 
2.40.1


* [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1)
@ 2023-12-28 20:09 Yury Norov
  2023-12-28 20:09 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
  0 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2023-12-28 20:09 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

Hi Andrew, Ming,

Now that we've got a couple more weeks, let's try to merge this series in
the upcoming merge window. In addition to addressing the v3 comments, in
v4 I added a few more patches that simplify the code further by using the
cleanup machinery.

Thanks,
	Yury
--

The grp_spread_init_one() implementation is sub-optimal because it
traverses bitmaps from the beginning on every iteration, instead of
resuming from where the previous iteration stopped.

Fix it and use the find_bit API where appropriate. While here, optimize
cpumask allocation and drop an unneeded cpumask_empty() call.

---
v1: https://lore.kernel.org/all/ZW5MI3rKQueLM0Bz@yury-ThinkPad/T/
v2: https://lore.kernel.org/lkml/ZXKNVRu3AfvjaFhK@fedora/T/
v3: https://lore.kernel.org/all/ZYnD4Bp8R9oIz19s@yury-ThinkPad/T/
v4:
 - drop patch v3 #7 and add a comment in patch #4;
 - patch #2: fix cpus_per_grp decrement order;
 - patch #3: add NAK from Ming Lei;
 - add patches #7-9 that simplify the code further.

Yury Norov (9):
  cpumask: introduce for_each_cpu_and_from()
  lib/group_cpus: optimize inner loop in grp_spread_init_one()
  lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  lib/group_cpus: optimize outer loop in grp_spread_init_one()
  lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on
    allocation
  lib/group_cpus: drop unneeded cpumask_empty() call in
    __group_cpus_evenly()
  cpumask: define cleanup function for cpumasks
  lib/group_cpus: rework group_cpus_evenly()
  lib/group_cpus: simplify group_cpus_evenly() for more

 include/linux/cpumask.h |  14 ++++++
 include/linux/find.h    |   3 ++
 lib/group_cpus.c        | 104 +++++++++++++++++-----------------------
 3 files changed, 62 insertions(+), 59 deletions(-)

-- 
2.40.1



Thread overview: 18+ messages
2024-01-20  2:50 [PATCH v5 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
2024-01-20  2:50 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
2024-01-20  3:03   ` Ming Lei
2024-01-21 19:50     ` Yury Norov
2024-01-22  2:41       ` Ming Lei
2024-01-20  2:50 ` [PATCH 2/9] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
2024-01-20  3:17   ` Ming Lei
2024-01-20  7:03     ` Ming Lei
2024-01-20  2:50 ` [PATCH 3/9] lib/group_cpus: relax atomicity requirement " Yury Norov
2024-01-20  2:50 ` [PATCH 4/9] lib/group_cpus: optimize outer loop " Yury Norov
2024-01-20  3:51   ` Ming Lei
2024-01-20  6:17     ` Ming Lei
2024-01-20  2:50 ` [PATCH 5/9] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
2024-01-20  2:50 ` [PATCH 6/9] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
2024-01-20  2:50 ` [PATCH 7/9] cpumask: define cleanup function for cpumasks Yury Norov
2024-01-20  2:50 ` [PATCH 8/9] lib/group_cpus: rework group_cpus_evenly() Yury Norov
2024-01-20  2:50 ` [PATCH 9/9] lib/group_cpus: simplify group_cpus_evenly() for more Yury Norov
  -- strict thread matches above, loose matches on Subject: below --
2023-12-28 20:09 [PATCH v4 0/9] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
2023-12-28 20:09 ` [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Yury Norov
