public inbox for linux-kernel@vger.kernel.org
* [PATCHSET v3 sched_ext/for-6.13] sched_ext: split global idle cpumask into per-NUMA cpumasks
@ 2024-12-03 15:36 Andrea Righi
  2024-12-03 15:36 ` [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap() Andrea Righi
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrea Righi @ 2024-12-03 15:36 UTC (permalink / raw)
  To: Tejun Heo, David Vernet; +Cc: Yury Norov, linux-kernel

= Overview =

As discussed during the sched_ext office hours, using a single global
cpumask to track idle CPUs can be inefficient and may not scale well on
large NUMA systems.

Therefore, split the idle cpumask into multiple per-NUMA node cpumasks
to improve scalability and performance on such large systems.

Scalability issues seem to be more noticeable on Intel Sapphire Rapids
dual-socket architectures.

= Test =

Hardware:
 - System: DGX B200
    - CPUs: 224 SMT threads (112 physical cores)
    - Processor: INTEL(R) XEON(R) PLATINUM 8570
    - 2 NUMA nodes

Scheduler:
 - scx_simple [1] (so that we can focus on the built-in idle selection
   policy rather than on the scheduling policy itself)

Test:
 - Run a parallel kernel build `make -j $(nproc)` and measure the average
   elapsed time over 10 runs:

          avg time | stdev
          ---------+------
 before:   52.431s | 2.895
  after:   50.342s | 2.895

= Conclusion =

Splitting the global cpumask into multiple per-NUMA cpumasks achieves a
speedup of approximately +4% with this particular architecture and test
case.

I've repeated the same test on a DGX-1 (40 physical cores, Intel Xeon
E5-2698 v4 @ 2.20GHz, 2 NUMA nodes) and I didn't observe any measurable
difference.

In general, on smaller systems, I haven't noticed any measurable
regressions or improvements with the same test (parallel kernel build)
and scheduler (scx_simple).

NOTE: splitting the global cpumask into multiple cpumasks may increase
the overhead of scx_bpf_pick_idle_cpu() or ops.select_cpu() (for
schedulers relying on the built-in CPU idle selection policy) in the
presence of multiple NUMA nodes, particularly under high system load,
since we may have to access multiple cpumasks to find an idle CPU.

However, this increased overhead appears to be largely offset by a lower
overhead when updating the idle state (__scx_update_idle()) and by the
fact that CPUs are more likely to operate within their local idle
cpumask, reducing the pressure on the cache coherency protocol.

= References =

[1] https://github.com/sched-ext/scx/blob/main/scheds/c/scx_simple.bpf.c

ChangeLog v2 -> v3:
  - introduce for_each_online_node_wrap()
  - re-introduce cpumask_intersects() in test_and_clear_cpu_idle() (to
    reduce memory writes / cache coherence pressure)
  - get rid of the redundant scx_selcpu_topo_numa logic
  [test results are pretty much identical, so I haven't updated them from v2]

ChangeLog v1 -> v2:
  - renamed for_each_node_mask|state_from() -> for_each_node_mask|state_wrap()
  - misc cpumask optimizations (thanks to Yury)

Andrea Righi (3):
      nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap()
      sched_ext: Introduce per-NUMA idle cpumasks
      sched_ext: get rid of the scx_selcpu_topo_numa logic

 include/linux/nodemask.h |  14 ++++
 kernel/sched/ext.c       | 200 ++++++++++++++++++++++++++---------------------
 2 files changed, 124 insertions(+), 90 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap()
  2024-12-03 15:36 [PATCHSET v3 sched_ext/for-6.13] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
@ 2024-12-03 15:36 ` Andrea Righi
  2024-12-03 16:27   ` Yury Norov
  2024-12-03 15:36 ` [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks Andrea Righi
  2024-12-03 15:36 ` [PATCH 3/3] sched_ext: get rid of the scx_selcpu_topo_numa logic Andrea Righi
  2 siblings, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2024-12-03 15:36 UTC (permalink / raw)
  To: Tejun Heo, David Vernet; +Cc: Yury Norov, linux-kernel

Introduce NUMA node iterators to support circular iteration, starting
from a specified node.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 include/linux/nodemask.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index b61438313a73..7ba35c65ab99 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -392,6 +392,16 @@ static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp,
 	for ((node) = 0; (node) < 1 && !nodes_empty(mask); (node)++)
 #endif /* MAX_NUMNODES */
 
+#if MAX_NUMNODES > 1
+#define for_each_node_mask_wrap(node, mask, start)			\
+	for_each_set_bit_wrap((node), (mask).bits, MAX_NUMNODES, (start))
+#else /* MAX_NUMNODES == 1 */
+#define for_each_node_mask_wrap(node, mask, start)			\
+	for ((node) = 0;						\
+	     (node) < 1 && !nodes_empty(mask);				\
+	     (node)++, (void)(start))
+#endif /* MAX_NUMNODES */
+
 /*
  * Bitmasks that are kept for all the nodes.
  */
@@ -441,6 +451,9 @@ static inline int num_node_state(enum node_states state)
 #define for_each_node_state(__node, __state) \
 	for_each_node_mask((__node), node_states[__state])
 
+#define for_each_node_state_wrap(__node, __state, __start) \
+	for_each_node_mask_wrap((__node), node_states[__state], __start)
+
 #define first_online_node	first_node(node_states[N_ONLINE])
 #define first_memory_node	first_node(node_states[N_MEMORY])
 static inline unsigned int next_online_node(int nid)
@@ -535,6 +548,7 @@ static inline int node_random(const nodemask_t *maskp)
 
 #define for_each_node(node)	   for_each_node_state(node, N_POSSIBLE)
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
+#define for_each_online_node_wrap(node, start) for_each_node_state_wrap(node, N_ONLINE, start)
 
 /*
  * For nodemask scratch area.
-- 
2.47.1



* [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks
  2024-12-03 15:36 [PATCHSET v3 sched_ext/for-6.13] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
  2024-12-03 15:36 ` [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap() Andrea Righi
@ 2024-12-03 15:36 ` Andrea Righi
  2024-12-04  0:04   ` Tejun Heo
  2024-12-03 15:36 ` [PATCH 3/3] sched_ext: get rid of the scx_selcpu_topo_numa logic Andrea Righi
  2 siblings, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2024-12-03 15:36 UTC (permalink / raw)
  To: Tejun Heo, David Vernet; +Cc: Yury Norov, linux-kernel

Using a single global idle mask can be inefficient and put a lot of
stress on the cache coherency protocol on large systems with multiple
NUMA nodes, since all CPUs generate intense read/write activity on the
single shared cpumask.

Therefore, split the global cpumask into multiple per-NUMA node cpumasks
to improve scalability and performance on large systems.

The concept is that each cpumask tracks only the idle CPUs within its
corresponding NUMA node, treating CPUs in other NUMA nodes as busy. In
this way, concurrent access to the idle cpumasks is confined within each
NUMA node.

[Open issue]

The scx_bpf_get_idle_cpu/smtmask() kfunc's, which are supposed to return
a single cpumask covering all the CPUs, have been changed to report only
the cpumask of the current NUMA node (based on the current CPU); this
breaks the old behavior, so it can potentially introduce regressions in
some scx schedulers.

An alternative approach could be to construct a global cpumask
on-the-fly, but this could add significant overhead to ops.select_cpu()
for schedulers relying on these kfunc's. Additionally, it would be less
reliable than accessing the actual cpumasks, as the copy could quickly
become stale and misrepresent the actual idle state.

Probably a better way to solve this issue is to introduce new kfunc's to
explicitly select a specific per-NUMA cpumask and modify the scx
schedulers to transition to this new API, for example:

  const struct cpumask *scx_bpf_get_idle_numa_cpumask(int node)
  const struct cpumask *scx_bpf_get_idle_numa_smtmask(int node)

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 159 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 114 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 3c4a94e4258f..cff4210e9c7b 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -933,7 +933,37 @@ static struct delayed_work scx_watchdog_work;
 static struct {
 	cpumask_var_t cpu;
 	cpumask_var_t smt;
-} idle_masks CL_ALIGNED_IF_ONSTACK;
+} **idle_masks CL_ALIGNED_IF_ONSTACK;
+
+static struct cpumask *get_idle_cpumask(int cpu)
+{
+	int node = cpu_to_node(cpu);
+
+	return idle_masks[node]->cpu;
+}
+
+static struct cpumask *get_idle_smtmask(int cpu)
+{
+	int node = cpu_to_node(cpu);
+
+	return idle_masks[node]->smt;
+}
+
+static void idle_masks_init(void)
+{
+	int node;
+
+	idle_masks = kcalloc(num_possible_nodes(), sizeof(*idle_masks), GFP_KERNEL);
+	BUG_ON(!idle_masks);
+
+	for_each_node_state(node, N_POSSIBLE) {
+		idle_masks[node] = kzalloc_node(sizeof(**idle_masks), GFP_KERNEL, node);
+		BUG_ON(!idle_masks[node]);
+
+		BUG_ON(!alloc_cpumask_var_node(&idle_masks[node]->cpu, GFP_KERNEL, node));
+		BUG_ON(!alloc_cpumask_var_node(&idle_masks[node]->smt, GFP_KERNEL, node));
+	}
+}
 
 #endif	/* CONFIG_SMP */
 
@@ -3156,29 +3186,34 @@ static bool test_and_clear_cpu_idle(int cpu)
 	 */
 	if (sched_smt_active()) {
 		const struct cpumask *smt = cpu_smt_mask(cpu);
+		struct cpumask *idle_smt = get_idle_smtmask(cpu);
 
 		/*
 		 * If offline, @cpu is not its own sibling and
 		 * scx_pick_idle_cpu() can get caught in an infinite loop as
-		 * @cpu is never cleared from idle_masks.smt. Ensure that @cpu
-		 * is eventually cleared.
+		 * @cpu is never cleared from the idle SMT mask. Ensure that
+		 * @cpu is eventually cleared.
+		 *
+		 * NOTE: Use cpumask_intersects() and cpumask_test_cpu() to
+		 * reduce memory writes, which may help alleviate cache
+		 * coherence pressure.
 		 */
-		if (cpumask_intersects(smt, idle_masks.smt))
-			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
-		else if (cpumask_test_cpu(cpu, idle_masks.smt))
-			__cpumask_clear_cpu(cpu, idle_masks.smt);
+		if (cpumask_intersects(smt, idle_smt))
+			cpumask_andnot(idle_smt, idle_smt, smt);
+		else if (cpumask_test_cpu(cpu, idle_smt))
+			__cpumask_clear_cpu(cpu, idle_smt);
 	}
 #endif
-	return cpumask_test_and_clear_cpu(cpu, idle_masks.cpu);
+	return cpumask_test_and_clear_cpu(cpu, get_idle_cpumask(cpu));
 }
 
-static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
+static s32 scx_pick_idle_cpu_from_node(int node, const struct cpumask *cpus_allowed, u64 flags)
 {
 	int cpu;
 
 retry:
 	if (sched_smt_active()) {
-		cpu = cpumask_any_and_distribute(idle_masks.smt, cpus_allowed);
+		cpu = cpumask_any_and_distribute(idle_masks[node]->smt, cpus_allowed);
 		if (cpu < nr_cpu_ids)
 			goto found;
 
@@ -3186,15 +3221,42 @@ static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
 			return -EBUSY;
 	}
 
-	cpu = cpumask_any_and_distribute(idle_masks.cpu, cpus_allowed);
-	if (cpu >= nr_cpu_ids)
-		return -EBUSY;
+	cpu = cpumask_any_and_distribute(idle_masks[node]->cpu, cpus_allowed);
+	if (cpu < nr_cpu_ids)
+		goto found;
+
+	return -EBUSY;
 
 found:
 	if (test_and_clear_cpu_idle(cpu))
 		return cpu;
 	else
 		goto retry;
+
+}
+
+static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
+{
+	int start = cpu_to_node(smp_processor_id());
+	int node, cpu;
+
+	for_each_node_state_wrap(node, N_ONLINE, start) {
+		/*
+		 * scx_pick_idle_cpu_from_node() can be expensive and redundant
+		 * if none of the CPUs in the NUMA node can be used (according
+		 * to cpus_allowed).
+		 *
+		 * Therefore, check if the NUMA node is usable in advance to
+		 * save some CPU cycles.
+		 */
+		if (!cpumask_intersects(cpumask_of_node(node), cpus_allowed))
+			continue;
+		cpu = scx_pick_idle_cpu_from_node(node, cpus_allowed, flags);
+		if (cpu >= 0)
+			return cpu;
+	}
+
+	return -EBUSY;
 }
 
 /*
@@ -3338,11 +3400,11 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 {
 	const struct cpumask *llc_cpus = NULL;
 	const struct cpumask *numa_cpus = NULL;
+	int node = cpu_to_node(prev_cpu);
 	s32 cpu;
 
 	*found = false;
 
-
 	/*
 	 * This is necessary to protect llc_cpus.
 	 */
@@ -3361,7 +3423,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	 */
 	if (p->nr_cpus_allowed >= num_possible_cpus()) {
 		if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa))
-			numa_cpus = cpumask_of_node(cpu_to_node(prev_cpu));
+			numa_cpus = p->cpus_ptr;
 
 		if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc)) {
 			struct sched_domain *sd;
@@ -3401,9 +3463,9 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		 * piled up on it even if there is an idle core elsewhere on
 		 * the system.
 		 */
-		if (!cpumask_empty(idle_masks.cpu) &&
-		    !(current->flags & PF_EXITING) &&
-		    cpu_rq(cpu)->scx.local_dsq.nr == 0) {
+		if (!(current->flags & PF_EXITING) &&
+		    cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
+		    !cpumask_empty(get_idle_cpumask(cpu))) {
 			if (cpumask_test_cpu(cpu, p->cpus_ptr))
 				goto cpu_found;
 		}
@@ -3417,7 +3479,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		/*
 		 * Keep using @prev_cpu if it's part of a fully idle core.
 		 */
-		if (cpumask_test_cpu(prev_cpu, idle_masks.smt) &&
+		if (cpumask_test_cpu(prev_cpu, get_idle_smtmask(prev_cpu)) &&
 		    test_and_clear_cpu_idle(prev_cpu)) {
 			cpu = prev_cpu;
 			goto cpu_found;
@@ -3427,7 +3489,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		 * Search for any fully idle core in the same LLC domain.
 		 */
 		if (llc_cpus) {
-			cpu = scx_pick_idle_cpu(llc_cpus, SCX_PICK_IDLE_CORE);
+			cpu = scx_pick_idle_cpu_from_node(node, llc_cpus, SCX_PICK_IDLE_CORE);
 			if (cpu >= 0)
 				goto cpu_found;
 		}
@@ -3436,7 +3498,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		 * Search for any fully idle core in the same NUMA node.
 		 */
 		if (numa_cpus) {
-			cpu = scx_pick_idle_cpu(numa_cpus, SCX_PICK_IDLE_CORE);
+			cpu = scx_pick_idle_cpu_from_node(node, numa_cpus, SCX_PICK_IDLE_CORE);
 			if (cpu >= 0)
 				goto cpu_found;
 		}
@@ -3444,7 +3506,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		/*
 		 * Search for any full idle core usable by the task.
 		 */
-		cpu = scx_pick_idle_cpu(p->cpus_ptr, SCX_PICK_IDLE_CORE);
+		cpu = scx_pick_idle_cpu(p->cpus_ptr, prev_cpu, SCX_PICK_IDLE_CORE);
 		if (cpu >= 0)
 			goto cpu_found;
 	}
@@ -3461,7 +3523,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	 * Search for any idle CPU in the same LLC domain.
 	 */
 	if (llc_cpus) {
-		cpu = scx_pick_idle_cpu(llc_cpus, 0);
+		cpu = scx_pick_idle_cpu_from_node(node, llc_cpus, 0);
 		if (cpu >= 0)
 			goto cpu_found;
 	}
@@ -3470,7 +3532,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	 * Search for any idle CPU in the same NUMA node.
 	 */
 	if (numa_cpus) {
-		cpu = scx_pick_idle_cpu(numa_cpus, 0);
+		cpu = scx_pick_idle_cpu_from_node(node, numa_cpus, 0);
 		if (cpu >= 0)
 			goto cpu_found;
 	}
@@ -3478,7 +3540,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	/*
 	 * Search for any idle CPU usable by the task.
 	 */
-	cpu = scx_pick_idle_cpu(p->cpus_ptr, 0);
+	cpu = scx_pick_idle_cpu(p->cpus_ptr, prev_cpu, 0);
 	if (cpu >= 0)
 		goto cpu_found;
 
@@ -3560,12 +3622,18 @@ static void set_cpus_allowed_scx(struct task_struct *p,
 
 static void reset_idle_masks(void)
 {
+	int node;
+
 	/*
 	 * Consider all online cpus idle. Should converge to the actual state
 	 * quickly.
 	 */
-	cpumask_copy(idle_masks.cpu, cpu_online_mask);
-	cpumask_copy(idle_masks.smt, cpu_online_mask);
+	for_each_node_state(node, N_POSSIBLE) {
+		const struct cpumask *node_mask = cpumask_of_node(node);
+
+		cpumask_and(idle_masks[node]->cpu, cpu_online_mask, node_mask);
+		cpumask_copy(idle_masks[node]->smt, idle_masks[node]->cpu);
+	}
 }
 
 void __scx_update_idle(struct rq *rq, bool idle)
@@ -3578,14 +3646,13 @@ void __scx_update_idle(struct rq *rq, bool idle)
 			return;
 	}
 
-	if (idle)
-		cpumask_set_cpu(cpu, idle_masks.cpu);
-	else
-		cpumask_clear_cpu(cpu, idle_masks.cpu);
+	assign_cpu(cpu, get_idle_cpumask(cpu), idle);
 
 #ifdef CONFIG_SCHED_SMT
 	if (sched_smt_active()) {
 		const struct cpumask *smt = cpu_smt_mask(cpu);
+		struct cpumask *idle_cpu = get_idle_cpumask(cpu);
+		struct cpumask *idle_smt = get_idle_smtmask(cpu);
 
 		if (idle) {
 			/*
@@ -3593,12 +3660,12 @@ void __scx_update_idle(struct rq *rq, bool idle)
 			 * it's only for optimization and self-correcting.
 			 */
 			for_each_cpu(cpu, smt) {
-				if (!cpumask_test_cpu(cpu, idle_masks.cpu))
+				if (!cpumask_test_cpu(cpu, idle_cpu))
 					return;
 			}
-			cpumask_or(idle_masks.smt, idle_masks.smt, smt);
+			cpumask_or(idle_smt, idle_smt, smt);
 		} else {
-			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
+			cpumask_andnot(idle_smt, idle_smt, smt);
 		}
 	}
 #endif
@@ -3646,7 +3713,10 @@ static void rq_offline_scx(struct rq *rq)
 #else	/* CONFIG_SMP */
 
 static bool test_and_clear_cpu_idle(int cpu) { return false; }
-static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags) { return -EBUSY; }
+static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, s32 prev_cpu, u64 flags)
+{
+	return -EBUSY;
+}
 static void reset_idle_masks(void) {}
 
 #endif	/* CONFIG_SMP */
@@ -6174,8 +6244,7 @@ void __init init_sched_ext_class(void)
 
 	BUG_ON(rhashtable_init(&dsq_hash, &dsq_hash_params));
 #ifdef CONFIG_SMP
-	BUG_ON(!alloc_cpumask_var(&idle_masks.cpu, GFP_KERNEL));
-	BUG_ON(!alloc_cpumask_var(&idle_masks.smt, GFP_KERNEL));
+	idle_masks_init();
 #endif
 	scx_kick_cpus_pnt_seqs =
 		__alloc_percpu(sizeof(scx_kick_cpus_pnt_seqs[0]) * nr_cpu_ids,
@@ -7321,7 +7390,7 @@ __bpf_kfunc void scx_bpf_put_cpumask(const struct cpumask *cpumask)
 
 /**
  * scx_bpf_get_idle_cpumask - Get a referenced kptr to the idle-tracking
- * per-CPU cpumask.
+ * per-CPU cpumask of the current NUMA node.
  *
  * Returns NULL if idle tracking is not enabled, or running on a UP kernel.
  */
@@ -7333,7 +7402,7 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask(void)
 	}
 
 #ifdef CONFIG_SMP
-	return idle_masks.cpu;
+	return get_idle_cpumask(smp_processor_id());
 #else
 	return cpu_none_mask;
 #endif
@@ -7341,8 +7410,8 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask(void)
 
 /**
  * scx_bpf_get_idle_smtmask - Get a referenced kptr to the idle-tracking,
- * per-physical-core cpumask. Can be used to determine if an entire physical
- * core is free.
+ * per-physical-core cpumask of the current NUMA node. Can be used to determine
+ * if an entire physical core is free.
  *
  * Returns NULL if idle tracking is not enabled, or running on a UP kernel.
  */
@@ -7355,9 +7424,9 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask(void)
 
 #ifdef CONFIG_SMP
 	if (sched_smt_active())
-		return idle_masks.smt;
+		return get_idle_smtmask(smp_processor_id());
 	else
-		return idle_masks.cpu;
+		return get_idle_cpumask(smp_processor_id());
 #else
 	return cpu_none_mask;
 #endif
@@ -7427,7 +7496,7 @@ __bpf_kfunc s32 scx_bpf_pick_idle_cpu(const struct cpumask *cpus_allowed,
 		return -EBUSY;
 	}
 
-	return scx_pick_idle_cpu(cpus_allowed, flags);
+	return scx_pick_idle_cpu(cpus_allowed, smp_processor_id(), flags);
 }
 
 /**
@@ -7450,7 +7519,7 @@ __bpf_kfunc s32 scx_bpf_pick_any_cpu(const struct cpumask *cpus_allowed,
 	s32 cpu;
 
 	if (static_branch_likely(&scx_builtin_idle_enabled)) {
-		cpu = scx_pick_idle_cpu(cpus_allowed, flags);
+		cpu = scx_pick_idle_cpu(cpus_allowed, smp_processor_id(), flags);
 		if (cpu >= 0)
 			return cpu;
 	}
-- 
2.47.1



* [PATCH 3/3] sched_ext: get rid of the scx_selcpu_topo_numa logic
  2024-12-03 15:36 [PATCHSET v3 sched_ext/for-6.13] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
  2024-12-03 15:36 ` [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap() Andrea Righi
  2024-12-03 15:36 ` [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks Andrea Righi
@ 2024-12-03 15:36 ` Andrea Righi
  2 siblings, 0 replies; 10+ messages in thread
From: Andrea Righi @ 2024-12-03 15:36 UTC (permalink / raw)
  To: Tejun Heo, David Vernet; +Cc: Yury Norov, linux-kernel

With the introduction of separate per-NUMA node cpumasks, we
automatically track idle CPUs within each NUMA node.

This makes the special logic for determining idle CPUs in each NUMA node
redundant and unnecessary, so we can get rid of it.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 57 +++++-----------------------------------------
 1 file changed, 6 insertions(+), 51 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index cff4210e9c7b..6a91d0f5d2a3 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -886,7 +886,6 @@ static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
 
 #ifdef CONFIG_SMP
 static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc);
-static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_numa);
 #endif
 
 static struct static_key_false scx_has_op[SCX_OPI_END] =
@@ -3235,10 +3234,9 @@ static s32 scx_pick_idle_cpu_from_node(int node, const struct cpumask *cpus_allo
 
 }
 
-static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
+static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, s32 prev_cpu, u64 flags)
 {
-	int start = cpu_to_node(smp_processor_id());
-	int node, cpu;
+	int node, cpu, start = cpu_to_node(prev_cpu);
 
 	for_each_node_state_wrap(node, N_ONLINE, start) {
 		/*
@@ -3319,9 +3317,8 @@ static bool llc_numa_mismatch(void)
  */
 static void update_selcpu_topology(void)
 {
-	bool enable_llc = false, enable_numa = false;
+	bool enable_llc = false;
 	struct sched_domain *sd;
-	const struct cpumask *cpus;
 	s32 cpu = cpumask_first(cpu_online_mask);
 
 	/*
@@ -3337,37 +3334,18 @@ static void update_selcpu_topology(void)
 	rcu_read_lock();
 	sd = rcu_dereference(per_cpu(sd_llc, cpu));
 	if (sd) {
-		if (sd->span_weight < num_online_cpus())
+		if ((sd->span_weight < num_online_cpus()) && llc_numa_mismatch())
 			enable_llc = true;
 	}
-
-	/*
-	 * Enable NUMA optimization only when there are multiple NUMA domains
-	 * among the online CPUs and the NUMA domains don't perfectly overlaps
-	 * with the LLC domains.
-	 *
-	 * If all CPUs belong to the same NUMA node and the same LLC domain,
-	 * enabling both NUMA and LLC optimizations is unnecessary, as checking
-	 * for an idle CPU in the same domain twice is redundant.
-	 */
-	cpus = cpumask_of_node(cpu_to_node(cpu));
-	if ((cpumask_weight(cpus) < num_online_cpus()) && llc_numa_mismatch())
-		enable_numa = true;
 	rcu_read_unlock();
 
 	pr_debug("sched_ext: LLC idle selection %s\n",
 		 enable_llc ? "enabled" : "disabled");
-	pr_debug("sched_ext: NUMA idle selection %s\n",
-		 enable_numa ? "enabled" : "disabled");
 
 	if (enable_llc)
 		static_branch_enable_cpuslocked(&scx_selcpu_topo_llc);
 	else
 		static_branch_disable_cpuslocked(&scx_selcpu_topo_llc);
-	if (enable_numa)
-		static_branch_enable_cpuslocked(&scx_selcpu_topo_numa);
-	else
-		static_branch_disable_cpuslocked(&scx_selcpu_topo_numa);
 }
 
 /*
@@ -3388,9 +3366,8 @@ static void update_selcpu_topology(void)
  * 4. Pick a CPU within the same NUMA node, if enabled:
  *   - choose a CPU from the same NUMA node to reduce memory access latency.
  *
- * Step 3 and 4 are performed only if the system has, respectively, multiple
- * LLC domains / multiple NUMA nodes (see scx_selcpu_topo_llc and
- * scx_selcpu_topo_numa).
+ * Step 3 is performed only if the system has multiple LLC domains that are not
+ * perfectly overlapping with the NUMA domains (see scx_selcpu_topo_llc).
  *
  * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
  * we never call ops.select_cpu() for them, see select_task_rq().
@@ -3399,7 +3376,6 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 			      u64 wake_flags, bool *found)
 {
 	const struct cpumask *llc_cpus = NULL;
-	const struct cpumask *numa_cpus = NULL;
 	int node = cpu_to_node(prev_cpu);
 	s32 cpu;
 
@@ -3422,9 +3398,6 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	 * defined by user-space.
 	 */
 	if (p->nr_cpus_allowed >= num_possible_cpus()) {
-		if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa))
-			numa_cpus = p->cpus_ptr;
-
 		if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc)) {
 			struct sched_domain *sd;
 
@@ -3494,15 +3467,6 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 				goto cpu_found;
 		}
 
-		/*
-		 * Search for any fully idle core in the same NUMA node.
-		 */
-		if (numa_cpus) {
-			cpu = scx_pick_idle_cpu_from_node(node, numa_cpus, SCX_PICK_IDLE_CORE);
-			if (cpu >= 0)
-				goto cpu_found;
-		}
-
 		/*
 		 * Search for any full idle core usable by the task.
 		 */
@@ -3528,15 +3492,6 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 			goto cpu_found;
 	}
 
-	/*
-	 * Search for any idle CPU in the same NUMA node.
-	 */
-	if (numa_cpus) {
-		cpu = scx_pick_idle_cpu_from_node(node, numa_cpus, 0);
-		if (cpu >= 0)
-			goto cpu_found;
-	}
-
 	/*
 	 * Search for any idle CPU usable by the task.
 	 */
-- 
2.47.1



* Re: [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap()
  2024-12-03 15:36 ` [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap() Andrea Righi
@ 2024-12-03 16:27   ` Yury Norov
  0 siblings, 0 replies; 10+ messages in thread
From: Yury Norov @ 2024-12-03 16:27 UTC (permalink / raw)
  To: Andrea Righi; +Cc: Tejun Heo, David Vernet, linux-kernel

On Tue, Dec 03, 2024 at 04:36:10PM +0100, Andrea Righi wrote:
> Introduce NUMA node iterators to support circular iteration, starting
> from a specified node.
> 
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

Acked-by: Yury Norov <yury.norov@gmail.com>

> ---
>  include/linux/nodemask.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
> index b61438313a73..7ba35c65ab99 100644
> --- a/include/linux/nodemask.h
> +++ b/include/linux/nodemask.h
> @@ -392,6 +392,16 @@ static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp,
>  	for ((node) = 0; (node) < 1 && !nodes_empty(mask); (node)++)
>  #endif /* MAX_NUMNODES */
>  
> +#if MAX_NUMNODES > 1
> +#define for_each_node_mask_wrap(node, mask, start)			\
> +	for_each_set_bit_wrap((node), (mask).bits, MAX_NUMNODES, (start))
> +#else /* MAX_NUMNODES == 1 */
> +#define for_each_node_mask_wrap(node, mask, start)			\
> +	for ((node) = 0;						\
> +	     (node) < 1 && !nodes_empty(mask);				\
> +	     (node)++, (void)(start))
> +#endif /* MAX_NUMNODES */
> +
>  /*
>   * Bitmasks that are kept for all the nodes.
>   */
> @@ -441,6 +451,9 @@ static inline int num_node_state(enum node_states state)
>  #define for_each_node_state(__node, __state) \
>  	for_each_node_mask((__node), node_states[__state])
>  
> +#define for_each_node_state_wrap(__node, __state, __start) \
> +	for_each_node_mask_wrap((__node), node_states[__state], __start)
> +
>  #define first_online_node	first_node(node_states[N_ONLINE])
>  #define first_memory_node	first_node(node_states[N_MEMORY])
>  static inline unsigned int next_online_node(int nid)
> @@ -535,6 +548,7 @@ static inline int node_random(const nodemask_t *maskp)
>  
>  #define for_each_node(node)	   for_each_node_state(node, N_POSSIBLE)
>  #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
> +#define for_each_online_node_wrap(node, start) for_each_node_state_wrap(node, N_ONLINE, start)
>  
>  /*
>   * For nodemask scratch area.
> -- 
> 2.47.1


* Re: [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks
  2024-12-03 15:36 ` [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks Andrea Righi
@ 2024-12-04  0:04   ` Tejun Heo
  2024-12-04  0:38     ` Yury Norov
  2024-12-04  8:41     ` Andrea Righi
  0 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2024-12-04  0:04 UTC (permalink / raw)
  To: Andrea Righi; +Cc: David Vernet, Yury Norov, linux-kernel

Hello,

On Tue, Dec 03, 2024 at 04:36:11PM +0100, Andrea Righi wrote:
...
> Probably a better way to solve this issue is to introduce new kfunc's to
> explicitly select specific per-NUMA cpumask and modify the scx
> schedulers to transition to this new API, for example:
> 
>   const struct cpumask *scx_bpf_get_idle_numa_cpumask(int node)
>   const struct cpumask *scx_bpf_get_idle_numa_smtmask(int node)

Yeah, I don't think we want to break backward compat here. Can we introduce
a flag to switch between node-aware and flattened logic and trigger ops
error if the wrong flavor is used? Then, we can deprecate and drop the old
behavior after a few releases. Also, I think it can be named
scx_bpf_get_idle_cpumask_node().

> +static struct cpumask *get_idle_cpumask(int cpu)
> +{
> +	int node = cpu_to_node(cpu);
> +
> +	return idle_masks[node]->cpu;
> +}
> +
> +static struct cpumask *get_idle_smtmask(int cpu)
> +{
> +	int node = cpu_to_node(cpu);
> +
> +	return idle_masks[node]->smt;
> +}

Hmm... why are they keyed by cpu? Wouldn't it make more sense to key them by
node?

> +static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> +{
> +	int start = cpu_to_node(smp_processor_id());
> +	int node, cpu;
> +
> +	for_each_node_state_wrap(node, N_ONLINE, start) {
> +		/*
> +		 * scx_pick_idle_cpu_from_node() can be expensive and redundant
> +		 * if none of the CPUs in the NUMA node can be used (according
> +		 * to cpus_allowed).
> +		 *
> +		 * Therefore, check if the NUMA node is usable in advance to
> +		 * save some CPU cycles.
> +		 */
> +		if (!cpumask_intersects(cpumask_of_node(node), cpus_allowed))
> +			continue;
> +		cpu = scx_pick_idle_cpu_from_node(node, cpus_allowed, flags);
> +		if (cpu >= 0)
> +			return cpu;

This is fine for now but it'd be ideal if the iteration is in inter-node
distance order so that each CPU radiates from local node to the furthest
ones.

Thanks.

-- 
tejun


* Re: [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks
  2024-12-04  0:04   ` Tejun Heo
@ 2024-12-04  0:38     ` Yury Norov
  2024-12-04  8:47       ` Andrea Righi
  2024-12-04  8:41     ` Andrea Righi
  1 sibling, 1 reply; 10+ messages in thread
From: Yury Norov @ 2024-12-04  0:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrea Righi, David Vernet, linux-kernel

On Tue, Dec 03, 2024 at 02:04:15PM -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Dec 03, 2024 at 04:36:11PM +0100, Andrea Righi wrote:
> ...
> > Probably a better way to solve this issue is to introduce new kfunc's to
> > explicitly select specific per-NUMA cpumask and modify the scx
> > schedulers to transition to this new API, for example:
> > 
> >   const struct cpumask *scx_bpf_get_idle_numa_cpumask(int node)
> >   const struct cpumask *scx_bpf_get_idle_numa_smtmask(int node)
> 
> Yeah, I don't think we want to break backward compat here. Can we introduce
> a flag to switch between node-aware and flattened logic and trigger ops
> error if the wrong flavor is used? Then, we can deprecate and drop the old
> behavior after a few releases. Also, I think it can be named
> scx_bpf_get_idle_cpumask_node().
> 
> > +static struct cpumask *get_idle_cpumask(int cpu)
> > +{
> > +	int node = cpu_to_node(cpu);
> > +
> > +	return idle_masks[node]->cpu;
> > +}
> > +
> > +static struct cpumask *get_idle_smtmask(int cpu)
> > +{
> > +	int node = cpu_to_node(cpu);
> > +
> > +	return idle_masks[node]->smt;
> > +}
> 
> Hmm... why are they keyed by cpu? Wouldn't it make more sense to key them by
> node?
> 
> > +static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> > +{
> > +	int start = cpu_to_node(smp_processor_id());
> > +	int node, cpu;
> > +
> > +	for_each_node_state_wrap(node, N_ONLINE, start) {
> > +		/*
> > +		 * scx_pick_idle_cpu_from_node() can be expensive and redundant
> > +		 * if none of the CPUs in the NUMA node can be used (according
> > +		 * to cpus_allowed).
> > +		 *
> > +		 * Therefore, check if the NUMA node is usable in advance to
> > +		 * save some CPU cycles.
> > +		 */
> > +		if (!cpumask_intersects(cpumask_of_node(node), cpus_allowed))
> > +			continue;
> > +		cpu = scx_pick_idle_cpu_from_node(node, cpus_allowed, flags);
> > +		if (cpu >= 0)
> > +			return cpu;
> 
> This is fine for now but it'd be ideal if the iteration is in inter-node
> distance order so that each CPU radiates from local node to the furthest
> ones.

cpumask_local_spread() does exactly that - traverses CPUs in NUMA-aware
order. Or we can use for_each_numa_hop_mask() iterator, which does the
same thing more efficiently.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks
  2024-12-04  0:04   ` Tejun Heo
  2024-12-04  0:38     ` Yury Norov
@ 2024-12-04  8:41     ` Andrea Righi
  2024-12-04 18:53       ` Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2024-12-04  8:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: David Vernet, Yury Norov, linux-kernel

On Tue, Dec 03, 2024 at 02:04:15PM -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Dec 03, 2024 at 04:36:11PM +0100, Andrea Righi wrote:
> ...
> > Probably a better way to solve this issue is to introduce new kfunc's to
> > explicitly select specific per-NUMA cpumask and modify the scx
> > schedulers to transition to this new API, for example:
> >
> >   const struct cpumask *scx_bpf_get_idle_numa_cpumask(int node)
> >   const struct cpumask *scx_bpf_get_idle_numa_smtmask(int node)
> 
> Yeah, I don't think we want to break backward compat here. Can we introduce
> a flag to switch between node-aware and flattened logic and trigger ops
> error if the wrong flavor is used? Then, we can deprecate and drop the old
> behavior after a few releases. Also, I think it can be named
> scx_bpf_get_idle_cpumask_node().

I like the idea of introducing a flag. The default should be flattened
cpumask, so everything remains the same, and if a scheduler explicitly
enables SCX_OPS_NUMA_IDLE_MASK (suggestions for the name?) we can switch
to the NUMA-aware idle logic.

> 
> > +static struct cpumask *get_idle_cpumask(int cpu)
> > +{
> > +     int node = cpu_to_node(cpu);
> > +
> > +     return idle_masks[node]->cpu;
> > +}
> > +
> > +static struct cpumask *get_idle_smtmask(int cpu)
> > +{
> > +     int node = cpu_to_node(cpu);
> > +
> > +     return idle_masks[node]->smt;
> > +}
> 
> Hmm... why are they keyed by cpu? Wouldn't it make more sense to key them by
> node?

I was trying to save some code, but it's definitely clearer to key them
by node and rename them to get_idle_cpumask_node() /
get_idle_smtmask_node(). Will change this.

> 
> > +static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> > +{
> > +     int start = cpu_to_node(smp_processor_id());
> > +     int node, cpu;
> > +
> > +     for_each_node_state_wrap(node, N_ONLINE, start) {
> > +             /*
> > +              * scx_pick_idle_cpu_from_node() can be expensive and redundant
> > +              * if none of the CPUs in the NUMA node can be used (according
> > +              * to cpus_allowed).
> > +              *
> > +              * Therefore, check if the NUMA node is usable in advance to
> > +              * save some CPU cycles.
> > +              */
> > +             if (!cpumask_intersects(cpumask_of_node(node), cpus_allowed))
> > +                     continue;
> > +             cpu = scx_pick_idle_cpu_from_node(node, cpus_allowed, flags);
> > +             if (cpu >= 0)
> > +                     return cpu;
> 
> This is fine for now but it'd be ideal if the iteration is in inter-node
> distance order so that each CPU radiates from local node to the furthest
> ones.

Ok.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks
  2024-12-04  0:38     ` Yury Norov
@ 2024-12-04  8:47       ` Andrea Righi
  0 siblings, 0 replies; 10+ messages in thread
From: Andrea Righi @ 2024-12-04  8:47 UTC (permalink / raw)
  To: Yury Norov; +Cc: Tejun Heo, David Vernet, linux-kernel

On Tue, Dec 03, 2024 at 04:38:58PM -0800, Yury Norov wrote:
> On Tue, Dec 03, 2024 at 02:04:15PM -1000, Tejun Heo wrote:
...
> > > +static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> > > +{
> > > +   int start = cpu_to_node(smp_processor_id());
> > > +   int node, cpu;
> > > +
> > > +   for_each_node_state_wrap(node, N_ONLINE, start) {
> > > +           /*
> > > +            * scx_pick_idle_cpu_from_node() can be expensive and redundant
> > > +            * if none of the CPUs in the NUMA node can be used (according
> > > +            * to cpus_allowed).
> > > +            *
> > > +            * Therefore, check if the NUMA node is usable in advance to
> > > +            * save some CPU cycles.
> > > +            */
> > > +           if (!cpumask_intersects(cpumask_of_node(node), cpus_allowed))
> > > +                   continue;
> > > +           cpu = scx_pick_idle_cpu_from_node(node, cpus_allowed, flags);
> > > +           if (cpu >= 0)
> > > +                   return cpu;
> >
> > This is fine for now but it'd be ideal if the iteration is in inter-node
> > distance order so that each CPU radiates from local node to the furthest
> > ones.
> 
> cpumask_local_spread() does exactly that - traverses CPUs in NUMA-aware
> order. Or we can use for_each_numa_hop_mask() iterator, which does the
> same thing more efficiently.

Nice, for_each_numa_hop_mask() seems to be exactly what I need: it also
takes a starting node, so we don't need to introduce
for_each_online_node_wrap() and the other new *_wrap() helpers.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks
  2024-12-04  8:41     ` Andrea Righi
@ 2024-12-04 18:53       ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2024-12-04 18:53 UTC (permalink / raw)
  To: Andrea Righi; +Cc: David Vernet, Yury Norov, linux-kernel

Hello,

On Wed, Dec 04, 2024 at 09:41:43AM +0100, Andrea Righi wrote:
...
> I like the idea of introducing a flag. The default should be flattened
> cpumask, so everything remains the same, and if a scheduler explicitly
> enables SCX_OPS_NUMA_IDLE_MASK (suggestions for the name?) we can switch
> to the NUMA-aware idle logic.

I think it generally works better to use "node" instead of "numa" for these
interfaces. That's more common and matches the interface functions that need
to be used and exposed, so maybe sth like SCX_OPS_BUILTIN_IDLE_PER_NODE or
SCX_OPS_NODE_BUILTIN_IDLE?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-12-04 18:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-03 15:36 [PATCHSET v3 sched_ext/for-6.13] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
2024-12-03 15:36 ` [PATCH 1/3] nodemask: Introduce for_each_node_mask_wrap/for_each_node_state_wrap() Andrea Righi
2024-12-03 16:27   ` Yury Norov
2024-12-03 15:36 ` [PATCH 2/3] sched_ext: Introduce per-NUMA idle cpumasks Andrea Righi
2024-12-04  0:04   ` Tejun Heo
2024-12-04  0:38     ` Yury Norov
2024-12-04  8:47       ` Andrea Righi
2024-12-04  8:41     ` Andrea Righi
2024-12-04 18:53       ` Tejun Heo
2024-12-03 15:36 ` [PATCH 3/3] sched_ext: get rid of the scx_selcpu_topo_numa logic Andrea Righi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox