Re: [PATCH 07/10] sched_ext: Introduce per-node idle cpumasks


From: Yury Norov <yury.norov@gmail.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 07/10] sched_ext: Introduce per-node idle cpumasks
Date: Mon, 23 Dec 2024 20:05:53 -0800
Message-ID: <Z2ozISbYmWPj7VNA@yury-ThinkPad>
In-Reply-To: <20241220154107.287478-8-arighi@nvidia.com>

On Fri, Dec 20, 2024 at 04:11:39PM +0100, Andrea Righi wrote:
> Using a single global idle mask can lead to inefficiencies and a lot of
> stress on the cache coherency protocol on large systems with multiple
> NUMA nodes, since all the CPUs can create a really intense read/write
> activity on the single global cpumask.
> 
> Therefore, split the global cpumask into multiple per-NUMA node cpumasks
> to improve scalability and performance on large systems.
> 
> The concept is that each cpumask will track only the idle CPUs within
> its corresponding NUMA node, treating CPUs in other NUMA nodes as busy.
> In this way concurrent access to the idle cpumask will be restricted
> within each NUMA node.
> 
> NOTE: if a scheduler enables the per-node idle cpumasks (via
> SCX_OPS_BUILTIN_IDLE_PER_NODE), scx_bpf_get_idle_cpu/smtmask() will
> trigger an scx error, since there are no system-wide cpumasks.
> 
> By default (when SCX_OPS_BUILTIN_IDLE_PER_NODE is not enabled), only the
> cpumask of node 0 is used as a single global flat CPU mask, maintaining
> the previous behavior.
> 
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

This is a rather big patch... Can you split it somehow? Maybe
introduce new functions in a separate patch, and use them in the
following patch(es)?

> ---
>  kernel/sched/ext.c      |   7 +-
>  kernel/sched/ext_idle.c | 258 +++++++++++++++++++++++++++++++---------
>  2 files changed, 208 insertions(+), 57 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 148ec04d4a0a..143938e935f1 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3228,7 +3228,7 @@ static void handle_hotplug(struct rq *rq, bool online)
>  	atomic_long_inc(&scx_hotplug_seq);
>  
>  	if (scx_enabled())
> -		update_selcpu_topology();
> +		update_selcpu_topology(&scx_ops);
>  
>  	if (online && SCX_HAS_OP(cpu_online))
>  		SCX_CALL_OP(SCX_KF_UNLOCKED, cpu_online, cpu);
> @@ -5107,7 +5107,7 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
>  
>  	check_hotplug_seq(ops);
>  #ifdef CONFIG_SMP
> -	update_selcpu_topology();
> +	update_selcpu_topology(ops);
>  #endif
>  	cpus_read_unlock();
>  
> @@ -5800,8 +5800,7 @@ void __init init_sched_ext_class(void)
>  
>  	BUG_ON(rhashtable_init(&dsq_hash, &dsq_hash_params));
>  #ifdef CONFIG_SMP
> -	BUG_ON(!alloc_cpumask_var(&idle_masks.cpu, GFP_KERNEL));
> -	BUG_ON(!alloc_cpumask_var(&idle_masks.smt, GFP_KERNEL));
> +	idle_masks_init();
>  #endif
>  	scx_kick_cpus_pnt_seqs =
>  		__alloc_percpu(sizeof(scx_kick_cpus_pnt_seqs[0]) * nr_cpu_ids,
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index 4952e2793304..444f2a15f1d4 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -10,7 +10,14 @@
>   * Copyright (c) 2024 Andrea Righi <arighi@nvidia.com>
>   */
>  
> +/*
> + * If NUMA awareness is disabled consider only node 0 as a single global
> + * NUMA node.
> + */
> +#define NUMA_FLAT_NODE	0

If it's a global idle node, maybe:

 #define GLOBAL_IDLE_NODE	0

This actually bypasses NUMA, so it's weird to mention NUMA here.

> +
>  static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
> +static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_per_node);
>  
>  static bool check_builtin_idle_enabled(void)
>  {
> @@ -22,22 +29,82 @@ static bool check_builtin_idle_enabled(void)
>  }
>  
>  #ifdef CONFIG_SMP
> -#ifdef CONFIG_CPUMASK_OFFSTACK
> -#define CL_ALIGNED_IF_ONSTACK
> -#else
> -#define CL_ALIGNED_IF_ONSTACK __cacheline_aligned_in_smp
> -#endif
> -
> -static struct {
> +struct idle_cpumask {
>  	cpumask_var_t cpu;
>  	cpumask_var_t smt;
> -} idle_masks CL_ALIGNED_IF_ONSTACK;
> +};

We already have struct cpumask, and this struct idle_cpumask may be
misleading. Maybe struct idle_cpus or something?

> +
> +/*
> + * cpumasks to track idle CPUs within each NUMA node.
> + *
> + * If SCX_OPS_BUILTIN_IDLE_PER_NODE is not specified, a single flat cpumask
> + * from node 0 is used to track all idle CPUs system-wide.
> + */
> +static struct idle_cpumask **scx_idle_masks;
> +
> +static struct idle_cpumask *get_idle_mask(int node)

Didn't we agree to drop the 'get_' prefix?

> +{
> +	if (node == NUMA_NO_NODE)
> +		node = numa_node_id();
> +	else if (WARN_ON_ONCE(node < 0 || node >= nr_node_ids))
> +		return NULL;

Kernel users are expected to provide correct parameters. I don't even
think you need to check for NUMA_NO_NODE: if I, as a user of your API,
need to pass the current node, I can just as well use numa_node_id().

If you drop all that sanity-check bloat, your function will be a
one-liner, and the question is: do you need it at all?

We usually need such wrappers to apply a 'const' qualifier or to do some
housekeeping before dereferencing. But in this case you just return
a pointer, and I don't understand why local users can't do it
themselves.

The idle_masks_init() below happily ignores the just-added accessor anyway...

> +	return scx_idle_masks[node];
> +}

> +
> +static struct cpumask *get_idle_cpumask(int node)
> +{
> +	struct idle_cpumask *mask = get_idle_mask(node);
> +
> +	return mask ? mask->cpu : cpu_none_mask;
> +}
> +
> +static struct cpumask *get_idle_smtmask(int node)
> +{
> +	struct idle_cpumask *mask = get_idle_mask(node);
> +
> +	return mask ? mask->smt : cpu_none_mask;
> +}

For those two guys... I think you agreed with Tejun that you don't
need them. To me the following is more explicit:

        idle_cpus(node)->smt;
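A minimal userspace sketch of that shape, assuming the struct is renamed to idle_cpus as suggested earlier and modelling each cpumask as a 64-bit word (a simplification for illustration, not the kernel type). MAX_NODES and node_storage[] are hypothetical. With the checks dropped, the accessor is a plain array lookup and callers pick the field they need:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the per-node struct: one bit per CPU. */
struct idle_cpus {
	uint64_t cpu;	/* idle CPUs in this node */
	uint64_t smt;	/* CPUs whose whole SMT core is idle */
};

#define MAX_NODES	4	/* hypothetical upper bound on node ids */

static struct idle_cpus node_storage[MAX_NODES];
static struct idle_cpus *scx_idle_masks[MAX_NODES] = {
	&node_storage[0], &node_storage[1], &node_storage[2], &node_storage[3],
};

/* Callers are trusted to pass a valid node id, so there are no checks. */
static struct idle_cpus *idle_cpus(int node)
{
	return scx_idle_masks[node];
}
```

Callers then write idle_cpus(node)->cpu or idle_cpus(node)->smt directly; whether such a one-line wrapper is worth keeping at all is the open question above.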

> +
> +static void idle_masks_init(void)
> +{
> +	int node;
> +
> +	scx_idle_masks = kcalloc(num_possible_nodes(), sizeof(*scx_idle_masks), GFP_KERNEL);
> +	BUG_ON(!scx_idle_masks);
> +
> +	for_each_node_state(node, N_POSSIBLE) {
> +		scx_idle_masks[node] = kzalloc_node(sizeof(**scx_idle_masks), GFP_KERNEL, node);
> +		BUG_ON(!scx_idle_masks[node]);
> +
> +		BUG_ON(!alloc_cpumask_var_node(&scx_idle_masks[node]->cpu, GFP_KERNEL, node));
> +		BUG_ON(!alloc_cpumask_var_node(&scx_idle_masks[node]->smt, GFP_KERNEL, node));
> +	}
> +}
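A userspace sketch of the same allocation pattern, with calloc() standing in for kcalloc()/kzalloc_node() and a hypothetical possible[] array standing in for N_POSSIBLE. One detail that may be worth double-checking in the kernel version: for_each_node_state() yields node ids, and possible node ids are not guaranteed to be contiguous, so this sketch sizes the pointer array by the largest id plus one (which is what nr_node_ids holds) rather than by a count of nodes (which is what num_possible_nodes() returns):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct idle_cpus {
	uint64_t cpu;
	uint64_t smt;
};

static struct idle_cpus **scx_idle_masks;

/*
 * possible[] marks which node ids exist; nr_node_ids is the largest
 * possible id plus one, so sparse ids such as {0, 2} still fit.
 * Returns 0 on success, -1 on allocation failure. The patch BUG()s on
 * failure instead, so no cleanup path is modelled here.
 */
static int idle_masks_init(const int *possible, int nr_node_ids)
{
	scx_idle_masks = calloc(nr_node_ids, sizeof(*scx_idle_masks));
	if (!scx_idle_masks)
		return -1;

	for (int node = 0; node < nr_node_ids; node++) {
		if (!possible[node])
			continue;	/* slot stays NULL for impossible ids */

		/* The kernel would use kzalloc_node() to place this on @node. */
		scx_idle_masks[node] = calloc(1, sizeof(**scx_idle_masks));
		if (!scx_idle_masks[node])
			return -1;
	}
	return 0;
}
```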
>  
>  static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc);
>  static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_numa);
>  
> +/*
> + * Return the node id associated to a target idle CPU (used to determine
> + * the proper idle cpumask).
> + */
> +static int idle_cpu_to_node(int cpu)
> +{
> +	int node;
> +
> +	if (static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
> +		node = cpu_to_node(cpu);

Nit: can you just return cpu_to_node(cpu)? That saves 3 LOCs.

> +	else
> +		node = NUMA_FLAT_NODE;
> +
> +	return node;
> +}
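The one-liner form, sketched in userspace with the GLOBAL_IDLE_NODE name proposed above. The cpu_node_map[] table and the idle_per_node flag are hypothetical stand-ins for cpu_to_node() and the scx_builtin_idle_per_node static key:

```c
#include <assert.h>
#include <stdbool.h>

#define GLOBAL_IDLE_NODE	0	/* name suggested above for NUMA_FLAT_NODE */

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int cpu_node_map[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Stand-in for the scx_builtin_idle_per_node static key. */
static bool idle_per_node;

static int cpu_to_node(int cpu)
{
	return cpu_node_map[cpu];
}

static int idle_cpu_to_node(int cpu)
{
	if (idle_per_node)
		return cpu_to_node(cpu);

	return GLOBAL_IDLE_NODE;
}
```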
> +
>  static bool test_and_clear_cpu_idle(int cpu)
>  {
> +	int node = idle_cpu_to_node(cpu);
> +	struct cpumask *idle_cpus = get_idle_cpumask(node);
> +
>  #ifdef CONFIG_SCHED_SMT
>  	/*
>  	 * SMT mask should be cleared whether we can claim @cpu or not. The SMT
> @@ -46,33 +113,37 @@ static bool test_and_clear_cpu_idle(int cpu)
>  	 */
>  	if (sched_smt_active()) {
>  		const struct cpumask *smt = cpu_smt_mask(cpu);
> +		struct cpumask *idle_smts = get_idle_smtmask(node);
>  
>  		/*
>  		 * If offline, @cpu is not its own sibling and
>  		 * scx_pick_idle_cpu() can get caught in an infinite loop as
> -		 * @cpu is never cleared from idle_masks.smt. Ensure that @cpu
> -		 * is eventually cleared.
> +		 * @cpu is never cleared from the idle SMT mask. Ensure that
> +		 * @cpu is eventually cleared.
>  		 *
>  		 * NOTE: Use cpumask_intersects() and cpumask_test_cpu() to
>  		 * reduce memory writes, which may help alleviate cache
>  		 * coherence pressure.
>  		 */
> -		if (cpumask_intersects(smt, idle_masks.smt))
> -			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
> -		else if (cpumask_test_cpu(cpu, idle_masks.smt))
> -			__cpumask_clear_cpu(cpu, idle_masks.smt);
> +		if (cpumask_intersects(smt, idle_smts))
> +			cpumask_andnot(idle_smts, idle_smts, smt);
> +		else if (cpumask_test_cpu(cpu, idle_smts))
> +			__cpumask_clear_cpu(cpu, idle_smts);
>  	}
>  #endif
> -	return cpumask_test_and_clear_cpu(cpu, idle_masks.cpu);
> +	return cpumask_test_and_clear_cpu(cpu, idle_cpus);
>  }
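To make the per-node semantics concrete, here is a single-threaded userspace model of test_and_clear_cpu_idle(), assuming the caller has already looked up the node's masks and that CPUs 2k and 2k+1 are SMT siblings (both assumptions for illustration). The kernel relies on atomic bitops; plain integer operations are only adequate here because the model has no concurrency, and the offline-CPU branch is not modelled since a CPU is always in its own sibling mask in this layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)	(UINT64_C(1) << (n))

struct idle_cpus {
	uint64_t cpu;	/* idle CPUs in this node */
	uint64_t smt;	/* CPUs whose whole core is idle */
};

/* Hypothetical SMT layout: CPUs 2k and 2k+1 are siblings. */
static uint64_t cpu_smt_mask(int cpu)
{
	return BIT(cpu & ~1) | BIT(cpu | 1);
}

static bool test_and_clear_cpu_idle(struct idle_cpus *idle, int cpu)
{
	uint64_t smt = cpu_smt_mask(cpu);
	bool was_idle;

	/* The core is no longer fully idle, whether or not we win @cpu.
	 * Only write when something changes, as the patch does. */
	if (idle->smt & smt)
		idle->smt &= ~smt;

	was_idle = idle->cpu & BIT(cpu);
	idle->cpu &= ~BIT(cpu);
	return was_idle;
}
```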
>  
> -static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> +/*
> + * Pick an idle CPU in a specific NUMA node.
> + */
> +static s32 pick_idle_cpu_from_node(const struct cpumask *cpus_allowed, int node, u64 flags)
>  {
>  	int cpu;
>  
>  retry:
>  	if (sched_smt_active()) {
> -		cpu = cpumask_any_and_distribute(idle_masks.smt, cpus_allowed);
> +		cpu = cpumask_any_and_distribute(get_idle_smtmask(node), cpus_allowed);
>  		if (cpu < nr_cpu_ids)
>  			goto found;
>  
> @@ -80,15 +151,57 @@ static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
>  			return -EBUSY;
>  	}
>  
> -	cpu = cpumask_any_and_distribute(idle_masks.cpu, cpus_allowed);
> +	cpu = cpumask_any_and_distribute(get_idle_cpumask(node), cpus_allowed);
>  	if (cpu >= nr_cpu_ids)
>  		return -EBUSY;
>  
>  found:
>  	if (test_and_clear_cpu_idle(cpu))
>  		return cpu;
> -	else
> -		goto retry;
> +	goto retry;
> +}

Yes, I see this too. But to me minimizing your patch and preserving as
much history as you can is more important.

After all, newcomers should have some room to practice :)
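For reference, the retry loop in pick_idle_cpu_from_node() quoted above behaves like this userspace model. It assumes SMT is active, uses a hypothetical PICK_IDLE_CORE flag in place of SCX_PICK_IDLE_CORE, and replaces cpumask_any_and_distribute() with a lowest-bit pick, so it does not reproduce the kernel's spreading across CPUs:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)		(UINT64_C(1) << (n))
#define PICK_IDLE_CORE	UINT64_C(1)	/* hypothetical stand-in for SCX_PICK_IDLE_CORE */

struct idle_cpus {
	uint64_t cpu;	/* idle CPUs in this node */
	uint64_t smt;	/* CPUs whose whole core is idle */
};

/* Hypothetical SMT layout: CPUs 2k and 2k+1 are siblings. */
static uint64_t cpu_smt_mask(int cpu)
{
	return BIT(cpu & ~1) | BIT(cpu | 1);
}

/* Lowest set bit, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & BIT(cpu))
			return cpu;
	return -1;
}

static bool test_and_clear_cpu_idle(struct idle_cpus *idle, int cpu)
{
	bool was_idle = idle->cpu & BIT(cpu);

	idle->smt &= ~cpu_smt_mask(cpu);	/* core no longer fully idle */
	idle->cpu &= ~BIT(cpu);
	return was_idle;
}

static int pick_idle_cpu_from_node(struct idle_cpus *idle, uint64_t allowed,
				   uint64_t flags)
{
	int cpu;

retry:
	/* Prefer a CPU whose whole core is idle. */
	cpu = first_cpu(idle->smt & allowed);
	if (cpu < 0) {
		if (flags & PICK_IDLE_CORE)
			return -EBUSY;

		cpu = first_cpu(idle->cpu & allowed);
		if (cpu < 0)
			return -EBUSY;
	}

	/*
	 * In the kernel another CPU may have claimed @cpu meanwhile; in this
	 * model the claim only fails if the smt and cpu masks disagree.
	 */
	if (test_and_clear_cpu_idle(idle, cpu))
		return cpu;
	goto retry;
}
```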

> +
> +/*
> + * Find the best idle CPU in the system, relative to @node.
> + *
> + * If @node is NUMA_NO_NODE, start from the current node.
> + */

And if you don't invent this rule for kernel users, you don't need to
explain it everywhere.

> +static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
> +{
> +	nodemask_t hop_nodes = NODE_MASK_NONE;
> +	s32 cpu = -EBUSY;
> +
> +	if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
> +		return pick_idle_cpu_from_node(cpus_allowed, NUMA_FLAT_NODE, flags);
> +
> +	/*
> +	 * If a NUMA node was not specified, start with the current one.
> +	 */
> +	if (node == NUMA_NO_NODE)
> +		node = numa_node_id();

And enforce it here, too...

> +
> +	/*
> +	 * Traverse all nodes in order of increasing distance, starting
> +	 * from @node.
> +	 *
> +	 * This loop is O(N^2), with N being the number of NUMA nodes,
> +	 * which might be quite expensive in large NUMA systems. However,
> +	 * this complexity comes into play only when a scheduler enables
> +	 * SCX_OPS_BUILTIN_IDLE_PER_NODE and it's requesting an idle CPU
> +	 * without specifying a target NUMA node, so it shouldn't be a
> +	 * bottleneck in most cases.
> +	 *
> +	 * As a future optimization we may want to cache the list of hop
> +	 * nodes in a per-node array, instead of actually traversing them
> +	 * every time.
> +	 */
> +	for_each_numa_hop_node(n, node, hop_nodes, N_POSSIBLE) {
> +		cpu = pick_idle_cpu_from_node(cpus_allowed, n, flags);
> +		if (cpu >= 0)
> +			break;
> +	}
> +
> +	return cpu;
>  }
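The node walk can be modelled as repeatedly choosing the closest node not yet visited, which is where the O(N^2) in the comment comes from. This is a sketch under stated assumptions, not for_each_numa_hop_node() itself: the node_distance[] table is hypothetical, ties go to the lower node id, and each node's pick is reduced to claiming the first allowed idle CPU (no SMT handling):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_NODES	3

/* Hypothetical SLIT-style distances: node 1 sits between nodes 0 and 2. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 20 },
	{ 30, 20, 10 },
};

/* Per-node idle CPUs, one bit per CPU. */
static uint64_t idle_cpu[NR_NODES];

/* Claim the first allowed idle CPU in @node, or return -EBUSY. */
static int pick_idle_cpu_from_node(uint64_t allowed, int node)
{
	uint64_t cand = idle_cpu[node] & allowed;

	for (int cpu = 0; cpu < 64; cpu++) {
		if (cand & (UINT64_C(1) << cpu)) {
			idle_cpu[node] &= ~(UINT64_C(1) << cpu);
			return cpu;
		}
	}
	return -EBUSY;
}

/*
 * Visit nodes in order of increasing distance from @node: each step scans
 * for the closest node not yet visited, hence O(N^2) overall.
 */
static int pick_idle_cpu(uint64_t allowed, int node)
{
	bool visited[NR_NODES] = { false };

	for (int step = 0; step < NR_NODES; step++) {
		int best = -1;

		for (int n = 0; n < NR_NODES; n++) {
			if (visited[n])
				continue;
			if (best < 0 || node_distance[node][n] < node_distance[node][best])
				best = n;
		}
		visited[best] = true;

		int cpu = pick_idle_cpu_from_node(allowed, best);
		if (cpu >= 0)
			return cpu;
	}
	return -EBUSY;
}
```

Caching the per-node visiting order, as the patch comment suggests, would turn each call into a linear walk over a precomputed list.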
>  
>  /*
> @@ -208,7 +321,7 @@ static bool llc_numa_mismatch(void)
>   * CPU belongs to a single LLC domain, and that each LLC domain is entirely
>   * contained within a single NUMA node.
>   */
> -static void update_selcpu_topology(void)
> +static void update_selcpu_topology(struct sched_ext_ops *ops)
>  {
>  	bool enable_llc = false, enable_numa = false;
>  	unsigned int nr_cpus;
> @@ -298,6 +411,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  {
>  	const struct cpumask *llc_cpus = NULL;
>  	const struct cpumask *numa_cpus = NULL;
> +	int node = idle_cpu_to_node(prev_cpu);
>  	s32 cpu;
>  
>  	*found = false;
> @@ -355,9 +469,9 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  		 * piled up on it even if there is an idle core elsewhere on
>  		 * the system.
>  		 */
> -		if (!cpumask_empty(idle_masks.cpu) &&
> -		    !(current->flags & PF_EXITING) &&
> -		    cpu_rq(cpu)->scx.local_dsq.nr == 0) {
> +		if (!(current->flags & PF_EXITING) &&
> +		    cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
> +		    !cpumask_empty(get_idle_cpumask(idle_cpu_to_node(cpu)))) {
>  			if (cpumask_test_cpu(cpu, p->cpus_ptr))
>  				goto cpu_found;
>  		}
> @@ -371,7 +485,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  		/*
>  		 * Keep using @prev_cpu if it's part of a fully idle core.
>  		 */
> -		if (cpumask_test_cpu(prev_cpu, idle_masks.smt) &&
> +		if (cpumask_test_cpu(prev_cpu, get_idle_smtmask(node)) &&
>  		    test_and_clear_cpu_idle(prev_cpu)) {
>  			cpu = prev_cpu;
>  			goto cpu_found;
> @@ -381,7 +495,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  		 * Search for any fully idle core in the same LLC domain.
>  		 */
>  		if (llc_cpus) {
> -			cpu = scx_pick_idle_cpu(llc_cpus, SCX_PICK_IDLE_CORE);
> +			cpu = pick_idle_cpu_from_node(llc_cpus, node, SCX_PICK_IDLE_CORE);
>  			if (cpu >= 0)
>  				goto cpu_found;
>  		}
> @@ -390,15 +504,19 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  		 * Search for any fully idle core in the same NUMA node.
>  		 */
>  		if (numa_cpus) {
> -			cpu = scx_pick_idle_cpu(numa_cpus, SCX_PICK_IDLE_CORE);
> +			cpu = scx_pick_idle_cpu(numa_cpus, node, SCX_PICK_IDLE_CORE);
>  			if (cpu >= 0)
>  				goto cpu_found;
>  		}
>  
>  		/*
>  		 * Search for any full idle core usable by the task.
> +		 *
> +		 * If NUMA aware idle selection is enabled, the search will
> +		 * begin in prev_cpu's node and proceed to other nodes in
> +		 * order of increasing distance.
>  		 */
> -		cpu = scx_pick_idle_cpu(p->cpus_ptr, SCX_PICK_IDLE_CORE);
> +		cpu = scx_pick_idle_cpu(p->cpus_ptr, node, SCX_PICK_IDLE_CORE);
>  		if (cpu >= 0)
>  			goto cpu_found;
>  	}
> @@ -415,7 +533,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  	 * Search for any idle CPU in the same LLC domain.
>  	 */
>  	if (llc_cpus) {
> -		cpu = scx_pick_idle_cpu(llc_cpus, 0);
> +		cpu = pick_idle_cpu_from_node(llc_cpus, node, 0);
>  		if (cpu >= 0)
>  			goto cpu_found;
>  	}
> @@ -424,7 +542,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  	 * Search for any idle CPU in the same NUMA node.
>  	 */
>  	if (numa_cpus) {
> -		cpu = scx_pick_idle_cpu(numa_cpus, 0);
> +		cpu = pick_idle_cpu_from_node(numa_cpus, node, 0);
>  		if (cpu >= 0)
>  			goto cpu_found;
>  	}
> @@ -432,7 +550,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  	/*
>  	 * Search for any idle CPU usable by the task.
>  	 */
> -	cpu = scx_pick_idle_cpu(p->cpus_ptr, 0);
> +	cpu = scx_pick_idle_cpu(p->cpus_ptr, node, 0);
>  	if (cpu >= 0)
>  		goto cpu_found;
>  
> @@ -448,17 +566,33 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  
>  static void reset_idle_masks(void)
>  {
> +	int node;
> +
> +	if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node)) {
> +		cpumask_copy(get_idle_cpumask(NUMA_FLAT_NODE), cpu_online_mask);
> +		cpumask_copy(get_idle_smtmask(NUMA_FLAT_NODE), cpu_online_mask);
> +		return;
> +	}
> +
>  	/*
>  	 * Consider all online cpus idle. Should converge to the actual state
>  	 * quickly.
>  	 */
> -	cpumask_copy(idle_masks.cpu, cpu_online_mask);
> -	cpumask_copy(idle_masks.smt, cpu_online_mask);
> +	for_each_node_state(node, N_POSSIBLE) {
> +		const struct cpumask *node_mask = cpumask_of_node(node);
> +		struct cpumask *idle_cpu = get_idle_cpumask(node);
> +		struct cpumask *idle_smt = get_idle_smtmask(node);
> +
> +		cpumask_and(idle_cpu, cpu_online_mask, node_mask);
> +		cpumask_copy(idle_smt, idle_cpu);

Tejun asked you to use cpumask_and() in both cases, didn't he?

> +	}
>  }
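A sketch of the variant with cpumask_and() in both places, modelled in userspace with 64-bit masks and a hypothetical two-node topology (node_cpus[] stands in for cpumask_of_node()). Each mask is derived from the same two inputs, so neither depends on the other having been written first:

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES	2

struct idle_cpus {
	uint64_t cpu;
	uint64_t smt;
};

static struct idle_cpus idle[NR_NODES];

/* Hypothetical topology: node 0 has CPUs 0-3, node 1 has CPUs 4-7. */
static const uint64_t node_cpus[NR_NODES] = { 0x0F, 0xF0 };

static void reset_idle_masks(uint64_t online)
{
	/* Consider all online CPUs idle; runtime updates converge quickly. */
	for (int node = 0; node < NR_NODES; node++) {
		idle[node].cpu = online & node_cpus[node];
		idle[node].smt = online & node_cpus[node];
	}
}
```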
>  
>  void __scx_update_idle(struct rq *rq, bool idle)
>  {
>  	int cpu = cpu_of(rq);
> +	int node = idle_cpu_to_node(cpu);
> +	struct cpumask *idle_cpu = get_idle_cpumask(node);
>  
>  	if (SCX_HAS_OP(update_idle) && !scx_rq_bypassing(rq)) {
>  		SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
> @@ -466,24 +600,25 @@ void __scx_update_idle(struct rq *rq, bool idle)
>  			return;
>  	}
>  
> -	assign_cpu(cpu, idle_masks.cpu, idle);
> +	assign_cpu(cpu, idle_cpu, idle);
>  
>  #ifdef CONFIG_SCHED_SMT
>  	if (sched_smt_active()) {
>  		const struct cpumask *smt = cpu_smt_mask(cpu);
> +		struct cpumask *idle_smt = get_idle_smtmask(node);
>  
>  		if (idle) {
>  			/*
> -			 * idle_masks.smt handling is racy but that's fine as
> -			 * it's only for optimization and self-correcting.
> +			 * idle_smt handling is racy but that's fine as it's
> +			 * only for optimization and self-correcting.
>  			 */
>  			for_each_cpu(cpu, smt) {
> -				if (!cpumask_test_cpu(cpu, idle_masks.cpu))
> +				if (!cpumask_test_cpu(cpu, idle_cpu))
>  					return;
>  			}
> -			cpumask_or(idle_masks.smt, idle_masks.smt, smt);
> +			cpumask_or(idle_smt, idle_smt, smt);
>  		} else {
> -			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
> +			cpumask_andnot(idle_smt, idle_smt, smt);
>  		}
>  	}
>  #endif
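The SMT bookkeeping in __scx_update_idle() reduces to a subset check, sketched here in userspace. It assumes SMT siblings always share a NUMA node, so one node's masks are enough, and reuses a hypothetical 2k/2k+1 sibling layout; the for_each_cpu() loop in the patch is equivalent to testing that all sibling bits are set in the idle mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)	(UINT64_C(1) << (n))

struct idle_cpus {
	uint64_t cpu;
	uint64_t smt;
};

/* Hypothetical SMT layout: CPUs 2k and 2k+1 are siblings. */
static uint64_t cpu_smt_mask(int cpu)
{
	return BIT(cpu & ~1) | BIT(cpu | 1);
}

static void update_idle(struct idle_cpus *idle, int cpu, bool is_idle)
{
	uint64_t smt = cpu_smt_mask(cpu);

	/* Equivalent of assign_cpu(cpu, idle_cpu, idle). */
	if (is_idle)
		idle->cpu |= BIT(cpu);
	else
		idle->cpu &= ~BIT(cpu);

	if (is_idle) {
		/* Mark the core idle only once every sibling is idle. */
		if ((idle->cpu & smt) == smt)
			idle->smt |= smt;
	} else {
		idle->smt &= ~smt;
	}
}
```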
> @@ -491,8 +626,23 @@ void __scx_update_idle(struct rq *rq, bool idle)
>  
>  #else	/* !CONFIG_SMP */
>  
> +static struct cpumask *get_idle_cpumask(int node)
> +{
> +	return cpu_none_mask;
> +}
> +
> +static struct cpumask *get_idle_smtmask(int node)
> +{
> +	return cpu_none_mask;
> +}
> +
>  static bool test_and_clear_cpu_idle(int cpu) { return false; }
> -static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags) { return -EBUSY; }
> +
> +static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
> +{
> +	return -EBUSY;
> +}
> +
>  static void reset_idle_masks(void) {}
>  
>  #endif	/* CONFIG_SMP */
> @@ -546,11 +696,12 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask(void)
>  	if (!check_builtin_idle_enabled())
>  		return cpu_none_mask;
>  
> -#ifdef CONFIG_SMP
> -	return idle_masks.cpu;
> -#else
> -	return cpu_none_mask;
> -#endif
> +	if (static_branch_unlikely(&scx_builtin_idle_per_node)) {
> +		scx_ops_error("SCX_OPS_BUILTIN_IDLE_PER_NODE enabled");
> +		return cpu_none_mask;
> +	}
> +
> +	return get_idle_cpumask(NUMA_FLAT_NODE);
>  }
>  
>  /**
> @@ -565,14 +716,15 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask(void)
>  	if (!check_builtin_idle_enabled())
>  		return cpu_none_mask;
>  
> -#ifdef CONFIG_SMP
> +	if (static_branch_unlikely(&scx_builtin_idle_per_node)) {
> +		scx_ops_error("SCX_OPS_BUILTIN_IDLE_PER_NODE enabled");
> +		return cpu_none_mask;
> +	}
> +
>  	if (sched_smt_active())
> -		return idle_masks.smt;
> +		return get_idle_smtmask(NUMA_FLAT_NODE);
>  	else
> -		return idle_masks.cpu;
> -#else
> -	return cpu_none_mask;
> -#endif
> +		return get_idle_cpumask(NUMA_FLAT_NODE);
>  }
>  
>  /**
> @@ -635,7 +787,7 @@ __bpf_kfunc s32 scx_bpf_pick_idle_cpu(const struct cpumask *cpus_allowed,
>  	if (!check_builtin_idle_enabled())
>  		return -EBUSY;
>  
> -	return scx_pick_idle_cpu(cpus_allowed, flags);
> +	return scx_pick_idle_cpu(cpus_allowed, NUMA_NO_NODE, flags);
>  }
>  
>  /**
> @@ -658,7 +810,7 @@ __bpf_kfunc s32 scx_bpf_pick_any_cpu(const struct cpumask *cpus_allowed,
>  	s32 cpu;
>  
>  	if (static_branch_likely(&scx_builtin_idle_enabled)) {
> -		cpu = scx_pick_idle_cpu(cpus_allowed, flags);
> +		cpu = scx_pick_idle_cpu(cpus_allowed, NUMA_NO_NODE, flags);
>  		if (cpu >= 0)
>  			return cpu;
>  	}
> -- 
> 2.47.1


Thread overview: 30+ messages
2024-12-20 15:11 [PATCHSET v8 sched_ext/for-6.14] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
2024-12-20 15:11 ` [PATCH 01/10] sched/topology: introduce for_each_numa_hop_node() / sched_numa_hop_node() Andrea Righi
2024-12-23 21:18   ` Yury Norov
2024-12-24  7:54     ` Andrea Righi
2024-12-24 17:33       ` Yury Norov
2024-12-20 15:11 ` [PATCH 02/10] sched_ext: Move built-in idle CPU selection policy to a separate file Andrea Righi
2024-12-24 21:21   ` Tejun Heo
2024-12-20 15:11 ` [PATCH 03/10] sched_ext: idle: introduce check_builtin_idle_enabled() helper Andrea Righi
2024-12-20 15:11 ` [PATCH 04/10] sched_ext: idle: use assign_cpu() to update the idle cpumask Andrea Righi
2024-12-23 22:26   ` Yury Norov
2024-12-20 15:11 ` [PATCH 05/10] sched_ext: idle: clarify comments Andrea Righi
2024-12-23 22:28   ` Yury Norov
2024-12-20 15:11 ` [PATCH 06/10] sched_ext: Introduce SCX_OPS_NODE_BUILTIN_IDLE Andrea Righi
2024-12-20 15:11 ` [PATCH 07/10] sched_ext: Introduce per-node idle cpumasks Andrea Righi
2024-12-24  4:05   ` Yury Norov [this message]
2024-12-24  8:18     ` Andrea Righi
2024-12-24 17:59       ` Yury Norov
2024-12-20 15:11 ` [PATCH 08/10] sched_ext: idle: introduce SCX_PICK_IDLE_NODE Andrea Righi
2024-12-24  2:48   ` Yury Norov
2024-12-24  3:53     ` Yury Norov
2024-12-24  8:37       ` Andrea Righi
2024-12-24 18:15         ` Yury Norov
2024-12-24  8:22     ` Andrea Righi
2024-12-24 21:29       ` Tejun Heo
2024-12-20 15:11 ` [PATCH 09/10] sched_ext: idle: Get rid of the scx_selcpu_topo_numa logic Andrea Righi
2024-12-23 23:39   ` Yury Norov
2024-12-24  8:58     ` Andrea Righi
2024-12-20 15:11 ` [PATCH 10/10] sched_ext: idle: Introduce NUMA aware idle cpu kfunc helpers Andrea Righi
2024-12-24  0:57   ` Yury Norov
2024-12-24  9:32     ` Andrea Righi
