* [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
@ 2025-06-03  8:22 Andrea Righi
  2025-06-03 18:29 ` Tejun Heo
  2025-06-04 14:05 ` Yury Norov
  0 siblings, 2 replies; 4+ messages in thread
From: Andrea Righi @ 2025-06-03  8:22 UTC
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: Yury Norov, linux-kernel

In the idle CPU selection logic, attempting cross-node searches adds
unnecessary complexity when CONFIG_NUMA is disabled.

Since there's no meaningful concept of nodes in this case, simplify the
logic by restricting the idle CPU search to the current node only.

Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext_idle.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 66da03cc0b338..8660d9ae40169 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -138,6 +138,7 @@ static s32 pick_idle_cpu_in_node(const struct cpumask *cpus_allowed, int node, u
 		goto retry;
 }
 
+#ifdef CONFIG_NUMA
 /*
  * Tracks nodes that have not yet been visited when searching for an idle
  * CPU across all available nodes.
@@ -186,6 +187,13 @@ static s32 pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, i
 
 	return cpu;
 }
+#else
+static inline s32
+pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
+{
+	return -EBUSY;
+}
+#endif
 
 /*
  * Find an idle CPU in the system, starting from @node.
-- 
2.49.0



* Re: [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
  2025-06-03  8:22 [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA Andrea Righi
@ 2025-06-03 18:29 ` Tejun Heo
  2025-06-04 14:05 ` Yury Norov
  1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2025-06-03 18:29 UTC
  To: Andrea Righi; +Cc: David Vernet, Changwoo Min, Yury Norov, linux-kernel

On Tue, Jun 03, 2025 at 10:22:01AM +0200, Andrea Righi wrote:
> In the idle CPU selection logic, attempting cross-node searches adds
> unnecessary complexity when CONFIG_NUMA is disabled.
> 
> Since there's no meaningful concept of nodes in this case, simplify the
> logic by restricting the idle CPU search to the current node only.
> 
> Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

Applied to sched_ext/for-6.16-fixes.

Thanks.

-- 
tejun


* Re: [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
  2025-06-03  8:22 [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA Andrea Righi
  2025-06-03 18:29 ` Tejun Heo
@ 2025-06-04 14:05 ` Yury Norov
  2025-06-04 15:07   ` Andrea Righi
  1 sibling, 1 reply; 4+ messages in thread
From: Yury Norov @ 2025-06-04 14:05 UTC
  To: Andrea Righi; +Cc: Tejun Heo, David Vernet, Changwoo Min, linux-kernel

Hi Andrea!

On Tue, Jun 03, 2025 at 10:22:01AM +0200, Andrea Righi wrote:
> In the idle CPU selection logic, attempting cross-node searches adds
> unnecessary complexity when CONFIG_NUMA is disabled.
> 
> Since there's no meaningful concept of nodes in this case, simplify the
> logic by restricting the idle CPU search to the current node only.
> 
> Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
>  kernel/sched/ext_idle.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index 66da03cc0b338..8660d9ae40169 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -138,6 +138,7 @@ static s32 pick_idle_cpu_in_node(const struct cpumask *cpus_allowed, int node, u
>  		goto retry;
>  }
>  
> +#ifdef CONFIG_NUMA

It would be more natural to move this inside the function body rather
than duplicating the function declaration.

>  /*
>   * Tracks nodes that have not yet been visited when searching for an idle
>   * CPU across all available nodes.
> @@ -186,6 +187,13 @@ static s32 pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, i
>  
>  	return cpu;
>  }
> +#else
> +static inline s32
> +pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
> +{
> +	return -EBUSY;
> +}

This is a misleading errno. The system is not busy; the feature is
disabled. If it were a syscall, I would say you should return ENOSYS.
ENODATA is another candidate. Or do you have a special policy for the
subsystem?

The above pick_idle_cpu_in_node() doesn't have CONFIG_NUMA protection
either. Is it safe with !CONFIG_NUMA?

> +#endif
>  
>  /*
>   * Find an idle CPU in the system, starting from @node.
> -- 
> 2.49.0


* Re: [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
  2025-06-04 14:05 ` Yury Norov
@ 2025-06-04 15:07   ` Andrea Righi
  0 siblings, 0 replies; 4+ messages in thread
From: Andrea Righi @ 2025-06-04 15:07 UTC
  To: Yury Norov; +Cc: Tejun Heo, David Vernet, Changwoo Min, linux-kernel

Hi Yury,

On Wed, Jun 04, 2025 at 10:05:15AM -0400, Yury Norov wrote:
> Hi Andrea!
> 
> On Tue, Jun 03, 2025 at 10:22:01AM +0200, Andrea Righi wrote:
> > In the idle CPU selection logic, attempting cross-node searches adds
> > unnecessary complexity when CONFIG_NUMA is disabled.
> > 
> > Since there's no meaningful concept of nodes in this case, simplify the
> > logic by restricting the idle CPU search to the current node only.
> > 
> > Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/ext_idle.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 66da03cc0b338..8660d9ae40169 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -138,6 +138,7 @@ static s32 pick_idle_cpu_in_node(const struct cpumask *cpus_allowed, int node, u
> >  		goto retry;
> >  }
> >  
> > +#ifdef CONFIG_NUMA
> 
> It would be more natural to move this inside the function body rather
> than duplicating the function declaration.

I was trying to catch both the function and the per_cpu_unvisited with a
single #ifdef, but I can definitely split that and add another #ifdef
inside the function body.
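
A minimal userspace sketch of what moving the #ifdef inside the function
body could look like. This is not the follow-up patch: the typedefs stand in
for the kernel ones, walk_unvisited_nodes() is a hypothetical placeholder
for the real per-node walk, and sketch_selfcheck() exists only to exercise
the function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t s32;	/* userspace stand-ins for the kernel typedefs */
typedef uint64_t u64;

struct cpumask;		/* opaque: the sketch never dereferences it */

#ifdef CONFIG_NUMA
/* Hypothetical placeholder for the real per-node walk; always "finds nothing". */
static s32 walk_unvisited_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
{
	(void)cpus_allowed;
	(void)node;
	(void)flags;
	return -EBUSY;
}
#endif

/* One definition for both configurations; only the body is conditional. */
static s32 pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
{
	s32 cpu = -EBUSY;	/* default: no idle CPU found on another node */

#ifdef CONFIG_NUMA
	cpu = walk_unvisited_nodes(cpus_allowed, node, flags);
#else
	/* No other nodes exist, so there is nothing to search. */
	(void)cpus_allowed;
	(void)node;
	(void)flags;
#endif
	return cpu;
}

/* Small self-check: either configuration reports -EBUSY in this sketch. */
int sketch_selfcheck(void)
{
	assert(pick_idle_cpu_from_online_nodes(NULL, 0, 0) == -EBUSY);
	return 0;
}
```

With this shape the prototype appears once, at the cost of a second #ifdef
around any state used only by the NUMA path (such as the per-CPU unvisited
mask mentioned above).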

> 
> >  /*
> >   * Tracks nodes that have not yet been visited when searching for an idle
> >   * CPU across all available nodes.
> > @@ -186,6 +187,13 @@ static s32 pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, i
> >  
> >  	return cpu;
> >  }
> > +#else
> > +static inline s32
> > +pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
> > +{
> > +	return -EBUSY;
> > +}
> 
> This is a misleading errno. The system is not busy; the feature is
> disabled. If it were a syscall, I would say you should return ENOSYS.
> ENODATA is another candidate. Or do you have a special policy for the
> subsystem?

So, this function is called only from scx_pick_idle_cpu(), which can still
call pick_idle_cpu_from_online_nodes() even on kernels with !CONFIG_NUMA
if the BPF scheduler enables the per-node idle cpumask (by setting the
SCX_OPS_BUILTIN_IDLE_PER_NODE flag).

We can return -ENOSYS, but then we still need to return -EBUSY from
scx_pick_idle_cpu(), since its logic is host-wide, so the choice of -EBUSY
was to be consistent with that.

However, I don't have a strong opinion, if you think it's clearer to return
-ENOSYS/ENODATA from pick_idle_cpu_from_online_nodes() I can change that,
but I'd still return -EBUSY from scx_pick_idle_cpu().

> 
> The above pick_idle_cpu_in_node() doesn't have CONFIG_NUMA protection
> either. Is it safe with !CONFIG_NUMA?

pick_idle_cpu_in_node() is always called with a validated node (when passed
from BPF) or with a node provided by the kernel, and idle_cpumask() handles
the NUMA_NO_NODE case, so that should be fine in theory.
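
As an illustration of the NUMA_NO_NODE handling, a lookup of this kind could
fall back to a global mask when no node is given. This is a sketch under
assumptions: the struct, the arrays, SKETCH_MAX_NODES and idle_masks_for()
are hypothetical names, not the real idle_cpumask() implementation; only the
NUMA_NO_NODE value (-1) matches the kernel definition.

```c
#include <assert.h>

#define NUMA_NO_NODE		(-1)	/* same value the kernel uses */
#define SKETCH_MAX_NODES	4	/* hypothetical bound for the sketch */

struct idle_masks_sketch {
	unsigned long cpu;	/* bitmask of idle CPUs, one bit per CPU */
};

static struct idle_masks_sketch global_masks;
static struct idle_masks_sketch node_masks[SKETCH_MAX_NODES];

/*
 * Return the global masks when no node is specified, otherwise the masks
 * of @node. The caller is assumed to pass a validated node id.
 */
static struct idle_masks_sketch *idle_masks_for(int node)
{
	if (node == NUMA_NO_NODE)
		return &global_masks;
	return &node_masks[node];
}

/* Small self-check of both lookups. */
int lookup_selfcheck(void)
{
	assert(idle_masks_for(NUMA_NO_NODE) == &global_masks);
	assert(idle_masks_for(0) == &node_masks[0]);
	return 0;
}
```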

Thanks,
-Andrea

PS: Tejun already applied this patch to his tree, so I'll send all the
changes as a follow-up patch. At least the original bug is fixed. :)
