* [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
@ 2026-03-21 10:54 Cheng-Yang Chou
2026-03-21 17:45 ` Andrea Righi
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Cheng-Yang Chou @ 2026-03-21 10:54 UTC (permalink / raw)
To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
Cc: Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911
In the WAKE_SYNC path of scx_select_cpu_dfl(), waker_node was computed
with cpu_to_node(), while node (for prev_cpu) was computed with
scx_cpu_node_if_enabled(). When scx_builtin_idle_per_node is disabled,
node is NUMA_NO_NODE but waker_node would be the actual NUMA node,
causing two issues:
1. The (waker_node == node) check always fails when SCX_PICK_IDLE_IN_NODE
is set, preventing the waker CPU optimization from ever triggering.
2. idle_cpumask(waker_node) is called with a real node ID even though
per-node idle tracking is disabled, resulting in undefined behavior.
Fix by using scx_cpu_node_if_enabled() for waker_node as well, ensuring
both variables are computed consistently.
Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
kernel/sched/ext_idle.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index c7e405262697..8436c7df0a56 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -543,7 +543,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
* piled up on it even if there is an idle core elsewhere on
* the system.
*/
- waker_node = cpu_to_node(cpu);
+ waker_node = scx_cpu_node_if_enabled(cpu);
if (!(current->flags & PF_EXITING) &&
cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
(!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
--
2.48.1
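
Editorial sketch (not part of the patch): the mismatch described in the
commit message can be modelled in userspace. Everything below is a toy
under stated assumptions: the model_* names, the 4-CPUs-per-node topology,
and the 64-bit masks are hypothetical stand-ins, and the lookup is assumed
to fall back to a single global mask when it is given NUMA_NO_NODE. In the
model, a mismatched lookup merely reads a mask nobody maintains; the commit
message describes the real consequence as undefined behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUMA_NO_NODE   (-1)
#define MODEL_NR_NODES 2	/* hypothetical topology: 2 nodes, 4 CPUs each */

/* Hypothetical stand-in for the per-node idle tracking switch. */
static bool model_per_node_enabled;

/* One global idle mask, plus per-node masks used only when enabled. */
static uint64_t model_global_idle;
static uint64_t model_node_idle[MODEL_NR_NODES];

/* Always returns the real node, like the pre-patch waker_node lookup. */
static int model_cpu_to_node(int cpu)
{
	return cpu / 4;
}

/* Returns the real node only when per-node tracking is enabled. */
static int model_cpu_node_if_enabled(int cpu)
{
	return model_per_node_enabled ? model_cpu_to_node(cpu) : NUMA_NO_NODE;
}

/* Assumed lookup rule: NUMA_NO_NODE selects the global mask. */
static uint64_t *model_idle_mask(int node)
{
	return node == NUMA_NO_NODE ? &model_global_idle
				    : &model_node_idle[node];
}

/* Mark a CPU idle in whichever mask is maintained in the current mode. */
static void model_set_idle(int cpu)
{
	*model_idle_mask(model_cpu_node_if_enabled(cpu)) |= UINT64_C(1) << cpu;
}

/* Test a CPU's idle bit in the mask selected by the given node. */
static bool model_test_idle(int cpu, int node)
{
	return (*model_idle_mask(node) & (UINT64_C(1) << cpu)) != 0;
}
```

With model_per_node_enabled left false, model_set_idle(5) records CPU 5 in
the global mask. A lookup keyed by model_cpu_node_if_enabled(5) finds it,
while a lookup keyed by model_cpu_to_node(5) consults the unmaintained
node 1 mask and misses it.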
* Re: [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
2026-03-21 10:54 [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() Cheng-Yang Chou
@ 2026-03-21 17:45 ` Andrea Righi
2026-03-21 18:42 ` Tejun Heo
2026-03-21 19:38 ` [PATCH v2] " Cheng-Yang Chou
2 siblings, 0 replies; 6+ messages in thread
From: Andrea Righi @ 2026-03-21 17:45 UTC (permalink / raw)
To: Cheng-Yang Chou
Cc: sched-ext, Tejun Heo, David Vernet, Changwoo Min,
Ching-Chun Huang, Chia-Ping Tsai
Hi Cheng-Yang,
On Sat, Mar 21, 2026 at 06:54:58PM +0800, Cheng-Yang Chou wrote:
> In the WAKE_SYNC path of scx_select_cpu_dfl(), waker_node was computed
> with cpu_to_node(), while node (for prev_cpu) was computed with
> scx_cpu_node_if_enabled(). When scx_builtin_idle_per_node is disabled,
> node is NUMA_NO_NODE but waker_node would be the actual NUMA node,
> causing two issues:
>
> 1. The (waker_node == node) check always fails when SCX_PICK_IDLE_IN_NODE
> is set, preventing the waker CPU optimization from ever triggering.
When scx_builtin_idle_per_node is disabled, SCX_PICK_IDLE_IN_NODE won't be
set, so !(flags & SCX_PICK_IDLE_IN_NODE) is always true. That
short-circuits the ||, and the waker_node == node comparison is never
evaluated.
However, ...
> 2. idle_cpumask(waker_node) is called with a real node ID even though
> per-node idle tracking is disabled, resulting in undefined behavior.
...this looks like a legit bug. I'm wondering how this fix impacts
performance; I'll do some testing. Nice catch!
>
> Fix by using scx_cpu_node_if_enabled() for waker_node as well, ensuring
> both variables are computed consistently.
>
> Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
We should also add:
Cc: stable@vger.kernel.org # v6.15+
Thanks!
-Andrea
> ---
> kernel/sched/ext_idle.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index c7e405262697..8436c7df0a56 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -543,7 +543,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
> * piled up on it even if there is an idle core elsewhere on
> * the system.
> */
> - waker_node = cpu_to_node(cpu);
> + waker_node = scx_cpu_node_if_enabled(cpu);
> if (!(current->flags & PF_EXITING) &&
> cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
> (!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
> --
> 2.48.1
>
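
Editorial sketch (not part of the thread): the short-circuit argument in
the reply above can be checked with a small userspace model. The flag
value MODEL_PICK_IDLE_IN_NODE and the helper names are hypothetical; only
the shape of the condition is taken from the quoted diff. The counter
records whether the node comparison was evaluated at all.

```c
#include <stdbool.h>

#define NUMA_NO_NODE            (-1)
#define MODEL_PICK_IDLE_IN_NODE (1u << 1)	/* hypothetical bit value */

/* Counts how many times the node comparison is actually evaluated. */
static int node_compares;

static bool nodes_match(int waker_node, int node)
{
	node_compares++;
	return waker_node == node;
}

/*
 * Same shape as the condition in the diff:
 *   !(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)
 * When the flag is clear, the left operand is true and || never
 * evaluates the right operand.
 */
static bool node_check_passes(unsigned int flags, int waker_node, int node)
{
	return !(flags & MODEL_PICK_IDLE_IN_NODE) ||
	       nodes_match(waker_node, node);
}
```

Calling node_check_passes(0, 1, NUMA_NO_NODE) returns true and leaves
node_compares unchanged, so mismatched node values cannot make the check
fail while the flag is clear. Only a call with MODEL_PICK_IDLE_IN_NODE set
reaches the comparison.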
* Re: [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
2026-03-21 10:54 [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() Cheng-Yang Chou
2026-03-21 17:45 ` Andrea Righi
@ 2026-03-21 18:42 ` Tejun Heo
2026-03-21 19:39 ` Cheng-Yang Chou
2026-03-21 19:38 ` [PATCH v2] " Cheng-Yang Chou
2 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2026-03-21 18:42 UTC (permalink / raw)
To: Cheng-Yang Chou, sched-ext, David Vernet, Andrea Righi,
Changwoo Min
Cc: Ching-Chun Huang, Chia-Ping Tsai, Emil Tsalapatis, linux-kernel
Hello,
Applied to sched_ext/for-7.0-fixes with the following additions:
Cc: stable@vger.kernel.org # v6.15+
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Thanks.
--
tejun
* [PATCH v2] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
2026-03-21 10:54 [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() Cheng-Yang Chou
2026-03-21 17:45 ` Andrea Righi
2026-03-21 18:42 ` Tejun Heo
@ 2026-03-21 19:38 ` Cheng-Yang Chou
2026-03-22 0:26 ` Tejun Heo
2 siblings, 1 reply; 6+ messages in thread
From: Cheng-Yang Chou @ 2026-03-21 19:38 UTC (permalink / raw)
To: yphbchou0911
Cc: arighi, changwoo, chia7712, jserv, sched-ext, tj, void, stable
In the WAKE_SYNC path of scx_select_cpu_dfl(), waker_node was computed
with cpu_to_node(), while node (for prev_cpu) was computed with
scx_cpu_node_if_enabled(). When scx_builtin_idle_per_node is disabled,
idle_cpumask(waker_node) is called with a real node ID even though
per-node idle tracking is disabled, resulting in undefined behavior.
Fix by using scx_cpu_node_if_enabled() for waker_node as well, ensuring
both variables are computed consistently.
Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
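- Update commit message to drop the incorrect claim that the
(waker_node == node) check always fails; the comparison is
short-circuited when per-node idle tracking is disabled (Andrea Righi)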
- Link to v1:
https://lore.kernel.org/all/20260321105503.869337-1-yphbchou0911@gmail.com/
kernel/sched/ext_idle.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index c7e405262697..8436c7df0a56 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -543,7 +543,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
* piled up on it even if there is an idle core elsewhere on
* the system.
*/
- waker_node = cpu_to_node(cpu);
+ waker_node = scx_cpu_node_if_enabled(cpu);
if (!(current->flags & PF_EXITING) &&
cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
(!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
--
2.48.1
* Re: [PATCH] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
2026-03-21 18:42 ` Tejun Heo
@ 2026-03-21 19:39 ` Cheng-Yang Chou
0 siblings, 0 replies; 6+ messages in thread
From: Cheng-Yang Chou @ 2026-03-21 19:39 UTC (permalink / raw)
To: Tejun Heo
Cc: sched-ext, David Vernet, Andrea Righi, Changwoo Min,
Ching-Chun Huang, Chia-Ping Tsai, Emil Tsalapatis, linux-kernel
Hi Tejun, Andrea,
On Sat, Mar 21, 2026 at 08:42:13AM -1000, Tejun Heo wrote:
> Hello,
>
> Applied to sched_ext/for-7.0-fixes with the following additions:
>
> Cc: stable@vger.kernel.org # v6.15+
> Reviewed-by: Andrea Righi <arighi@nvidia.com>
>
> Thanks.
>
> --
> tejun
Thanks for reviewing and applying the patch!
Regarding Andrea's feedback on the commit message, thanks for pointing
out the short-circuiting behavior. Since the patch is already applied to
sched_ext/for-7.0-fixes, how would you prefer to handle the commit message
update?
I have sent a v2 patch in reply to the v1 thread with the corrected
description. Please feel free to drop the current one and apply v2.
Alternatively, if it is easier, the commit message can be amended
directly in your tree.
Sorry for the mix-up.
--
Thanks,
Cheng-Yang
* Re: [PATCH v2] sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
2026-03-21 19:38 ` [PATCH v2] " Cheng-Yang Chou
@ 2026-03-22 0:26 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2026-03-22 0:26 UTC (permalink / raw)
To: Cheng-Yang Chou
Cc: Andrea Righi, Changwoo Min, David Vernet, Emil Tsalapatis,
Ching-Chun Huang, Chia-Ping Tsai, sched-ext, linux-kernel
Hello,
The commit message on sched_ext/for-7.0-fixes has been amended to match
v2.
Thanks.
--
tejun