All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Feng Yang <yangfeng59949@163.com>
Cc: changwoo@igalia.com, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, sched-ext@lists.linux.dev,
	shuah@kernel.org, tj@kernel.org, void@manifault.com
Subject: Re: [PATCH] selftests/sched_ext: Fix the incorrect logic validation
Date: Mon, 2 Mar 2026 16:18:15 +0100	[thread overview]
Message-ID: <aaWqNx-DDtzzSqW6@gpd4> (raw)
In-Reply-To: <20260302084130.480694-1-yangfeng59949@163.com>

Hi Feng,

On Mon, Mar 02, 2026 at 04:41:30PM +0800, Feng Yang wrote:
> On Mon, 2 Mar 2026 08:47:17 +0100, Andrea Righi wrote:
> > On Mon, Mar 02, 2026 at 02:48:51PM +0800, Feng Yang wrote:
> > > From: Feng Yang <yangfeng@kylinos.cn>
> > > 
> > > When __COMPAT_scx_bpf_pick_idle_cpu_node selects an idle CPU,
> > > it reports that the CPU should be marked as busy.
> > > 
> > > Fixes: 5ae5161820e5 ("selftests/sched_ext: Add NUMA-aware scheduler test")
> > > Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
> > > ---
> > >  tools/testing/selftests/sched_ext/numa.bpf.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/tools/testing/selftests/sched_ext/numa.bpf.c b/tools/testing/selftests/sched_ext/numa.bpf.c
> > > index a79d86ed54a1..98423628b05c 100644
> > > --- a/tools/testing/selftests/sched_ext/numa.bpf.c
> > > +++ b/tools/testing/selftests/sched_ext/numa.bpf.c
> > > @@ -44,12 +44,12 @@ s32 BPF_STRUCT_OPS(numa_select_cpu,
> > >  	 */
> > >  	cpu = __COMPAT_scx_bpf_pick_idle_cpu_node(p->cpus_ptr, node,
> > >  					__COMPAT_SCX_PICK_IDLE_IN_NODE);
> > > -	if (cpu < 0)
> > > +	if (cpu < 0) {
> > >  		cpu = __COMPAT_scx_bpf_pick_any_cpu_node(p->cpus_ptr, node,
> > >  						__COMPAT_SCX_PICK_IDLE_IN_NODE);
> > > -
> > > -	if (is_cpu_idle(cpu, node))
> > > -		scx_bpf_error("CPU %d should be marked as busy", cpu);
> > > +		if (is_cpu_idle(cpu, node))
> > > +			scx_bpf_error("CPU %d should be marked as busy", cpu);
> > > +	}
> > 
> > No, this is not correct. The CPU returned by scx_bpf_pick_idle_cpu_node()
> > should be marked as busy at this point (bit is set in the idle cpumask),
> > essentially it has been reserved/allocated by the caller.
> 
> Okay, I misunderstood.
> The CI test of this patch (https://patchwork.kernel.org/project/netdevbpf/list/?series=1058866&state=*) 
> resulted in the error "CPU 0 should be marked as busy"(https://github.com/kernel-patches/bpf/actions/runs/22548310848/job/65314767856),
> but I don't see the connection between them that would cause this error. Could you explain why this happens?

The is_cpu_idle() test is going to be always a bit flaky in some cases,
for what I see we may have two races:

1) if the first call to scx_bpf_pick_idle_cpu_node() returns a negative
   value, we fallback to "pick any CPU". Between selecting the CPU and
   checking if it's idle, the task running on the CPU may have released it,
   allowing the idle task to run and set the idle bit again.

2) Even if the first scx_bpf_pick_idle_cpu_node() returns a cpu >= 0, that
   CPU might still execute a short-lived task, which then releases it and
   sets the idle bit again.

Given this, I think the most reliable approach is to remove the
is_cpu_idle() check here to avoid flakiness and instead implement something
along the lines of the following. What do you think?

Thanks,
-Andrea

 tools/testing/selftests/sched_ext/numa.bpf.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/sched_ext/numa.bpf.c b/tools/testing/selftests/sched_ext/numa.bpf.c
index a79d86ed54a1b..209dacdbd117e 100644
--- a/tools/testing/selftests/sched_ext/numa.bpf.c
+++ b/tools/testing/selftests/sched_ext/numa.bpf.c
@@ -19,18 +19,6 @@ UEI_DEFINE(uei);
 
 const volatile unsigned int __COMPAT_SCX_PICK_IDLE_IN_NODE;
 
-static bool is_cpu_idle(s32 cpu, int node)
-{
-	const struct cpumask *idle_cpumask;
-	bool idle;
-
-	idle_cpumask = __COMPAT_scx_bpf_get_idle_cpumask_node(node);
-	idle = bpf_cpumask_test_cpu(cpu, idle_cpumask);
-	scx_bpf_put_cpumask(idle_cpumask);
-
-	return idle;
-}
-
s32 BPF_STRUCT_OPS(numa_select_cpu,
 		   struct task_struct *p, s32 prev_cpu, u64 wake_flags)
 {
@@ -44,12 +32,12 @@ s32 BPF_STRUCT_OPS(numa_select_cpu,
 	 */
 	cpu = __COMPAT_scx_bpf_pick_idle_cpu_node(p->cpus_ptr, node,
 					__COMPAT_SCX_PICK_IDLE_IN_NODE);
-	if (cpu < 0)
+	if (cpu < 0) {
 		cpu = __COMPAT_scx_bpf_pick_any_cpu_node(p->cpus_ptr, node,
 						__COMPAT_SCX_PICK_IDLE_IN_NODE);
-
-	if (is_cpu_idle(cpu, node))
-		scx_bpf_error("CPU %d should be marked as busy", cpu);
+		if (cpu < 0)
+			return prev_cpu;
+	}
 
 	if (__COMPAT_scx_bpf_cpu_node(cpu) != node)
 		scx_bpf_error("CPU %d should be in node %d", cpu, node);

  reply	other threads:[~2026-03-02 15:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-02  6:48 [PATCH] selftests/sched_ext: Fix the incorrect logic validation Feng Yang
2026-03-02  7:47 ` Andrea Righi
2026-03-02  8:41   ` Feng Yang
2026-03-02 15:18     ` Andrea Righi [this message]
2026-03-03  2:31       ` Feng Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaWqNx-DDtzzSqW6@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    --cc=yangfeng59949@163.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.