Re: [PATCH 3/3] selftests/sched_ext: Fix consume_immed test reliability

public inbox for sched-ext@lists.linux.dev
 help / color / mirror / Atom feed

From: Andrea Righi <arighi@nvidia.com>
To: zhidao su <soolaugust@gmail.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	tj@kernel.org, void@manifault.com, changwoo@igalia.com,
	peterz@infradead.org, mingo@redhat.com,
	zhidao su <suzhidao@xiaomi.com>
Subject: Re: [PATCH 3/3] selftests/sched_ext: Fix consume_immed test reliability
Date: Thu, 26 Mar 2026 08:06:02 +0100	[thread overview]
Message-ID: <acTa2k5Fkhni_e7L@gpd4> (raw)
In-Reply-To: <20260326022827.3826287-3-suzhidao@xiaomi.com>

Hi zhidao,

On Thu, Mar 26, 2026 at 10:28:27AM +0800, zhidao su wrote:
> The consume_immed test was failing because nr_consume_immed_reenq
> stayed 0. Two issues:
> 
> 1. Workers were spread across CPUs, so CPU 0's local DSQ rarely
>    accumulated multiple tasks. Fix: pin all workers to CPU 0 and
>    increase NUM_WORKERS to 8 to ensure USER_DSQ is always backlogged.
> 
> 2. ops.dispatch() called scx_bpf_dsq_move_to_local() only once, so
>    CPU 0's local DSQ would contain exactly 1 task after dispatch.
>    The IMMED slow path requires dsq->nr > 1 at the time of insertion
>    (in dsq_inc_nr()), so a single dispatch call never triggers it.
> 
>    Fix: call scx_bpf_dsq_move_to_local() in a loop (up to 4 times)
>    within a single ops.dispatch() invocation. The second call finds
>    dsq->nr already 1, so dsq->nr increments to 2, triggering
>    schedule_reenq_local() and the IMMED slow path.
> 
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
> ---
>  .../selftests/sched_ext/consume_immed.bpf.c   | 25 ++++++++++++++-----
>  .../selftests/sched_ext/consume_immed.c       | 14 ++++++++++-
>  2 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/sched_ext/consume_immed.bpf.c b/tools/testing/selftests/sched_ext/consume_immed.bpf.c
> index 9c7808f5abe1..e99bea0b2c24 100644
> --- a/tools/testing/selftests/sched_ext/consume_immed.bpf.c
> +++ b/tools/testing/selftests/sched_ext/consume_immed.bpf.c
> @@ -11,9 +11,9 @@
>   *                   explicit SCX_ENQ_IMMED in enq_flags (requires v2 kfunc)
>   *
>   * Worker threads belonging to test_tgid are inserted into USER_DSQ.
> - * ops.dispatch() on CPU 0 consumes from USER_DSQ with SCX_ENQ_IMMED.
> - * With multiple workers competing for CPU 0, dsq->nr > 1 triggers the
> - * IMMED slow path (reenqueue with SCX_TASK_REENQ_IMMED).
> + * ops.dispatch() on CPU 0 consumes multiple tasks from USER_DSQ with
> + * SCX_ENQ_IMMED in a single dispatch call, causing dsq->nr to exceed 1
> + * and triggering the IMMED slow path (reenqueue with SCX_TASK_REENQ_IMMED).
>   *
>   * Requires scx_bpf_dsq_move_to_local___v2() (v7.1+) for enq_flags support.
>   */
> @@ -55,10 +55,23 @@ void BPF_STRUCT_OPS(consume_immed_enqueue, struct task_struct *p,

We should define a custom ops.select_cpu() to make sure all tasks are
bounced to ops.enqueue(), in order to have more inserts into USER_DSQ,
something like:

s32 BPF_STRUCT_OPS(consume_immed_select_cpu, struct task_struct *p,
                   s32 prev_cpu, u64 wake_flags)
{
	return prev_cpu;
}

>  
>  void BPF_STRUCT_OPS(consume_immed_dispatch, s32 cpu, struct task_struct *prev)
>  {
> -	if (cpu == 0)
> -		scx_bpf_dsq_move_to_local(USER_DSQ, SCX_ENQ_IMMED);
> -	else
> +	int i;
> +
> +	if (cpu != 0) {
>  		scx_bpf_dsq_move_to_local(SCX_DSQ_GLOBAL, 0);

Hm.. this should trigger an error, you can't use
scx_bpf_dsq_move_to_local() with SCX_DSQ_GLOBAL.

> +		return;
> +	}
> +
> +	/*
> +	 * Move multiple tasks into CPU 0's local DSQ with SCX_ENQ_IMMED in a
> +	 * single dispatch call. When the second task is inserted (dsq->nr > 1),
> +	 * dsq_inc_nr() triggers the IMMED slow path via schedule_reenq_local(),
> +	 * which calls ops.enqueue() with SCX_ENQ_REENQ | SCX_TASK_REENQ_IMMED.
> +	 */
> +	for (i = 0; i < 4; i++) {
> +		if (!scx_bpf_dsq_move_to_local(USER_DSQ, SCX_ENQ_IMMED))
> +			break;
> +	}

Why 4? Two consecutive scx_bpf_dsq_move_to_local() should be enough, right?

>  }
>  
>  s32 BPF_STRUCT_OPS_SLEEPABLE(consume_immed_init)

We don't see this from the context, but to check if SCX_ENQ_IMMED is
available can we just check if it's != 0, instead of checking the
_scx_bpf_dsq_move_to_local___v2 symbol?

Something like:

if (SCX_ENQ_IMMED == 0) {
	scx_bpf_error("SCX_ENQ_IMMED not available");
	return -EOPNOTSUPP;
}

And remove the same check in consume_immed.c, because we're already
checking it in the BPF part.

Thanks,
-Andrea

> diff --git a/tools/testing/selftests/sched_ext/consume_immed.c b/tools/testing/selftests/sched_ext/consume_immed.c
> index 7f9594cfa9cb..61cbc0fe3663 100644
> --- a/tools/testing/selftests/sched_ext/consume_immed.c
> +++ b/tools/testing/selftests/sched_ext/consume_immed.c
> @@ -12,18 +12,30 @@
>  #include <stdio.h>
>  #include <unistd.h>
>  #include <pthread.h>
> +#include <sched.h>
>  #include <bpf/bpf.h>
>  #include <scx/common.h>
>  #include "consume_immed.bpf.skel.h"
>  #include "scx_test.h"
>  
> -#define NUM_WORKERS		4
> +/*
> + * Use more workers than CPUs, all pinned to CPU 0, so CPU 0's local DSQ
> + * accumulates multiple IMMED tasks at once, reliably triggering the slow path.
> + */
> +#define NUM_WORKERS		8
>  #define TEST_DURATION_SEC	3
>  
>  static volatile bool stop_workers;
>  
>  static void *worker_fn(void *arg)
>  {
> +	cpu_set_t cpuset;
> +
> +	/* Pin to CPU 0 to saturate its local DSQ */
> +	CPU_ZERO(&cpuset);
> +	CPU_SET(0, &cpuset);
> +	pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
> +
>  	while (!stop_workers) {
>  		volatile unsigned long i;
>  
> -- 
> 2.43.0
>

next prev parent reply	other threads:[~2026-03-26  7:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-26  2:28 [PATCH 1/3] sched_ext: fix NULL deref in bpf_scx_unreg() due to ops->priv race zhidao su
2026-03-26  2:28 ` [PATCH 2/3] selftests/sched_ext: Fix dsq_reenq test reliability zhidao su
2026-03-26  2:28 ` [PATCH 3/3] selftests/sched_ext: Fix consume_immed " zhidao su
2026-03-26  7:06   ` Andrea Righi [this message]
2026-03-26  2:45 ` [PATCH 1/3] sched_ext: fix NULL deref in bpf_scx_unreg() due to ops->priv race Tejun Heo
2026-03-26  5:13   ` zhidao su

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acTa2k5Fkhni_e7L@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=soolaugust@gmail.com \
    --cc=suzhidao@xiaomi.com \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox