Re: [PATCH 3/3] selftests/sched_ext: Fix consume_immed test reliability

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrea Righi <arighi@nvidia.com>
To: zhidao su <soolaugust@gmail.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	tj@kernel.org, void@manifault.com, changwoo@igalia.com,
	peterz@infradead.org, mingo@redhat.com,
	zhidao su <suzhidao@xiaomi.com>
Subject: Re: [PATCH 3/3] selftests/sched_ext: Fix consume_immed test reliability
Date: Thu, 26 Mar 2026 08:06:02 +0100	[thread overview]
Message-ID: <acTa2k5Fkhni_e7L@gpd4> (raw)
In-Reply-To: <20260326022827.3826287-3-suzhidao@xiaomi.com>

Hi zhidao,

On Thu, Mar 26, 2026 at 10:28:27AM +0800, zhidao su wrote:
> The consume_immed test was failing because nr_consume_immed_reenq
> stayed 0. Two issues:
> 
> 1. Workers were spread across CPUs, so CPU 0's local DSQ rarely
>    accumulated multiple tasks. Fix: pin all workers to CPU 0 and
>    increase NUM_WORKERS to 8 to ensure USER_DSQ is always backlogged.
> 
> 2. ops.dispatch() called scx_bpf_dsq_move_to_local() only once, so
>    CPU 0's local DSQ would contain exactly 1 task after dispatch.
>    The IMMED slow path requires dsq->nr > 1 at the time of insertion
>    (in dsq_inc_nr()), so a single dispatch call never triggers it.
> 
>    Fix: call scx_bpf_dsq_move_to_local() in a loop (up to 4 times)
>    within a single ops.dispatch() invocation. The second call finds
>    dsq->nr already 1, so dsq->nr increments to 2, triggering
>    schedule_reenq_local() and the IMMED slow path.
> 
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
> ---
>  .../selftests/sched_ext/consume_immed.bpf.c   | 25 ++++++++++++++-----
>  .../selftests/sched_ext/consume_immed.c       | 14 ++++++++++-
>  2 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/sched_ext/consume_immed.bpf.c b/tools/testing/selftests/sched_ext/consume_immed.bpf.c
> index 9c7808f5abe1..e99bea0b2c24 100644
> --- a/tools/testing/selftests/sched_ext/consume_immed.bpf.c
> +++ b/tools/testing/selftests/sched_ext/consume_immed.bpf.c
> @@ -11,9 +11,9 @@
>   *                   explicit SCX_ENQ_IMMED in enq_flags (requires v2 kfunc)
>   *
>   * Worker threads belonging to test_tgid are inserted into USER_DSQ.
> - * ops.dispatch() on CPU 0 consumes from USER_DSQ with SCX_ENQ_IMMED.
> - * With multiple workers competing for CPU 0, dsq->nr > 1 triggers the
> - * IMMED slow path (reenqueue with SCX_TASK_REENQ_IMMED).
> + * ops.dispatch() on CPU 0 consumes multiple tasks from USER_DSQ with
> + * SCX_ENQ_IMMED in a single dispatch call, causing dsq->nr to exceed 1
> + * and triggering the IMMED slow path (reenqueue with SCX_TASK_REENQ_IMMED).
>   *
>   * Requires scx_bpf_dsq_move_to_local___v2() (v7.1+) for enq_flags support.
>   */
> @@ -55,10 +55,23 @@ void BPF_STRUCT_OPS(consume_immed_enqueue, struct task_struct *p,

We should define a custom ops.select_cpu() to make sure all tasks are
bounced to ops.enqueue(), in order to have more inserts into USER_DSQ,
something like:

s32 BPF_STRUCT_OPS(consume_immed_select_cpu, struct task_struct *p,
                   s32 prev_cpu, u64 wake_flags)
{
	return prev_cpu;
}

>  
>  void BPF_STRUCT_OPS(consume_immed_dispatch, s32 cpu, struct task_struct *prev)
>  {
> -	if (cpu == 0)
> -		scx_bpf_dsq_move_to_local(USER_DSQ, SCX_ENQ_IMMED);
> -	else
> +	int i;
> +
> +	if (cpu != 0) {
>  		scx_bpf_dsq_move_to_local(SCX_DSQ_GLOBAL, 0);

Hm.. this should trigger an error, you can't use
scx_bpf_dsq_move_to_local() with SCX_DSQ_GLOBAL.

> +		return;
> +	}
> +
> +	/*
> +	 * Move multiple tasks into CPU 0's local DSQ with SCX_ENQ_IMMED in a
> +	 * single dispatch call. When the second task is inserted (dsq->nr > 1),
> +	 * dsq_inc_nr() triggers the IMMED slow path via schedule_reenq_local(),
> +	 * which calls ops.enqueue() with SCX_ENQ_REENQ | SCX_TASK_REENQ_IMMED.
> +	 */
> +	for (i = 0; i < 4; i++) {
> +		if (!scx_bpf_dsq_move_to_local(USER_DSQ, SCX_ENQ_IMMED))
> +			break;
> +	}

Why 4? Two consecutive scx_bpf_dsq_move_to_local() should be enough, right?

>  }
>  
>  s32 BPF_STRUCT_OPS_SLEEPABLE(consume_immed_init)

We don't see this from the context, but to check if SCX_ENQ_IMMED is
available can we just check if it's != 0, instead of checking the
_scx_bpf_dsq_move_to_local___v2 symbol?

Something like:

if (SCX_ENQ_IMMED == 0) {
	scx_bpf_error("SCX_ENQ_IMMED not available");
	return -EOPNOTSUPP;
}

And remove the same check in consume_immed.c, because we're already
checking it in the BPF part.

Thanks,
-Andrea

> diff --git a/tools/testing/selftests/sched_ext/consume_immed.c b/tools/testing/selftests/sched_ext/consume_immed.c
> index 7f9594cfa9cb..61cbc0fe3663 100644
> --- a/tools/testing/selftests/sched_ext/consume_immed.c
> +++ b/tools/testing/selftests/sched_ext/consume_immed.c
> @@ -12,18 +12,30 @@
>  #include <stdio.h>
>  #include <unistd.h>
>  #include <pthread.h>
> +#include <sched.h>
>  #include <bpf/bpf.h>
>  #include <scx/common.h>
>  #include "consume_immed.bpf.skel.h"
>  #include "scx_test.h"
>  
> -#define NUM_WORKERS		4
> +/*
> + * Use more workers than CPUs, all pinned to CPU 0, so CPU 0's local DSQ
> + * accumulates multiple IMMED tasks at once, reliably triggering the slow path.
> + */
> +#define NUM_WORKERS		8
>  #define TEST_DURATION_SEC	3
>  
>  static volatile bool stop_workers;
>  
>  static void *worker_fn(void *arg)
>  {
> +	cpu_set_t cpuset;
> +
> +	/* Pin to CPU 0 to saturate its local DSQ */
> +	CPU_ZERO(&cpuset);
> +	CPU_SET(0, &cpuset);
> +	pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
> +
>  	while (!stop_workers) {
>  		volatile unsigned long i;
>  
> -- 
> 2.43.0
>

next prev parent reply	other threads:[~2026-03-26  7:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-26  2:28 [PATCH 1/3] sched_ext: fix NULL deref in bpf_scx_unreg() due to ops->priv race zhidao su
2026-03-26  2:28 ` [PATCH 2/3] selftests/sched_ext: Fix dsq_reenq test reliability zhidao su
2026-03-26  2:28 ` [PATCH 3/3] selftests/sched_ext: Fix consume_immed " zhidao su
2026-03-26  7:06   ` Andrea Righi [this message]
2026-03-26  2:45 ` [PATCH 1/3] sched_ext: fix NULL deref in bpf_scx_unreg() due to ops->priv race Tejun Heo
2026-03-26  5:13   ` zhidao su

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acTa2k5Fkhni_e7L@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=soolaugust@gmail.com \
    --cc=suzhidao@xiaomi.com \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.