public inbox for sched-ext@lists.linux.dev
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Cheng-Yang Chou <yphbchou0911@gmail.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Ching-Chun Huang <jserv@ccns.ncku.edu.tw>,
	Chia-Ping Tsai <chia7712@gmail.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH sched_ext/for-7.1] tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap
Date: Mon, 13 Apr 2026 07:32:56 +0200
Message-ID: <adyACFXObn1rZNPh@gpd4>
In-Reply-To: <9e172bda49dade833db7118929332693@kernel.org>

Hi Tejun,

On Sun, Apr 12, 2026 at 05:30:52PM -1000, Tejun Heo wrote:
> scx_qmap uses global BPF queue maps (BPF_MAP_TYPE_QUEUE) that any CPU's
> ops.dispatch() can pop from. When a CPU pops a task that can't run on it
> (e.g. a pinned per-CPU kthread), it inserts the task into SHARED_DSQ.
> consume_dispatch_q() then skips the task due to affinity mismatch, leaving it
> stranded until some CPU in its allowed mask calls ops.dispatch(). This doesn't
> cause indefinite stalls -- the periodic tick keeps firing (can_stop_idle_tick()
> returns false when softirq is pending) -- but can cause noticeable scheduling
> delays.
> 
> After inserting to SHARED_DSQ, kick the task's home CPU if this CPU can't run
> it. There's a small race window where the home CPU can enter idle before the
> kick lands -- if a per-CPU kthread like ksoftirqd is the stranded task, this
> can trigger a "NOHZ tick-stop error" warning. The kick arrives shortly after
> and the home CPU drains the task.
> 
> Rather than fully eliminating the warning by routing pinned tasks to local or
> global DSQs, the current code keeps them going through the normal BPF queue
> path and documents the race and the resulting warning in detail. scx_qmap is an
> example scheduler and having tasks go through the usual dispatch path is useful
> for testing. The detailed comment also serves as a reference for other
> schedulers that may encounter similar warnings.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> v2: Replaced the previous enqueue-side fix which kicked when a pinned task was
>     enqueued. That was based on the theory that ops.select_cpu() being skipped
>     meant the home CPU wouldn't be woken, which wasn't quite right --
>     wakeup_preempt() kicks the target CPU regardless. Moved the fix to
>     ops.dispatch() where the stranding is actually observable.

Looks good now!

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> 
>  tools/sched_ext/scx_qmap.bpf.c | 40 ++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
> index f3587fb709c9..a4543c7ab25d 100644
> --- a/tools/sched_ext/scx_qmap.bpf.c
> +++ b/tools/sched_ext/scx_qmap.bpf.c
> @@ -471,6 +471,46 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
>  			__sync_fetch_and_add(&nr_dispatched, 1);
> 
>  			scx_bpf_dsq_insert(p, SHARED_DSQ, slice_ns, 0);
> +
> +			/*
> +			 * scx_qmap uses a global BPF queue that any CPU's
> +			 * dispatch can pop from. If this CPU popped a task that
> +			 * can't run here, it gets stranded on SHARED_DSQ after
> +			 * consume_dispatch_q() skips it. Kick the task's home
> +			 * CPU so it drains SHARED_DSQ.
> +			 *
> +			 * There's a race between the pop and the flush of the
> +			 * buffered dsq_insert:
> +			 *
> +			 *  CPU 0 (dispatching)      CPU 1 (home, idle)
> +			 *  ~~~~~~~~~~~~~~~~~~~      ~~~~~~~~~~~~~~~~~~~
> +			 *  pop from BPF queue
> +			 *  dsq_insert(buffered)
> +			 *                           balance:
> +			 *                             SHARED_DSQ empty
> +			 *                             BPF queue empty
> +			 *                             -> goes idle
> +			 *  flush -> on SHARED
> +			 *  kick CPU 1
> +			 *                           wakes, drains task
> +			 *
> +			 * The kick prevents indefinite stalls but a per-CPU
> +			 * kthread like ksoftirqd can be briefly stranded when
> +			 * its home CPU enters idle with softirq pending,
> +			 * triggering:
> +			 *
> +			 *  "NOHZ tick-stop error: local softirq work is pending, handler #N!!!"
> +			 *
> +			 * from report_idle_softirq(). The kick lands shortly
> +			 * after and the home CPU drains the task. This could be
> +			 * avoided by e.g. dispatching pinned tasks to local or
> +			 * global DSQs, but the current code is left as-is to
> +			 * document this class of issue -- other schedulers
> +			 * seeing similar warnings can use this as a reference.
> +			 */
> +			if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr))
> +				scx_bpf_kick_cpu(scx_bpf_task_cpu(p), 0);
> +
>  			bpf_task_release(p);
> 
>  			batch--;
> --
> 2.53.0
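
For readers following the thread, the decision the patch adds after the
SHARED_DSQ insert can be modelled in plain userspace C. This is only a
sketch under stated assumptions: struct model_task, cpu_allowed() and
cpu_to_kick() are hypothetical names, the affinity mask is reduced to a
64-bit bitmap, and home_cpu stands in for what scx_bpf_task_cpu(p) returns
in the BPF code. It does not model the DSQ, the buffered insert, or the
idle race described in the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a task; not the kernel's task_struct. */
struct model_task {
	uint64_t allowed_cpus;	/* bit N set => task may run on CPU N */
	int home_cpu;		/* assumed analogue of scx_bpf_task_cpu(p) */
};

/* Mirrors the role of bpf_cpumask_test_cpu(cpu, p->cpus_ptr) in the patch. */
static bool cpu_allowed(int cpu, uint64_t mask)
{
	return cpu >= 0 && cpu < 64 && (mask & (UINT64_C(1) << cpu));
}

/*
 * Called after the task has been placed on the shared queue. Returns the
 * CPU that should be kicked, or -1 when the dispatching CPU can run the
 * task itself and no kick is needed.
 */
static int cpu_to_kick(int dispatching_cpu, const struct model_task *p)
{
	if (cpu_allowed(dispatching_cpu, p->allowed_cpus))
		return -1;
	return p->home_cpu;
}
```

With a task pinned to CPU 1, cpu_to_kick(0, &task) yields 1 (CPU 0 popped
it but cannot run it), while cpu_to_kick(1, &task) yields -1.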

Thread overview: 8+ messages
2026-04-11 11:33 [PATCH sched_ext/for-7.1] tools/sched_ext: Kick idle CPU for pinned tasks in scx_qmap Tejun Heo
2026-04-11 12:57 ` Cheng-Yang Chou
2026-04-11 14:27   ` Cheng-Yang Chou
2026-04-11 15:03 ` Andrea Righi
2026-04-13  3:30 ` [PATCH sched_ext/for-7.1] tools/sched_ext: Kick home CPU for stranded " Tejun Heo
2026-04-13  5:32   ` Andrea Righi [this message]
2026-04-13  5:38   ` Cheng-Yang Chou
2026-04-13 16:21   ` Tejun Heo
