The Linux Kernel Mailing List
From: Andrea Righi <arighi@nvidia.com>
To: Christian Loehle <christian.loehle@arm.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	tj@kernel.org, void@manifault.com, changwoo@igalia.com
Subject: Re: [RFC][PATCH] sched_ext: Allow consuming local tasks when aborting
Date: Fri, 8 May 2026 16:14:27 +0200
Message-ID: <af3vw6Sjh1dW8oEA@gpd4>
In-Reply-To: <20260507135642.692290-1-christian.loehle@arm.com>

Hi Christian,

On Thu, May 07, 2026 at 02:56:42PM +0100, Christian Loehle wrote:
> When aborting, consume_dispatch_q() breaks out of the task iteration
> loop entirely for non-bypass DSQs. This prevents CPUs from consuming
> even their own tasks (where rq == task_rq) from any DSQ.
> 
> This causes a deadlock during CPU hotplug:
> 
> 1. The BPF scheduler's cpu_offline callback calls scx_bpf_exit(),
>    setting sch->aborting and queuing the disable_work on the helper
>    kthread.
> 
> 2. The helper kthread (and other tasks) are stuck on the global or
>    user DSQs because bypass mode hasn't been entered yet.
> 
> 3. No CPU can consume these tasks due to the aborting break, so the
>    helper never runs scx_root_disable() -> scx_bypass().
> 
> 4. The cpuhp thread is stuck in balance_hotplug_wait() because the
>    dying CPU's rq never drains.
> 
> Tasks on user DSQs are equally affected: BPF schedulers can dispatch
> RCU and other critical kthreads to user DSQs, causing RCU stalls when
> those tasks become unconsumable.
> 
> The aborting check was added to prevent live-locks from the remote task
> migration path (consume_remote_task() -> goto retry), but also avoid
> holding the dsq->lock for too long.
> 
> Change the break to skip only remote tasks via continue, allowing each
> CPU to still consume tasks already on its own rq. This unblocks the
> helper kthread, lets bypass mode activate, and allows both hotplug and
> RCU grace periods to complete.

Have you been able to reproduce this stall condition?

When the kernel forces bypass, scx_bypass() explicitly walks every CPU's
runnable_list and cycles tasks through DEQUEUE_SAVE | DEQUEUE_MOVE so
dispatching stops depending on BPF.

On CPU hotplug, the helper kthread (and all the other critical kthreads)
should also be on the runnable_list, so they should be moved to
SCX_DSQ_BYPASS, where consume_dispatch_q() should be able to consume them.

Maybe the problem is that in do_enqueue_task() we keep tasks on the local DSQ
when !scx_rq_online(rq). Instead, we should check the bypass condition first.

Does something like the following make sense to you?

Thanks,
-Andrea

 kernel/sched/ext.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7ac7d10a41bef..277110d950c30 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1901,6 +1901,17 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	 */
 	p->scx.flags &= ~SCX_TASK_IMMED;
 
+	/*
+	 * Check bypass before testing the rq online state: bypass mode stops
+	 * processing local DSQs, so tasks should be routed through
+	 * SCX_DSQ_BYPASS rather than dispatched to the local DSQ during CPU
+	 * hotplug events.
+	 */
+	if (scx_bypassing(sch, cpu_of(rq))) {
+		__scx_add_event(sch, SCX_EV_BYPASS_DISPATCH, 1);
+		goto bypass;
+	}
+
 	/*
 	 * If !scx_rq_online(), we already told the BPF scheduler that the CPU
 	 * is offline and are just running the hotplug path. Don't bother the
@@ -1909,11 +1920,6 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	if (!scx_rq_online(rq))
 		goto local;
 
-	if (scx_bypassing(sch, cpu_of(rq))) {
-		__scx_add_event(sch, SCX_EV_BYPASS_DISPATCH, 1);
-		goto bypass;
-	}
-
 	if (p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
 		goto direct;
 
