From: Andrea Righi <arighi@nvidia.com>
To: Christian Loehle <christian.loehle@arm.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
tj@kernel.org, void@manifault.com, changwoo@igalia.com
Subject: Re: [RFC][PATCH] sched_ext: Allow consuming local tasks when aborting
Date: Fri, 8 May 2026 16:14:27 +0200
Message-ID: <af3vw6Sjh1dW8oEA@gpd4>
In-Reply-To: <20260507135642.692290-1-christian.loehle@arm.com>

Hi Christian,

On Thu, May 07, 2026 at 02:56:42PM +0100, Christian Loehle wrote:
> When aborting, consume_dispatch_q() breaks out of the task iteration
> loop entirely for non-bypass DSQs. This prevents CPUs from consuming
> even their own tasks (where rq == task_rq) from any DSQ.
>
> This causes a deadlock during CPU hotplug:
>
> 1. The BPF scheduler's cpu_offline callback calls scx_bpf_exit(),
> setting sch->aborting and queuing the disable_work on the helper
> kthread.
>
> 2. The helper kthread (and other tasks) are stuck on the global or
> user DSQs because bypass mode hasn't been entered yet.
>
> 3. No CPU can consume these tasks due to the aborting break, so the
> helper never runs scx_root_disable() -> scx_bypass().
>
> 4. The cpuhp thread is stuck in balance_hotplug_wait() because the
> dying CPU's rq never drains.
>
> Tasks on user DSQs are equally affected: BPF schedulers can dispatch
> RCU and other critical kthreads to user DSQs, causing RCU stalls when
> those tasks become unconsumable.
>
> The aborting check was added to prevent live-locks from the remote task
> migration path (consume_remote_task() -> goto retry), but also avoid
> holding the dsq->lock for too long.
>
> Change the break to skip only remote tasks via continue, allowing each
> CPU to still consume tasks already on its own rq. This unblocks the
> helper kthread, lets bypass mode activate, and allows both hotplug and
> RCU grace periods to complete.

Have you been able to reproduce this stall condition?

When the kernel forces bypass, scx_bypass() explicitly walks every CPU's
runnable_list and cycles tasks through DEQUEUE_SAVE | DEQUEUE_MOVE so
dispatching stops depending on BPF.

On CPU hotplug, the helper kthread (and all the other critical kthreads)
should also be on the runnable_list, so they should be moved to
SCX_DSQ_BYPASS, where consume_dispatch_q() should be able to consume them.

Maybe the problem is that do_enqueue_task() keeps tasks on the local DSQ
when !scx_rq_online(rq); instead, we should prioritize the bypass condition.

Does something like the following make sense to you?

Thanks,
-Andrea

 kernel/sched/ext.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7ac7d10a41bef..277110d950c30 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1901,6 +1901,17 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	 */
 	p->scx.flags &= ~SCX_TASK_IMMED;
 
+	/*
+	 * Check bypass before testing the rq online state: bypass mode stops
+	 * processing local DSQs, so tasks should be routed through
+	 * SCX_DSQ_BYPASS rather than dispatched to the local DSQ during CPU
+	 * hotplug events.
+	 */
+	if (scx_bypassing(sch, cpu_of(rq))) {
+		__scx_add_event(sch, SCX_EV_BYPASS_DISPATCH, 1);
+		goto bypass;
+	}
+
 	/*
 	 * If !scx_rq_online(), we already told the BPF scheduler that the CPU
 	 * is offline and are just running the hotplug path. Don't bother the
@@ -1909,11 +1920,6 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
 	if (!scx_rq_online(rq))
 		goto local;
 
-	if (scx_bypassing(sch, cpu_of(rq))) {
-		__scx_add_event(sch, SCX_EV_BYPASS_DISPATCH, 1);
-		goto bypass;
-	}
-
 	if (p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
 		goto direct;
 
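
For illustration only, the effect of the reordering can be modelled in
user space. This is a toy sketch, not kernel code: enum route,
route_old(), route_new() and check_routes() are hypothetical names made
up here, the two booleans stand in for scx_bypassing() and
scx_rq_online(), and ROUTE_OTHER lumps together every later path in
do_enqueue_task() (direct dispatch, the BPF enqueue path, and so on).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical destinations, loosely named after the goto labels. */
enum route { ROUTE_BYPASS, ROUTE_LOCAL, ROUTE_OTHER };

/* Check order before the patch: the offline test wins over bypass. */
static enum route route_old(bool bypassing, bool rq_online)
{
	if (!rq_online)
		return ROUTE_LOCAL;
	if (bypassing)
		return ROUTE_BYPASS;
	return ROUTE_OTHER;
}

/* Check order after the patch: bypass wins over the offline test. */
static enum route route_new(bool bypassing, bool rq_online)
{
	if (bypassing)
		return ROUTE_BYPASS;
	if (!rq_online)
		return ROUTE_LOCAL;
	return ROUTE_OTHER;
}

/* The two orders differ only when bypassing on an offline rq. */
static void check_routes(void)
{
	assert(route_old(true, false) == ROUTE_LOCAL);
	assert(route_new(true, false) == ROUTE_BYPASS);

	/* Every other combination routes the same way in both orders. */
	assert(route_old(true, true) == route_new(true, true));
	assert(route_old(false, true) == route_new(false, true));
	assert(route_old(false, false) == route_new(false, false));
}
```

In this model the only behavioural change is the (bypassing, offline)
case, which is the hotplug window discussed above. Whether that matches
the real stall still depends on the reproduction.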