The Linux Kernel Mailing List
From: Christian Loehle <christian.loehle@arm.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	tj@kernel.org, void@manifault.com, changwoo@igalia.com
Subject: Re: [RFC][PATCH] sched_ext: Allow consuming local tasks when aborting
Date: Fri, 8 May 2026 16:45:08 +0100
Message-ID: <993ad147-0cfe-4cc6-8a58-45845064db38@arm.com>
In-Reply-To: <af3vw6Sjh1dW8oEA@gpd4>

On 5/8/26 15:14, Andrea Righi wrote:
> Hi Christian,
> 
> On Thu, May 07, 2026 at 02:56:42PM +0100, Christian Loehle wrote:
>> When aborting, consume_dispatch_q() breaks out of the task iteration
>> loop entirely for non-bypass DSQs. This prevents CPUs from consuming
>> even their own tasks (where rq == task_rq) from any DSQ.
>>
>> This causes a deadlock during CPU hotplug:
>>
>> 1. The BPF scheduler's cpu_offline callback calls scx_bpf_exit(),
>>    setting sch->aborting and queuing the disable_work on the helper
>>    kthread.
>>
>> 2. The helper kthread (and other tasks) are stuck on the global or
>>    user DSQs because bypass mode hasn't been entered yet.
>>
>> 3. No CPU can consume these tasks due to the aborting break, so the
>>    helper never runs scx_root_disable() -> scx_bypass().
>>
>> 4. The cpuhp thread is stuck in balance_hotplug_wait() because the
>>    dying CPU's rq never drains.
>>
>> Tasks on user DSQs are equally affected: BPF schedulers can dispatch
>> RCU and other critical kthreads to user DSQs, causing RCU stalls when
>> those tasks become unconsumable.
>>
>> The aborting check was added to prevent live-locks from the remote task
>> migration path (consume_remote_task() -> goto retry), and also to avoid
>> holding the dsq->lock for too long.
>>
>> Change the break to skip only remote tasks via continue, allowing each
>> CPU to still consume tasks already on its own rq. This unblocks the
>> helper kthread, lets bypass mode activate, and allows both hotplug and
>> RCU grace periods to complete.
> 
> Have you been able to reproduce this stall condition?

Yes, the hotplug selftest reproduces this for me occasionally; wrapping the
4 test cases in a 100-iteration loop, I guess the reproduction rate is up to 100%.
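To make the break vs. continue difference from the patch description concrete, here is a user-space model of the aborting check in the DSQ iteration loop. It is a sketch, not kernel code: `struct model_task`, `enum abort_policy` and `model_consume()` are hypothetical names, a task is reduced to the CPU whose rq it sits on, and when not aborting the model assumes any task (local or remote) can be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queued task: only the CPU whose rq it is on. */
struct model_task {
	int rq_cpu;
};

/* Hypothetical knob selecting the aborting behaviour being compared. */
enum abort_policy {
	ABORT_BREAK,		/* current: stop scanning any non-bypass DSQ */
	ABORT_SKIP_REMOTE,	/* RFC: skip remote tasks, still take local ones */
};

/*
 * Return the index of the task @cpu would consume from a DSQ of @nr tasks,
 * or -1 if it consumes nothing. Simplification: when not aborting, the first
 * task is taken whether local or remote (remote migration always succeeds).
 */
static int model_consume(const struct model_task *dsq, size_t nr,
			 bool bypass_dsq, int cpu, bool aborting,
			 enum abort_policy policy)
{
	assert(dsq != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		bool local = dsq[i].rq_cpu == cpu;

		if (aborting && !bypass_dsq) {
			if (policy == ABORT_BREAK)
				break;		/* even a local task is never reached */
			if (!local)
				continue;	/* avoid the remote-migration retry path */
		}
		return (int)i;
	}
	return -1;
}
```

With a user DSQ holding a CPU 1 task ahead of a CPU 0 task and aborting set, `model_consume(dsq, 2, false, 0, true, ABORT_BREAK)` returns -1 (CPU 0 consumes nothing), while the same call with `ABORT_SKIP_REMOTE` returns 1 (CPU 0 picks up its own task).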

> 
> When the kernel forces bypass, scx_bypass() explicitly walks every CPU's
> runnable_list and cycles tasks through DEQUEUE_SAVE | DEQUEUE_MOVE so
> dispatching stops depending on BPF.
> 
> On CPU hotplug, the helper kthread (and all the other critical kthreads) should
> also be on the runnable_list, so they should be moved to SCX_DSQ_BYPASS and
> consume_dispatch_q() should be able to consume them.
> 
> Maybe the problem is that in do_enqueue_task() we keep tasks on the local DSQ
> when !scx_rq_online(rq); instead, we should prioritize the bypass condition.
> 
> Does something like the following make sense to you?
> 
> Thanks,
> -Andrea
> 
>  kernel/sched/ext.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 7ac7d10a41bef..277110d950c30 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1901,6 +1901,17 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
>  	 */
>  	p->scx.flags &= ~SCX_TASK_IMMED;
>  
> +	/*
> +	 * Check bypass before testing the rq online state: bypass mode stops
> +	 * processing local DSQs, so tasks should be routed through
> +	 * SCX_DSQ_BYPASS rather than dispatched to the local DSQ during CPU
> +	 * hotplug events.
> +	 */
> +	if (scx_bypassing(sch, cpu_of(rq))) {
> +		__scx_add_event(sch, SCX_EV_BYPASS_DISPATCH, 1);
> +		goto bypass;
> +	}
> +
>  	/*
>  	 * If !scx_rq_online(), we already told the BPF scheduler that the CPU
>  	 * is offline and are just running the hotplug path. Don't bother the
> @@ -1909,11 +1920,6 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
>  	if (!scx_rq_online(rq))
>  		goto local;
>  
> -	if (scx_bypassing(sch, cpu_of(rq))) {
> -		__scx_add_event(sch, SCX_EV_BYPASS_DISPATCH, 1);
> -		goto bypass;
> -	}
> -
>  	if (p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)
>  		goto direct;
>  
> 

Unfortunately, that also locks up; let me have another look.
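For reference, the reordering in the diff above changes the routing decision only for a CPU that is both offline and bypassing. The sketch below is a hypothetical user-space model of just those two checks in do_enqueue_task(); `enum route`, `route_before()` and `route_after()` are made-up names, and everything past the two checks (direct dispatch and the rest of the function) is lumped into `ROUTE_OTHER`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, named after the labels in do_enqueue_task(). */
enum route {
	ROUTE_LOCAL,	/* "goto local": keep the task on this rq's local DSQ */
	ROUTE_BYPASS,	/* "goto bypass": route through SCX_DSQ_BYPASS */
	ROUTE_OTHER,	/* past both checks: direct dispatch and the rest */
};

/* Check order before the diff: the rq-online test is evaluated first. */
static enum route route_before(bool rq_online, bool bypassing)
{
	if (!rq_online)
		return ROUTE_LOCAL;
	if (bypassing)
		return ROUTE_BYPASS;
	return ROUTE_OTHER;
}

/* Check order after the diff: the bypass test is evaluated first. */
static enum route route_after(bool rq_online, bool bypassing)
{
	if (bypassing)
		return ROUTE_BYPASS;
	if (!rq_online)
		return ROUTE_LOCAL;
	return ROUTE_OTHER;
}
```

Only the offline-and-bypassing case differs: `route_before(false, true)` yields `ROUTE_LOCAL` while `route_after(false, true)` yields `ROUTE_BYPASS`; the other three combinations route identically.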


Thread overview: 6+ messages
2026-05-07 13:56 [RFC][PATCH] sched_ext: Allow consuming local tasks when aborting Christian Loehle
2026-05-08 14:14 ` Andrea Righi
2026-05-08 15:45   ` Christian Loehle [this message]
2026-05-08 15:28 ` Tejun Heo
2026-05-08 15:47   ` Andrea Righi
2026-05-08 17:59     ` Tejun Heo
