From: zhidao su <soolaugust@gmail.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] sched_ext: Honor SCX_OPS_ALWAYS_ENQ_IMMED on framework-internal goto-local paths
Date: Mon, 20 Apr 2026 13:29:49 +0800
Message-ID: <FBBD5AAE-2537-436B-91AC-8031B02F2DB2@gmail.com>
In-Reply-To: <aeW4UlU5Zi-XSd42@gpd4>
Hi Andrea,
Thank you for the careful review — you're right.
I hadn't noticed the re-enqueue side effect; I'll drop this patch and think it over again.
Apologies for the noise, including the v1 submission error that sent unrelated files.
Thanks,
zhidao
> On Apr 20, 2026, at 13:23, Andrea Righi <arighi@nvidia.com> wrote:
>
> Hi zhidao,
>
> On Mon, Apr 20, 2026 at 11:56:46AM +0800, zhidao su wrote:
>> SCX_OPS_ALWAYS_ENQ_IMMED promises that SCX_ENQ_IMMED is set on all local
>> DSQ enqueues. scx_vet_enq_flags() enforces this for BPF kfunc callers
>> (scx_bpf_dsq_insert, scx_bpf_dsq_move_*), but the framework-internal
>> goto-local paths in do_enqueue_task() — PF_EXITING, migration-disabled,
>> and !scx_rq_online fallbacks — bypass scx_vet_enq_flags() entirely.
>>
>> When a scheduler sets SCX_OPS_ALWAYS_ENQ_IMMED, tasks hitting these
>> goto-local paths arrive at dispatch_enqueue() without SCX_ENQ_IMMED in
>> enq_flags, violating the flag's documented semantics.
>>
>> This can be observed with trace_printk instrumentation at the enqueue:
>> label while running a multi-threaded fork-exit workload under a scheduler
>> with SCX_OPS_ALWAYS_ENQ_IMMED:
>>
>> Before (scx_simple + ALWAYS_ENQ_IMMED, 2 CPUs, mmap-contention exit):
>> 95 PF_EXITING local enqueues, 95/95 IMMED=0 ALW=1 <-- bug
>>
>> After:
>> 1030 PF_EXITING local enqueues, 1030/1030 IMMED=1 ALW=1 <-- fixed
>>
>> Fix by checking SCX_OPS_ALWAYS_ENQ_IMMED at the enqueue: label and
>> setting SCX_ENQ_IMMED when dispatching to a local DSQ. This mirrors
>> what scx_vet_enq_flags() does for BPF callers.
>>
>> Fixes: 3229ac4a5ef5 ("sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag")
>> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
>> ---
>> v2: Resend to correct a submission error in v1 where unrelated files
>> were accidentally included in the patch. The code change is
>> identical; only kernel/sched/ext.c is modified. Apologies for
>> the noise.
>> ---
>> kernel/sched/ext.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
>> index 9628c64e5592..0758f5e5a8f0 100644
>> --- a/kernel/sched/ext.c
>> +++ b/kernel/sched/ext.c
>> @@ -1859,7 +1859,9 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
>> * Clear persistent TASK_IMMED for fresh enqueues, see dsq_inc_nr().
>> * Note that exiting and migration-disabled tasks that skip
>> * ops.enqueue() below will lose IMMED protection unless
>> - * %SCX_OPS_ENQ_EXITING / %SCX_OPS_ENQ_MIGRATION_DISABLED are set.
>> + * %SCX_OPS_ENQ_EXITING / %SCX_OPS_ENQ_MIGRATION_DISABLED are set,
>> + * or %SCX_OPS_ALWAYS_ENQ_IMMED is enabled (which re-applies IMMED
>> + * at the enqueue: label below).
>> */
>> p->scx.flags &= ~SCX_TASK_IMMED;
>>
>> @@ -1949,6 +1951,17 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
>> */
>> touch_core_sched(rq, p);
>> refill_task_slice_dfl(sch, p);
>> +
>> + /*
>> + * Honor %SCX_OPS_ALWAYS_ENQ_IMMED for framework-internal local DSQ
>> + * enqueues (PF_EXITING, migration-disabled, !online fallbacks).
>> + * scx_vet_enq_flags() already handles this for BPF kfunc callers,
>> + * but the goto-local paths above bypass it.
>> + */
>> + if ((sch->ops.flags & SCX_OPS_ALWAYS_ENQ_IMMED) &&
>> + dsq == &rq->scx.local_dsq)
>> + enq_flags |= SCX_ENQ_IMMED;
>> +
>
> I'm not sure this should be applied across all fallback cases, it's probably
> safer to avoid triggering re-enqueues for the internal events, especially
> considering that the BPF scheduler doesn't have visibility into them.
>
> If we do this we should also update %SCX_ENQ_IMMED documentation, that says:
>
> * Exiting and migration-disabled tasks bypass ops.enqueue() and
> * are placed directly on a local DSQ without IMMED protection
> * unless %SCX_OPS_ENQ_EXITING and %SCX_OPS_ENQ_MIGRATION_DISABLED
> * are set respectively.
>
> But again, do we actually want to do this? If SCX_OPS_ENQ_EXITING and
> SCX_OPS_ENQ_MIGRATION_DISABLED aren't set, these cases are handled internally by
> the sched_ext core, so triggering a re-enqueue seems unnecessary, as the BPF
> scheduler wouldn't have visibility of such events anyway.
>
>> dispatch_enqueue(sch, rq, dsq, p, enq_flags);
>> }
>>
>> --
>> 2.43.0
>>
>
> Thanks,
> -Andrea
2026-04-20 3:56 [PATCH v2] sched_ext: Honor SCX_OPS_ALWAYS_ENQ_IMMED on framework-internal goto-local paths zhidao su
2026-04-20 5:23 ` Andrea Righi
2026-04-20  5:29 ` zhidao su