From: "Emil Tsalapatis" <emil@etsalapatis.com>
To: "Tejun Heo" <tj@kernel.org>, <linux-kernel@vger.kernel.org>,
<sched-ext@lists.linux.dev>
Cc: <void@manifault.com>, <arighi@nvidia.com>, <changwoo@igalia.com>
Subject: Re: [PATCH 08/15] sched_ext: Introduce scx_bpf_dsq_reenq() for remote local DSQ reenqueue
Date: Mon, 09 Mar 2026 13:33:05 -0400
Message-ID: <DGYFJQ2SOLDI.27KTMALD2PMY0@etsalapatis.com>
In-Reply-To: <20260306190623.1076074-9-tj@kernel.org>
On Fri Mar 6, 2026 at 2:06 PM EST, Tejun Heo wrote:
> scx_bpf_reenqueue_local() can only trigger re-enqueue of the current CPU's
> local DSQ. Introduce scx_bpf_dsq_reenq() which takes a DSQ ID and can target
> any local DSQ including remote CPUs via SCX_DSQ_LOCAL_ON | cpu. This will be
> expanded to support user DSQs by future changes.
>
> scx_bpf_reenqueue_local() is reimplemented as a simple wrapper around
> scx_bpf_dsq_reenq(SCX_DSQ_LOCAL, 0) and may be deprecated in the future.
>
> Update compat.bpf.h with a compatibility shim and scx_qmap to test the new
> functionality.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> kernel/sched/ext.c | 118 ++++++++++++++---------
> tools/sched_ext/include/scx/compat.bpf.h | 21 ++++
> tools/sched_ext/scx_qmap.bpf.c | 11 ++-
> tools/sched_ext/scx_qmap.c | 5 +-
> 4 files changed, 106 insertions(+), 49 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 80d1e6ccc326..b02143b10f0f 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1080,6 +1080,31 @@ static void schedule_deferred_locked(struct rq *rq)
> schedule_deferred(rq);
> }
>
> +static void schedule_dsq_reenq(struct scx_sched *sch, struct scx_dispatch_q *dsq)
> +{
> + /*
> + * Allowing reenqueues doesn't make sense while bypassing. This also
> + * prevents new reenqueues from being scheduled on dead scheds.
> + */
> + if (unlikely(READ_ONCE(sch->bypass_depth)))
> + return;
> +
> + if (dsq->id == SCX_DSQ_LOCAL) {
> + struct rq *rq = container_of(dsq, struct rq, scx.local_dsq);
> + struct scx_sched_pcpu *sch_pcpu = per_cpu_ptr(sch->pcpu, cpu_of(rq));
> + struct scx_deferred_reenq_local *drl = &sch_pcpu->deferred_reenq_local;
> +
> + scoped_guard (raw_spinlock_irqsave, &rq->scx.deferred_reenq_lock) {
> + if (list_empty(&drl->node))
> + list_move_tail(&drl->node, &rq->scx.deferred_reenq_locals);
> + }
> +
> + schedule_deferred(rq);
> + } else {
> + scx_error(sch, "DSQ 0x%llx not allowed for reenq", dsq->id);
> + }
> +}
> +
> /**
> * touch_core_sched - Update timestamp used for core-sched task ordering
> * @rq: rq to read clock from, must be locked
> @@ -7775,9 +7800,6 @@ __bpf_kfunc_start_defs();
> * Iterate over all of the tasks currently enqueued on the local DSQ of the
> * caller's CPU, and re-enqueue them in the BPF scheduler. Returns the number of
> * processed tasks. Can only be called from ops.cpu_release().
> - *
> - * COMPAT: Will be removed in v6.23 along with the ___v2 suffix on the void
> - * returning variant that can be called from anywhere.
> */
> __bpf_kfunc u32 scx_bpf_reenqueue_local(const struct bpf_prog_aux *aux)
> {
> @@ -8207,6 +8229,52 @@ __bpf_kfunc struct task_struct *scx_bpf_dsq_peek(u64 dsq_id,
> return rcu_dereference(dsq->first_task);
> }
>
> +/**
> + * scx_bpf_dsq_reenq - Re-enqueue tasks on a DSQ
> + * @dsq_id: DSQ to re-enqueue
> + * @reenq_flags: %SCX_RENQ_*
> + * @aux: implicit BPF argument to access bpf_prog_aux hidden from BPF progs
> + *
> + * Iterate over all of the tasks currently enqueued on the DSQ identified by
> + * @dsq_id, and re-enqueue them in the BPF scheduler. The following DSQs are
> + * supported:
> + *
> + * - Local DSQs (%SCX_DSQ_LOCAL or %SCX_DSQ_LOCAL_ON | $cpu)
> + *
> + * Re-enqueues are performed asynchronously. Can be called from anywhere.
> + */
> +__bpf_kfunc void scx_bpf_dsq_reenq(u64 dsq_id, u64 reenq_flags,
> + const struct bpf_prog_aux *aux)
> +{
> + struct scx_sched *sch;
> + struct scx_dispatch_q *dsq;
> +
> + guard(preempt)();
> +
> + sch = scx_prog_sched(aux);
> + if (unlikely(!sch))
> + return;
> +
> + dsq = find_dsq_for_dispatch(sch, this_rq(), dsq_id, smp_processor_id());
> + schedule_dsq_reenq(sch, dsq);
> +}
> +
> +/**
> + * scx_bpf_reenqueue_local - Re-enqueue tasks on a local DSQ
> + * @aux: implicit BPF argument to access bpf_prog_aux hidden from BPF progs
> + *
> + * Iterate over all of the tasks currently enqueued on the local DSQ of the
> + * caller's CPU, and re-enqueue them in the BPF scheduler. Can be called from
> + * anywhere.
> + *
> + * This is now a special case of scx_bpf_dsq_reenq() and may be removed in the
> + * future.
> + */
> +__bpf_kfunc void scx_bpf_reenqueue_local___v2(const struct bpf_prog_aux *aux)
> +{
> + scx_bpf_dsq_reenq(SCX_DSQ_LOCAL, 0, aux);
> +}
> +
> __bpf_kfunc_end_defs();
>
> static s32 __bstr_format(struct scx_sched *sch, u64 *data_buf, char *line_buf,
> @@ -8364,47 +8432,6 @@ __bpf_kfunc void scx_bpf_dump_bstr(char *fmt, unsigned long long *data,
> ops_dump_flush();
> }
>
> -/**
> - * scx_bpf_reenqueue_local - Re-enqueue tasks on a local DSQ
> - * @aux: implicit BPF argument to access bpf_prog_aux hidden from BPF progs
> - *
> - * Iterate over all of the tasks currently enqueued on the local DSQ of the
> - * caller's CPU, and re-enqueue them in the BPF scheduler. Can be called from
> - * anywhere.
> - */
> -__bpf_kfunc void scx_bpf_reenqueue_local___v2(const struct bpf_prog_aux *aux)
> -{
> - unsigned long flags;
> - struct scx_sched *sch;
> - struct rq *rq;
> -
> - raw_local_irq_save(flags);
> -
> - sch = scx_prog_sched(aux);
> - if (unlikely(!sch))
> - goto out_irq_restore;
> -
> - /*
> - * Allowing reenqueue-locals doesn't make sense while bypassing. This
> - * also blocks from new reenqueues to be scheduled on dead scheds.
> - */
> - if (unlikely(sch->bypass_depth))
> - goto out_irq_restore;
> -
> - rq = this_rq();
> - scoped_guard (raw_spinlock, &rq->scx.deferred_reenq_lock) {
> - struct scx_sched_pcpu *pcpu = this_cpu_ptr(sch->pcpu);
> -
> - if (list_empty(&pcpu->deferred_reenq_local.node))
> - list_move_tail(&pcpu->deferred_reenq_local.node,
> - &rq->scx.deferred_reenq_locals);
> - }
> -
> - schedule_deferred(rq);
> -out_irq_restore:
> - raw_local_irq_restore(flags);
> -}
> -
> /**
> * scx_bpf_cpuperf_cap - Query the maximum relative capacity of a CPU
> * @cpu: CPU of interest
> @@ -8821,13 +8848,14 @@ BTF_ID_FLAGS(func, scx_bpf_kick_cpu, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, scx_bpf_dsq_nr_queued)
> BTF_ID_FLAGS(func, scx_bpf_destroy_dsq)
> BTF_ID_FLAGS(func, scx_bpf_dsq_peek, KF_IMPLICIT_ARGS | KF_RCU_PROTECTED | KF_RET_NULL)
> +BTF_ID_FLAGS(func, scx_bpf_dsq_reenq, KF_IMPLICIT_ARGS)
> +BTF_ID_FLAGS(func, scx_bpf_reenqueue_local___v2, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, bpf_iter_scx_dsq_new, KF_IMPLICIT_ARGS | KF_ITER_NEW | KF_RCU_PROTECTED)
> BTF_ID_FLAGS(func, bpf_iter_scx_dsq_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_scx_dsq_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, scx_bpf_exit_bstr, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, scx_bpf_error_bstr, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, scx_bpf_dump_bstr, KF_IMPLICIT_ARGS)
> -BTF_ID_FLAGS(func, scx_bpf_reenqueue_local___v2, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, scx_bpf_cpuperf_cap, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, scx_bpf_cpuperf_cur, KF_IMPLICIT_ARGS)
> BTF_ID_FLAGS(func, scx_bpf_cpuperf_set, KF_IMPLICIT_ARGS)
> diff --git a/tools/sched_ext/include/scx/compat.bpf.h b/tools/sched_ext/include/scx/compat.bpf.h
> index f2969c3061a7..2d3985be7e2c 100644
> --- a/tools/sched_ext/include/scx/compat.bpf.h
> +++ b/tools/sched_ext/include/scx/compat.bpf.h
> @@ -375,6 +375,27 @@ static inline void scx_bpf_reenqueue_local(void)
> scx_bpf_reenqueue_local___v1();
> }
>
> +/*
> + * v6.20: New scx_bpf_dsq_reenq() that allows re-enqueues on more DSQs. This
> + * will eventually deprecate scx_bpf_reenqueue_local().
> + */
> +void scx_bpf_dsq_reenq___compat(u64 dsq_id, u64 reenq_flags, const struct bpf_prog_aux *aux__prog) __ksym __weak;
> +
> +static inline bool __COMPAT_has_generic_reenq(void)
> +{
> + return bpf_ksym_exists(scx_bpf_dsq_reenq___compat);
> +}
> +
> +static inline void scx_bpf_dsq_reenq(u64 dsq_id, u64 reenq_flags)
> +{
> + if (bpf_ksym_exists(scx_bpf_dsq_reenq___compat))
> + scx_bpf_dsq_reenq___compat(dsq_id, reenq_flags, NULL);
> + else if (dsq_id == SCX_DSQ_LOCAL && reenq_flags == 0)
> + scx_bpf_reenqueue_local();
> + else
> + scx_bpf_error("kernel too old to reenqueue foreign local or user DSQs");
> +}
> +
> /*
> * Define sched_ext_ops. This may be expanded to define multiple variants for
> * backward compatibility. See compat.h::SCX_OPS_LOAD/ATTACH().
> diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
> index 91b8eac83f52..83e8289e8c0c 100644
> --- a/tools/sched_ext/scx_qmap.bpf.c
> +++ b/tools/sched_ext/scx_qmap.bpf.c
> @@ -131,7 +131,7 @@ struct {
> } cpu_ctx_stor SEC(".maps");
>
> /* Statistics */
> -u64 nr_enqueued, nr_dispatched, nr_reenqueued, nr_dequeued, nr_ddsp_from_enq;
> +u64 nr_enqueued, nr_dispatched, nr_reenqueued, nr_reenqueued_cpu0, nr_dequeued, nr_ddsp_from_enq;
> u64 nr_core_sched_execed;
> u64 nr_expedited_local, nr_expedited_remote, nr_expedited_lost, nr_expedited_from_timer;
> u32 cpuperf_min, cpuperf_avg, cpuperf_max;
> @@ -206,8 +206,11 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
> void *ring;
> s32 cpu;
>
> - if (enq_flags & SCX_ENQ_REENQ)
> + if (enq_flags & SCX_ENQ_REENQ) {
> __sync_fetch_and_add(&nr_reenqueued, 1);
> + if (scx_bpf_task_cpu(p) == 0)
> + __sync_fetch_and_add(&nr_reenqueued_cpu0, 1);
> + }
>
> if (p->flags & PF_KTHREAD) {
> if (stall_kernel_nth && !(++kernel_cnt % stall_kernel_nth))
> @@ -561,6 +564,10 @@ int BPF_PROG(qmap_sched_switch, bool preempt, struct task_struct *prev,
> case 2: /* SCHED_RR */
> case 6: /* SCHED_DEADLINE */
> scx_bpf_reenqueue_local();
> +
> + /* trigger re-enqueue on CPU0 just to exercise LOCAL_ON */
> + if (__COMPAT_has_generic_reenq())
> + scx_bpf_dsq_reenq(SCX_DSQ_LOCAL_ON | 0, 0);
> }
>
> return 0;
> diff --git a/tools/sched_ext/scx_qmap.c b/tools/sched_ext/scx_qmap.c
> index 5d762d10f4db..9252037284d3 100644
> --- a/tools/sched_ext/scx_qmap.c
> +++ b/tools/sched_ext/scx_qmap.c
> @@ -137,9 +137,10 @@ int main(int argc, char **argv)
> long nr_enqueued = skel->bss->nr_enqueued;
> long nr_dispatched = skel->bss->nr_dispatched;
>
> - printf("stats : enq=%lu dsp=%lu delta=%ld reenq=%"PRIu64" deq=%"PRIu64" core=%"PRIu64" enq_ddsp=%"PRIu64"\n",
> + printf("stats : enq=%lu dsp=%lu delta=%ld reenq/cpu0=%"PRIu64"/%"PRIu64" deq=%"PRIu64" core=%"PRIu64" enq_ddsp=%"PRIu64"\n",
> nr_enqueued, nr_dispatched, nr_enqueued - nr_dispatched,
> - skel->bss->nr_reenqueued, skel->bss->nr_dequeued,
> + skel->bss->nr_reenqueued, skel->bss->nr_reenqueued_cpu0,
> + skel->bss->nr_dequeued,
> skel->bss->nr_core_sched_execed,
> skel->bss->nr_ddsp_from_enq);
> printf(" exp_local=%"PRIu64" exp_remote=%"PRIu64" exp_timer=%"PRIu64" exp_lost=%"PRIu64"\n",