Re: [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations

Sched_ext development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Andrea Righi" <arighi@nvidia.com>
Cc: sched-ext@lists.linux.dev
Subject: Re: [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations
Date: Thu, 21 May 2026 18:23:43 +0000	[thread overview]
Message-ID: <20260521182344.77E671F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260521174509.1534623-2-arighi@nvidia.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Deferring the ext_server bandwidth reservation to sched_ext load time allows deadline bandwidth overcommit, breaking real-time guarantees.
- [High] Dropping `dl_b->lock` between detaching and attaching deadline servers allows concurrent tasks to steal bandwidth, permanently breaking starvation protection.
- [High] `dl_server_detach_bw()` and `dl_server_attach_bw()` fail to update local runqueue bandwidth (`rq->dl.this_bw`), breaking GRUB reclamation and causing CPU overcommit.
--

commit c14e78e1decaa133b4b018a638661ae8c0968dc2
Author: Andrea Righi <arighi@nvidia.com>

sched_ext: Auto-register/unregister dl_server reservations

This commit changes the deadline server bandwidth reservations for sched_ext
and fair servers to be dynamic. Rather than statically reserving bandwidth
for both at boot, it attaches and detaches the bandwidth reservations
based on whether the BPF scheduler is loaded and operating in full or
partial mode.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c

[ ... ]

> +int dl_server_attach_bw(struct sched_dl_entity *dl_se)
> +{
> +	struct rq *rq = dl_se->rq;
> +	int cpu = cpu_of(rq);
> +	struct dl_bw *dl_b;
> +	unsigned long cap;
> +	int cpus;
> +
> +	if (dl_se->dl_bw_attached)
> +		return 0;
> +
> +	dl_b = dl_bw_of(cpu);
> +	guard(raw_spinlock)(&dl_b->lock);
> +
> +	cpus = dl_bw_cpus(cpu);
> +
> +	/*
> +	 * If there's no active CPU in this root domain (e.g., @cpu is offline),
> +	 * just record the intent, so dl_server_add_bw() honors it when CPUs
> +	 * come back.
> +	 */
> +	if (!cpus) {
> +		dl_se->dl_bw_attached = 1;
> +		return 0;
> +	}
> +
> +	cap = dl_bw_capacity(cpu);
> +
> +	if (__dl_overflow(dl_b, cap, 0, dl_se->dl_bw))
> +		return -EBUSY;
> +
> +	__dl_add(dl_b, dl_se->dl_bw, cpus);
> +	dl_se->dl_bw_attached = 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * Detach @dl_se's bandwidth from the root domain's total_bw accounting.
> + *
> + * Use to dynamically unregister a dl_server's bandwidth reservation while
> + * preserving its configured @dl_runtime / @dl_period. No-op if @dl_se is
> + * not currently attached.
> + */
> +void dl_server_detach_bw(struct sched_dl_entity *dl_se)
> +{
> +	struct rq *rq = dl_se->rq;
> +	int cpu = cpu_of(rq);
> +	struct dl_bw *dl_b;
> +	int cpus;
> +
> +	if (!dl_se->dl_bw_attached)
> +		return;
> +
> +	dl_b = dl_bw_of(cpu);
> +	guard(raw_spinlock)(&dl_b->lock);
> +
> +	cpus = dl_bw_cpus(cpu);
> +
> +	/*
> +	 * If no active CPUs in this root domain, the bandwidth isn't in
> +	 * @dl_b right now; only clear the flag so dl_server_add_bw() skips
> +	 * the server on the next root-domain rebuild.
> +	 */
> +	if (cpus)
> +		__dl_sub(dl_b, dl_se->dl_bw, cpus);
> +	dl_se->dl_bw_attached = 0;
> +}

[Severity: High]
Does this miss updating the local runqueue bandwidth?

While the global root domain bandwidth (dl_b->total_bw) is updated here
and in dl_server_attach_bw(), rq->dl.this_bw doesn't appear to be updated
via __sub_rq_bw() and __add_rq_bw(). 

During boot, ext_server is initialized and its bandwidth is added to
this_bw, but the subsequent dl_server_detach_bw() does not subtract it.

Could this permanently inflate this_bw, which then causes the GRUB
reclamation algorithm to incorrectly calculate inactive utilization and
allow active DL tasks to consume more CPU time than admitted?

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c

[ ... ]

> @@ -6136,10 +6144,24 @@ static void scx_root_disable(struct scx_sched *sch)
>  	/*
>  	 * Invalidate all the rq clocks to prevent getting outdated
>  	 * rq clocks from a previous scx scheduler.
> +	 *
> +	 * Also re-balance the dl_server bandwidth reservations: detach
> +	 * ext_server (no more sched_ext tasks) and reinstate fair_server
> +	 * if it was previously detached because we were running in full
> +	 * mode. Detach before attach to avoid a transient overflow of the
> +	 * root domain's bandwidth capacity.
>  	 */
>  	for_each_possible_cpu(cpu) {
>  		struct rq *rq = cpu_rq(cpu);
> +
>  		scx_rq_clock_invalidate(rq);
> +
> +		scoped_guard(rq_lock_irqsave, rq) {
> +			dl_server_detach_bw(&rq->ext_server);
> +			if (was_switched_all &&
> +			    WARN_ON_ONCE(dl_server_attach_bw(&rq->fair_server)))

[Severity: High]
Can a concurrent task steal the bandwidth during this transition?

Because dl_server_detach_bw() and dl_server_attach_bw() each acquire and
release the root domain's dl_b->lock independently, the lock is dropped
between the two calls.

If another CPU executes sched_setattr(SCHED_DEADLINE) during this window,
can it consume the newly freed bandwidth, causing the attach operation to
fail and permanently leaving the CPU without its starvation protection
server?

> +				pr_warn("failed to re-attach fair_server on CPU %d\n", cpu);
> +		}
>  	}

[ ... ]

> @@ -7314,6 +7336,27 @@ static void scx_root_enable_workfn(struct kthread_work *work)
>  	if (!(ops->flags & SCX_OPS_SWITCH_PARTIAL))
>  		static_branch_enable(&__scx_switched_all);
>  
> +	/*
> +	 * Re-balance the dl_server bandwidth reservations.
> +	 *
> +	 * In full mode (!SCX_OPS_SWITCH_PARTIAL) no task will ever run in
> +	 * the fair class, so detach the fair_server reservation and give
> +	 * that bandwidth back to the RT class. Always attach the
> +	 * ext_server reservation since sched_ext tasks are now possible.
> +	 *
> +	 * Detach before attach to avoid a transient overflow of the root
> +	 * domain's bandwidth capacity.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		struct rq *rq = cpu_rq(cpu);
> +
> +		guard(rq_lock_irqsave)(rq);
> +		if (scx_switched_all())
> +			dl_server_detach_bw(&rq->fair_server);
> +		if (WARN_ON_ONCE(dl_server_attach_bw(&rq->ext_server)))

[Severity: High]
Does this allow deadline bandwidth overcommit?

In partial mode (!scx_switched_all()), the system attempts to attach
ext_server while fair_server is still attached. If user deadline tasks
have already consumed the remaining root domain bandwidth prior to loading
the BPF scheduler, dl_server_attach_bw() will fail with -EBUSY.

Since this error only triggers a warning and proceeds to load the scheduler,
ext_server remains active and will throttle other DL tasks. Does this
silently bypass DL admission control and break real-time guarantees?

> +			pr_warn("failed to attach ext_server on CPU %d\n", cpu);
> +	}
> +
>  	pr_info("sched_ext: BPF scheduler \"%s\" enabled%s\n",
>  		sch->ops.name, scx_switched_all() ? "" : " (partial)");
>  	kobject_uevent(&sch->kobj, KOBJ_ADD);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260521174509.1534623-1-arighi@nvidia.com?part=1

next prev parent reply	other threads:[~2026-05-21 18:23 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-21 17:33 [PATCHSET sched_ext/for-7.2] sched_ext: Auto-manage ext/fair dl_server bandwidth Andrea Righi
2026-05-21 17:33 ` [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations Andrea Righi
2026-05-21 18:23   ` sashiko-bot [this message]
2026-05-22  8:36   ` Peter Zijlstra
2026-05-22 10:02     ` Andrea Righi
2026-05-21 17:33 ` [PATCH 2/2] selftests/sched_ext: Validate dl_server attach/detach in total_bw test Andrea Righi
2026-05-21 18:31   ` sashiko-bot
  -- strict thread matches above, loose matches on Subject: below --
2026-05-26  8:27 [PATCHSET v2 sched_ext/for-7.2] sched_ext: Auto-manage ext/fair dl_server bandwidth Andrea Righi
2026-05-26  8:27 ` [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations Andrea Righi
2026-05-26  9:22   ` sashiko-bot
2026-05-26 16:42 [PATCHSET v3 sched_ext/for-7.2] sched_ext: Auto-manage ext/fair dl_server bandwidth Andrea Righi
2026-05-26 16:42 ` [PATCH 1/2] sched_ext: Auto-register/unregister dl_server reservations Andrea Righi
2026-05-26 17:14   ` sashiko-bot
2026-05-28 11:36   ` Peter Zijlstra
2026-05-28 16:13     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260521182344.77E671F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=sched-ext@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox