public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>, Shuah Khan <shuah@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Luigi De Matteis <ldematteis123@gmail.com>,
	sched-ext@lists.linux.dev, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 04/11] sched/deadline: Add support to initialize and remove dl_server bandwidth
Date: Thu, 6 Nov 2025 18:09:07 +0100	[thread overview]
Message-ID: <aQzWM4vv30etfhok@gpd4> (raw)
In-Reply-To: <aQxvIBIwOCDDu60b@jlelli-thinkpadt14gen4.remote.csb>

On Thu, Nov 06, 2025 at 10:49:20AM +0100, Juri Lelli wrote:
> Hi,
> 
> On 29/10/25 20:08, Andrea Righi wrote:
> > During switching from sched_ext to fair tasks and vice-versa, we need
> > support for intializing and removing the bandwidth contribution of
> > either DL server.
> 
> My first and more general/design question is do we strictly need this
> automagic bandwidth management. We seem to agree [1] that we want to
> move towards explicit dl-server(s) and tasks bandwidth handling, so we
> might want to consider leaving the burden completely to whomever might
> be configuring the system.

I think we decided to take this approach because, once a sched_ext
scheduler is loaded and all tasks are moved to the ext class, the fair
class becomes "empty", but the fair dl-server would still keep its
bandwidth reserved, so somehow we need to release that reservation,
right?

> 
> > Add support for handling these transitions.
> 
> Anyway, if we still want to do this :) ...
> 
> > Moreover, remove references specific to the fair server, in preparation
> > for adding the ext server.
> > 
> > v2: - wait for inactive_task_timer to fire before removing the bandwidth
> >       reservation (Juri Lelli)
> >     - add WARN_ON_ONCE(!cpus) sanity check in dl_server_apply_params()
> >       (Andrea Righi)
> > 
> > Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> 
> ...
> 
> > +/**
> > + * dl_server_remove_params - Remove bandwidth reservation for a DL server
> > + * @dl_se: The DL server entity to remove bandwidth for
> > + *
> > + * This function removes the bandwidth reservation for a DL server entity,
> > + * cleaning up all bandwidth accounting and server state.
> > + *
> > + * Returns: 0 on success, negative error code on failure
> > + */
> > +int dl_server_remove_params(struct sched_dl_entity *dl_se,
> > +			    struct rq *rq, struct rq_flags *rf)
> > +{
> > +	if (!dl_se->dl_server)
> > +		return 0; /* Already disabled */
> > +
> > +	/*
> > +	 * First dequeue if still queued. It should not be queued since
> > +	 * we call this only after the last dl_server_stop().
> > +	 */
> > +	if (WARN_ON_ONCE(on_dl_rq(dl_se)))
> > +		dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
> > +
> > +	if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == -1) {
> > +		rq_unlock_irqrestore(rq, rf);
> 
> This seems racy. I fear the moment we release the rq lock something can
> slip in and the server(s) state might change?

Yeah, however, sched_ext is the only user of dl_server_remove_params() (at
least for now), this is called when an scx scheduler is started/stopped and
concurrent starts/stops can't happen, so maybe we're safe...? And in that
case it'd be helpful to document this scenario clearly, unless we want to
rethink this part and avoid cancelling the timer.

> 
> > +
> > +		hrtimer_cancel(&dl_se->inactive_timer);
> 
> I am not sure we actually need to force cancel the timer (but still
> contradicting myself every time I go back at staring at code :). The way
> I believe this should work 'in theory' is
> 
>  - we remove a server (either automagic or user sets runtime to 0 -
>    which is probably to fix/look at in current implementation as well
>    btw)
>  - current bandwidth is retained and only freed (and server reset) at
>    0-lag (when inactive_timer fires)
>  - if server is activated back before 0-lag it will use it's current
>    parameters
>  - after 0-lag it's a new instance with new parameters

Hm... that means just setting the runtime to 0 IIUC. I think I tried that
approach in the past, but I was seeing some inconsistencies with the
total_bw kselftest, starting/stopping an scx scheduler multiple times
seemed to gradually consume all the available bandwidth.

But I can give it another try, maybe that behavior was caused by other
issues, since we've fixed quite a few things since then.

> 
> In inactive_timer() we have this behavior for simple tasks, but we skip
> __dl_sub() etc for servers (since we clear it up immediately).
> 
> In all this I essentially fear that if we clear parameters immediately
> one could be able to trick the system by quickly disabling/enabling a
> dl-server to let fair/scx tasks execute more than what requested (as
> each new enable will be seen as a new instance). But, again, I wasn't
> yet able to demonstrate this and I am still uncomfortably uncertain.
> Please Peter and others keep me honest.
> 
> Also, server parameters changes are root only, so maybe not a big deal?
> For scx automagic as well?

Only root can start/stop scx schedulers, or better whoever has the CAP_BPF
or CAP_SYS_ADMIN capabilities. There's also some natural throttling in the
rate of scx restarts you can do, due to the overhead of switching to scx
bypass mode, re-enqueue all the tasks, and so on.

Maybe we could even explicitly throttle the rate of scx restarts if it
raises concerns from a security or stability standpoint.

Thanks,
-Andrea

  reply	other threads:[~2025-11-06 17:09 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29 19:08 [PATCHSET v10 sched_ext/for-6.19] Add a deadline server for sched_ext tasks Andrea Righi
2025-10-29 19:08 ` [PATCH 01/11] sched/debug: Fix updating of ppos on server write ops Andrea Righi
2025-10-29 19:08 ` [PATCH 02/11] sched/debug: Stop and start server based on if it was active Andrea Righi
2025-11-06  7:13   ` Juri Lelli
2025-11-06 16:39     ` Andrea Righi
2025-11-07  6:51       ` Juri Lelli
2025-11-12 17:35         ` Andrea Righi
2025-10-29 19:08 ` [PATCH 03/11] sched/deadline: Clear the defer params Andrea Righi
2025-10-29 19:08 ` [PATCH 04/11] sched/deadline: Add support to initialize and remove dl_server bandwidth Andrea Righi
2025-11-06  9:49   ` Juri Lelli
2025-11-06 17:09     ` Andrea Righi [this message]
2025-11-07 13:53       ` Juri Lelli
2025-10-29 19:08 ` [PATCH 05/11] sched/deadline: Add a server arg to dl_server_update_idle_time() Andrea Righi
2025-10-29 19:08 ` [PATCH 06/11] sched_ext: Add a DL server for sched_ext tasks Andrea Righi
2025-11-06 10:59   ` Juri Lelli
2025-11-06 17:15     ` Andrea Righi
2025-10-29 19:08 ` [PATCH 07/11] sched/debug: Add support to change sched_ext server params Andrea Righi
2025-10-29 19:08 ` [PATCH 08/11] sched/deadline: Account ext server bandwidth Andrea Righi
2025-10-29 19:08 ` [PATCH 09/11] sched_ext: Selectively enable ext and fair DL servers Andrea Righi
2025-10-29 19:08 ` [PATCH 10/11] selftests/sched_ext: Add test for sched_ext dl_server Andrea Righi
2025-10-30 16:49   ` Christian Loehle
2025-10-30 16:57     ` Andrea Righi
2025-10-29 19:08 ` [PATCH 11/11] selftests/sched_ext: Add test for DL server total_bw consistency Andrea Righi
2025-10-30 17:00 ` [PATCHSET v10 sched_ext/for-6.19] Add a deadline server for sched_ext tasks Christian Loehle
2025-11-05 13:47 ` Andrea Righi
2025-11-05 13:59   ` Peter Zijlstra
2025-11-05 14:20     ` Juri Lelli
2025-11-05 14:39       ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQzWM4vv30etfhok@gpd4 \
    --to=arighi@nvidia.com \
    --cc=bpf@vger.kernel.org \
    --cc=bsegall@google.com \
    --cc=changwoo@igalia.com \
    --cc=christian.loehle@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=emil@etsalapatis.com \
    --cc=joelagnelf@nvidia.com \
    --cc=juri.lelli@redhat.com \
    --cc=ldematteis123@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=void@manifault.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox