From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: Matt Fleming <matt@readmodwrite.com>,
sched-ext@lists.linux.dev, kernel-team@cloudflare.com,
void@manifault.com, changwoo@igalia.com, peterz@infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: sched_ext: Partial mode priority and fallthrough to EEVDF
Date: Tue, 10 Mar 2026 19:46:00 +0100
Message-ID: <abBm6B7wO0iZKidR@gpd4>
In-Reply-To: <abBidFKC2Hp6Wca-@slm.duckdns.org>

On Tue, Mar 10, 2026 at 08:27:00AM -1000, Tejun Heo wrote:
> Hello, Matt.
>
> On Tue, Mar 10, 2026 at 02:52:13PM +0000, Matt Fleming wrote:
> > At Cloudflare we're experimenting with inverting the priority of the
> > ext_sched_class and fair_sched_class to allow us to pick SCHED_EXT
> > tasks to run before SCHED_NORMAL. This gives us better scheduling
> > decisions for those SCHED_EXT tasks where we can embed business logic
> > into the BPF program and prevents them being starved by the larger
> > number of SCHED_NORMAL tasks under CPU contention. There are a couple
> > of reasons we took this route:
> >
> > 1. Our workloads are heterogeneous and complex and we can't move entire
> > systems to SCHED_EXT in one shot. We want to experiment with running
> > SCHED_EXT in partial mode as we progressively onboard more and more
> > services (we run multiple services on single machines).
> >
> > 2. There's no way today (AFAIK) to run in "full-mode" and have BPF
> > schedulers fallthrough to EEVDF.
> >
> > In an ideal world, 2 is what we'd want to do. Is anyone else interested
> > in this problem or currently working on it? Is there anything coming in
> > the future that would make it easier for those of us slowly
> > transitioning to SCHED_EXT?
>
> Hmm... I have a bit of hard time following how that's different from partial
> mode. If you want the scheduler to decide whether a task should be in SCX or
> fair, you can do so from ops.init_task() by asserting p->scx.disallow. If
> you mean that you want to switch dynamically on each scheduling event, I
> don't think that's a good idea given that each hop would be full sched_class
> switch.
>
> As for the ordering between the two, I don't know. How are you using partial
> mode? No matter how you order them, the behaviors on pathological cases are
> pretty bad and I've been thinking that most would use partial mode to
> partition the system so that some CPUs are managed by SCX and others by fair
> in which case the ordering doesn't matter that much. If you're mixing the
> two classes on the same CPUs, I wonder whether this is something which can
> be better dealt with the deadline servers. Andrea, what do you think?

I think you can model your scenario using the ext deadline server. For
instance, running:

 # echo 500000000 | tee /sys/kernel/debug/sched/ext_server/cpu*/runtime

would give sched_ext tasks a guaranteed 50% bandwidth on all CPUs (the
default is 5%), even if there are tasks running at higher sched classes.
Would this approach work for your needs?
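
For reference, here is a rough sketch of the arithmetic behind that value.
It assumes the ext server period is 1 s (1000000000 ns), which is what the
50% and 5% figures above imply; check the period on your kernel before
relying on it. The helper name ext_runtime_ns is made up for illustration
and is not an existing tool.

```shell
#!/bin/sh
# Sketch: convert a target bandwidth percentage into a deadline-server
# runtime in nanoseconds.
# Assumption: the server period is 1 s (1000000000 ns) unless a different
# period is passed as the second argument.
# ext_runtime_ns is a hypothetical helper name used only for this example.
ext_runtime_ns() {
    pct=$1
    period_ns=${2:-1000000000}
    # runtime = period * percentage / 100
    echo $(( period_ns * pct / 100 ))
}

ext_runtime_ns 50    # 50% of a 1 s period, the value used in the echo above
ext_runtime_ns 5     # 5% of a 1 s period, i.e. the default share

# To apply the result (as root, with debugfs mounted), something like:
#   echo "$(ext_runtime_ns 50)" | tee /sys/kernel/debug/sched/ext_server/cpu*/runtime
```

The write itself is left commented out so the snippet can be run anywhere
just to check the numbers.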
-Andrea