From: Qais Yousef <qyousef@layalina.io>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>,
Qais Yousef <qyousef@google.com>,
Clark Williams <williams@redhat.com>,
Gabriele Monaco <gmonaco@redhat.com>,
Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
Luca Abeni <luca.abeni@santannapisa.it>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
Carlos Llamas <cmllamas@google.com>,
Alice Ryhl <aliceryhl@google.com>
Subject: Re: [PATCH RFC 0/4] sched/deadline: Add soft/reclaim mode via SCHED_OTHER demotion
Date: Sun, 19 Apr 2026 21:58:45 +0100
Message-ID: <20260419205845.3off5qfgzcfwrznj@airbuntu>
In-Reply-To: <20260219-upstream-deadline-demotion-v1-0-528b96e53d12@redhat.com>
On 02/19/26 14:37, Juri Lelli wrote:
> Hi All,
>
> This RFC introduces a bandwidth reclaiming mechanism for SCHED_DEADLINE
> tasks through temporary demotion to SCHED_NORMAL when runtime is
> exhausted. This resurrects and refines the demotion concept from the
> original SCHED_DEADLINE development circa 2010, focusing exclusively on
> SCHED_NORMAL demotion.
>
> Discussions about the feature have been resurfacing over the years and I
> wanted to check for feasibility and real interest. Found a little time
> to play around with the idea and this is the result of that.
>
> When a DEADLINE task with SCHED_FLAG_DL_DEMOTION exhausts its runtime
> budget, the scheduler demotes it to SCHED_NORMAL rather than throttling
> it until the next period. The task continues execution competing fairly
> with other normal tasks, using the nice value specified in
> sched_attr.sched_nice. At the next period boundary, the replenishment
> timer automatically promotes the task back to SCHED_DEADLINE with a
> fresh runtime budget.
>
> This provides a "soft(er) real-time" mode where tasks get timing
> guarantees when within budget but gracefully degrade to best-effort
> execution during overruns rather than being suspended. The bandwidth
> reservation remains in place during demotion, making the mechanism
> transparent from an admission control perspective similar to throttling.

I think this can be useful for IPC mechanisms like binder. Sadly, binder can be
used excessively even when it isn't necessary, which can easily add overhead.
If we can use DL to give these tasks a 0.25-0.5ms chance to finish quickly, and
otherwise demote them to fair, that might be an interesting experiment.
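
As a rough illustration of what such an experiment might configure from
userspace, here is a minimal sketch. It only fills in and sanity-checks the
parameters; it does not call into the kernel. Several things here are
assumptions and not taken from the series: the flag value
SKETCH_FLAG_DL_DEMOTION is a made-up placeholder (the real
SCHED_FLAG_DL_DEMOTION value lives in the patched include/uapi/linux/sched.h),
the 10ms period is picked arbitrarily, and the sketch_* names are hypothetical.
The struct mirrors the original 48-byte sched_attr layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Policy number of SCHED_DEADLINE in the mainline UAPI. */
#define SKETCH_SCHED_DEADLINE 6

/*
 * ASSUMPTION: placeholder bit standing in for the RFC's
 * SCHED_FLAG_DL_DEMOTION. The real value is defined by the series and is
 * not quoted in this thread, so do not rely on this number.
 */
#define SKETCH_FLAG_DL_DEMOTION (1ULL << 63)

/* Local mirror of the first (48-byte) version of struct sched_attr. */
struct sketch_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;      /* per the RFC: nice value used while demoted */
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* nanoseconds */
	uint64_t sched_deadline;  /* nanoseconds */
	uint64_t sched_period;    /* nanoseconds */
};

/* Basic ordering SCHED_DEADLINE expects: 0 < runtime <= deadline <= period. */
static bool sketch_dl_params_ok(uint64_t runtime_ns, uint64_t deadline_ns,
				uint64_t period_ns)
{
	return runtime_ns > 0 && runtime_ns <= deadline_ns &&
	       deadline_ns <= period_ns;
}

/*
 * Fill attr for a DL task with an implicit deadline (deadline == period)
 * that asks for demotion to fair at demoted_nice once runtime_ns is used up.
 * Returns false, leaving attr untouched, if the parameters are inconsistent.
 */
static bool sketch_fill_demotable_attr(struct sketch_sched_attr *attr,
				       uint64_t runtime_ns, uint64_t period_ns,
				       int32_t demoted_nice)
{
	if (!sketch_dl_params_ok(runtime_ns, period_ns, period_ns))
		return false;
	if (demoted_nice < -20 || demoted_nice > 19)
		return false;

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SKETCH_SCHED_DEADLINE;
	attr->sched_flags = SKETCH_FLAG_DL_DEMOTION;
	attr->sched_nice = demoted_nice;
	attr->sched_runtime = runtime_ns;
	attr->sched_deadline = period_ns;
	attr->sched_period = period_ns;
	return true;
}

/* Reserved bandwidth (runtime / period) in parts per million. */
static uint64_t sketch_bandwidth_ppm(const struct sketch_sched_attr *attr)
{
	return attr->sched_runtime * 1000000ULL / attr->sched_period;
}
```

With a 250us runtime and the assumed 10ms period, the reservation works out to
2.5% of a CPU per task. On a kernel carrying the series, the filled structure
would presumably be handed to sched_setattr() with the real flag value
substituted in.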

Adding Carlos and Alice in case they're interested in looking at this ;-)

If the patches can be merged, it'd be easier to backport them and construct an
experiment in general.

(Once globbing is available, constructing such experiments with schedqos would
be easy.)

Thanks

--
Qais Yousef
>
> Key design aspects:
>
> The implementation focuses solely on SCHED_NORMAL demotion, unlike
> earlier proposals that suggested multiple demotion targets including RT
> and DL postponement. Simpler and maybe enough?
>
> The feature reuses the existing sched_attr.sched_nice field to specify
> the nice value during demotion, avoiding new UAPI additions while
> maintaining ABI compatibility. This is orthogonal to GRUB
> (SCHED_FLAG_RECLAIM) - tasks can combine both mechanisms for
> opportunistic reclaiming through accounting and continued execution
> through demotion (at least in principle, didn't actually test it yet :).
>
> Demoted tasks cannot migrate between CPUs. This simplification keeps
> bandwidth accounting straightforward by ensuring the reservation stays
> on the original CPU throughout demotion. Migration is re-enabled after
> promotion or explicit parameter changes via sched_setattr().
>
> The bandwidth accounting follows the throttling model rather than full
> class switching. Dequeue operations omit DEQUEUE_SAVE to keep the
> reservation in this_bw (admission control bandwidth). Running bandwidth
> (enforcement) is handled at 0-lag time for tasks that sleep while
> demoted, maintaining correct GRUB accounting.
>
> Explicit sched_setattr() calls on demoted tasks cancel the demotion
> state and perform full bandwidth cleanup including inactive timer
> handling and cpuset tracking. The replenishment timer remains armed but
> fires harmlessly when it detects the task is no longer DEADLINE.
>
> This posting is very much experimental. I added AI generated tests
> (included here just for reference) that helped checking a few cases
> during implementation. However, I am quite sure I'm missing several
> additional cases that can cause breakage. Test it at your own risk! :P
>
> Based on original work by Dario Faggioli:
> https://lore.kernel.org/lkml/1288334546.8661.161.camel@Palantir/
>
> As always comments and questions are more than welcome.
>
> Series also available at
>
> git@github.com:jlelli/linux.git upstream/deadline-demotion
>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> ---
> Juri Lelli (4):
>       sched/deadline: Implement reclaim/soft mode through SCHED_OTHER demotion
>       sched/doc: Document SCHED_DEADLINE demotion feature
>       DEBUG selftests/sched: Add tests for SCHED_DEADLINE demotion feature
>       DEBUG selftests/sched: Add simple demonstration of SCHED_DEADLINE demotion
>
> Documentation/scheduler/sched-deadline.rst | 54 +++
> include/linux/sched.h | 10 +
> include/uapi/linux/sched.h | 4 +-
> include/uapi/linux/sched/types.h | 8 +
> kernel/sched/deadline.c | 213 +++++++++-
> kernel/sched/fair.c | 8 +
> kernel/sched/sched.h | 15 +-
> kernel/sched/syscalls.c | 8 +
> tools/testing/selftests/sched/.gitignore | 3 +
> tools/testing/selftests/sched/Makefile | 4 +-
> tools/testing/selftests/sched/README_dl_demotion | 83 ++++
> tools/testing/selftests/sched/dl_demotion_demo.c | 239 +++++++++++
> tools/testing/selftests/sched/dl_demotion_stress.c | 208 ++++++++++
> tools/testing/selftests/sched/dl_demotion_test.c | 460 +++++++++++++++++++++
> .../selftests/sched/run_dl_demotion_with_trace.sh | 71 ++++
> 15 files changed, 1382 insertions(+), 6 deletions(-)
> ---
> base-commit: e34881c84c255bc300f24d9fe685324be20da3d1
> change-id: 20260218-upstream-deadline-demotion-19511e741055
>
> Best regards,
> --
> Juri Lelli <juri.lelli@redhat.com>
>