From: Christian Loehle <christian.loehle@arm.com>
To: Andrea Righi <arighi@nvidia.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>, Shuah Khan <shuah@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
Luigi De Matteis <ldematteis123@gmail.com>,
sched-ext@lists.linux.dev, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET v10 sched_ext/for-6.19] Add a deadline server for sched_ext tasks
Date: Thu, 30 Oct 2025 17:00:24 +0000
Message-ID: <c978e3ed-054f-4849-a4ff-d0fba07e3c19@arm.com>
In-Reply-To: <20251029191111.167537-1-arighi@nvidia.com>
On 10/29/25 19:08, Andrea Righi wrote:
> sched_ext tasks can be starved by long-running RT tasks, especially since
> RT throttling was replaced by deadline servers to boost only SCHED_NORMAL
> tasks.
>
> Several users in the community have reported issues with RT stalling
> sched_ext tasks. This is fairly common on distributions or environments
> where applications like video compositors, audio services, etc. run as RT
> tasks by default.
>
> Example trace (showing a per-CPU kthread stalled due to the sway Wayland
> compositor running as an RT task):
>
> runnable task stall (kworker/0:0[106377] failed to run for 5.043s)
> ...
> CPU 0 : nr_run=3 flags=0xd cpu_rel=0 ops_qseq=20646200 pnt_seq=45388738
> curr=sway[994] class=rt_sched_class
> R kworker/0:0[106377] -5043ms
> scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=0/0
> sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=0 slice=20000000
> cpus=01
>
> This is often perceived as a bug in the BPF schedulers, but in reality
> schedulers can't do much: RT tasks run outside their control and can
> potentially consume 100% of the CPU bandwidth.
>
> Fix this by adding a sched_ext deadline server, so that sched_ext tasks are
> also boosted and do not suffer starvation.
>
> Two kselftests are also provided to verify the starvation fixes and
> bandwidth allocation is correct.
>
> == Highlights in this version ==
>
> - wait for inactive_task_timer() to fire before removing the bandwidth
> reservation (Juri/Peter: please check if this new
> dl_server_remove_params() implementation makes sense to you)
> - removed the explicit dl_server_stop() from dequeue_task_scx() and rely
> on the delayed stop behavior (Juri/Peter: ditto)
>
> This patchset is also available in the following git branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git scx-dl-server
>
> Changes in v10:
> - reordered patches to better isolate sched_ext changes vs sched/deadline
> changes (Andrea Righi)
> - define ext_server only with CONFIG_SCHED_CLASS_EXT=y (Andrea Righi)
> - add WARN_ON_ONCE(!cpus) check in dl_server_apply_params() (Andrea Righi)
> - wait for inactive_task_timer to fire before removing the bandwidth
> reservation (Juri Lelli)
> - remove explicit dl_server_stop() in dequeue_task_scx() to reduce timer
> reprogramming overhead (Juri Lelli)
> - do not restart pick_task() when invoked by the dl_server (Tejun Heo)
> - rename rq_dl_server to dl_server (Peter Zijlstra)
> - fixed a missing dl_server start in dl_server_on() (Christian Loehle)
> - add a comment to the rt_stall selftest to better explain the 4%
> threshold (Emil Tsalapatis)
>
> Changes in v9:
> - Drop the ->balance() logic as its functionality is now integrated into
> ->pick_task(), allowing dl_server to call pick_task_scx() directly
> - Link to v8: https://lore.kernel.org/all/20250903095008.162049-1-arighi@nvidia.com/
>
> Changes in v8:
> - Add tj's patch to de-couple balance and pick_task and avoid changing
> sched/core callbacks to propagate @rf
> - Simplify dl_se->dl_server check (suggested by PeterZ)
> - Small coding style fixes in the kselftests
> - Link to v7: https://lore.kernel.org/all/20250809184800.129831-1-joelagnelf@nvidia.com/
>
> Changes in v7:
> - Rebased to Linus master
> - Link to v6: https://lore.kernel.org/all/20250702232944.3221001-1-joelagnelf@nvidia.com/
>
> Changes in v6:
> - Added Acks to few patches
> - Fixes to few nits suggested by Tejun
> - Link to v5: https://lore.kernel.org/all/20250620203234.3349930-1-joelagnelf@nvidia.com/
>
> Changes in v5:
> - Added a kselftest (total_bw) to sched_ext to verify bandwidth values
> from debugfs
> - Address comment from Andrea about redundant rq clock invalidation
> - Link to v4: https://lore.kernel.org/all/20250617200523.1261231-1-joelagnelf@nvidia.com/
>
> Changes in v4:
> - Fixed issues with hotplugged CPUs having their DL server bandwidth
> altered due to loading SCX
> - Fixed other issues
> - Rebased on Linus master
> - All sched_ext kselftests reliably pass now, also verified that the
> total_bw in debugfs (CONFIG_SCHED_DEBUG) is conserved with these patches
> - Link to v3: https://lore.kernel.org/all/20250613051734.4023260-1-joelagnelf@nvidia.com/
>
> Changes in v3:
> - Removed code duplication in debugfs. Made ext interface separate
> - Fixed issue where rq_lock_irqsave was not used in the relinquish patch
> - Fixed running bw accounting issue in dl_server_remove_params
> - Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@nvidia.com/
>
> Changes in v2:
> - Fixed a hang related to using rq_lock instead of rq_lock_irqsave
> - Added support to remove BW of DL servers when they are switched to/from EXT
> - Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@nvidia.com/
>
> Andrea Righi (5):
> sched/deadline: Add support to initialize and remove dl_server bandwidth
> sched_ext: Add a DL server for sched_ext tasks
> sched/deadline: Account ext server bandwidth
> sched_ext: Selectively enable ext and fair DL servers
> selftests/sched_ext: Add test for sched_ext dl_server
>
> Joel Fernandes (6):
> sched/debug: Fix updating of ppos on server write ops
> sched/debug: Stop and start server based on if it was active
> sched/deadline: Clear the defer params
> sched/deadline: Add a server arg to dl_server_update_idle_time()
> sched/debug: Add support to change sched_ext server params
> selftests/sched_ext: Add test for DL server total_bw consistency
>
> kernel/sched/core.c | 3 +
> kernel/sched/deadline.c | 169 +++++++++++---
> kernel/sched/debug.c | 171 +++++++++++---
> kernel/sched/ext.c | 144 +++++++++++-
> kernel/sched/fair.c | 2 +-
> kernel/sched/idle.c | 2 +-
> kernel/sched/sched.h | 8 +-
> kernel/sched/topology.c | 5 +
> tools/testing/selftests/sched_ext/Makefile | 2 +
> tools/testing/selftests/sched_ext/rt_stall.bpf.c | 23 ++
> tools/testing/selftests/sched_ext/rt_stall.c | 222 ++++++++++++++++++
> tools/testing/selftests/sched_ext/total_bw.c | 281 +++++++++++++++++++++++
> 12 files changed, 955 insertions(+), 77 deletions(-)
> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
> create mode 100644 tools/testing/selftests/sched_ext/total_bw.c
Thanks Andrea, I've tested a few things I had in mind with no complaints.
Most importantly, it a) doesn't break the existing fair_server and b)
ensures BPF schedulers don't stall even with something like:
sudo chrt -r 95 stress-ng --cpu 0 --taskset 0-$(($(nproc)-1)) -t 30m
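For readers reproducing this: chrt -r 95 runs stress-ng under SCHED_RR at priority 95, --cpu 0 starts one CPU stressor per online CPU, and the --taskset argument expands to the full CPU range, so every CPU is kept busy by an RT task for 30 minutes. A minimal sketch of that range expansion, assuming a POSIX shell; cpu_range is a hypothetical helper name used only for illustration and is not part of stress-ng or util-linux:

```shell
#!/bin/sh
# cpu_range N: print the taskset-style CPU list "0-(N-1)" covering N CPUs.
# Hypothetical helper, shown only to illustrate the arithmetic expansion
# used in the --taskset argument of the command above.
cpu_range() {
    echo "0-$(( $1 - 1 ))"
}

# On the running machine, nproc supplies the number of available CPUs.
cpu_range "$(nproc)"
```

On a machine where nproc reports 8, the --taskset argument therefore becomes 0-7.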
For patches 0 to 9:
Tested-by: Christian Loehle <christian.loehle@arm.com>