From: Andreas Ziegler <br025@umbiko.net>
To: Christian Loehle <christian.loehle@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
linux-kernel@vger.kernel.org,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
John Stultz <jstultz@google.com>
Subject: Re: sched/deadline: Use revised wakeup rule for dl_server
Date: Mon, 11 May 2026 12:37:12 +0000
Message-ID: <701f3a1dd4730f92cb3013176e068a16@umbiko.net>
In-Reply-To: <50156878-265d-4025-9b36-c819c80b7493@arm.com>
On 2026-05-11 09:47, Christian Loehle wrote:
> On 5/9/26 12:42, Andreas Ziegler wrote:
>> Hi Christian, Everyone,
>>
>> On 2026-05-08 14:13, Christian Loehle wrote:
>>> On 5/8/26 13:06, Andreas Ziegler wrote:
>>>> Hi Christian,
>>>>
>>>> On 2026-05-08 09:20, Christian Loehle wrote:
>>>>> On 5/8/26 09:09, Andreas Ziegler wrote:
>>>>>> Linux kernel version: 6.12
>>>>>> CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
>>>>>> Architecture: aarch64
>>>>>> Platform: Raspberry Pi 4
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for
>>>>>> dl_server) [1] introduced a marked degradation in scheduling
>>>>>> latency for real-time tasks in the presence of heavy I/O load.
>>>>>>
>>>>>> --- a/kernel/sched/deadline.c
>>>>>> +++ b/kernel/sched/deadline.c
>>>>>> @@ -1079,7 +1079,7 @@ static void update_dl_entity(struct
>>>>>> sched_dl_entity *dl_se)
>>>>>> if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>>>>>> dl_entity_overflow(dl_se, rq_clock(rq))) {
>>>>>>
>>>>>> - if (unlikely(!dl_is_implicit(dl_se) &&
>>>>>> + if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer)
>>>>>> &&
>>>>>> !dl_time_before(dl_se->deadline, rq_clock(rq))
>>>>>> &&
>>>>>> !is_dl_boosted(dl_se))) {
>>>>>> update_dl_revised_wakeup(dl_se, rq);
>>>>>>
>>>>>> This was observed using a modified version of Con Kolivas'
>>>>>> interactivity benchmark [2]; kernel bisection eventually pointed
>>>>>> to the above mentioned commit.
>>>>>>
>>>>>> Benchmark results before d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence
>>>>>> of simulated ---
>>>>>> Load Latency +/- SD median max [100n] Desired CPU
>>>>>> Deadlines met [%]
>>>>>> None 76.6 +/- 8.3654 76 166
>>>>>> Video 78.5 +/- 3.9433 78 107
>>>>>> X 76.4 +/- 8.123 75 157
>>>>>> Burn 72.0 +/- 6.4733 71 127
>>>>>> Write 255.3 +/- 26.627 252 331
>>>>>> Read 226.6 +/- 12.38 227 262
>>>>>> Ring 84.2 +/- 6.6207 83 125
>>>>>> Compile 225.3 +/- 23.949 222 328
>>>>>>
>>>>>> 136.8 +/- 78.462 331
>>>>>>
>>>>>> Benchmark results after d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence
>>>>>> of simulated ---
>>>>>> Load Latency +/- SD median max [100n] Desired CPU
>>>>>> Deadlines met [%]
>>>>>> None 68.4 +/- 9.7864 67 169
>>>>>> Video 74.4 +/- 3.724 74 97
>>>>>> X 72.0 +/- 6.5681 71 129
>>>>>> Burn 66.9 +/- 5.9059 66 117
>>>>>> Write 9576.9 +/- 67639 250 500418 98.1 98.1
>>>>>> Read 209.3 +/- 11.018 209 267
>>>>>> Ring 80.5 +/- 8.0993 78 125
>>>>>> Compile 239.0 +/- 29.447 234 372
>>>>>>
>>>>>> 1298.4 +/- 24118 500418
>>>>>>
>>>>>> Reverting this commit obviously solves the issue for me. I have no
>>>>>> idea why this issue appears exclusively with heavy write loads in
>>>>>> the background.
>>>>>>
>>>>>> Is this a scheduler issue, or rather something in the background?
>>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>> You're using cpufreq schedutil for your tests I'm assuming?
>>>>> Is there a difference in cpufreq behavior (avg cpufreq or OPP
>>>>> residencies?)
>>>>> Does the regression also happen on powersave/performance governor?
>>>>
>>>> Actually this is a very stripped-down system. The 'performance'
>>>> cpufreq governor is the only one compiled in, the processor cores
>>>> run on a fixed frequency. CONFIG_PM_OPP is not set.
>>>
>>> That certainly makes the analysis easier.
>>> I couldn't reproduce the issue so far on my system but it does seem
>>> like the dl server
>>> would get potentially unbounded running time with very frequent
>>> starting and stopping of the dlserver (which presumably happens
>>> because of
>>> the writeback) reset the runtime, which then leads to your 25s
>>> observed latency.
>>> Peter, how is the revised wakeup rule supposed to behave here?
>>>
>>>> [snip]
>>
>> This seems to be a case of runtime starvation. If I change
>> sched_rt_runtime_us to a smaller value, the benchmark returns
>> reasonable latency values.
>>
>> # echo "980000" > /proc/sys/kernel/sched_rt_runtime_us
>>
>> I could live with this workaround, since it seems not to impact
>> overall latency values in a noticeable way.
>>
>
> Not a very stable workaround unfortunately :/
> While I try to reproduce this, what you're observing should imply that
> the
> background SCHED_NORMAL work is enough to fully utilize the system,
> right?
> interbench Write does 4k (buffered) writes of a 1GB file and then
> close+open
> and repeat, nothing fancy really. Does this actually produce
> significant CPU
> utilization for you? Can you just run the background work and see what
> that
> looks like?
> (What you're seeing looks like a bug in any case, just so I'm not going
> down
> a wrong path when trying to reproduce here).

You are right, and the workaround was a false positive: the problem
seems to be intermittent (maybe 1/20), and I just got lucky for one
session.

Some background information about the current state of the system:

/* CONFIG_CPU_FREQ is not set */
Root filesystem in RAM (initrd)
CPU 3 is isolated; boot parameters:
  console=tty1 console=ttyAMA0,115200 isolcpus=nohz,domain,managed_irq,3 nohz_full=3 rcu_nocbs=3

The system is normally close to 100% idle; this is from top after
reboot:

Mem: 95724K used, 853524K free, 42408K shrd, 72K buff, 43352K cached
CPU: 0.0% usr 0.0% sys 0.0% nic 100% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 0.21 0.17 0.07 3/126 702

The file size used by interbench is even less than 1GB, due to the
limits of the rootfs. Typical values are around 100-200 MiB. The file
is rewritten in an infinite loop until the writer receives the stop
message (via pipe) from the controlling process. The check for the stop
message happens only after a completed write, not at block level.
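
The loop structure described above can be sketched as follows. This is a minimal Python illustration, not interbench's actual code: the function name `write_load`, its parameters, and the `max_passes` safety limit are assumptions made for the example. The 4 KiB block size and the close+open per pass follow the description of the Write load given earlier in the thread.

```python
import os
import select

BLOCK = 4096  # 4 KiB buffered writes, as described earlier in the thread


def write_load(path, file_size, stop_fd, max_passes=None):
    """Rewrite `path` until a byte arrives on `stop_fd`.

    Returns the number of complete passes. The stop pipe is polled only
    after a whole file has been written, so a stop request can be delayed
    by up to one full pass (hypothetical helper, for illustration only).
    """
    block = b"\0" * BLOCK
    passes = 0
    while max_passes is None or passes < max_passes:
        # Open and close the file on every pass.
        with open(path, "wb") as f:
            written = 0
            while written < file_size:
                # The last block of a pass may be shorter than BLOCK.
                written += f.write(block[: file_size - written])
        passes += 1
        # Non-blocking check: a timeout of 0 returns immediately.
        readable, _, _ = select.select([stop_fd], [], [], 0)
        if readable:
            os.read(stop_fd, 1)  # consume the stop byte
            break
    return passes
```

With a stop check at this granularity, the writer keeps generating dirty pages for the remainder of the current pass even after the controlling process has asked it to stop.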

I just noticed that interbench seems to have a bug itself: it uses only
one processor, which looks like a mangled CPU mask. Top output during
the write benchmark:

Mem: 358024K used, 591224K free, 298516K shrd, 2504K buff, 299464K cached
CPU: 1.8% usr 23.1% sys 0.0% nic 74.9% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 1.21 0.46 0.29 5/129 2116
  PID  PPID USER     STAT    VSZ %VSZ CPU %CPU COMMAND
 2106  2105 root     S      1228  0.1   0 23.6 interbench -r -t 60 -u -w Write -W
 2109  2105 root     S      1228  0.1   0  1.2 interbench -r -t 60 -u -w Write -W
 1829  1274 root     R      1600  0.1   2  0.0 top -d 5
   22     2 root     SW        0  0.0   0  0.0 [rcuc/0]
 1270     2 root     IW        0  0.0   0  0.0 [kworker/0:0-eve]
  652     1 mpd      S     27632  2.9   0  0.0 /usr/bin/mpd
 2023  2021 root     S      4476  0.4   0  0.0 sshd-session: root@notty
  675   673 root     S      4448  0.4   1  0.0 sshd-session: root@pts/0
  673   601 root     S      4140  0.4   0  0.0 sshd-session: root [priv]
 2021   601 root     S      4140  0.4   0  0.0 sshd-session: root [priv]
  601     1 root     S      3736  0.3   1  0.0 sshd: /usr/sbin/sshd [listener] 0
 2024  2023 root     S      3224  0.3   1  0.0 /usr/libexec/sftp-server
 2025  2023 root     S      3188  0.3   2  0.0 /usr/libexec/sftp-server
  501     1 root     S      1884  0.2   1  0.0 /usr/sbin/wpa_supplicant -B -P /va
  131     1 root     S      1672  0.1   0  0.0 /sbin/mdev -df
  676   675 root     S      1636  0.1   1  0.0 -sh
 1274   605 root     S      1636  0.1   1  0.0 -sh
  605     1 root     S      1592  0.1   1  0.0 /usr/sbin/telnetd -F
  527     1 root     S      1576  0.1   2  0.0 udhcpc -t1 -A2 -b -R -O search -O
    1     0 root     S      1576  0.1   0  0.0 init

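
The single-CPU placement visible in the CPU column above (both interbench processes on CPU 0) could be cross-checked against the affinity mask of the running process. A minimal Python sketch, assuming a Linux system; `parse_cpu_list`, `allowed_cpus`, and `is_pinned_to_one_cpu` are hypothetical helper names and not part of interbench:

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-2,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single CPU ("3") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def allowed_cpus(pid):
    """Return the set of CPUs that `pid` may run on (Linux only)."""
    return set(os.sched_getaffinity(pid))


def is_pinned_to_one_cpu(mask):
    """True if the affinity mask allows exactly one CPU."""
    return len(mask) == 1
```

For example, `is_pinned_to_one_cpu(allowed_cpus(2106))` could be evaluated while the benchmark runs (PID taken from the top output above). With CPU 3 isolated, a task with an intact mask would presumably report CPUs 0-2, i.e. `parse_cpu_list("0-2")`; a mask containing a single CPU would support the suspicion of a mangled mask.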
I tried limiting interbench's rather excessive SCHED_FIFO priorities to
values normal for the system, but without success.