Date: Mon, 11 May 2026 12:37:12 +0000
From: Andreas Ziegler
To: Christian Loehle
Cc: Peter Zijlstra, Juri Lelli, linux-kernel@vger.kernel.org,
 Dietmar Eggemann, John Stultz
Subject: Re: sched/deadline: Use revised wakeup rule for dl_server
In-Reply-To: <50156878-265d-4025-9b36-c819c80b7493@arm.com>
References: <496e4b3329fe258da9618b9f05b18fcf@umbiko.net>
 <97d9e04fd9d222f1a64f1ecfda8b81d7@umbiko.net>
 <50156878-265d-4025-9b36-c819c80b7493@arm.com>
Message-ID: <701f3a1dd4730f92cb3013176e068a16@umbiko.net>

On 2026-05-11 09:47, Christian Loehle wrote:
> On 5/9/26 12:42, Andreas Ziegler wrote:
>> Hi Christian, Everyone,
>>
>> On 2026-05-08 14:13, Christian Loehle wrote:
>>> On 5/8/26 13:06, Andreas Ziegler wrote:
>>>> Hi Christian,
>>>>
>>>> On 2026-05-08 09:20, Christian Loehle wrote:
>>>>> On 5/8/26 09:09, Andreas Ziegler wrote:
>>>>>> Linux kernel version: 6.12
>>>>>>   CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
>>>>>> Architecture: aarch64
>>>>>> Platform: Raspberry Pi 4
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for
>>>>>> dl_server) [1] introduced a marked degradation in scheduling
>>>>>> latency for real-time tasks in the presence of heavy I/O load.
>>>>>>
>>>>>> --- a/kernel/sched/deadline.c
>>>>>> +++ b/kernel/sched/deadline.c
>>>>>> @@ -1079,7 +1079,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>>>>>>      if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>>>>>>          dl_entity_overflow(dl_se, rq_clock(rq))) {
>>>>>>
>>>>>> -        if (unlikely(!dl_is_implicit(dl_se) &&
>>>>>> +        if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
>>>>>>                       !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>>>>>>                       !is_dl_boosted(dl_se))) {
>>>>>>              update_dl_revised_wakeup(dl_se, rq);
>>>>>>
>>>>>> This was observed using a modified version of Con Kolivas'
>>>>>> interactivity benchmark [2]; kernel bisection eventually pointed
>>>>>> to the above-mentioned commit.
>>>>>>
>>>>>> Benchmark results before d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
>>>>>> Load     Latency +/- SD    median  max [100n]  Desired CPU  Deadlines met [%]
>>>>>> None      76.6 +/- 8.3654      76     166
>>>>>> Video     78.5 +/- 3.9433      78     107
>>>>>> X         76.4 +/- 8.123       75     157
>>>>>> Burn      72.0 +/- 6.4733      71     127
>>>>>> Write    255.3 +/- 26.627     252     331
>>>>>> Read     226.6 +/- 12.38      227     262
>>>>>> Ring      84.2 +/- 6.6207      83     125
>>>>>> Compile  225.3 +/- 23.949     222     328
>>>>>>
>>>>>>          136.8 +/- 78.462             331
>>>>>>
>>>>>> Benchmark results after d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
>>>>>> Load     Latency +/- SD    median  max [100n]  Desired CPU  Deadlines met [%]
>>>>>> None      68.4 +/- 9.7864      67     169
>>>>>> Video     74.4 +/- 3.724       74      97
>>>>>> X         72.0 +/- 6.5681      71     129
>>>>>> Burn      66.9 +/- 5.9059      66     117
>>>>>> Write   9576.9 +/- 67639      250  500418         98.1          98.1
>>>>>> Read     209.3 +/- 11.018     209     267
>>>>>> Ring      80.5 +/- 8.0993      78     125
>>>>>> Compile  239.0 +/- 29.447     234     372
>>>>>>
>>>>>>         1298.4 +/- 24118          500418
>>>>>>
>>>>>> Reverting this commit obviously solves the issue for me. I have no
>>>>>> idea why this issue appears exclusively with heavy write loads in
>>>>>> the background.
>>>>>>
>>>>>> Is this a scheduler issue, or rather something in the background?
>>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>> You're using cpufreq schedutil for your tests, I'm assuming?
>>>>> Is there a difference in cpufreq behavior (avg cpufreq or OPP residencies)?
>>>>> Does the regression also happen on the powersave/performance governor?
>>>>
>>>> Actually this is a very stripped-down system. The 'performance'
>>>> cpufreq governor is the only one compiled in, and the processor cores
>>>> run at a fixed frequency. CONFIG_PM_OPP is not set.
>>>
>>> That certainly makes the analysis easier.
>>> I couldn't reproduce the issue so far on my system, but it does seem
>>> like the dl_server could get potentially unbounded running time: very
>>> frequent starting and stopping of the dl_server (which presumably
>>> happens because of the writeback) resets the runtime, which then leads
>>> to your 25 s observed latency.
>>> Peter, how is the revised wakeup rule supposed to behave here?
>>>
>>>> [snip]
>>
>> This seems to be a case of runtime starvation.
>> If I change sched_rt_runtime_us to a smaller value, the benchmark
>> returns reasonable latency values.
>>
>> # echo "980000" > /proc/sys/kernel/sched_rt_runtime_us
>>
>> I could live with this workaround, since it seems not to impact
>> overall latency values in a noticeable way.
>>
>
> Not a very stable workaround unfortunately :/
> While I try to reproduce this, what you're observing should imply that
> the background SCHED_NORMAL work is enough to fully utilize the system,
> right?
> interbench Write does 4k (buffered) writes of a 1GB file and then
> close+open and repeat, nothing fancy really. Does this actually produce
> significant CPU utilization for you? Can you just run the background
> work and see what that looks like?
> (What you're seeing looks like a bug in any case, just so I'm not going
> down a wrong path when trying to reproduce here.)

You are right, and this was a false positive; the problem seems to be
intermittent (maybe 1 in 20 runs) and I just got lucky for one session.

Some background information about the current state of the system:

  /* CONFIG_CPU_FREQ is not set */
  Root filesystem in RAM (initrd)
  CPU 3 is isolated; boot parameters:
    console=tty1 console=ttyAMA0,115200 isolcpus=nohz,domain,managed_irq,3
    nohz_full=3 rcu_nocbs=3

Background load is normally near 100% idle; this is from top after reboot:

Mem: 95724K used, 853524K free, 42408K shrd, 72K buff, 43352K cached
CPU:  0.0% usr  0.0% sys  0.0% nic  100% idle  0.0% io  0.0% irq  0.0% sirq
Load average: 0.21 0.17 0.07 3/126 702

The file size used by interbench is even less than 1GB, due to the limits
of the rootfs; typical values are around 100-200 MiB. The file is written
in an infinite loop until the stop message arrives (via pipe) from the
controlling process. The check for the abort signal occurs after a
completed write, not at block level.

I just noticed that interbench seems to have a bug itself: it uses only
one processor - looks like a mangled CPU mask.

Top output during the write benchmark:

Mem: 358024K used, 591224K free, 298516K shrd, 2504K buff, 299464K cached
CPU:  1.8% usr 23.1% sys  0.0% nic 74.9% idle  0.0% io  0.0% irq  0.0% sirq
Load average: 1.21 0.46 0.29 5/129 2116
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
 2106  2105 root     S     1228  0.1   0 23.6 interbench -r -t 60 -u -w Write -W
 2109  2105 root     S     1228  0.1   0  1.2 interbench -r -t 60 -u -w Write -W
 1829  1274 root     R     1600  0.1   2  0.0 top -d 5
   22     2 root     SW       0  0.0   0  0.0 [rcuc/0]
 1270     2 root     IW       0  0.0   0  0.0 [kworker/0:0-eve]
  652     1 mpd      S    27632  2.9   0  0.0 /usr/bin/mpd
 2023  2021 root     S     4476  0.4   0  0.0 sshd-session: root@notty
  675   673 root     S     4448  0.4   1  0.0 sshd-session: root@pts/0
  673   601 root     S     4140  0.4   0  0.0 sshd-session: root [priv]
 2021   601 root     S     4140  0.4   0  0.0 sshd-session: root [priv]
  601     1 root     S     3736  0.3   1  0.0 sshd: /usr/sbin/sshd [listener] 0
 2024  2023 root     S     3224  0.3   1  0.0 /usr/libexec/sftp-server
 2025  2023 root     S     3188  0.3   2  0.0 /usr/libexec/sftp-server
  501     1 root     S     1884  0.2   1  0.0 /usr/sbin/wpa_supplicant -B -P /va
  131     1 root     S     1672  0.1   0  0.0 /sbin/mdev -df
  676   675 root     S     1636  0.1   1  0.0 -sh
 1274   605 root     S     1636  0.1   1  0.0 -sh
  605     1 root     S     1592  0.1   1  0.0 /usr/sbin/telnetd -F
  527     1 root     S     1576  0.1   2  0.0 udhcpc -t1 -A2 -b -R -O search -O
    1     0 root     S     1576  0.1   0  0.0 init

I tried limiting interbench's rather excessive SCHED_FIFO priorities to
values normal for the system, but without success.
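
For reference, the Write load boils down to something like the sketch
below: buffered 4k writes filling a bounded file in a loop, with the stop
message from the controlling process checked only after a completed write
of the whole file. This is a simplified illustration of the behaviour
described above, not the actual interbench source; the file name, the
~150 MiB size and the 60 s stand-in for the controlling process are made
up for the example.

/*
 * Sketch of the "Write" background load: fill a file with buffered 4k
 * writes, close and reopen it, and repeat until the controlling process
 * sends a stop message over a pipe. The stop check happens only after a
 * completed write, not at block level.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define FILE_SIZE  (150L * 1024 * 1024)  /* ~150 MiB, fits the rootfs */

static int stop_requested(int pipe_fd)
{
        struct pollfd pfd = { .fd = pipe_fd, .events = POLLIN };

        /* Non-blocking poll: has the controlling process sent anything? */
        return poll(&pfd, 1, 0) > 0;
}

static void write_load(int pipe_fd)
{
        char buf[BLOCK_SIZE];

        memset(buf, 0xaa, sizeof(buf));

        do {
                int fd = open("loadfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
                long written = 0;

                if (fd < 0)
                        exit(1);

                /* One completed write: fill the file with buffered 4k blocks. */
                while (written < FILE_SIZE &&
                       write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
                        written += sizeof(buf);

                close(fd);

                /* Abort check only here, after a completed write. */
        } while (!stop_requested(pipe_fd));
}

int main(void)
{
        int pipefd[2];

        if (pipe(pipefd) < 0)
                return 1;

        if (fork() == 0) {
                /* Stand-in for the controlling process: stop after 60 s. */
                sleep(60);
                if (write(pipefd[1], "x", 1) != 1)
                        _exit(1);
                _exit(0);
        }

        write_load(pipefd[0]);
        return 0;
}

With the stop check deferred to the end of each full file write, the loop
keeps dirtying pages more or less continuously, which matches the
writeback activity suspected above.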