The Linux Kernel Mailing List
* Re: sched/deadline: Use revised wakeup rule for dl_server
@ 2026-05-08  8:09 Andreas Ziegler
  2026-05-08  9:20 ` Christian Loehle
  2026-05-11 12:46 ` Juri Lelli
  0 siblings, 2 replies; 9+ messages in thread
From: Andreas Ziegler @ 2026-05-08  8:09 UTC (permalink / raw)
  To: Peter Zijlstra, Juri Lelli; +Cc: linux-kernel

Linux kernel version: 6.12
   CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
Architecture: aarch64
Platform: Raspberry Pi 4

Hi everyone,

Commit d66792919d4f ("sched/deadline: Use revised wakeup rule for
dl_server") [1] introduced a marked degradation in scheduling latency
for real-time tasks in the presence of heavy I/O load.

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1079,7 +1079,7 @@ static void update_dl_entity(struct 
sched_dl_entity *dl_se)
  	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
  	    dl_entity_overflow(dl_se, rq_clock(rq))) {

-		if (unlikely(!dl_is_implicit(dl_se) &&
+		if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
  			     !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
  			     !is_dl_boosted(dl_se))) {
  			update_dl_revised_wakeup(dl_se, rq);

This was observed using a modified version of Con Kolivas' interactivity
benchmark [2]; kernel bisection eventually pointed to the above-mentioned
commit.

Benchmark results before d66792919d4f:

--- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
Load	Latency +/- SD   median  max [100n]	Desired CPU  Deadlines met [%]
None	  76.6 +/- 8.3654    76  166
Video	  78.5 +/- 3.9433    78  107
X	  76.4 +/- 8.123     75  157
Burn	  72.0 +/- 6.4733    71  127
Write	 255.3 +/- 26.627   252  331
Read	 226.6 +/- 12.38    227  262
Ring	  84.2 +/- 6.6207    83  125
Compile	 225.3 +/- 23.949   222  328

	 136.8 +/- 78.462        331

Benchmark results after d66792919d4f:

--- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
Load	Latency +/- SD   median  max [100n]	Desired CPU  Deadlines met [%]
None	  68.4 +/- 9.7864    67  169
Video	  74.4 +/- 3.724     74   97
X	  72.0 +/- 6.5681    71  129
Burn	  66.9 +/- 5.9059    66  117
Write	9576.9 +/- 67639    250  500418		98.1	     98.1
Read	 209.3 +/- 11.018   209  267
Ring	  80.5 +/- 8.0993    78  125
Compile	 239.0 +/- 29.447   234  372

	1298.4 +/- 24118       500418

Reverting this commit solves the issue for me. I have no idea why it
appears exclusively with heavy write loads in the background.

Is this a scheduler issue, or rather a problem somewhere in the
background I/O path?

Kind regards,
Andreas

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.12.86&id=d66792919d4f7bd326dfd8c21d019f7c5d4ef05c
[2] https://github.com/ckolivas/interbench

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-08  8:09 sched/deadline: Use revised wakeup rule for dl_server Andreas Ziegler
@ 2026-05-08  9:20 ` Christian Loehle
  2026-05-08 12:06   ` Andreas Ziegler
  2026-05-11 12:46 ` Juri Lelli
  1 sibling, 1 reply; 9+ messages in thread
From: Christian Loehle @ 2026-05-08  9:20 UTC (permalink / raw)
  To: Andreas Ziegler, Peter Zijlstra, Juri Lelli
  Cc: linux-kernel, Dietmar Eggemann

On 5/8/26 09:09, Andreas Ziegler wrote:
> Linux kernel version: 6.12
>   CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
> Architecture: aarch64
> Platform: Raspberry Pi 4
> 
> Hi everyone,
> 
> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) [1] introduced a marked degradation in scheduling latency for real-time tasks in the presence of heavy I/O load.
> 
> [snip]
> 

Hi Andreas,
You're using the schedutil cpufreq governor for your tests, I'm assuming?
Is there a difference in cpufreq behavior (avg cpufreq or OPP residencies)?
Does the regression also happen with the powersave/performance governors?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-08  9:20 ` Christian Loehle
@ 2026-05-08 12:06   ` Andreas Ziegler
  2026-05-08 14:13     ` Christian Loehle
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Ziegler @ 2026-05-08 12:06 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Dietmar Eggemann

Hi Christian,

On 2026-05-08 09:20, Christian Loehle wrote:
> On 5/8/26 09:09, Andreas Ziegler wrote:
>> [snip]
> 
> Hi Andreas,
> You're using the schedutil cpufreq governor for your tests, I'm assuming?
> Is there a difference in cpufreq behavior (avg cpufreq or OPP
> residencies)?
> Does the regression also happen with the powersave/performance governors?

Actually this is a very stripped-down system. The 'performance' cpufreq
governor is the only one compiled in, and the processor cores run at a
fixed frequency. CONFIG_PM_OPP is not set.

Removing the frequency constraint and using the 'powersave' governor
raises the latency values across the board, but the anomaly under write
loads persists. The CPU frequency does not change; it remains stuck at
the lowest level.

--- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
Load	Latency +/- SD   median  max [100n]	Desired CPU  Deadlines met [%]
None	 238.7 +/- 31.416   229  405
Video	 228.6 +/- 13.668   226  291
X	 247.8 +/- 29.196   239  425
Burn	 222.6 +/- 30.631   215  348
Write	1214.8 +/- 20397    369  500411		99.8	     99.8
Read	 393.9 +/- 21.375   394  476
Ring	 250.3 +/- 27.59    241  365
Compile	 411.2 +/- 23.41    411  474

	 401.0 +/- 7218.2      500411

The same happens with the 'schedutil' governor; there, the CPU frequency
adjusts with the load.

--- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
Load	Latency +/- SD   median  max [100n]	Desired CPU  Deadlines met [%]
None	 200.9 +/- 57.332   208  431
Video	 136.2 +/- 23.784   136  250
X	 172.3 +/- 59.286   174  404
Burn	 104.1 +/- 22.847    97  247
Write	5337.5 +/- 49960    286  500394		  99	       99
Read	 300.5 +/- 18.65    301  359
Ring	 119.8 +/- 15.8     115  196
Compile	 282.7 +/- 25.056   280  469

	 831.7 +/- 17746       500394

Kind regards,
Andreas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-08 12:06   ` Andreas Ziegler
@ 2026-05-08 14:13     ` Christian Loehle
  2026-05-09 11:42       ` Andreas Ziegler
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Loehle @ 2026-05-08 14:13 UTC (permalink / raw)
  To: Andreas Ziegler
  Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Dietmar Eggemann,
	John Stultz

On 5/8/26 13:06, Andreas Ziegler wrote:
> Hi Christian,
> 
> On 2026-05-08 09:20, Christian Loehle wrote:
>> [snip]
>> Hi Andreas,
>> You're using cpufreq schedutil for your tests I'm assuming?
>> Is there a difference in cpufreq behavior (avg cpufreq or OPP residencies?)
>> Does the regression also happen on powersave/performance governor?
> 
> Actually this is a very stripped-down system. The 'performance' cpufreq governor is the only one compiled in, the processor cores run on a fixed frequency. CONFIG_PM_OPP is not set.

That certainly makes the analysis easier.
I couldn't reproduce the issue so far on my system, but it does seem like
the dl_server could get potentially unbounded running time: very frequent
starting and stopping of the dl_server (which presumably happens because
of the writeback) resets the runtime, which then leads to your observed
25s latency.
Peter, how is the revised wakeup rule supposed to behave here?

> [snip]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-08 14:13     ` Christian Loehle
@ 2026-05-09 11:42       ` Andreas Ziegler
  2026-05-11  9:47         ` Christian Loehle
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Ziegler @ 2026-05-09 11:42 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Dietmar Eggemann,
	John Stultz

Hi Christian, Everyone,

On 2026-05-08 14:13, Christian Loehle wrote:
> On 5/8/26 13:06, Andreas Ziegler wrote:
>> [snip]
>> Actually this is a very stripped-down system. The 'performance' 
>> cpufreq governor is the only one compiled in, the processor cores run 
>> on a fixed frequency. CONFIG_PM_OPP is not set.
> 
> That certainly makes the analysis easier.
> I couldn't reproduce the issue so far on my system, but it does seem
> like the dl_server could get potentially unbounded running time: very
> frequent starting and stopping of the dl_server (which presumably
> happens because of the writeback) resets the runtime, which then leads
> to your observed 25s latency.
> Peter, how is the revised wakeup rule supposed to behave here?
> 
>> [snip]

This seems to be a case of runtime starvation. If I change 
sched_rt_runtime_us to a smaller value, the benchmark returns reasonable 
latency values.

# echo "980000" > /proc/sys/kernel/sched_rt_runtime_us

I could live with this workaround, since it seems not to impact overall 
latency values in a noticeable way.

Kind regards,
Andreas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-09 11:42       ` Andreas Ziegler
@ 2026-05-11  9:47         ` Christian Loehle
  2026-05-11 12:37           ` Andreas Ziegler
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Loehle @ 2026-05-11  9:47 UTC (permalink / raw)
  To: Andreas Ziegler
  Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Dietmar Eggemann,
	John Stultz

On 5/9/26 12:42, Andreas Ziegler wrote:
> Hi Christian, Everyone,
> 
> [snip]
> 
> This seems to be a case of runtime starvation. If I change sched_rt_runtime_us to a smaller value, the benchmark returns reasonable latency values.
> 
> # echo "980000" > /proc/sys/kernel/sched_rt_runtime_us
> 
> I could live with this workaround, since it seems not to impact overall latency values in a noticeable way.
> 

Not a very stable workaround unfortunately :/
While I try to reproduce this, what you're observing should imply that the
background SCHED_NORMAL work is enough to fully utilize the system, right?
interbench Write does 4k (buffered) writes of a 1GB file and then close+open
and repeat, nothing fancy really. Does this actually produce significant CPU
utilization for you? Can you just run the background work and see what that
looks like?
(What you're seeing looks like a bug in any case, just so I'm not going down
a wrong path when trying to reproduce here).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-11  9:47         ` Christian Loehle
@ 2026-05-11 12:37           ` Andreas Ziegler
  0 siblings, 0 replies; 9+ messages in thread
From: Andreas Ziegler @ 2026-05-11 12:37 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Dietmar Eggemann,
	John Stultz

On 2026-05-11 09:47, Christian Loehle wrote:
> On 5/9/26 12:42, Andreas Ziegler wrote:
>> [snip]
>> 
>> This seems to be a case of runtime starvation. If I change 
>> sched_rt_runtime_us to a smaller value, the benchmark returns 
>> reasonable latency values.
>> 
>> # echo "980000" > /proc/sys/kernel/sched_rt_runtime_us
>> 
>> I could live with this workaround, since it seems not to impact 
>> overall latency values in a noticeable way.
>> 
> 
> Not a very stable workaround unfortunately :/
> While I try to reproduce this, what you're observing should imply that
> the background SCHED_NORMAL work is enough to fully utilize the system,
> right?
> interbench Write does 4k (buffered) writes of a 1GB file and then
> close+open and repeat, nothing fancy really. Does this actually produce
> significant CPU utilization for you? Can you just run the background
> work and see what that looks like?
> (What you're seeing looks like a bug in any case, just so I'm not going
> down a wrong path when trying to reproduce here.)

You are right, and this was a false positive; the problem seems to be
intermittent (reproducing in maybe 1 of 20 runs) and I just got lucky
for one session.

Some background information about the current state of the system:
   /* CONFIG_CPU_FREQ is not set */
   Root filesystem in RAM (initrd)
   CPU 3 is isolated via boot parameters:
     console=tty1 console=ttyAMA0,115200
     isolcpus=nohz,domain,managed_irq,3 nohz_full=3 rcu_nocbs=3

The system is normally near 100% idle; this is top output right after
reboot:

Mem: 95724K used, 853524K free, 42408K shrd, 72K buff, 43352K cached
CPU:  0.0% usr  0.0% sys  0.0% nic  100% idle  0.0% io  0.0% irq  0.0% sirq
Load average: 0.21 0.17 0.07 3/126 702

The file size used by interbench is even less than 1 GB here, due to the
limits of the rootfs; typical values are around 100-200 MiB. The file is
written in an infinite loop until the stop message arrives (via pipe)
from the controlling process. The check for the stop message occurs
after a completed file write, not per block.

I just noticed that interbench seems to have a bug itself: it uses only
one processor; this looks like a mangled CPU mask. Top output during the
write benchmark:

Mem: 358024K used, 591224K free, 298516K shrd, 2504K buff, 299464K cached
CPU:  1.8% usr 23.1% sys  0.0% nic 74.9% idle  0.0% io  0.0% irq  0.0% sirq
Load average: 1.21 0.46 0.29 5/129 2116
   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  2106  2105 root     S     1228  0.1   0 23.6 interbench -r -t 60 -u -w Write -W
  2109  2105 root     S     1228  0.1   0  1.2 interbench -r -t 60 -u -w Write -W
  1829  1274 root     R     1600  0.1   2  0.0 top -d 5
    22     2 root     SW       0  0.0   0  0.0 [rcuc/0]
  1270     2 root     IW       0  0.0   0  0.0 [kworker/0:0-eve]
   652     1 mpd      S    27632  2.9   0  0.0 /usr/bin/mpd
  2023  2021 root     S     4476  0.4   0  0.0 sshd-session: root@notty
   675   673 root     S     4448  0.4   1  0.0 sshd-session: root@pts/0
   673   601 root     S     4140  0.4   0  0.0 sshd-session: root [priv]
  2021   601 root     S     4140  0.4   0  0.0 sshd-session: root [priv]
   601     1 root     S     3736  0.3   1  0.0 sshd: /usr/sbin/sshd [listener] 0
  2024  2023 root     S     3224  0.3   1  0.0 /usr/libexec/sftp-server
  2025  2023 root     S     3188  0.3   2  0.0 /usr/libexec/sftp-server
   501     1 root     S     1884  0.2   1  0.0 /usr/sbin/wpa_supplicant -B -P /va
   131     1 root     S     1672  0.1   0  0.0 /sbin/mdev -df
   676   675 root     S     1636  0.1   1  0.0 -sh
  1274   605 root     S     1636  0.1   1  0.0 -sh
   605     1 root     S     1592  0.1   1  0.0 /usr/sbin/telnetd -F
   527     1 root     S     1576  0.1   2  0.0 udhcpc -t1 -A2 -b -R -O search -O
     1     0 root     S     1576  0.1   0  0.0 init

I tried limiting interbench's rather excessive SCHED_FIFO priorities to 
values normal for the system, but without success.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-08  8:09 sched/deadline: Use revised wakeup rule for dl_server Andreas Ziegler
  2026-05-08  9:20 ` Christian Loehle
@ 2026-05-11 12:46 ` Juri Lelli
  2026-05-11 14:13   ` Andreas Ziegler
  1 sibling, 1 reply; 9+ messages in thread
From: Juri Lelli @ 2026-05-11 12:46 UTC (permalink / raw)
  To: Andreas Ziegler; +Cc: Peter Zijlstra, linux-kernel

Hello,

On 08/05/26 08:09, Andreas Ziegler wrote:
> Linux kernel version: 6.12
>   CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
> Architecture: aarch64
> Platform: Raspberry Pi 4
> 
> Hi everyone,
> 
> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server)
> [1] introduced a marked degradation in scheduling latency for real-time
> tasks in the presence of heavy I/O load.

Can this be the same regression reported here?

https://marc.info/?l=linux-rt-users&m=177844667227991

Please notice the list of missing subsequent fixes Mike is suggesting to
test with.

https://marc.info/?l=linux-rt-users&m=177847863710263&w=2

Thanks,
Juri


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sched/deadline: Use revised wakeup rule for dl_server
  2026-05-11 12:46 ` Juri Lelli
@ 2026-05-11 14:13   ` Andreas Ziegler
  0 siblings, 0 replies; 9+ messages in thread
From: Andreas Ziegler @ 2026-05-11 14:13 UTC (permalink / raw)
  To: Juri Lelli; +Cc: Peter Zijlstra, linux-kernel

Hi Juri,

On 2026-05-11 12:46, Juri Lelli wrote:
> Hello,
> 
> On 08/05/26 08:09, Andreas Ziegler wrote:
>> Linux kernel version: 6.12
>>   CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
>> Architecture: aarch64
>> Platform: Raspberry Pi 4
>> 
>> Hi everyone,
>> 
>> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for 
>> dl_server)
>> [1] introduced a marked degradation in scheduling latency for 
>> real-time
>> tasks in the presence of heavy I/O load.
> 
> Can this be the same regression reported here?
> 
> https://marc.info/?l=linux-rt-users&m=177844667227991

Yes, this is the same issue. I wonder where the 50 ms value comes 
from; it is fairly consistent in my results as well.

> Please notice the list of missing subsequent fixes Mike is suggesting 
> to
> test with.
> 
> https://marc.info/?l=linux-rt-users&m=177847863710263&w=2

I will take a look at the mentioned patches.

> Thanks,
> Juri

Thank you for the update,
Andreas

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-05-11 14:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-08  8:09 sched/deadline: Use revised wakeup rule for dl_server Andreas Ziegler
2026-05-08  9:20 ` Christian Loehle
2026-05-08 12:06   ` Andreas Ziegler
2026-05-08 14:13     ` Christian Loehle
2026-05-09 11:42       ` Andreas Ziegler
2026-05-11  9:47         ` Christian Loehle
2026-05-11 12:37           ` Andreas Ziegler
2026-05-11 12:46 ` Juri Lelli
2026-05-11 14:13   ` Andreas Ziegler
