Linux kernel -stable discussions
* [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT
@ 2026-05-10 20:57 Lukas Beckmann
  2026-05-11  5:50 ` Mike Galbraith
  2026-05-11 14:21 ` Sasha Levin
  0 siblings, 2 replies; 5+ messages in thread
From: Lukas Beckmann @ 2026-05-10 20:57 UTC
  To: Peter Zijlstra, Juri Lelli, Sasha Levin
  Cc: regressions, stable, linux-rt-users

Hi,

I am reporting a regression that was introduced by d66792919d4f on 6.12.y.
Since this commit, cyclictest reports latencies of up to 50 milliseconds
on kernels with CONFIG_PREEMPT_RT=y.

Steps to reproduce:
1. run a load (e.g. stress-ng --cpu 4 --io 2 --vm 2 --vm-bytes 128M)
2. run cyclictest (e.g. cyclictest -a -t -m -p 80 -i 250 -d 0)

cyclictest results on the current linux-6.12.y branch (tag v6.12.87):
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 9.37 9.21 6.90 9/211 978
T: 0 ( 884) P:80 I:250 C:4688252 Min: 3 Act: 6 Avg: 6 Max: 51956
T: 1 ( 885) P:80 I:250 C:4688051 Min: 3 Act: 7 Avg: 6 Max: 50106
T: 2 ( 886) P:80 I:250 C:4688242 Min: 3 Act: 6 Avg: 6 Max: 51965
T: 3 ( 887) P:80 I:250 C:4688434 Min: 3 Act: 12 Avg: 8 Max: 59

cyclictest results on 6.12.y with d66792919d4f reverted:
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 9.43 9.50 9.44 8/204 5758
T: 0 ( 862) P:80 I:250 C:272329322 Min: 3 Act: 6 Avg: 6 Max: 57
T: 1 ( 863) P:80 I:250 C:272329324 Min: 3 Act: 7 Avg: 6 Max: 77
T: 2 ( 864) P:80 I:250 C:272329322 Min: 3 Act: 7 Avg: 6 Max: 68
T: 3 ( 865) P:80 I:250 C:272329322 Min: 3 Act: 16 Avg: 7 Max: 81
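
For comparing runs like the two above, a small filter over saved cyclictest
output can flag the threads whose Max value exceeds a threshold. This is only
a sketch: it assumes the per-thread line layout shown above, and the function
name flag_latency and the 1000 us default threshold are hypothetical.

```shell
#!/bin/sh
# flag_latency: read cyclictest per-thread lines on stdin and print the
# threads whose Max latency (in microseconds) exceeds the given threshold.
# Hypothetical helper; assumes the "T: n ( pid) ... Max: value" layout.
# Returns 1 if at least one thread is over the threshold, 0 otherwise.
flag_latency() {
    threshold_us="${1:-1000}"
    awk -v limit="$threshold_us" '
        /^T:/ {
            # Walk the fields and pick the value that follows "Max:".
            for (i = 1; i < NF; i++) {
                if ($i == "Max:" && $(i + 1) + 0 > limit) {
                    printf "thread %s max %s us\n", $2, $(i + 1)
                    bad = 1
                }
            }
        }
        END { exit bad ? 1 : 0 }
    '
}
```

Fed the first result block above with a threshold of 1000, this would list
threads 0, 1 and 2 and skip thread 3. The non-zero return status on a hit
makes it usable as a pass/fail check in a test loop.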

This is reproducible on multiple machines.

It looks like the timer fires and there is also a sched_waking event in 
the trace, but the cyclictest thread does not get scheduled for another 
50ms.

I found this because Debian updated its rt kernel from 6.12.74 to 6.12.85.
The issue was also present with upstream 6.12.85 and HEAD, but not with 
6.12.74, so I started bisecting and eventually found d66792919d4f.

Is it possible to revert the commit?

I can provide traces or help with testing if needed.

Thanks
Lukas Beckmann

#regzbot introduced: d66792919d4f



* Re: [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT
  2026-05-10 20:57 [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT Lukas Beckmann
@ 2026-05-11  5:50 ` Mike Galbraith
  2026-05-11 14:21 ` Sasha Levin
  1 sibling, 0 replies; 5+ messages in thread
From: Mike Galbraith @ 2026-05-11  5:50 UTC
  To: Lukas Beckmann, Peter Zijlstra, Juri Lelli, Sasha Levin
  Cc: regressions, stable, linux-rt-users

On Sun, 2026-05-10 at 22:57 +0200, Lukas Beckmann wrote:
> Hi,

Greetings!

> I am reporting a regression which was introduced by d66792919d4f on 6.12.y.
> Since this commit, cyclictest reports latencies up to 50 milliseconds, 
> on kernels with CONFIG_PREEMPT_RT=y.
> 
> Steps to reproduce:
> 1. run a load (e.g. stress-ng --cpu 4 --io 2 --vm 2 --vm-bytes 128M)
> 2. run cyclictest (e.g. cyclictest -a -t -m -p 80 -i 250 -d 0)

...


> I found this, because Debian updated its rt kernel from 6.12.74 to 6.12.85.
> The issue was also present with upstream 6.12.85 and HEAD, but not with 
> 6.12.74, so I started bisecting and eventually found d66792919d4f.
> 
> Is it possible to revert the commit?
> 
> I can provide traces or help with testing if needed.

FWIW, my box says this *may* be due to a fix (+follow-ups) that didn't
wander back to stable.

cccb45d7c429 ("sched/deadline: Less agressive dl_server handling")
   Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")

Follow-ups:
4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
a3a70caf7906 ("sched/deadline: Fix dl_server behaviour")

Local 6.12-rt tree containing the above (et al) failed to reproduce in
the hour I let it try to, whereas a build excluding only all locally
added fix backports reproduced in fairly short order.

Suspects NOT confirmed in total isolation...

	-Mike


* Re: [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT
  2026-05-10 20:57 [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT Lukas Beckmann
  2026-05-11  5:50 ` Mike Galbraith
@ 2026-05-11 14:21 ` Sasha Levin
  2026-05-11 15:30   ` Mike Galbraith
  2026-05-11 22:08   ` Lukas Beckmann
  1 sibling, 2 replies; 5+ messages in thread
From: Sasha Levin @ 2026-05-11 14:21 UTC
  To: Peter Zijlstra, Juri Lelli
  Cc: Sasha Levin, regressions, stable, linux-rt-users, Lukas Beckmann,
	Mike Galbraith

On Sun, May 10, 2026 at 10:57:46PM +0200, Lukas Beckmann wrote:
> I am reporting a regression which was introduced by d66792919d4f on 6.12.y.
> Since this commit, cyclictest reports latencies up to 50 milliseconds,
> on kernels with CONFIG_PREEMPT_RT=y.
[...]
> Is it possible to revert the commit?
>
> I can provide traces or help with testing if needed.

Thanks for the detailed report. Before I revert d66792919d4f from 6.12.y,
I'd like to confirm whether the underlying issue is the missing dl_server
rework chain on 6.12.y rather than the revised wakeup rule itself.

Mike's reply notes that his local 6.12-rt tree carrying the following
three commits cannot reproduce the issue, while the same tree without
them reproduces it quickly:

  cccb45d7c429 ("sched/deadline: Less agressive dl_server handling")
  4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
  a3a70caf7906 ("sched/deadline: Fix dl_server behaviour")

d66792919d4f's upstream commit message explicitly says it relies on the
state established by a3a70caf7906, and none of the three are in 6.12.y.

Could you give those three commits a spin on top of 6.12.y (keeping
d66792919d4f in place) and see whether the latency goes away?
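
One way to do that is sketched below. It assumes a checkout of the
linux-6.12.y branch with the upstream commits available in the local
repository, and picks the commits in the order they are listed above. The
function name apply_dl_server_fixes and the DRY_RUN switch are hypothetical,
and whether the picks apply without conflicts is not something this sketch
can promise.

```shell
#!/bin/sh
# apply_dl_server_fixes: cherry-pick the three commits named above, in the
# order listed, onto the currently checked-out branch.
# Hypothetical helper; set DRY_RUN=1 to print the git commands instead of
# running them. Conflicts, if any, still need manual resolution.
apply_dl_server_fixes() {
    for commit in cccb45d7c429 4ae8d9aa9f9d a3a70caf7906; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "git cherry-pick -x $commit"
        else
            # -x records the original commit id in the new commit message.
            git cherry-pick -x "$commit" || return 1
        fi
    done
}
```

Running it with DRY_RUN=1 first shows the three commands without touching
the tree; d66792919d4f stays in place either way.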

-- 
Thanks,
Sasha


* Re: [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT
  2026-05-11 14:21 ` Sasha Levin
@ 2026-05-11 15:30   ` Mike Galbraith
  2026-05-11 22:08   ` Lukas Beckmann
  1 sibling, 0 replies; 5+ messages in thread
From: Mike Galbraith @ 2026-05-11 15:30 UTC
  To: Sasha Levin, Peter Zijlstra, Juri Lelli
  Cc: regressions, stable, linux-rt-users, Lukas Beckmann

On Mon, 2026-05-11 at 10:21 -0400, Sasha Levin wrote:
> 
> Mike's reply notes that his local 6.12-rt tree carrying the following
> three commits cannot reproduce the issue, while the same tree without
> them reproduces it quickly:
> 
>   cccb45d7c429 ("sched/deadline: Less agressive dl_server handling")
>   4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
>   a3a70caf7906 ("sched/deadline: Fix dl_server behaviour")
> 
> d66792919d4f's upstream commit message explicitly says it relies on the
> state established by a3a70caf7906, and none of the three are in 6.12.y.
> 
> Could you give those three commits a spin on top of 6.12.y (keeping
> d66792919d4f in place) and see whether the latency goes away?

I've meanwhile tried those three alone, and the size XXL hits that my box
readily reproduces with virgin source do indeed go away.

'course there may be another shoe, so...

	-Mike


* Re: [REGRESSION] 6.12.y: d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) causes latencies up to 50ms with PREEMPT_RT
  2026-05-11 14:21 ` Sasha Levin
  2026-05-11 15:30   ` Mike Galbraith
@ 2026-05-11 22:08   ` Lukas Beckmann
  1 sibling, 0 replies; 5+ messages in thread
From: Lukas Beckmann @ 2026-05-11 22:08 UTC
  To: Sasha Levin, Peter Zijlstra, Juri Lelli
  Cc: regressions, stable, linux-rt-users, Lukas Beckmann,
	Mike Galbraith


On 5/11/26 16:21, Sasha Levin wrote:
 > Thanks for the detailed report. Before I revert d66792919d4f from 6.12.y,
 > I'd like to confirm whether the underlying issue is the missing dl_server
 > rework chain on 6.12.y rather than the revised wakeup rule itself.
 >
 > Mike's reply notes that his local 6.12-rt tree carrying the following
 > three commits cannot reproduce the issue, while the same tree without
 > them reproduces it quickly:
 >
 >   cccb45d7c429 ("sched/deadline: Less agressive dl_server handling")
 >   4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
 >   a3a70caf7906 ("sched/deadline: Fix dl_server behaviour")
 >
 > d66792919d4f's upstream commit message explicitly says it relies on the
 > state established by a3a70caf7906, and none of the three are in 6.12.y.
 >
 > Could you give those three commits a spin on top of 6.12.y (keeping
 > d66792919d4f in place) and see whether the latency goes away?

If I apply the three commits on 6.12.y, the latencies indeed go away.
The test has been running for a few hours now, whereas with plain 6.12.y
the latencies showed up within 30 minutes at most.
I will leave this running.

Note:
I also tried applying only cccb45d7c429 ("sched/deadline: Less agressive 
dl_server handling") before, and that also seems to fix the issue.

Thanks
Lukas
