* task_non_contending() for fair_server leads to timer retries
@ 2025-07-15 14:39 markus.stockhausen
2025-07-17 9:05 ` Juri Lelli
0 siblings, 1 reply; 3+ messages in thread
From: markus.stockhausen @ 2025-07-15 14:39 UTC (permalink / raw)
To: peterz
Cc: 'Chris Packham', bjorn, mingo, juri.lelli,
vincent.guittot, anna-maria, frederic, tglx, linux-kernel
Hi Peter,
I'm currently investigating issues with the timer-rtl-otto driver in
6.12 longterm on the Realtek MIPS switch platform (Chris is working
hard to upstream this). While doing so I observed that timer retries
continually increase (~6/second) according to /proc/timer_list. The
system is otherwise totally idle. 6.6 longterm does not show that issue.
I'm unsure if this is related but documentation reads like "that's bad".
To be sure about this one I nailed it down to the fair server.
Whenever task_non_contending() handles the fair_server, zerolag_time is
calculated as 0 and a hrtimer_start(timer, 0, ...) call is issued. Going
down the stack clockevents_program_event() thinks the target time has
been exceeded. So it instructs clockevents_program_min_delta() to set
a minimum delta time (2560ns for the otto timer). From there the retry
counter is increased. See attached output.
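
The path described above can be sketched as a small userspace model. This is
only an illustration, not the kernel code: the struct and the model_* names
are hypothetical stand-ins for struct clock_event_device,
clockevents_program_event() and clockevents_program_min_delta(), and the
real functions handle more cases (device state, max delta, failures of
set_next_event).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct clock_event_device. */
struct model_clockevent {
	uint64_t min_delta_ns;	/* 2560 for the otto timer per timer_list */
	uint64_t next_event_ns;	/* absolute time of the programmed event */
	unsigned long retries;	/* counter reported in /proc/timer_list */
};

/* Models the min-delta fallback: program "now + min_delta", count a retry. */
static int model_program_min_delta(struct model_clockevent *dev,
				   uint64_t now_ns)
{
	dev->next_event_ns = now_ns + dev->min_delta_ns;
	dev->retries++;
	return 0;
}

/*
 * Models the expiry check: an expiry that is not in the future either
 * fails (-1 here, the kernel uses -ETIME) or, when forced, falls back
 * to the minimum delta.
 */
static int model_program_event(struct model_clockevent *dev, uint64_t now_ns,
			       uint64_t expires_ns, bool force)
{
	int64_t delta = (int64_t)(expires_ns - now_ns);

	assert(dev->min_delta_ns > 0);

	if (delta <= 0)
		return force ? model_program_min_delta(dev, now_ns) : -1;

	/* Expiry is in the future: clamp to the device minimum, no retry. */
	if ((uint64_t)delta < dev->min_delta_ns)
		delta = (int64_t)dev->min_delta_ns;
	dev->next_event_ns = now_ns + (uint64_t)delta;
	return 0;
}
```

In this model a relative timeout of 0 (expires_ns == now_ns) with force set
always ends in the min-delta fallback and bumps retries, while any expiry in
the future, such as now + 6000 ns, does not.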
To silence the noise and focus on the real bug I use this workaround
in task_non_contending():
	if ((dl_se == &rq->fair_server) && (zerolag_time == 0))
		zerolag_time = 6000;
Totally crap but serves the purpose. Maybe you can share insights about
this (un)desired behaviour.
Thanks in advance.
Markus
# uptime
00:41:19 up 41 min, load average: 0.00, 0.00, 0.00
# cat /proc/timer_list
...
Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: timer@3100
 max_delta_ns:   85899344321
 min_delta_ns:   2560
 mult:           13421773
 shift:          32
 mode:           3
 next_event:     2469910000000 nsecs
 set_next_event: rttm_next_event
 shutdown:       rttm_state_shutdown
 periodic:       rttm_state_periodic
 oneshot:        rttm_state_oneshot
 event_handler:  hrtimer_interrupt
 retries:        14646
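
For reference, the mult and shift values in the output above relate
nanoseconds to device cycles via (ns * mult) >> shift. A minimal sketch of
that arithmetic, where ns_to_cycles is a hypothetical helper name and not a
kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Scale a nanosecond delta to device cycles: (ns * mult) >> shift. */
static uint64_t ns_to_cycles(uint64_t ns, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	/* 64-bit product; large enough for the deltas listed above. */
	return (ns * (uint64_t)mult) >> shift;
}
```

With mult = 13421773 and shift = 32, the 2560 ns minimum delta works out to
8 cycles and the 6000 ns workaround value to 18 cycles, which suggests a
timer clock of roughly 3.125 MHz.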
* Re: task_non_contending() for fair_server leads to timer retries
From: Juri Lelli @ 2025-07-17 9:05 UTC (permalink / raw)
To: markus.stockhausen
Cc: peterz, 'Chris Packham', bjorn, mingo, vincent.guittot,
anna-maria, frederic, tglx, linux-kernel
Hi,
On 15/07/25 16:39, markus.stockhausen@gmx.de wrote:
> Hi Peter,
>
> I'm currently investigating issues with the timer-rtl-otto driver in
> 6.12 longterm on the Realtek MIPS switch platform (Chris is working
> hard to upstream this). While doing so I observed that timer retries
> continually increase (~6/second) according to /proc/timer_list. The
> system is otherwise totally idle. 6.6 longterm does not show that issue.
> I'm unsure if this is related but documentation reads like "that's bad".
>
> To be sure about this one I nailed it down to the fair server.
Apologies for interjecting before Peter had a chance to reply, but I had
a first look and I wonder if this recent patch from Peter (on
tip/sched/core atm) can already help with the issue, as it should
reduce the number of dl-server dequeues:
cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Can you please check what you see with it?
Thanks!
Juri
* Re: task_non_contending() for fair_server leads to timer retries
From: Bjørn Mork @ 2025-07-17 15:05 UTC (permalink / raw)
To: Juri Lelli
Cc: markus.stockhausen, peterz, 'Chris Packham', mingo,
vincent.guittot, anna-maria, frederic, tglx, linux-kernel
Juri Lelli <juri.lelli@redhat.com> writes:
> On 15/07/25 16:39, markus.stockhausen@gmx.de wrote:
>> Hi Peter,
>>
>> I'm currently investigating issues with the timer-rtl-otto driver in
>> 6.12 longterm on the Realtek MIPS switch platform (Chris is working
>> hard to upstream this). While doing so I observed that timer retries
>> continually increase (~6/second) according to /proc/timer_list. The
>> system is otherwise totally idle. 6.6 longterm does not show that issue.
>> I'm unsure if this is related but documentation reads like "that's bad".
>>
>> To be sure about this one I nailed it down to the fair server.
>
> Apologies for interjecting before Peter had a chance to reply, but I had
> a first look and I wonder if this recent patch from Peter (on
> tip/sched/core atm) can already help with the issue, as it should
> reduce the number of dl-server dequeues:
>
> cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
>
> Can you please check what you see with it?
Spot on. Thanks
I tested cccb45d7c4295 ("sched/deadline: Less agressive dl_server
handling") on top of the 6.12 longterm we're running and the retries
rate is back to "normal".
Bjørn