linux-kernel.vger.kernel.org archive mirror
* task_non_contending() for fair_server leads to timer retries
@ 2025-07-15 14:39 markus.stockhausen
  2025-07-17  9:05 ` Juri Lelli
  0 siblings, 1 reply; 3+ messages in thread
From: markus.stockhausen @ 2025-07-15 14:39 UTC
  To: peterz
  Cc: 'Chris Packham', bjorn, mingo, juri.lelli,
	vincent.guittot, anna-maria, frederic, tglx, linux-kernel

Hi Peter,

I'm currently investigating issues with the timer-rtl-otto driver in 
6.12 longterm on the Realtek MIPS switch platform (Chris is working
hard to upstream this). While doing so I observed that timer retries 
continually increase (~6/second) according to /proc/timer_list. The 
system is otherwise totally idle. 6.6 longterm does not show that issue.
I'm unsure if this is related but documentation reads like "that's bad". 

To be sure about this one I nailed it down to the fair server.

Whenever task_non_contending() handles the fair_server, zerolag_time is
calculated as 0 and a hrtimer_start(timer, 0, ...) call is issued. Going
down the stack, clockevents_program_event() sees that the target time has
already passed, so it calls clockevents_program_min_delta() to program the
minimum delta (2560 ns for the otto timer). That path increments the retry
counter. See the attached output.
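
The path described above can be sketched as a small user-space model. This
is not the kernel code: struct model_dev and model_program_event() are
hypothetical names, and the real clockevents_program_min_delta() also
handles a failing set_next_event() callback, which is left out here. It
only shows why an expiry that is not in the future ends up programmed at
min_delta_ns with the retry counter bumped.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few clock event device fields used here. */
struct model_dev {
	int64_t min_delta_ns;    /* smallest delta the device accepts */
	int64_t next_event_ns;   /* absolute time that was programmed */
	unsigned long retries;   /* mirrors "retries" in /proc/timer_list */
};

/*
 * Simplified model of programming an event: if the requested expiry is
 * not in the future, fall back to the minimum delta and count a retry.
 * Returns the delta (in ns) that was actually programmed.
 */
static int64_t model_program_event(struct model_dev *dev,
				   int64_t expires_ns, int64_t now_ns)
{
	int64_t delta = expires_ns - now_ns;

	if (delta <= 0) {
		/* Target already passed: minimum-delta path. */
		dev->retries++;
		delta = dev->min_delta_ns;
	} else if (delta < dev->min_delta_ns) {
		/* Future expiry, but closer than the device can handle. */
		delta = dev->min_delta_ns;
	}

	dev->next_event_ns = now_ns + delta;
	return delta;
}
```

With min_delta_ns = 2560, an expiry equal to the current time yields a
programmed delta of 2560 ns and one retry, while an expiry 9000 ns in the
future is programmed as requested and leaves the counter alone.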

To silence the noise and focus on the real bug, I use this workaround
in task_non_contending():

if ((dl_se == &rq->fair_server) && (zerolag_time == 0))
	zerolag_time = 6000;

Totally crap, but it serves the purpose. Maybe you can share some insight
into this (un)desired behaviour.

Thanks in advance.

Markus

# uptime
 00:41:19 up 41 min,  load average: 0.00, 0.00, 0.00

# cat /proc/timer_list
...
Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: timer@3100
 max_delta_ns:   85899344321
 min_delta_ns:   2560
 mult:           13421773
 shift:          32
 mode:           3
 next_event:     2469910000000 nsecs
 set_next_event: rttm_next_event
 shutdown:       rttm_state_shutdown
 periodic:       rttm_state_periodic
 oneshot:        rttm_state_oneshot
 event_handler:  hrtimer_interrupt

 retries:        14646
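
For reading the mult/shift values above: clockevents convert a nanosecond
delta to device cycles as (ns * mult) >> shift. The helper below is a
minimal sketch of that arithmetic; ns_to_cycles() is a hypothetical name,
not a kernel function. Under the assumption that the values shown are the
ones in effect, mult / 2^32 is about 0.003125 cycles per ns, which would
be consistent with a 3.125 MHz timer clock, and min_delta_ns of 2560
corresponds to 8 cycles.

```c
#include <stdint.h>

/*
 * Convert a nanosecond delta to device cycles using the mult/shift
 * scaling used by clock event devices. Hypothetical helper for
 * illustration; the caller must keep ns * mult within 64 bits.
 */
static uint64_t ns_to_cycles(uint64_t ns, uint32_t mult, uint32_t shift)
{
	return (ns * (uint64_t)mult) >> shift;
}
```

For example, ns_to_cycles(2560, 13421773, 32) evaluates to 8.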



* Re: task_non_contending() for fair_server leads to timer retries
  2025-07-15 14:39 task_non_contending() for fair_server leads to timer retries markus.stockhausen
@ 2025-07-17  9:05 ` Juri Lelli
  2025-07-17 15:05   ` Bjørn Mork
  0 siblings, 1 reply; 3+ messages in thread
From: Juri Lelli @ 2025-07-17  9:05 UTC
  To: markus.stockhausen
  Cc: peterz, 'Chris Packham', bjorn, mingo, vincent.guittot,
	anna-maria, frederic, tglx, linux-kernel

Hi,

On 15/07/25 16:39, markus.stockhausen@gmx.de wrote:
> Hi Peter,
> 
> I'm currently investigating issues with the timer-rtl-otto driver in 
> 6.12 longterm on the Realtek MIPS switch platform (Chris is working
> hard to upstream this). While doing so I observed that timer retries 
> continually increase (~6/second) according to /proc/timer_list. The 
> system is otherwise totally idle. 6.6 longterm does not show that issue.
> I'm unsure if this is related but documentation reads like "that's bad". 
> 
> To be sure about this one I nailed it down to the fair server.

Apologies for interjecting before Peter had a chance to reply, but I had
a first look and I wonder if this recent patch from Peter (on
tip/sched/core atm) can already help with the issue, as it should
reduce the number of dl-server dequeues:

cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")

Can you please check what you see with it?

Thanks!
Juri



* Re: task_non_contending() for fair_server leads to timer retries
  2025-07-17  9:05 ` Juri Lelli
@ 2025-07-17 15:05   ` Bjørn Mork
  0 siblings, 0 replies; 3+ messages in thread
From: Bjørn Mork @ 2025-07-17 15:05 UTC
  To: Juri Lelli
  Cc: markus.stockhausen, peterz, 'Chris Packham', mingo,
	vincent.guittot, anna-maria, frederic, tglx, linux-kernel

Juri Lelli <juri.lelli@redhat.com> writes:
> On 15/07/25 16:39, markus.stockhausen@gmx.de wrote:
>> Hi Peter,
>> 
>> I'm currently investigating issues with the timer-rtl-otto driver in 
>> 6.12 longterm on the Realtek MIPS switch platform (Chris is working
>> hard to upstream this). While doing so I observed that timer retries 
>> continually increase (~6/second) according to /proc/timer_list. The 
>> system is otherwise totally idle. 6.6 longterm does not show that issue.
>> I'm unsure if this is related but documentation reads like "that's bad". 
>> 
>> To be sure about this one I nailed it down to the fair server.
>
> Apologies for interjecting before Peter had a chance to reply, but I had
> a first look and I wonder if this recent patch from Peter (on
> tip/sched/core atm) can already help with the issue, as it should
> reduce the number of dl-server dequeues:
>
> cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
>
> Can you please check what you see with it?

Spot on. Thanks.

I tested cccb45d7c4295 ("sched/deadline: Less agressive dl_server
handling") on top of the 6.12 longterm we're running, and the retry
rate is back to "normal".


Bjørn
