All of lore.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
@ 2025-08-16 16:38 Jirka Hladky
  2025-08-24  9:44 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Jirka Hladky @ 2025-08-16 16:38 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, john.stultz, anna-maria
  Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
	Luke Yang, Jan Jurca, Joe Mario

Hello,

I'm reporting a performance regression in kernel 6.13 that causes a
76% performance loss in timer-heavy workloads. Through kernel
bisection, we have identified the root cause as commit
513793bc6ab331b947111e8efaf8fcef33fb83e5.

Summary

Regression: 76% performance drop in applications using nanosleep()/POSIX timers
 * 4.3x increase in timer overruns and voluntary context switches
  * Dramatic drop in timer completion rate (76% -> 20%)
  * Over 99% of timers fail to expire when timer migration is disabled in 6.13
Root Cause: commit 513793bc6ab3 "posix-timers: Make signal delivery consistent"
Impact: Timer signal delivery mechanism broken
Reproducer: stress-ng --timer workload on any system.

/usr/bin/time -v ./stress-ng --timer 1 -t 23 --verbose --metrics-brief
--yaml /dev/stdout 2>&1 | tee $(uname -r)_timer.log
grep -Poh 'bogo-ops-per-second-real-time: \K[0-9.]+' $(uname -r)_timer.log

6.12 kernel:
User time (seconds): 9.71
Percent of CPU this job got: 73%
stress-ng: metrc: [39351] stressor       bogo ops real time  usr time
sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [39351]                           (secs)    (secs)
 (secs)   (real time) (usr+sys time)
stress-ng: metrc: [39351] timer          11253022     23.01      9.71
    7.01    489125.18      673113.26
timer: 3655093 timer overruns (instance 0)
Voluntary context switches: 720747

6.13 kernel:
User time (seconds): 4.02
Percent of CPU this job got: 28%
stress-ng: metrc: [5416] stressor       bogo ops real time  usr time
sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [5416]                           (secs)    (secs)
(secs)   (real time) (usr+sys time)
stress-ng: metrc: [5416] timer           3103864     23.00      4.02
   2.08    134950.34      509002.47
timer: 15578896 timer overruns (instance 0)
Voluntary context switches: 3100815

CPU utilization dropped significantly, while timer overruns and
voluntary context switches increased by a factor of 4.3x.

It's interesting to examine hrtimer events with perf-record:
perf sched record -e timer:hrtimer_start -e timer:hrtimer_expire_entry
-e timer:hrtimer_expire_exit --output="hrtimer-$(uname -r).perf"
./stress-ng --timer 1 -t 23 --metrics-brief --yaml /dev/stdout
perf sched script -i "hrtimer-$(uname -r).perf" > "hrtimer-$(uname -r).txt"

grep -c hrtimer_start hrtimer*txt
6.12: 10898132
6.13: 17105314

grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
6.12: 8358469
6.13: 3476757

The number of timers started increased significantly in 6.13, but most
timers do not expire. Completion rate went down from 76% to 20%

The next test was to disable timer migrations with the 6.13 kernel:
echo 0 > /proc/sys/kernel/timer_migration

6.13, /proc/sys/kernel/timer_migration set to zero
User time (seconds): 10.42
Percent of CPU this job got: 59%
stress-ng: metrc: [5927] stressor       bogo ops real time  usr time
sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [5927]                           (secs)    (secs)
(secs)   (real time) (usr+sys time)
stress-ng: metrc: [5927] timer           7004133     23.00     10.41
   3.11    304526.98      518257.73
timer: 7102554 timer overruns (instance 0)
Voluntary context switches: 7009365

Results improve, but there is still a 40% performance drop compared to
6.12 (489125 versus 304526 bogo ops/s).

I have also tried to add CPU pinning, but it had almost no effect:
6.13, /proc/sys/kernel/timer_migration set to zero, processed pin to one CPU:
$ taskset -c 10 /usr/bin/time -v ./stress-ng --timer 1 -t 23 --verbose
--metrics-brief 2>&1 | tee $(uname
-r)_timer_timer_migration_off_pinned.log
User time (seconds): 10.34
Percent of CPU this job got: 61%
stress-ng: metrc: [6230] stressor       bogo ops real time  usr time
sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [6230]                           (secs)    (secs)
(secs)   (real time) (usr+sys time)
stress-ng: metrc: [6230] timer           7129797     23.00     10.33
   3.53    309991.17      514479.47
timer: 7152958 timer overruns (instance 0)
Voluntary context switches: 7128460

Using perf record to trace hrtimer events reveals the following:

Kernel      hrtimer_start    hrtimer_expire_entry    Completion Rate
6.12         10,898,132         8,358,469               76.7%
6.13         17,105,314         3,476,757               20.3%
6.13+mig=0   17,067,784            30,841                0.18%

Over 99% of timers fail to expire properly in 6.13 with timer
migration disabled, indicating broken timer signal delivery.

We have collected results on a dual-socket Intel Emerald Rapids system
with 256 CPUs, but we observed the same problem on other systems as
well. Intel and AMD x86_64, aarch64, and ppc64le are all affected. The
regression is more pronounced on systems with higher CPU counts.

I have additional performance traces, perf data, and test
configurations available if needed for debugging. I'm happy to test
patches or provide more detailed analysis.

We have also tested kernel 6.16, and it behaves the same as kernel 6.13.

Thank you!
Jirka


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
  2025-08-16 16:38 [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent" Jirka Hladky
@ 2025-08-24  9:44 ` Thomas Gleixner
  2025-08-25  7:14   ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2025-08-24  9:44 UTC (permalink / raw)
  To: Jirka Hladky, linux-kernel, john.stultz, anna-maria
  Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
	Luke Yang, Jan Jurca, Joe Mario

On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> I'm reporting a performance regression in kernel 6.13 that causes a
> 76% performance loss in timer-heavy workloads.

Are you talking about real world workloads or about the stress-ng bogosity?

> Through kernel bisection, we have identified the root cause as commit
> 513793bc6ab331b947111e8efaf8fcef33fb83e5.
>
> Summary
>
> Regression: 76% performance drop in applications using nanosleep()/POSIX timers
>  * 4.3x increase in timer overruns and voluntary context switches
>   * Dramatic drop in timer completion rate (76% -> 20%)
>   * Over 99% of timers fail to expire when timer migration is disabled in 6.13
> Root Cause: commit 513793bc6ab3 "posix-timers: Make signal delivery consistent"
> Impact: Timer signal delivery mechanism broken
> Reproducer: stress-ng --timer workload on any system.

That does:

arm_timer()
{
     timer.it_value.tv_sec = ...;
     timer.it_value.tv_nsec = ...;

     timer.it_interval.tv_sec = timer.it_value.tv_sec;
     timer.it_interval.tv_nsec = timer.it_value.tv_nsec;

     timer_settime(....&timer);
}

and in the signal handler it does:

     ...
     timer_getoverrun();
     arm_timer();

So from the kernel POV this means:

user space starts timer
arm_timer()
    ....    
        hrtimer_start()
    ...
        hrtimer_expire()
          raise_signal()

   signal_delivery()
        if (interval > 0)
#1          hrtimer_start()

user space signal_handler()
     
arm_timer()

        hrtimer_cancel();

#2      clear pending and overrun

        hrtimer_start();

So it's exactly doing what user space asks for.

Older kernels accounted for overruns and pending signals which might
have accumulated between #1 and #2, which is undefined behaviour as user
space cannot longer differentiate to which arming the expiry or the
overruns belong.

So clearing it when rearmed is the obvious correct thing to do because
it makes it consistent, no?

The same applies for the disarm scenario:

arm_timer()
     ...
     expires()
       raise_signal()

disarm_timer()
     ...
     discard signal

Older kernels did not discard it, but that makes zero sense because
after disarming the timer both the signal and the overrun becomes
immediately meaningless, no?

And this has nothing to do with timer migration or whatever, that's just
a matter of correctness.

If you can point me to a real world workload, which uses timers
correctly and does not just do random stuff with them, I'm happy to look
into it.

But this stress-ng thing is just made up nonsense which created bogus
statistics forever. So comparing bogus numbers is not an indicator for
a real regression.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
  2025-08-24  9:44 ` Thomas Gleixner
@ 2025-08-25  7:14   ` Thomas Gleixner
  2025-08-25 11:35     ` Jirka Hladky
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2025-08-25  7:14 UTC (permalink / raw)
  To: Jirka Hladky, linux-kernel, john.stultz, anna-maria
  Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
	Luke Yang, Jan Jurca, Joe Mario

On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> And this has nothing to do with timer migration or whatever, that's just
> a matter of correctness.

Just to come back to timer migration. That's completely irrelevant here
because /proc/sys/kernel/timer_migration only affects the timer wheel
and _not_ hrtimers, which are used here.

And just a few more comments about your findings:

> grep -c hrtimer_start hrtimer*txt
> 6.12: 10898132
> 6.13: 17105314
> 
> grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> 6.12: 8358469
> 6.13: 3476757
> 
> The number of timers started increased significantly in 6.13, but most
> timers do not expire. Completion rate went down from 76% to 20%

Did you actually look _which_ timers were started and which ones did
expire and which ones not?

Data for a 2 seconds run (couldn't be bothered to wait 23 seconds)

On 6.10:

All start/expire:

# grep -c 'hrtimer_start' t.txt 
248039
# grep -c 'hrtimer_expire' t.txt 
247530

stress-ng Posix timer related:

# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt 
246739
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt 
246739

stress-ng nanosleep related:

# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt 
2
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt 
2

On 6.17-rc1:

All start/expire:

# grep -c 'hrtimer_start' t.txt 
457456
# grep -c 'hrtimer_expire' t.txt 
304959

stress-ng Posix timer related:

# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt 
304673
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt 
304674

stress-ng nanosleep related:

# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt 
152241
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt 
1

The 150k timers which do not expire are related to the restarted
nanosleep(), because the nanosleep is canceled due to the signal and has
to be re-started.

On 6.10 that does not even reach the nanosleep in the test thread
because the thing is too busy with bogus signal handling.

Trace for 6.10

 stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
 stress-ng-timer-2229    [110] d..2.   187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS

Signal handler re-arms the timer

 stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime -> 0x0
 stress-ng-timer-2229    [110] .....   187.938506: sys_rt_sigreturn()

Returns from signal handler

 stress-ng-timer-2229    [110] d..1.   187.938506: posixtimer_rearm <-dequeue_signal

Dequeues the signal which was related to the arming _before_ the signal
handler re-arms it. So it's incorrectly delivered.

 stress-ng-timer-2229    [110] d.h..   187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468

Now the timer which was armed in the signal handler above expires

 stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun(timer_id: 0)
 stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun -> 0x0

While the signal handler handles the bogus left over signal

Lather, rinse and repeat.

 stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
 stress-ng-timer-2229    [110] d..2.   187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
 stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime -> 0x0
 stress-ng-timer-2229    [110] .....   187.938508: sys_rt_sigreturn()
 stress-ng-timer-2229    [110] d..1.   187.938508: posixtimer_rearm <-dequeue_signal
 stress-ng-timer-2229    [110] d.h..   187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952

vs. 6.17

 stress-ng-timer-1828    [029] .....    84.089978: sys_rt_sigreturn()
 stress-ng-timer-1828    [029] d..1.    84.089979: posixtimer_deliver_signal <-dequeue_signal

Signal, which was generated by the original armed timer is correctly ignored

 stress-ng-timer-1828    [029] d..1.    84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL

Nanosleep is restarted

          <idle>-0       [029] d.h1.    84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265

Timer which was armed in the signal handler expires

 stress-ng-timer-1828    [029] d..1.    84.089981: posixtimer_deliver_signal <-dequeue_signal

Signal is delivered and timer is re-armed:

 stress-ng-timer-1828    [029] d..2.    84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS

Signal is handled

 stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun(timer_id: 0)
 stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun -> 0x2
 stress-ng-timer-1828    [029] d.h..    84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856

Re-armed timer expires and queues a signal

 stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)

Timer is re-armed

 stress-ng-timer-1828    [029] d..2.    84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
 stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime -> 0x0
 stress-ng-timer-1828    [029] .....    84.089983: sys_rt_sigreturn()

Signal, which was generated by the timer armed on signal dequeue is
correctly ignored

 stress-ng-timer-1828    [029] d..1.    84.089984: posixtimer_deliver_signal <-dequeue_signal


Thanks,

        tglx


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
  2025-08-25  7:14   ` Thomas Gleixner
@ 2025-08-25 11:35     ` Jirka Hladky
  2025-08-25 20:04       ` Jirka Hladky
  0 siblings, 1 reply; 5+ messages in thread
From: Jirka Hladky @ 2025-08-25 11:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, anna-maria, Philip Auld,
	Prarit Bhargava, Luis Goncalves, Miroslav Lichvar, Luke Yang,
	Jan Jurca, Joe Mario

 Hallo Thomas,

Vielen Dank for the detailed analysis, explanations, and insights. You
are right, I fully agree with your conclusions.

I apologize for reporting this as a regression. I've filed an issue
with stress-ng: https://github.com/ColinIanKing/stress-ng/issues/560
to request improving or documenting the stress-ng timer test
accordingly.

Liebe Grüße,
Jirka


On Mon, Aug 25, 2025 at 9:15 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> > On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> > And this has nothing to do with timer migration or whatever, that's just
> > a matter of correctness.
>
> Just to come back to timer migration. That's completely irrelevant here
> because /proc/sys/kernel/timer_migration only affects the timer wheel
> and _not_ hrtimers, which are used here.
>
> And just a few more comments about your findings:
>
> > grep -c hrtimer_start hrtimer*txt
> > 6.12: 10898132
> > 6.13: 17105314
> >
> > grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> > hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> > 6.12: 8358469
> > 6.13: 3476757
> >
> > The number of timers started increased significantly in 6.13, but most
> > timers do not expire. Completion rate went down from 76% to 20%
>
> Did you actually look _which_ timers were started and which ones did
> expire and which ones not?
>
> Data for a 2 seconds run (couldn't be bothered to wait 23 seconds)
>
> On 6.10:
>
> All start/expire:
>
> # grep -c 'hrtimer_start' t.txt
> 248039
> # grep -c 'hrtimer_expire' t.txt
> 247530
>
> stress-ng Posix timer related:
>
> # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> 246739
> # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> 246739
>
> stress-ng nanosleep related:
>
> # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> 2
> # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> 2
>
> On 6.17-rc1:
>
> All start/expire:
>
> # grep -c 'hrtimer_start' t.txt
> 457456
> # grep -c 'hrtimer_expire' t.txt
> 304959
>
> stress-ng Posix timer related:
>
> # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> 304673
> # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> 304674
>
> stress-ng nanosleep related:
>
> # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> 152241
> # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> 1
>
> The 150k timers which do not expire are related to the restarted
> nanosleep(), because the nanosleep is canceled due to the signal and has
> to be re-started.
>
> On 6.10 that does not even reach the nanosleep in the test thread
> because the thing is too busy with bogus signal handling.
>
> Trace for 6.10
>
>  stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
>  stress-ng-timer-2229    [110] d..2.   187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS
>
> Signal handler re-arms the timer
>
>  stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime -> 0x0
>  stress-ng-timer-2229    [110] .....   187.938506: sys_rt_sigreturn()
>
> Returns from signal handler
>
>  stress-ng-timer-2229    [110] d..1.   187.938506: posixtimer_rearm <-dequeue_signal
>
> Dequeues the signal which was related to the arming _before_ the signal
> handler re-arms it. So it's incorrectly delivered.
>
>  stress-ng-timer-2229    [110] d.h..   187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468
>
> Now the timer which was armed in the signal handler above expires
>
>  stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun(timer_id: 0)
>  stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun -> 0x0
>
> While the signal handler handles the bogus left over signal
>
> Lather, rinse and repeat.
>
>  stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
>  stress-ng-timer-2229    [110] d..2.   187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
>  stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime -> 0x0
>  stress-ng-timer-2229    [110] .....   187.938508: sys_rt_sigreturn()
>  stress-ng-timer-2229    [110] d..1.   187.938508: posixtimer_rearm <-dequeue_signal
>  stress-ng-timer-2229    [110] d.h..   187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952
>
> vs. 6.17
>
>  stress-ng-timer-1828    [029] .....    84.089978: sys_rt_sigreturn()
>  stress-ng-timer-1828    [029] d..1.    84.089979: posixtimer_deliver_signal <-dequeue_signal
>
> Signal, which was generated by the original armed timer is correctly ignored
>
>  stress-ng-timer-1828    [029] d..1.    84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL
>
> Nanosleep is restarted
>
>           <idle>-0       [029] d.h1.    84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265
>
> Timer which was armed in the signal handler expires
>
>  stress-ng-timer-1828    [029] d..1.    84.089981: posixtimer_deliver_signal <-dequeue_signal
>
> Signal is delivered and timer is re-armed:
>
>  stress-ng-timer-1828    [029] d..2.    84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS
>
> Signal is handled
>
>  stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun(timer_id: 0)
>  stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun -> 0x2
>  stress-ng-timer-1828    [029] d.h..    84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856
>
> Re-armed timer expires and queues a signal
>
>  stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)
>
> Timer is re-armed
>
>  stress-ng-timer-1828    [029] d..2.    84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
>  stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime -> 0x0
>  stress-ng-timer-1828    [029] .....    84.089983: sys_rt_sigreturn()
>
> Signal, which was generated by the timer armed on signal dequeue is
> correctly ignored
>
>  stress-ng-timer-1828    [029] d..1.    84.089984: posixtimer_deliver_signal <-dequeue_signal
>
>
> Thanks,
>
>         tglx
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
  2025-08-25 11:35     ` Jirka Hladky
@ 2025-08-25 20:04       ` Jirka Hladky
  0 siblings, 0 replies; 5+ messages in thread
From: Jirka Hladky @ 2025-08-25 20:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, john.stultz, anna-maria, Philip Auld,
	Prarit Bhargava, Luis Goncalves, Miroslav Lichvar, Luke Yang,
	Jan Jurca, Joe Mario

Let me close a loop on this one.

Colin Ian King has updated the timer test in Stress_ng accordingly:
https://github.com/ColinIanKing/stress-ng/commit/048b4cb1bbc83cee377330b20e1bd366b651e4ad

Thank you, everybody!
Jirka

On Mon, Aug 25, 2025 at 1:35 PM Jirka Hladky <jhladky@redhat.com> wrote:
>
>  Hallo Thomas,
>
> Vielen Dank for the detailed analysis, explanations, and insights. You
> are right, I fully agree with your conclusions.
>
> I apologize for reporting this as a regression. I've filed an issue
> with stress-ng: https://github.com/ColinIanKing/stress-ng/issues/560
> to request improving or documenting the stress-ng timer test
> accordingly.
>
> Liebe Grüße,
> Jirka
>
>
> On Mon, Aug 25, 2025 at 9:15 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> > > On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> > > And this has nothing to do with timer migration or whatever, that's just
> > > a matter of correctness.
> >
> > Just to come back to timer migration. That's completely irrelevant here
> > because /proc/sys/kernel/timer_migration only affects the timer wheel
> > and _not_ hrtimers, which are used here.
> >
> > And just a few more comments about your findings:
> >
> > > grep -c hrtimer_start hrtimer*txt
> > > 6.12: 10898132
> > > 6.13: 17105314
> > >
> > > grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> > > hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> > > 6.12: 8358469
> > > 6.13: 3476757
> > >
> > > The number of timers started increased significantly in 6.13, but most
> > > timers do not expire. Completion rate went down from 76% to 20%
> >
> > Did you actually look _which_ timers were started and which ones did
> > expire and which ones not?
> >
> > Data for a 2 seconds run (couldn't be bothered to wait 23 seconds)
> >
> > On 6.10:
> >
> > All start/expire:
> >
> > # grep -c 'hrtimer_start' t.txt
> > 248039
> > # grep -c 'hrtimer_expire' t.txt
> > 247530
> >
> > stress-ng Posix timer related:
> >
> > # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> > 246739
> > # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> > 246739
> >
> > stress-ng nanosleep related:
> >
> > # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> > 2
> > # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> > 2
> >
> > On 6.17-rc1:
> >
> > All start/expire:
> >
> > # grep -c 'hrtimer_start' t.txt
> > 457456
> > # grep -c 'hrtimer_expire' t.txt
> > 304959
> >
> > stress-ng Posix timer related:
> >
> > # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> > 304673
> > # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> > 304674
> >
> > stress-ng nanosleep related:
> >
> > # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> > 152241
> > # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> > 1
> >
> > The 150k timers which do not expire are related to the restarted
> > nanosleep(), because the nanosleep is canceled due to the signal and has
> > to be re-started.
> >
> > On 6.10 that does not even reach the nanosleep in the test thread
> > because the thing is too busy with bogus signal handling.
> >
> > Trace for 6.10
> >
> >  stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
> >  stress-ng-timer-2229    [110] d..2.   187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS
> >
> > Signal handler re-arms the timer
> >
> >  stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime -> 0x0
> >  stress-ng-timer-2229    [110] .....   187.938506: sys_rt_sigreturn()
> >
> > Returns from signal handler
> >
> >  stress-ng-timer-2229    [110] d..1.   187.938506: posixtimer_rearm <-dequeue_signal
> >
> > Dequeues the signal which was related to the arming _before_ the signal
> > handler re-arms it. So it's incorrectly delivered.
> >
> >  stress-ng-timer-2229    [110] d.h..   187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468
> >
> > Now the timer which was armed in the signal handler above expires
> >
> >  stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun(timer_id: 0)
> >  stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun -> 0x0
> >
> > While the signal handler handles the bogus left over signal
> >
> > Lather, rinse and repeat.
> >
> >  stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
> >  stress-ng-timer-2229    [110] d..2.   187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
> >  stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime -> 0x0
> >  stress-ng-timer-2229    [110] .....   187.938508: sys_rt_sigreturn()
> >  stress-ng-timer-2229    [110] d..1.   187.938508: posixtimer_rearm <-dequeue_signal
> >  stress-ng-timer-2229    [110] d.h..   187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952
> >
> > vs. 6.17
> >
> >  stress-ng-timer-1828    [029] .....    84.089978: sys_rt_sigreturn()
> >  stress-ng-timer-1828    [029] d..1.    84.089979: posixtimer_deliver_signal <-dequeue_signal
> >
> > Signal, which was generated by the original armed timer is correctly ignored
> >
> >  stress-ng-timer-1828    [029] d..1.    84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL
> >
> > Nanosleep is restarted
> >
> >           <idle>-0       [029] d.h1.    84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265
> >
> > Timer which was armed in the signal handler expires
> >
> >  stress-ng-timer-1828    [029] d..1.    84.089981: posixtimer_deliver_signal <-dequeue_signal
> >
> > Signal is delivered and timer is re-armed:
> >
> >  stress-ng-timer-1828    [029] d..2.    84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS
> >
> > Signal is handled
> >
> >  stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun(timer_id: 0)
> >  stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun -> 0x2
> >  stress-ng-timer-1828    [029] d.h..    84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856
> >
> > Re-armed timer expires and queues a signal
> >
> >  stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)
> >
> > Timer is re-armed
> >
> >  stress-ng-timer-1828    [029] d..2.    84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
> >  stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime -> 0x0
> >  stress-ng-timer-1828    [029] .....    84.089983: sys_rt_sigreturn()
> >
> > Signal, which was generated by the timer armed on signal dequeue is
> > correctly ignored
> >
> >  stress-ng-timer-1828    [029] d..1.    84.089984: posixtimer_deliver_signal <-dequeue_signal
> >
> >
> > Thanks,
> >
> >         tglx
> >
>
>
> --
> -Jirka



-- 
-Jirka


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-08-25 20:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-16 16:38 [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent" Jirka Hladky
2025-08-24  9:44 ` Thomas Gleixner
2025-08-25  7:14   ` Thomas Gleixner
2025-08-25 11:35     ` Jirka Hladky
2025-08-25 20:04       ` Jirka Hladky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.