* [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
@ 2025-08-16 16:38 Jirka Hladky
2025-08-24 9:44 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: Jirka Hladky @ 2025-08-16 16:38 UTC (permalink / raw)
To: linux-kernel, Thomas Gleixner, john.stultz, anna-maria
Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
Luke Yang, Jan Jurca, Joe Mario
Hello,
I'm reporting a performance regression in kernel 6.13 that causes a
76% performance loss in timer-heavy workloads. Through kernel
bisection, we have identified the root cause as commit
513793bc6ab331b947111e8efaf8fcef33fb83e5.
Summary
Regression: 76% performance drop in applications using nanosleep()/POSIX timers
* 4.3x increase in timer overruns and voluntary context switches
* Dramatic drop in timer completion rate (76% -> 20%)
* Over 99% of timers fail to expire when timer migration is disabled in 6.13
Root Cause: commit 513793bc6ab3 "posix-timers: Make signal delivery consistent"
Impact: Timer signal delivery mechanism broken
Reproducer: stress-ng --timer workload on any system.
/usr/bin/time -v ./stress-ng --timer 1 -t 23 --verbose --metrics-brief
--yaml /dev/stdout 2>&1 | tee $(uname -r)_timer.log
grep -Poh 'bogo-ops-per-second-real-time: \K[0-9.]+' $(uname -r)_timer.log
6.12 kernel:
User time (seconds): 9.71
Percent of CPU this job got: 73%
stress-ng: metrc: [39351] stressor bogo ops real time usr time
sys time bogo ops/s bogo ops/s
stress-ng: metrc: [39351] (secs) (secs)
(secs) (real time) (usr+sys time)
stress-ng: metrc: [39351] timer 11253022 23.01 9.71
7.01 489125.18 673113.26
timer: 3655093 timer overruns (instance 0)
Voluntary context switches: 720747
6.13 kernel:
User time (seconds): 4.02
Percent of CPU this job got: 28%
stress-ng: metrc: [5416] stressor bogo ops real time usr time
sys time bogo ops/s bogo ops/s
stress-ng: metrc: [5416] (secs) (secs)
(secs) (real time) (usr+sys time)
stress-ng: metrc: [5416] timer 3103864 23.00 4.02
2.08 134950.34 509002.47
timer: 15578896 timer overruns (instance 0)
Voluntary context switches: 3100815
CPU utilization dropped significantly, while timer overruns and
voluntary context switches increased by a factor of 4.3x.
It's interesting to examine hrtimer events with perf-record:
perf sched record -e timer:hrtimer_start -e timer:hrtimer_expire_entry
-e timer:hrtimer_expire_exit --output="hrtimer-$(uname -r).perf"
./stress-ng --timer 1 -t 23 --metrics-brief --yaml /dev/stdout
perf sched script -i "hrtimer-$(uname -r).perf" > "hrtimer-$(uname -r).txt"
grep -c hrtimer_start hrtimer*txt
6.12: 10898132
6.13: 17105314
grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
6.12: 8358469
6.13: 3476757
The number of timers started increased significantly in 6.13, but most
timers do not expire. Completion rate went down from 76% to 20%
The next test was to disable timer migrations with the 6.13 kernel:
echo 0 > /proc/sys/kernel/timer_migration
6.13, /proc/sys/kernel/timer_migration set to zero
User time (seconds): 10.42
Percent of CPU this job got: 59%
stress-ng: metrc: [5927] stressor bogo ops real time usr time
sys time bogo ops/s bogo ops/s
stress-ng: metrc: [5927] (secs) (secs)
(secs) (real time) (usr+sys time)
stress-ng: metrc: [5927] timer 7004133 23.00 10.41
3.11 304526.98 518257.73
timer: 7102554 timer overruns (instance 0)
Voluntary context switches: 7009365
Results improve, but there is still a 40% performance drop compared to
6.12 (489125 versus 304526 bogo ops/s).
I have also tried to add CPU pinning, but it had almost no effect:
6.13, /proc/sys/kernel/timer_migration set to zero, processed pin to one CPU:
$ taskset -c 10 /usr/bin/time -v ./stress-ng --timer 1 -t 23 --verbose
--metrics-brief 2>&1 | tee $(uname
-r)_timer_timer_migration_off_pinned.log
User time (seconds): 10.34
Percent of CPU this job got: 61%
stress-ng: metrc: [6230] stressor bogo ops real time usr time
sys time bogo ops/s bogo ops/s
stress-ng: metrc: [6230] (secs) (secs)
(secs) (real time) (usr+sys time)
stress-ng: metrc: [6230] timer 7129797 23.00 10.33
3.53 309991.17 514479.47
timer: 7152958 timer overruns (instance 0)
Voluntary context switches: 7128460
Using perf record to trace hrtimer events reveals the following:
Kernel hrtimer_start hrtimer_expire_entry Completion Rate
6.12 10,898,132 8,358,469 76.7%
6.13 17,105,314 3,476,757 20.3%
6.13+mig=0 17,067,784 30,841 0.18%
Over 99% of timers fail to expire properly in 6.13 with timer
migration disabled, indicating broken timer signal delivery.
We have collected results on a dual-socket Intel Emerald Rapids system
with 256 CPUs, but we observed the same problem on other systems as
well. Intel and AMD x86_64, aarch64, and ppc64le are all affected. The
regression is more pronounced on systems with higher CPU counts.
I have additional performance traces, perf data, and test
configurations available if needed for debugging. I'm happy to test
patches or provide more detailed analysis.
We have also tested kernel 6.16, and it behaves the same as kernel 6.13.
Thank you!
Jirka
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
2025-08-16 16:38 [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent" Jirka Hladky
@ 2025-08-24 9:44 ` Thomas Gleixner
2025-08-25 7:14 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2025-08-24 9:44 UTC (permalink / raw)
To: Jirka Hladky, linux-kernel, john.stultz, anna-maria
Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
Luke Yang, Jan Jurca, Joe Mario
On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> I'm reporting a performance regression in kernel 6.13 that causes a
> 76% performance loss in timer-heavy workloads.
Are you talking about real world workloads or about the stress-ng bogosity?
> Through kernel bisection, we have identified the root cause as commit
> 513793bc6ab331b947111e8efaf8fcef33fb83e5.
>
> Summary
>
> Regression: 76% performance drop in applications using nanosleep()/POSIX timers
> * 4.3x increase in timer overruns and voluntary context switches
> * Dramatic drop in timer completion rate (76% -> 20%)
> * Over 99% of timers fail to expire when timer migration is disabled in 6.13
> Root Cause: commit 513793bc6ab3 "posix-timers: Make signal delivery consistent"
> Impact: Timer signal delivery mechanism broken
> Reproducer: stress-ng --timer workload on any system.
That does:
arm_timer()
{
timer.it_value.tv_sec = ...;
timer.it_value.tv_nsec = ...;
timer.it_interval.tv_sec = timer.it_value.tv_sec;
timer.it_interval.tv_nsec = timer.it_value.tv_nsec;
timer_settime(....&timer);
}
and in the signal handler it does:
...
timer_getoverrun();
arm_timer();
So from the kernel POV this means:
user space starts timer
arm_timer()
....
hrtimer_start()
...
hrtimer_expire()
raise_signal()
signal_delivery()
if (interval > 0)
#1 hrtimer_start()
user space signal_handler()
arm_timer()
hrtimer_cancel();
#2 clear pending and overrun
hrtimer_start();
So it's exactly doing what user space asks for.
Older kernels accounted for overruns and pending signals which might
have accumulated between #1 and #2, which is undefined behaviour as user
space cannot longer differentiate to which arming the expiry or the
overruns belong.
So clearing it when rearmed is the obvious correct thing to do because
it makes it consistent, no?
The same applies for the disarm scenario:
arm_timer()
...
expires()
raise_signal()
disarm_timer()
...
discard signal
Older kernels did not discard it, but that makes zero sense because
after disarming the timer both the signal and the overrun becomes
immediately meaningless, no?
And this has nothing to do with timer migration or whatever, that's just
a matter of correctness.
If you can point me to a real world workload, which uses timers
correctly and does not just do random stuff with them, I'm happy to look
into it.
But this stress-ng thing is just made up nonsense which created bogus
statistics forever. So comparing bogus numbers is not an indicator for
a real regression.
Thanks,
tglx
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
2025-08-24 9:44 ` Thomas Gleixner
@ 2025-08-25 7:14 ` Thomas Gleixner
2025-08-25 11:35 ` Jirka Hladky
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2025-08-25 7:14 UTC (permalink / raw)
To: Jirka Hladky, linux-kernel, john.stultz, anna-maria
Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
Luke Yang, Jan Jurca, Joe Mario
On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> And this has nothing to do with timer migration or whatever, that's just
> a matter of correctness.
Just to come back to timer migration. That's completely irrelevant here
because /proc/sys/kernel/timer_migration only affects the timer wheel
and _not_ hrtimers, which are used here.
And just a few more comments about your findings:
> grep -c hrtimer_start hrtimer*txt
> 6.12: 10898132
> 6.13: 17105314
>
> grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> 6.12: 8358469
> 6.13: 3476757
>
> The number of timers started increased significantly in 6.13, but most
> timers do not expire. Completion rate went down from 76% to 20%
Did you actually look _which_ timers were started and which ones did
expire and which ones not?
Data for a 2 seconds run (couldn't be bothered to wait 23 seconds)
On 6.10:
All start/expire:
# grep -c 'hrtimer_start' t.txt
248039
# grep -c 'hrtimer_expire' t.txt
247530
stress-ng Posix timer related:
# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
246739
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
246739
stress-ng nanosleep related:
# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
2
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
2
On 6.17-rc1:
All start/expire:
# grep -c 'hrtimer_start' t.txt
457456
# grep -c 'hrtimer_expire' t.txt
304959
stress-ng Posix timer related:
# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
304673
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
304674
stress-ng nanosleep related:
# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
152241
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
1
The 150k timers which do not expire are related to the restarted
nanosleep(), because the nanosleep is canceled due to the signal and has
to be re-started.
On 6.10 that does not even reach the nanosleep in the test thread
because the thing is too busy with bogus signal handling.
Trace for 6.10
stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
stress-ng-timer-2229 [110] d..2. 187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS
Signal handler re-arms the timer
stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime -> 0x0
stress-ng-timer-2229 [110] ..... 187.938506: sys_rt_sigreturn()
Returns from signal handler
stress-ng-timer-2229 [110] d..1. 187.938506: posixtimer_rearm <-dequeue_signal
Dequeues the signal which was related to the arming _before_ the signal
handler re-arms it. So it's incorrectly delivered.
stress-ng-timer-2229 [110] d.h.. 187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468
Now the timer which was armed in the signal handler above expires
stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun(timer_id: 0)
stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun -> 0x0
While the signal handler handles the bogus left over signal
Lather, rinse and repeat.
stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
stress-ng-timer-2229 [110] d..2. 187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime -> 0x0
stress-ng-timer-2229 [110] ..... 187.938508: sys_rt_sigreturn()
stress-ng-timer-2229 [110] d..1. 187.938508: posixtimer_rearm <-dequeue_signal
stress-ng-timer-2229 [110] d.h.. 187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952
vs. 6.17
stress-ng-timer-1828 [029] ..... 84.089978: sys_rt_sigreturn()
stress-ng-timer-1828 [029] d..1. 84.089979: posixtimer_deliver_signal <-dequeue_signal
Signal, which was generated by the original armed timer is correctly ignored
stress-ng-timer-1828 [029] d..1. 84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL
Nanosleep is restarted
<idle>-0 [029] d.h1. 84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265
Timer which was armed in the signal handler expires
stress-ng-timer-1828 [029] d..1. 84.089981: posixtimer_deliver_signal <-dequeue_signal
Signal is delivered and timer is re-armed:
stress-ng-timer-1828 [029] d..2. 84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS
Signal is handled
stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun(timer_id: 0)
stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun -> 0x2
stress-ng-timer-1828 [029] d.h.. 84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856
Re-armed timer expires and queues a signal
stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)
Timer is re-armed
stress-ng-timer-1828 [029] d..2. 84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime -> 0x0
stress-ng-timer-1828 [029] ..... 84.089983: sys_rt_sigreturn()
Signal, which was generated by the timer armed on signal dequeue is
correctly ignored
stress-ng-timer-1828 [029] d..1. 84.089984: posixtimer_deliver_signal <-dequeue_signal
Thanks,
tglx
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
2025-08-25 7:14 ` Thomas Gleixner
@ 2025-08-25 11:35 ` Jirka Hladky
2025-08-25 20:04 ` Jirka Hladky
0 siblings, 1 reply; 5+ messages in thread
From: Jirka Hladky @ 2025-08-25 11:35 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, john.stultz, anna-maria, Philip Auld,
Prarit Bhargava, Luis Goncalves, Miroslav Lichvar, Luke Yang,
Jan Jurca, Joe Mario
Hallo Thomas,
Vielen Dank for the detailed analysis, explanations, and insights. You
are right, I fully agree with your conclusions.
I apologize for reporting this as a regression. I've filed an issue
with stress-ng: https://github.com/ColinIanKing/stress-ng/issues/560
to request improving or documenting the stress-ng timer test
accordingly.
Liebe Grüße,
Jirka
On Mon, Aug 25, 2025 at 9:15 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> > On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> > And this has nothing to do with timer migration or whatever, that's just
> > a matter of correctness.
>
> Just to come back to timer migration. That's completely irrelevant here
> because /proc/sys/kernel/timer_migration only affects the timer wheel
> and _not_ hrtimers, which are used here.
>
> And just a few more comments about your findings:
>
> > grep -c hrtimer_start hrtimer*txt
> > 6.12: 10898132
> > 6.13: 17105314
> >
> > grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> > hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> > 6.12: 8358469
> > 6.13: 3476757
> >
> > The number of timers started increased significantly in 6.13, but most
> > timers do not expire. Completion rate went down from 76% to 20%
>
> Did you actually look _which_ timers were started and which ones did
> expire and which ones not?
>
> Data for a 2 seconds run (couldn't be bothered to wait 23 seconds)
>
> On 6.10:
>
> All start/expire:
>
> # grep -c 'hrtimer_start' t.txt
> 248039
> # grep -c 'hrtimer_expire' t.txt
> 247530
>
> stress-ng Posix timer related:
>
> # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> 246739
> # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> 246739
>
> stress-ng nanosleep related:
>
> # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> 2
> # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> 2
>
> On 6.17-rc1:
>
> All start/expire:
>
> # grep -c 'hrtimer_start' t.txt
> 457456
> # grep -c 'hrtimer_expire' t.txt
> 304959
>
> stress-ng Posix timer related:
>
> # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> 304673
> # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> 304674
>
> stress-ng nanosleep related:
>
> # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> 152241
> # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> 1
>
> The 150k timers which do not expire are related to the restarted
> nanosleep(), because the nanosleep is canceled due to the signal and has
> to be re-started.
>
> On 6.10 that does not even reach the nanosleep in the test thread
> because the thing is too busy with bogus signal handling.
>
> Trace for 6.10
>
> stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
> stress-ng-timer-2229 [110] d..2. 187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS
>
> Signal handler re-arms the timer
>
> stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime -> 0x0
> stress-ng-timer-2229 [110] ..... 187.938506: sys_rt_sigreturn()
>
> Returns from signal handler
>
> stress-ng-timer-2229 [110] d..1. 187.938506: posixtimer_rearm <-dequeue_signal
>
> Dequeues the signal which was related to the arming _before_ the signal
> handler re-arms it. So it's incorrectly delivered.
>
> stress-ng-timer-2229 [110] d.h.. 187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468
>
> Now the timer which was armed in the signal handler above expires
>
> stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun(timer_id: 0)
> stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun -> 0x0
>
> While the signal handler handles the bogus left over signal
>
> Lather, rinse and repeat.
>
> stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
> stress-ng-timer-2229 [110] d..2. 187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
> stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime -> 0x0
> stress-ng-timer-2229 [110] ..... 187.938508: sys_rt_sigreturn()
> stress-ng-timer-2229 [110] d..1. 187.938508: posixtimer_rearm <-dequeue_signal
> stress-ng-timer-2229 [110] d.h.. 187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952
>
> vs. 6.17
>
> stress-ng-timer-1828 [029] ..... 84.089978: sys_rt_sigreturn()
> stress-ng-timer-1828 [029] d..1. 84.089979: posixtimer_deliver_signal <-dequeue_signal
>
> Signal, which was generated by the original armed timer is correctly ignored
>
> stress-ng-timer-1828 [029] d..1. 84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL
>
> Nanosleep is restarted
>
> <idle>-0 [029] d.h1. 84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265
>
> Timer which was armed in the signal handler expires
>
> stress-ng-timer-1828 [029] d..1. 84.089981: posixtimer_deliver_signal <-dequeue_signal
>
> Signal is delivered and timer is re-armed:
>
> stress-ng-timer-1828 [029] d..2. 84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS
>
> Signal is handled
>
> stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun(timer_id: 0)
> stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun -> 0x2
> stress-ng-timer-1828 [029] d.h.. 84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856
>
> Re-armed timer expires and queues a signal
>
> stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)
>
> Timer is re-armed
>
> stress-ng-timer-1828 [029] d..2. 84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
> stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime -> 0x0
> stress-ng-timer-1828 [029] ..... 84.089983: sys_rt_sigreturn()
>
> Signal, which was generated by the timer armed on signal dequeue is
> correctly ignored
>
> stress-ng-timer-1828 [029] d..1. 84.089984: posixtimer_deliver_signal <-dequeue_signal
>
>
> Thanks,
>
> tglx
>
--
-Jirka
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
2025-08-25 11:35 ` Jirka Hladky
@ 2025-08-25 20:04 ` Jirka Hladky
0 siblings, 0 replies; 5+ messages in thread
From: Jirka Hladky @ 2025-08-25 20:04 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, john.stultz, anna-maria, Philip Auld,
Prarit Bhargava, Luis Goncalves, Miroslav Lichvar, Luke Yang,
Jan Jurca, Joe Mario
Let me close a loop on this one.
Colin Ian King has updated the timer test in Stress_ng accordingly:
https://github.com/ColinIanKing/stress-ng/commit/048b4cb1bbc83cee377330b20e1bd366b651e4ad
Thank you, everybody!
Jirka
On Mon, Aug 25, 2025 at 1:35 PM Jirka Hladky <jhladky@redhat.com> wrote:
>
> Hallo Thomas,
>
> Vielen Dank for the detailed analysis, explanations, and insights. You
> are right, I fully agree with your conclusions.
>
> I apologize for reporting this as a regression. I've filed an issue
> with stress-ng: https://github.com/ColinIanKing/stress-ng/issues/560
> to request improving or documenting the stress-ng timer test
> accordingly.
>
> Liebe Grüße,
> Jirka
>
>
> On Mon, Aug 25, 2025 at 9:15 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> > > On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> > > And this has nothing to do with timer migration or whatever, that's just
> > > a matter of correctness.
> >
> > Just to come back to timer migration. That's completely irrelevant here
> > because /proc/sys/kernel/timer_migration only affects the timer wheel
> > and _not_ hrtimers, which are used here.
> >
> > And just a few more comments about your findings:
> >
> > > grep -c hrtimer_start hrtimer*txt
> > > 6.12: 10898132
> > > 6.13: 17105314
> > >
> > > grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> > > hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> > > 6.12: 8358469
> > > 6.13: 3476757
> > >
> > > The number of timers started increased significantly in 6.13, but most
> > > timers do not expire. Completion rate went down from 76% to 20%
> >
> > Did you actually look _which_ timers were started and which ones did
> > expire and which ones not?
> >
> > Data for a 2 seconds run (couldn't be bothered to wait 23 seconds)
> >
> > On 6.10:
> >
> > All start/expire:
> >
> > # grep -c 'hrtimer_start' t.txt
> > 248039
> > # grep -c 'hrtimer_expire' t.txt
> > 247530
> >
> > stress-ng Posix timer related:
> >
> > # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> > 246739
> > # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> > 246739
> >
> > stress-ng nanosleep related:
> >
> > # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> > 2
> > # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> > 2
> >
> > On 6.17-rc1:
> >
> > All start/expire:
> >
> > # grep -c 'hrtimer_start' t.txt
> > 457456
> > # grep -c 'hrtimer_expire' t.txt
> > 304959
> >
> > stress-ng Posix timer related:
> >
> > # grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
> > 304673
> > # grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
> > 304674
> >
> > stress-ng nanosleep related:
> >
> > # grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
> > 152241
> > # grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
> > 1
> >
> > The 150k timers which do not expire are related to the restarted
> > nanosleep(), because the nanosleep is canceled due to the signal and has
> > to be re-started.
> >
> > On 6.10 that does not even reach the nanosleep in the test thread
> > because the thing is too busy with bogus signal handling.
> >
> > Trace for 6.10
> >
> > stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
> > stress-ng-timer-2229 [110] d..2. 187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS
> >
> > Signal handler re-arms the timer
> >
> > stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime -> 0x0
> > stress-ng-timer-2229 [110] ..... 187.938506: sys_rt_sigreturn()
> >
> > Returns from signal handler
> >
> > stress-ng-timer-2229 [110] d..1. 187.938506: posixtimer_rearm <-dequeue_signal
> >
> > Dequeues the signal which was related to the arming _before_ the signal
> > handler re-arms it. So it's incorrectly delivered.
> >
> > stress-ng-timer-2229 [110] d.h.. 187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468
> >
> > Now the timer which was armed in the signal handler above expires
> >
> > stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun(timer_id: 0)
> > stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun -> 0x0
> >
> > While the signal handler handles the bogus left over signal
> >
> > Lather, rinse and repeat.
> >
> > stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
> > stress-ng-timer-2229 [110] d..2. 187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
> > stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime -> 0x0
> > stress-ng-timer-2229 [110] ..... 187.938508: sys_rt_sigreturn()
> > stress-ng-timer-2229 [110] d..1. 187.938508: posixtimer_rearm <-dequeue_signal
> > stress-ng-timer-2229 [110] d.h.. 187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952
> >
> > vs. 6.17
> >
> > stress-ng-timer-1828 [029] ..... 84.089978: sys_rt_sigreturn()
> > stress-ng-timer-1828 [029] d..1. 84.089979: posixtimer_deliver_signal <-dequeue_signal
> >
> > Signal, which was generated by the original armed timer is correctly ignored
> >
> > stress-ng-timer-1828 [029] d..1. 84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL
> >
> > Nanosleep is restarted
> >
> > <idle>-0 [029] d.h1. 84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265
> >
> > Timer which was armed in the signal handler expires
> >
> > stress-ng-timer-1828 [029] d..1. 84.089981: posixtimer_deliver_signal <-dequeue_signal
> >
> > Signal is delivered and timer is re-armed:
> >
> > stress-ng-timer-1828 [029] d..2. 84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS
> >
> > Signal is handled
> >
> > stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun(timer_id: 0)
> > stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun -> 0x2
> > stress-ng-timer-1828 [029] d.h.. 84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856
> >
> > Re-armed timer expires and queues a signal
> >
> > stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)
> >
> > Timer is re-armed
> >
> > stress-ng-timer-1828 [029] d..2. 84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
> > stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime -> 0x0
> > stress-ng-timer-1828 [029] ..... 84.089983: sys_rt_sigreturn()
> >
> > Signal, which was generated by the timer armed on signal dequeue is
> > correctly ignored
> >
> > stress-ng-timer-1828 [029] d..1. 84.089984: posixtimer_deliver_signal <-dequeue_signal
> >
> >
> > Thanks,
> >
> > tglx
> >
>
>
> --
> -Jirka
--
-Jirka
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-08-25 20:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-16 16:38 [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent" Jirka Hladky
2025-08-24 9:44 ` Thomas Gleixner
2025-08-25 7:14 ` Thomas Gleixner
2025-08-25 11:35 ` Jirka Hladky
2025-08-25 20:04 ` Jirka Hladky
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.