From: Thomas Gleixner <tglx@linutronix.de>
To: Jirka Hladky <jhladky@redhat.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
john.stultz@linaro.org, anna-maria@linutronix.de
Cc: Philip Auld <pauld@redhat.com>,
Prarit Bhargava <prarit@redhat.com>,
Luis Goncalves <lgoncalv@redhat.com>,
Miroslav Lichvar <mlichvar@redhat.com>,
Luke Yang <luyang@redhat.com>, Jan Jurca <jjurca@redhat.com>,
Joe Mario <jmario@redhat.com>
Subject: Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
Date: Mon, 25 Aug 2025 09:14:58 +0200
Message-ID: <87cy8j3ma5.ffs@tglx>
In-Reply-To: <87sehh2gw8.ffs@tglx>
On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> And this has nothing to do with timer migration or whatever, that's just
> a matter of correctness.
Just to come back to timer migration. That's completely irrelevant here
because /proc/sys/kernel/timer_migration only affects the timer wheel
and _not_ hrtimers, which are used here.
And just a few more comments about your findings:
> grep -c hrtimer_start hrtimer*txt
> 6.12: 10898132
> 6.13: 17105314
>
> grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> 6.12: 8358469
> 6.13: 3476757
>
> The number of timers started increased significantly in 6.13, but most
> timers do not expire. Completion rate went down from 76% to 20%
Did you actually look _which_ timers were started and which ones did
expire and which ones not?
Data for a 2-second run (couldn't be bothered to wait 23 seconds)
On 6.10:
All start/expire:
# grep -c 'hrtimer_start' t.txt
248039
# grep -c 'hrtimer_expire' t.txt
247530
stress-ng POSIX timer related:
# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
246739
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
246739
stress-ng nanosleep related:
# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
2
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
2
On 6.17-rc1:
All start/expire:
# grep -c 'hrtimer_start' t.txt
457456
# grep -c 'hrtimer_expire' t.txt
304959
stress-ng POSIX timer related:
# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt
304673
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt
304674
stress-ng nanosleep related:
# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt
152241
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt
1
The 150k timers which do not expire are related to the restarted
nanosleep(), because the nanosleep is canceled due to the signal and has
to be re-started.
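The per-function greps above can be folded into one pass over the trace. The following is a minimal sketch, not part of the original analysis: it assumes the trace text has the layout shown in the excerpts in this mail (event name followed by a `function=<callback>` field), and the helper names `tally` and `report` are made up for illustration.

```python
import re
from collections import Counter

# Matches the two event kinds that carry a "function=" field in the
# excerpts in this mail. The field layout is an assumption taken from
# those lines.
EVENT_RE = re.compile(
    r"\b(hrtimer_start|hrtimer_expire_entry):.*?\bfunction=(\S+)"
)

def tally(lines):
    """Count started and expired hrtimers per callback function."""
    started, expired = Counter(), Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # not an hrtimer start/expire event
        event, func = m.groups()
        if event == "hrtimer_start":
            started[func] += 1
        else:
            expired[func] += 1
    return started, expired

def report(started, expired):
    """Return (function, started, expired, not_expired) rows sorted by name."""
    funcs = sorted(set(started) | set(expired))
    return [(f, started[f], expired[f], started[f] - expired[f]) for f in funcs]
```

Feeding it the lines of a trace file such as t.txt, for example `report(*tally(open("t.txt")))`, gives one row per callback. A large not_expired count for hrtimer_wakeup next to a near-zero one for posix_timer_fn is the pattern described above.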
On 6.10 the test thread does not even reach the nanosleep() because it
is too busy with bogus signal handling.
Trace for 6.10
stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
stress-ng-timer-2229 [110] d..2. 187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS
Signal handler re-arms the timer
stress-ng-timer-2229 [110] ..... 187.938505: sys_timer_settime -> 0x0
stress-ng-timer-2229 [110] ..... 187.938506: sys_rt_sigreturn()
Returns from signal handler
stress-ng-timer-2229 [110] d..1. 187.938506: posixtimer_rearm <-dequeue_signal
Dequeues the signal which was related to the arming _before_ the signal
handler re-arms it. So it's incorrectly delivered.
stress-ng-timer-2229 [110] d.h.. 187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468
Now the timer which was armed in the signal handler above expires
stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun(timer_id: 0)
stress-ng-timer-2229 [110] ..... 187.938507: sys_timer_getoverrun -> 0x0
While the signal handler handles the bogus leftover signal
Lather, rinse and repeat.
stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
stress-ng-timer-2229 [110] d..2. 187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
stress-ng-timer-2229 [110] ..... 187.938508: sys_timer_settime -> 0x0
stress-ng-timer-2229 [110] ..... 187.938508: sys_rt_sigreturn()
stress-ng-timer-2229 [110] d..1. 187.938508: posixtimer_rearm <-dequeue_signal
stress-ng-timer-2229 [110] d.h.. 187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952
vs. 6.17
stress-ng-timer-1828 [029] ..... 84.089978: sys_rt_sigreturn()
stress-ng-timer-1828 [029] d..1. 84.089979: posixtimer_deliver_signal <-dequeue_signal
The signal which was generated by the originally armed timer is correctly ignored
stress-ng-timer-1828 [029] d..1. 84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL
Nanosleep is restarted
<idle>-0 [029] d.h1. 84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265
Timer which was armed in the signal handler expires
stress-ng-timer-1828 [029] d..1. 84.089981: posixtimer_deliver_signal <-dequeue_signal
Signal is delivered and timer is re-armed:
stress-ng-timer-1828 [029] d..2. 84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS
Signal is handled
stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun(timer_id: 0)
stress-ng-timer-1828 [029] ..... 84.089982: sys_timer_getoverrun -> 0x2
stress-ng-timer-1828 [029] d.h.. 84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856
Re-armed timer expires and queues a signal
stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)
Timer is re-armed
stress-ng-timer-1828 [029] d..2. 84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
stress-ng-timer-1828 [029] ..... 84.089983: sys_timer_settime -> 0x0
stress-ng-timer-1828 [029] ..... 84.089983: sys_rt_sigreturn()
The signal which was generated by the timer armed on signal dequeue is
correctly ignored
stress-ng-timer-1828 [029] d..1. 84.089984: posixtimer_deliver_signal <-dequeue_signal
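The difference between the two traces can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the kernel code, the class and method names are hypothetical, and tagging each queued signal with a generation counter is just one simple way to model "a signal from an earlier arming is ignored". It replays the sequence annotated above: a signal is queued, the timer is re-armed from the handler, the old signal is dequeued, then the newly armed timer expires.

```python
class ToyTimer:
    """Toy model of a timer whose queued signal can go stale (hypothetical names)."""

    def __init__(self, drop_stale):
        self.drop_stale = drop_stale  # True: 6.17-like, False: 6.10-like
        self.generation = 0           # bumped whenever user space re-arms
        self.pending = []             # generation tags of queued signals

    def settime(self):
        # User space re-arms the timer; anything queued earlier is now stale.
        self.generation += 1

    def expire(self):
        # Expiry queues a signal tagged with the arming that produced it.
        self.pending.append(self.generation)

    def dequeue(self):
        # Return True if the signal reaches the handler, False if dropped.
        tag = self.pending.pop(0)
        if self.drop_stale and tag != self.generation:
            return False
        return True

def run(drop_stale):
    t = ToyTimer(drop_stale)
    t.settime()
    t.expire()            # signal from the original arming is queued
    t.settime()           # the signal handler re-arms the timer
    first = t.dequeue()   # old signal is dequeued after sigreturn
    t.expire()            # the timer armed in the handler expires
    second = t.dequeue()
    return first, second
```

In this model `run(False)` delivers both signals, including the one from the superseded arming, while `run(True)` drops the first and delivers only the second.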
Thanks,
tglx
Thread overview: 5+ messages
2025-08-16 16:38 [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent" Jirka Hladky
2025-08-24 9:44 ` Thomas Gleixner
2025-08-25 7:14 ` Thomas Gleixner [this message]
2025-08-25 11:35 ` Jirka Hladky
2025-08-25 20:04 ` Jirka Hladky