From: Thomas Gleixner <tglx@linutronix.de>
To: Jirka Hladky <jhladky@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	john.stultz@linaro.org, anna-maria@linutronix.de
Cc: Philip Auld <pauld@redhat.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Luis Goncalves <lgoncalv@redhat.com>,
	Miroslav Lichvar <mlichvar@redhat.com>,
	Luke Yang <luyang@redhat.com>, Jan Jurca <jjurca@redhat.com>,
	Joe Mario <jmario@redhat.com>
Subject: Re: [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
Date: Mon, 25 Aug 2025 09:14:58 +0200
Message-ID: <87cy8j3ma5.ffs@tglx>
In-Reply-To: <87sehh2gw8.ffs@tglx>

On Sun, Aug 24 2025 at 11:44, Thomas Gleixner wrote:
> On Sat, Aug 16 2025 at 18:38, Jirka Hladky wrote:
> And this has nothing to do with timer migration or whatever, that's just
> a matter of correctness.

Just to come back to timer migration. That's completely irrelevant here
because /proc/sys/kernel/timer_migration only affects the timer wheel
and _not_ hrtimers, which are used here.

And just a few more comments about your findings:

> grep -c hrtimer_start hrtimer*txt
> 6.12: 10898132
> 6.13: 17105314
> 
> grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt
> hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
> 6.12: 8358469
> 6.13: 3476757
> 
> The number of timers started increased significantly in 6.13, but most
> timers do not expire. Completion rate went down from 76% to 20%

Did you actually look _which_ timers were started and which ones did
expire and which ones not?

Data for a 2 second run (couldn't be bothered to wait 23 seconds)

On 6.10:

All start/expire:

# grep -c 'hrtimer_start' t.txt 
248039
# grep -c 'hrtimer_expire' t.txt 
247530

stress-ng Posix timer related:

# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt 
246739
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt 
246739

stress-ng nanosleep related:

# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt 
2
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt 
2

On 6.17-rc1:

All start/expire:

# grep -c 'hrtimer_start' t.txt 
457456
# grep -c 'hrtimer_expire' t.txt 
304959

stress-ng Posix timer related:

# grep -c 'hrtimer_start.*function=posix_timer_fn' t.txt 
304673
# grep -c 'hrtimer_expire.*function=posix_timer_fn' t.txt 
304674

stress-ng nanosleep related:

# grep -c 'hrtimer_start.*function=hrtimer_wakeup' t.txt 
152241
# grep -c 'hrtimer_expire.*function=hrtimer_wakeup' t.txt 
1

The 150k timers which do not expire are related to the restarted
nanosleep(), because the nanosleep is canceled due to the signal and has
to be re-started.

On 6.10 that does not even reach the nanosleep in the test thread
because the thing is too busy with bogus signal handling.
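
For reference, the per-callback breakdown above can be scripted instead
of running one grep per pattern. The following is only a rough sketch:
it assumes the trace is the plain text ftrace output shown in this mail
(one event per line, with a function= field), and the names tally() and
report() are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches hrtimer_start and hrtimer_expire_entry event lines and captures
# the event name plus the callback given in the function= field.
EVENT_RE = re.compile(
    r"\b(hrtimer_start|hrtimer_expire_entry): .*?\bfunction=(\S+)"
)


def tally(lines):
    """Count start and expire events per hrtimer callback function."""
    counts = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # syscall lines and other events are ignored
        event, func = m.groups()
        key = "start" if event == "hrtimer_start" else "expire"
        counts.setdefault(func, Counter())[key] += 1
    return counts


def report(counts):
    """Return (callback, started, expired, expired/started) rows."""
    rows = []
    for func, c in sorted(counts.items()):
        started, expired = c["start"], c["expire"]
        ratio = expired / started if started else float("nan")
        rows.append((func, started, expired, ratio))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tally.py t.txt
    with open(sys.argv[1]) as f:
        for func, started, expired, ratio in report(tally(f)):
            print(f"{func:24s} start={started:8d} "
                  f"expire={expired:8d} ratio={ratio:.2f}")
```

A callback with a ratio close to 1 is a timer that actually fires. A
ratio close to 0, as for hrtimer_wakeup on 6.17-rc1, points at timers
which are started and then canceled before they expire.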

Trace for 6.10

 stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
 stress-ng-timer-2229    [110] d..2.   187.938505: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716941003 softexpires=186716941003 mode=ABS

Signal handler re-arms the timer

 stress-ng-timer-2229    [110] .....   187.938505: sys_timer_settime -> 0x0
 stress-ng-timer-2229    [110] .....   187.938506: sys_rt_sigreturn()

Returns from signal handler

 stress-ng-timer-2229    [110] d..1.   187.938506: posixtimer_rearm <-dequeue_signal

Dequeues the signal which was related to the arming _before_ the signal
handler re-arms it. So it's incorrectly delivered.

 stress-ng-timer-2229    [110] d.h..   187.938507: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716941468

Now the timer which was armed in the signal handler above expires

 stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun(timer_id: 0)
 stress-ng-timer-2229    [110] .....   187.938507: sys_timer_getoverrun -> 0x0

While the signal handler handles the bogus leftover signal

Lather, rinse and repeat.

 stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7880228ec0, old_setting: 0)
 stress-ng-timer-2229    [110] d..2.   187.938508: hrtimer_start: hrtimer=0000000023e2c3e0 function=posix_timer_fn expires=186716943483 softexpires=186716943483 mode=ABS
 stress-ng-timer-2229    [110] .....   187.938508: sys_timer_settime -> 0x0
 stress-ng-timer-2229    [110] .....   187.938508: sys_rt_sigreturn()
 stress-ng-timer-2229    [110] d..1.   187.938508: posixtimer_rearm <-dequeue_signal
 stress-ng-timer-2229    [110] d.h..   187.938509: hrtimer_expire_entry: hrtimer=0000000023e2c3e0 function=posix_timer_fn now=186716943952

vs. 6.17

 stress-ng-timer-1828    [029] .....    84.089978: sys_rt_sigreturn()
 stress-ng-timer-1828    [029] d..1.    84.089979: posixtimer_deliver_signal <-dequeue_signal

The signal which was generated by the originally armed timer is correctly ignored

 stress-ng-timer-1828    [029] d..1.    84.089979: hrtimer_start: hrtimer=0000000081582a37 function=hrtimer_wakeup expires=83144889279 softexpires=83144839279 mode=REL

Nanosleep is restarted

          <idle>-0       [029] d.h1.    84.089980: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134840265

Timer which was armed in the signal handler expires

 stress-ng-timer-1828    [029] d..1.    84.089981: posixtimer_deliver_signal <-dequeue_signal

Signal is delivered and timer is re-armed:

 stress-ng-timer-1828    [029] d..2.    84.089981: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134842396 softexpires=83134842396 mode=ABS

Signal is handled

 stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun(timer_id: 0)
 stress-ng-timer-1828    [029] .....    84.089982: sys_timer_getoverrun -> 0x2
 stress-ng-timer-1828    [029] d.h..    84.089983: hrtimer_expire_entry: hrtimer=000000009e0c5084 function=posix_timer_fn now=83134842856

Re-armed timer expires and queues a signal

 stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime(timer_id: 0, flags: 0, new_setting: 7f7cccf7dec0, old_setting: 0)

Timer is re-armed

 stress-ng-timer-1828    [029] d..2.    84.089983: hrtimer_start: hrtimer=000000009e0c5084 function=posix_timer_fn expires=83134844444 softexpires=83134844444 mode=ABS
 stress-ng-timer-1828    [029] .....    84.089983: sys_timer_settime -> 0x0
 stress-ng-timer-1828    [029] .....    84.089983: sys_rt_sigreturn()

The signal which was generated by the timer armed on signal dequeue is
correctly ignored

 stress-ng-timer-1828    [029] d..1.    84.089984: posixtimer_deliver_signal <-dequeue_signal
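
The "sleep gets interrupted by the signal and has to be restarted"
pattern can be illustrated in miniature from user space. This is a
sketch under loud assumptions, not what stress-ng does: it uses
setitimer(ITIMER_REAL) instead of a POSIX timer, the restart happens in
the Python interpreter (which resumes an interrupted time.sleep() when
the handler returns normally), and the function name
sleep_under_itimer() is made up. Presumably each resumption arms a
fresh sleep timer in the kernel which then never expires, but that
mapping onto the trace above is my assumption.

```python
import signal
import time


def sleep_under_itimer(sleep_s=0.05, interval_s=0.005):
    """Sleep for sleep_s while SIGALRM fires every interval_s.

    Returns (signals_handled, elapsed_seconds). Must run in the main
    thread on a Unix system, because it installs a signal handler.
    """
    ticks = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1  # the handler only counts, it does not raise

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)
    start = time.monotonic()
    try:
        # Interrupted by each SIGALRM and resumed for the remaining time.
        time.sleep(sleep_s)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        if old_handler is not None:
            signal.signal(signal.SIGALRM, old_handler)
    return ticks, time.monotonic() - start


if __name__ == "__main__":
    ticks, elapsed = sleep_under_itimer()
    print(f"signals handled: {ticks}, slept for {elapsed:.3f}s")
```

The sleep still lasts the full requested time, but it takes one
interruption and one resume per signal to get there.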


Thanks,

        tglx

