* [REGRESSION] 76% performance loss in timer workloads caused by 513793bc6ab3 "posix-timers: Make signal delivery consistent"
@ 2025-08-16 16:38 Jirka Hladky
  2025-08-24  9:44 ` Thomas Gleixner
From: Jirka Hladky @ 2025-08-16 16:38 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, john.stultz, anna-maria
  Cc: Philip Auld, Prarit Bhargava, Luis Goncalves, Miroslav Lichvar,
	Luke Yang, Jan Jurca, Joe Mario

Hello,

I'm reporting a performance regression in kernel 6.13 that causes a
76% performance loss in timer-heavy workloads. Through kernel
bisection, we have identified the root cause as commit
513793bc6ab331b947111e8efaf8fcef33fb83e5.

Summary

Regression: 76% performance drop in applications using nanosleep()/POSIX timers
  * 4.3x increase in timer overruns and voluntary context switches
  * Dramatic drop in timer completion rate (76% -> 20%)
  * Over 99% of timers fail to expire when timer migration is disabled in 6.13
Root Cause: commit 513793bc6ab3 "posix-timers: Make signal delivery consistent"
Impact: Timer signal delivery mechanism broken
Reproducer: stress-ng --timer workload on any system.

/usr/bin/time -v ./stress-ng --timer 1 -t 23 --verbose --metrics-brief \
  --yaml /dev/stdout 2>&1 | tee $(uname -r)_timer.log
grep -Poh 'bogo-ops-per-second-real-time: \K[0-9.]+' $(uname -r)_timer.log

6.12 kernel:
User time (seconds): 9.71
Percent of CPU this job got: 73%
stress-ng: metrc: [39351] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [39351]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [39351] timer          11253022     23.01      9.71      7.01    489125.18      673113.26
timer: 3655093 timer overruns (instance 0)
Voluntary context switches: 720747

6.13 kernel:
User time (seconds): 4.02
Percent of CPU this job got: 28%
stress-ng: metrc: [5416] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [5416]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [5416] timer           3103864     23.00      4.02      2.08    134950.34      509002.47
timer: 15578896 timer overruns (instance 0)
Voluntary context switches: 3100815

CPU utilization dropped from 73% to 28%, while timer overruns and
voluntary context switches increased by a factor of about 4.3.
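
The factors quoted above can be recomputed directly from the saved logs.
The following is only a minimal sketch: the function names and the log
file names are made up for illustration, and the parsing assumes the
log lines look like the excerpts shown in this report.

```shell
#!/bin/sh
# Sketch only: helper names and log layout are assumptions taken from
# the excerpts in this report, not part of stress-ng or time(1).

# Print the number that precedes "timer overruns" in a stress-ng log.
overruns() {
    awk '{ for (i = 1; i < NF; i++)
               if ($(i+1) == "timer" && $(i+2) == "overruns") { print $i; exit } }' "$1"
}

# Print the voluntary context switch count reported by /usr/bin/time -v.
vol_switches() {
    awk -F': ' '/Voluntary context switches/ { print $2; exit }' "$1"
}

# Print new/old with two decimals.
factor() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

# Demo using the values quoted in this report (hypothetical file names).
tmp=$(mktemp -d)
printf 'timer: 3655093 timer overruns (instance 0)\nVoluntary context switches: 720747\n' > "$tmp/6.12.log"
printf 'timer: 15578896 timer overruns (instance 0)\nVoluntary context switches: 3100815\n' > "$tmp/6.13.log"

echo "overruns factor: $(factor "$(overruns "$tmp/6.13.log")" "$(overruns "$tmp/6.12.log")")"
echo "context switch factor: $(factor "$(vol_switches "$tmp/6.13.log")" "$(vol_switches "$tmp/6.12.log")")"
rm -rf "$tmp"
```

With the numbers above this prints 4.26 for timer overruns and 4.30 for
voluntary context switches.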

It's interesting to examine hrtimer events with perf sched record:
perf sched record -e timer:hrtimer_start -e timer:hrtimer_expire_entry \
  -e timer:hrtimer_expire_exit --output="hrtimer-$(uname -r).perf" \
  ./stress-ng --timer 1 -t 23 --metrics-brief --yaml /dev/stdout
perf sched script -i "hrtimer-$(uname -r).perf" > "hrtimer-$(uname -r).txt"

grep -c hrtimer_start hrtimer*txt
6.12: 10898132
6.13: 17105314

grep -c hrtimer_expire_entry hrtimer-6.12.0-33.el10.x86_64.txt \
  hrtimer-6.13.0-0.rc2.22.eln144.x86_64.txt
6.12: 8358469
6.13: 3476757

The number of timers started increased significantly in 6.13 (from
about 10.9M to 17.1M), but most timers do not expire. The completion
rate (hrtimer_expire_entry / hrtimer_start) went down from 76.7% to
20.3%.
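
For reference, the completion rate is simply the hrtimer_expire_entry
count divided by the hrtimer_start count. A minimal sketch of that
calculation follows; the helper names are hypothetical, and
rate_from_trace assumes a text dump like the hrtimer-*.txt files
produced above, counting matching lines the same way grep -c does.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not an existing tool.

# Completion rate in percent from two event counts.
completion_rate() {
    awk -v started="$1" -v expired="$2" \
        'BEGIN { printf "%.2f\n", 100 * expired / started }'
}

# Same rate, counting both events in one pass over a text trace dump.
rate_from_trace() {
    awk '/hrtimer_start/        { s++ }
         /hrtimer_expire_entry/ { e++ }
         END { if (s) printf "%.2f\n", 100 * e / s }' "$1"
}

# Counts quoted in this report.
echo "6.12: $(completion_rate 10898132 8358469)%"
echo "6.13: $(completion_rate 17105314 3476757)%"
```

This prints 76.70% for 6.12 and 20.33% for 6.13.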

The next test was to disable timer migrations with the 6.13 kernel:
echo 0 > /proc/sys/kernel/timer_migration

6.13, /proc/sys/kernel/timer_migration set to zero
User time (seconds): 10.42
Percent of CPU this job got: 59%
stress-ng: metrc: [5927] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [5927]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [5927] timer           7004133     23.00     10.41      3.11    304526.98      518257.73
timer: 7102554 timer overruns (instance 0)
Voluntary context switches: 7009365

Results improve, but there is still a roughly 38% performance drop
compared to 6.12 (489125 versus 304526 bogo ops/s).

I have also tried CPU pinning, but it had almost no effect:
6.13, /proc/sys/kernel/timer_migration set to zero, process pinned to one CPU:
$ taskset -c 10 /usr/bin/time -v ./stress-ng --timer 1 -t 23 --verbose \
  --metrics-brief 2>&1 | tee $(uname -r)_timer_timer_migration_off_pinned.log
User time (seconds): 10.34
Percent of CPU this job got: 61%
stress-ng: metrc: [6230] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: metrc: [6230]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: metrc: [6230] timer           7129797     23.00     10.33      3.53    309991.17      514479.47
timer: 7152958 timer overruns (instance 0)
Voluntary context switches: 7128460

Using perf sched record to trace hrtimer events reveals the following:

Kernel      hrtimer_start    hrtimer_expire_entry    Completion Rate
6.12         10,898,132         8,358,469               76.7%
6.13         17,105,314         3,476,757               20.3%
6.13+mig=0   17,067,784            30,841                0.18%

Over 99% of timers fail to expire properly in 6.13 with timer
migration disabled, indicating broken timer signal delivery.

We have collected results on a dual-socket Intel Emerald Rapids system
with 256 CPUs, but we observed the same problem on other systems as
well. Intel and AMD x86_64, aarch64, and ppc64le are all affected. The
regression is more pronounced on systems with higher CPU counts.

I have additional performance traces, perf data, and test
configurations available if needed for debugging. I'm happy to test
patches or provide more detailed analysis.

We have also tested kernel 6.16, and it behaves the same as kernel 6.13.

Thank you!
Jirka

