* [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
From: kernel test robot @ 2025-03-24 6:39 UTC
To: Thomas Gleixner
Cc: oe-lkp, lkp, linux-kernel, x86, Eric Dumazet, Benjamin Segall,
Frederic Weisbecker, oliver.sang
Hello,
kernel test robot noticed a 36.2% regression of stress-ng.epoll.ops_per_sec on:
commit: 1535cb80286e6fbc834f075039f85274538543c7 ("posix-timers: Improve hash table performance")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
	nr_threads: 100%
	testtime: 60s
	test: epoll
	cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.epoll.ops_per_sec 124.9% improvement                       |
| test machine     | 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                    |
|                  | nr_threads=100%                                                                 |
|                  | test=epoll                                                                      |
|                  | testtime=60s                                                                    |
+------------------+---------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202503241406.5c9cb80a-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250324/202503241406.5c9cb80a-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/epoll/stress-ng/60s
commit:
feb864ee99 ("posix-timers: Make signal_struct::next_posix_timer_id an atomic_t")
1535cb8028 ("posix-timers: Improve hash table performance")
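For reading the comparison table that follows: the left column is the parent commit, the right column is the tested commit, and the middle column is the change between them. The sketch below is only an inference from the printed numbers, not the robot's actual code. It assumes that entries with a trailing "%" are relative changes and that entries without one (as on metrics that are already percentages, such as mpstat.cpu.all.idle%) are absolute differences. The helper names are hypothetical.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change in percent (assumed meaning of entries such as -36.2%)."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference (assumed meaning of entries such as +72.0)."""
    return new - base


# Values copied from the table in this report (parent commit, tested commit).
ops_per_sec = relative_change(1458844, 931227)  # stress-ng.epoll.ops_per_sec
uptime_idle = relative_change(2580, 5512)       # uptime.idle
idle_points = point_delta(5.71, 77.67)          # mpstat.cpu.all.idle%

print(f"{ops_per_sec:+.1f}%")  # -36.2%
print(f"{uptime_idle:+.1f}%")  # +113.6%
print(f"{idle_points:+.1f}")   # +72.0
```

All three results match the corresponding %change entries, which is consistent with the assumed reading but does not confirm how the tool computes them.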
feb864ee99a2d8a2 1535cb80286e6fbc834f075039f
---------------- ---------------------------
%stddev %change %stddev
\ | \
2580 ± 7% +113.6% 5512 ± 6% uptime.idle
2.503e+08 ± 19% +1174.1% 3.189e+09 ± 8% cpuidle..time
965881 ± 22% +57.0% 1516318 ± 12% cpuidle..usage
48355626 -33.1% 32367066 ± 19% numa-numastat.node0.local_node
48387809 -33.0% 32398849 ± 19% numa-numastat.node0.numa_hit
47299728 -33.1% 31637160 ± 22% numa-numastat.node1.local_node
47323998 -33.1% 31677662 ± 22% numa-numastat.node1.numa_hit
5.71 ± 20% +72.0 77.67 ± 6% mpstat.cpu.all.idle%
0.36 ± 2% -0.2 0.12 ± 18% mpstat.cpu.all.irq%
88.37 -69.2 19.21 ± 21% mpstat.cpu.all.sys%
4.95 ± 4% -2.7 2.20 ± 15% mpstat.cpu.all.usr%
7.17 ± 25% -58.1% 3.00 mpstat.max_utilization.seconds
6528 ± 8% -80.3% 1282 ± 50% perf-c2c.DRAM.local
25918 -92.8% 1864 ± 31% perf-c2c.DRAM.remote
27117 ± 2% -68.6% 8524 ± 32% perf-c2c.HITM.local
18359 -94.9% 944.58 ± 32% perf-c2c.HITM.remote
45476 -79.2% 9468 ± 32% perf-c2c.HITM.total
8.15 ± 14% +858.2% 78.13 ± 5% vmstat.cpu.id
86.98 -77.4% 19.66 ± 21% vmstat.cpu.sy
98.89 ± 2% -84.1% 15.75 ± 20% vmstat.procs.r
768987 ± 2% -95.1% 37670 ± 17% vmstat.system.cs
209597 ± 2% -71.5% 59763 ± 21% vmstat.system.in
87617311 -36.1% 55996474 ± 18% stress-ng.epoll.ops
1458844 -36.2% 931227 ± 18% stress-ng.epoll.ops_per_sec
23167102 ± 3% -97.7% 541929 ± 20% stress-ng.time.involuntary_context_switches
32989 -5.4% 31202 stress-ng.time.minor_page_faults
5901 -76.9% 1365 ± 20% stress-ng.time.percent_of_cpu_this_job_got
3420 -77.6% 765.67 ± 20% stress-ng.time.system_time
138.74 -58.0% 58.20 ± 19% stress-ng.time.user_time
23490375 ± 2% -96.8% 762147 ± 19% stress-ng.time.voluntary_context_switches
4047201 ± 2% -58.1% 1695268 ± 9% meminfo.Active
4047201 ± 2% -58.1% 1695268 ± 9% meminfo.Active(anon)
6632349 -35.2% 4296645 ± 3% meminfo.Cached
4994255 ± 2% -47.4% 2627103 ± 6% meminfo.Committed_AS
826943 ± 27% -79.0% 173492 ± 7% meminfo.Mapped
9014037 -11.3% 7993381 ± 2% meminfo.Memused
328931 ± 5% +211.1% 1023179 ± 4% meminfo.SUnreclaim
3104430 ± 3% -75.2% 768720 ± 19% meminfo.Shmem
484360 ± 3% +142.8% 1176077 ± 3% meminfo.Slab
1755625 ± 32% -69.8% 529877 ± 58% numa-meminfo.node0.Active
1755625 ± 32% -69.8% 529877 ± 58% numa-meminfo.node0.Active(anon)
377524 ± 48% -81.2% 71066 ± 79% numa-meminfo.node0.Mapped
159257 ± 11% +229.0% 523897 ± 9% numa-meminfo.node0.SUnreclaim
1293413 ± 36% -91.7% 106929 ±192% numa-meminfo.node0.Shmem
231196 ± 11% +158.8% 598232 ± 7% numa-meminfo.node0.Slab
2291699 ± 22% -49.2% 1165268 ± 32% numa-meminfo.node1.Active
2291699 ± 22% -49.2% 1165268 ± 32% numa-meminfo.node1.Active(anon)
453283 ± 25% -77.6% 101593 ± 51% numa-meminfo.node1.Mapped
169108 ± 18% +194.8% 498594 ± 5% numa-meminfo.node1.SUnreclaim
1810819 ± 24% -63.4% 662050 ± 33% numa-meminfo.node1.Shmem
252608 ± 13% +128.5% 577128 ± 6% numa-meminfo.node1.Slab
439128 ± 32% -69.9% 132285 ± 58% numa-vmstat.node0.nr_active_anon
94937 ± 48% -81.2% 17880 ± 79% numa-vmstat.node0.nr_mapped
323574 ± 36% -91.7% 26744 ±192% numa-vmstat.node0.nr_shmem
39869 ± 11% +227.1% 130412 ± 9% numa-vmstat.node0.nr_slab_unreclaimable
439127 ± 32% -69.9% 132285 ± 58% numa-vmstat.node0.nr_zone_active_anon
48387857 -33.0% 32398745 ± 19% numa-vmstat.node0.numa_hit
48355674 -33.1% 32366962 ± 19% numa-vmstat.node0.numa_local
573170 ± 22% -49.2% 291167 ± 32% numa-vmstat.node1.nr_active_anon
113887 ± 25% -77.4% 25724 ± 51% numa-vmstat.node1.nr_mapped
452974 ± 24% -63.5% 165522 ± 33% numa-vmstat.node1.nr_shmem
42276 ± 18% +193.4% 124048 ± 5% numa-vmstat.node1.nr_slab_unreclaimable
573170 ± 22% -49.2% 291167 ± 32% numa-vmstat.node1.nr_zone_active_anon
47324233 -33.1% 31677233 ± 22% numa-vmstat.node1.numa_hit
47299963 -33.1% 31636731 ± 22% numa-vmstat.node1.numa_local
1011581 ± 2% -58.1% 423533 ± 9% proc-vmstat.nr_active_anon
1657746 -35.2% 1074278 ± 3% proc-vmstat.nr_file_pages
208502 ± 27% -79.2% 43425 ± 6% proc-vmstat.nr_mapped
775765 ± 3% -75.2% 192295 ± 19% proc-vmstat.nr_shmem
82134 ± 5% +210.4% 254960 ± 4% proc-vmstat.nr_slab_unreclaimable
1011581 ± 2% -58.1% 423533 ± 9% proc-vmstat.nr_zone_active_anon
22120 ± 43% -70.6% 6501 ± 92% proc-vmstat.numa_hint_faults
12945 ± 53% -89.1% 1410 ±102% proc-vmstat.numa_hint_faults_local
95713574 -33.1% 64078178 ± 18% proc-vmstat.numa_hit
95657121 -33.1% 64005892 ± 18% proc-vmstat.numa_local
45928 ± 22% -45.0% 25255 ± 33% proc-vmstat.numa_pages_migrated
335648 ± 12% -65.5% 115662 ± 36% proc-vmstat.numa_pte_updates
383375 ± 3% -18.7% 311501 ± 4% proc-vmstat.pgfault
45928 ± 22% -45.0% 25255 ± 33% proc-vmstat.pgmigrate_success
1.855e+10 -51.0% 9.092e+09 ± 18% perf-stat.i.branch-instructions
1.252e+08 ± 3% -52.3% 59710873 ± 11% perf-stat.i.branch-misses
39.80 -15.7 24.06 ± 11% perf-stat.i.cache-miss-rate%
2.409e+08 -52.5% 1.144e+08 ± 26% perf-stat.i.cache-misses
798933 ± 2% -95.1% 39139 ± 17% perf-stat.i.context-switches
2.53 -55.6% 1.12 ± 3% perf-stat.i.cpi
2.189e+11 -75.9% 5.273e+10 ± 20% perf-stat.i.cpu-cycles
6147 ± 9% -96.1% 236.94 ± 17% perf-stat.i.cpu-migrations
911.39 -45.2% 499.82 ± 12% perf-stat.i.cycles-between-cache-misses
8.65e+10 -50.0% 4.321e+10 ± 18% perf-stat.i.instructions
0.40 +129.6% 0.92 ± 3% perf-stat.i.ipc
22.06 ± 2% -88.8% 2.46 ± 24% perf-stat.i.metric.K/sec
4581 ± 5% -27.9% 3300 ± 3% perf-stat.i.minor-faults
613522 ± 2% -75.8% 148218 ± 22% perf-stat.i.page-faults
40.00 -13.0 27.00 ± 6% perf-stat.overall.cache-miss-rate%
2.53 -51.9% 1.22 ± 2% perf-stat.overall.cpi
908.65 -48.5% 468.05 ± 6% perf-stat.overall.cycles-between-cache-misses
0.40 +108.0% 0.82 ± 2% perf-stat.overall.ipc
1.825e+10 -50.9% 8.955e+09 ± 18% perf-stat.ps.branch-instructions
1.231e+08 ± 3% -52.2% 58818328 ± 11% perf-stat.ps.branch-misses
2.37e+08 -52.4% 1.128e+08 ± 26% perf-stat.ps.cache-misses
786402 ± 2% -95.1% 38556 ± 17% perf-stat.ps.context-switches
2.153e+11 -75.9% 5.197e+10 ± 20% perf-stat.ps.cpu-cycles
6033 ± 9% -96.1% 233.55 ± 17% perf-stat.ps.cpu-migrations
8.509e+10 -50.0% 4.256e+10 ± 18% perf-stat.ps.instructions
4498 ± 5% -27.8% 3246 ± 3% perf-stat.ps.minor-faults
603733 ± 2% -75.8% 145885 ± 22% perf-stat.ps.page-faults
5.261e+12 -48.8% 2.695e+12 ± 17% perf-stat.total.instructions
1908185 -90.4% 183760 ± 39% sched_debug.cfs_rq:/.avg_vruntime.avg
2085734 ± 2% -82.8% 357857 ± 33% sched_debug.cfs_rq:/.avg_vruntime.max
1755250 ± 3% -93.5% 113981 ± 44% sched_debug.cfs_rq:/.avg_vruntime.min
0.91 ± 8% -75.8% 0.22 ± 13% sched_debug.cfs_rq:/.h_nr_queued.avg
2.29 ± 13% -41.8% 1.33 ± 17% sched_debug.cfs_rq:/.h_nr_queued.max
0.57 ± 5% -25.8% 0.42 ± 6% sched_debug.cfs_rq:/.h_nr_queued.stddev
0.87 ± 8% -75.0% 0.22 ± 13% sched_debug.cfs_rq:/.h_nr_runnable.avg
2.21 ± 14% -45.3% 1.21 ± 20% sched_debug.cfs_rq:/.h_nr_runnable.max
0.56 ± 5% -25.6% 0.42 ± 6% sched_debug.cfs_rq:/.h_nr_runnable.stddev
11.50 ± 11% -100.0% 0.00 sched_debug.cfs_rq:/.load_avg.min
1908185 -90.4% 183760 ± 39% sched_debug.cfs_rq:/.min_vruntime.avg
2085734 ± 2% -82.8% 357857 ± 33% sched_debug.cfs_rq:/.min_vruntime.max
1755250 ± 3% -93.5% 113981 ± 44% sched_debug.cfs_rq:/.min_vruntime.min
0.62 ± 4% -64.5% 0.22 ± 13% sched_debug.cfs_rq:/.nr_queued.avg
0.33 ± 10% +26.7% 0.42 ± 6% sched_debug.cfs_rq:/.nr_queued.stddev
1001 ± 2% -68.0% 320.45 ± 15% sched_debug.cfs_rq:/.runnable_avg.avg
2014 ± 10% -38.4% 1239 ± 9% sched_debug.cfs_rq:/.runnable_avg.max
119.25 ±108% -100.0% 0.00 sched_debug.cfs_rq:/.runnable_avg.min
729.40 ± 2% -56.2% 319.46 ± 15% sched_debug.cfs_rq:/.util_avg.avg
1476 ± 8% -17.0% 1225 ± 9% sched_debug.cfs_rq:/.util_avg.max
90.17 ± 94% -100.0% 0.00 sched_debug.cfs_rq:/.util_avg.min
461.66 ± 14% -81.6% 84.96 ± 31% sched_debug.cfs_rq:/.util_est.avg
1464 ± 14% -48.0% 760.79 ± 13% sched_debug.cfs_rq:/.util_est.max
358.79 ± 7% -39.1% 218.62 ± 17% sched_debug.cfs_rq:/.util_est.stddev
440504 ± 5% +77.0% 779768 ± 4% sched_debug.cpu.avg_idle.avg
7249 ± 9% -32.7% 4875 ± 14% sched_debug.cpu.avg_idle.min
1611 ± 4% -68.5% 507.80 ± 13% sched_debug.cpu.curr->pid.avg
736.86 ± 13% +35.9% 1001 ± 7% sched_debug.cpu.curr->pid.stddev
0.91 ± 7% -75.9% 0.22 ± 13% sched_debug.cpu.nr_running.avg
2.29 ± 13% -41.8% 1.33 ± 17% sched_debug.cpu.nr_running.max
0.57 ± 5% -25.5% 0.42 ± 6% sched_debug.cpu.nr_running.stddev
377262 ± 2% -95.0% 19038 ± 32% sched_debug.cpu.nr_switches.avg
421952 ± 2% -87.1% 54289 ± 29% sched_debug.cpu.nr_switches.max
327735 ± 5% -98.5% 4957 ± 37% sched_debug.cpu.nr_switches.min
22410 ± 34% -55.0% 10080 ± 27% sched_debug.cpu.nr_switches.stddev
0.13 ± 54% -86.1% 0.02 ± 32% sched_debug.cpu.nr_uninterruptible.avg
225.08 ± 27% -92.7% 16.38 ± 44% sched_debug.cpu.nr_uninterruptible.max
-215.92 -94.0% -12.92 sched_debug.cpu.nr_uninterruptible.min
92.39 ± 14% -94.8% 4.79 ± 17% sched_debug.cpu.nr_uninterruptible.stddev
1.07 ± 64% -99.0% 0.01 ±189% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.allocate_slab.___slab_alloc
1.35 ± 43% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
1.02 ± 24% -90.9% 0.09 ± 54% perf-sched.sch_delay.avg.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
0.13 ± 41% -98.5% 0.00 ± 48% perf-sched.sch_delay.avg.ms.__cond_resched.__dentry_kill.dput.__fput.task_work_run
1.19 ± 24% -94.5% 0.07 ± 68% perf-sched.sch_delay.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.09 ± 80% -98.9% 0.00 ±122% perf-sched.sch_delay.avg.ms.__cond_resched.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
0.72 ± 34% -90.0% 0.07 ±122% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
0.14 ± 63% -99.0% 0.00 ±105% perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.do_epoll_ctl
0.11 ± 48% -68.9% 0.04 ± 27% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.24 ± 44% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
0.14 ± 89% -99.8% 0.00 ±331% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.11 ±131% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
1.60 ± 20% -98.5% 0.02 ± 62% perf-sched.sch_delay.avg.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
0.81 ±131% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.shmem_file_write_iter.vfs_write.ksys_write
0.02 ± 80% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
0.14 ±214% -99.0% 0.00 ±103% perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.task_work_run.syscall_exit_to_user_mode
1.60 ± 21% -99.4% 0.01 ± 51% perf-sched.sch_delay.avg.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
1.27 ± 42% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.05 ± 62% -96.3% 0.00 ±104% perf-sched.sch_delay.avg.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
0.27 ± 22% -42.3% 0.16 ± 32% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
1.01 ± 27% -94.1% 0.06 ± 72% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
0.98 ± 25% -93.4% 0.06 ± 43% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
1.20 ± 26% -99.7% 0.00 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
0.23 ± 61% -99.0% 0.00 ± 41% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.__ep_eventpoll_poll.isra
1.44 ± 22% -99.8% 0.00 ± 56% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
0.80 ± 23% -91.4% 0.07 ± 54% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
0.09 ± 20% -98.7% 0.00 ±174% perf-sched.sch_delay.avg.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
0.12 ± 41% -99.4% 0.00 ±110% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.__ep_eventpoll_poll.isra.0
1.20 ± 29% -99.6% 0.01 ± 48% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
0.12 ± 76% -98.1% 0.00 ± 61% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
0.11 ± 14% -98.0% 0.00 ± 31% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.ep_loop_check_proc.do_epoll_ctl.__x64_sys_epoll_ctl
0.11 ± 32% -97.1% 0.00 ± 60% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.ep_send_events.ep_poll.do_epoll_wait
1.85 ± 32% -99.9% 0.00 ± 92% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
0.85 ± 43% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
0.00 ± 10% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
0.04 ± 20% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
0.95 ± 42% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
1.34 ± 52% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.05 ± 32% -92.7% 0.00 ± 85% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.14 ± 79% -100.0% 0.00 ±331% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
0.15 ± 96% -98.0% 0.00 ± 35% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.18 ± 45% -98.9% 0.00 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.87 ± 44% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.13 ± 77% -98.9% 0.01 ± 19% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.13 ± 49% -96.5% 0.00 ± 13% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.11 ± 38% -94.9% 0.01 ± 14% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.01 ± 41% -56.6% 0.01 ± 13% perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.10 ± 77% -99.6% 0.00 ±182% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
0.93 ± 21% -98.2% 0.02 ± 48% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.24 ± 25% -52.7% 0.11 ± 54% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
0.29 ± 30% -96.1% 0.01 ±142% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
0.07 ± 23% -100.0% 0.00 perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
0.18 ± 72% -97.9% 0.00 ±120% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
0.05 ± 59% -89.3% 0.01 ± 23% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.26 ± 37% -99.6% 0.00 ±134% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.03 ± 4% -80.8% 0.01 ± 35% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.04 ± 4% -86.0% 0.01 ± 25% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
0.15 ± 7% -76.1% 0.04 ± 46% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
0.17 ± 51% -97.6% 0.00 ± 14% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.05 ± 38% -86.6% 0.01 ± 14% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.01 ± 19% -69.8% 0.00 ± 10% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.09 ± 52% -95.3% 0.00 ± 6% perf-sched.sch_delay.avg.ms.schedule_timeout.unix_wait_for_peer.unix_stream_connect.__sys_connect
0.04 ± 50% -92.5% 0.00 ± 33% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
0.05 ± 13% -86.9% 0.01 ± 51% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
5.17 ± 33% -92.6% 0.38 ± 39% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.alloc_skb_with_frags
7.13 ± 43% -99.4% 0.04 ±207% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.allocate_slab.___slab_alloc
9.23 ± 19% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
17.86 ± 75% -96.6% 0.60 ±153% perf-sched.sch_delay.max.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
3.17 ± 77% -99.9% 0.00 ± 36% perf-sched.sch_delay.max.ms.__cond_resched.__dentry_kill.dput.__fput.task_work_run
13.74 ± 12% -97.5% 0.35 ± 24% perf-sched.sch_delay.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.51 ± 89% -99.7% 0.00 ±119% perf-sched.sch_delay.max.ms.__cond_resched.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
9.06 ± 24% -98.2% 0.16 ±112% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
3.82 ± 22% -91.5% 0.33 ± 41% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
2.75 ± 66% -99.9% 0.00 ±101% perf-sched.sch_delay.max.ms.__cond_resched.__mutex_lock.constprop.0.do_epoll_ctl
11.29 ±153% -83.3% 1.88 ± 42% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
7.69 ± 39% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
0.71 ±105% -100.0% 0.00 ±331% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.54 ±155% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
14.15 ± 8% -98.0% 0.28 ± 47% perf-sched.sch_delay.max.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
2.28 ±106% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.shmem_file_write_iter.vfs_write.ksys_write
0.13 ± 90% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
18.33 ± 65% -97.3% 0.50 ± 71% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.75 ±199% -99.8% 0.00 ±100% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.task_work_run.syscall_exit_to_user_mode
4.78 ± 30% -91.9% 0.39 ± 32% perf-sched.sch_delay.max.ms.__cond_resched.dput.path_put.unix_find_other.unix_stream_connect
16.34 ± 12% -96.8% 0.52 ± 45% perf-sched.sch_delay.max.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
10.67 ± 15% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
9.21 ±221% -100.0% 0.00 ±104% perf-sched.sch_delay.max.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
13.01 ± 15% -96.9% 0.40 ± 27% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
11.88 ± 11% -96.6% 0.40 ± 91% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
4.24 ± 30% -91.4% 0.36 ± 28% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
5.20 ± 27% -92.7% 0.38 ± 44% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.sock_wmalloc.unix_stream_connect
2.99 ± 35% -91.4% 0.26 ± 55% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.sock_wmalloc
14.05 ± 13% -97.6% 0.34 ± 36% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
2.62 ± 38% -89.3% 0.28 ± 55% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.do_timer_create.__x64_sys_timer_create.do_syscall_64
13.78 ± 10% -98.7% 0.18 ± 91% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
3.84 ± 78% -99.9% 0.00 ± 36% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.__ep_eventpoll_poll.isra
13.52 ± 18% -98.8% 0.16 ±104% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
3.76 ± 22% -90.9% 0.34 ± 21% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_kernel.kern_path.unix_find_other
13.73 ± 11% -97.3% 0.37 ± 36% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
11.95 ±196% -96.6% 0.41 ± 29% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.sk_prot_alloc.sk_alloc.unix_create1
0.80 ± 36% -99.9% 0.00 ±174% perf-sched.sch_delay.max.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
1.59 ± 64% -99.9% 0.00 ±101% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.__ep_eventpoll_poll.isra.0
13.47 ± 12% -98.0% 0.27 ± 65% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
1.76 ± 83% -99.8% 0.00 ± 59% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
6.39 ± 23% -99.9% 0.00 ± 45% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.ep_loop_check_proc.do_epoll_ctl.__x64_sys_epoll_ctl
4.63 ± 39% -97.4% 0.12 ±138% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.ep_send_events.ep_poll.do_epoll_wait
10.35 ± 18% -100.0% 0.00 ± 86% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
6.07 ± 19% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
0.01 ± 64% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
0.64 ± 30% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
6.63 ± 16% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
9.38 ± 24% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.62 ± 24% -99.4% 0.00 ± 84% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
5.56 ± 60% -100.0% 0.00 ±331% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
1.54 ± 86% -99.7% 0.01 ± 83% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
4.87 ± 53% -99.9% 0.00 ± 33% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.60 ± 35% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
19.93 ± 56% -96.8% 0.63 ± 99% perf-sched.sch_delay.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
5.69 ± 69% -99.1% 0.05 ±201% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
3.30 ± 66% -98.2% 0.06 ±125% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
2.15 ± 40% -99.5% 0.01 ± 12% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.04 ±111% -75.4% 0.01 ± 14% perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.37 ± 80% -99.9% 0.00 ±182% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
14.71 ± 17% -96.1% 0.58 ± 41% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
7.30 ± 13% -94.3% 0.41 ± 31% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
7.52 ± 24% -99.8% 0.01 ±135% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
1.92 ± 61% -100.0% 0.00 perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
6.67 ± 45% -98.0% 0.13 ±154% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
2.15 ± 59% -99.8% 0.00 ±121% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
14.55 ±169% -96.4% 0.53 ± 59% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
4.66 ± 34% -95.8% 0.20 ±210% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
22.51 ±112% -95.9% 0.92 ± 99% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
22.25 ±105% -88.6% 2.53 ± 41% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
12.34 ± 11% -96.4% 0.44 ± 46% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
3.61 ±101% -99.3% 0.02 ±147% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.33 ± 39% -95.5% 0.01 ± 76% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
2.72 ±123% -99.6% 0.01 ± 34% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
2.91 ± 78% -99.3% 0.02 ± 42% perf-sched.sch_delay.max.ms.schedule_timeout.unix_wait_for_peer.unix_stream_connect.__sys_connect
24.69 ± 84% -95.8% 1.03 ± 62% perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.58 ± 73% -96.0% 0.02 ± 86% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
0.05 ± 2% -84.4% 0.01 ± 50% perf-sched.total_sch_delay.average.ms
0.44 ± 6% +2883.9% 13.24 ± 14% perf-sched.total_wait_and_delay.average.ms
4187000 ± 5% -97.2% 118266 ± 20% perf-sched.total_wait_and_delay.count.ms
0.39 ± 7% +3249.7% 13.23 ± 14% perf-sched.total_wait_time.average.ms
2.22 ± 21% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
2.59 ± 22% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.79 ± 25% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
10.43 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
2.60 ± 42% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
1.13 ± 25% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
3.54 ± 17% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
0.07 ± 6% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
3.56 ± 17% -99.2% 0.03 ± 42% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
9.85 ± 20% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
2.23 ± 25% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
2.17 ± 21% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
2.70 ± 21% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
3.15 ± 19% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
1.80 ± 19% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
5.06 ± 56% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
2.67 ± 25% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
3.94 ± 31% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
2.21 ± 67% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
2.15 ± 16% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
98.19 ± 31% +133.2% 228.98 ± 7% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
102.21 ± 58% -56.1% 44.92 ± 12% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.12 ± 7% +16208.6% 19.68 ± 23% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.12 ± 6% +1262.0% 1.61 ± 22% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
1.15 ± 22% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
3.48 ± 2% +21.3% 4.22 ± 3% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
79.21 ± 17% +23.8% 98.06 perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_wait_for_peer.unix_stream_connect.__sys_connect
19.77 ± 21% +920.4% 201.76 ± 45% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
89.17 ± 53% -99.3% 0.58 ±130% perf-sched.wait_and_delay.count.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
1227 ± 12% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
587.00 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
106.58 ± 11% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
367.67 ± 6% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
20.75 ± 24% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
10.50 ± 48% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
1159 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.__sock_release.sock_close.__fput
146001 ± 16% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
5554 ± 9% -82.9% 948.08 ± 31% perf-sched.wait_and_delay.count.__cond_resched.dput.path_put.unix_release_sock.unix_release
517.58 ± 37% -99.8% 1.08 ± 88% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
630.58 ± 8% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
358.25 ± 20% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
736.92 ± 13% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
668.08 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
528.67 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
1350 ± 14% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
36.92 ± 27% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
821.33 ± 14% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
72.67 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
58.67 ± 27% -99.7% 0.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
303.75 ± 8% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
118.83 ± 28% -99.7% 0.33 ±187% perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
148.67 ± 45% -99.8% 0.25 ±173% perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
367.75 ± 74% -100.0% 0.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1049759 ± 4% -100.0% 67.08 ±331% perf-sched.wait_and_delay.count.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
93.92 ± 19% +795.8% 841.33 ± 2% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
1522 ± 8% -100.0% 0.00 perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
754.25 ± 25% -59.8% 303.58 ± 7% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
146.25 ± 41% +148.3% 363.17 ± 10% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
878820 ± 6% -98.9% 9322 ± 22% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1161349 ± 5% -93.3% 77876 ± 21% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
77.50 ± 5% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1434 ± 2% -17.5% 1183 ± 3% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
109.25 ± 18% +662.5% 833.08 ± 2% perf-sched.wait_and_delay.count.schedule_timeout.unix_wait_for_peer.unix_stream_connect.__sys_connect
27163 ± 37% -91.0% 2448 ± 62% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
838738 ± 9% -99.9% 1097 ± 46% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
29065 ± 3% -30.1% 20320 ± 16% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
30.52 ± 32% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
27.48 ± 12% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
28.30 ± 82% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
1002 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
15.38 ± 39% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
2.94 ± 39% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
28.70 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
55.40 ±114% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
32.98 ± 12% -96.8% 1.05 ± 45% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
635.38 ± 68% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
23.76 ± 11% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
28.32 ± 11% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
27.57 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
27.30 ± 17% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
27.89 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
46.49 ± 62% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
27.07 ± 11% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
21.24 ± 16% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
278.04 ±146% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
55.47 ± 39% -99.2% 0.45 ±331% perf-sched.wait_and_delay.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
197.28 ±183% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
9.23 ± 97% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1868 ± 37% +98.4% 3707 ± 14% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.28 ± 57% -98.8% 0.02 ±133% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.allocate_slab.___slab_alloc
1.20 ± 19% -91.9% 0.10 ± 52% perf-sched.wait_time.avg.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
0.19 ± 35% -97.5% 0.00 ± 47% perf-sched.wait_time.avg.ms.__cond_resched.__dentry_kill.dput.__fput.task_work_run
0.16 ±101% -96.8% 0.01 ±196% perf-sched.wait_time.avg.ms.__cond_resched.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
1.08 ± 28% -93.2% 0.07 ±120% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
0.18 ± 45% -98.5% 0.00 ±106% perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.do_epoll_ctl
1.37 ± 40% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
0.12 ±135% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
1.93 ± 14% -98.4% 0.03 ± 56% perf-sched.wait_time.avg.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
1.09 ±100% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.shmem_file_write_iter.vfs_write.ksys_write
0.16 ±183% -98.1% 0.00 ± 85% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.task_work_run.syscall_exit_to_user_mode
1.96 ± 14% -99.0% 0.02 ± 38% perf-sched.wait_time.avg.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
9.80 ± 20% -94.2% 0.57 ±110% perf-sched.wait_time.avg.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
0.31 ± 19% -49.6% 0.16 ± 32% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
1.22 ± 23% -94.6% 0.07 ± 64% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
1.19 ± 19% -94.2% 0.07 ± 40% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
1.50 ± 18% -99.2% 0.01 ± 36% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
0.39 ± 58% -98.5% 0.01 ± 73% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.__ep_eventpoll_poll.isra
1.71 ± 16% -99.4% 0.01 ± 44% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
1.00 ± 16% -92.7% 0.07 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
4.97 ± 57% -95.6% 0.22 ±174% perf-sched.wait_time.avg.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
0.20 ± 51% -98.7% 0.00 ±112% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.__ep_eventpoll_poll.isra.0
1.47 ± 21% -99.2% 0.01 ± 36% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
0.23 ± 40% -97.8% 0.01 ± 60% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
0.18 ± 16% -96.3% 0.01 ± 38% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.ep_loop_check_proc.do_epoll_ctl.__x64_sys_epoll_ctl
0.19 ± 28% -94.4% 0.01 ± 45% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.ep_send_events.ep_poll.do_epoll_wait
2.09 ± 29% -99.8% 0.00 ± 58% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
0.03 ± 78% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
2.18 ± 69% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
0.54 ±141% -98.7% 0.01 ± 50% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.12 ± 44% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.39 ± 28% -97.1% 0.01 ±142% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
0.08 ± 25% -100.0% 0.00 perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
1.31 ±177% -96.3% 0.05 ±132% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
98.14 ± 31% +133.3% 228.98 ± 7% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
101.95 ± 58% -55.9% 44.92 ± 12% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
0.09 ± 8% +21799.8% 19.67 ± 23% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.08 ± 6% +1892.9% 1.60 ± 22% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
0.23 ± 7% -78.1% 0.05 ± 39% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
0.98 ± 18% -50.8% 0.48 ± 3% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
3.47 ± 2% +21.6% 4.22 ± 3% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
79.12 ± 17% +23.9% 98.05 perf-sched.wait_time.avg.ms.schedule_timeout.unix_wait_for_peer.unix_stream_connect.__sys_connect
19.75 ± 21% +921.7% 201.74 ± 45% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.28 ±327% -99.9% 0.00 ± 73% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
5.17 ± 33% -92.6% 0.38 ± 39% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.alloc_skb_with_frags
7.41 ± 38% -99.3% 0.05 ±164% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.allocate_slab.___slab_alloc
18.93 ± 19% -96.8% 0.60 ±153% perf-sched.wait_time.max.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
3.79 ± 67% -99.8% 0.01 ± 52% perf-sched.wait_time.max.ms.__cond_resched.__dentry_kill.dput.__fput.task_work_run
0.67 ± 83% -97.4% 0.02 ±269% perf-sched.wait_time.max.ms.__cond_resched.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
21.06 ±119% -99.2% 0.16 ±112% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
3.82 ± 22% -91.5% 0.33 ± 41% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
3.06 ± 52% -99.8% 0.00 ±147% perf-sched.wait_time.max.ms.__cond_resched.__mutex_lock.constprop.0.do_epoll_ctl
7.74 ± 38% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
2.64 ± 44% -95.1% 0.13 ±331% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.55 ±151% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.change_pud_range.isra.0.change_protection_range
20.62 ± 18% -98.4% 0.33 ± 44% perf-sched.wait_time.max.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
2.66 ± 85% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.shmem_file_write_iter.vfs_write.ksys_write
42.10 ±151% -98.8% 0.50 ± 71% perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.79 ±188% -99.6% 0.00 ± 88% perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.task_work_run.syscall_exit_to_user_mode
4.78 ± 30% -91.9% 0.39 ± 32% perf-sched.wait_time.max.ms.__cond_resched.dput.path_put.unix_find_other.unix_stream_connect
22.71 ± 15% -97.1% 0.66 ± 52% perf-sched.wait_time.max.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
635.37 ± 68% -99.9% 0.65 ±130% perf-sched.wait_time.max.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
19.35 ± 40% -97.9% 0.40 ± 27% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
14.50 ± 18% -97.2% 0.41 ± 89% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
4.24 ± 30% -91.4% 0.36 ± 28% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
13.15 ±199% -97.1% 0.38 ± 44% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.sock_wmalloc.unix_stream_connect
3.04 ± 32% -91.6% 0.26 ± 55% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.sock_wmalloc
17.87 ± 20% -98.1% 0.34 ± 36% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
3.71 ±110% -92.4% 0.28 ± 55% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.do_timer_create.__x64_sys_timer_create.do_syscall_64
18.46 ± 21% -98.2% 0.32 ± 40% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
5.51 ± 63% -99.8% 0.01 ± 63% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.__ep_eventpoll_poll.isra
15.79 ± 17% -98.5% 0.24 ± 63% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
3.76 ± 22% -90.9% 0.34 ± 21% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_kernel.kern_path.unix_find_other
18.23 ± 23% -97.9% 0.38 ± 27% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
12.52 ±186% -96.7% 0.41 ± 29% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.sk_prot_alloc.sk_alloc.unix_create1
46.47 ± 62% -99.5% 0.22 ±174% perf-sched.wait_time.max.ms.__cond_resched.kvfree_rcu_drain_ready.kfree_rcu_monitor.process_one_work.worker_thread
2.18 ± 93% -99.8% 0.00 ±135% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__ep_eventpoll_poll.isra.0
20.73 ± 20% -98.3% 0.35 ± 40% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
3.11 ± 46% -99.7% 0.01 ±137% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
7.57 ± 31% -99.3% 0.05 ±171% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ep_loop_check_proc.do_epoll_ctl.__x64_sys_epoll_ctl
6.56 ± 38% -96.4% 0.24 ± 58% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ep_send_events.ep_poll.do_epoll_wait
11.80 ± 31% -100.0% 0.00 ± 63% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.eventpoll_release_file.__fput.__x64_sys_close
0.12 ±131% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
278.03 ±146% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
28.52 ±234% -99.9% 0.02 ± 99% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.08 ± 49% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
47.81 ± 39% -98.7% 0.63 ± 99% perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
168.89 ±220% -99.5% 0.80 ± 6% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.08 ± 29% -99.9% 0.01 ±135% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
2.02 ± 54% -100.0% 0.00 perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
196.78 ±183% -99.9% 0.13 ±154% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
16.11 ± 25% -97.1% 0.46 ± 37% perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
6.64 ± 75% -76.5% 1.56 ± 4% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1868 ± 37% +98.4% 3707 ± 14% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
83.79 ±329% -100.0% 0.02 ± 96% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
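
For readers checking individual rows: the %change column appears to be the relative difference between the two per-commit means, so a row can be sanity-checked as in the sketch below. This is a reader's aid written under that assumption, not the robot's own tooling, and the helper name pct_change is hypothetical.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    if base == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (new - base) / base * 100.0


# (parent-commit mean, patched-commit mean), copied from rows in the table above.
rows = {
    # reported as +133.2%
    "perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64":
        (98.19, 228.98),
    # reported as -98.9%
    "perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait":
        (878820, 9322),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+.1f}%  {name}")
```

Rows whose displayed means are heavily rounded (for example 0.12 versus 19.68) will not reproduce the reported percentage exactly, presumably because the report is computed from unrounded values.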
***************************************************************************************************
lkp-srf-2sp1: 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp1/epoll/stress-ng/60s
commit:
feb864ee99 ("posix-timers: Make signal_struct::next_posix_timer_id an atomic_t")
1535cb8028 ("posix-timers: Improve hash table performance")
feb864ee99a2d8a2 1535cb80286e6fbc834f075039f
---------------- ---------------------------
%stddev %change %stddev
\ | \
43148 +13.5% 48960 ± 3% uptime.idle
1.055e+09 ± 51% +533.3% 6.682e+09 ± 24% cpuidle..time
538108 ± 15% +1136.2% 6652185 ± 12% cpuidle..usage
19057763 ± 2% +184.7% 54249799 ± 30% numa-numastat.node1.local_node
19247181 ± 2% +182.8% 54435088 ± 30% numa-numastat.node1.numa_hit
2528 ± 8% +139.1% 6046 ± 19% perf-c2c.DRAM.local
14118 ± 2% -59.4% 5735 ± 40% perf-c2c.DRAM.remote
19925 ± 2% +42.1% 28317 ± 26% perf-c2c.HITM.local
12344 ± 3% -72.4% 3407 ± 38% perf-c2c.HITM.remote
8.32 ± 32% +424.2% 43.62 ± 22% vmstat.cpu.id
90.61 ± 2% -39.3% 55.04 ± 17% vmstat.cpu.sy
311.79 ± 4% -46.1% 168.04 ± 16% vmstat.procs.r
890824 ± 2% -87.0% 115560 ± 13% vmstat.system.cs
678218 ± 2% -31.3% 466212 ± 11% vmstat.system.in
5.97 ± 56% +35.8 41.74 ± 24% mpstat.cpu.all.idle%
0.86 ± 3% -0.3 0.51 ± 11% mpstat.cpu.all.irq%
0.10 ± 3% +2.0 2.11 ± 13% mpstat.cpu.all.soft%
92.01 ± 3% -37.7 54.27 ± 18% mpstat.cpu.all.sys%
1.06 ± 3% +0.3 1.37 ± 8% mpstat.cpu.all.usr%
27.83 ± 38% -84.4% 4.33 ± 31% mpstat.max_utilization.seconds
33851562 +125.5% 76322657 ± 6% stress-ng.epoll.ops
563917 +124.9% 1268302 ± 6% stress-ng.epoll.ops_per_sec
28793963 -92.5% 2170167 ± 12% stress-ng.time.involuntary_context_switches
99879 +3.8% 103635 stress-ng.time.minor_page_faults
24802 -43.4% 14035 ± 18% stress-ng.time.percent_of_cpu_this_job_got
14926 -43.8% 8382 ± 18% stress-ng.time.system_time
115.24 +47.3% 169.80 ± 7% stress-ng.time.user_time
28334081 -91.0% 2554796 ± 19% stress-ng.time.voluntary_context_switches
15224825 ± 2% -9.9% 13712660 ± 2% numa-vmstat.node0.nr_free_pages
24052 ± 5% +11.2% 26754 ± 7% numa-vmstat.node0.nr_kernel_stack
26695 ± 30% +94.3% 51869 ± 25% numa-vmstat.node0.nr_slab_reclaimable
80931 ± 2% +629.1% 590074 ± 3% numa-vmstat.node0.nr_slab_unreclaimable
884632 ± 28% -67.7% 285909 ± 29% numa-vmstat.node1.nr_active_anon
247805 ± 34% -83.9% 39842 ± 86% numa-vmstat.node1.nr_mapped
25014 ± 32% +130.7% 57701 ± 11% numa-vmstat.node1.nr_slab_reclaimable
75561 ± 2% +240.7% 257418 ± 9% numa-vmstat.node1.nr_slab_unreclaimable
884628 ± 28% -67.7% 285908 ± 29% numa-vmstat.node1.nr_zone_active_anon
19254705 ± 2% +182.7% 54434306 ± 30% numa-vmstat.node1.numa_hit
19065286 ± 2% +184.5% 54249017 ± 30% numa-vmstat.node1.numa_local
4955679 ± 4% -30.9% 3424278 ± 5% meminfo.Active
4955679 ± 4% -30.9% 3424278 ± 5% meminfo.Active(anon)
6755066 ± 3% -22.9% 5210854 ± 3% meminfo.Cached
8262544 -17.7% 6803834 ± 2% meminfo.Committed_AS
207031 +110.4% 435508 ± 11% meminfo.KReclaimable
1206797 ± 7% -58.6% 499961 ± 15% meminfo.Mapped
11812020 +34.0% 15824159 meminfo.Memused
207031 +110.4% 435508 ± 11% meminfo.SReclaimable
625945 +440.8% 3385294 ± 4% meminfo.SUnreclaim
3240345 ± 7% -47.7% 1696129 ± 11% meminfo.Shmem
832977 +358.7% 3820803 ± 2% meminfo.Slab
12134020 ± 2% +40.3% 17018684 ± 2% meminfo.max_used_kB
107034 ± 30% +94.2% 207898 ± 26% numa-meminfo.node0.KReclaimable
24086 ± 5% +10.9% 26717 ± 7% numa-meminfo.node0.KernelStack
60890354 ± 2% -9.9% 54851181 ± 2% numa-meminfo.node0.MemFree
4808400 ± 28% +125.6% 10847573 ± 14% numa-meminfo.node0.MemUsed
107034 ± 30% +94.2% 207898 ± 26% numa-meminfo.node0.SReclaimable
324929 +626.1% 2359208 ± 3% numa-meminfo.node0.SUnreclaim
431963 ± 8% +494.3% 2567107 ± 3% numa-meminfo.node0.Slab
3538358 ± 28% -67.7% 1144230 ± 29% numa-meminfo.node1.Active
3538358 ± 28% -67.7% 1144230 ± 29% numa-meminfo.node1.Active(anon)
99985 ± 32% +132.3% 232273 ± 11% numa-meminfo.node1.KReclaimable
983848 ± 34% -83.8% 159767 ± 86% numa-meminfo.node1.Mapped
99985 ± 32% +132.3% 232273 ± 11% numa-meminfo.node1.SReclaimable
301620 +240.5% 1027163 ± 9% numa-meminfo.node1.SUnreclaim
401606 ± 8% +213.6% 1259437 ± 8% numa-meminfo.node1.Slab
1237560 ± 4% -30.8% 856038 ± 5% proc-vmstat.nr_active_anon
2976831 -3.4% 2876549 proc-vmstat.nr_dirty_background_threshold
5960941 -3.4% 5760131 proc-vmstat.nr_dirty_threshold
1688680 ± 3% -22.8% 1303065 ± 3% proc-vmstat.nr_file_pages
29966049 -3.4% 28961759 proc-vmstat.nr_free_pages
49329 +1.6% 50128 proc-vmstat.nr_kernel_stack
304482 ± 7% -58.6% 125967 ± 15% proc-vmstat.nr_mapped
809999 ± 7% -47.6% 424383 ± 11% proc-vmstat.nr_shmem
51715 +109.8% 108516 ± 12% proc-vmstat.nr_slab_reclaimable
156635 +440.7% 846916 ± 4% proc-vmstat.nr_slab_unreclaimable
1237560 ± 4% -30.8% 856038 ± 5% proc-vmstat.nr_zone_active_anon
37826648 +125.5% 85303573 ± 6% proc-vmstat.numa_hit
37562632 +126.4% 85036326 ± 6% proc-vmstat.numa_local
97014 ± 8% +26.4% 122652 ± 10% proc-vmstat.numa_pages_migrated
45240990 +150.6% 1.134e+08 ± 5% proc-vmstat.pgalloc_normal
798177 -5.8% 752280 ± 2% proc-vmstat.pgfault
43937747 +156.1% 1.125e+08 ± 5% proc-vmstat.pgfree
97014 ± 8% +26.4% 122652 ± 10% proc-vmstat.pgmigrate_success
0.69 ± 7% +313.5% 2.85 ± 11% perf-stat.i.MPKI
1.549e+10 ± 3% -22.1% 1.207e+10 ± 5% perf-stat.i.branch-instructions
73994092 ± 2% +22.3% 90471635 ± 3% perf-stat.i.branch-misses
35.70 +6.8 42.45 ± 10% perf-stat.i.cache-miss-rate%
47382742 +284.6% 1.822e+08 ± 11% perf-stat.i.cache-misses
1.291e+08 ± 3% +220.0% 4.132e+08 ± 5% perf-stat.i.cache-references
934844 ± 3% -87.1% 120611 ± 13% perf-stat.i.context-switches
8.80 ± 2% -23.7% 6.71 ± 17% perf-stat.i.cpi
6.394e+11 ± 3% -34.8% 4.166e+11 ± 16% perf-stat.i.cpu-cycles
13976 ± 5% -82.1% 2495 ± 7% perf-stat.i.cycles-between-cache-misses
7.202e+10 ± 3% -13.1% 6.26e+10 ± 5% perf-stat.i.instructions
0.12 ± 6% +35.7% 0.16 ± 20% perf-stat.i.ipc
5.21 ± 3% -88.7% 0.59 ± 93% perf-stat.i.metric.K/sec
402199 ± 3% -39.9% 241847 ± 33% perf-stat.i.page-faults
0.66 ± 2% +345.5% 2.93 ± 12% perf-stat.overall.MPKI
0.48 +0.3 0.75 ± 3% perf-stat.overall.branch-miss-rate%
36.72 +7.6 44.28 ± 10% perf-stat.overall.cache-miss-rate%
8.88 -24.5% 6.70 ± 18% perf-stat.overall.cpi
13516 ± 2% -83.2% 2269 ± 7% perf-stat.overall.cycles-between-cache-misses
0.11 +37.4% 0.15 ± 18% perf-stat.overall.ipc
1.526e+10 ± 3% -22.2% 1.188e+10 ± 5% perf-stat.ps.branch-instructions
72600465 ± 2% +22.6% 89019409 ± 3% perf-stat.ps.branch-misses
46600547 +286.1% 1.799e+08 ± 11% perf-stat.ps.cache-misses
1.269e+08 ± 2% +220.4% 4.067e+08 ± 5% perf-stat.ps.cache-references
920856 ± 2% -87.1% 118969 ± 13% perf-stat.ps.context-switches
6.299e+11 ± 3% -34.8% 4.106e+11 ± 16% perf-stat.ps.cpu-cycles
7.094e+10 ± 3% -13.2% 6.161e+10 ± 5% perf-stat.ps.instructions
396054 ± 3% -40.1% 237181 ± 33% perf-stat.ps.page-faults
4.463e+12 -14.9% 3.798e+12 ± 5% perf-stat.total.instructions
7856122 -72.6% 2151482 ± 76% sched_debug.cfs_rq:/.avg_vruntime.avg
8590951 -64.0% 3092345 ± 72% sched_debug.cfs_rq:/.avg_vruntime.max
7042407 ± 7% -82.6% 1227046 ± 84% sched_debug.cfs_rq:/.avg_vruntime.min
0.67 ± 6% -73.6% 0.18 ± 55% sched_debug.cfs_rq:/.h_nr_queued.avg
2.08 ± 8% -44.0% 1.17 ± 20% sched_debug.cfs_rq:/.h_nr_queued.max
0.65 ± 5% -72.5% 0.18 ± 55% sched_debug.cfs_rq:/.h_nr_runnable.avg
2.08 ± 8% -44.0% 1.17 ± 20% sched_debug.cfs_rq:/.h_nr_runnable.max
75.67 ± 22% +72.8% 130.79 ± 30% sched_debug.cfs_rq:/.load_avg.stddev
7856124 -72.6% 2151482 ± 76% sched_debug.cfs_rq:/.min_vruntime.avg
8590951 -64.0% 3092345 ± 72% sched_debug.cfs_rq:/.min_vruntime.max
7042407 ± 7% -82.6% 1227046 ± 84% sched_debug.cfs_rq:/.min_vruntime.min
0.52 -66.0% 0.18 ± 54% sched_debug.cfs_rq:/.nr_queued.avg
0.13 ± 19% +131.6% 0.29 ± 21% sched_debug.cfs_rq:/.nr_queued.stddev
712.87 ± 3% -69.0% 221.26 ± 41% sched_debug.cfs_rq:/.runnable_avg.avg
1791 ± 5% -33.4% 1193 ± 19% sched_debug.cfs_rq:/.runnable_avg.max
291.83 ± 72% -100.0% 0.00 sched_debug.cfs_rq:/.runnable_avg.min
562.59 -61.2% 218.45 ± 41% sched_debug.cfs_rq:/.util_avg.avg
204.08 ± 85% -100.0% 0.00 sched_debug.cfs_rq:/.util_avg.min
124.33 ± 9% +130.6% 286.67 ± 26% sched_debug.cfs_rq:/.util_avg.stddev
667089 ± 2% +26.3% 842681 ± 7% sched_debug.cpu.avg_idle.avg
37749 ± 6% -73.7% 9926 ± 48% sched_debug.cpu.avg_idle.min
31.26 ± 20% -35.5% 20.15 ± 20% sched_debug.cpu.clock.stddev
3365 ± 2% -68.3% 1067 ± 60% sched_debug.cpu.curr->pid.avg
588.77 ± 10% +168.8% 1582 ± 35% sched_debug.cpu.curr->pid.stddev
0.67 ± 6% -74.0% 0.17 ± 57% sched_debug.cpu.nr_running.avg
2.08 ± 8% -44.0% 1.17 ± 20% sched_debug.cpu.nr_running.max
112229 -90.6% 10588 ± 66% sched_debug.cpu.nr_switches.avg
131899 ± 2% -74.8% 33294 ± 34% sched_debug.cpu.nr_switches.max
86555 ± 8% -97.4% 2240 ± 83% sched_debug.cpu.nr_switches.min
0.27 ± 21% -97.1% 0.01 ±101% sched_debug.cpu.nr_uninterruptible.avg
0.20 ± 12% +215.1% 0.64 ± 54% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.alloc_skb_with_frags
1.05 ± 75% -94.5% 0.06 ±168% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
0.16 ±111% +555.0% 1.03 ± 55% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
0.19 ±122% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.04 ± 60% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
0.01 +4716.7% 0.53 ± 71% perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.16 ± 10% +209.1% 0.48 ± 51% perf-sched.sch_delay.avg.ms.__cond_resched.dput.path_put.unix_find_other.unix_stream_connect
0.04 ± 22% +1735.0% 0.79 ±103% perf-sched.sch_delay.avg.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
0.11 ± 12% +432.0% 0.58 ± 53% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
0.26 ± 4% +121.3% 0.56 ± 47% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
0.16 ± 4% +267.9% 0.58 ± 46% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.sock_wmalloc.unix_stream_connect
0.16 ± 28% +377.0% 0.75 ± 85% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.sock_wmalloc
0.10 ± 46% +492.9% 0.61 ± 56% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.do_timer_create.__x64_sys_timer_create.do_syscall_64
0.15 ± 10% +426.6% 0.80 ± 71% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_kernel.kern_path.unix_find_other
0.14 ± 8% +267.4% 0.53 ± 46% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.sk_prot_alloc.sk_alloc.unix_create1
0.22 ± 17% -81.1% 0.04 ±105% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.21 ± 22% -83.9% 0.03 ± 57% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.05 ± 7% +153.4% 0.13 ± 44% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.21 ± 17% -85.4% 0.03 ± 96% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.09 ± 16% -42.9% 0.05 ± 41% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.08 ± 10% -53.7% 0.04 ± 48% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.05 ± 3% +367.7% 0.22 ± 27% perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.19 ± 30% -82.0% 0.03 ± 50% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
1.18 ± 42% +632.2% 8.64 ± 38% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.alloc_skb_with_frags
4.01 ± 60% -98.5% 0.06 ±155% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
2.71 ± 56% +302.4% 10.92 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.__dentry_kill.dput.__fput.__x64_sys_close
0.79 ±114% +2021.6% 16.77 ± 22% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
1.46 ± 65% +367.7% 6.82 ± 83% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
2.34 ± 31% +170.9% 6.33 ± 32% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.62 ± 93% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.21 ± 52% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
6.68 ± 26% +457.7% 37.23 ± 37% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
1.59 ± 31% +408.3% 8.07 ± 40% perf-sched.sch_delay.max.ms.__cond_resched.dput.path_put.unix_find_other.unix_stream_connect
0.17 ± 29% +62821.2% 106.44 ±196% perf-sched.sch_delay.max.ms.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
2.14 ± 52% +577.6% 14.52 ± 29% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
1.38 ±111% +575.2% 9.32 ± 40% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
1.37 ± 35% +1092.7% 16.35 ± 28% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
1.49 ± 19% +888.9% 14.77 ± 21% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.sock_wmalloc.unix_stream_connect
1.52 ± 45% +481.7% 8.83 ± 58% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.sock_wmalloc
0.37 ± 48% +1749.5% 6.77 ± 37% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.do_timer_create.__x64_sys_timer_create.do_syscall_64
1.89 ± 67% +501.1% 11.35 ± 49% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_kernel.kern_path.unix_find_other
3.28 ± 63% +414.6% 16.87 ± 28% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
2.47 ± 47% +658.0% 18.73 ± 26% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.sk_prot_alloc.sk_alloc.unix_create1
2.54 ± 56% +385.8% 12.35 ± 44% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
2.81 ± 78% -87.4% 0.35 ±210% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
4.53 ± 45% -98.0% 0.09 ± 72% perf-sched.sch_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
0.35 ± 30% +2421.7% 8.88 ± 56% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.74 ± 56% -88.4% 0.20 ± 99% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
10.65 ± 11% +156.4% 27.30 ± 48% perf-sched.sch_delay.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1.71 ± 43% +444.3% 9.29 ± 71% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
2.27 ± 22% -73.1% 0.61 ± 88% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.01 ±103% -83.0% 0.17 ±118% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
6.60 ± 76% -70.6% 1.94 ±137% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
6.12 ± 22% +133.1% 14.27 ± 24% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.16 ± 56% -92.7% 0.16 ± 68% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
18.92 ± 37% +1957.7% 389.39 ± 74% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.06 ± 5% +228.9% 0.20 ± 55% perf-sched.total_sch_delay.average.ms
1.09 +1306.4% 15.38 ± 22% perf-sched.total_wait_and_delay.average.ms
4862008 -91.5% 414956 ± 18% perf-sched.total_wait_and_delay.count.ms
1.03 +1370.8% 15.18 ± 22% perf-sched.total_wait_time.average.ms
2.88 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
7.65 ± 53% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
0.03 ± 9% +5978.6% 1.56 ± 59% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
11.25 ± 52% -79.8% 2.27 ± 73% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
4.87 ± 61% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
4.50 ± 16% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
7.72 ± 30% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
10.49 ± 27% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
3.76 ± 41% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
5.08 ± 21% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
13.88 ± 21% +332.9% 60.10 ± 60% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
3.18 ± 74% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
12.40 ±118% +166.8% 33.07 ± 3% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
4.34 ± 70% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
93.41 ± 18% +199.5% 279.81 ± 20% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.53 +1911.6% 10.61 ± 47% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.44 +524.5% 2.77 ± 22% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
1.66 ± 14% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
4.06 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.15 ± 2% +354.0% 0.68 ± 22% perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
62.27 +16.1% 72.33 ± 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1536 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
5.83 ± 60% -91.4% 0.50 ±223% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.stop_two_cpus.migrate_swap.task_numa_migrate
39.67 ± 15% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.__sock_release.sock_close.__fput
497089 ± 2% -97.4% 12678 ± 28% perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
317.33 ± 14% +1910.9% 6381 ± 20% perf-sched.wait_and_delay.count.__cond_resched.dput.path_put.unix_release_sock.unix_release
42.33 ± 45% +807.5% 384.17 ± 30% perf-sched.wait_and_delay.count.__cond_resched.kfree_rcu_work.process_one_work.worker_thread.kthread
22.00 ± 16% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
53.00 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
75.17 ± 8% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
36.00 ± 15% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
59.50 ± 17% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
77.00 ± 12% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
53.33 ± 47% -93.4% 3.50 ±223% perf-sched.wait_and_delay.count.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
41.67 ± 36% +999.6% 458.17 ± 56% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
229578 ± 5% -75.4% 56521 ± 56% perf-sched.wait_and_delay.count.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1.67 ± 28% -100.0% 0.00 perf-sched.wait_and_delay.count.devkmsg_read.vfs_read.ksys_read.do_syscall_64
6.00 +40144.4% 2414 ± 15% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
244.00 ± 7% -100.0% 0.00 perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
739.83 ± 19% -66.1% 250.67 ± 30% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
930842 -94.1% 55225 ± 32% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1423312 -89.3% 152928 ± 23% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
28105 ± 8% -47.9% 14637 ± 41% perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
79.83 ± 6% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1253 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
1682676 -98.2% 30021 ± 57% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
17624 +63.2% 28758 ± 14% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1001 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
63.56 ± 48% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
47.03 ± 79% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.sock_alloc_inode.alloc_inode.sock_alloc
40.36 ± 20% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
61.87 ± 20% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
60.75 ± 52% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
52.23 ± 35% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
53.65 ± 22% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
76.72 ± 30% +574.4% 517.47 ± 74% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
21.30 ± 11% +518.9% 131.79 ± 52% perf-sched.wait_and_delay.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
4.33 ± 40% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
229.85 ±160% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
89.98 ± 25% -53.0% 42.33 ± 24% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
6.63 ± 15% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
8.99 ± 32% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
2212 ± 23% +45.0% 3208 ± 17% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.20 ± 12% +309.5% 0.83 ± 77% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.alloc_skb_with_frags
1.05 ± 75% -94.5% 0.06 ±168% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
0.26 ± 13% +248.4% 0.91 ± 56% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
1.03 ± 54% -97.5% 0.03 ±158% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
7.19 ± 57% -77.3% 1.63 ± 48% perf-sched.wait_time.avg.ms.__cond_resched.down_write.__sock_release.sock_close.__fput
0.03 ± 62% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
0.01 ± 15% +7172.9% 1.03 ± 53% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.16 ± 10% +353.5% 0.70 ± 39% perf-sched.wait_time.avg.ms.__cond_resched.dput.path_put.unix_find_other.unix_stream_connect
10.54 ± 55% -85.2% 1.56 ± 66% perf-sched.wait_time.avg.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
0.26 ± 4% +165.7% 0.68 ± 52% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
0.16 ± 4% +349.7% 0.71 ± 56% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.sock_wmalloc.unix_stream_connect
0.16 ± 28% +533.4% 0.99 ± 72% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.sock_wmalloc
4.35 ± 17% -53.4% 2.03 ± 76% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.10 ± 46% +727.9% 0.85 ± 67% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.do_timer_create.__x64_sys_timer_create.do_syscall_64
7.51 ± 31% -80.7% 1.45 ± 84% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
10.24 ± 27% -76.5% 2.40 ± 85% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
0.15 ± 10% +426.6% 0.80 ± 71% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_kernel.kern_path.unix_find_other
3.53 ± 45% -59.3% 1.44 ± 73% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
0.14 ± 8% +299.4% 0.57 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.sk_prot_alloc.sk_alloc.unix_create1
4.89 ± 22% -69.6% 1.49 ± 82% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
0.14 ± 22% -86.3% 0.02 ±151% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
13.79 ± 21% +335.0% 59.97 ± 60% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
7.93 ±100% +176.8% 21.96 ± 45% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.40 ±118% +166.8% 33.07 ± 3% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
0.21 ± 71% -96.0% 0.01 ±223% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
2.50 ±119% +605.3% 17.65 ± 48% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.10 ± 66% +35978.3% 37.64 ±112% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
93.36 ± 18% +199.7% 279.77 ± 20% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.48 +2103.7% 10.48 ± 48% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.39 +583.2% 2.65 ± 22% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.do_epoll_pwait.part
1.15 ± 5% -50.8% 0.57 ± 59% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
3.97 ± 2% +27.4% 5.06 ± 4% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.10 ± 2% +348.6% 0.47 ± 20% perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
62.18 +16.1% 72.22 ± 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.18 ± 42% +1710.8% 21.38 ±134% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.alloc_skb_with_frags
4.01 ± 60% -98.5% 0.06 ±155% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
1.46 ± 65% +1683.9% 25.99 ±114% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
2.38 ± 51% -98.9% 0.03 ±158% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.21 ± 52% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
1.59 ± 31% +3311.2% 54.16 ± 84% perf-sched.wait_time.max.ms.__cond_resched.dput.path_put.unix_find_other.unix_stream_connect
84.86 ± 27% -52.1% 40.62 ± 36% perf-sched.wait_time.max.ms.__cond_resched.dput.path_put.unix_release_sock.unix_release
1.37 ± 35% +9209.7% 127.60 ± 65% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
1.49 ± 19% +5627.3% 85.56 ± 89% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.sock_wmalloc.unix_stream_connect
1.52 ± 45% +1550.7% 25.05 ±135% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.sock_wmalloc
0.37 ± 48% +5929.2% 22.07 ±158% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.do_timer_create.__x64_sys_timer_create.do_syscall_64
61.20 ± 21% -55.7% 27.08 ± 70% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_insert.do_epoll_ctl.__x64_sys_epoll_ctl
60.17 ± 51% -58.9% 24.71 ± 65% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ep_ptable_queue_proc.unix_poll.sock_poll
1.89 ± 67% +501.1% 11.35 ± 49% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_kernel.kern_path.unix_find_other
2.47 ± 47% +3700.3% 93.91 ± 63% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.sk_prot_alloc.sk_alloc.unix_create1
52.73 ± 20% -56.5% 22.96 ± 57% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.do_epoll_ctl.__x64_sys_epoll_ctl.do_syscall_64
42.82 ± 27% -70.8% 12.49 ± 55% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ep_loop_check_proc.do_epoll_ctl.__x64_sys_epoll_ctl
2.81 ± 78% -87.4% 0.35 ±210% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.fdget_pos.ksys_write.do_syscall_64
0.22 ± 3% -91.6% 0.02 ±151% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
76.63 ± 30% +575.0% 517.27 ± 74% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
10.65 ± 11% +1061.1% 123.63 ± 61% perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.51 ± 76% -97.4% 0.01 ±223% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
1.01 ±103% -82.8% 0.17 ±116% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
19.43 ±110% +417.5% 100.53 perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.96 ±107% +30626.2% 293.64 ±113% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
84.15 ± 22% -67.6% 27.23 ± 29% perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.do_epoll_ctl
6.62 ± 5% +76.8% 11.70 ± 33% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
2212 ± 23% +45.0% 3208 ± 17% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-24 6:39 [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression kernel test robot
@ 2025-03-26 8:07 ` Thomas Gleixner
2025-03-26 21:11 ` Mateusz Guzik
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-03-26 8:07 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, x86, Eric Dumazet, Benjamin Segall,
Frederic Weisbecker, oliver.sang
On Mon, Mar 24 2025 at 14:39, kernel test robot wrote:
> kernel test robot noticed a 36.2% regression of stress-ng.epoll.ops_per_sec on:
>
> commit: 1535cb80286e6fbc834f075039f85274538543c7 ("posix-timers: Improve hash table performance")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
>
> testcase: stress-ng
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
> nr_threads: 100%
> testtime: 60s
> test: epoll
> cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.epoll.ops_per_sec 124.9% improvement |
> | test machine | 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory |
> | test parameters | cpufreq_governor=performance |
> | | nr_threads=100% |
> | | test=epoll |
> | | testtime=60s |
> +------------------+---------------------------------------------------------------------------------+
How on earth can this commit result in both a 36% regression and a 25%
improvement with the same test?
Unfortunately I can't reproduce any of it. I checked the epoll test
source and it uses a posix timer, but that commit makes the hash less
contended so there is zero explanation.
Thanks,
tglx
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-26 8:07 ` Thomas Gleixner
@ 2025-03-26 21:11 ` Mateusz Guzik
2025-03-26 21:43 ` Thomas Gleixner
2025-03-27 6:21 ` Eric Dumazet
0 siblings, 2 replies; 16+ messages in thread
From: Mateusz Guzik @ 2025-03-26 21:11 UTC (permalink / raw)
To: Thomas Gleixner
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Eric Dumazet,
Benjamin Segall, Frederic Weisbecker
On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
> On Mon, Mar 24 2025 at 14:39, kernel test robot wrote:
> > kernel test robot noticed a 36.2% regression of stress-ng.epoll.ops_per_sec on:
> >
> > commit: 1535cb80286e6fbc834f075039f85274538543c7 ("posix-timers: Improve hash table performance")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
> >
[snip]
> > | testcase: change | stress-ng: stress-ng.epoll.ops_per_sec 124.9% improvement |
>
> How on earth can this commit result in both a 36% regression and a 25%
> improvement with the same test?
>
> Unfortunately I can't reproduce any of it. I checked the epoll test
> source and it uses a posix timer, but that commit makes the hash less
> contended so there is zero explanation.
>
The short summary is:
1. your change is fine
2. stress-ng is doing seriously weird stuff here resulting in the above
3. there may or may not be something the scheduler can do to help
For the regression, the stats say:
feb864ee99a2d8a2 1535cb80286e6fbc834f075039f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      5.97 ± 56%     +35.8       41.74 ± 24%  mpstat.cpu.all.idle%
      0.86 ±  3%      -0.3        0.51 ± 11%  mpstat.cpu.all.irq%
      0.10 ±  3%      +2.0        2.11 ± 13%  mpstat.cpu.all.soft%
     92.01 ±  3%     -37.7       54.27 ± 18%  mpstat.cpu.all.sys%
      1.06 ±  3%      +0.3        1.37 ±  8%  mpstat.cpu.all.usr%
     27.83 ± 38%     -84.4%       4.33 ± 31%  mpstat.max_utilization.seconds
As in system time went down and idle went up.
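A note on reading the table above: for the mpstat percentage rows the %change column appears to be an absolute difference in percentage points, while for mpstat.max_utilization.seconds it is a relative change. A minimal sketch that reproduces the quoted figures follows; the helper names are made up for illustration, and since the table prints rounded inputs the last digit of a recomputed value can differ from the robot's.

```python
def point_delta(before: float, after: float) -> float:
    """Absolute difference, e.g. percentage points between two CPU shares."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Relative change in percent of the baseline value."""
    return (after - before) / before * 100.0


# Values copied from the mpstat rows quoted above (base commit, patched commit).
idle = (5.97, 41.74)
sys_time = (92.01, 54.27)
max_util_seconds = (27.83, 4.33)

print(f"idle%: {point_delta(*idle):+.1f} points")
print(f"sys%:  {point_delta(*sys_time):+.1f} points")
print(f"max_utilization.seconds: {relative_change(*max_util_seconds):+.1f}%")
```

This prints +35.8, -37.7 and -84.4%, which lines up with the idle%, sys% and max_utilization.seconds rows.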
Your patch must have a side effect where it messes with some of the
timings between workers.
However, there is a possibility the scheduler may be doing something
better here -- the testcase spawned as is has wildly unstable
performance, literally orders of magnitude difference between runs and
tons of idle and it stabilizes if I use a taskset.
In an attempt to narrow it down I tried with few workers:
taskset --cpu-list 1,2 stress-ng --timeout 10 --times --verify --metrics --no-rand-seed --epoll 1
--epoll 1 spawns two worker threads and both are bound to only execute
on cores 1 and 2.
With this I consistently see high CPU usage and total executed ops
hanging around 190k. Sample time output:
1.31s user 18.67s system 199% cpu 10.02s (10.023) total
If I whack the taskset or extend it to 1,2,3,4:
taskset --cpu-list 1,2,3,4 stress-ng --timeout 10 --times --verify --metrics --no-rand-seed --epoll 1
... I'm back to nonsensical perf, all the way down to 18k ops/s on the
lower end and over 200k on the higher one.
Sample time outputs in consecutive runs:
0.02s user 0.38s system 3% cpu 10.06s (10.060) total
0.34s user 4.59s system 48% cpu 10.13s (10.132) total
That is, the first run spent almost the entire time off CPU, and the
second one used only about a quarter of the CPU time it could have.
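For reference, the "% cpu" figure in those time outputs is (user + system) / wall. Below is a rough sketch of that arithmetic, plus a small helper to time a command and its children; the function names are made up here, and truncating the percentage to an integer is an assumption chosen because it matches the quoted outputs.

```python
import resource
import subprocess
import sys
import time


def cpu_percent(user_s: float, system_s: float, wall_s: float) -> int:
    """CPU utilisation in percent of one CPU, truncated to an integer."""
    return int((user_s + system_s) / wall_s * 100)


def measure(cmd: list[str]) -> tuple[float, float, float]:
    """Run cmd once and return (user, system, wall) seconds.

    CPU time comes from RUSAGE_CHILDREN deltas, so it covers the command
    and any children it waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime,
            wall)


if __name__ == "__main__":
    # Figures taken from the sample time outputs above.
    print(cpu_percent(1.31, 18.67, 10.023))  # pinned to CPUs 1,2
    print(cpu_percent(0.02, 0.38, 10.060))   # CPUs 1-4, first run
    print(cpu_percent(0.34, 4.59, 10.132))   # CPUs 1-4, second run
```

The three calls give 199, 3 and 48. With two workers the ceiling is 200%, so 48% is roughly a quarter of what was attainable. To compare runs, something like `measure(["taskset", "--cpu-list", "1,2", "stress-ng", "--timeout", "10", "--epoll", "1"])` could be called in a loop and the results fed to `cpu_percent`.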
The testcase is doing a lot of weird stuff, including calling yield()
for every loop iteration. On top of that if the other worker does not
win the race there is also a sleep of 0.1s thrown in. I commented these
suckers out and weird anomalies persisted.
All that said, I'm not going to further look into it. Was curious wtf
though hence the write up.
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-26 21:11 ` Mateusz Guzik
@ 2025-03-26 21:43 ` Thomas Gleixner
2025-03-27 6:21 ` Eric Dumazet
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2025-03-26 21:43 UTC (permalink / raw)
To: Mateusz Guzik
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Eric Dumazet,
Benjamin Segall, Frederic Weisbecker
On Wed, Mar 26 2025 at 22:11, Mateusz Guzik wrote:
> On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
>> How on earth can this commit result in both a 36% regression and a 25%
>> improvement with the same test?
>>
>> Unfortunately I can't reproduce any of it. I checked the epoll test
>> source and it uses a posix timer, but that commit makes the hash less
>> contended so there is zero explanation.
>>
>
> The short summary is:
> 1. your change is fine
> 2. stress-ng is doing seriously weird stuff here resulting in the above
> 3. there may or may not be something the scheduler can do to help
>
> For the regression, the stats say:
> feb864ee99a2d8a2 1535cb80286e6fbc834f075039f
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       5.97 ± 56%     +35.8       41.74 ± 24%  mpstat.cpu.all.idle%
>       0.86 ±  3%      -0.3        0.51 ± 11%  mpstat.cpu.all.irq%
>       0.10 ±  3%      +2.0        2.11 ± 13%  mpstat.cpu.all.soft%
>      92.01 ±  3%     -37.7       54.27 ± 18%  mpstat.cpu.all.sys%
>       1.06 ±  3%      +0.3        1.37 ±  8%  mpstat.cpu.all.usr%
>      27.83 ± 38%     -84.4%       4.33 ± 31%  mpstat.max_utilization.seconds
>
> As in system time went down and idle went up.
>
> Your patch must have a side effect where it messes with some of the
> timings between workers.
It does, as it removes the global lock and the potential contention on
it.
> The testcase is doing a lot of weird stuff, including calling yield()
> for every loop iteration. On top of that if the other worker does not
> win the race there is also a sleep of 0.1s thrown in. I commented these
> suckers out and weird anomalies persisted.
>
> All that said, I'm not going to further look into it. Was curious wtf
> though hence the write up.
Thank you for taking the time to look into this. The analysis of this
"benchmark" is a fun read, and it matches my impression from looking
into the source of this thing: it does weird stuff which does not make
any sense at all.
Thanks,
tglx
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-26 21:11 ` Mateusz Guzik
2025-03-26 21:43 ` Thomas Gleixner
@ 2025-03-27 6:21 ` Eric Dumazet
2025-03-27 8:10 ` Thomas Gleixner
1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-03-27 6:21 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Thomas Gleixner, kernel test robot, oe-lkp, lkp, linux-kernel,
x86, Benjamin Segall, Frederic Weisbecker
On Wed, Mar 26, 2025 at 10:11 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
> > On Mon, Mar 24 2025 at 14:39, kernel test robot wrote:
> > > kernel test robot noticed a 36.2% regression of stress-ng.epoll.ops_per_sec on:
> > >
> > > commit: 1535cb80286e6fbc834f075039f85274538543c7 ("posix-timers: Improve hash table performance")
> > > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git timers/core
> > >
> [snip]
> > > | testcase: change | stress-ng: stress-ng.epoll.ops_per_sec 124.9% improvement |
> >
> > How on earth can this commit result in both a 36% regression and a 25%
> > improvement with the same test?
> >
> > Unfortunately I can't reproduce any of it. I checked the epoll test
> > source and it uses a posix timer, but that commit makes the hash less
> > contended so there is zero explanation.
> >
>
> The short summary is:
> 1. your change is fine
Let me rephrase this.
Absolutely wonderful series, thanks a lot Thomas for doing it.
Next bottlenecks are now these ones, but showing up in synthetic
benchmarks only.
    33.36%  timer_storm  [kernel.kallsyms]  [k] inc_rlimit_get_ucounts
            |
            --33.34%--inc_rlimit_get_ucounts
                      posixtimer_init_sigqueue
                      do_timer_create
                      __x64_sys_timer_create
                      do_syscall_64
                      entry_SYSCALL_64_after_hwframe
                      ___timer_create
                      0xe

    32.85%  timer_storm  [kernel.kallsyms]  [k] dec_rlimit_put_ucounts
            |
            --32.83%--dec_rlimit_put_ucounts
                      posix_timer_unhash_and_free
                      __se_sys_timer_delete
                      do_syscall_64
                      entry_SYSCALL_64_after_hwframe
                      ___timer_delete

     9.61%  timer_storm  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
            |
            ---queued_spin_lock_slowpath
               |
               |--8.92%--_raw_spin_lock_irqsave
               |          |
               |           --8.91%--get_partial_node
               |                     ___slab_alloc
               |                     kmem_cache_alloc_noprof
               |                     do_timer_create
               |                     __x64_sys_timer_create
               |                     do_syscall_64
               |                     entry_SYSCALL_64_after_hwframe
               |                     ___timer_create
               |                     0xe
               |
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 6:21 ` Eric Dumazet
@ 2025-03-27 8:10 ` Thomas Gleixner
2025-03-27 8:26 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-03-27 8:10 UTC (permalink / raw)
To: Eric Dumazet, Mateusz Guzik
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27 2025 at 07:21, Eric Dumazet wrote:
> On Wed, Mar 26, 2025 at 10:11 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>> On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
>> > Unfortunately I can't reproduce any of it. I checked the epoll test
>> > source and it uses a posix timer, but that commit makes the hash less
>> > contended so there is zero explanation.
>> >
>>
>> The short summary is:
>> 1. your change is fine
>
> Let me rephrase this.
>
> Absolutely wonderful series, thanks a lot Thomas for doing it.
Thank you!
> Next bottlenecks are now these ones, but showing up in synthetic
> benchmarks only.
Right. I saw them too when working on this.
> 33.36% timer_storm [kernel.kallsyms] [k]
> inc_rlimit_get_ucounts
>
> 32.85% timer_storm [kernel.kallsyms] [k]
> dec_rlimit_put_ucounts
These two are not really posix-timer specific. They are also the
standouts for any signal micro benchmark.
I stared at the implementation a bit, but there is not much we can do
about that I fear.
Thanks,
tglx
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 8:10 ` Thomas Gleixner
@ 2025-03-27 8:26 ` Eric Dumazet
2025-03-27 9:11 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-03-27 8:26 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mateusz Guzik, kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 9:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Mar 27 2025 at 07:21, Eric Dumazet wrote:
> > On Wed, Mar 26, 2025 at 10:11 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >> On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
> >> > Unfortunately I can't reproduce any of it. I checked the epoll test
> >> > source and it uses a posix timer, but that commit makes the hash less
> >> > contended so there is zero explanation.
> >> >
> >>
> >> The short summary is:
> >> 1. your change is fine
> >
> > Let me rephrase this.
> >
> > Absolutely wonderful series, thanks a lot Thomas for doing it.
>
> Thank you!
>
> > Next bottlenecks are now these ones, but showing up in synthetic
> > benchmarks only.
>
> Right. I saw them too when working on this.
>
> > 33.36% timer_storm [kernel.kallsyms] [k]
> > inc_rlimit_get_ucounts
> >
> > 32.85% timer_storm [kernel.kallsyms] [k]
> > dec_rlimit_put_ucounts
>
> These two are not really posix-timer specific. They are also the
> standouts for any signal micro benchmark.
>
> I stared at the implementation a bit, but there is not much we can do
> about that I fear.
We could place all these atomic fields in separate cache lines,
to keep read-only fields shared as much as possible.
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 7183e5aca282..6ddf667022d9 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -118,7 +118,10 @@ struct ucounts {
 	struct hlist_node node;
 	struct user_namespace *ns;
 	kuid_t uid;
-	atomic_t count;
+	atomic_t count ____cacheline_aligned_in_smp;
+	/* Note : should probably put all the following atomic_long_t
+	 * in separate cache lines (one atomic_long_t per cache line).
+	 */
 	atomic_long_t ucount[UCOUNT_COUNTS];
 	atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
 };
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 8:26 ` Eric Dumazet
@ 2025-03-27 9:11 ` Eric Dumazet
2025-03-27 10:50 ` Thomas Gleixner
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-03-27 9:11 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mateusz Guzik, kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 9:26 AM Eric Dumazet <edumazet@google.com> wrote:
> We could place all these atomic fields in separate cache lines,
> to keep read-only fields shared as much as possible.
>
The following one-liner seems good enough to separate the 4 atomics used
to control/limit UCOUNT_RLIMIT_NPROC, UCOUNT_RLIMIT_MSGQUEUE,
UCOUNT_RLIMIT_SIGPENDING and UCOUNT_RLIMIT_MEMLOCK:
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 7183e5aca282..6cc3fbec3632 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -120,7 +120,7 @@ struct ucounts {
 	kuid_t uid;
 	atomic_t count;
 	atomic_long_t ucount[UCOUNT_COUNTS];
-	atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
+	atomic_long_t ____cacheline_aligned_in_smp rlimit[UCOUNT_RLIMIT_COUNTS];
 };

 extern struct user_namespace init_user_ns;
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 9:11 ` Eric Dumazet
@ 2025-03-27 10:50 ` Thomas Gleixner
2025-03-27 11:37 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-03-27 10:50 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mateusz Guzik, kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27 2025 at 10:11, Eric Dumazet wrote:
> On Thu, Mar 27, 2025 at 9:26 AM Eric Dumazet <edumazet@google.com> wrote:
>
>> We could place all these atomic fields in separate cache lines,
>> to keep read-only fields shared as much as possible.
>>
>
> Following one-liner seems good enough to separate the 4 atomics used
> to control/limit
>
> UCOUNT_RLIMIT_NPROC, UCOUNT_RLIMIT_MSGQUEUE, UCOUNT_RLIMIT_SIGPENDING,
> UCOUNT_RLIMIT_MEMLOCK,
>
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 7183e5aca282..6cc3fbec3632 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -120,7 +120,7 @@ struct ucounts {
> kuid_t uid;
> atomic_t count;
> atomic_long_t ucount[UCOUNT_COUNTS];
> - atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
> + atomic_long_t ____cacheline_aligned_in_smp rlimit[UCOUNT_RLIMIT_COUNTS];
> };
Cute. How much bloat does it cause?
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 10:50 ` Thomas Gleixner
@ 2025-03-27 11:37 ` Eric Dumazet
2025-03-27 13:14 ` Thomas Gleixner
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-03-27 11:37 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mateusz Guzik, kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Mar 27 2025 at 10:11, Eric Dumazet wrote:
> > On Thu, Mar 27, 2025 at 9:26 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> >> We could place all these atomic fields in separate cache lines,
> >> to keep read-only fields shared as much as possible.
> >>
> >
> > Following one-liner seems good enough to separate the 4 atomics used
> > to control/limit
> >
> > UCOUNT_RLIMIT_NPROC, UCOUNT_RLIMIT_MSGQUEUE, UCOUNT_RLIMIT_SIGPENDING,
> > UCOUNT_RLIMIT_MEMLOCK,
> >
> >
> > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> > index 7183e5aca282..6cc3fbec3632 100644
> > --- a/include/linux/user_namespace.h
> > +++ b/include/linux/user_namespace.h
> > @@ -120,7 +120,7 @@ struct ucounts {
> > kuid_t uid;
> > atomic_t count;
> > atomic_long_t ucount[UCOUNT_COUNTS];
> > - atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
> > + atomic_long_t ____cacheline_aligned_in_smp rlimit[UCOUNT_RLIMIT_COUNTS];
> > };
>
> Cute. How much bloat does it cause?
This would expand 'struct ucounts' by 192 bytes on x86, if the patch
was actually working :)
Not sure if it is feasible without something more intrusive like
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 7183e5aca282..3513df720430 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -120,7 +120,10 @@ struct ucounts {
 	kuid_t uid;
 	atomic_t count;
 	atomic_long_t ucount[UCOUNT_COUNTS];
-	atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
+
+	struct {
+		atomic_long_t ____cacheline_aligned_in_smp val;
+	} rlimit[UCOUNT_RLIMIT_COUNTS];
 };

 extern struct user_namespace init_user_ns;
@@ -136,7 +139,7 @@ void put_ucounts(struct ucounts *ucounts);

 static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type type)
 {
-	return atomic_long_read(&ucounts->rlimit[type]);
+	return atomic_long_read(&ucounts->rlimit[type].val);
 }

 long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 86c5f1c0bad9..0cd5498890f8 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -266,7 +266,7 @@ long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
 	long ret = 0;

 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
-		long new = atomic_long_add_return(v, &iter->rlimit[type]);
+		long new = atomic_long_add_return(v, &iter->rlimit[type].val);
 		if (new < 0 || new > max)
 			ret = LONG_MAX;
 		else if (iter == ucounts)
@@ -281,7 +281,7 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
 	struct ucounts *iter;
 	long new = -1; /* Silence compiler warning */
 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
-		long dec = atomic_long_sub_return(v, &iter->rlimit[type]);
+		long dec = atomic_long_sub_return(v, &iter->rlimit[type].val);
 		WARN_ON_ONCE(dec < 0);
 		if (iter == ucounts)
 			new = dec;
@@ -294,7 +294,7 @@ static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
 {
 	struct ucounts *iter, *next;
 	for (iter = ucounts; iter != last; iter = next) {
-		long dec = atomic_long_sub_return(1, &iter->rlimit[type]);
+		long dec = atomic_long_sub_return(1, &iter->rlimit[type].val);
 		WARN_ON_ONCE(dec < 0);
 		next = iter->ns->ucounts;
 		if (dec == 0)
@@ -316,7 +316,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
 	long dec, ret = 0;

 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
-		long new = atomic_long_add_return(1, &iter->rlimit[type]);
+		long new = atomic_long_add_return(1, &iter->rlimit[type].val);
 		if (new < 0 || new > max)
 			goto dec_unwind;
 		if (iter == ucounts)
@@ -334,7 +334,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
 	}
 	return ret;
 dec_unwind:
-	dec = atomic_long_sub_return(1, &iter->rlimit[type]);
+	dec = atomic_long_sub_return(1, &iter->rlimit[type].val);
 	WARN_ON_ONCE(dec < 0);
 	do_dec_rlimit_put_ucounts(ucounts, iter, type);
 	return 0;
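
For readers who do not have kernel/ucount.c at hand, the loops touched above all follow the same pattern: walk up the chain of ucounts, bump one atomic per level, and roll everything back if a limit is hit. The following is only a rough userspace sketch of that pattern with C11 atomics. The names struct level, charge_chain() and unwind() are made up for illustration, and the per-level max field is a simplification of how the kernel looks up the limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the ucounts hierarchy. */
struct level {
	atomic_long val;       /* plays the role of rlimit[type].val */
	long max;              /* simplification: limit stored per level */
	struct level *parent;  /* plays the role of iter->ns->ucounts */
};

/* Undo one increment on every level from leaf up to, but excluding, last. */
static void unwind(struct level *leaf, struct level *last)
{
	for (struct level *iter = leaf; iter != last; iter = iter->parent)
		atomic_fetch_sub(&iter->val, 1);
}

/*
 * Charge one unit on every level of the chain. Returns the new value of
 * the leaf counter on success, or 0 if any level would exceed its limit,
 * in which case all increments done so far are rolled back.
 */
static long charge_chain(struct level *leaf)
{
	long ret = 0;

	for (struct level *iter = leaf; iter; iter = iter->parent) {
		long new = atomic_fetch_add(&iter->val, 1) + 1;

		if (new < 0 || new > iter->max) {
			atomic_fetch_sub(&iter->val, 1);
			unwind(leaf, iter);
			return 0;
		}
		if (iter == leaf)
			ret = new;
	}
	return ret;
}
```

Every caller that charges the same chain modifies the same atomics, which is why these functions stand out in the profile no matter how the surrounding fields are laid out.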
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 11:37 ` Eric Dumazet
@ 2025-03-27 13:14 ` Thomas Gleixner
2025-03-27 13:17 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-03-27 13:14 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mateusz Guzik, kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Cute. How much bloat does it cause?
>
> This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> was actually working :)
>
> Note sure if it is feasible without something more intrusive like
I'm not sure about the actual benefit. The problem is that parallel
invocations which access the same ucount still will run into contention
of the cache line they are modifying.
For the signal case, all invocations increment rlimit[SIGPENDING], so
putting that into a different cache line does not buy a lot.
False sharing is when you have a lot of hot path readers on some other
member of the data structure, which happens to share the cache line with
the modified member. But that's not really the case here.
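
The distinction can be illustrated with a small layout sketch. This is not code from the kernel: struct packed, struct split and the 64-byte CACHE_LINE are assumptions chosen for illustration, and the check presumes the object itself starts on a cache line boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: 64-byte lines, as on x86 */

/* Read-mostly field and hot counter packed together. */
struct packed {
	long limit;        /* read on the hot path, rarely written */
	atomic_long hot;   /* incremented by every CPU */
};

/* Same fields, with the hot counter pushed onto its own cache line. */
struct split {
	long limit;
	alignas(CACHE_LINE) atomic_long hot;
};

/* True if two byte offsets within a line-aligned object share a line. */
static bool same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}

/* Readers of packed.limit are disturbed by writers of packed.hot. */
static bool packed_false_shares(void)
{
	return same_line(offsetof(struct packed, limit),
			 offsetof(struct packed, hot));
}

/* Readers of split.limit are not, at the cost of a larger struct. */
static bool split_false_shares(void)
{
	return same_line(offsetof(struct split, limit),
			 offsetof(struct split, hot));
}
```

Splitting only helps the readers of limit. Concurrent writers of hot still bounce its line between CPUs in both layouts, which is the contention described above.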
Thanks,
tglx
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 13:14 ` Thomas Gleixner
@ 2025-03-27 13:17 ` Eric Dumazet
2025-03-27 13:43 ` Mateusz Guzik
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-03-27 13:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Mateusz Guzik, kernel test robot, oe-lkp, lkp, linux-kernel, x86,
Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 2:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> > On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Cute. How much bloat does it cause?
> >
> > This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> > was actually working :)
> >
> > Note sure if it is feasible without something more intrusive like
>
> I'm not sure about the actual benefit. The problem is that parallel
> invocations which access the same ucount still will run into contention
> of the cache line they are modifying.
>
> For the signal case, all invocations increment rlimit[SIGPENDING], so
> putting that into a different cache line does not buy a lot.
>
> False sharing is when you have a lot of hot path readers on some other
> member of the data structure, which happens to share the cache line with
> the modified member. But that's not really the case here.
We have applications stressing all the counters at the same time (from
different threads).
You seem to focus on posix timers only :)
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 13:17 ` Eric Dumazet
@ 2025-03-27 13:43 ` Mateusz Guzik
2025-03-27 13:44 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Mateusz Guzik @ 2025-03-27 13:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: Thomas Gleixner, kernel test robot, oe-lkp, lkp, linux-kernel,
x86, Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 2:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Mar 27, 2025 at 2:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> > > On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >> Cute. How much bloat does it cause?
> > >
> > > This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> > > was actually working :)
> > >
> > > Note sure if it is feasible without something more intrusive like
> >
> > I'm not sure about the actual benefit. The problem is that parallel
> > invocations which access the same ucount still will run into contention
> > of the cache line they are modifying.
> >
> > For the signal case, all invocations increment rlimit[SIGPENDING], so
> > putting that into a different cache line does not buy a lot.
> >
> > False sharing is when you have a lot of hot path readers on some other
> > member of the data structure, which happens to share the cache line with
> > the modified member. But that's not really the case here.
>
> We applications stressing all the counters at the same time (from
> different threads)
>
> You seem to focus on posix timers only :)
Well in that case:
(gdb) ptype /o struct ucounts
/* offset | size */ type = struct ucounts {
/* 0 | 16 */ struct hlist_node {
/* 0 | 8 */ struct hlist_node *next;
/* 8 | 8 */ struct hlist_node **pprev;
/* total size (bytes): 16 */
} node;
/* 16 | 8 */ struct user_namespace *ns;
/* 24 | 4 */ kuid_t uid;
/* 28 | 4 */ atomic_t count;
/* 32 | 96 */ atomic_long_t ucount[12];
/* 128 | 256 */ struct {
/* 0 | 8 */ atomic_long_t val;
} rlimit[4];
/* total size (bytes): 384 */
}
This comes from malloc. Given 384 bytes of size it is going to be
backed by a 512-byte sized buffer -- that's a clear cut waste of 128
bytes.
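
The rounding can be sketched as follows. The size class table is an assumption about a typical x86-64 configuration, not something taken from this thread, and bucket_for() and slack_for() are made-up helper names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed general-purpose kmalloc size classes for a typical x86-64
 * build; the real set depends on the kernel configuration.
 */
static const size_t size_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192,
};

/* Smallest assumed size class that fits `size`, or 0 if none does. */
static size_t bucket_for(size_t size)
{
	size_t n = sizeof(size_classes) / sizeof(size_classes[0]);

	for (size_t i = 0; i < n; i++)
		if (size_classes[i] >= size)
			return size_classes[i];
	return 0;
}

/* Bytes left unused when `size` is served from its bucket. */
static size_t slack_for(size_t size)
{
	size_t bucket = bucket_for(size);

	return bucket ? bucket - size : 0;
}
```

With these assumed classes, a 384-byte request lands in the 512-byte bucket and leaves 128 bytes unused, which matches the numbers above.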
It is plausible creating a 384-byte sized slab for kmalloc would help
save memory overall (not just for this specific struct), but that
would require extensive testing in real workloads. I think Google is
in position to do it on their fleet and android? fwiw Solaris and
FreeBSD do have slabs of this size and it does save memory over there.
I understand it is a tradeoff, hence I'm not claiming this needs to be
added. I do claim it does warrant evaluation, but I won't blame anyone
for not wanting to dig into it.
The other option is to lean into it. In this case I point out the
refcount shares the cacheline with some of the limits and that it
could be moved to a dedicated line while still keeping the struct <
512 bytes, thus not spending more memory on allocation. The refcount
changes less frequently than limits themselves so it's not a big deal,
but it can be adjusted "for free" if you will.
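
One way this could look, purely as a hypothetical layout and not a proposed patch: the sketch below mirrors the field sizes from the gdb dump with userspace stand-in types and places the reference count on a trailing cache line of its own. It assumes an LP64 ABI and 64-byte cache lines; all type names ending in _sketch, and struct padded_long, are invented here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: 64-byte lines, LP64 ABI */

/* Userspace stand-ins for the kernel types; sizes match x86-64. */
struct hlist_node_sketch {
	void *next;
	void **pprev;
};

/* One counter per cache line, like the padded rlimit[] entries. */
struct padded_long {
	alignas(CACHE_LINE) long val;
};

/* Hypothetical layout: reference count moved to its own trailing line. */
struct ucounts_sketch {
	struct hlist_node_sketch node;
	void *ns;
	unsigned int uid;
	long ucount[12];               /* 12 entries, as in the gdb dump */
	struct padded_long rlimit[4];  /* one cache line per limit */
	alignas(CACHE_LINE) int count; /* nothing else lands on this line */
};

/* The struct must still fit the 512-byte allocation discussed above. */
static_assert(sizeof(struct ucounts_sketch) <= 512,
	      "sketch no longer fits in 512 bytes");
```

Under those assumptions the struct grows from 384 to 448 bytes, so it is still served from the same 512-byte buffer.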
While here, I would probably change the name of the field. A reference
counter named "count" in a struct named "ucounts", followed by an
"ucount" array is rather unpleasing. How about s/count/refcount?
--
Mateusz Guzik <mjguzik gmail.com>
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 13:43 ` Mateusz Guzik
@ 2025-03-27 13:44 ` Eric Dumazet
2025-03-27 13:48 ` Mateusz Guzik
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2025-03-27 13:44 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Thomas Gleixner, kernel test robot, oe-lkp, lkp, linux-kernel,
x86, Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 2:43 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Thu, Mar 27, 2025 at 2:17 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Mar 27, 2025 at 2:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> > > > On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >> Cute. How much bloat does it cause?
> > > >
> > > > This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> > > > was actually working :)
> > > >
> > > > Note sure if it is feasible without something more intrusive like
> > >
> > > I'm not sure about the actual benefit. The problem is that parallel
> > > invocations which access the same ucount still will run into contention
> > > of the cache line they are modifying.
> > >
> > > For the signal case, all invocations increment rlimit[SIGPENDING], so
> > > putting that into a different cache line does not buy a lot.
> > >
> > > False sharing is when you have a lot of hot path readers on some other
> > > member of the data structure, which happens to share the cache line with
> > > the modified member. But that's not really the case here.
> >
> > We applications stressing all the counters at the same time (from
> > different threads)
> >
> > You seem to focus on posix timers only :)
>
> Well in that case:
> (gdb) ptype /o struct ucounts
> /* offset | size */ type = struct ucounts {
> /* 0 | 16 */ struct hlist_node {
> /* 0 | 8 */ struct hlist_node *next;
> /* 8 | 8 */ struct hlist_node **pprev;
>
> /* total size (bytes): 16 */
> } node;
> /* 16 | 8 */ struct user_namespace *ns;
> /* 24 | 4 */ kuid_t uid;
> /* 28 | 4 */ atomic_t count;
> /* 32 | 96 */ atomic_long_t ucount[12];
> /* 128 | 256 */ struct {
> /* 0 | 8 */ atomic_long_t val;
> } rlimit[4];
>
> /* total size (bytes): 384 */
> }
>
> This comes from malloc. Given 384 bytes of size it is going to be
> backed by a 512-byte sized buffer -- that's a clear cut waste of 128
> bytes.
>
> It is plausible creating a 384-byte sized slab for kmalloc would help
> save memory overall (not just for this specific struct), but that
> would require extensive testing in real workloads. I think Google is
> in position to do it on their fleet and android? fwiw Solaris and
> FreeBSD do have slabs of this size and it does save memory over there.
> I understand it is a tradeoff, hence I'm not claiming this needs to be
> added. I do claim it does warrant evaluation, but I wont blame anyone
> for not wanting to do dig into it.
>
> The other option is to lean into it. In this case I point out the
> refcount shares the cacheline with some of the limits and that it
> could be moved to a dedicated line while still keeping the struct <
> 512 bytes, thus not spending more memory on allocation. the refcount
> changes less frequently than limits themselves so it's not a big deal,
> but it can be adjusted "for free" if you will.
>
> while here I would probably change the name of the field. A reference
> counter named "count" in a struct named "ucounts", followed by an
> "ucount" array is rather unpleasing. How about s/count/refcount?
How many 'struct ucounts' are in use in a typical host ?
Compared to other costs, this seems pure noise to me.
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 13:44 ` Eric Dumazet
@ 2025-03-27 13:48 ` Mateusz Guzik
2025-03-27 20:45 ` David Laight
0 siblings, 1 reply; 16+ messages in thread
From: Mateusz Guzik @ 2025-03-27 13:48 UTC (permalink / raw)
To: Eric Dumazet
Cc: Thomas Gleixner, kernel test robot, oe-lkp, lkp, linux-kernel,
x86, Benjamin Segall, Frederic Weisbecker
On Thu, Mar 27, 2025 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Mar 27, 2025 at 2:43 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Thu, Mar 27, 2025 at 2:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Mar 27, 2025 at 2:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >
> > > > On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> > > > > On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >> Cute. How much bloat does it cause?
> > > > >
> > > > > This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> > > > > was actually working :)
> > > > >
> > > > > Note sure if it is feasible without something more intrusive like
> > > >
> > > > I'm not sure about the actual benefit. The problem is that parallel
> > > > invocations which access the same ucount still will run into contention
> > > > of the cache line they are modifying.
> > > >
> > > > For the signal case, all invocations increment rlimit[SIGPENDING], so
> > > > putting that into a different cache line does not buy a lot.
> > > >
> > > > False sharing is when you have a lot of hot path readers on some other
> > > > member of the data structure, which happens to share the cache line with
> > > > the modified member. But that's not really the case here.
> > >
> > > We applications stressing all the counters at the same time (from
> > > different threads)
> > >
> > > You seem to focus on posix timers only :)
> >
> > Well in that case:
> > (gdb) ptype /o struct ucounts
> > /* offset | size */ type = struct ucounts {
> > /* 0 | 16 */ struct hlist_node {
> > /* 0 | 8 */ struct hlist_node *next;
> > /* 8 | 8 */ struct hlist_node **pprev;
> >
> > /* total size (bytes): 16 */
> > } node;
> > /* 16 | 8 */ struct user_namespace *ns;
> > /* 24 | 4 */ kuid_t uid;
> > /* 28 | 4 */ atomic_t count;
> > /* 32 | 96 */ atomic_long_t ucount[12];
> > /* 128 | 256 */ struct {
> > /* 0 | 8 */ atomic_long_t val;
> > } rlimit[4];
> >
> > /* total size (bytes): 384 */
> > }
> >
> > This comes from malloc. Given 384 bytes of size it is going to be
> > backed by a 512-byte sized buffer -- that's a clear cut waste of 128
> > bytes.
> >
> > It is plausible creating a 384-byte sized slab for kmalloc would help
> > save memory overall (not just for this specific struct), but that
> > would require extensive testing in real workloads. I think Google is
> > in position to do it on their fleet and android? fwiw Solaris and
> > FreeBSD do have slabs of this size and it does save memory over there.
> > I understand it is a tradeoff, hence I'm not claiming this needs to be
> > added. I do claim it does warrant evaluation, but I wont blame anyone
> > for not wanting to do dig into it.
> >
> > The other option is to lean into it. In this case I point out the
> > refcount shares the cacheline with some of the limits and that it
> > could be moved to a dedicated line while still keeping the struct <
> > 512 bytes, thus not spending more memory on allocation. the refcount
> > changes less frequently than limits themselves so it's not a big deal,
> > but it can be adjusted "for free" if you will.
> >
> > while here I would probably change the name of the field. A reference
> > counter named "count" in a struct named "ucounts", followed by an
> > "ucount" array is rather unpleasing. How about s/count/refcount?
>
>
> How many 'struct ucounts' are in use in a typical host ?
>
> Compared to other costs, this seems pure noise to me.
I did not claim this is going to increase memory usage in a significant manner.
I claim regardless of this change a 384-byte slab for kmalloc may be
saving memory and this bit may be enough of an excuse to evaluate it,
should someone be interested.
Apart from that I claim that if a 512-byte buffer is going to be used to
back the 384 bytes used by the struct, the patch can trivially move
the refcount to a dedicated cacheline to avoid some of the bouncing
and still fit in the 512-byte allocation. I see no reason to not do
it.
--
Mateusz Guzik <mjguzik gmail.com>
* Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
2025-03-27 13:48 ` Mateusz Guzik
@ 2025-03-27 20:45 ` David Laight
0 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2025-03-27 20:45 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Eric Dumazet, Thomas Gleixner, kernel test robot, oe-lkp, lkp,
linux-kernel, x86, Benjamin Segall, Frederic Weisbecker
On Thu, 27 Mar 2025 14:48:37 +0100
Mateusz Guzik <mjguzik@gmail.com> wrote:
> On Thu, Mar 27, 2025 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Mar 27, 2025 at 2:43 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >
> > > On Thu, Mar 27, 2025 at 2:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Mar 27, 2025 at 2:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> > > > > > On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > > >> Cute. How much bloat does it cause?
> > > > > >
> > > > > > This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> > > > > > was actually working :)
> > > > > >
> > > > > > Note sure if it is feasible without something more intrusive like
> > > > >
> > > > > I'm not sure about the actual benefit. The problem is that parallel
> > > > > invocations which access the same ucount still will run into contention
> > > > > of the cache line they are modifying.
> > > > >
> > > > > For the signal case, all invocations increment rlimit[SIGPENDING], so
> > > > > putting that into a different cache line does not buy a lot.
> > > > >
> > > > > False sharing is when you have a lot of hot path readers on some other
> > > > > member of the data structure, which happens to share the cache line with
> > > > > the modified member. But that's not really the case here.
> > > >
> > > > We applications stressing all the counters at the same time (from
> > > > different threads)
> > > >
> > > > You seem to focus on posix timers only :)
> > >
> > > Well in that case:
> > > (gdb) ptype /o struct ucounts
> > > /* offset | size */ type = struct ucounts {
> > > /* 0 | 16 */ struct hlist_node {
> > > /* 0 | 8 */ struct hlist_node *next;
> > > /* 8 | 8 */ struct hlist_node **pprev;
> > >
> > > /* total size (bytes): 16 */
> > > } node;
> > > /* 16 | 8 */ struct user_namespace *ns;
> > > /* 24 | 4 */ kuid_t uid;
> > > /* 28 | 4 */ atomic_t count;
> > > /* 32 | 96 */ atomic_long_t ucount[12];
> > > /* 128 | 256 */ struct {
> > > /* 0 | 8 */ atomic_long_t val;
> > > } rlimit[4];
> > >
> > > /* total size (bytes): 384 */
> > > }
> > >
> > > This comes from malloc. Given 384 bytes of size it is going to be
> > > backed by a 512-byte sized buffer -- that's a clear cut waste of 128
> > > bytes.
> > >
> > > It is plausible creating a 384-byte sized slab for kmalloc would help
> > > save memory overall (not just for this specific struct), but that
> > > would require extensive testing in real workloads. I think Google is
> > > in position to do it on their fleet and android? fwiw Solaris and
> > > FreeBSD do have slabs of this size and it does save memory over there.
> > > I understand it is a tradeoff, hence I'm not claiming this needs to be
> > > added. I do claim it does warrant evaluation, but I wont blame anyone
> > > for not wanting to do dig into it.
> > >
> > > The other option is to lean into it. In this case I point out the
> > > refcount shares the cacheline with some of the limits and that it
> > > could be moved to a dedicated line while still keeping the struct <
> > > 512 bytes, thus not spending more memory on allocation. the refcount
> > > changes less frequently than limits themselves so it's not a big deal,
> > > but it can be adjusted "for free" if you will.
> > >
> > > while here I would probably change the name of the field. A reference
> > > counter named "count" in a struct named "ucounts", followed by an
> > > "ucount" array is rather unpleasing. How about s/count/refcount?
> >
> >
> > How many 'struct ucounts' are in use in a typical host ?
> >
> > Compared to other costs, this seems pure noise to me.
>
> I did not claim this is going to increase memory usage in a significant manner.
>
> I claim regardless of this change a 384-byte slab for kmalloc may be
> saving memory and this bit may be enough of an excuse to evaluate it,
> should someone be interested.
>
> Apart from that I claim that if the 512-byte is going to be used to
> back the 384 bytes used by the struct, the patch can trivially move
> the refcount to a dedicated cacheline to avoid some of the bouncing
> and still fit in the 512-byte allocation. I see no reason to not do
> it.
>
What about systems with much larger cache lines?
David
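
To put rough numbers on that question, the padded size can be computed as a function of the cache line size. This is a back-of-the-envelope sketch using the offsets from the gdb dump quoted above (128 bytes of fields ahead of rlimit[], then 4 limits padded to one line each). The helper names are made up, and the 128 and 256 byte line sizes in the usage note are just example inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Size of the padded struct for a given cache line size: the fields
 * ahead of rlimit[] are rounded up to a line boundary, then each of
 * the 4 limits occupies a full line.
 */
static size_t padded_ucounts_size(size_t line)
{
	const size_t head = 128;    /* node, ns, uid, count, ucount[12] */
	const size_t nr_limits = 4;

	return round_up(head, line) + nr_limits * line;
}
```

For 64-byte lines this reproduces the 384 bytes from the dump. With 128-byte lines it gives 640 bytes and with 256-byte lines 1280 bytes, so the padding cost grows with the line size.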
Thread overview: 16+ messages
2025-03-24 6:39 [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression kernel test robot
2025-03-26 8:07 ` Thomas Gleixner
2025-03-26 21:11 ` Mateusz Guzik
2025-03-26 21:43 ` Thomas Gleixner
2025-03-27 6:21 ` Eric Dumazet
2025-03-27 8:10 ` Thomas Gleixner
2025-03-27 8:26 ` Eric Dumazet
2025-03-27 9:11 ` Eric Dumazet
2025-03-27 10:50 ` Thomas Gleixner
2025-03-27 11:37 ` Eric Dumazet
2025-03-27 13:14 ` Thomas Gleixner
2025-03-27 13:17 ` Eric Dumazet
2025-03-27 13:43 ` Mateusz Guzik
2025-03-27 13:44 ` Eric Dumazet
2025-03-27 13:48 ` Mateusz Guzik
2025-03-27 20:45 ` David Laight