[linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
From: kernel test robot @ 2026-05-12 12:56 UTC
To: Qi Zheng
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Shakeel Butt, Allen Pais, Axel Rasmussen, Baoquan He,
Chengming Zhou, Chen Ridong, David Hildenbrand, Hamza Mahfooz,
Harry Yoo, Hugh Dickins, Imran Khan, Johannes Weiner,
Kamalesh Babulal, Lance Yang, Liam Howlett, Lorenzo Stoakes,
Michal Hocko, Michal Koutný, Mike Rapoport, Muchun Song,
Muchun Song, Nhat Pham, Roman Gushchin, Suren Baghdasaryan,
Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie,
Zi Yan, Usama Arif, cgroups, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master 66edb901bf874d9e0787326ba12d3548b2da8700]
[still regression on linux-next/master b9303e6bff706758c167af686b5315ad00233bf8]
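The 67.7% headline is the relative change between the two columns of the stress-ng.switch.ops_per_sec row in the table below: (12355813 - 38288993) / 38288993 is about -67.7%. Not every row uses that formula. Judging from the rows themselves (an inference, not something taken from the LKP tooling), rows whose metric name ends in '%' and the perf-profile.* rows list the plain difference in percentage points instead: mpstat.cpu.all.sys% goes from 67.98 to 77.06 and is shown as +9.1, not +13.4%. A small C sketch of both conventions; pct_change() and change_column() are hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* Relative change in percent, used for most rows (e.g. -67.7%, +210.2%). */
static double pct_change(double base, double patched)
{
    assert(base != 0.0);
    return (patched - base) / base * 100.0;
}

/*
 * Value of the %change column for one row. Assumption inferred from this
 * report: metrics ending in '%' and perf-profile.* rows show the absolute
 * difference (percentage points); all other rows show the relative change.
 */
static double change_column(const char *metric, double base, double patched)
{
    size_t len = strlen(metric);
    int absolute = (len > 0 && metric[len - 1] == '%') ||
                   strncmp(metric, "perf-profile.", 13) == 0;

    return absolute ? patched - base : pct_change(base, patched);
}
```

For example, change_column("mpstat.cpu.all.sys%", 67.98, 77.06) returns about 9.08, while change_column("stress-ng.switch.ops_per_sec", 38288993, 12355813) returns about -67.73. Note that turbostat.CPU%c1 contains a '%' but does not end with one, and the table shows it as a relative change (+19.4%).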
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: switch
method: mq
cpufreq_governor: performance
The commit also has a significant impact on the following test:
+------------------+--------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 3.8% regression |
| test parameters | cpufreq_governor=performance |
| | nr_task=100% |
| | runtime=300s |
| | test=spawn |
+------------------+--------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202605121641.b6a60cb0-lkp@intel.com
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260512/202605121641.b6a60cb0-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/mq/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s
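The two lines above pack the test configuration into a slash-separated list of keys and a matching list of values, so the n-th value belongs to the n-th key (method is mq, tbox_group is lkp-spr-r02, testtime is 60s). A minimal C sketch that pairs them, assuming, as holds for this header, that no value itself contains a '/'; LKP_KEYS, LKP_VALS and lkp_field() are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Header pair copied from the report: slash-separated keys, then values. */
static const char LKP_KEYS[] =
    "compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/"
    "tbox_group/test/testcase/testtime:";
static const char LKP_VALS[] =
    "gcc-14/performance/x86_64-rhel-9.4/mq/100%/"
    "debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s";

#define MAX_FIELDS 32

/* Split s in place on '/', storing up to max field pointers in out. */
static int split_slash(char *s, char **out, int max)
{
    int n = 0;

    while (n < max) {
        out[n++] = s;
        char *slash = strchr(s, '/');
        if (!slash)
            break;
        *slash = '\0';
        s = slash + 1;
    }
    return n;
}

/*
 * Copy the value paired with key into out. Returns 0 on success, or -1 if
 * the key is absent or the key and value lists have different lengths.
 */
static int lkp_field(const char *keys_line, const char *vals_line,
                     const char *key, char *out, size_t outlen)
{
    char kbuf[512], vbuf[512];
    char *keys[MAX_FIELDS], *vals[MAX_FIELDS];

    assert(outlen > 0);
    snprintf(kbuf, sizeof kbuf, "%s", keys_line);
    snprintf(vbuf, sizeof vbuf, "%s", vals_line);

    /* The key line ends with ':'; drop it so the last key matches. */
    size_t len = strlen(kbuf);
    if (len > 0 && kbuf[len - 1] == ':')
        kbuf[len - 1] = '\0';

    int nk = split_slash(kbuf, keys, MAX_FIELDS);
    int nv = split_slash(vbuf, vals, MAX_FIELDS);
    if (nk != nv)
        return -1;

    for (int i = 0; i < nk; i++) {
        if (strcmp(keys[i], key) == 0) {
            snprintf(out, outlen, "%s", vals[i]);
            return 0;
        }
    }
    return -1;
}
```

For example, lkp_field(LKP_KEYS, LKP_VALS, "rootfs", buf, sizeof buf) copies debian-13-x86_64-20250902.cgz into buf. Pairing by position is only safe because both lists have exactly ten entries here; the function returns -1 rather than guessing when the counts differ.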
commit:
8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
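Between these two commits, the calltrace rows in the table below attribute about 8% of cycles each to __refill_obj_stock() on the kmalloc path (load_msg, sender side) and on the kfree path (free_msg, receiver side), with roughly 4.4% of each spent inside drain_obj_stock(); the parent shows 0.00 for all four. As a simplified reading of mm/memcontrol.c, the per-CPU object stock pre-charges bytes for one cached objcg at a time, and a refill for a different objcg drains the cached one first. The toy model below uses hypothetical names and is not kernel code: it only counts drains for a sequence of objcg identities refilled on one CPU. Repeated refills for the same objcg never drain, while alternating objcgs drain on every switch. It illustrates how a changing objcg shows up as drain_obj_stock cycles; it does not by itself establish how commit 01b9da291c causes this regression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an objcg identity; not the kernel's type. */
typedef int objcg_id;

/* Toy single-entry per-CPU stock: one cached objcg plus its bytes. */
struct toy_stock {
    objcg_id cached;        /* -1 means the stock is empty */
    unsigned long bytes;    /* pre-charged bytes held for `cached` */
    unsigned long drains;   /* how many times a cached objcg was flushed */
};

/* Flush the cached objcg; an empty stock is not counted as a drain. */
static void toy_drain(struct toy_stock *s)
{
    if (s->cached != -1) {
        s->bytes = 0;       /* the real drain hands bytes back to the objcg */
        s->drains++;
    }
    s->cached = -1;
}

/* Refill step: a different objcg evicts (drains) the cached one first. */
static void toy_refill(struct toy_stock *s, objcg_id objcg, unsigned long nr)
{
    if (s->cached != objcg) {
        toy_drain(s);
        s->cached = objcg;
    }
    s->bytes += nr;
}

/* Replay a sequence of 64-byte refills on one CPU; return the drain count. */
static unsigned long toy_replay(const objcg_id *seq, size_t n)
{
    struct toy_stock s = { .cached = -1, .bytes = 0, .drains = 0 };

    assert(seq != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        toy_refill(&s, seq[i], 64);
    return s.drains;
}
```

With this model, six refills for one objcg produce no drains, while six refills alternating between two objcgs produce five.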
8285917d6f383aef 01b9da291c4969354807b52956f
---------------- ---------------------------
%stddev %change %stddev
\ | \
5849 +210.2% 18145 ± 3% stress-ng.switch.nanosecs_per_context_switch_mq_method
2.296e+09 -67.7% 7.408e+08 ± 3% stress-ng.switch.ops
38288993 -67.7% 12355813 ± 3% stress-ng.switch.ops_per_sec
93416932 -68.6% 29310048 ± 3% stress-ng.time.involuntary_context_switches
15845 +11.0% 17584 stress-ng.time.percent_of_cpu_this_job_got
8556 +18.2% 10115 stress-ng.time.system_time
963.36 -53.5% 447.72 ± 3% stress-ng.time.user_time
1.518e+09 -69.7% 4.607e+08 ± 2% stress-ng.time.voluntary_context_switches
1124 ± 17% +34.3% 1509 ± 8% perf-c2c.HITM.remote
2.55e+09 -12.3% 2.236e+09 ± 3% cpuidle..time
8.29e+08 -71.8% 2.337e+08 ± 2% cpuidle..usage
4102960 ± 5% -19.0% 3324393 ± 4% numa-numastat.node1.local_node
4218983 ± 3% -18.7% 3430325 ± 3% numa-numastat.node1.numa_hit
14184409 ± 2% -16.4% 11860068 vmstat.memory.cache
39204964 -69.7% 11868752 ± 2% vmstat.system.cs
1808848 -38.5% 1111830 vmstat.system.in
22.48 -4.4 18.08 mpstat.cpu.all.idle%
1.13 -0.4 0.73 mpstat.cpu.all.irq%
0.10 -0.0 0.09 mpstat.cpu.all.soft%
67.98 +9.1 77.06 mpstat.cpu.all.sys%
8.32 -4.3 4.04 ± 2% mpstat.cpu.all.usr%
17.33 ± 2% +15.4% 20.00 ± 4% mpstat.max_utilization.seconds
10552401 ± 4% -23.3% 8092823 numa-meminfo.node1.Active
10552392 ± 4% -23.3% 8092820 numa-meminfo.node1.Active(anon)
12454155 ± 15% -34.9% 8106052 numa-meminfo.node1.FilePages
559046 ± 8% -19.2% 451929 ± 2% numa-meminfo.node1.Mapped
14688311 ± 13% -30.0% 10285394 ± 2% numa-meminfo.node1.MemUsed
10028979 ± 3% -22.4% 7783864 numa-meminfo.node1.Shmem
2638537 ± 4% -23.3% 2022639 numa-vmstat.node1.nr_active_anon
3113944 ± 15% -34.9% 2025946 numa-vmstat.node1.nr_file_pages
139848 ± 9% -19.3% 112912 ± 2% numa-vmstat.node1.nr_mapped
2507650 ± 3% -22.4% 1945399 numa-vmstat.node1.nr_shmem
2638531 ± 4% -23.3% 2022634 numa-vmstat.node1.nr_zone_active_anon
4219206 ± 3% -18.7% 3430093 ± 3% numa-vmstat.node1.numa_hit
4103183 ± 4% -19.0% 3324161 ± 4% numa-vmstat.node1.numa_local
10939677 ± 2% -21.0% 8641166 meminfo.Active
10939661 ± 2% -21.0% 8641149 meminfo.Active(anon)
13917673 ± 2% -16.4% 11633722 meminfo.Cached
14400924 ± 2% -16.0% 12102150 meminfo.Committed_AS
8394752 ± 5% +16.7% 9796949 ± 8% meminfo.DirectMap2M
617671 -12.0% 543559 meminfo.Mapped
18364992 -12.5% 16065468 meminfo.Memused
10124702 ± 2% -22.6% 7839682 meminfo.Shmem
18393665 -12.5% 16100473 meminfo.max_used_kB
10.59 -7.6 2.97 ± 8% turbostat.C1%
0.85 ± 3% +9.1 9.96 ± 2% turbostat.C1E%
1.29 ± 6% +19.4% 1.54 ± 2% turbostat.CPU%c1
48.67 ± 2% -15.1% 41.33 ± 3% turbostat.CoreTmp
0.56 -60.7% 0.22 ± 3% turbostat.IPC
1.153e+08 -38.7% 70680365 turbostat.IRQ
10242404 -14.8% 8723704 turbostat.NMI
88.65 -84.0 4.67 ± 33% turbostat.PKG_%
3.82 -3.8 0.04 ± 10% turbostat.POLL%
48.67 ± 2% -13.7% 42.00 ± 3% turbostat.PkgTmp
683.77 -13.1% 594.00 turbostat.PkgWatt
18.74 -3.3% 18.13 turbostat.RAMWatt
2735312 ± 2% -21.0% 2160742 proc-vmstat.nr_active_anon
204708 -1.6% 201435 proc-vmstat.nr_anon_pages
3479812 ± 2% -16.4% 2908863 proc-vmstat.nr_file_pages
154477 -12.0% 135959 proc-vmstat.nr_mapped
2531568 ± 2% -22.6% 1960353 proc-vmstat.nr_shmem
42010 -3.5% 40543 proc-vmstat.nr_slab_reclaimable
2735312 ± 2% -21.0% 2160742 proc-vmstat.nr_zone_active_anon
210167 ± 5% -11.5% 185950 ± 11% proc-vmstat.numa_hint_faults
4730338 ± 2% -18.2% 3871343 proc-vmstat.numa_hit
4498551 ± 2% -19.1% 3639783 proc-vmstat.numa_local
4808959 ± 2% -17.8% 3954157 proc-vmstat.pgalloc_normal
806619 -5.1% 765525 ± 2% proc-vmstat.pgfault
34098 ± 3% -14.8% 29054 proc-vmstat.pgreuse
0.11 +59.9% 0.17 ± 3% perf-stat.i.MPKI
6.653e+10 -61.7% 2.546e+10 ± 2% perf-stat.i.branch-instructions
0.76 +0.1 0.89 perf-stat.i.branch-miss-rate%
4.685e+08 -59.7% 1.888e+08 ± 2% perf-stat.i.branch-misses
1.12 +0.6 1.76 ± 3% perf-stat.i.cache-miss-rate%
35553724 ± 3% -40.4% 21188697 perf-stat.i.cache-misses
4.194e+09 -68.3% 1.331e+09 ± 2% perf-stat.i.cache-references
40710745 -69.6% 12395879 ± 2% perf-stat.i.context-switches
1.84 +189.1% 5.31 ± 2% perf-stat.i.cpi
5.965e+11 -2.0% 5.848e+11 perf-stat.i.cpu-cycles
8813175 -64.5% 3125097 ± 2% perf-stat.i.cpu-migrations
24447 ± 3% +68.5% 41184 ± 2% perf-stat.i.cycles-between-cache-misses
3.374e+11 -61.8% 1.287e+11 ± 2% perf-stat.i.instructions
0.57 -60.8% 0.22 ± 2% perf-stat.i.ipc
221.10 -68.6% 69.32 ± 2% perf-stat.i.metric.K/sec
11782 ± 3% -6.1% 11068 ± 3% perf-stat.i.minor-faults
11782 ± 3% -6.1% 11068 ± 3% perf-stat.i.page-faults
0.10 ± 2% +59.2% 0.17 ± 3% perf-stat.overall.MPKI
0.71 +0.0 0.75 perf-stat.overall.branch-miss-rate%
0.83 ± 3% +0.7 1.56 ± 3% perf-stat.overall.cache-miss-rate%
1.78 +162.2% 4.67 ± 2% perf-stat.overall.cpi
17181 ± 3% +64.6% 28283 perf-stat.overall.cycles-between-cache-misses
0.56 -61.8% 0.21 ± 2% perf-stat.overall.ipc
6.388e+10 -62.3% 2.409e+10 ± 2% perf-stat.ps.branch-instructions
4.538e+08 -60.0% 1.817e+08 ± 2% perf-stat.ps.branch-misses
33674051 ± 3% -40.1% 20155290 perf-stat.ps.cache-misses
4.077e+09 -68.2% 1.296e+09 ± 2% perf-stat.ps.cache-references
39570629 -69.5% 12072702 ± 2% perf-stat.ps.context-switches
5.78e+11 -1.4% 5.7e+11 perf-stat.ps.cpu-cycles
8584979 -64.5% 3051930 ± 2% perf-stat.ps.cpu-migrations
3.243e+11 -62.4% 1.22e+11 ± 2% perf-stat.ps.instructions
11022 ± 4% -6.5% 10300 ± 3% perf-stat.ps.minor-faults
11022 ± 4% -6.5% 10300 ± 3% perf-stat.ps.page-faults
1.941e+13 -61.9% 7.405e+12 ± 3% perf-stat.total.instructions
18451 +9.9% 20272 sched_debug.cfs_rq:/.avg_vruntime.avg
5869 ± 4% -7.4% 5437 ± 5% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.68 ± 2% -12.9% 0.59 ± 3% sched_debug.cfs_rq:/.h_nr_queued.stddev
0.62 ± 6% -12.9% 0.54 ± 2% sched_debug.cfs_rq:/.h_nr_runnable.stddev
8469 +12.7% 9544 ± 2% sched_debug.cfs_rq:/.left_deadline.stddev
8467 +12.7% 9542 ± 2% sched_debug.cfs_rq:/.left_vruntime.stddev
3513124 ± 25% -30.0% 2459550 ± 10% sched_debug.cfs_rq:/.load.max
588329 ± 5% -11.2% 522578 ± 5% sched_debug.cfs_rq:/.load.stddev
50699 ± 17% -19.8% 40655 ± 7% sched_debug.cfs_rq:/.load_avg.max
0.68 ± 2% -12.9% 0.59 ± 3% sched_debug.cfs_rq:/.nr_queued.stddev
38.80 ± 32% +108.5% 80.90 ± 16% sched_debug.cfs_rq:/.removed.load_avg.avg
857.83 ± 12% +61.0% 1381 ± 12% sched_debug.cfs_rq:/.removed.load_avg.max
152.02 ± 18% +57.2% 239.02 ± 11% sched_debug.cfs_rq:/.removed.load_avg.stddev
26.08 ± 28% +143.0% 63.37 ± 14% sched_debug.cfs_rq:/.removed.runnable_avg.avg
547.00 ± 13% +88.7% 1032 ± 12% sched_debug.cfs_rq:/.removed.runnable_avg.max
94.86 ± 17% +84.3% 174.82 ± 9% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
9.09 ± 52% +253.3% 32.11 ± 17% sched_debug.cfs_rq:/.removed.util_avg.avg
275.17 ± 3% +130.3% 633.67 ± 9% sched_debug.cfs_rq:/.removed.util_avg.max
44.90 ± 30% +126.4% 101.66 ± 11% sched_debug.cfs_rq:/.removed.util_avg.stddev
8467 +12.7% 9542 ± 2% sched_debug.cfs_rq:/.right_vruntime.stddev
659.63 ± 3% +13.0% 745.47 sched_debug.cfs_rq:/.runnable_avg.avg
271.34 ± 2% +31.2% 355.98 ± 3% sched_debug.cfs_rq:/.runnable_avg.stddev
0.00 ± 26% +110.4% 0.00 ± 45% sched_debug.cfs_rq:/.spread.avg
0.01 ± 13% +174.3% 0.02 ± 25% sched_debug.cfs_rq:/.spread.max
0.00 ± 7% +146.2% 0.00 ± 27% sched_debug.cfs_rq:/.spread.stddev
431.00 +14.5% 493.62 sched_debug.cfs_rq:/.util_avg.avg
1061 ± 3% +26.4% 1341 ± 3% sched_debug.cfs_rq:/.util_avg.max
151.53 ± 5% +50.1% 227.46 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
206.96 +17.5% 243.18 ± 3% sched_debug.cfs_rq:/.util_est.avg
18451 +9.9% 20272 sched_debug.cfs_rq:/.zero_vruntime.avg
5869 ± 4% -7.4% 5437 ± 5% sched_debug.cfs_rq:/.zero_vruntime.stddev
2345 +33.6% 3133 ± 5% sched_debug.cpu.avg_idle.min
13.18 ± 2% +39.8% 18.42 ± 6% sched_debug.cpu.clock.stddev
3961 +14.6% 4541 sched_debug.cpu.curr->pid.avg
3213 -15.4% 2718 sched_debug.cpu.curr->pid.stddev
0.00 ± 29% +157.3% 0.00 ± 35% sched_debug.cpu.next_balance.stddev
0.70 -15.8% 0.59 ± 3% sched_debug.cpu.nr_running.stddev
5474800 -69.7% 1660250 ± 2% sched_debug.cpu.nr_switches.avg
5648642 -65.5% 1946319 ± 5% sched_debug.cpu.nr_switches.max
2229198 ± 8% -67.1% 734011 ± 20% sched_debug.cpu.nr_switches.min
297592 ± 6% -25.9% 220513 ± 18% sched_debug.cpu.nr_switches.stddev
23.75 -10.9 12.88 perf-profile.calltrace.cycles-pp.common_startup_64
23.65 -10.8 12.82 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
23.62 -10.8 12.81 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
23.51 -10.8 12.76 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
12.93 -7.0 5.94 ± 4% perf-profile.calltrace.cycles-pp.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.78 -6.9 5.89 ± 4% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
11.30 -5.2 6.07 ± 3% perf-profile.calltrace.cycles-pp.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
11.17 -5.2 6.02 ± 3% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
9.89 -4.8 5.08 ± 4% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q
12.41 -4.4 7.96 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
9.19 -4.4 4.77 ± 4% perf-profile.calltrace.cycles-pp.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up
11.29 -4.0 7.29 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
11.38 -4.0 7.40 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
8.04 -3.9 4.13 ± 4% perf-profile.calltrace.cycles-pp.select_idle_core.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq
6.91 -3.8 3.08 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive
6.76 -3.8 3.00 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedreceive
8.71 -3.7 5.00 perf-profile.calltrace.cycles-pp.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.71 -3.4 2.26 ± 4% perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
8.12 -3.4 4.72 perf-profile.calltrace.cycles-pp.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
7.76 -3.2 4.55 perf-profile.calltrace.cycles-pp.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend
8.39 -3.1 5.26 perf-profile.calltrace.cycles-pp.wq_sleep.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.47 -3.1 4.35 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend
4.92 -2.9 2.00 ± 4% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
7.79 -2.8 4.97 perf-profile.calltrace.cycles-pp.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
7.48 -2.7 4.79 perf-profile.calltrace.cycles-pp.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedreceive.__x64_sys_mq_timedreceive
7.17 -2.6 4.60 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedreceive
5.54 -2.3 3.24 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend
5.41 -2.2 3.16 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedsend
3.88 -2.2 1.67 ± 4% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
4.04 -2.2 1.88 perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
3.44 -2.2 1.28 ± 3% perf-profile.calltrace.cycles-pp.store_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.84 -2.0 1.80 perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
4.83 -2.0 2.85 perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep
4.71 -1.9 2.78 perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_hrtimeout_range_clock
4.39 -1.8 2.58 perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
2.37 -1.5 0.84 ± 3% perf-profile.calltrace.cycles-pp._copy_to_user.store_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
2.66 -1.4 1.26 ± 4% perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle
2.73 -1.4 1.33 ± 4% perf-profile.calltrace.cycles-pp.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.50 -1.3 2.17 perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
2.37 -1.3 1.06 ± 2% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.10 -1.1 3.03 perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep
2.20 -1.1 1.13 ± 5% perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
2.20 -1.1 1.15 ± 4% perf-profile.calltrace.cycles-pp.switch_fpu_return.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.74 -1.0 0.73 ± 3% perf-profile.calltrace.cycles-pp.msg_get.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.32 -1.0 0.37 ± 70% perf-profile.calltrace.cycles-pp.do_perf_trace_sched_wakeup_template.perf_trace_sched_wakeup_template.try_to_wake_up.wake_up_q.do_mq_timedreceive
1.93 -0.9 0.99 ± 4% perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
1.34 -0.8 0.54 ± 5% perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive
1.52 -0.8 0.73 ± 4% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending
1.05 -0.7 0.34 ± 70% perf-profile.calltrace.cycles-pp.__switch_to
1.57 -0.7 0.90 ± 6% perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.56 -0.3 2.28 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_hrtimeout_range_clock
5.87 +0.9 6.78 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.00 +0.9 0.91 ± 30% perf-profile.calltrace.cycles-pp.sched_balance_newidle.pick_next_task_fair.__pick_next_task.__schedule.schedule
35.80 +3.5 39.29 perf-profile.calltrace.cycles-pp.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
35.59 +3.6 39.21 perf-profile.calltrace.cycles-pp.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +4.4 4.37 ± 3% perf-profile.calltrace.cycles-pp.drain_obj_stock.__refill_obj_stock.__memcg_slab_post_alloc_hook.__kmalloc_node_noprof.load_msg
0.00 +4.4 4.39 ± 4% perf-profile.calltrace.cycles-pp.drain_obj_stock.__refill_obj_stock.__memcg_slab_free_hook.kfree.free_msg
0.00 +8.0 8.01 ± 4% perf-profile.calltrace.cycles-pp.__refill_obj_stock.__memcg_slab_post_alloc_hook.__kmalloc_node_noprof.load_msg.do_mq_timedsend
0.00 +8.3 8.29 ± 4% perf-profile.calltrace.cycles-pp.__refill_obj_stock.__memcg_slab_free_hook.kfree.free_msg.do_mq_timedreceive
28.51 +13.5 41.99 perf-profile.calltrace.cycles-pp.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
28.23 +13.5 41.71 perf-profile.calltrace.cycles-pp.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
70.69 +13.7 84.35 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
70.44 +13.8 84.26 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.99 +20.2 23.24 ± 2% perf-profile.calltrace.cycles-pp.free_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.79 +20.4 23.15 ± 2% perf-profile.calltrace.cycles-pp.kfree.free_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
2.43 +20.6 22.98 ± 2% perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kfree.free_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive
2.26 +26.0 28.23 ± 2% perf-profile.calltrace.cycles-pp.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.99 +26.8 27.80 ± 2% perf-profile.calltrace.cycles-pp.__kmalloc_node_noprof.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
0.65 +27.0 27.62 ± 2% perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_noprof.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend
24.25 -12.2 12.02 ± 4% perf-profile.children.cycles-pp.wake_up_q
24.00 -12.1 11.93 ± 4% perf-profile.children.cycles-pp.try_to_wake_up
23.75 -10.9 12.88 perf-profile.children.cycles-pp.common_startup_64
23.75 -10.9 12.88 perf-profile.children.cycles-pp.cpu_startup_entry
23.68 -10.8 12.85 perf-profile.children.cycles-pp.do_idle
23.65 -10.8 12.82 perf-profile.children.cycles-pp.start_secondary
19.65 -8.3 11.33 perf-profile.children.cycles-pp.__schedule
17.14 -6.9 10.28 perf-profile.children.cycles-pp.wq_sleep
16.24 -6.4 9.82 perf-profile.children.cycles-pp.schedule
15.92 -6.2 9.69 perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
12.46 -6.1 6.32 ± 4% perf-profile.children.cycles-pp.select_task_rq
12.19 -6.0 6.17 ± 4% perf-profile.children.cycles-pp.select_task_rq_fair
9.91 -4.8 5.09 ± 4% perf-profile.children.cycles-pp.select_idle_sibling
9.27 -4.5 4.80 ± 4% perf-profile.children.cycles-pp.select_idle_cpu
12.49 -4.5 8.03 perf-profile.children.cycles-pp.cpuidle_idle_call
11.41 -4.0 7.39 perf-profile.children.cycles-pp.cpuidle_enter_state
11.44 -4.0 7.43 perf-profile.children.cycles-pp.cpuidle_enter
8.17 -4.0 4.18 ± 4% perf-profile.children.cycles-pp.select_idle_core
5.76 -3.5 2.28 ± 4% perf-profile.children.cycles-pp.flush_smp_call_function_queue
5.47 -3.1 2.35 ± 4% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
4.30 -2.4 1.94 ± 4% perf-profile.children.cycles-pp.sched_ttwu_pending
4.08 -2.2 1.90 perf-profile.children.cycles-pp.schedule_idle
3.47 -2.2 1.30 ± 2% perf-profile.children.cycles-pp.store_msg
5.87 -2.1 3.78 perf-profile.children.cycles-pp.__pick_next_task
4.84 -2.0 2.86 perf-profile.children.cycles-pp.try_to_block_task
4.73 -1.9 2.79 perf-profile.children.cycles-pp.dequeue_entities
4.73 -1.9 2.82 perf-profile.children.cycles-pp.dequeue_task_fair
4.04 -1.8 2.26 perf-profile.children.cycles-pp._raw_spin_lock
3.72 -1.7 2.03 ± 4% perf-profile.children.cycles-pp.enqueue_task
2.40 -1.6 0.85 ± 3% perf-profile.children.cycles-pp._copy_to_user
3.39 -1.5 1.86 ± 3% perf-profile.children.cycles-pp.ttwu_do_activate
2.56 -1.5 1.04 ± 5% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
2.54 -1.5 1.03 ± 5% perf-profile.children.cycles-pp.do_perf_trace_sched_wakeup_template
2.76 -1.4 1.34 ± 4% perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
3.75 -1.4 2.34 perf-profile.children.cycles-pp.dequeue_entity
2.32 -1.4 0.95 ± 4% perf-profile.children.cycles-pp.ttwu_queue_wakelist
2.72 -1.3 1.38 ± 2% perf-profile.children.cycles-pp.update_curr
2.38 -1.3 1.06 ± 2% perf-profile.children.cycles-pp.exit_to_user_mode_loop
4.24 -1.2 2.99 perf-profile.children.cycles-pp.pick_next_task_fair
2.21 -1.1 1.15 ± 5% perf-profile.children.cycles-pp.switch_fpu_return
1.76 -1.0 0.73 ± 3% perf-profile.children.cycles-pp.msg_get
1.67 -1.0 0.65 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
2.47 -1.0 1.47 ± 3% perf-profile.children.cycles-pp.enqueue_task_fair
2.39 -1.0 1.42 perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
2.29 -1.0 1.32 perf-profile.children.cycles-pp.update_load_avg
1.60 -1.0 0.64 perf-profile.children.cycles-pp.switch_mm_irqs_off
1.51 -0.9 0.57 ± 5% perf-profile.children.cycles-pp.__smp_call_single_queue
1.84 -0.9 0.93 ± 6% perf-profile.children.cycles-pp.wake_affine
1.49 -0.9 0.59 perf-profile.children.cycles-pp.__check_object_size
1.61 -0.9 0.72 ± 5% perf-profile.children.cycles-pp.update_rq_clock_task
1.73 -0.9 0.87 ± 2% perf-profile.children.cycles-pp.update_se
1.51 -0.9 0.65 ± 2% perf-profile.children.cycles-pp.wakeup_preempt
2.00 -0.8 1.15 ± 3% perf-profile.children.cycles-pp.enqueue_entity
1.26 -0.8 0.42 ± 3% perf-profile.children.cycles-pp.__wake_up
1.31 -0.7 0.58 ± 5% perf-profile.children.cycles-pp.set_task_cpu
1.15 -0.7 0.45 ± 3% perf-profile.children.cycles-pp.msg_insert
1.04 -0.7 0.35 ± 5% perf-profile.children.cycles-pp.perf_trace_buf_alloc
1.57 -0.7 0.90 ± 5% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
1.00 -0.7 0.34 ± 7% perf-profile.children.cycles-pp.perf_swevent_get_recursion_context
0.95 -0.7 0.29 ± 4% perf-profile.children.cycles-pp.llist_reverse_order
1.14 -0.6 0.50 ± 5% perf-profile.children.cycles-pp.migrate_task_rq_fair
1.11 -0.6 0.51 perf-profile.children.cycles-pp.set_next_entity
0.95 -0.6 0.39 ± 2% perf-profile.children.cycles-pp.__update_idle_core
1.04 -0.6 0.49 ± 2% perf-profile.children.cycles-pp.pick_task_fair
1.15 -0.6 0.60 ± 7% perf-profile.children.cycles-pp.task_h_load
0.84 -0.5 0.30 ± 5% perf-profile.children.cycles-pp.call_function_single_prep_ipi
1.06 -0.5 0.52 perf-profile.children.cycles-pp.set_next_task_idle
1.04 -0.5 0.51 perf-profile.children.cycles-pp._find_next_bit
1.28 -0.5 0.75 perf-profile.children.cycles-pp.__switch_to
1.38 -0.5 0.85 perf-profile.children.cycles-pp.update_cfs_rq_load_avg
0.85 -0.5 0.34 ± 3% perf-profile.children.cycles-pp.cpuacct_charge
0.77 ± 4% -0.5 0.25 perf-profile.children.cycles-pp.__bitmap_andnot
0.88 -0.5 0.41 ± 6% perf-profile.children.cycles-pp.update_entity_lag
0.94 -0.5 0.48 perf-profile.children.cycles-pp.prepare_task_switch
0.74 -0.4 0.31 ± 2% perf-profile.children.cycles-pp.check_heap_object
0.75 -0.4 0.32 ± 5% perf-profile.children.cycles-pp.requeue_delayed_entity
0.80 -0.4 0.40 perf-profile.children.cycles-pp.wakeup_preempt_fair
0.55 -0.4 0.16 ± 2% perf-profile.children.cycles-pp.native_sched_clock
0.57 -0.4 0.18 ± 4% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.55 -0.4 0.17 perf-profile.children.cycles-pp.os_xsave
0.58 -0.4 0.20 perf-profile.children.cycles-pp.sched_clock_cpu
1.18 -0.4 0.81 perf-profile.children.cycles-pp.___task_rq_lock
1.39 -0.4 1.01 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.51 ± 3% -0.4 0.14 ± 3% perf-profile.children.cycles-pp._copy_from_user
0.56 -0.4 0.19 ± 2% perf-profile.children.cycles-pp.update_rq_clock
0.51 -0.3 0.17 ± 2% perf-profile.children.cycles-pp.sched_clock
0.56 -0.3 0.23 ± 3% perf-profile.children.cycles-pp.simple_inode_init_ts
0.89 -0.3 0.56 perf-profile.children.cycles-pp.put_prev_entity
0.53 ± 3% -0.3 0.21 ± 2% perf-profile.children.cycles-pp.__put_user_4
0.68 ± 12% -0.3 0.36 ± 4% perf-profile.children.cycles-pp.stress_switch_mq
0.52 -0.3 0.20 perf-profile.children.cycles-pp.set_next_task_fair
0.55 -0.3 0.24 ± 5% perf-profile.children.cycles-pp.__switch_to_asm
0.51 -0.3 0.21 ± 3% perf-profile.children.cycles-pp.inode_set_ctime_current
0.56 -0.3 0.26 ± 3% perf-profile.children.cycles-pp.fdget
0.52 -0.3 0.23 ± 3% perf-profile.children.cycles-pp.mm_cid_switch_to
0.54 -0.3 0.25 ± 3% perf-profile.children.cycles-pp.remove_entity_load_avg
0.79 -0.3 0.51 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.37 -0.3 0.11 ± 7% perf-profile.children.cycles-pp.__resched_curr
0.57 -0.2 0.32 perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.35 -0.2 0.12 ± 4% perf-profile.children.cycles-pp.avg_vruntime
0.62 -0.2 0.39 ± 2% perf-profile.children.cycles-pp.sysvec_call_function_single
0.34 -0.2 0.11 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.58 -0.2 0.36 perf-profile.children.cycles-pp.__enqueue_entity
0.46 -0.2 0.24 perf-profile.children.cycles-pp.__pick_eevdf
0.52 -0.2 0.30 ± 3% perf-profile.children.cycles-pp.perf_tp_event
0.35 -0.2 0.14 ± 3% perf-profile.children.cycles-pp.__wrgsbase_inactive
0.56 -0.2 0.36 perf-profile.children.cycles-pp.__sysvec_call_function_single
0.33 -0.2 0.13 ± 3% perf-profile.children.cycles-pp.__update_load_avg_se
0.31 ± 2% -0.2 0.12 ± 4% perf-profile.children.cycles-pp.__virt_addr_valid
0.58 -0.2 0.40 perf-profile.children.cycles-pp.menu_select
0.32 ± 5% -0.2 0.13 perf-profile.children.cycles-pp.__check_heap_object
0.31 -0.2 0.13 ± 3% perf-profile.children.cycles-pp.place_entity
0.32 -0.2 0.14 perf-profile.children.cycles-pp.tick_nohz_idle_enter
0.36 -0.2 0.18 ± 2% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.29 -0.2 0.11 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.33 -0.2 0.16 ± 3% perf-profile.children.cycles-pp.do_perf_trace_sched_stat_runtime
0.35 -0.2 0.18 ± 2% perf-profile.children.cycles-pp.ktime_get
0.25 -0.2 0.08 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.37 -0.2 0.20 ± 4% perf-profile.children.cycles-pp.___perf_sw_event
0.23 -0.2 0.07 ± 6% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.22 -0.2 0.07 perf-profile.children.cycles-pp.read_tsc
0.21 -0.1 0.06 ± 7% perf-profile.children.cycles-pp.__rdgsbase_inactive
0.25 -0.1 0.11 ± 7% perf-profile.children.cycles-pp.strnlen
0.32 -0.1 0.18 perf-profile.children.cycles-pp.__dequeue_entity
0.23 ± 2% -0.1 0.10 ± 4% perf-profile.children.cycles-pp.check_stack_object
0.23 ± 2% -0.1 0.09 ± 5% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.53 -0.1 0.40 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.30 ± 3% -0.1 0.17 ± 4% perf-profile.children.cycles-pp.tick_nohz_handler
0.20 -0.1 0.07 perf-profile.children.cycles-pp.__account_obj_stock
0.31 ± 2% -0.1 0.18 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.41 ± 2% -0.1 0.29 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.41 -0.1 0.28 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
0.28 ± 2% -0.1 0.16 ± 6% perf-profile.children.cycles-pp.update_process_times
0.33 -0.1 0.21 ± 6% perf-profile.children.cycles-pp.attach_entity_load_avg
0.19 ± 2% -0.1 0.07 ± 7% perf-profile.children.cycles-pp.wake_q_add_safe
0.23 ± 2% -0.1 0.11 perf-profile.children.cycles-pp.__kmalloc_cache_noprof
0.18 -0.1 0.07 ± 6% perf-profile.children.cycles-pp.nohz_run_idle_balance
0.56 -0.1 0.46 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.15 -0.1 0.05 perf-profile.children.cycles-pp._raw_spin_unlock
0.17 -0.1 0.08 perf-profile.children.cycles-pp.security_msg_msg_free
0.15 -0.1 0.06 perf-profile.children.cycles-pp.inode_set_ctime_to_ts
0.16 ± 2% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.dl_server_update
0.13 -0.1 0.06 ± 8% perf-profile.children.cycles-pp.timestamp_truncate
0.15 ± 3% -0.1 0.08 perf-profile.children.cycles-pp.perf_trace_buf_update
0.13 ± 3% -0.1 0.06 perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg
0.13 ± 3% -0.1 0.06 perf-profile.children.cycles-pp.migrate_disable_switch
0.12 ± 6% -0.1 0.05 ± 8% perf-profile.children.cycles-pp.__cgroup_account_cputime
0.11 ± 4% -0.1 0.05 perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.14 -0.1 0.08 ± 5% perf-profile.children.cycles-pp.update_curr_dl_se
0.10 -0.1 0.05 perf-profile.children.cycles-pp.ct_kernel_exit
0.11 -0.1 0.06 perf-profile.children.cycles-pp.tracing_gen_ctx_irq_test
0.10 ± 4% -0.0 0.05 perf-profile.children.cycles-pp.__rb_insert_augmented
0.10 -0.0 0.06 ± 8% perf-profile.children.cycles-pp.rest_init
0.10 -0.0 0.06 ± 8% perf-profile.children.cycles-pp.start_kernel
0.10 -0.0 0.06 ± 8% perf-profile.children.cycles-pp.x86_64_start_kernel
0.10 -0.0 0.06 ± 8% perf-profile.children.cycles-pp.x86_64_start_reservations
0.08 ± 10% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.mq_timedreceive
0.15 -0.0 0.11 ± 4% perf-profile.children.cycles-pp.vruntime_eligible
0.13 -0.0 0.09 perf-profile.children.cycles-pp.put_prev_task_fair
0.09 -0.0 0.05 ± 8% perf-profile.children.cycles-pp.native_irq_return_iret
0.08 -0.0 0.05 perf-profile.children.cycles-pp.choose_new_asid
0.13 -0.0 0.11 perf-profile.children.cycles-pp.__irq_exit_rcu
0.07 -0.0 0.05 perf-profile.children.cycles-pp.__set_next_task_fair
0.76 -0.0 0.74 perf-profile.children.cycles-pp.finish_task_switch
0.09 -0.0 0.07 ± 6% perf-profile.children.cycles-pp.propagate_entity_load_avg
0.07 -0.0 0.06 perf-profile.children.cycles-pp.clockevents_program_event
0.10 -0.0 0.09 perf-profile.children.cycles-pp.handle_softirqs
0.07 +0.0 0.08 perf-profile.children.cycles-pp.perf_swevent_event
0.48 +0.0 0.49 perf-profile.children.cycles-pp.process_simple
0.05 +0.0 0.06 ± 7% perf-profile.children.cycles-pp.sched_update_worker
0.05 +0.0 0.07 perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.07 ± 11% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.mq_timedsend
0.15 ± 3% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.x64_sys_call
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__sched_balance_update_blocked_averages
0.00 +0.1 0.05 perf-profile.children.cycles-pp.update_cfs_group
0.00 +0.1 0.06 ± 23% perf-profile.children.cycles-pp.generic_perform_write
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.detach_tasks
0.00 +0.1 0.06 ± 29% perf-profile.children.cycles-pp.shmem_file_write_iter
0.00 +0.1 0.06 ± 29% perf-profile.children.cycles-pp.vfs_write
0.00 +0.1 0.07 ± 25% perf-profile.children.cycles-pp.ksys_write
0.00 +0.1 0.08 ± 30% perf-profile.children.cycles-pp.record__pushfn
0.04 ± 71% +0.1 0.12 ± 35% perf-profile.children.cycles-pp.perf_mmap__push
0.54 ± 2% +0.1 0.62 ± 7% perf-profile.children.cycles-pp.cmd_record
0.04 ± 71% +0.1 0.13 ± 35% perf-profile.children.cycles-pp.handle_internal_command
0.04 ± 71% +0.1 0.13 ± 35% perf-profile.children.cycles-pp.main
0.04 ± 70% +0.1 0.13 ± 32% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.04 ± 71% +0.1 0.13 ± 35% perf-profile.children.cycles-pp.run_builtin
0.10 ± 4% +0.1 0.20 ± 4% perf-profile.children.cycles-pp.do_perf_trace_sched_switch
0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.ct_idle_enter
0.13 ± 3% +0.2 0.29 ± 4% perf-profile.children.cycles-pp.perf_trace_sched_switch
0.21 ± 6% +0.6 0.78 ± 3% perf-profile.children.cycles-pp.update_sg_lb_stats
0.22 ± 6% +0.6 0.81 ± 4% perf-profile.children.cycles-pp.update_sd_lb_stats
0.22 ± 6% +0.6 0.81 ± 3% perf-profile.children.cycles-pp.sched_balance_find_src_group
0.40 ± 3% +0.7 1.08 ± 4% perf-profile.children.cycles-pp.sched_balance_newidle
0.27 ± 6% +0.7 0.97 ± 4% perf-profile.children.cycles-pp.sched_balance_rq
5.90 +0.9 6.81 perf-profile.children.cycles-pp.intel_idle
35.82 +3.5 39.30 perf-profile.children.cycles-pp.__x64_sys_mq_timedreceive
35.70 +3.5 39.25 perf-profile.children.cycles-pp.do_mq_timedreceive
0.00 +8.8 8.78 ± 3% perf-profile.children.cycles-pp.drain_obj_stock
28.34 +13.4 41.76 perf-profile.children.cycles-pp.do_mq_timedsend
28.52 +13.5 41.99 perf-profile.children.cycles-pp.__x64_sys_mq_timedsend
70.76 +13.7 84.45 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
70.56 +13.8 84.39 perf-profile.children.cycles-pp.do_syscall_64
0.08 +16.2 16.31 ± 4% perf-profile.children.cycles-pp.__refill_obj_stock
3.22 +20.1 23.33 ± 2% perf-profile.children.cycles-pp.kfree
3.01 +20.2 23.25 ± 2% perf-profile.children.cycles-pp.free_msg
2.46 +20.5 23.00 ± 2% perf-profile.children.cycles-pp.__memcg_slab_free_hook
2.31 +25.9 28.25 ± 2% perf-profile.children.cycles-pp.load_msg
1.00 +26.8 27.81 ± 2% perf-profile.children.cycles-pp.__kmalloc_node_noprof
0.68 +27.0 27.65 ± 2% perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
6.79 -3.2 3.63 ± 5% perf-profile.self.cycles-pp.select_idle_core
3.07 -1.7 1.33 ± 4% perf-profile.self.cycles-pp.do_mq_timedreceive
2.90 -1.6 1.31 ± 3% perf-profile.self.cycles-pp.__schedule
2.37 -1.5 0.84 ± 3% perf-profile.self.cycles-pp._copy_to_user
2.73 -1.5 1.22 ± 3% perf-profile.self.cycles-pp.do_mq_timedsend
2.63 -1.4 1.25 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
1.64 -1.0 0.64 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
1.46 -0.9 0.55 ± 2% perf-profile.self.cycles-pp.switch_mm_irqs_off
1.54 -0.9 0.67 ± 6% perf-profile.self.cycles-pp.update_rq_clock_task
1.48 -0.9 0.62 ± 3% perf-profile.self.cycles-pp.msg_get
1.36 -0.8 0.58 ± 3% perf-profile.self.cycles-pp.exit_to_user_mode_loop
1.11 -0.7 0.44 ± 4% perf-profile.self.cycles-pp.msg_insert
1.56 -0.7 0.90 ± 6% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.95 -0.7 0.29 ± 5% perf-profile.self.cycles-pp.llist_reverse_order
1.00 -0.7 0.34 ± 7% perf-profile.self.cycles-pp.perf_swevent_get_recursion_context
1.19 -0.6 0.59 ± 3% perf-profile.self.cycles-pp.wq_sleep
0.92 -0.6 0.36 ± 6% perf-profile.self.cycles-pp.do_perf_trace_sched_wakeup_template
0.90 -0.6 0.34 ± 2% perf-profile.self.cycles-pp.__update_idle_core
1.15 -0.5 0.60 ± 7% perf-profile.self.cycles-pp.task_h_load
0.83 -0.5 0.30 ± 5% perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.97 -0.5 0.44 ± 2% perf-profile.self.cycles-pp.dequeue_entities
0.84 -0.5 0.34 ± 3% perf-profile.self.cycles-pp.cpuacct_charge
0.96 -0.5 0.47 ± 4% perf-profile.self.cycles-pp.do_syscall_64
1.23 -0.5 0.74 perf-profile.self.cycles-pp.__switch_to
0.73 ± 4% -0.5 0.24 ± 3% perf-profile.self.cycles-pp.__bitmap_andnot
0.69 -0.5 0.20 ± 4% perf-profile.self.cycles-pp.flush_smp_call_function_queue
0.94 -0.5 0.47 perf-profile.self.cycles-pp._find_next_bit
0.77 -0.4 0.36 ± 6% perf-profile.self.cycles-pp.update_entity_lag
0.76 -0.4 0.36 ± 3% perf-profile.self.cycles-pp.update_load_avg
0.67 -0.4 0.27 ± 5% perf-profile.self.cycles-pp.__smp_call_single_queue
0.70 -0.4 0.31 ± 2% perf-profile.self.cycles-pp.pick_next_task_fair
0.55 -0.4 0.17 ± 2% perf-profile.self.cycles-pp.os_xsave
0.56 -0.4 0.18 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.63 -0.4 0.25 perf-profile.self.cycles-pp.switch_fpu_return
1.38 -0.4 1.01 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.53 -0.4 0.16 perf-profile.self.cycles-pp.native_sched_clock
0.69 -0.4 0.33 ± 5% perf-profile.self.cycles-pp.wake_affine
0.67 -0.4 0.30 ± 2% perf-profile.self.cycles-pp.kfree
0.75 -0.4 0.39 ± 3% perf-profile.self.cycles-pp.update_curr
0.67 -0.4 0.31 ± 5% perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.53 -0.4 0.18 ± 2% perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare
0.49 ± 2% -0.4 0.14 ± 3% perf-profile.self.cycles-pp._copy_from_user
0.63 -0.3 0.28 ± 3% perf-profile.self.cycles-pp.schedule_hrtimeout_range_clock
0.58 -0.3 0.23 ± 4% perf-profile.self.cycles-pp.select_idle_sibling
0.58 -0.3 0.24 ± 7% perf-profile.self.cycles-pp.migrate_task_rq_fair
0.64 -0.3 0.31 perf-profile.self.cycles-pp.prepare_task_switch
0.82 -0.3 0.49 ± 3% perf-profile.self.cycles-pp.select_idle_cpu
0.52 ± 3% -0.3 0.21 ± 2% perf-profile.self.cycles-pp.__put_user_4
0.63 ± 11% -0.3 0.32 ± 5% perf-profile.self.cycles-pp.stress_switch_mq
0.54 -0.3 0.24 ± 5% perf-profile.self.cycles-pp.__switch_to_asm
0.54 -0.3 0.25 ± 3% perf-profile.self.cycles-pp.fdget
0.51 -0.3 0.22 ± 4% perf-profile.self.cycles-pp.mm_cid_switch_to
0.44 -0.3 0.15 ± 6% perf-profile.self.cycles-pp.select_task_rq_fair
0.43 -0.3 0.15 ± 5% perf-profile.self.cycles-pp.sched_ttwu_pending
0.49 -0.3 0.23 ± 6% perf-profile.self.cycles-pp.enqueue_task
0.59 -0.2 0.34 ± 2% perf-profile.self.cycles-pp.try_to_wake_up
0.73 -0.2 0.48 perf-profile.self.cycles-pp.update_cfs_rq_load_avg
0.35 -0.2 0.11 ± 4% perf-profile.self.cycles-pp.__resched_curr
0.40 -0.2 0.16 ± 2% perf-profile.self.cycles-pp.__pick_next_task
0.36 -0.2 0.12 ± 3% perf-profile.self.cycles-pp.cpuidle_idle_call
0.34 -0.2 0.11 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.57 -0.2 0.36 perf-profile.self.cycles-pp.__enqueue_entity
0.51 -0.2 0.31 perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.34 -0.2 0.14 ± 5% perf-profile.self.cycles-pp.wakeup_preempt
0.34 -0.2 0.14 ± 3% perf-profile.self.cycles-pp.__wrgsbase_inactive
0.39 -0.2 0.20 ± 2% perf-profile.self.cycles-pp.do_idle
0.31 ± 5% -0.2 0.11 perf-profile.self.cycles-pp.__check_heap_object
0.37 -0.2 0.17 ± 2% perf-profile.self.cycles-pp.check_heap_object
0.51 -0.2 0.32 perf-profile.self.cycles-pp.schedule
0.36 -0.2 0.18 ± 2% perf-profile.self.cycles-pp.__pick_eevdf
0.29 ± 2% -0.2 0.11 ± 4% perf-profile.self.cycles-pp.__virt_addr_valid
0.61 -0.2 0.43 perf-profile.self.cycles-pp.dequeue_entity
0.30 -0.2 0.12 perf-profile.self.cycles-pp.__update_load_avg_se
0.27 -0.2 0.10 ± 4% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.25 -0.2 0.08 ± 5% perf-profile.self.cycles-pp.__check_object_size
0.26 -0.2 0.11 ± 4% perf-profile.self.cycles-pp.place_entity
0.45 -0.2 0.30 ± 3% perf-profile.self.cycles-pp.enqueue_entity
0.44 -0.2 0.29 ± 5% perf-profile.self.cycles-pp.enqueue_task_fair
0.24 -0.2 0.09 perf-profile.self.cycles-pp.wake_up_q
0.24 -0.1 0.09 ± 5% perf-profile.self.cycles-pp.___perf_sw_event
0.22 ± 2% -0.1 0.07 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.32 -0.1 0.18 ± 2% perf-profile.self.cycles-pp.dequeue_task_fair
0.21 ± 2% -0.1 0.06 ± 7% perf-profile.self.cycles-pp.read_tsc
0.20 -0.1 0.06 perf-profile.self.cycles-pp.__rdgsbase_inactive
0.36 -0.1 0.22 ± 4% perf-profile.self.cycles-pp.perf_tp_event
0.28 -0.1 0.14 perf-profile.self.cycles-pp.__kmalloc_node_noprof
0.24 -0.1 0.11 ± 4% perf-profile.self.cycles-pp.strnlen
0.21 -0.1 0.08 ± 6% perf-profile.self.cycles-pp.load_msg
0.33 -0.1 0.20 ± 4% perf-profile.self.cycles-pp.attach_entity_load_avg
0.44 -0.1 0.33 perf-profile.self.cycles-pp.menu_select
0.41 -0.1 0.29 perf-profile.self.cycles-pp.update_se
0.17 ± 2% -0.1 0.06 ± 8% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.27 -0.1 0.15 ± 6% perf-profile.self.cycles-pp.select_task_rq
0.23 ± 2% -0.1 0.11 ± 4% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.19 -0.1 0.08 ± 6% perf-profile.self.cycles-pp.update_rq_clock
0.20 ± 2% -0.1 0.09 perf-profile.self.cycles-pp.check_stack_object
0.18 ± 2% -0.1 0.06 ± 7% perf-profile.self.cycles-pp.wake_q_add_safe
0.18 -0.1 0.07 perf-profile.self.cycles-pp.__account_obj_stock
0.21 ± 2% -0.1 0.10 ± 4% perf-profile.self.cycles-pp.__kmalloc_cache_noprof
0.24 -0.1 0.14 perf-profile.self.cycles-pp.__dequeue_entity
0.19 -0.1 0.10 ± 4% perf-profile.self.cycles-pp.pick_task_fair
0.14 ± 3% -0.1 0.05 perf-profile.self.cycles-pp.inode_set_ctime_current
0.16 ± 3% -0.1 0.06 ± 7% perf-profile.self.cycles-pp.nohz_run_idle_balance
0.15 -0.1 0.06 perf-profile.self.cycles-pp.avg_vruntime
0.16 -0.1 0.07 perf-profile.self.cycles-pp.schedule_idle
0.15 -0.1 0.08 ± 5% perf-profile.self.cycles-pp.dl_server_update
0.15 -0.1 0.08 ± 5% perf-profile.self.cycles-pp.raw_spin_rq_lock_nested
0.12 ± 4% -0.1 0.05 perf-profile.self.cycles-pp.__x64_sys_mq_timedreceive
0.12 -0.1 0.05 ± 8% perf-profile.self.cycles-pp.migrate_disable_switch
0.13 -0.1 0.07 ± 7% perf-profile.self.cycles-pp.___task_rq_lock
0.09 ± 5% -0.1 0.03 ± 70% perf-profile.self.cycles-pp.inode_set_ctime_to_ts
0.22 -0.1 0.16 perf-profile.self.cycles-pp.cpuidle_enter_state
0.12 -0.1 0.06 perf-profile.self.cycles-pp.store_msg
0.12 -0.1 0.06 perf-profile.self.cycles-pp.wakeup_preempt_fair
0.11 ± 4% -0.1 0.05 ± 8% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg
0.11 -0.1 0.05 perf-profile.self.cycles-pp.timestamp_truncate
0.12 ± 4% -0.1 0.06 ± 7% perf-profile.self.cycles-pp.do_perf_trace_sched_stat_runtime
0.11 ± 4% -0.1 0.06 ± 8% perf-profile.self.cycles-pp.tracing_gen_ctx_irq_test
0.13 ± 3% -0.0 0.08 perf-profile.self.cycles-pp.update_curr_dl_se
0.15 -0.0 0.11 ± 4% perf-profile.self.cycles-pp.sched_balance_newidle
0.09 -0.0 0.05 ± 8% perf-profile.self.cycles-pp.native_irq_return_iret
0.14 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.vruntime_eligible
0.14 ± 3% -0.0 0.11 perf-profile.self.cycles-pp.ktime_get
0.07 ± 7% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.__set_next_task_fair
0.15 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.put_prev_entity
0.11 ± 4% +0.0 0.12 ± 3% perf-profile.self.cycles-pp.set_next_task_idle
0.06 +0.0 0.08 ± 6% perf-profile.self.cycles-pp.perf_swevent_event
0.07 ± 7% +0.0 0.09 ± 10% perf-profile.self.cycles-pp.mq_timedsend
0.00 +0.1 0.05 perf-profile.self.cycles-pp.ct_idle_enter
0.00 +0.1 0.05 perf-profile.self.cycles-pp.perf_trace_sched_switch
0.00 +0.1 0.06 perf-profile.self.cycles-pp.sched_update_worker
0.12 +0.1 0.18 ± 6% perf-profile.self.cycles-pp.x64_sys_call
0.38 ± 2% +0.1 0.46 perf-profile.self.cycles-pp.finish_task_switch
0.08 ± 5% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.do_perf_trace_sched_switch
0.18 ± 5% +0.5 0.71 ± 3% perf-profile.self.cycles-pp.update_sg_lb_stats
5.89 +0.9 6.81 perf-profile.self.cycles-pp.intel_idle
0.07 +7.4 7.48 ± 4% perf-profile.self.cycles-pp.__refill_obj_stock
0.00 +8.7 8.70 ± 3% perf-profile.self.cycles-pp.drain_obj_stock
2.25 +12.4 14.61 perf-profile.self.cycles-pp.__memcg_slab_free_hook
0.53 +19.0 19.48 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
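A note on reading the middle column. For rows whose unit is already a percentage (perf-profile cycles, mpstat/turbostat %), the column appears to be an absolute difference in percentage points. For example, children `__refill_obj_stock` goes from 0.08 to 16.31 and is shown as +16.2. Counter rows such as unixbench.score instead show a relative %change. The minimal Python sketch below assumes that convention and reproduces both kinds of value from the before/after columns. The helper names are illustrative only and are not part of the LKP tooling.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent, as used for counter rows (e.g. unixbench.score)."""
    return (after - before) / before * 100.0


def pp_delta(before: float, after: float) -> float:
    """Absolute difference in percentage points, as assumed for perf-profile rows."""
    return after - before


# Before/after values copied from the tables in this report.
print(f"{pct_change(12631, 12146):+.1f}%")  # unixbench.score -> -3.8%
print(f"{pp_delta(0.08, 16.31):+.1f}")      # children.__refill_obj_stock -> +16.2
print(f"{pp_delta(2.25, 14.61):+.1f}")      # self.__memcg_slab_free_hook -> +12.4
```

Read this way, the memcg slab hooks (`__memcg_slab_post_alloc_hook`, `__memcg_slab_free_hook`, `__refill_obj_stock`, `drain_obj_stock`) together gain tens of percentage points of total cycles in the mq switch test.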
***************************************************************************************************
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/spawn/unixbench
commit:
8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
8285917d6f383aef 01b9da291c4969354807b52956f
---------------- ---------------------------
%stddev %change %stddev
\ | \
12631 -3.8% 12146 unixbench.score
159157 -3.8% 153044 unixbench.throughput
19807234 -4.8% 18852799 unixbench.time.involuntary_context_switches
1.413e+09 -3.9% 1.357e+09 unixbench.time.minor_page_faults
8608 +2.8% 8852 unixbench.time.system_time
6616 -2.4% 6458 unixbench.time.user_time
94616535 -4.1% 90778327 unixbench.time.voluntary_context_switches
52575303 -3.9% 50543226 unixbench.workload
1110579 +8.8% 1208410 ± 2% meminfo.AnonPages
210425 -12.6% 183972 ± 5% meminfo.DirectMap4k
0.05 ± 4% +0.0 0.06 ± 2% mpstat.cpu.all.iowait%
1.51 +0.4 1.93 mpstat.cpu.all.soft%
557785 ± 44% +49.6% 834357 ± 25% numa-meminfo.node0.AnonPages.max
736507 ± 5% +8.1% 796466 ± 3% numa-meminfo.node1.Mapped
6.802e+08 -5.2% 6.446e+08 numa-numastat.node0.local_node
6.804e+08 -5.3% 6.445e+08 numa-numastat.node0.numa_hit
0.04 +7.1% 0.05 perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
0.04 +7.1% 0.05 perf-sched.total_sch_delay.average.ms
23.64 -0.9 22.72 turbostat.C1%
22.97 ± 10% -5.2 17.74 ± 5% turbostat.PKG_%
6.804e+08 -5.3% 6.445e+08 numa-vmstat.node0.numa_hit
6.802e+08 -5.2% 6.446e+08 numa-vmstat.node0.numa_local
184186 ± 5% +8.2% 199205 ± 3% numa-vmstat.node1.nr_mapped
0.03 ± 67% +680.5% 0.22 ± 62% vmstat.procs.b
564221 -3.9% 542439 vmstat.system.cs
380977 -2.6% 371055 vmstat.system.in
769143 +1.6% 781758 proc-vmstat.nr_active_anon
277665 +8.8% 302174 ± 2% proc-vmstat.nr_anon_pages
30969 -3.5% 29900 proc-vmstat.nr_page_table_pages
491673 -2.4% 479782 proc-vmstat.nr_shmem
57770 -1.0% 57209 proc-vmstat.nr_slab_unreclaimable
769143 +1.6% 781758 proc-vmstat.nr_zone_active_anon
252183 ± 14% +35.3% 341208 ± 18% proc-vmstat.numa_hint_faults
1.254e+09 -3.8% 1.207e+09 proc-vmstat.numa_hit
1.254e+09 -3.8% 1.207e+09 proc-vmstat.numa_local
317222 ± 15% +38.4% 438924 ± 20% proc-vmstat.numa_pte_updates
1.338e+09 -3.8% 1.287e+09 proc-vmstat.pgalloc_normal
1.415e+09 -3.9% 1.359e+09 proc-vmstat.pgfault
1.337e+09 -3.8% 1.286e+09 proc-vmstat.pgfree
1.598e+08 -4.1% 1.533e+08 proc-vmstat.pgreuse
482323 ± 4% +17.7% 567902 ± 7% sched_debug.cfs_rq:/.left_deadline.avg
1301805 ± 3% +10.2% 1435059 ± 3% sched_debug.cfs_rq:/.left_deadline.stddev
482320 ± 4% +17.7% 567899 ± 7% sched_debug.cfs_rq:/.left_vruntime.avg
1301796 ± 3% +10.2% 1435050 ± 3% sched_debug.cfs_rq:/.left_vruntime.stddev
135389 ± 2% +23.4% 167033 ± 6% sched_debug.cfs_rq:/.load.avg
327477 +12.0% 366681 ± 5% sched_debug.cfs_rq:/.load.stddev
482320 ± 4% +17.7% 567899 ± 7% sched_debug.cfs_rq:/.right_vruntime.avg
1301796 ± 3% +10.2% 1435050 ± 3% sched_debug.cfs_rq:/.right_vruntime.stddev
0.02 ± 9% +262.4% 0.06 ± 56% sched_debug.cfs_rq:/.spread.avg
0.92 ± 28% +264.8% 3.37 ± 57% sched_debug.cfs_rq:/.spread.max
0.12 ± 21% +260.0% 0.43 ± 57% sched_debug.cfs_rq:/.spread.stddev
405099 ± 2% +6.3% 430541 ± 4% sched_debug.cpu.avg_idle.max
1430164 -4.9% 1359536 sched_debug.cpu.nr_switches.max
90912 ± 2% -12.6% 79477 ± 9% sched_debug.cpu.nr_switches.stddev
1.418e+10 -2.4% 1.385e+10 perf-stat.i.branch-instructions
88090016 -2.4% 86013654 perf-stat.i.branch-misses
27.32 +0.5 27.79 perf-stat.i.cache-miss-rate%
3.981e+08 -2.5% 3.882e+08 perf-stat.i.cache-misses
1.414e+09 -3.9% 1.359e+09 perf-stat.i.cache-references
570181 -3.7% 548874 perf-stat.i.context-switches
2.41 +4.0% 2.50 perf-stat.i.cpi
1.81e+11 +1.2% 1.833e+11 perf-stat.i.cpu-cycles
109140 -4.5% 104200 perf-stat.i.cpu-migrations
7.268e+10 -2.4% 7.092e+10 perf-stat.i.instructions
143.18 -3.7% 137.84 perf-stat.i.metric.K/sec
4249150 -3.6% 4095509 perf-stat.i.minor-faults
4249372 -3.6% 4095721 perf-stat.i.page-faults
27.38 +0.5 27.93 perf-stat.overall.cache-miss-rate%
2.41 +4.1% 2.51 perf-stat.overall.cpi
456.68 +3.6% 473.23 perf-stat.overall.cycles-between-cache-misses
0.41 -3.9% 0.40 perf-stat.overall.ipc
468146 +1.0% 472826 perf-stat.overall.path-length
1.426e+10 -2.6% 1.39e+10 perf-stat.ps.branch-instructions
85809605 -2.2% 83901451 perf-stat.ps.branch-misses
3.912e+08 -2.3% 3.822e+08 perf-stat.ps.cache-misses
1.429e+09 -4.2% 1.368e+09 perf-stat.ps.cache-references
567141 -3.8% 545697 perf-stat.ps.context-switches
106475 -4.1% 102109 perf-stat.ps.cpu-migrations
7.401e+10 -2.7% 7.198e+10 perf-stat.ps.instructions
4240337 -3.8% 4081092 perf-stat.ps.minor-faults
4240545 -3.8% 4081293 perf-stat.ps.page-faults
2.461e+13 -2.9% 2.39e+13 perf-stat.total.instructions
18.76 -1.1 17.70 perf-profile.calltrace.cycles-pp.asm_exc_page_fault
17.73 -1.0 16.72 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
17.66 -1.0 16.66 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
16.03 ± 2% -0.9 15.10 ± 2% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
15.61 ± 2% -0.9 14.70 ± 2% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
10.73 ± 2% -0.8 9.96 ± 3% perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
10.75 ± 2% -0.8 9.98 ± 3% perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
10.38 ± 3% -0.7 9.64 ± 3% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.53 -0.6 0.92 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
3.01 -0.1 2.86 perf-profile.calltrace.cycles-pp.common_startup_64
2.96 -0.1 2.81 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
2.96 -0.1 2.82 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
2.96 -0.1 2.82 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
2.82 -0.1 2.70 perf-profile.calltrace.cycles-pp.mm_init.dup_mm.copy_process.kernel_clone.__do_sys_clone
1.72 ± 2% -0.1 1.60 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
1.41 ± 2% -0.1 1.29 ± 3% perf-profile.calltrace.cycles-pp.folio_add_file_rmap_ptes.set_pte_range.filemap_map_pages.do_read_fault.do_fault
1.83 -0.1 1.72 perf-profile.calltrace.cycles-pp.schedule.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.82 -0.1 1.71 perf-profile.calltrace.cycles-pp.__schedule.schedule.do_wait.kernel_wait4.do_syscall_64
2.13 -0.1 2.04 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
1.96 -0.1 1.87 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
1.94 -0.1 1.85 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
1.29 -0.1 1.20 perf-profile.calltrace.cycles-pp.do_wp_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
1.87 -0.1 1.79 perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.86 -0.1 1.78 perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.86 -0.1 1.78 perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
1.43 -0.1 1.35 perf-profile.calltrace.cycles-pp.select_task_rq_fair.wake_up_new_task.kernel_clone.__do_sys_clone.do_syscall_64
2.09 -0.1 2.01 perf-profile.calltrace.cycles-pp.schedule_tail.ret_from_fork.ret_from_fork_asm
1.08 ± 2% -0.1 1.00 ± 2% perf-profile.calltrace.cycles-pp.__percpu_counter_init_many.mm_init.dup_mm.copy_process.kernel_clone
1.26 -0.1 1.20 perf-profile.calltrace.cycles-pp.sched_balance_find_dst_group.select_task_rq_fair.wake_up_new_task.kernel_clone.__do_sys_clone
1.16 -0.1 1.10 perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.sched_balance_find_dst_group.select_task_rq_fair.wake_up_new_task.kernel_clone
0.88 ± 2% -0.1 0.82 perf-profile.calltrace.cycles-pp.lru_add_drain.do_wp_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.do_wp_page.__handle_mm_fault.handle_mm_fault
0.91 ± 2% -0.1 0.86 perf-profile.calltrace.cycles-pp.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init.dup_mm.copy_process
0.86 -0.1 0.80 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.do_wp_page.__handle_mm_fault
1.10 -0.1 1.05 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.25 -0.1 1.20 perf-profile.calltrace.cycles-pp.__vma_start_write.dup_mmap.dup_mm.copy_process.kernel_clone
2.11 -0.1 2.06 perf-profile.calltrace.cycles-pp.wake_up_new_task.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.72 -0.1 0.67 perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.87 -0.0 0.82 perf-profile.calltrace.cycles-pp.__wp_page_copy_user.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.14 -0.0 1.10 perf-profile.calltrace.cycles-pp.alloc_thread_stack_node.dup_task_struct.copy_process.kernel_clone.__do_sys_clone
0.85 -0.0 0.80 ± 2% perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.__wp_page_copy_user.wp_page_copy.__handle_mm_fault.handle_mm_fault
0.79 -0.0 0.74 perf-profile.calltrace.cycles-pp.__mmdrop.finish_task_switch.__schedule.schedule.do_wait
0.71 -0.0 0.66 perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.88 -0.0 0.83 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.do_wait.kernel_wait4
1.00 -0.0 0.96 perf-profile.calltrace.cycles-pp.__mmdrop.finish_task_switch.schedule_tail.ret_from_fork.ret_from_fork_asm
0.81 -0.0 0.76 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault
0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.copy_present_ptes.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
2.05 -0.0 2.01 perf-profile.calltrace.cycles-pp.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap.dup_mm
0.82 -0.0 0.78 perf-profile.calltrace.cycles-pp.__vmalloc_node_noprof.alloc_thread_stack_node.dup_task_struct.copy_process.kernel_clone
0.82 -0.0 0.78 perf-profile.calltrace.cycles-pp.__vmalloc_node_range_noprof.__vmalloc_node_noprof.alloc_thread_stack_node.dup_task_struct.copy_process
3.37 -0.0 3.33 perf-profile.calltrace.cycles-pp.ret_from_fork_asm
0.65 -0.0 0.61 perf-profile.calltrace.cycles-pp.sched_move_task.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
1.10 -0.0 1.07 perf-profile.calltrace.cycles-pp.finish_task_switch.schedule_tail.ret_from_fork.ret_from_fork_asm
0.61 -0.0 0.58 perf-profile.calltrace.cycles-pp.__vma_start_write.free_pgtables.exit_mmap.__mmput.exit_mm
0.56 -0.0 0.52 perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.57 -0.0 0.54 perf-profile.calltrace.cycles-pp.__wake_up_sync_key.do_notify_parent.exit_notify.do_exit.do_group_exit
0.55 -0.0 0.51 perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.do_notify_parent.exit_notify.do_exit
0.88 -0.0 0.85 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__put_user_4.schedule_tail.ret_from_fork.ret_from_fork_asm
0.61 -0.0 0.58 perf-profile.calltrace.cycles-pp.__vma_start_exclude_readers.__vma_start_write.dup_mmap.dup_mm.copy_process
0.61 -0.0 0.58 perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.66 -0.0 0.63 perf-profile.calltrace.cycles-pp.do_notify_parent.exit_notify.do_exit.do_group_exit.__x64_sys_exit_group
0.71 -0.0 0.68 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__put_user_4.schedule_tail.ret_from_fork
1.67 -0.0 1.65 perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
0.54 -0.0 0.52 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
0.76 -0.0 0.73 perf-profile.calltrace.cycles-pp.__put_user_4.schedule_tail.ret_from_fork.ret_from_fork_asm
0.55 -0.0 0.52 perf-profile.calltrace.cycles-pp.__vmalloc_area_node.__vmalloc_node_range_noprof.__vmalloc_node_noprof.alloc_thread_stack_node.dup_task_struct
0.54 -0.0 0.51 perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
0.81 -0.0 0.78 perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
0.70 -0.0 0.67 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__put_user_4.schedule_tail
0.96 -0.0 0.94 perf-profile.calltrace.cycles-pp.alloc_pages_noprof.pte_alloc_one.__pte_alloc.copy_pte_range.copy_p4d_range
0.67 -0.0 0.66 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc.copy_p4d_range
1.07 +0.0 1.09 perf-profile.calltrace.cycles-pp.__pte_alloc.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
2.04 +0.0 2.07 perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.exit_mmap.__mmput.exit_mm
0.74 +0.0 0.77 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.dup_mmap.dup_mm.copy_process
0.63 +0.0 0.67 perf-profile.calltrace.cycles-pp.__pud_alloc.copy_p4d_range.copy_page_range.dup_mmap.dup_mm
0.76 +0.0 0.79 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.dup_mmap.dup_mm.copy_process.kernel_clone
0.73 +0.0 0.76 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.dup_mmap.dup_mm
1.10 +0.0 1.15 perf-profile.calltrace.cycles-pp.tear_down_vmas.exit_mmap.__mmput.exit_mm.do_exit
1.06 +0.1 1.12 perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
0.76 +0.1 0.84 perf-profile.calltrace.cycles-pp.kmem_cache_free.tear_down_vmas.exit_mmap.__mmput.exit_mm
0.63 +0.1 0.75 perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.exit_mmap.__mmput
0.72 +0.1 0.84 perf-profile.calltrace.cycles-pp.vm_area_dup.dup_mmap.dup_mm.copy_process.kernel_clone
0.64 +0.1 0.77 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_dup.dup_mmap.dup_mm.copy_process
4.77 ± 2% +0.4 5.18 ± 2% perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.wait_task_zombie.__do_wait.do_wait
4.43 ± 2% +0.4 4.84 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.wait_task_zombie.__do_wait
0.20 ±141% +0.4 0.62 ± 11% perf-profile.calltrace.cycles-pp.queue_event.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events
0.20 ±141% +0.4 0.63 ± 11% perf-profile.calltrace.cycles-pp.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events.record__finish_output
0.21 ±141% +0.4 0.64 ± 11% perf-profile.calltrace.cycles-pp.process_simple.reader__read_event.perf_session__process_events.record__finish_output.cmd_record
4.85 ± 2% +0.5 5.30 ± 2% perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
4.53 ± 2% +0.5 4.99 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone
5.04 ± 2% +0.5 5.53 ± 2% perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.do_group_exit.__x64_sys_exit_group
4.71 ± 2% +0.5 5.21 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.do_group_exit
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.tear_down_vmas.exit_mmap.__mmput
6.04 ± 2% +0.5 6.56 ± 2% perf-profile.calltrace.cycles-pp.wait_task_zombie.__do_wait.do_wait.kernel_wait4.do_syscall_64
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.00 +0.5 0.53 perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unlink_anon_vmas.free_pgtables.exit_mmap
5.87 ± 2% +0.5 6.40 ± 2% perf-profile.calltrace.cycles-pp.release_task.wait_task_zombie.__do_wait.do_wait.kernel_wait4
0.00 +0.5 0.54 perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
5.98 ± 2% +0.5 6.52 ± 2% perf-profile.calltrace.cycles-pp.exit_notify.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
8.83 ± 2% +0.8 9.60 ± 2% perf-profile.calltrace.cycles-pp.queued_read_lock_slowpath.__do_wait.do_wait.kernel_wait4.do_syscall_64
8.28 ± 2% +0.8 9.06 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_read_lock_slowpath.__do_wait.do_wait.kernel_wait4
17.04 ± 2% +1.2 18.22 perf-profile.calltrace.cycles-pp.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.99 ± 2% +1.2 18.17 perf-profile.calltrace.cycles-pp.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
15.10 ± 2% +1.3 16.39 ± 2% perf-profile.calltrace.cycles-pp.__do_wait.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
71.62 +1.3 72.94 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
71.65 +1.3 72.97 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
20.02 -1.1 18.89 perf-profile.children.cycles-pp.asm_exc_page_fault
18.66 -1.0 17.61 perf-profile.children.cycles-pp.exc_page_fault
18.57 -1.0 17.52 perf-profile.children.cycles-pp.do_user_addr_fault
16.63 ± 2% -1.0 15.67 ± 2% perf-profile.children.cycles-pp.handle_mm_fault
16.14 ± 2% -0.9 15.22 ± 2% perf-profile.children.cycles-pp.__handle_mm_fault
10.73 ± 3% -0.8 9.96 ± 3% perf-profile.children.cycles-pp.do_read_fault
10.75 ± 3% -0.8 9.98 ± 3% perf-profile.children.cycles-pp.do_fault
10.44 ± 3% -0.7 9.70 ± 3% perf-profile.children.cycles-pp.filemap_map_pages
3.64 -0.2 3.44 perf-profile.children.cycles-pp.__schedule
2.34 -0.2 2.14 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
3.01 -0.2 2.86 perf-profile.children.cycles-pp.do_idle
3.01 -0.1 2.86 perf-profile.children.cycles-pp.common_startup_64
3.01 -0.1 2.86 perf-profile.children.cycles-pp.cpu_startup_entry
2.96 -0.1 2.82 perf-profile.children.cycles-pp.start_secondary
2.83 -0.1 2.71 perf-profile.children.cycles-pp.mm_init
1.76 -0.1 1.64 ± 2% perf-profile.children.cycles-pp.set_pte_range
1.44 ± 2% -0.1 1.32 ± 2% perf-profile.children.cycles-pp.folio_add_file_rmap_ptes
1.96 -0.1 1.85 perf-profile.children.cycles-pp.schedule
2.22 -0.1 2.11 ± 2% perf-profile.children.cycles-pp.pcpu_alloc_noprof
2.17 -0.1 2.07 perf-profile.children.cycles-pp.cpuidle_idle_call
2.25 -0.1 2.15 perf-profile.children.cycles-pp.get_page_from_freelist
1.81 -0.1 1.71 perf-profile.children.cycles-pp.__mmdrop
2.20 -0.1 2.10 perf-profile.children.cycles-pp.finish_task_switch
1.98 -0.1 1.88 perf-profile.children.cycles-pp.cpuidle_enter_state
2.00 -0.1 1.90 perf-profile.children.cycles-pp.cpuidle_enter
1.91 -0.1 1.82 perf-profile.children.cycles-pp.acpi_idle_enter
1.65 -0.1 1.56 perf-profile.children.cycles-pp.select_task_rq_fair
1.87 -0.1 1.78 perf-profile.children.cycles-pp.__vma_start_write
1.89 -0.1 1.80 perf-profile.children.cycles-pp.pv_native_safe_halt
1.89 -0.1 1.81 perf-profile.children.cycles-pp.acpi_idle_do_entry
1.34 -0.1 1.25 perf-profile.children.cycles-pp.do_wp_page
1.22 -0.1 1.14 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
1.89 -0.1 1.81 perf-profile.children.cycles-pp.acpi_safe_halt
3.56 -0.1 3.48 perf-profile.children.cycles-pp.alloc_pages_mpol
1.19 -0.1 1.11 perf-profile.children.cycles-pp.folio_batch_move_lru
2.09 -0.1 2.01 perf-profile.children.cycles-pp.schedule_tail
1.08 ± 2% -0.1 1.01 perf-profile.children.cycles-pp.__percpu_counter_init_many
1.10 -0.1 1.03 perf-profile.children.cycles-pp.its_return_thunk
3.45 -0.1 3.38 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
0.64 -0.1 0.57 perf-profile.children.cycles-pp.__slab_free
0.88 -0.1 0.81 perf-profile.children.cycles-pp.__pcs_replace_empty_main
0.89 -0.1 0.82 perf-profile.children.cycles-pp.refill_objects
1.27 -0.1 1.20 perf-profile.children.cycles-pp.sched_balance_find_dst_group
0.89 -0.1 0.82 perf-profile.children.cycles-pp.__refill_objects_node
1.32 ± 2% -0.1 1.26 perf-profile.children.cycles-pp.__pick_next_task
1.25 ± 2% -0.1 1.18 perf-profile.children.cycles-pp.pick_next_task_fair
3.17 -0.1 3.11 perf-profile.children.cycles-pp.alloc_pages_noprof
1.19 -0.1 1.13 perf-profile.children.cycles-pp.update_sg_wakeup_stats
0.88 ± 2% -0.1 0.82 perf-profile.children.cycles-pp.lru_add_drain
1.03 -0.1 0.97 perf-profile.children.cycles-pp.memset_orig
0.87 -0.1 0.82 perf-profile.children.cycles-pp.lru_add_drain_cpu
1.00 ± 2% -0.1 0.94 ± 2% perf-profile.children.cycles-pp.__page_cache_release
0.86 ± 2% -0.1 0.80 perf-profile.children.cycles-pp.dequeue_task_fair
1.11 -0.1 1.06 perf-profile.children.cycles-pp.lock_vma_under_rcu
0.95 -0.1 0.89 ± 2% perf-profile.children.cycles-pp.__wp_page_copy_user
0.84 ± 2% -0.1 0.79 perf-profile.children.cycles-pp.dequeue_entities
0.68 -0.1 0.63 perf-profile.children.cycles-pp.free_frozen_page_commit
0.81 -0.1 0.76 perf-profile.children.cycles-pp.rmqueue
0.86 -0.0 0.81 perf-profile.children.cycles-pp.__percpu_counter_sum
0.67 ± 2% -0.0 0.62 perf-profile.children.cycles-pp.dequeue_entity
0.73 -0.0 0.68 perf-profile.children.cycles-pp.schedule_idle
0.92 -0.0 0.87 ± 2% perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
1.26 -0.0 1.21 perf-profile.children.cycles-pp.kernel_init_pages
0.89 -0.0 0.85 perf-profile.children.cycles-pp.native_irq_return_iret
0.40 -0.0 0.36 perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
0.99 -0.0 0.95 perf-profile.children.cycles-pp.__vma_start_exclude_readers
1.14 -0.0 1.10 perf-profile.children.cycles-pp.alloc_thread_stack_node
0.28 ± 7% -0.0 0.24 ± 7% perf-profile.children.cycles-pp.callchain_cursor_reset
2.11 -0.0 2.06 perf-profile.children.cycles-pp.wake_up_new_task
0.97 -0.0 0.93 perf-profile.children.cycles-pp.__put_user_4
0.48 -0.0 0.44 perf-profile.children.cycles-pp.try_charge_memcg
0.82 -0.0 0.78 perf-profile.children.cycles-pp.__vmalloc_node_noprof
2.06 -0.0 2.02 perf-profile.children.cycles-pp.copy_pte_range
0.13 -0.0 0.09 perf-profile.children.cycles-pp.trylock_stock
0.55 -0.0 0.51 perf-profile.children.cycles-pp.free_pcppages_bulk
0.55 -0.0 0.52 ± 2% perf-profile.children.cycles-pp.free_percpu
0.81 ± 2% -0.0 0.78 perf-profile.children.cycles-pp.sync_regs
0.82 -0.0 0.78 perf-profile.children.cycles-pp.__vmalloc_node_range_noprof
0.65 -0.0 0.61 perf-profile.children.cycles-pp.sched_move_task
0.61 -0.0 0.57 perf-profile.children.cycles-pp.sysvec_call_function_single
0.51 -0.0 0.48 perf-profile.children.cycles-pp.__sysvec_call_function_single
0.68 -0.0 0.64 perf-profile.children.cycles-pp.copy_present_ptes
0.51 -0.0 0.48 perf-profile.children.cycles-pp.delayed_vfree_work
0.73 -0.0 0.70 perf-profile.children.cycles-pp.enqueue_task
0.56 -0.0 0.52 perf-profile.children.cycles-pp.process_one_work
0.49 -0.0 0.46 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.57 -0.0 0.54 perf-profile.children.cycles-pp.__wake_up_common
0.70 -0.0 0.66 perf-profile.children.cycles-pp.enqueue_task_fair
3.37 -0.0 3.33 perf-profile.children.cycles-pp.ret_from_fork_asm
0.67 -0.0 0.63 perf-profile.children.cycles-pp.do_notify_parent
0.17 ± 2% -0.0 0.14 perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.49 -0.0 0.46 perf-profile.children.cycles-pp.vfree
0.57 -0.0 0.54 perf-profile.children.cycles-pp.__wake_up_sync_key
0.57 -0.0 0.54 perf-profile.children.cycles-pp.try_to_wake_up
0.70 -0.0 0.66 perf-profile.children.cycles-pp.pte_offset_map_lock
0.61 -0.0 0.58 perf-profile.children.cycles-pp.worker_thread
0.54 -0.0 0.51 perf-profile.children.cycles-pp.__rmqueue_pcplist
0.16 ± 2% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.___free_pages
0.44 -0.0 0.41 perf-profile.children.cycles-pp.mas_find
0.20 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.mutex_spin_on_owner
0.48 ± 2% -0.0 0.45 perf-profile.children.cycles-pp.try_to_block_task
0.45 -0.0 0.42 perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.38 -0.0 0.35 ± 2% perf-profile.children.cycles-pp.percpu_counter_destroy_many
0.23 ± 4% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.20 ± 4% -0.0 0.17 ± 4% perf-profile.children.cycles-pp.rwsem_spin_on_owner
0.47 -0.0 0.44 perf-profile.children.cycles-pp.__mod_memcg_state
0.57 -0.0 0.54 perf-profile.children.cycles-pp.ptep_clear_flush
0.37 -0.0 0.35 perf-profile.children.cycles-pp.__folio_batch_add_and_move
0.26 -0.0 0.24 perf-profile.children.cycles-pp.rseq_set_ids_get_csaddr
0.49 -0.0 0.47 perf-profile.children.cycles-pp.update_load_avg
0.53 -0.0 0.50 perf-profile.children.cycles-pp.enqueue_entity
0.28 ± 2% -0.0 0.26 perf-profile.children.cycles-pp.lru_gen_del_folio
0.55 -0.0 0.52 perf-profile.children.cycles-pp.__vmalloc_area_node
0.55 -0.0 0.52 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.46 -0.0 0.43 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
1.67 -0.0 1.65 perf-profile.children.cycles-pp.dup_task_struct
0.35 -0.0 0.32 perf-profile.children.cycles-pp.arch_dup_task_struct
0.24 ± 3% -0.0 0.22 ± 2% perf-profile.children.cycles-pp.vma_interval_tree_remove
0.43 -0.0 0.41 perf-profile.children.cycles-pp.__perf_sw_event
0.33 -0.0 0.31 perf-profile.children.cycles-pp.mas_next_slot
0.36 -0.0 0.34 perf-profile.children.cycles-pp.ttwu_do_activate
0.36 -0.0 0.34 ± 3% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.38 ± 2% -0.0 0.36 perf-profile.children.cycles-pp.sched_change_begin
0.36 -0.0 0.34 perf-profile.children.cycles-pp.sched_ttwu_pending
0.42 -0.0 0.40 perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.26 -0.0 0.24 ± 2% perf-profile.children.cycles-pp.__rseq_handle_slowpath
0.34 -0.0 0.32 perf-profile.children.cycles-pp.mas_walk
0.27 -0.0 0.25 ± 3% perf-profile.children.cycles-pp.__get_vm_area_node
0.26 -0.0 0.24 perf-profile.children.cycles-pp.remove_vm_area
0.44 -0.0 0.42 perf-profile.children.cycles-pp.vm_area_alloc_pages
0.24 -0.0 0.22 ± 3% perf-profile.children.cycles-pp._raw_spin_trylock
0.27 -0.0 0.26 perf-profile.children.cycles-pp.perf_event_task
0.40 -0.0 0.39 perf-profile.children.cycles-pp.flush_tlb_func
0.29 -0.0 0.28 perf-profile.children.cycles-pp.handle_pte_fault
0.29 -0.0 0.28 perf-profile.children.cycles-pp.mas_store
0.31 -0.0 0.30 perf-profile.children.cycles-pp.lru_add
0.23 ± 2% -0.0 0.21 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.28 -0.0 0.26 perf-profile.children.cycles-pp.__pi_memcpy
0.21 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.free_unref_folios
0.07 ± 7% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.map_symbol__copy
0.26 -0.0 0.24 perf-profile.children.cycles-pp.update_cfs_rq_load_avg
0.20 -0.0 0.19 ± 2% perf-profile.children.cycles-pp.lock_mm_and_find_vma
0.16 -0.0 0.15 ± 2% perf-profile.children.cycles-pp._raw_write_lock_irq
0.07 -0.0 0.06 ± 7% perf-profile.children.cycles-pp.put_cred_rcu
0.23 -0.0 0.22 perf-profile.children.cycles-pp.select_task_rq
0.19 -0.0 0.18 perf-profile.children.cycles-pp._find_next_or_bit
0.20 -0.0 0.19 perf-profile.children.cycles-pp.add_device_randomness
0.10 -0.0 0.09 perf-profile.children.cycles-pp.filp_close
0.10 -0.0 0.09 perf-profile.children.cycles-pp.free_swap_cache
0.13 -0.0 0.12 perf-profile.children.cycles-pp.___task_rq_lock
0.13 -0.0 0.12 perf-profile.children.cycles-pp.__mt_destroy
0.13 -0.0 0.12 perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
0.14 -0.0 0.13 perf-profile.children.cycles-pp._find_next_and_bit
0.09 -0.0 0.08 perf-profile.children.cycles-pp._raw_spin_unlock
0.07 -0.0 0.06 perf-profile.children.cycles-pp.cgroup_task_dead
0.07 -0.0 0.06 perf-profile.children.cycles-pp.exit_fs
0.13 -0.0 0.12 perf-profile.children.cycles-pp.select_idle_sibling
0.12 -0.0 0.11 perf-profile.children.cycles-pp.get_free_pages_noprof
0.12 -0.0 0.11 perf-profile.children.cycles-pp.mod_node_page_state
0.06 -0.0 0.05 perf-profile.children.cycles-pp.set_task_cpu
0.21 -0.0 0.20 perf-profile.children.cycles-pp.memcpy_and_pad
0.07 +0.0 0.08 perf-profile.children.cycles-pp.memcg_charge_kernel_stack
0.11 ± 4% +0.0 0.13 ± 6% perf-profile.children.cycles-pp.pgd_free
0.17 ± 2% +0.0 0.18 ± 2% perf-profile.children.cycles-pp.exit_task_stack_account
0.12 +0.0 0.14 ± 3% perf-profile.children.cycles-pp.kmem_cache_alloc_node_noprof
1.07 +0.0 1.09 perf-profile.children.cycles-pp.__pte_alloc
1.30 +0.0 1.32 perf-profile.children.cycles-pp.flush_tlb_mm_range
1.03 +0.0 1.05 perf-profile.children.cycles-pp.pte_alloc_one
0.49 +0.0 0.50 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
2.05 +0.0 2.08 perf-profile.children.cycles-pp.unlink_anon_vmas
0.64 +0.0 0.67 perf-profile.children.cycles-pp.__pud_alloc
0.75 +0.0 0.78 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.74 +0.0 0.78 perf-profile.children.cycles-pp.smp_call_function_many_cond
1.12 +0.1 1.17 perf-profile.children.cycles-pp.__memcg_kmem_charge_page
1.11 +0.1 1.16 perf-profile.children.cycles-pp.tear_down_vmas
0.26 +0.1 0.31 ± 2% perf-profile.children.cycles-pp.__exit_signal
1.92 +0.1 1.97 perf-profile.children.cycles-pp._raw_spin_lock
1.06 +0.1 1.12 perf-profile.children.cycles-pp.kthread
0.27 +0.1 0.34 ± 2% perf-profile.children.cycles-pp.__put_task_struct
0.48 +0.1 0.54 perf-profile.children.cycles-pp.__account_obj_stock
0.10 +0.1 0.18 ± 2% perf-profile.children.cycles-pp.lru_gen_del_mm
0.10 ± 4% +0.1 0.20 ± 2% perf-profile.children.cycles-pp.lru_gen_add_mm
0.44 +0.1 0.54 perf-profile.children.cycles-pp.smpboot_thread_fn
0.43 +0.1 0.52 perf-profile.children.cycles-pp.run_ksoftirqd
0.07 ± 7% +0.1 0.18 ± 2% perf-profile.children.cycles-pp.put_pid
0.73 +0.1 0.86 perf-profile.children.cycles-pp.vm_area_dup
0.30 ± 2% +0.1 0.43 perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.84 +0.1 0.98 perf-profile.children.cycles-pp.__free_frozen_pages
0.23 ± 3% +0.1 0.37 ± 2% perf-profile.children.cycles-pp.__folio_mod_stat
0.46 ± 24% +0.2 0.64 ± 11% perf-profile.children.cycles-pp.process_simple
0.45 ± 24% +0.2 0.63 ± 11% perf-profile.children.cycles-pp.ordered_events__queue
0.45 ± 26% +0.2 0.62 ± 11% perf-profile.children.cycles-pp.queue_event
2.28 +0.2 2.46 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
0.15 ± 3% +0.2 0.35 ± 3% perf-profile.children.cycles-pp.__memcg_kmem_uncharge_page
0.00 +0.2 0.23 ± 6% perf-profile.children.cycles-pp.drain_obj_stock
0.59 ± 2% +0.3 0.87 ± 2% perf-profile.children.cycles-pp._raw_write_unlock_irq
1.13 +0.3 1.41 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
0.13 +0.4 0.53 ± 3% perf-profile.children.cycles-pp.__refill_obj_stock
1.09 +0.4 1.53 perf-profile.children.cycles-pp.tlb_remove_table_rcu
6.04 ± 2% +0.5 6.56 ± 2% perf-profile.children.cycles-pp.wait_task_zombie
5.87 ± 2% +0.5 6.40 ± 2% perf-profile.children.cycles-pp.release_task
5.98 ± 2% +0.5 6.52 ± 2% perf-profile.children.cycles-pp.exit_notify
2.17 +0.6 2.77 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
2.11 +0.6 2.71 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
1.77 +0.6 2.37 perf-profile.children.cycles-pp.__irq_exit_rcu
2.18 +0.6 2.82 perf-profile.children.cycles-pp.kmem_cache_free
2.18 +0.7 2.88 perf-profile.children.cycles-pp.handle_softirqs
2.06 +0.7 2.76 perf-profile.children.cycles-pp.rcu_core
2.03 +0.7 2.74 perf-profile.children.cycles-pp.rcu_do_batch
1.21 +0.7 1.95 perf-profile.children.cycles-pp.__memcg_slab_free_hook
0.64 ± 2% +0.8 1.39 perf-profile.children.cycles-pp.lruvec_stat_mod_folio
8.83 ± 2% +0.8 9.60 ± 2% perf-profile.children.cycles-pp.queued_read_lock_slowpath
17.04 ± 2% +1.2 18.22 perf-profile.children.cycles-pp.kernel_wait4
16.99 ± 2% +1.2 18.17 perf-profile.children.cycles-pp.do_wait
15.10 ± 2% +1.3 16.39 ± 2% perf-profile.children.cycles-pp.__do_wait
71.84 +1.3 73.15 perf-profile.children.cycles-pp.do_syscall_64
71.88 +1.3 73.18 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
14.66 ± 2% +1.4 16.02 ± 2% perf-profile.children.cycles-pp.queued_write_lock_slowpath
23.80 ± 2% +2.1 25.86 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
6.60 ± 3% -0.5 6.10 ± 3% perf-profile.self.cycles-pp.next_uptodate_folio
1.53 ± 2% -0.1 1.38 ± 2% perf-profile.self.cycles-pp.filemap_map_pages
1.40 ± 2% -0.1 1.28 ± 2% perf-profile.self.cycles-pp.folio_add_file_rmap_ptes
1.19 -0.1 1.12 perf-profile.self.cycles-pp.pv_native_safe_halt
1.00 -0.1 0.93 perf-profile.self.cycles-pp.memset_orig
0.83 -0.1 0.77 ± 2% perf-profile.self.cycles-pp.__refill_objects_node
1.00 -0.1 0.95 perf-profile.self.cycles-pp.update_sg_wakeup_stats
0.42 -0.1 0.36 perf-profile.self.cycles-pp.try_charge_memcg
0.54 -0.1 0.49 perf-profile.self.cycles-pp.__slab_free
0.95 -0.1 0.90 perf-profile.self.cycles-pp.down_write
0.39 -0.1 0.34 perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
0.90 -0.1 0.85 perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
0.60 -0.0 0.55 perf-profile.self.cycles-pp.its_return_thunk
0.93 -0.0 0.88 perf-profile.self.cycles-pp.__vma_start_exclude_readers
1.23 -0.0 1.18 perf-profile.self.cycles-pp.kernel_init_pages
0.89 -0.0 0.85 perf-profile.self.cycles-pp.native_irq_return_iret
0.78 -0.0 0.74 perf-profile.self.cycles-pp.lock_vma_under_rcu
0.86 -0.0 0.81 ± 2% perf-profile.self.cycles-pp.__vma_start_write
0.81 -0.0 0.77 perf-profile.self.cycles-pp.sync_regs
0.65 -0.0 0.61 perf-profile.self.cycles-pp.copy_present_ptes
0.61 -0.0 0.57 perf-profile.self.cycles-pp.__percpu_counter_sum
0.17 ± 2% -0.0 0.14 perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.91 -0.0 0.88 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
1.48 -0.0 1.45 perf-profile.self.cycles-pp._raw_spin_lock
0.41 -0.0 0.38 perf-profile.self.cycles-pp.do_wp_page
0.16 -0.0 0.13 ± 3% perf-profile.self.cycles-pp.___free_pages
0.43 -0.0 0.40 perf-profile.self.cycles-pp.__mod_memcg_state
0.33 -0.0 0.30 ± 2% perf-profile.self.cycles-pp.tlb_remove_table_rcu
0.19 ± 4% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.mutex_spin_on_owner
0.52 -0.0 0.50 perf-profile.self.cycles-pp.anon_vma_fork
0.33 -0.0 0.30 perf-profile.self.cycles-pp.unlink_anon_vmas
0.24 -0.0 0.22 ± 5% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.21 ± 2% -0.0 0.18 ± 8% perf-profile.self.cycles-pp.vma_interval_tree_insert_after
0.19 ± 4% -0.0 0.17 ± 6% perf-profile.self.cycles-pp.rwsem_spin_on_owner
0.53 -0.0 0.51 perf-profile.self.cycles-pp.zap_pte_range
0.27 -0.0 0.25 perf-profile.self.cycles-pp.mas_next_slot
0.22 ± 3% -0.0 0.20 perf-profile.self.cycles-pp.lru_gen_del_folio
0.26 -0.0 0.24 ± 2% perf-profile.self.cycles-pp.__schedule
0.33 -0.0 0.32 perf-profile.self.cycles-pp.mas_walk
0.25 -0.0 0.23 perf-profile.self.cycles-pp.anon_vma_clone
0.22 ± 2% -0.0 0.21 ± 2% perf-profile.self.cycles-pp._raw_spin_trylock
0.27 -0.0 0.26 perf-profile.self.cycles-pp.__rmqueue_pcplist
0.27 -0.0 0.25 perf-profile.self.cycles-pp.__pi_memcpy
0.22 ± 2% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__put_user_4
0.18 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.page_counter_uncharge
0.26 -0.0 0.25 perf-profile.self.cycles-pp.do_user_addr_fault
0.14 -0.0 0.13 ± 3% perf-profile.self.cycles-pp.__task_pid_nr_ns
0.16 -0.0 0.15 ± 2% perf-profile.self.cycles-pp._raw_write_lock_irq
0.16 -0.0 0.15 ± 2% perf-profile.self.cycles-pp.mas_store
0.20 -0.0 0.19 perf-profile.self.cycles-pp._raw_spin_lock_irq
0.19 -0.0 0.18 perf-profile.self.cycles-pp.handle_mm_fault
0.10 -0.0 0.09 perf-profile.self.cycles-pp.error_entry
0.10 -0.0 0.09 perf-profile.self.cycles-pp.exit_to_user_mode_loop
0.10 -0.0 0.09 perf-profile.self.cycles-pp.prepare_creds
0.16 -0.0 0.15 perf-profile.self.cycles-pp.__perf_sw_event
0.07 -0.0 0.06 perf-profile.self.cycles-pp._raw_spin_unlock
0.13 -0.0 0.12 perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
0.16 -0.0 0.15 perf-profile.self.cycles-pp.dup_fd
0.09 -0.0 0.08 perf-profile.self.cycles-pp.free_unref_folios
0.17 -0.0 0.16 perf-profile.self.cycles-pp.get_page_from_freelist
0.09 -0.0 0.08 perf-profile.self.cycles-pp.lru_add
0.16 -0.0 0.15 perf-profile.self.cycles-pp.lru_gen_add_folio
0.09 -0.0 0.08 perf-profile.self.cycles-pp.ptep_clear_flush
0.13 -0.0 0.12 perf-profile.self.cycles-pp.tear_down_vmas
0.13 -0.0 0.12 perf-profile.self.cycles-pp.update_curr
0.06 -0.0 0.05 perf-profile.self.cycles-pp.__percpu_counter_init_many
0.11 -0.0 0.10 perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.11 -0.0 0.10 perf-profile.self.cycles-pp.enqueue_task_fair
0.08 -0.0 0.07 perf-profile.self.cycles-pp.exit_task_stack_account
0.08 -0.0 0.07 perf-profile.self.cycles-pp.flush_tlb_func
0.08 -0.0 0.07 perf-profile.self.cycles-pp.pidfs_add_pid
0.06 -0.0 0.05 perf-profile.self.cycles-pp.rseq_set_ids_get_csaddr
0.11 -0.0 0.10 perf-profile.self.cycles-pp.update_cfs_rq_load_avg
0.06 -0.0 0.05 perf-profile.self.cycles-pp.vm_area_init_from
0.15 -0.0 0.14 perf-profile.self.cycles-pp.acct_collect
0.23 ± 2% +0.0 0.26 perf-profile.self.cycles-pp.pcpu_alloc_noprof
0.68 +0.0 0.72 perf-profile.self.cycles-pp.smp_call_function_many_cond
0.07 +0.0 0.11 perf-profile.self.cycles-pp.__mmput
0.40 +0.1 0.46 perf-profile.self.cycles-pp.__account_obj_stock
0.00 +0.1 0.06 perf-profile.self.cycles-pp.lru_gen_add_mm
0.02 ±141% +0.1 0.09 perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.32 +0.1 0.44 perf-profile.self.cycles-pp.__memcg_kmem_charge_page
0.09 +0.2 0.26 ± 4% perf-profile.self.cycles-pp.__refill_obj_stock
0.44 ± 25% +0.2 0.62 ± 11% perf-profile.self.cycles-pp.queue_event
0.06 +0.2 0.27 ± 2% perf-profile.self.cycles-pp.__memcg_kmem_uncharge_page
0.00 +0.2 0.22 ± 4% perf-profile.self.cycles-pp.drain_obj_stock
0.75 +0.3 1.01 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
0.87 +0.3 1.17 perf-profile.self.cycles-pp.__memcg_slab_free_hook
0.29 ± 5% +0.8 1.06 perf-profile.self.cycles-pp.lruvec_stat_mod_folio
23.65 ± 2% +2.0 25.66 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
From: Shakeel Butt @ 2026-05-12 16:03 UTC
To: kernel test robot
Cc: Qi Zheng, oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou, Chen Ridong,
David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins, Imran Khan,
Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin,
Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed,
Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm

On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
>
>
> commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

This is most probably due to shuffling of struct mem_cgroup and struct
mem_cgroup_per_node members.

I will try to reproduce and will followup on this.
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
From: Qi Zheng @ 2026-05-13 2:10 UTC
To: Shakeel Butt, kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier, Allen Pais,
Axel Rasmussen, Baoquan He, Chengming Zhou, Chen Ridong,
David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins, Imran Khan,
Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin,
Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed,
Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm

On 5/13/26 12:03 AM, Shakeel Butt wrote:
> On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
>>
>>
>> commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> This is most probably due to shuffling of struct mem_cgroup and struct
> mem_cgroup_per_node members.

Another possibility is that after objcg was split into per-node, the
slab accounting fast path is still designed assuming only one current
objcg per CPU:

struct obj_stock_pcp {
	struct obj_cgroup *cached_objcg;
};

So it may cause the following thrashing:

CPU stock cached = memcg/node0 objcg
free object tagged = memcg/node1 objcg
  => __refill_obj_stock --> objcg mismatch
  => drain_obj_stock()
  => cache switches to node1 objcg

next local allocation tagged = node0 objcg
  => mismatch again
  => drain_obj_stock()

>
> I will try to reproduce and will followup on this.

Thanks! I'll also try to reproduce it locally and work on a fix.
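The thrashing pattern Qi describes can be reproduced in a small user-space model. This is a sketch under stated assumptions, not kernel code: the types `toy_memcg`, `toy_objcg` and `toy_stock` and the functions `toy_stock_touch()` and `count_drains()` are hypothetical. Allocation and free are both reduced to a single "touch the per-CPU stock" step, and a drain is counted whenever the one cached objcg pointer differs from the incoming one (modelled on the pointer comparison in `__refill_obj_stock()`).

```c
#include <assert.h>

/* Hypothetical stand-ins; the real kernel structures carry much more state. */
struct toy_memcg {
	int id;
};

struct toy_objcg {
	struct toy_memcg *memcg;	/* owning cgroup */
	int node;			/* NUMA node this objcg belongs to */
};

/* Single-slot per-CPU stock, like the one cached_objcg pointer. */
struct toy_stock {
	struct toy_objcg *cached;
	unsigned int nr_bytes;
	unsigned int drains;		/* flushes of a non-empty slot */
};

/*
 * Pointer-comparison rule: any objcg other than the cached one resets the
 * stock. Resetting an empty slot (cached == NULL) is not counted as a drain.
 */
static void toy_stock_touch(struct toy_stock *stock, struct toy_objcg *objcg,
			    unsigned int nr_bytes)
{
	if (stock->cached != objcg) {
		if (stock->cached)
			stock->drains++;	/* stands in for drain_obj_stock() */
		stock->cached = objcg;
		stock->nr_bytes = 0;
	}
	stock->nr_bytes += nr_bytes;
}

/*
 * One CPU alternates between a local allocation charged to the node0 objcg
 * and a free of an object tagged with the node1 objcg of the same memcg.
 */
unsigned int count_drains(int rounds)
{
	struct toy_memcg memcg = { .id = 1 };
	struct toy_objcg node0 = { .memcg = &memcg, .node = 0 };
	struct toy_objcg node1 = { .memcg = &memcg, .node = 1 };
	struct toy_stock stock = { 0 };

	assert(rounds > 0);
	for (int i = 0; i < rounds; i++) {
		toy_stock_touch(&stock, &node0, 64);	/* local allocation */
		toy_stock_touch(&stock, &node1, 64);	/* remote free */
	}
	return stock.drains;
}
```

In this model `count_drains(1000)` returns 1999: after the very first access, every allocation and every free flushes the stock, even though both objcgs belong to the same memcg.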
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
From: Shakeel Butt @ 2026-05-13 13:49 UTC
To: Qi Zheng
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Andrew Morton,
David Carlier, Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou,
Chen Ridong, David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins,
Imran Khan, Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin,
Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed,
Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm

On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
>
>
> On 5/13/26 12:03 AM, Shakeel Butt wrote:
> > On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
> > >
> > >
> > > commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > This is most probably due to shuffling of struct mem_cgroup and struct
> > mem_cgroup_per_node members.
>
> Another possibility is that after objcg was split into per-node, the
> slab accounting fast path is still designed assuming only one current
> objcg per CPU:
>
> struct obj_stock_pcp {
> 	struct obj_cgroup *cached_objcg;
> };
>
> So it may cause the following thrashing:
>
> CPU stock cached = memcg/node0 objcg
> free object tagged = memcg/node1 objcg
>   => __refill_obj_stock --> objcg mismatch
>   => drain_obj_stock()
>   => cache switches to node1 objcg
>
> next local allocation tagged = node0 objcg
>   => mismatch again
>   => drain_obj_stock()

Actually I think this is the issue: we have ping-pong threads running on
different nodes; though they are in the same cgroup, their current->objcg is
for their local node, and thus this ping-pong is thrashing the per-cpu objcg
stock.

The easier fix would be to compare objcg->memcg instead of just objcg during
draining and caching. In addition, we can add support for caching multiple
objcgs in the per-cpu stock.
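Shakeel's second suggestion, caching more than one objcg per CPU, can be sketched with the same kind of toy model. Everything here is hypothetical: `TOY_NR_SLOTS`, the round-robin eviction policy, and the names `toy_multi_touch()` and `count_multi_drains()` are illustrative choices, not an existing kernel interface. A real implementation would also have to handle locking, flushing of cached bytes, and whatever other state the per-CPU stock carries.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_SLOTS	2	/* hypothetical number of cached objcgs per CPU */
#define TOY_MAX_NODES	8	/* upper bound for this toy only */

struct toy_objcg {
	int memcg_id;
	int node;
};

struct toy_slot {
	struct toy_objcg *objcg;
	unsigned int nr_bytes;
};

struct toy_multi_stock {
	struct toy_slot slot[TOY_NR_SLOTS];
	unsigned int next_victim;	/* round-robin eviction cursor */
	unsigned int drains;		/* evictions of an occupied slot */
};

static void toy_multi_touch(struct toy_multi_stock *stock,
			    struct toy_objcg *objcg, unsigned int nr_bytes)
{
	struct toy_slot *victim = NULL;

	/* Hit: a slot already caches this objcg, so no drain is needed. */
	for (int i = 0; i < TOY_NR_SLOTS; i++) {
		if (stock->slot[i].objcg == objcg) {
			stock->slot[i].nr_bytes += nr_bytes;
			return;
		}
	}

	/* Miss: use an empty slot if there is one... */
	for (int i = 0; i < TOY_NR_SLOTS && !victim; i++) {
		if (!stock->slot[i].objcg)
			victim = &stock->slot[i];
	}

	/* ...otherwise evict (drain) occupied slots in round-robin order. */
	if (!victim) {
		victim = &stock->slot[stock->next_victim];
		stock->next_victim = (stock->next_victim + 1) % TOY_NR_SLOTS;
		stock->drains++;
	}
	victim->objcg = objcg;
	victim->nr_bytes = nr_bytes;
}

/* One CPU cycles through nr_nodes objcgs of a single memcg, rounds times. */
unsigned int count_multi_drains(int rounds, int nr_nodes)
{
	struct toy_objcg objcg[TOY_MAX_NODES];
	struct toy_multi_stock stock = { 0 };

	assert(rounds > 0 && nr_nodes > 0 && nr_nodes <= TOY_MAX_NODES);
	for (int n = 0; n < nr_nodes; n++)
		objcg[n] = (struct toy_objcg){ .memcg_id = 1, .node = n };

	for (int i = 0; i < rounds; i++)
		for (int n = 0; n < nr_nodes; n++)
			toy_multi_touch(&stock, &objcg[n], 64);
	return stock.drains;
}
```

With two slots and two nodes, `count_multi_drains(1000, 2)` returns 0. With three objcgs cycling through two round-robin slots, every access after the first two misses, so `count_multi_drains(10, 3)` returns 28. In this model, the slot count and eviction policy would therefore matter on machines with more nodes than slots.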
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
From: Shakeel Butt @ 2026-05-13 14:27 UTC
To: Qi Zheng
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Andrew Morton,
David Carlier, Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou,
Chen Ridong, David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins,
Imran Khan, Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin,
Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed,
Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm

On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
> On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
> >
> >
> > On 5/13/26 12:03 AM, Shakeel Butt wrote:
> > > On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
> > > >
> > > >
> > > > commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > This is most probably due to shuffling of struct mem_cgroup and struct
> > > mem_cgroup_per_node members.
> >
> > Another possibility is that after objcg was split into per-node, the
> > slab accounting fast path is still designed assuming only one current
> > objcg per CPU:
> >
> > struct obj_stock_pcp {
> > 	struct obj_cgroup *cached_objcg;
> > };
> >
> > So it may cause the following thrashing:
> >
> > CPU stock cached = memcg/node0 objcg
> > free object tagged = memcg/node1 objcg
> >   => __refill_obj_stock --> objcg mismatch
> >   => drain_obj_stock()
> >   => cache switches to node1 objcg
> >
> > next local allocation tagged = node0 objcg
> >   => mismatch again
> >   => drain_obj_stock()
>
> Actually I think this is the issue: we have ping-pong threads running on
> different nodes; though they are in the same cgroup, their current->objcg is
> for their local node, and thus this ping-pong is thrashing the per-cpu objcg
> stock.
>
> The easier fix would be to compare objcg->memcg instead of just objcg during
> draining and caching. In addition, we can add support for caching multiple
> objcgs in the per-cpu stock.

Something like the following:

From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt@linux.dev>
Date: Wed, 13 May 2026 07:24:55 -0700
Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
 shares memcg

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d978e18b9b2d..01ed7a8e18ac 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
 			       unsigned int nr_bytes,
 			       bool allow_uncharge)
 {
+	struct obj_cgroup *cached;
 	unsigned int nr_pages = 0;
 
 	if (!stock) {
@@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
 		goto out;
 	}
 
-	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
+	cached = READ_ONCE(stock->cached_objcg);
+	if (cached != objcg &&
+	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
 		drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
 		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
-- 
2.53.0-Meta
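The condition added by the patch can be exercised in isolation with another toy model. The types and helpers below are hypothetical: `toy_objcg_memcg()` stands in for `obj_cgroup_memcg()`, RCU and the uncharge path are ignored, and on a same-memcg hit the toy simply credits the bytes to the cached entry (how the full patch attributes them is not visible in the quoted hunk). The predicate keeps the cached objcg when the incoming objcg shares its memcg, and still drains when the memcgs differ.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_memcg {
	int id;
};

struct toy_objcg {
	struct toy_memcg *memcg;
};

/* Hypothetical stand-in for obj_cgroup_memcg(). */
static struct toy_memcg *toy_objcg_memcg(const struct toy_objcg *objcg)
{
	return objcg->memcg;
}

/* Same shape as the patched check: drain only when the memcg differs. */
static bool toy_needs_drain(const struct toy_objcg *cached,
			    const struct toy_objcg *objcg)
{
	return cached != objcg &&
	       (!cached || toy_objcg_memcg(cached) != toy_objcg_memcg(objcg));
}

struct toy_stock {
	struct toy_objcg *cached;
	unsigned int nr_bytes;
	unsigned int drains;	/* flushes of a non-empty stock */
};

static void toy_refill(struct toy_stock *stock, struct toy_objcg *objcg,
		       unsigned int nr_bytes)
{
	if (toy_needs_drain(stock->cached, objcg)) {
		if (stock->cached)
			stock->drains++;	/* stands in for drain_obj_stock() */
		stock->cached = objcg;
		stock->nr_bytes = 0;
	}
	/* Toy simplification: on a same-memcg hit the cached entry keeps the bytes. */
	stock->nr_bytes += nr_bytes;
}

/* Alternate two objcgs; same_memcg selects whether they share a memcg. */
unsigned int count_patched_drains(int rounds, bool same_memcg)
{
	struct toy_memcg a = { .id = 1 }, b = { .id = 2 };
	struct toy_objcg node0 = { .memcg = &a };
	struct toy_objcg node1 = { .memcg = same_memcg ? &a : &b };
	struct toy_stock stock = { 0 };

	assert(rounds > 0);
	for (int i = 0; i < rounds; i++) {
		toy_refill(&stock, &node0, 64);
		toy_refill(&stock, &node1, 64);
	}
	return stock.drains;
}
```

Under these assumptions, `count_patched_drains(1000, true)` returns 0, while `count_patched_drains(1000, false)` still returns 1999. Genuinely different cgroups therefore keep the old drain-on-switch behaviour.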
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
From: Qi Zheng @ 2026-05-14 7:46 UTC
To: Shakeel Butt, kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier, Allen Pais,
Axel Rasmussen, Baoquan He, Chengming Zhou, Chen Ridong,
David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins, Imran Khan,
Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin,
Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed,
Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm

On 5/13/26 10:27 PM, Shakeel Butt wrote:
> On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
>> On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
>>>
>>>
>>> On 5/13/26 12:03 AM, Shakeel Butt wrote:
>>>> On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
>>>>>
>>>>>
>>>>> commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> This is most probably due to shuffling of struct mem_cgroup and struct
>>>> mem_cgroup_per_node members.
>>>
>>> Another possibility is that after objcg was split into per-node, the
>>> slab accounting fast path is still designed assuming only one current
>>> objcg per CPU:
>>>
>>> struct obj_stock_pcp {
>>> 	struct obj_cgroup *cached_objcg;
>>> };
>>>
>>> So it may cause the following thrashing:
>>>
>>> CPU stock cached = memcg/node0 objcg
>>> free object tagged = memcg/node1 objcg
>>>   => __refill_obj_stock --> objcg mismatch
>>>   => drain_obj_stock()
>>>   => cache switches to node1 objcg
>>>
>>> next local allocation tagged = node0 objcg
>>>   => mismatch again
>>>   => drain_obj_stock()
>>
>> Actually I think this is the issue: we have ping-pong threads running on
>> different nodes; though they are in the same cgroup, their current->objcg is
>> for their local node, and thus this ping-pong is thrashing the per-cpu objcg
>> stock.
>>
>> The easier fix would be to compare objcg->memcg instead of just objcg during
>> draining and caching. In addition, we can add support for caching multiple
>> objcgs in the per-cpu stock.
>
> Something like the following:
>
> From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
> From: Shakeel Butt <shakeel.butt@linux.dev>
> Date: Wed, 13 May 2026 07:24:55 -0700
> Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
>  shares memcg
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  mm/memcontrol.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d978e18b9b2d..01ed7a8e18ac 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
>  			       unsigned int nr_bytes,
>  			       bool allow_uncharge)
>  {
> +	struct obj_cgroup *cached;
>  	unsigned int nr_pages = 0;
>  
>  	if (!stock) {
> @@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
>  		goto out;
>  	}
>  
> -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> +	cached = READ_ONCE(stock->cached_objcg);
> +	if (cached != objcg &&
> +	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
>  		drain_obj_stock(stock);
>  		obj_cgroup_get(objcg);
>  		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)

This change looks like it should be able to fix the ping-pong issue, but I
still haven't reproduced the performance regression locally. I'll continue
testing it.

Hi kernel-test-robot, could you help check if the patch above fixes the
issue on your end?

Thanks,
Qi
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
2026-05-14 7:46 ` Qi Zheng
@ 2026-05-14 13:40 ` Shakeel Butt
2026-05-15 7:37 ` Qi Zheng
0 siblings, 1 reply; 13+ messages in thread
From: Shakeel Butt @ 2026-05-14 13:40 UTC (permalink / raw)
To: Qi Zheng, kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou,
Chen Ridong, David Hildenbrand, Hamza Mahfooz, Harry Yoo,
Hugh Dickins, Imran Khan, Johannes Weiner, Kamalesh Babulal,
Lance Yang, Liam Howlett, Lorenzo Stoakes, Michal Hocko,
Michal Koutný, Mike Rapoport, Muchun Song, Muchun Song,
Nhat Pham, Roman Gushchin, Suren Baghdasaryan, Usama Arif,
Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie, Zi Yan,
Usama Arif, cgroups, linux-mm

May 14, 2026 at 12:46 AM, "Qi Zheng" <qi.zheng@linux.dev> wrote:

>
> On 5/13/26 10:27 PM, Shakeel Butt wrote:
>
> >
> > On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
> >
> > >
> > > On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
> > >
> > On 5/13/26 12:03 AM, Shakeel Butt wrote:
> > On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
> >
> > Hello,
> >
> > kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
> >
> > commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > This is most probably due to shuffling of struct mem_cgroup and struct
> > mem_cgroup_per_node members.
> >
> > Another possibility is that after objcg was split into per-node, the
> > slab accounting fast path is still designed assuming only one current
> > objcg per CPU:
> >
> > struct obj_stock_pcp {
> >     struct obj_cgroup *cached_objcg;
> > };
> >
> > So it may cause the following thrashing:
> >
> > CPU stock cached = memcg/node0 objcg
> > free object tagged = memcg/node1 objcg
> >   => __refill_obj_stock --> objcg mismatch
> >   => drain_obj_stock()
> >   => cache switches to node1 objcg
> >
> > next local allocation tagged = node0 objcg
> >   => mismatch again
> >   => drain_obj_stock()
> >
> > >
> > > Actually I think this is the issue, we have ping pong threads running on
> > > different nodes where though they are in same cgroup but their current->objcg is
> > > for local node and thus this ping pong is thrashing the per-cpu objcg stock.
> > >
> > > The easier fix would be to compare objcg->memcg instead of just objcg during
> > > draining and caching. In addition we can add support for multiple objcg per-cpu
> > > stock caching.
> > >
> > Something like the following:
> > From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
> > From: Shakeel Butt <shakeel.butt@linux.dev>
> > Date: Wed, 13 May 2026 07:24:55 -0700
> > Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
> >  shares memcg
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> >  mm/memcontrol.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d978e18b9b2d..01ed7a8e18ac 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> >  			       unsigned int nr_bytes,
> >  			       bool allow_uncharge)
> >  {
> > +	struct obj_cgroup *cached;
> >  	unsigned int nr_pages = 0;
> >
> >  	if (!stock) {
> > @@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> >  		goto out;
> >  	}
> >
> > -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> > +	cached = READ_ONCE(stock->cached_objcg);
> > +	if (cached != objcg &&
> > +	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
> >  		drain_obj_stock(stock);
> >  		obj_cgroup_get(objcg);
> >  		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> >
> This change looks like it should be able to fix the ping-pong issue, but
> I still haven't reproduced the performance regression locally. I'll
> continue testing it.

Same here, couldn't reproduce locally. It seems like we had to craft a scenario
where the pair pingpong threads get their current->objcg from different nodes.
I will try that.

>
> Hi kernel-test-robot, could you help check if the patch above fixes the
> issue on your end?
>

In the meantime, Oliver, can you please help in testing this patch?
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
2026-05-14 13:40 ` Shakeel Butt
@ 2026-05-15 7:37 ` Qi Zheng
2026-05-15 17:09 ` Shakeel Butt
0 siblings, 1 reply; 13+ messages in thread
From: Qi Zheng @ 2026-05-15 7:37 UTC (permalink / raw)
To: Shakeel Butt, kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou,
Chen Ridong, David Hildenbrand, Hamza Mahfooz, Harry Yoo,
Hugh Dickins, Imran Khan, Johannes Weiner, Kamalesh Babulal,
Lance Yang, Liam Howlett, Lorenzo Stoakes, Michal Hocko,
Michal Koutný, Mike Rapoport, Muchun Song, Muchun Song,
Nhat Pham, Roman Gushchin, Suren Baghdasaryan, Usama Arif,
Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie, Zi Yan,
Usama Arif, cgroups, linux-mm

Hi Shakeel,

On 5/14/26 9:40 PM, Shakeel Butt wrote:
> May 14, 2026 at 12:46 AM, "Qi Zheng" <qi.zheng@linux.dev> wrote:
>
>
>>
>> On 5/13/26 10:27 PM, Shakeel Butt wrote:
>>
>>>
>>> On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
>>>
>>>>
>>>> On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
>>>>
>>> On 5/13/26 12:03 AM, Shakeel Butt wrote:
>>> On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
>>>
>>> commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> This is most probably due to shuffling of struct mem_cgroup and struct
>>> mem_cgroup_per_node members.
>>>
>>> Another possibility is that after objcg was split into per-node, the
>>> slab accounting fast path is still designed assuming only one current
>>> objcg per CPU:
>>>
>>> struct obj_stock_pcp {
>>>     struct obj_cgroup *cached_objcg;
>>> };
>>>
>>> So it may cause the following thrashing:
>>>
>>> CPU stock cached = memcg/node0 objcg
>>> free object tagged = memcg/node1 objcg
>>>   => __refill_obj_stock --> objcg mismatch
>>>   => drain_obj_stock()
>>>   => cache switches to node1 objcg
>>>
>>> next local allocation tagged = node0 objcg
>>>   => mismatch again
>>>   => drain_obj_stock()
>>>
>>>>
>>>> Actually I think this is the issue, we have ping pong threads running on
>>>> different nodes where though they are in same cgroup but their current->objcg is
>>>> for local node and thus this ping pong is thrashing the per-cpu objcg stock.
>>>>
>>>> The easier fix would be to compare objcg->memcg instead of just objcg during
>>>> draining and caching. In addition we can add support for multiple objcg per-cpu
>>>> stock caching.
>>>>
>>> Something like the following:
>>> From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
>>> From: Shakeel Butt <shakeel.butt@linux.dev>
>>> Date: Wed, 13 May 2026 07:24:55 -0700
>>> Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
>>>  shares memcg
>>> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>>> ---
>>>  mm/memcontrol.c | 14 +++++++++++++-
>>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index d978e18b9b2d..01ed7a8e18ac 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
>>>  			       unsigned int nr_bytes,
>>>  			       bool allow_uncharge)
>>>  {
>>> +	struct obj_cgroup *cached;
>>>  	unsigned int nr_pages = 0;
>>>
>>>  	if (!stock) {
>>> @@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
>>>  		goto out;
>>>  	}
>>>
>>> -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
>>> +	cached = READ_ONCE(stock->cached_objcg);
>>> +	if (cached != objcg &&
>>> +	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
>>>  		drain_obj_stock(stock);
>>>  		obj_cgroup_get(objcg);
>>>  		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
>>>
>> This change looks like it should be able to fix the ping-pong issue, but
>> I still haven't reproduced the performance regression locally. I'll
>> continue testing it.
>
> Same here, couldn't reproduce locally. It seems like we had to craft a scenario
> where the pair pingpong threads get their current->objcg from different nodes.
> I will try that.

I still haven't been able to reproduce the LKP results locally, but I
used an AI bot to generate a pingpong test case (pasted at the end) and
automatically ran the test on a physical machine.
The results are as follows:

parent: 8285917d6f
bad:    01b9da291c
fix:    01b9da291c + stock patch

| kernel | mq_ops/sec mean | vs parent | drain_obj_stock / round |
|--------|-----------------|-----------|-------------------------|
| parent | 9.743M          | baseline  | ~0                      |
| bad    | 7.821M          | -19.73%   | ~11.16M                 |
| fix    | 9.274M          | -4.81%    | ~0                      |

Probing the drain_obj_stock() calls confirms that the fix restores the
frequency to the parent's baseline.

And it seems that besides __refill_obj_stock(), we should also modify
__consume_obj_stock()?

Thanks,
Qi

========= test case =========

objcg_pingpong_mq.c
-------------------

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <mqueue.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef SYS_mq_timedsend
#define SYS_mq_timedsend __NR_mq_timedsend
#endif
#ifndef SYS_mq_timedreceive
#define SYS_mq_timedreceive __NR_mq_timedreceive
#endif

struct worker_arg {
	mqd_t send_mqd;
	mqd_t recv_mqd;
	int cpu;
	long count;
	size_t msg_size;
	int send_first;
};

static pthread_barrier_t start_barrier;

static void die(const char *what)
{
	fprintf(stderr, "%s: %s\n", what, strerror(errno));
	exit(1);
}

static int add_cpu(int **cpus, size_t *nr, size_t *cap, int cpu)
{
	int *tmp;

	if (*nr == *cap) {
		size_t new_cap = *cap ? *cap * 2 : 64;

		tmp = realloc(*cpus, new_cap * sizeof(**cpus));
		if (!tmp)
			return -1;
		*cpus = tmp;
		*cap = new_cap;
	}
	(*cpus)[(*nr)++] = cpu;
	return 0;
}

static int read_cpulist(const char *path, int **cpus, size_t *nr)
{
	char buf[4096];
	char *p, *end;
	size_t cap = 0;
	int fd;
	ssize_t len;

	*cpus = NULL;
	*nr = 0;
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len <= 0)
		return -1;
	buf[len] = '\0';

	p = buf;
	while (*p) {
		long first, last, cpu;

		while (*p == ',' || *p == '\n' || *p == '\t' || *p == ' ')
			p++;
		if (!*p)
			break;
		errno = 0;
		first = strtol(p, &end, 10);
		if (errno || end == p)
			return -1;
		p = end;
		last = first;
		if (*p == '-') {
			p++;
			errno = 0;
			last = strtol(p, &end, 10);
			if (errno || end == p || last < first)
				return -1;
			p = end;
		}
		for (cpu = first; cpu <= last; cpu++) {
			if (add_cpu(cpus, nr, &cap, (int)cpu))
				return -1;
		}
	}
	return *nr ? 0 : -1;
}

static long read_cmdline_long(const char *key, long fallback)
{
	char buf[4096];
	char *p, *end;
	int fd;
	ssize_t len;
	size_t key_len = strlen(key);
	long val;

	fd = open("/proc/cmdline", O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return fallback;
	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len <= 0)
		return fallback;
	buf[len] = '\0';

	p = buf;
	while ((p = strstr(p, key))) {
		if ((p == buf || p[-1] == ' ') && p[key_len] == '=') {
			val = strtol(p + key_len + 1, &end, 10);
			if (end != p + key_len + 1 && val >= 0)
				return val;
		}
		p += key_len;
	}
	return fallback;
}

static void pin_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		fprintf(stderr, "sched_setaffinity(%d): %s\n", cpu, strerror(errno));
		exit(2);
	}
}

static void *worker(void *data)
{
	struct worker_arg *arg = data;
	char *msg;
	long i;

	msg = malloc(arg->msg_size);
	if (!msg)
		die("malloc msg");
	memset(msg, 0x5a, arg->msg_size);
	pin_cpu(arg->cpu);
	pthread_barrier_wait(&start_barrier);

	for (i = 0; i < arg->count; i++) {
		int ret[2];

		if (arg->send_first) {
			ret[0] = syscall(SYS_mq_timedsend, arg->send_mqd, msg, arg->msg_size, 0, NULL);
			ret[1] = syscall(SYS_mq_timedreceive, arg->recv_mqd, msg, arg->msg_size, NULL, NULL);
		} else {
			ret[0] = syscall(SYS_mq_timedreceive, arg->recv_mqd, msg, arg->msg_size, NULL, NULL);
			ret[1] = syscall(SYS_mq_timedsend, arg->send_mqd, msg, arg->msg_size, 0, NULL);
		}
		if (ret[0] < 0 || ret[1] < 0) {
			fprintf(stderr, "mq failed cpu=%d iter=%ld: %s\n",
				arg->cpu, i, strerror(errno));
			exit(3);
		}
	}

	free(msg);
	return NULL;
}

static double nsec_diff(struct timespec a, struct timespec b)
{
	return (double)(b.tv_sec - a.tv_sec) * 1000000000.0 +
	       (double)(b.tv_nsec - a.tv_nsec);
}

static void usage(const char *prog)
{
	fprintf(stderr, "usage: %s [-p pairs] [-n iterations] [-s msg_size]\n", prog);
}

int main(int argc, char **argv)
{
	long count = read_cmdline_long("pp_count", 100000);
	long pairs = read_cmdline_long("pp_pairs", 0);
	long msg_size_arg = read_cmdline_long("pp_size", 64);
	struct mq_attr attr = {
		.mq_flags = 0,
		.mq_maxmsg = 1,
		.mq_msgsize = 64,
		.mq_curmsgs = 0,
	};
	struct rusage ru;
	pthread_t *threads;
	struct worker_arg *args;
	struct timespec start, end;
	int *node0_cpus, *node1_cpus;
	size_t node0_nr, node1_nr;
	long messages, mq_syscalls;
	int opt, i;

	while ((opt = getopt(argc, argv, "p:n:s:h")) != -1) {
		switch (opt) {
		case 'p':
			pairs = atol(optarg);
			break;
		case 'n':
			count = atol(optarg);
			break;
		case 's':
			msg_size_arg = atol(optarg);
			break;
		default:
			usage(argv[0]);
			return opt == 'h' ? 0 : 1;
		}
	}

	if (count <= 0)
		count = 100000;
	if (msg_size_arg <= 0)
		msg_size_arg = 64;
	if (msg_size_arg > 65536) {
		fprintf(stderr, "msg_size too large: %ld\n", msg_size_arg);
		return 1;
	}
	attr.mq_msgsize = msg_size_arg;

	if (read_cpulist("/sys/devices/system/node/node0/cpulist", &node0_cpus, &node0_nr) ||
	    read_cpulist("/sys/devices/system/node/node1/cpulist", &node1_cpus, &node1_nr)) {
		fprintf(stderr, "need at least two NUMA nodes with cpulist files\n");
		return 1;
	}

	if (pairs <= 0 || pairs > (long)node0_nr || pairs > (long)node1_nr)
		pairs = node0_nr < node1_nr ? (long)node0_nr : (long)node1_nr;
	if (pairs <= 0) {
		fprintf(stderr, "no CPU pairs available\n");
		return 1;
	}

	threads = calloc(pairs * 2, sizeof(*threads));
	args = calloc(pairs * 2, sizeof(*args));
	if (!threads || !args)
		die("calloc");

	printf("CONFIG pairs=%ld count=%ld msg_size=%ld node0_cpus=%zu node1_cpus=%zu\n",
	       pairs, count, msg_size_arg, node0_nr, node1_nr);
	printf("CPUS first=%d:%d last=%d:%d\n", node0_cpus[0], node1_cpus[0],
	       node0_cpus[pairs - 1], node1_cpus[pairs - 1]);
	fflush(stdout);

	pthread_barrier_init(&start_barrier, NULL, pairs * 2 + 1);

	for (i = 0; i < pairs; i++) {
		char name_ab[64], name_ba[64];
		mqd_t mqd_ab, mqd_ba;

		snprintf(name_ab, sizeof(name_ab), "/objcg_pp_ab_%d_%ld", i, (long)getpid());
		snprintf(name_ba, sizeof(name_ba), "/objcg_pp_ba_%d_%ld", i, (long)getpid());
		mq_unlink(name_ab);
		mq_unlink(name_ba);
		mqd_ab = mq_open(name_ab, O_CREAT | O_RDWR, 0600, &attr);
		mqd_ba = mq_open(name_ba, O_CREAT | O_RDWR, 0600, &attr);
		if (mqd_ab == (mqd_t)-1 || mqd_ba == (mqd_t)-1)
			die("mq_open");
		mq_unlink(name_ab);
		mq_unlink(name_ba);

		args[i * 2] = (struct worker_arg) {
			.send_mqd = mqd_ab,
			.recv_mqd = mqd_ba,
			.cpu = node0_cpus[i],
			.count = count,
			.msg_size = msg_size_arg,
			.send_first = 1,
		};
		args[i * 2 + 1] = (struct worker_arg) {
			.send_mqd = mqd_ba,
			.recv_mqd = mqd_ab,
			.cpu = node1_cpus[i],
			.count = count,
			.msg_size = msg_size_arg,
			.send_first = 0,
		};

		if (pthread_create(&threads[i * 2], NULL, worker, &args[i * 2]))
			die("pthread_create");
		if (pthread_create(&threads[i * 2 + 1], NULL, worker, &args[i * 2 + 1]))
			die("pthread_create");
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	pthread_barrier_wait(&start_barrier);
	for (i = 0; i < pairs * 2; i++)
		pthread_join(threads[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);
	getrusage(RUSAGE_SELF, &ru);

	messages = count * pairs * 2;
	mq_syscalls = messages * 2;
	printf("RESULT pairs=%ld messages=%ld mq_syscalls=%ld seconds=%.6f msg_per_sec=%.0f mq_ops_per_sec=%.0f user_sec=%.6f system_sec=%.6f voluntary_cs=%ld involuntary_cs=%ld\n",
	       pairs, messages, mq_syscalls,
	       nsec_diff(start, end) / 1000000000.0,
	       (double)messages * 1000000000.0 / nsec_diff(start, end),
	       (double)mq_syscalls * 1000000000.0 / nsec_diff(start, end),
	       (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1000000.0,
	       (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1000000.0,
	       ru.ru_nvcsw, ru.ru_nivcsw);
	return 0;
}

objcg_stock_probe.c
-------------------

#include <linux/atomic.h>
#include <linux/init.h>
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h>

static atomic64_t drain_hits;
static atomic64_t refill_hits;
static atomic64_t post_alloc_hits;
static atomic64_t free_hits;

static int drain_pre(struct kprobe *kp, struct pt_regs *regs)
{
	atomic64_inc(&drain_hits);
	return 0;
}

static int refill_pre(struct kprobe *kp, struct pt_regs *regs)
{
	atomic64_inc(&refill_hits);
	return 0;
}

static int post_alloc_pre(struct kprobe *kp, struct pt_regs *regs)
{
	atomic64_inc(&post_alloc_hits);
	return 0;
}

static int free_pre(struct kprobe *kp, struct pt_regs *regs)
{
	atomic64_inc(&free_hits);
	return 0;
}

static struct kprobe probes[] = {
	{
		.symbol_name = "drain_obj_stock",
		.pre_handler = drain_pre,
	},
	{
		.symbol_name = "__refill_obj_stock",
		.pre_handler = refill_pre,
	},
	{
		.symbol_name = "__memcg_slab_post_alloc_hook",
		.pre_handler = post_alloc_pre,
	},
	{
		.symbol_name = "__memcg_slab_free_hook",
		.pre_handler = free_pre,
	},
};

static struct kprobe *probe_ptrs[] = {
	&probes[0],
	&probes[1],
	&probes[2],
	&probes[3],
};

static void reset_counts(void)
{
	atomic64_set(&drain_hits, 0);
	atomic64_set(&refill_hits, 0);
	atomic64_set(&post_alloc_hits, 0);
	atomic64_set(&free_hits, 0);
}

static int counts_show(struct seq_file *m, void *v)
{
	seq_printf(m, "drain_obj_stock=%lld\n", atomic64_read(&drain_hits));
	seq_printf(m, "__refill_obj_stock=%lld\n", atomic64_read(&refill_hits));
	seq_printf(m, "__memcg_slab_post_alloc_hook=%lld\n", atomic64_read(&post_alloc_hits));
	seq_printf(m, "__memcg_slab_free_hook=%lld\n", atomic64_read(&free_hits));
	return 0;
}

static int counts_open(struct inode *inode, struct file *file)
{
	return single_open(file, counts_show, NULL);
}

static ssize_t counts_write(struct file *file, const char __user *buf,
			    size_t count, loff_t *ppos)
{
	reset_counts();
	return count;
}

static const struct proc_ops counts_fops = {
	.proc_open = counts_open,
	.proc_read = seq_read,
	.proc_lseek = seq_lseek,
	.proc_release = single_release,
	.proc_write = counts_write,
};

static int __init objcg_stock_probe_init(void)
{
	int ret;

	reset_counts();
	ret = register_kprobes(probe_ptrs, ARRAY_SIZE(probe_ptrs));
	if (ret)
		return ret;

	if (!proc_create("objcg_stock_probe", 0600, NULL, &counts_fops)) {
		unregister_kprobes(probe_ptrs, ARRAY_SIZE(probe_ptrs));
		return -ENOMEM;
	}

	return 0;
}

static void __exit objcg_stock_probe_exit(void)
{
	remove_proc_entry("objcg_stock_probe", NULL);
	unregister_kprobes(probe_ptrs, ARRAY_SIZE(probe_ptrs));
}

module_init(objcg_stock_probe_init);
module_exit(objcg_stock_probe_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Count memcg obj stock kprobe hits for ping-pong tests");
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
2026-05-15 7:37 ` Qi Zheng
@ 2026-05-15 17:09 ` Shakeel Butt
2026-05-17 12:55 ` Oliver Sang
0 siblings, 1 reply; 13+ messages in thread
From: Shakeel Butt @ 2026-05-15 17:09 UTC (permalink / raw)
To: Qi Zheng
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Andrew Morton,
David Carlier, Allen Pais, Axel Rasmussen, Baoquan He,
Chengming Zhou, Chen Ridong, David Hildenbrand, Hamza Mahfooz,
Harry Yoo, Hugh Dickins, Imran Khan, Johannes Weiner,
Kamalesh Babulal, Lance Yang, Liam Howlett, Lorenzo Stoakes,
Michal Hocko, Michal Koutný, Mike Rapoport, Muchun Song,
Muchun Song, Nhat Pham, Roman Gushchin, Suren Baghdasaryan,
Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie,
Zi Yan, Usama Arif, cgroups, linux-mm

On Fri, May 15, 2026 at 03:37:22PM +0800, Qi Zheng wrote:
> Hi Shakeel,
>
> On 5/14/26 9:40 PM, Shakeel Butt wrote:
> > May 14, 2026 at 12:46 AM, "Qi Zheng" <qi.zheng@linux.dev> wrote:
> >
> >
> > >
> > > On 5/13/26 10:27 PM, Shakeel Butt wrote:
> > >
> > > >
> > > > On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
> > > >
> > > > >
> > > > > On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
> > > > >
> > > > On 5/13/26 12:03 AM, Shakeel Butt wrote:
> > > > On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
> > > >
> > > > commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > This is most probably due to shuffling of struct mem_cgroup and struct
> > > > mem_cgroup_per_node members.
> > > >
> > > > Another possibility is that after objcg was split into per-node, the
> > > > slab accounting fast path is still designed assuming only one current
> > > > objcg per CPU:
> > > >
> > > > struct obj_stock_pcp {
> > > >     struct obj_cgroup *cached_objcg;
> > > > };
> > > >
> > > > So it may cause the following thrashing:
> > > >
> > > > CPU stock cached = memcg/node0 objcg
> > > > free object tagged = memcg/node1 objcg
> > > >   => __refill_obj_stock --> objcg mismatch
> > > >   => drain_obj_stock()
> > > >   => cache switches to node1 objcg
> > > >
> > > > next local allocation tagged = node0 objcg
> > > >   => mismatch again
> > > >   => drain_obj_stock()
> > > >
> > > > >
> > > > > Actually I think this is the issue, we have ping pong threads running on
> > > > > different nodes where though they are in same cgroup but their current->objcg is
> > > > > for local node and thus this ping pong is thrashing the per-cpu objcg stock.
> > > > >
> > > > > The easier fix would be to compare objcg->memcg instead of just objcg during
> > > > > draining and caching. In addition we can add support for multiple objcg per-cpu
> > > > > stock caching.
> > > > >
> > > > Something like the following:
> > > > From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
> > > > From: Shakeel Butt <shakeel.butt@linux.dev>
> > > > Date: Wed, 13 May 2026 07:24:55 -0700
> > > > Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
> > > >  shares memcg
> > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > > ---
> > > >  mm/memcontrol.c | 14 +++++++++++++-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index d978e18b9b2d..01ed7a8e18ac 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> > > >  			       unsigned int nr_bytes,
> > > >  			       bool allow_uncharge)
> > > >  {
> > > > +	struct obj_cgroup *cached;
> > > >  	unsigned int nr_pages = 0;
> > > >
> > > >  	if (!stock) {
> > > > @@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> > > >  		goto out;
> > > >  	}
> > > >
> > > > -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> > > > +	cached = READ_ONCE(stock->cached_objcg);
> > > > +	if (cached != objcg &&
> > > > +	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
> > > >  		drain_obj_stock(stock);
> > > >  		obj_cgroup_get(objcg);
> > > >  		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> > > >
> > > This change looks like it should be able to fix the ping-pong issue, but
> > > I still haven't reproduced the performance regression locally. I'll
> > > continue testing it.
> >
> > Same here, couldn't reproduce locally. It seems like we had to craft a scenario
> > where the pair pingpong threads get their current->objcg from different nodes.
> > I will try that.
>
> I still haven't been able to reproduce the LKP results locally, but I
> used an AI bot to generate a pingpong test case (pasted at the end) and
> automatically ran the test on a physical machine.
> The results are as follows:
>
> parent: 8285917d6f
> bad:    01b9da291c
> fix:    01b9da291c + stock patch
>
> | kernel | mq_ops/sec mean | vs parent | drain_obj_stock / round |
> |--------|-----------------|-----------|-------------------------|
> | parent | 9.743M          | baseline  | ~0                      |
> | bad    | 7.821M          | -19.73%   | ~11.16M                 |
> | fix    | 9.274M          | -4.81%    | ~0                      |
>
> Probing the drain_obj_stock() calls confirms that the fix restores the
> frequency to the parent's baseline.
>
> And it seems that besides __refill_obj_stock(), we should also modify
> __consume_obj_stock()?
>

Thanks a lot Qi. I will send the formal patch and will add your Debugged-by if
you don't mind.
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
2026-05-15 17:09 ` Shakeel Butt
@ 2026-05-17 12:55 ` Oliver Sang
2026-05-17 19:38 ` Shakeel Butt
0 siblings, 1 reply; 13+ messages in thread
From: Oliver Sang @ 2026-05-17 12:55 UTC (permalink / raw)
To: Shakeel Butt
Cc: Qi Zheng, oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou,
Chen Ridong, David Hildenbrand, Hamza Mahfooz, Harry Yoo,
Hugh Dickins, Imran Khan, Johannes Weiner, Kamalesh Babulal,
Lance Yang, Liam Howlett, Lorenzo Stoakes, Michal Hocko,
Michal Koutný, Mike Rapoport, Muchun Song, Muchun Song,
Nhat Pham, Roman Gushchin, Suren Baghdasaryan, Usama Arif,
Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie, Zi Yan,
Usama Arif, cgroups, linux-mm, oliver.sang

hi, Shakeel,
hi, Qi,

On Fri, May 15, 2026 at 10:09:06AM -0700, Shakeel Butt wrote:
> On Fri, May 15, 2026 at 03:37:22PM +0800, Qi Zheng wrote:
> > Hi Shakeel,
> >
> > On 5/14/26 9:40 PM, Shakeel Butt wrote:
> > > May 14, 2026 at 12:46 AM, "Qi Zheng" <qi.zheng@linux.dev> wrote:
> > >
> > >
> > > >
> > > > On 5/13/26 10:27 PM, Shakeel Butt wrote:
> > > >
> > > > >
> > > > > On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
> > > > >
> > > > > >
> > > > > > On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
> > > > > >
> > > > > On 5/13/26 12:03 AM, Shakeel Butt wrote:
> > > > > On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
> > > > >
> > > > > commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > >
> > > > > This is most probably due to shuffling of struct mem_cgroup and struct
> > > > > mem_cgroup_per_node members.
> > > > >
> > > > > Another possibility is that after objcg was split into per-node, the
> > > > > slab accounting fast path is still designed assuming only one current
> > > > > objcg per CPU:
> > > > >
> > > > > struct obj_stock_pcp {
> > > > >     struct obj_cgroup *cached_objcg;
> > > > > };
> > > > >
> > > > > So it may cause the following thrashing:
> > > > >
> > > > > CPU stock cached = memcg/node0 objcg
> > > > > free object tagged = memcg/node1 objcg
> > > > >   => __refill_obj_stock --> objcg mismatch
> > > > >   => drain_obj_stock()
> > > > >   => cache switches to node1 objcg
> > > > >
> > > > > next local allocation tagged = node0 objcg
> > > > >   => mismatch again
> > > > >   => drain_obj_stock()
> > > > >
> > > > > >
> > > > > > Actually I think this is the issue, we have ping pong threads running on
> > > > > > different nodes where though they are in same cgroup but their current->objcg is
> > > > > > for local node and thus this ping pong is thrashing the per-cpu objcg stock.
> > > > > >
> > > > > > The easier fix would be to compare objcg->memcg instead of just objcg during
> > > > > > draining and caching. In addition we can add support for multiple objcg per-cpu
> > > > > > stock caching.
> > > > > >
> > > > > Something like the following:
> > > > > From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
> > > > > From: Shakeel Butt <shakeel.butt@linux.dev>
> > > > > Date: Wed, 13 May 2026 07:24:55 -0700
> > > > > Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
> > > > >  shares memcg
> > > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > > > ---
> > > > >  mm/memcontrol.c | 14 +++++++++++++-
> > > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index d978e18b9b2d..01ed7a8e18ac 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> > > > >  			       unsigned int nr_bytes,
> > > > >  			       bool allow_uncharge)
> > > > >  {
> > > > > +	struct obj_cgroup *cached;
> > > > >  	unsigned int nr_pages = 0;
> > > > >
> > > > >  	if (!stock) {
> > > > > @@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> > > > >  		goto out;
> > > > >  	}
> > > > >
> > > > > -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> > > > > +	cached = READ_ONCE(stock->cached_objcg);
> > > > > +	if (cached != objcg &&
> > > > > +	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
> > > > >  		drain_obj_stock(stock);
> > > > >  		obj_cgroup_get(objcg);
> > > > >  		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> > > > >
> > > > This change looks like it should be able to fix the ping-pong issue, but
> > > > I still haven't reproduced the performance regression locally. I'll
> > > > continue testing it.
> > >
> > > Same here, couldn't reproduce locally. It seems like we had to craft a scenario
> > > where the pair pingpong threads get their current->objcg from different nodes.
> > > I will try that.
> >
> > I still haven't been able to reproduce the LKP results locally, but I
> > used an AI bot to generate a pingpong test case (pasted at the end) and
> > automatically ran the test on a physical machine. The results are as
> > follows:
> >
> > parent: 8285917d6f
> > bad:    01b9da291c
> > fix:    01b9da291c + stock patch
> >
> > | kernel | mq_ops/sec mean | vs parent | drain_obj_stock / round |
> > |--------|-----------------|-----------|-------------------------|
> > | parent | 9.743M          | baseline  | ~0                      |
> > | bad    | 7.821M          | -19.73%   | ~11.16M                 |
> > | fix    | 9.274M          | -4.81%    | ~0                      |
> >
> > Probing the drain_obj_stock() calls confirms that the fix restores the
> > frequency to the parent's baseline.
> >
> > And it seems that besides __refill_obj_stock(), we should also modify
> > __consume_obj_stock()?
> >
>
> Thanks a lot Qi. I will send the formal patch and will add your Debugged-by if
> you don't mind.
>

Tested-by: kernel test robot <oliver.sang@intel.com>

we tested the above patch, and it recovers the regression:

=========================================================================================
compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/mq/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s

commit:
  8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
  01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
  682fd4e9ff  <--- above patch from Shakeel

8285917d6f383aef 01b9da291c4969354807b52956f 682fd4e9ffd4009805f81dd25ed
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      5849          +210.2%      18145 ±  3%      +1.5%       5935        stress-ng.switch.nanosecs_per_context_switch_mq_method
 2.296e+09           -67.7%  7.408e+08 ±  3%      -1.4%  2.263e+09        stress-ng.switch.ops
  38288993           -67.7%   12355813 ±  3%      -1.4%   37739220        stress-ng.switch.ops_per_sec

the full comparison is below [3], but there are two notes.
#1: we noticed that Shakeel later posted a formal patch [1] with more
changes. We are not sure whether this test is sufficient; do you want us to
test [1] as well?

#2: while testing the above patch, we found that the server crashed
frequently during the runs. Out of up to 20 attempts, only 2 runs completed
successfully (the 37739220 above is the average of those 2 runs; since the
data is stable, we think it is reasonable to report it). We also noticed a
[syzbot ci] report for [1] at [2]. Because we do not have serial console
output for the server used for these performance tests, we cannot tell
whether the other 18 runs failed for a similar reason. Just FYI.

[1] https://lore.kernel.org/all/20260515171953.2224503-1-shakeel.butt@linux.dev/
[2] https://lore.kernel.org/all/6a081599.170a0220.4530d.0003.GAE@google.com/
[3]
=========================================================================================
compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/mq/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s

commit:
  8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
  01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
  682fd4e9ff <--- above patch from Shakeel

8285917d6f383aef 01b9da291c4969354807b52956f 682fd4e9ffd4009805f81dd25ed
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      5849          +210.2%      18145 ±  3%      +1.5%       5935        stress-ng.switch.nanosecs_per_context_switch_mq_method
 2.296e+09           -67.7%  7.408e+08 ±  3%      -1.4%  2.263e+09        stress-ng.switch.ops
  38288993           -67.7%   12355813 ±  3%      -1.4%   37739220        stress-ng.switch.ops_per_sec
  93416932           -68.6%   29310048 ±  3%      +0.1%   93506345        stress-ng.time.involuntary_context_switches
     15845           +11.0%      17584            +0.3%      15894        stress-ng.time.percent_of_cpu_this_job_got
      8556           +18.2%      10115            +0.5%       8597        stress-ng.time.system_time
    963.36           -53.5%     447.72 ±  3%      -1.3%     950.84        stress-ng.time.user_time
 1.518e+09           -69.7%  4.607e+08 ±  2%      -1.5%  1.496e+09        stress-ng.time.voluntary_context_switches
      1124 ± 17%     +34.3%       1509 ±  8%      -7.7%       1037 ± 14%  perf-c2c.HITM.remote
  2.55e+09           -12.3%  2.236e+09 ±  3%      -2.3%  2.491e+09        cpuidle..time
  8.29e+08           -71.8%  2.337e+08 ±  2%      -1.8%  8.143e+08        cpuidle..usage
  14184409 ±  2%     -16.4%   11860068            +0.5%   14255389 ±  2%  vmstat.memory.cache
  39204964           -69.7%   11868752 ±  2%      -1.2%   38715052        vmstat.system.cs
   1808848           -38.5%    1111830            -0.8%    1793867        vmstat.system.in
    115757 ± 52%      +8.5%     125625 ± 44%     -62.0%      44014 ± 32%  numa-numastat.node0.other_node
   4102960 ±  5%     -19.0%    3324393 ±  4%      -1.0%    4063575 ±  2%  numa-numastat.node1.local_node
   4218983 ±  3%     -18.7%    3430325 ±  3%      +7.3%    4526769 ±  3%  numa-numastat.node1.numa_hit
    116023 ± 52%      -8.7%     105932 ± 52%     +62.0%     187903 ±  7%  numa-numastat.node1.other_node
  93416932           -68.6%   29310048 ±  3%      +0.1%   93506345        time.involuntary_context_switches
     15845           +11.0%      17584            +0.3%      15894        time.percent_of_cpu_this_job_got
      8556           +18.2%      10115            +0.5%       8597        time.system_time
    963.36           -53.5%     447.72 ±  3%      -1.3%     950.84        time.user_time
 1.518e+09           -69.7%  4.607e+08 ±  2%      -1.5%  1.496e+09        time.voluntary_context_switches
     22.48             -4.4      18.08             -0.2      22.29        mpstat.cpu.all.idle%
      1.13             -0.4       0.73             -0.0       1.12        mpstat.cpu.all.irq%
      0.10             -0.0       0.09             -0.0       0.10        mpstat.cpu.all.soft%
     67.98             +9.1      77.06             +0.3      68.29        mpstat.cpu.all.sys%
      8.32             -4.3       4.04 ±  2%       -0.1       8.21        mpstat.cpu.all.usr%
     17.33 ±  2%     +15.4%      20.00 ±  4%      +3.8%      18.00        mpstat.max_utilization.seconds
  10552401 ±  4%     -23.3%    8092823            +0.3%   10586331 ±  6%  numa-meminfo.node1.Active
  10552392 ±  4%     -23.3%    8092820            +0.3%   10586323 ±  6%  numa-meminfo.node1.Active(anon)
  12454155 ± 15%     -34.9%    8106052            -3.6%   12008629 ± 16%  numa-meminfo.node1.FilePages
    559046 ±  8%     -19.2%     451929 ±  2%      -2.4%     545736 ± 10%  numa-meminfo.node1.Mapped
  14688311 ± 13%     -30.0%   10285394 ±  2%      -4.7%   14004338 ± 15%  numa-meminfo.node1.MemUsed
  10028979 ±  3%     -22.4%    7783864            +0.8%   10109957 ±  3%  numa-meminfo.node1.Shmem
  10939677 ±  2%     -21.0%    8641166            +0.5%   10995134 ±  2%  meminfo.Active
  10939661 ±  2%     -21.0%    8641149            +0.5%   10995117 ±  2%  meminfo.Active(anon)
  13917673 ±  2%     -16.4%   11633722            +0.4%   13973077 ±  2%  meminfo.Cached
  14400924 ±  2%     -16.0%   12102150            +0.6%   14482134 ±  2%  meminfo.Committed_AS
   8394752 ±  5%     +16.7%    9796949 ±  8%     +10.5%    9274368 ± 16%  meminfo.DirectMap2M
    617671           -12.0%     543559            -1.5%     608569        meminfo.Mapped
  18364992           -12.5%   16065468            +0.3%   18426986        meminfo.Memused
  10124702 ±  2%     -22.6%    7839682            +0.6%   10181038 ±  3%  meminfo.Shmem
  18393665           -12.5%   16100473            +0.4%   18458236        meminfo.max_used_kB
    115757 ± 52%      +8.5%     125625 ± 44%     -62.0%      44014 ± 32%  numa-vmstat.node0.numa_other
   2638537 ±  4%     -23.3%    2022639            +0.3%    2647116 ±  6%  numa-vmstat.node1.nr_active_anon
   3113944 ± 15%     -34.9%    2025946            -3.6%    3002667 ± 16%  numa-vmstat.node1.nr_file_pages
    139848 ±  9%     -19.3%     112912 ±  2%      -2.4%     136548 ± 10%  numa-vmstat.node1.nr_mapped
   2507650 ±  3%     -22.4%    1945399            +0.8%    2527999 ±  3%  numa-vmstat.node1.nr_shmem
   2638531 ±  4%     -23.3%    2022634            +0.3%    2647111 ±  6%  numa-vmstat.node1.nr_zone_active_anon
   4219206 ±  3%     -18.7%    3430093 ±  3%      +7.3%    4527044 ±  3%  numa-vmstat.node1.numa_hit
   4103183 ±  4%     -19.0%    3324161 ±  4%      -1.0%    4063850 ±  2%  numa-vmstat.node1.numa_local
    116023 ± 52%      -8.7%     105932 ± 52%     +62.0%     187903 ±  7%  numa-vmstat.node1.numa_other
     10.59             -7.6       2.97 ±  8%       -0.0      10.58        turbostat.C1%
      0.85 ±  3%       +9.1       9.96 ±  2%       +0.1       0.90        turbostat.C1E%
      1.29 ±  6%     +19.4%       1.54 ±  2%      +7.8%       1.39        turbostat.CPU%c1
     48.67 ±  2%     -15.1%      41.33 ±  3%     -14.7%      41.50        turbostat.CoreTmp
      0.56           -60.7%       0.22 ±  3%      +1.8%       0.57        turbostat.IPC
 1.153e+08           -38.7%   70680365            -1.1%   1.14e+08        turbostat.IRQ
  10242404           -14.8%    8723704            -0.2%   10218332        turbostat.NMI
     88.65            -84.0       4.67 ± 33%       -2.4      86.25 ±  2%  turbostat.PKG_%
      3.82             -3.8       0.04 ± 10%       -0.1       3.68        turbostat.POLL%
     48.67 ±  2%     -13.7%      42.00 ±  3%     -12.7%      42.50        turbostat.PkgTmp
    683.77           -13.1%     594.00            -0.1%     683.24        turbostat.PkgWatt
     18.74            -3.3%      18.13            -1.0%      18.54        turbostat.RAMWatt
   2735312 ±  2%     -21.0%    2160742            +0.5%    2749182 ±  2%  proc-vmstat.nr_active_anon
    204708            -1.6%     201435            -0.1%     204401        proc-vmstat.nr_anon_pages
   3479812 ±  2%     -16.4%    2908863            +0.4%    3493660 ±  2%  proc-vmstat.nr_file_pages
    154477           -12.0%     135959            -1.5%     152200        proc-vmstat.nr_mapped
   2531568 ±  2%     -22.6%    1960353            +0.6%    2545651 ±  3%  proc-vmstat.nr_shmem
     42010            -3.5%      40543            +0.0%      42030        proc-vmstat.nr_slab_reclaimable
   2735312 ±  2%     -21.0%    2160742            +0.5%    2749182 ±  2%  proc-vmstat.nr_zone_active_anon
    210167 ±  5%     -11.5%     185950 ± 11%      +7.6%     226044 ±  3%  proc-vmstat.numa_hint_faults
    198174 ±  6%      -7.8%     182781 ± 11%     +12.5%     222958 ±  4%  proc-vmstat.numa_hint_faults_local
   4730338 ±  2%     -18.2%    3871343            +6.3%    5030657 ±  3%  proc-vmstat.numa_hit
   4498551 ±  2%     -19.1%    3639783            +0.6%    4523449 ±  2%  proc-vmstat.numa_local
     22013 ± 74%     -11.9%      19394 ± 79%     -88.0%       2632 ±  2%  proc-vmstat.numa_pages_migrated
   4808959 ±  2%     -17.8%    3954157            +0.3%    4821123 ±  2%  proc-vmstat.pgalloc_normal
    806619            -5.1%     765525 ±  2%      +1.1%     815270        proc-vmstat.pgfault
    467932 ±  3%      -0.3%     466368 ±  2%      -5.2%     443691        proc-vmstat.pgfree
     22013 ± 74%     -11.9%      19394 ± 79%     -88.0%       2632 ±  2%  proc-vmstat.pgmigrate_success
     34098 ±  3%     -14.8%      29054            -8.8%      31100 ±  4%  proc-vmstat.pgreuse
      0.11           +59.9%       0.17 ±  3%      -3.8%       0.10 ±  2%  perf-stat.i.MPKI
 6.653e+10           -61.7%  2.546e+10 ±  2%      +1.7%  6.767e+10        perf-stat.i.branch-instructions
      0.76             +0.1       0.89             +0.0       0.78        perf-stat.i.branch-miss-rate%
 4.685e+08           -59.7%  1.888e+08 ±  2%      +4.8%  4.911e+08        perf-stat.i.branch-misses
      1.12             +0.6       1.76 ±  3%       +0.0       1.12        perf-stat.i.cache-miss-rate%
  35553724 ±  3%     -40.4%   21188697            -0.9%   35218701        perf-stat.i.cache-misses
 4.194e+09           -68.3%  1.331e+09 ±  2%      -1.3%  4.139e+09        perf-stat.i.cache-references
  40710745           -69.6%   12395879 ±  2%      -1.5%   40082554        perf-stat.i.context-switches
      1.84          +189.1%       5.31 ±  2%      -1.7%       1.81        perf-stat.i.cpi
 5.965e+11            -2.0%  5.848e+11            -0.1%  5.962e+11        perf-stat.i.cpu-cycles
   8813175           -64.5%    3125097 ±  2%      -2.0%    8632551        perf-stat.i.cpu-migrations
     24447 ±  3%     +68.5%      41184 ±  2%      +2.6%      25087        perf-stat.i.cycles-between-cache-misses
 3.374e+11           -61.8%  1.287e+11 ±  2%      +1.5%  3.426e+11        perf-stat.i.instructions
      0.57           -60.8%       0.22 ±  2%      +1.6%       0.58        perf-stat.i.ipc
      0.01 ±141%    +185.0%       0.04 ± 46%    +280.6%       0.05 ± 20%  perf-stat.i.major-faults
    221.10           -68.6%      69.32 ±  2%      -1.6%     217.48        perf-stat.i.metric.K/sec
     11782 ±  3%      -6.1%      11068 ±  3%      -1.0%      11660        perf-stat.i.minor-faults
     11782 ±  3%      -6.1%      11068 ±  3%      -1.0%      11660        perf-stat.i.page-faults
      0.10 ±  2%     +59.2%       0.17 ±  3%      -3.0%       0.10        perf-stat.overall.MPKI
      0.71             +0.0       0.75             +0.0       0.73        perf-stat.overall.branch-miss-rate%
      0.83 ±  3%       +0.7       1.56 ±  3%       -0.0       0.82        perf-stat.overall.cache-miss-rate%
      1.78          +162.2%       4.67 ±  2%      -1.4%       1.76        perf-stat.overall.cpi
     17181 ±  3%     +64.6%      28283            +1.6%      17452        perf-stat.overall.cycles-between-cache-misses
      0.56           -61.8%       0.21 ±  2%      +1.4%       0.57        perf-stat.overall.ipc
 6.388e+10           -62.3%  2.409e+10 ±  2%      +1.6%  6.492e+10        perf-stat.ps.branch-instructions
 4.538e+08           -60.0%  1.817e+08 ±  2%      +5.0%  4.764e+08        perf-stat.ps.branch-misses
  33674051 ±  3%     -40.1%   20155290            -1.6%   33143601        perf-stat.ps.cache-misses
 4.077e+09           -68.2%  1.296e+09 ±  2%      -1.1%  4.032e+09        perf-stat.ps.cache-references
  39570629           -69.5%   12072702 ±  2%      -1.4%   39036286        perf-stat.ps.context-switches
  5.78e+11            -1.4%    5.7e+11            +0.1%  5.784e+11        perf-stat.ps.cpu-cycles
   8584979           -64.5%    3051930 ±  2%      -1.8%    8430431        perf-stat.ps.cpu-migrations
 3.243e+11           -62.4%   1.22e+11 ±  2%      +1.5%  3.291e+11        perf-stat.ps.instructions
      0.01 ±141%    +195.1%       0.03 ± 41%    +270.3%       0.04 ± 20%  perf-stat.ps.major-faults
     11022 ±  4%      -6.5%      10300 ±  3%      -0.1%      11015        perf-stat.ps.minor-faults
     11022 ±  4%      -6.5%      10300 ±  3%      -0.1%      11015        perf-stat.ps.page-faults
 1.941e+13           -61.9%  7.405e+12 ±  3%      +2.5%  1.989e+13        perf-stat.total.instructions
     18451            +9.9%      20272            +2.2%      18858 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.avg
      5869 ±  4%      -7.4%       5437 ±  5%     +10.3%       6472 ± 19%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.68 ±  2%     -12.9%       0.59 ±  3%      +0.6%       0.68        sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.62 ±  6%     -12.9%       0.54 ±  2%      +0.4%       0.62        sched_debug.cfs_rq:/.h_nr_runnable.stddev
      8469           +12.7%       9544 ±  2%      +3.0%       8727 ±  4%  sched_debug.cfs_rq:/.left_deadline.stddev
      8467           +12.7%       9542 ±  2%      +3.1%       8727 ±  4%  sched_debug.cfs_rq:/.left_vruntime.stddev
   3513124 ± 25%     -30.0%    2459550 ± 10%      -2.7%    3419965 ± 22%  sched_debug.cfs_rq:/.load.max
    588329 ±  5%     -11.2%     522578 ±  5%      +4.1%     612170        sched_debug.cfs_rq:/.load.stddev
     50699 ± 17%     -19.8%      40655 ±  7%     +20.0%      60854 ± 36%  sched_debug.cfs_rq:/.load_avg.max
      0.68 ±  2%     -12.9%       0.59 ±  3%      +0.9%       0.68        sched_debug.cfs_rq:/.nr_queued.stddev
     38.80 ± 32%    +108.5%      80.90 ± 16%    +276.0%     145.88 ± 64%  sched_debug.cfs_rq:/.removed.load_avg.avg
    857.83 ± 12%     +61.0%       1381 ± 12%   +2561.3%      22829 ± 96%  sched_debug.cfs_rq:/.removed.load_avg.max
    152.02 ± 18%     +57.2%     239.02 ± 11%    +955.1%       1604 ± 88%  sched_debug.cfs_rq:/.removed.load_avg.stddev
     26.08 ± 28%    +143.0%      63.37 ± 14%     +23.5%      32.20 ±  9%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
    547.00 ± 13%     +88.7%       1032 ± 12%      +6.4%     582.00        sched_debug.cfs_rq:/.removed.runnable_avg.max
     94.86 ± 17%     +84.3%     174.82 ±  9%     +19.2%     113.04 ±  2%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      9.09 ± 52%    +253.3%      32.11 ± 17%     +53.1%      13.91 ±  6%  sched_debug.cfs_rq:/.removed.util_avg.avg
    275.17 ±  3%    +130.3%     633.67 ±  9%      +0.0%     275.25        sched_debug.cfs_rq:/.removed.util_avg.max
     44.90 ± 30%    +126.4%     101.66 ± 11%     +30.9%      58.78 ±  2%  sched_debug.cfs_rq:/.removed.util_avg.stddev
      8467           +12.7%       9542 ±  2%      +3.1%       8727 ±  4%  sched_debug.cfs_rq:/.right_vruntime.stddev
    659.63 ±  3%     +13.0%     745.47            +0.6%     663.51        sched_debug.cfs_rq:/.runnable_avg.avg
    271.34 ±  2%     +31.2%     355.98 ±  3%      +0.8%     273.61        sched_debug.cfs_rq:/.runnable_avg.stddev
      0.00 ± 26%    +110.4%       0.00 ± 45%      +3.3%       0.00 ± 21%  sched_debug.cfs_rq:/.spread.avg
      0.01 ± 13%    +174.3%       0.02 ± 25%     -36.7%       0.00        sched_debug.cfs_rq:/.spread.max
      0.00 ±  7%    +146.2%       0.00 ± 27%     -13.2%       0.00 ±  8%  sched_debug.cfs_rq:/.spread.stddev
    431.00           +14.5%     493.62            -1.9%     422.73        sched_debug.cfs_rq:/.util_avg.avg
      1061 ±  3%     +26.4%       1341 ±  3%      -3.5%       1024        sched_debug.cfs_rq:/.util_avg.max
    151.53 ±  5%     +50.1%     227.46 ±  2%      -8.0%     139.45        sched_debug.cfs_rq:/.util_avg.stddev
    206.96           +17.5%     243.18 ±  3%     +10.4%     228.53        sched_debug.cfs_rq:/.util_est.avg
     18451            +9.9%      20272            +2.2%      18858 ±  4%  sched_debug.cfs_rq:/.zero_vruntime.avg
      5869 ±  4%      -7.4%       5437 ±  5%     +10.3%       6472 ± 19%  sched_debug.cfs_rq:/.zero_vruntime.stddev
    460133 ±  2%      +2.0%     469231            +5.0%     483144 ±  5%  sched_debug.cpu.avg_idle.avg
      2345           +33.6%       3133 ±  5%    +887.0%      23149 ± 88%  sched_debug.cpu.avg_idle.min
     13.18 ±  2%     +39.8%      18.42 ±  6%      +5.1%      13.85 ±  5%  sched_debug.cpu.clock.stddev
      3961           +14.6%       4541            +0.7%       3989        sched_debug.cpu.curr->pid.avg
      3213           -15.4%       2718            -5.7%       3030        sched_debug.cpu.curr->pid.stddev
      0.00 ± 29%    +157.3%       0.00 ± 35%      -9.3%       0.00 ± 26%  sched_debug.cpu.next_balance.stddev
      0.70           -15.8%       0.59 ±  3%      -2.7%       0.68        sched_debug.cpu.nr_running.stddev
   5474800           -69.7%    1660250 ±  2%      -1.5%    5392540        sched_debug.cpu.nr_switches.avg
   5648642           -65.5%    1946319 ±  5%      -1.3%    5576589        sched_debug.cpu.nr_switches.max
   2229198 ±  8%     -67.1%     734011 ± 20%     +11.1%    2476939 ±  5%  sched_debug.cpu.nr_switches.min
    297592 ±  6%     -25.9%     220513 ± 18%      -5.6%     281029 ±  8%  sched_debug.cpu.nr_switches.stddev
     27.83 ± 30%     -24.0%      21.17 ± 17%     +33.8%      37.25 ± 14%  sched_debug.cpu.nr_uninterruptible.max
     23.75            -10.9      12.88             -0.3      23.42        perf-profile.calltrace.cycles-pp.common_startup_64
     23.65            -10.8      12.82             -0.3      23.32        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
     23.62            -10.8      12.81             -0.3      23.28        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
     23.51            -10.8      12.76             -0.3      23.18        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     12.93             -7.0       5.94 ±  4%       -0.1      12.82        perf-profile.calltrace.cycles-pp.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.78             -6.9       5.89 ±  4%       -0.1      12.66        perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
     11.30             -5.2       6.07 ±  3%       -0.1      11.22        perf-profile.calltrace.cycles-pp.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.17             -5.2       6.02 ±  3%       -0.1      11.10        perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
      9.89             -4.8       5.08 ±  4%       -0.1       9.76        perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q
      4.52             -4.5       0.00             -0.1       4.39        perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     12.41             -4.4       7.96             -0.2      12.20        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      9.19             -4.4       4.77 ±  4%       -0.1       9.06        perf-profile.calltrace.cycles-pp.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up
     11.29             -4.0       7.29             -0.2      11.11        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     11.38             -4.0       7.40             -0.2      11.20        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      8.04             -3.9       4.13 ±  4%       -0.1       7.93        perf-profile.calltrace.cycles-pp.select_idle_core.select_idle_cpu.select_idle_sibling.select_task_rq_fair.select_task_rq
      6.91             -3.8       3.08 ±  4%       -0.1       6.80        perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive
      6.76             -3.8       3.00 ±  4%       -0.1       6.66        perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedreceive
      8.71             -3.7       5.00             -0.0       8.67        perf-profile.calltrace.cycles-pp.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.71             -3.4       2.26 ±  4%       -0.1       5.64        perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      8.12             -3.4       4.72             -0.0       8.08        perf-profile.calltrace.cycles-pp.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
      7.76             -3.2       4.55             -0.0       7.72        perf-profile.calltrace.cycles-pp.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend.__x64_sys_mq_timedsend
      8.39             -3.1       5.26             -0.0       8.35        perf-profile.calltrace.cycles-pp.wq_sleep.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.47             -3.1       4.35             -0.0       7.44        perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedsend
      4.92             -2.9       2.00 ±  4%       -0.1       4.86        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
      7.79             -2.8       4.97             -0.0       7.74        perf-profile.calltrace.cycles-pp.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
      7.48             -2.7       4.79             -0.0       7.44        perf-profile.calltrace.cycles-pp.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedreceive.__x64_sys_mq_timedreceive
      7.17             -2.6       4.60             -0.0       7.13        perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep.do_mq_timedreceive
      5.54             -2.3       3.24 ±  4%       -0.1       5.48        perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend
      5.41             -2.2       3.16 ±  4%       -0.1       5.35        perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q.do_mq_timedsend
      3.88             -2.2       1.67 ±  4%       -0.1       3.82        perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
      4.04             -2.2       1.88             -0.0       3.99        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      3.44             -2.2       1.28 ±  3%       -0.1       3.38        perf-profile.calltrace.cycles-pp.store_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.84             -2.0       1.80             -0.0       3.79        perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      4.83             -2.0       2.85             -0.0       4.80        perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep
      4.71             -1.9       2.78             -0.0       4.68        perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_hrtimeout_range_clock
      1.84             -1.8       0.00             -0.0       1.81        perf-profile.calltrace.cycles-pp.wake_affine.select_task_rq_fair.select_task_rq.try_to_wake_up.wake_up_q
      4.39             -1.8       2.58             -0.0       4.36        perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
      2.37             -1.5       0.84 ±  3%       -0.0       2.34        perf-profile.calltrace.cycles-pp._copy_to_user.store_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
      2.66             -1.4       1.26 ±  4%       -0.0       2.62        perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle
      2.73             -1.4       1.33 ±  4%       -0.0       2.70        perf-profile.calltrace.cycles-pp.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.50             -1.3       2.17             -0.0       3.49        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
      2.37             -1.3       1.06 ±  2%       -0.0       2.32        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.23             -1.2       0.00             -0.0       1.20        perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule_idle.do_idle.cpu_startup_entry
      1.15             -1.2       0.00             -0.0       1.13        perf-profile.calltrace.cycles-pp.task_h_load.wake_affine.select_task_rq_fair.select_task_rq.try_to_wake_up
      1.15             -1.1       0.00             -0.0       1.12        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule_idle.do_idle
      4.10             -1.1       3.03             +0.0       4.10        perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_hrtimeout_range_clock.wq_sleep
      2.20             -1.1       1.13 ±  5%       -0.0       2.16        perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue
      2.20             -1.1       1.15 ±  4%       -0.0       2.18        perf-profile.calltrace.cycles-pp.switch_fpu_return.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.05             -1.0       0.00             +0.0       1.06        perf-profile.calltrace.cycles-pp.set_next_task_idle.__pick_next_task.__schedule.schedule.schedule_hrtimeout_range_clock
      1.21             -1.0       0.18 ±141%       -0.0       1.20        perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend
      1.20             -1.0       0.17 ±141%       -0.0       1.19        perf-profile.calltrace.cycles-pp.do_perf_trace_sched_wakeup_template.perf_trace_sched_wakeup_template.try_to_wake_up.wake_up_q.do_mq_timedsend
      1.74             -1.0       0.73 ±  3%       +0.0       1.77        perf-profile.calltrace.cycles-pp.msg_get.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.99             -1.0       0.00             +0.0       1.00        perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.98             -1.0       0.00             +0.0       1.00        perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.32             -1.0       0.37 ± 70%       -0.0       1.32        perf-profile.calltrace.cycles-pp.do_perf_trace_sched_wakeup_template.perf_trace_sched_wakeup_template.try_to_wake_up.wake_up_q.do_mq_timedreceive
      1.93             -0.9       0.99 ±  4%       -0.0       1.89        perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
      1.34             -0.8       0.54 ±  5%       -0.0       1.34        perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.wake_up_q.do_mq_timedreceive.__x64_sys_mq_timedreceive
      1.52             -0.8       0.73 ±  4%       -0.0       1.50        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate.sched_ttwu_pending
      0.74             -0.7       0.00             -0.0       0.73        perf-profile.calltrace.cycles-pp.__smp_call_single_queue.ttwu_queue_wakelist.try_to_wake_up.wake_up_q.do_mq_timedsend
      1.05             -0.7       0.34 ± 70%       -0.0       1.05        perf-profile.calltrace.cycles-pp.__switch_to
      0.71             -0.7       0.00             -0.0       0.68        perf-profile.calltrace.cycles-pp.msg_insert.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.57             -0.7       0.90 ±  6%       -0.0       1.56        perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.arch_exit_to_user_mode_prepare.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.67             -0.7       0.00             -0.0       0.64        perf-profile.calltrace.cycles-pp.__wake_up.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66             -0.7       0.00             -0.0       0.65        perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate
      0.65             -0.6       0.00             -0.0       0.64        perf-profile.calltrace.cycles-pp.___task_rq_lock.try_to_wake_up.wake_up_q.do_mq_timedsend.__x64_sys_mq_timedsend
      0.60             -0.6       0.00             -0.0       0.58        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
      0.56             -0.6       0.00             -0.0       0.55        perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.___task_rq_lock.try_to_wake_up.wake_up_q
      0.54             -0.5       0.00             -0.0       0.53        perf-profile.calltrace.cycles-pp.os_xsave
      0.52             -0.5       0.00             -0.0       0.51        perf-profile.calltrace.cycles-pp.__switch_to_asm
      2.56             -0.3       2.28             -0.0       2.54        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_hrtimeout_range_clock
      0.00             +0.0       0.00             +0.5       0.50        perf-profile.calltrace.cycles-pp.__check_object_size.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
      5.87             +0.9       6.78             -0.0       5.86        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.00             +0.9       0.91 ± 30%       +0.0       0.00        perf-profile.calltrace.cycles-pp.sched_balance_newidle.pick_next_task_fair.__pick_next_task.__schedule.schedule
     35.80             +3.5      39.29             +0.1      35.92        perf-profile.calltrace.cycles-pp.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.59             +3.6      39.21             +0.1      35.70        perf-profile.calltrace.cycles-pp.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00             +4.4       4.37 ±  3%       +0.0       0.00        perf-profile.calltrace.cycles-pp.drain_obj_stock.__refill_obj_stock.__memcg_slab_post_alloc_hook.__kmalloc_node_noprof.load_msg
      0.00             +4.4       4.39 ±  4%       +0.0       0.00        perf-profile.calltrace.cycles-pp.drain_obj_stock.__refill_obj_stock.__memcg_slab_free_hook.kfree.free_msg
      0.00             +8.0       8.01 ±  4%       +0.0       0.00        perf-profile.calltrace.cycles-pp.__refill_obj_stock.__memcg_slab_post_alloc_hook.__kmalloc_node_noprof.load_msg.do_mq_timedsend
      0.00             +8.3       8.29 ±  4%       +0.0       0.00        perf-profile.calltrace.cycles-pp.__refill_obj_stock.__memcg_slab_free_hook.kfree.free_msg.do_mq_timedreceive
     28.51            +13.5      41.99             +0.3      28.81        perf-profile.calltrace.cycles-pp.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
     28.23            +13.5      41.71             +0.3      28.53        perf-profile.calltrace.cycles-pp.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
     70.69            +13.7      84.35             +0.3      71.02        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     70.44            +13.8      84.26             +0.3      70.77        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.99            +20.2      23.24 ±  2%       +0.4       3.35        perf-profile.calltrace.cycles-pp.free_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.79            +20.4      23.15 ±  2%       +0.4       3.16        perf-profile.calltrace.cycles-pp.kfree.free_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive.do_syscall_64
      2.43            +20.6      22.98 ±  2%       +0.4       2.80        perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kfree.free_msg.do_mq_timedreceive.__x64_sys_mq_timedreceive
      2.26            +26.0      28.23 ±  2%       +0.6       2.90        perf-profile.calltrace.cycles-pp.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.99            +26.8      27.80 ±  2%       +0.6       1.60        perf-profile.calltrace.cycles-pp.__kmalloc_node_noprof.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend.do_syscall_64
      0.65            +27.0      27.62 ±  2%       +0.6       1.24        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_noprof.load_msg.do_mq_timedsend.__x64_sys_mq_timedsend
     24.25            -12.2      12.02 ±  4%       -0.2      24.05        perf-profile.children.cycles-pp.wake_up_q
     24.00            -12.1      11.93 ±  4%       -0.2      23.81        perf-profile.children.cycles-pp.try_to_wake_up
     23.75            -10.9      12.88             -0.3      23.42        perf-profile.children.cycles-pp.common_startup_64
     23.75            -10.9      12.88             -0.3      23.42        perf-profile.children.cycles-pp.cpu_startup_entry
     23.68            -10.8      12.85             -0.3      23.34        perf-profile.children.cycles-pp.do_idle
     23.65            -10.8      12.82             -0.3      23.32        perf-profile.children.cycles-pp.start_secondary
     19.65             -8.3      11.33             -0.1      19.54        perf-profile.children.cycles-pp.__schedule
     17.14             -6.9      10.28             -0.1      17.06        perf-profile.children.cycles-pp.wq_sleep
     16.24             -6.4       9.82             -0.1      16.18        perf-profile.children.cycles-pp.schedule
     15.92             -6.2       9.69             -0.1      15.84        perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
     12.46             -6.1       6.32 ±  4%       -0.2      12.30        perf-profile.children.cycles-pp.select_task_rq
     12.19             -6.0       6.17 ±  4%       -0.2      12.03        perf-profile.children.cycles-pp.select_task_rq_fair
      9.91             -4.8       5.09 ±  4%       -0.1       9.78        perf-profile.children.cycles-pp.select_idle_sibling
      4.57             -4.6       0.02 ±141%       -0.1       4.44        perf-profile.children.cycles-pp.poll_idle
      9.27             -4.5       4.80 ±  4%       -0.1       9.14        perf-profile.children.cycles-pp.select_idle_cpu
     12.49             -4.5       8.03             -0.2      12.28        perf-profile.children.cycles-pp.cpuidle_idle_call
     11.41             -4.0       7.39             -0.2      11.23        perf-profile.children.cycles-pp.cpuidle_enter_state
     11.44             -4.0       7.43             -0.2      11.26        perf-profile.children.cycles-pp.cpuidle_enter
      8.17             -4.0       4.18 ±  4%       -0.1       8.06        perf-profile.children.cycles-pp.select_idle_core
      5.76             -3.5       2.28 ±  4%       -0.1       5.70        perf-profile.children.cycles-pp.flush_smp_call_function_queue
      5.47             -3.1       2.35 ±  4%       -0.1       5.40        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      4.30             -2.4       1.94 ±  4%       -0.1       4.24        perf-profile.children.cycles-pp.sched_ttwu_pending
      4.08             -2.2       1.90             -0.0       4.03        perf-profile.children.cycles-pp.schedule_idle
      3.47             -2.2       1.30 ±  2%       -0.1       3.42        perf-profile.children.cycles-pp.store_msg
      5.87             -2.1       3.78             -0.0       5.86        perf-profile.children.cycles-pp.__pick_next_task
      4.84             -2.0       2.86             -0.0       4.81        perf-profile.children.cycles-pp.try_to_block_task
      4.73             -1.9       2.79             -0.0       4.69        perf-profile.children.cycles-pp.dequeue_entities
      4.73             -1.9       2.82             -0.0       4.70        perf-profile.children.cycles-pp.dequeue_task_fair
      4.04             -1.8       2.26             -0.0       4.00        perf-profile.children.cycles-pp._raw_spin_lock
      3.72             -1.7       2.03 ±  4%       -0.0       3.70        perf-profile.children.cycles-pp.enqueue_task
      2.40             -1.6       0.85 ±  3%       -0.0       2.37        perf-profile.children.cycles-pp._copy_to_user
      3.39             -1.5       1.86 ±  3%       -0.0       3.36        perf-profile.children.cycles-pp.ttwu_do_activate
      2.56             -1.5       1.04 ±  5%       -0.0       2.54        perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
      2.54             -1.5       1.03 ±  5%       -0.0       2.52        perf-profile.children.cycles-pp.do_perf_trace_sched_wakeup_template
      2.76             -1.4       1.34 ±  4%       -0.0       2.73        perf-profile.children.cycles-pp.arch_exit_to_user_mode_prepare
      3.75             -1.4       2.34             -0.0       3.74        perf-profile.children.cycles-pp.dequeue_entity
      2.32             -1.4       0.95 ±  4%       -0.0       2.30        perf-profile.children.cycles-pp.ttwu_queue_wakelist
      2.72             -1.3       1.38 ±  2%       -0.0       2.72        perf-profile.children.cycles-pp.update_curr
      2.38             -1.3       1.06 ±  2%       -0.1       2.32        perf-profile.children.cycles-pp.exit_to_user_mode_loop
      4.24             -1.2       2.99             -0.0       4.21        perf-profile.children.cycles-pp.pick_next_task_fair
      2.21             -1.1       1.15 ±  5%       -0.0       2.19        perf-profile.children.cycles-pp.switch_fpu_return
      1.76             -1.0       0.73 ±  3%       +0.0       1.80        perf-profile.children.cycles-pp.msg_get
      1.67             -1.0       0.65 ±  3%       -0.1       1.54        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      2.47             -1.0       1.47 ±  3%       -0.0       2.44        perf-profile.children.cycles-pp.enqueue_task_fair
      2.29             -1.0       1.32             -0.0       2.25        perf-profile.children.cycles-pp.update_load_avg
      2.39             -1.0       1.42             -0.0       2.37        perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
      1.60             -1.0       0.64             -0.0       1.59        perf-profile.children.cycles-pp.switch_mm_irqs_off
      1.51             -0.9       0.57 ±  5%       -0.0       1.49        perf-profile.children.cycles-pp.__smp_call_single_queue
      1.84             -0.9       0.93 ±  6%       -0.0       1.82        perf-profile.children.cycles-pp.wake_affine
      1.49             -0.9       0.59             +0.0       1.49        perf-profile.children.cycles-pp.__check_object_size
      1.61             -0.9       0.72 ±  5%       +0.0       1.62        perf-profile.children.cycles-pp.update_rq_clock_task
      1.73             -0.9       0.87 ±  2%       -0.0       1.73        perf-profile.children.cycles-pp.update_se
      1.51             -0.9       0.65 ±  2%       +0.0       1.53        perf-profile.children.cycles-pp.wakeup_preempt
      2.00             -0.8       1.15 ±  3%       -0.0       1.97        perf-profile.children.cycles-pp.enqueue_entity
      1.26             -0.8       0.42 ±  3%       -0.1       1.14        perf-profile.children.cycles-pp.__wake_up
      1.31             -0.7       0.58 ±  5%       -0.0       1.30        perf-profile.children.cycles-pp.set_task_cpu
      1.15             -0.7       0.45 ±  3%       -0.0       1.10        perf-profile.children.cycles-pp.msg_insert
      1.04             -0.7       0.35 ±  5%       -0.0       1.04        perf-profile.children.cycles-pp.perf_trace_buf_alloc
      1.57             -0.7       0.90 ±  5%       -0.0       1.56        perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      1.00             -0.7       0.34 ±  7%       -0.0       1.00        perf-profile.children.cycles-pp.perf_swevent_get_recursion_context
      0.95             -0.7       0.29 ±  4%       -0.0       0.94        perf-profile.children.cycles-pp.llist_reverse_order
      1.14             -0.6       0.50 ±  5%       -0.0       1.13        perf-profile.children.cycles-pp.migrate_task_rq_fair
      1.11             -0.6       0.51             -0.0       1.10        perf-profile.children.cycles-pp.set_next_entity
      0.95             -0.6       0.39 ±  2%       +0.0       0.96        perf-profile.children.cycles-pp.__update_idle_core
      1.04             -0.6       0.49 ±  2%       -0.0       1.03        perf-profile.children.cycles-pp.pick_task_fair
      1.15             -0.6       0.60 ±  7%       -0.0       1.14        perf-profile.children.cycles-pp.task_h_load
      0.84             -0.5       0.30 ±  5%       -0.0       0.84        perf-profile.children.cycles-pp.call_function_single_prep_ipi
      1.06             -0.5       0.52             +0.0       1.07        perf-profile.children.cycles-pp.set_next_task_idle
      1.04             -0.5       0.51             -0.0       1.03        perf-profile.children.cycles-pp._find_next_bit
      1.38             -0.5       0.85             -0.0       1.36        perf-profile.children.cycles-pp.update_cfs_rq_load_avg
      1.28             -0.5       0.75             -0.0       1.27        perf-profile.children.cycles-pp.__switch_to
      0.85             -0.5       0.34 ±  3%       +0.0       0.86        perf-profile.children.cycles-pp.cpuacct_charge
      0.77 ±  4%       -0.5       0.25             -0.0       0.73        perf-profile.children.cycles-pp.__bitmap_andnot
      0.88             -0.5       0.41 ±  6%       +0.0       0.89        perf-profile.children.cycles-pp.update_entity_lag
      0.94             -0.5       0.48             -0.0       0.92        perf-profile.children.cycles-pp.prepare_task_switch
      0.74             -0.4       0.31 ±  2%       +0.0       0.76        perf-profile.children.cycles-pp.check_heap_object
      0.75             -0.4       0.32 ±  5%       +0.0       0.75        perf-profile.children.cycles-pp.requeue_delayed_entity
      0.80             -0.4       0.40             -0.0       0.80        perf-profile.children.cycles-pp.wakeup_preempt_fair
      0.55             -0.4       0.16 ±  2%       -0.0       0.54        perf-profile.children.cycles-pp.native_sched_clock
      0.57             -0.4       0.18 ±  4%       -0.0       0.56        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.55             -0.4       0.17             -0.0       0.54        perf-profile.children.cycles-pp.os_xsave
      0.58             -0.4       0.20             -0.0       0.57        perf-profile.children.cycles-pp.sched_clock_cpu
      1.39             -0.4       1.01             -0.0       1.36        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.18             -0.4       0.81             -0.0       1.18        perf-profile.children.cycles-pp.___task_rq_lock
      0.51 ±  3%       -0.4       0.14 ±  3%       -0.0       0.50        perf-profile.children.cycles-pp._copy_from_user
      0.56             -0.4       0.19 ±  2%       -0.0       0.54        perf-profile.children.cycles-pp.update_rq_clock
      0.51             -0.3       0.17 ±  2%       -0.0       0.50        perf-profile.children.cycles-pp.sched_clock
      0.56             -0.3       0.23 ±  3%       +0.0       0.58        perf-profile.children.cycles-pp.simple_inode_init_ts
      0.89             -0.3       0.56             +0.0       0.89        perf-profile.children.cycles-pp.put_prev_entity
      0.53 ±  3%       -0.3       0.21 ±  2%       +0.0       0.53        perf-profile.children.cycles-pp.__put_user_4
      0.68 ± 12%       -0.3       0.36 ±  4%       +0.1       0.76        perf-profile.children.cycles-pp.stress_switch_mq
      0.52             -0.3       0.20             -0.0       0.50        perf-profile.children.cycles-pp.set_next_task_fair
      0.55             -0.3       0.24 ±  5%       -0.0       0.54        perf-profile.children.cycles-pp.__switch_to_asm
      0.51             -0.3       0.21 ±  3%       +0.0       0.52        perf-profile.children.cycles-pp.inode_set_ctime_current
      0.56             -0.3       0.26 ±  3%       +0.0       0.57        perf-profile.children.cycles-pp.fdget
      0.52             -0.3       0.23 ±  3%       -0.0       0.51        perf-profile.children.cycles-pp.mm_cid_switch_to
      0.54             -0.3       0.25 ±  3%       -0.0       0.54        perf-profile.children.cycles-pp.remove_entity_load_avg
      0.79             -0.3       0.51             -0.0       0.78        perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.37             -0.3       0.11 ±  7%       +0.0       0.37        perf-profile.children.cycles-pp.__resched_curr
      0.57             -0.2       0.32             -0.0       0.56        perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.35             -0.2       0.12 ±  4%       -0.0       0.35        perf-profile.children.cycles-pp.avg_vruntime
      0.62             -0.2       0.39 ±  2%       -0.0       0.62        perf-profile.children.cycles-pp.sysvec_call_function_single
      0.34             -0.2       0.11             -0.0       0.33        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.58             -0.2       0.36             +0.0       0.59        perf-profile.children.cycles-pp.__enqueue_entity
      0.46             -0.2       0.24             -0.0       0.45        perf-profile.children.cycles-pp.__pick_eevdf
      0.52             -0.2       0.30 ±  3%       +0.0       0.52        perf-profile.children.cycles-pp.perf_tp_event
      0.35             -0.2       0.14 ±  3%       -0.0       0.35        perf-profile.children.cycles-pp.__wrgsbase_inactive
      0.56             -0.2       0.36             -0.0       0.56        perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.33             -0.2       0.13 ±  3%       -0.0       0.32        perf-profile.children.cycles-pp.__update_load_avg_se
      0.31 ±  2%       -0.2       0.12 ±  4%       +0.0       0.33        perf-profile.children.cycles-pp.__virt_addr_valid
      0.58             -0.2       0.40             -0.0       0.58        perf-profile.children.cycles-pp.menu_select
      0.32 ±  5%       -0.2       0.13             -0.0       0.29 ±  3%  perf-profile.children.cycles-pp.__check_heap_object
      0.31             -0.2       0.13 ±  3%       -0.0       0.30        perf-profile.children.cycles-pp.place_entity
      0.32             -0.2       0.14             -0.0       0.32        perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.36             -0.2       0.18 ±  2%       -0.0       0.36        perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
      0.29             -0.2       0.11             -0.0       0.28        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.33             -0.2       0.16 ±  3%       -0.0       0.32        perf-profile.children.cycles-pp.do_perf_trace_sched_stat_runtime
      0.35             -0.2       0.18 ±  2%       +0.0       0.36        perf-profile.children.cycles-pp.ktime_get
      0.25             -0.2       0.08 ±  5%       -0.0       0.25        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.37             -0.2       0.20 ±  4%       -0.0       0.36        perf-profile.children.cycles-pp.___perf_sw_event
      0.23             -0.2       0.07 ±  6%       -0.0       0.22 ±  2%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.22             -0.2       0.07             -0.0       0.22 ±  2%  perf-profile.children.cycles-pp.read_tsc
      0.21             -0.1       0.06 ±  7%       -0.0       0.20 ±  2%  perf-profile.children.cycles-pp.__rdgsbase_inactive
      0.25             -0.1       0.11 ±  7%       -0.0       0.25        perf-profile.children.cycles-pp.strnlen
      0.16             -0.1       0.02 ±141%       -0.0       0.15        perf-profile.children.cycles-pp.ct_idle_exit
      0.32             -0.1       0.18             -0.0       0.31        perf-profile.children.cycles-pp.__dequeue_entity
      0.23 ±  2%       -0.1       0.10 ±  4%       +0.0       0.24 ±  2%  perf-profile.children.cycles-pp.check_stack_object
      0.53             -0.1       0.40 ±  2%       -0.0       0.52        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.23 ±  2%       -0.1       0.09 ±  5%       -0.0       0.22 ±  2%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.30 ±  3%       -0.1       0.17 ±  4%       -0.0       0.30        perf-profile.children.cycles-pp.tick_nohz_handler
      0.20             -0.1       0.07             +0.4       0.57 ±  3%  perf-profile.children.cycles-pp.__account_obj_stock
      0.13             -0.1       0.00             -0.0       0.12        perf-profile.children.cycles-pp.ct_kernel_enter
      0.13             -0.1       0.00             -0.0       0.12        perf-profile.children.cycles-pp.ct_kernel_exit_state
      0.41             -0.1       0.28 ±  2%       -0.0       0.40        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.31 ±  2%       -0.1       0.18 ±  2%       -0.0       0.30        perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.41 ±  2%       -0.1       0.29 ±  3%       -0.0       0.41        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.28 ±  2%       -0.1       0.16 ±  6%       -0.0       0.27        perf-profile.children.cycles-pp.update_process_times
      0.33             -0.1       0.21 ±  6%       +0.0       0.33        perf-profile.children.cycles-pp.attach_entity_load_avg
      0.12 ±  3%       -0.1       0.00             -0.0       0.08        perf-profile.children.cycles-pp.__do_notify
      0.19 ±  2%       -0.1       0.07 ±  7%       -0.0       0.18        perf-profile.children.cycles-pp.wake_q_add_safe
      0.23 ±  2%       -0.1       0.11             -0.0       0.22        perf-profile.children.cycles-pp.__kmalloc_cache_noprof
      0.18             -0.1       0.07 ±  6%       -0.0       0.18        perf-profile.children.cycles-pp.nohz_run_idle_balance
      0.56             -0.1       0.46             -0.0       0.56        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.15             -0.1       0.05             +0.0       0.15        perf-profile.children.cycles-pp._raw_spin_unlock
      0.17             -0.1       0.08             -0.0       0.16        perf-profile.children.cycles-pp.security_msg_msg_free
      0.15             -0.1       0.06             +0.0       0.16        perf-profile.children.cycles-pp.inode_set_ctime_to_ts
      0.08 ±  5%       -0.1       0.00             +0.0       0.11        perf-profile.children.cycles-pp.trylock_stock
      0.16 ±  2%       -0.1       0.08 ±  5%       -0.0       0.16        perf-profile.children.cycles-pp.dl_server_update
      0.13             -0.1       0.06 ±  8%       +0.0       0.13        perf-profile.children.cycles-pp.timestamp_truncate
      0.15 ±  3%       -0.1       0.08             +0.0       0.17        perf-profile.children.cycles-pp.perf_trace_buf_update
      0.13 ±  3%
-0.1 0.06 +0.0 0.13 perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg 0.13 ± 3% -0.1 0.06 +0.0 0.13 perf-profile.children.cycles-pp.migrate_disable_switch 0.12 ± 6% -0.1 0.05 ± 8% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.__cgroup_account_cputime 0.11 ± 4% -0.1 0.05 -0.0 0.11 perf-profile.children.cycles-pp.cpuidle_governor_latency_req 0.06 -0.1 0.00 +0.0 0.07 perf-profile.children.cycles-pp.raw_spin_rq_unlock 0.14 -0.1 0.08 ± 5% +0.0 0.14 perf-profile.children.cycles-pp.update_curr_dl_se 0.06 ± 16% -0.1 0.00 +0.1 0.12 ± 4% perf-profile.children.cycles-pp.css_rstat_updated 0.10 -0.1 0.05 -0.0 0.10 perf-profile.children.cycles-pp.ct_kernel_exit 0.11 -0.1 0.06 +0.0 0.11 perf-profile.children.cycles-pp.tracing_gen_ctx_irq_test 0.10 ± 4% -0.0 0.05 +0.0 0.10 perf-profile.children.cycles-pp.__rb_insert_augmented 0.10 -0.0 0.06 ± 8% -0.0 0.10 perf-profile.children.cycles-pp.rest_init 0.10 -0.0 0.06 ± 8% -0.0 0.10 perf-profile.children.cycles-pp.start_kernel 0.10 -0.0 0.06 ± 8% -0.0 0.10 perf-profile.children.cycles-pp.x86_64_start_kernel 0.10 -0.0 0.06 ± 8% -0.0 0.10 perf-profile.children.cycles-pp.x86_64_start_reservations 0.08 ± 10% -0.0 0.04 ± 71% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.mq_timedreceive 0.15 -0.0 0.11 ± 4% +0.0 0.15 perf-profile.children.cycles-pp.vruntime_eligible 0.13 -0.0 0.09 +0.0 0.13 perf-profile.children.cycles-pp.put_prev_task_fair 0.09 -0.0 0.05 ± 8% -0.0 0.09 perf-profile.children.cycles-pp.native_irq_return_iret 0.08 -0.0 0.05 +0.0 0.08 perf-profile.children.cycles-pp.choose_new_asid 0.13 -0.0 0.11 +0.0 0.13 perf-profile.children.cycles-pp.__irq_exit_rcu 0.07 -0.0 0.05 +0.0 0.07 perf-profile.children.cycles-pp.__set_next_task_fair 0.76 -0.0 0.74 -0.0 0.74 perf-profile.children.cycles-pp.finish_task_switch 0.09 -0.0 0.07 ± 6% -0.0 0.09 perf-profile.children.cycles-pp.propagate_entity_load_avg 0.10 -0.0 0.09 -0.0 0.10 perf-profile.children.cycles-pp.handle_softirqs 0.07 -0.0 0.06 +0.0 0.08 
perf-profile.children.cycles-pp.clockevents_program_event 0.00 +0.0 0.00 +0.1 0.14 ± 3% perf-profile.children.cycles-pp.__mod_memcg_state 0.00 +0.0 0.00 +0.1 0.14 ± 3% perf-profile.children.cycles-pp.try_charge_memcg 0.00 +0.0 0.00 +0.3 0.32 ± 4% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 0.07 +0.0 0.08 +0.0 0.07 perf-profile.children.cycles-pp.perf_swevent_event 0.48 +0.0 0.49 +0.0 0.48 perf-profile.children.cycles-pp.process_simple 0.05 +0.0 0.06 ± 7% -0.0 0.05 perf-profile.children.cycles-pp.sched_update_worker 0.05 +0.0 0.07 -0.0 0.05 perf-profile.children.cycles-pp.arch_cpu_idle_enter 0.07 ± 11% +0.0 0.09 ± 5% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.mq_timedsend 0.15 ± 3% +0.0 0.20 ± 4% -0.0 0.15 perf-profile.children.cycles-pp.x64_sys_call 0.00 +0.1 0.05 +0.0 0.00 perf-profile.children.cycles-pp.__sched_balance_update_blocked_averages 0.00 +0.1 0.05 +0.0 0.00 perf-profile.children.cycles-pp.update_cfs_group 0.00 +0.1 0.06 ± 23% +0.0 0.00 perf-profile.children.cycles-pp.generic_perform_write 0.00 +0.1 0.06 ± 7% +0.0 0.00 perf-profile.children.cycles-pp.detach_tasks 0.00 +0.1 0.06 ± 29% +0.0 0.00 perf-profile.children.cycles-pp.shmem_file_write_iter 0.00 +0.1 0.06 ± 29% +0.0 0.00 perf-profile.children.cycles-pp.vfs_write 0.00 +0.1 0.07 ± 25% +0.0 0.00 perf-profile.children.cycles-pp.ksys_write 0.00 +0.1 0.08 ± 30% +0.0 0.00 perf-profile.children.cycles-pp.record__pushfn 0.04 ± 71% +0.1 0.12 ± 35% -0.0 0.03 ±100% perf-profile.children.cycles-pp.perf_mmap__push 0.54 ± 2% +0.1 0.62 ± 7% -0.0 0.54 ± 2% perf-profile.children.cycles-pp.cmd_record 0.04 ± 70% +0.1 0.13 ± 32% -0.0 0.03 ±100% perf-profile.children.cycles-pp.record__mmap_read_evlist 0.04 ± 71% +0.1 0.13 ± 35% -0.0 0.04 ±100% perf-profile.children.cycles-pp.handle_internal_command 0.04 ± 71% +0.1 0.13 ± 35% -0.0 0.04 ±100% perf-profile.children.cycles-pp.main 0.04 ± 71% +0.1 0.13 ± 35% -0.0 0.04 ±100% perf-profile.children.cycles-pp.run_builtin 0.10 ± 4% +0.1 0.20 ± 4% -0.0 0.10 
perf-profile.children.cycles-pp.do_perf_trace_sched_switch 0.00 +0.1 0.10 ± 4% +0.0 0.00 perf-profile.children.cycles-pp.ct_idle_enter 0.13 ± 3% +0.2 0.29 ± 4% -0.0 0.13 perf-profile.children.cycles-pp.perf_trace_sched_switch 0.21 ± 6% +0.6 0.78 ± 3% -0.0 0.20 perf-profile.children.cycles-pp.update_sg_lb_stats 0.22 ± 6% +0.6 0.81 ± 4% -0.0 0.21 perf-profile.children.cycles-pp.update_sd_lb_stats 0.22 ± 6% +0.6 0.81 ± 3% -0.0 0.21 perf-profile.children.cycles-pp.sched_balance_find_src_group 0.40 ± 3% +0.7 1.08 ± 4% -0.0 0.40 perf-profile.children.cycles-pp.sched_balance_newidle 0.27 ± 6% +0.7 0.97 ± 4% -0.0 0.26 perf-profile.children.cycles-pp.sched_balance_rq 5.90 +0.9 6.81 -0.0 5.88 perf-profile.children.cycles-pp.intel_idle 35.82 +3.5 39.30 +0.1 35.93 perf-profile.children.cycles-pp.__x64_sys_mq_timedreceive 35.70 +3.5 39.25 +0.1 35.82 perf-profile.children.cycles-pp.do_mq_timedreceive 0.00 +8.8 8.78 ± 3% +0.0 0.00 perf-profile.children.cycles-pp.drain_obj_stock 28.34 +13.4 41.76 +0.3 28.64 perf-profile.children.cycles-pp.do_mq_timedsend 28.52 +13.5 41.99 +0.3 28.82 perf-profile.children.cycles-pp.__x64_sys_mq_timedsend 70.76 +13.7 84.45 +0.3 71.10 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 70.56 +13.8 84.39 +0.3 70.90 perf-profile.children.cycles-pp.do_syscall_64 0.08 +16.2 16.31 ± 4% +0.2 0.27 ± 3% perf-profile.children.cycles-pp.__refill_obj_stock 3.22 +20.1 23.33 ± 2% +0.4 3.62 perf-profile.children.cycles-pp.kfree 3.01 +20.2 23.25 ± 2% +0.4 3.38 perf-profile.children.cycles-pp.free_msg 2.46 +20.5 23.00 ± 2% +0.4 2.83 perf-profile.children.cycles-pp.__memcg_slab_free_hook 2.31 +25.9 28.25 ± 2% +0.6 2.94 perf-profile.children.cycles-pp.load_msg 1.00 +26.8 27.81 ± 2% +0.6 1.62 perf-profile.children.cycles-pp.__kmalloc_node_noprof 0.68 +27.0 27.65 ± 2% +0.6 1.29 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook 4.41 -4.4 0.02 ±141% -0.1 4.28 perf-profile.self.cycles-pp.poll_idle 6.79 -3.2 3.63 ± 5% -0.1 6.70 
perf-profile.self.cycles-pp.select_idle_core 3.07 -1.7 1.33 ± 4% -0.0 3.06 perf-profile.self.cycles-pp.do_mq_timedreceive 2.90 -1.6 1.31 ± 3% -0.0 2.88 perf-profile.self.cycles-pp.__schedule 2.37 -1.5 0.84 ± 3% -0.0 2.34 perf-profile.self.cycles-pp._copy_to_user 2.73 -1.5 1.22 ± 3% -0.0 2.70 perf-profile.self.cycles-pp.do_mq_timedsend 2.63 -1.4 1.25 ± 3% -0.0 2.59 perf-profile.self.cycles-pp._raw_spin_lock 1.64 -1.0 0.64 ± 3% -0.1 1.52 perf-profile.self.cycles-pp._raw_spin_lock_irqsave 1.46 -0.9 0.55 ± 2% -0.0 1.45 perf-profile.self.cycles-pp.switch_mm_irqs_off 1.54 -0.9 0.67 ± 6% +0.0 1.54 perf-profile.self.cycles-pp.update_rq_clock_task 1.48 -0.9 0.62 ± 3% +0.0 1.49 perf-profile.self.cycles-pp.msg_get 1.36 -0.8 0.58 ± 3% -0.1 1.30 perf-profile.self.cycles-pp.exit_to_user_mode_loop 1.11 -0.7 0.44 ± 4% -0.0 1.08 perf-profile.self.cycles-pp.msg_insert 1.56 -0.7 0.90 ± 6% -0.0 1.56 perf-profile.self.cycles-pp.restore_fpregs_from_fpstate 0.95 -0.7 0.29 ± 5% -0.0 0.94 perf-profile.self.cycles-pp.llist_reverse_order 1.00 -0.7 0.34 ± 7% -0.0 0.99 perf-profile.self.cycles-pp.perf_swevent_get_recursion_context 1.19 -0.6 0.59 ± 3% -0.0 1.18 perf-profile.self.cycles-pp.wq_sleep 0.92 -0.6 0.36 ± 6% -0.0 0.92 perf-profile.self.cycles-pp.do_perf_trace_sched_wakeup_template 0.90 -0.6 0.34 ± 2% +0.0 0.92 perf-profile.self.cycles-pp.__update_idle_core 1.15 -0.5 0.60 ± 7% -0.0 1.14 perf-profile.self.cycles-pp.task_h_load 0.83 -0.5 0.30 ± 5% -0.0 0.83 perf-profile.self.cycles-pp.call_function_single_prep_ipi 0.97 -0.5 0.44 ± 2% -0.0 0.94 perf-profile.self.cycles-pp.dequeue_entities 0.84 -0.5 0.34 ± 3% +0.0 0.86 perf-profile.self.cycles-pp.cpuacct_charge 0.96 -0.5 0.47 ± 4% +0.0 0.96 perf-profile.self.cycles-pp.do_syscall_64 1.23 -0.5 0.74 -0.0 1.22 perf-profile.self.cycles-pp.__switch_to 0.73 ± 4% -0.5 0.24 ± 3% -0.0 0.70 perf-profile.self.cycles-pp.__bitmap_andnot 0.69 -0.5 0.20 ± 4% +0.0 0.69 perf-profile.self.cycles-pp.flush_smp_call_function_queue 0.94 -0.5 0.47 +0.0 0.94 ± 2% 
perf-profile.self.cycles-pp._find_next_bit 0.77 -0.4 0.36 ± 6% +0.0 0.77 perf-profile.self.cycles-pp.update_entity_lag 0.76 -0.4 0.36 ± 3% -0.0 0.74 perf-profile.self.cycles-pp.update_load_avg 0.67 -0.4 0.27 ± 5% -0.0 0.66 perf-profile.self.cycles-pp.__smp_call_single_queue 0.70 -0.4 0.31 ± 2% +0.0 0.70 perf-profile.self.cycles-pp.pick_next_task_fair 0.55 -0.4 0.17 ± 2% -0.0 0.54 perf-profile.self.cycles-pp.os_xsave 0.56 -0.4 0.18 ± 4% -0.0 0.54 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 0.63 -0.4 0.25 -0.0 0.62 perf-profile.self.cycles-pp.switch_fpu_return 1.38 -0.4 1.01 -0.0 1.36 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath 0.69 -0.4 0.33 ± 5% -0.0 0.68 perf-profile.self.cycles-pp.wake_affine 0.53 -0.4 0.16 -0.0 0.51 perf-profile.self.cycles-pp.native_sched_clock 0.67 -0.4 0.30 ± 2% +0.0 0.70 perf-profile.self.cycles-pp.kfree 0.75 -0.4 0.39 ± 3% +0.0 0.76 perf-profile.self.cycles-pp.update_curr 0.67 -0.4 0.31 ± 5% -0.0 0.66 perf-profile.self.cycles-pp.ttwu_queue_wakelist 0.53 -0.4 0.18 ± 2% -0.0 0.52 perf-profile.self.cycles-pp.arch_exit_to_user_mode_prepare 0.49 ± 2% -0.4 0.14 ± 3% -0.0 0.48 perf-profile.self.cycles-pp._copy_from_user 0.63 -0.3 0.28 ± 3% -0.0 0.62 perf-profile.self.cycles-pp.schedule_hrtimeout_range_clock 0.58 -0.3 0.23 ± 4% -0.0 0.57 perf-profile.self.cycles-pp.select_idle_sibling 0.58 -0.3 0.24 ± 7% -0.0 0.56 perf-profile.self.cycles-pp.migrate_task_rq_fair 0.64 -0.3 0.31 -0.0 0.63 perf-profile.self.cycles-pp.prepare_task_switch 0.82 -0.3 0.49 ± 3% -0.0 0.82 perf-profile.self.cycles-pp.select_idle_cpu 0.52 ± 3% -0.3 0.21 ± 2% +0.0 0.52 perf-profile.self.cycles-pp.__put_user_4 0.63 ± 11% -0.3 0.32 ± 5% +0.0 0.64 perf-profile.self.cycles-pp.stress_switch_mq 0.54 -0.3 0.24 ± 5% -0.0 0.53 perf-profile.self.cycles-pp.__switch_to_asm 0.54 -0.3 0.25 ± 3% +0.0 0.55 perf-profile.self.cycles-pp.fdget 0.51 -0.3 0.22 ± 4% -0.0 0.50 perf-profile.self.cycles-pp.mm_cid_switch_to 0.44 -0.3 0.15 ± 6% -0.0 0.44 
perf-profile.self.cycles-pp.select_task_rq_fair 0.43 -0.3 0.15 ± 5% -0.0 0.42 perf-profile.self.cycles-pp.sched_ttwu_pending 0.49 -0.3 0.23 ± 6% +0.0 0.50 perf-profile.self.cycles-pp.enqueue_task 0.59 -0.2 0.34 ± 2% -0.0 0.58 perf-profile.self.cycles-pp.try_to_wake_up 0.73 -0.2 0.48 -0.0 0.72 perf-profile.self.cycles-pp.update_cfs_rq_load_avg 0.35 -0.2 0.11 ± 4% +0.0 0.35 perf-profile.self.cycles-pp.__resched_curr 0.40 -0.2 0.16 ± 2% -0.0 0.40 perf-profile.self.cycles-pp.__pick_next_task 0.36 -0.2 0.12 ± 3% -0.0 0.35 perf-profile.self.cycles-pp.cpuidle_idle_call 0.34 -0.2 0.11 -0.0 0.33 perf-profile.self.cycles-pp.entry_SYSCALL_64 0.57 -0.2 0.36 +0.0 0.58 perf-profile.self.cycles-pp.__enqueue_entity 0.51 -0.2 0.31 -0.0 0.50 perf-profile.self.cycles-pp.__update_load_avg_cfs_rq 0.34 -0.2 0.14 ± 5% +0.0 0.36 perf-profile.self.cycles-pp.wakeup_preempt 0.34 -0.2 0.14 ± 3% -0.0 0.34 perf-profile.self.cycles-pp.__wrgsbase_inactive 0.39 -0.2 0.20 ± 2% -0.0 0.39 perf-profile.self.cycles-pp.do_idle 0.31 ± 5% -0.2 0.11 -0.0 0.28 ± 5% perf-profile.self.cycles-pp.__check_heap_object 0.37 -0.2 0.17 ± 2% +0.0 0.37 perf-profile.self.cycles-pp.check_heap_object 0.51 -0.2 0.32 +0.0 0.51 perf-profile.self.cycles-pp.schedule 0.36 -0.2 0.18 ± 2% -0.0 0.36 perf-profile.self.cycles-pp.__pick_eevdf 0.61 -0.2 0.43 +0.0 0.63 perf-profile.self.cycles-pp.dequeue_entity 0.29 ± 2% -0.2 0.11 ± 4% +0.0 0.31 perf-profile.self.cycles-pp.__virt_addr_valid 0.30 -0.2 0.12 -0.0 0.29 perf-profile.self.cycles-pp.__update_load_avg_se 0.27 -0.2 0.10 ± 4% -0.0 0.27 perf-profile.self.cycles-pp.syscall_return_via_sysret 0.25 -0.2 0.08 ± 5% -0.0 0.24 ± 2% perf-profile.self.cycles-pp.__check_object_size 0.26 -0.2 0.11 ± 4% -0.0 0.25 perf-profile.self.cycles-pp.place_entity 0.45 -0.2 0.30 ± 3% -0.0 0.44 perf-profile.self.cycles-pp.enqueue_entity 0.44 -0.2 0.29 ± 5% -0.0 0.44 perf-profile.self.cycles-pp.enqueue_task_fair 0.24 -0.2 0.09 +0.0 0.24 perf-profile.self.cycles-pp.wake_up_q 0.22 ± 2% -0.1 0.07 -0.0 0.21 
perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 0.24 -0.1 0.09 ± 5% +0.0 0.24 perf-profile.self.cycles-pp.___perf_sw_event 0.32 -0.1 0.18 ± 2% +0.0 0.32 perf-profile.self.cycles-pp.dequeue_task_fair 0.21 ± 2% -0.1 0.06 ± 7% -0.0 0.20 perf-profile.self.cycles-pp.read_tsc 0.20 -0.1 0.06 -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__rdgsbase_inactive 0.36 -0.1 0.22 ± 4% -0.0 0.35 perf-profile.self.cycles-pp.perf_tp_event 0.24 -0.1 0.11 ± 4% -0.0 0.24 perf-profile.self.cycles-pp.strnlen 0.28 -0.1 0.14 +0.0 0.28 perf-profile.self.cycles-pp.__kmalloc_node_noprof 0.21 -0.1 0.08 ± 6% +0.0 0.23 perf-profile.self.cycles-pp.load_msg 0.33 -0.1 0.20 ± 4% -0.0 0.32 perf-profile.self.cycles-pp.attach_entity_load_avg 0.12 ± 3% -0.1 0.00 -0.0 0.08 ± 6% perf-profile.self.cycles-pp.__do_notify 0.41 -0.1 0.29 -0.0 0.40 perf-profile.self.cycles-pp.update_se 0.44 -0.1 0.33 -0.0 0.44 perf-profile.self.cycles-pp.menu_select 0.27 -0.1 0.15 ± 6% -0.0 0.26 perf-profile.self.cycles-pp.select_task_rq 0.17 ± 2% -0.1 0.06 ± 8% -0.0 0.17 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 0.23 ± 2% -0.1 0.11 ± 4% -0.0 0.22 perf-profile.self.cycles-pp.__flush_smp_call_function_queue 0.19 -0.1 0.08 ± 6% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.update_rq_clock 0.18 ± 2% -0.1 0.06 ± 7% -0.0 0.17 perf-profile.self.cycles-pp.wake_q_add_safe 0.20 ± 2% -0.1 0.09 +0.0 0.21 perf-profile.self.cycles-pp.check_stack_object 0.18 -0.1 0.07 +0.1 0.26 perf-profile.self.cycles-pp.__account_obj_stock 0.21 ± 2% -0.1 0.10 ± 4% -0.0 0.20 perf-profile.self.cycles-pp.__kmalloc_cache_noprof 0.24 -0.1 0.14 -0.0 0.24 ± 2% perf-profile.self.cycles-pp.__dequeue_entity 0.19 -0.1 0.10 ± 4% -0.0 0.19 perf-profile.self.cycles-pp.pick_task_fair 0.14 ± 3% -0.1 0.05 +0.0 0.14 ± 3% perf-profile.self.cycles-pp.inode_set_ctime_current 0.16 ± 3% -0.1 0.06 ± 7% +0.0 0.16 perf-profile.self.cycles-pp.nohz_run_idle_balance 0.15 -0.1 0.06 +0.0 0.15 perf-profile.self.cycles-pp.avg_vruntime 0.16 -0.1 0.07 +0.0 0.16 
perf-profile.self.cycles-pp.schedule_idle 0.08 -0.1 0.00 -0.0 0.07 perf-profile.self.cycles-pp.security_msg_msg_free 0.07 -0.1 0.00 -0.0 0.06 perf-profile.self.cycles-pp.ct_kernel_enter 0.07 -0.1 0.00 +0.0 0.08 ± 5% perf-profile.self.cycles-pp.trylock_stock 0.15 -0.1 0.08 ± 5% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.dl_server_update 0.15 -0.1 0.08 ± 5% +0.0 0.15 perf-profile.self.cycles-pp.raw_spin_rq_lock_nested 0.12 -0.1 0.05 ± 8% +0.0 0.12 perf-profile.self.cycles-pp.migrate_disable_switch 0.12 ± 4% -0.1 0.05 +0.0 0.12 perf-profile.self.cycles-pp.__x64_sys_mq_timedreceive 0.13 -0.1 0.07 ± 7% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.___task_rq_lock 0.09 ± 5% -0.1 0.03 ± 70% +0.0 0.11 perf-profile.self.cycles-pp.inode_set_ctime_to_ts 0.22 -0.1 0.16 -0.0 0.21 perf-profile.self.cycles-pp.cpuidle_enter_state 0.12 -0.1 0.06 -0.0 0.12 ± 4% perf-profile.self.cycles-pp.store_msg 0.12 -0.1 0.06 -0.0 0.12 ± 4% perf-profile.self.cycles-pp.wakeup_preempt_fair 0.11 -0.1 0.05 +0.0 0.11 perf-profile.self.cycles-pp.timestamp_truncate 0.11 ± 4% -0.1 0.05 ± 8% +0.0 0.12 perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg 0.12 ± 4% -0.1 0.06 ± 7% -0.0 0.11 perf-profile.self.cycles-pp.do_perf_trace_sched_stat_runtime 0.11 ± 4% -0.1 0.06 ± 8% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.tracing_gen_ctx_irq_test 0.07 ± 11% -0.1 0.02 ±141% +0.0 0.11 ± 9% perf-profile.self.cycles-pp.mq_timedreceive 0.13 ± 3% -0.0 0.08 +0.0 0.13 perf-profile.self.cycles-pp.update_curr_dl_se 0.15 -0.0 0.11 ± 4% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.sched_balance_newidle 0.09 -0.0 0.05 ± 8% -0.0 0.09 perf-profile.self.cycles-pp.native_irq_return_iret 0.14 ± 3% -0.0 0.10 ± 4% -0.0 0.13 perf-profile.self.cycles-pp.vruntime_eligible 0.14 ± 3% -0.0 0.11 +0.0 0.14 ± 3% perf-profile.self.cycles-pp.ktime_get 0.07 ± 7% -0.0 0.03 ± 70% -0.0 0.06 perf-profile.self.cycles-pp.__set_next_task_fair 0.15 ± 3% -0.0 0.13 +0.0 0.16 perf-profile.self.cycles-pp.put_prev_entity 0.02 ±141% -0.0 0.00 +0.1 0.09 
perf-profile.self.cycles-pp.css_rstat_updated 0.02 ±141% -0.0 0.00 +0.0 0.06 ± 9% perf-profile.self.cycles-pp.perf_trace_buf_update 0.19 -0.0 0.18 ± 2% -0.0 0.18 perf-profile.self.cycles-pp.__x64_sys_mq_timedsend 0.00 +0.0 0.00 +0.1 0.11 perf-profile.self.cycles-pp.__mod_memcg_state 0.00 +0.0 0.00 +0.1 0.12 ± 4% perf-profile.self.cycles-pp.try_charge_memcg 0.00 +0.0 0.00 +0.3 0.26 ± 3% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.11 ± 4% +0.0 0.12 ± 3% +0.0 0.11 perf-profile.self.cycles-pp.set_next_task_idle 0.06 +0.0 0.08 ± 6% +0.0 0.06 perf-profile.self.cycles-pp.perf_swevent_event 0.07 ± 7% +0.0 0.09 ± 10% +0.0 0.11 perf-profile.self.cycles-pp.mq_timedsend 0.00 +0.1 0.05 +0.0 0.00 perf-profile.self.cycles-pp.ct_idle_enter 0.00 +0.1 0.05 +0.0 0.00 perf-profile.self.cycles-pp.perf_trace_sched_switch 0.00 +0.1 0.06 +0.0 0.00 perf-profile.self.cycles-pp.sched_update_worker 0.12 +0.1 0.18 ± 6% +0.0 0.12 perf-profile.self.cycles-pp.x64_sys_call 0.38 ± 2% +0.1 0.46 -0.0 0.37 perf-profile.self.cycles-pp.finish_task_switch 0.08 ± 5% +0.1 0.20 ± 6% -0.0 0.08 perf-profile.self.cycles-pp.do_perf_trace_sched_switch 0.18 ± 5% +0.5 0.71 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.update_sg_lb_stats 5.89 +0.9 6.81 -0.0 5.88 perf-profile.self.cycles-pp.intel_idle 0.07 +7.4 7.48 ± 4% +0.1 0.22 ± 6% perf-profile.self.cycles-pp.__refill_obj_stock 0.00 +8.7 8.70 ± 3% +0.0 0.00 perf-profile.self.cycles-pp.drain_obj_stock 2.25 +12.4 14.61 +0.0 2.26 perf-profile.self.cycles-pp.__memcg_slab_free_hook 0.53 +19.0 19.48 +0.1 0.67 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
2026-05-17 12:55 ` Oliver Sang
@ 2026-05-17 19:38 ` Shakeel Butt
0 siblings, 1 reply; 13+ messages in thread
From: Shakeel Butt @ 2026-05-17 19:38 UTC
To: Oliver Sang
Cc: Qi Zheng, oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou, Chen Ridong,
David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins, Imran Khan,
Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin, Suren Baghdasaryan,
Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie,
Zi Yan, Usama Arif, cgroups, linux-mm

On Sun, May 17, 2026 at 08:55:50PM +0800, Oliver Sang wrote:
> hi, Shakeel, hi, Qi,
>
> On Fri, May 15, 2026 at 10:09:06AM -0700, Shakeel Butt wrote:
> > On Fri, May 15, 2026 at 03:37:22PM +0800, Qi Zheng wrote:
> > > Hi Shakeel,
> > >
> > > On 5/14/26 9:40 PM, Shakeel Butt wrote:
> > > > May 14, 2026 at 12:46 AM, "Qi Zheng" <qi.zheng@linux.dev> wrote:
> > > >
> > > > > On 5/13/26 10:27 PM, Shakeel Butt wrote:
> > > > >
> > > > > > On Wed, May 13, 2026 at 06:49:45AM -0700, Shakeel Butt wrote:
> > > > > >
> > > > > > > On Wed, May 13, 2026 at 10:10:34AM +0800, Qi Zheng wrote:
> > > > > > >
> > > > > > On 5/13/26 12:03 AM, Shakeel Butt wrote:
> > > > > > On Tue, May 12, 2026 at 08:56:52PM +0800, kernel test robot wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > kernel test robot noticed a 67.7% regression of stress-ng.switch.ops_per_sec on:
> > > > > >
> > > > > > commit: 01b9da291c4969354807b52956f4aae1f41b4924 ("mm: memcontrol: convert objcg to be per-memcg per-node type")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > > >
> > > > > > This is most probably due to shuffling of struct mem_cgroup and struct
> > > > > > mem_cgroup_per_node members.
> > > > > >
> > > > > > Another possibility is that after objcg was split into per-node, the
> > > > > > slab accounting fast path is still designed assuming only one current
> > > > > > objcg per CPU:
> > > > > >
> > > > > > struct obj_stock_pcp {
> > > > > > 	struct obj_cgroup *cached_objcg;
> > > > > > };
> > > > > >
> > > > > > So it may cause the following thrashing:
> > > > > >
> > > > > > CPU stock cached = memcg/node0 objcg
> > > > > > free object tagged = memcg/node1 objcg
> > > > > >   => __refill_obj_stock --> objcg mismatch
> > > > > >   => drain_obj_stock()
> > > > > >   => cache switches to node1 objcg
> > > > > >
> > > > > > next local allocation tagged = node0 objcg
> > > > > >   => mismatch again
> > > > > >   => drain_obj_stock()
> > > > > >
> > > > > > > Actually I think this is the issue, we have ping pong threads running on
> > > > > > > different nodes where, though they are in the same cgroup, their current->objcg is
> > > > > > > for the local node, and thus this ping pong is thrashing the per-cpu objcg stock.
> > > > > > >
> > > > > > > The easier fix would be to compare objcg->memcg instead of just objcg during
> > > > > > > draining and caching. In addition we can add support for multiple objcg per-cpu
> > > > > > > stock caching.
> > > > > >
> > > > > > Something like the following:
> > > > > > From d756abe831a905d6fe32bad9a984fc619dafb7e0 Mon Sep 17 00:00:00 2001
> > > > > > From: Shakeel Butt <shakeel.butt@linux.dev>
> > > > > > Date: Wed, 13 May 2026 07:24:55 -0700
> > > > > > Subject: [PATCH] mm/memcontrol: skip obj_stock drain when refilled objcg
> > > > > >  shares memcg
> > > > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > > > > ---
> > > > > >  mm/memcontrol.c | 14 +++++++++++++-
> > > > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > > index d978e18b9b2d..01ed7a8e18ac 100644
> > > > > > --- a/mm/memcontrol.c
> > > > > > +++ b/mm/memcontrol.c
> > > > > > @@ -3318,6 +3318,7 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> > > > > >  			       unsigned int nr_bytes,
> > > > > >  			       bool allow_uncharge)
> > > > > >  {
> > > > > > +	struct obj_cgroup *cached;
> > > > > >  	unsigned int nr_pages = 0;
> > > > > >
> > > > > >  	if (!stock) {
> > > > > > @@ -3327,7 +3328,18 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
> > > > > >  		goto out;
> > > > > >  	}
> > > > > >
> > > > > > -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> > > > > > +	cached = READ_ONCE(stock->cached_objcg);
> > > > > > +	if (cached != objcg &&
> > > > > > +	    (!cached || obj_cgroup_memcg(cached) != obj_cgroup_memcg(objcg))) {
> > > > > >  		drain_obj_stock(stock);
> > > > > >  		obj_cgroup_get(objcg);
> > > > > >  		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> > > > > >
> > > > > This change looks like it should be able to fix the ping-pong issue, but
> > > > > I still haven't reproduced the performance regression locally. I'll
> > > > > continue testing it.
> > > >
> > > > Same here, couldn't reproduce locally. It seems like we had to craft a scenario
> > > > where the pair pingpong threads get their current->objcg from different nodes.
> > > > I will try that.
> > >
> > > I still haven't been able to reproduce the LKP results locally, but I
> > > used an AI bot to generate a pingpong test case (pasted at the end) and
> > > automatically ran the test on a physical machine. The results are as
> > > follows:
> > >
> > > parent: 8285917d6f
> > > bad:    01b9da291c
> > > fix:    01b9da291c + stock patch
> > >
> > > | kernel | mq_ops/sec mean | vs parent | drain_obj_stock / round |
> > > |--------|-----------------|-----------|-------------------------|
> > > | parent | 9.743M          | baseline  | ~0                      |
> > > | bad    | 7.821M          | -19.73%   | ~11.16M                 |
> > > | fix    | 9.274M          | -4.81%    | ~0                      |
> > >
> > > Probing the drain_obj_stock() calls confirms that the fix restores the
> > > frequency to the parent's baseline.
> > >
> > > And it seems that besides __refill_obj_stock(), we should also modify
> > > __consume_obj_stock()?
> > >
> >
> > Thanks a lot Qi. I will send the formal patch and will add your Debugged-by if
> > you don't mind.
> >
>
> Tested-by: kernel test robot <oliver.sang@intel.com>
>
> we tested above patch, and it recovers the regression:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-14/performance/x86_64-rhel-9.4/mq/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s
>
> commit:
>   8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
>   01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
>   682fd4e9ff   <--- above patch from Shakeel
>
> 8285917d6f383aef 01b9da291c4969354807b52956f 682fd4e9ffd4009805f81dd25ed
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>       5849          +210.2%      18145 ±  3%      +1.5%       5935        stress-ng.switch.nanosecs_per_context_switch_mq_method
>  2.296e+09           -67.7%  7.408e+08 ±  3%      -1.4%  2.263e+09        stress-ng.switch.ops
>   38288993           -67.7%   12355813 ±  3%      -1.4%   37739220        stress-ng.switch.ops_per_sec
>
>
> full comparison is as below [3]
>
> but there are two notes.
>
> #1 is that we noticed there is a formal patch later from Shakeel in [1] which has
> more changes. not sure if this test is enough? do you want us to test [1]
> further?

Thanks Oliver, I will send a v2 soon, please test v2.

>
> #2: when we test above patch, we found the server easy to crash while running
> tests. we try to run up to 20 times, only 2 of them run successfully (above
> 37739220 is just the average data from these 2 runs, since the data is stable,
> we think maybe it's ok to report to you with this data).
> we also noticed for [1] there is a [syzbot ci] report in [2]. since we don't
> have serial output for our test server in this report which is for performance
> tests, we cannot say if other 18 runs failed due to similar reason. just FYI.
>

The syzbot report is simply a rcu warning which will be fixed in v2. Do you
have more details on the crash you are seeing? Is it page counter underflow
warning?

Thanks again for the help.
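The single-slot thrashing described in the thread can be reproduced in miniature. The following user-space C sketch is a simplified model, not kernel code. `struct memcg`, `struct objcg`, `struct stock_model`, `refill_by_objcg()`, `refill_by_memcg()` and `pingpong()` are all hypothetical stand-ins. The model assumes one cached slot per CPU and sends both the allocation-side and free-side events through the same refill check. It counts only the drains that would flush a previously cached objcg.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: one memcg, one objcg per (memcg, node). */
struct memcg {
	int id;
};

struct objcg {
	const struct memcg *memcg;
	int node;
};

/* Single-slot per-CPU stock, modelled on the old cached_objcg field. */
struct stock_model {
	const struct objcg *cached;
	unsigned long drains;	/* drains that flush a previously cached objcg */
};

/* Old rule: any objcg pointer mismatch drains and switches the slot. */
static void refill_by_objcg(struct stock_model *s, const struct objcg *objcg)
{
	assert(objcg != NULL);
	if (s->cached != objcg) {
		if (s->cached)
			s->drains++;
		s->cached = objcg;
	}
}

/* Rule from the first fix: drain only when the owning memcg differs. */
static void refill_by_memcg(struct stock_model *s, const struct objcg *objcg)
{
	assert(objcg != NULL);
	if (s->cached != objcg &&
	    (!s->cached || s->cached->memcg != objcg->memcg)) {
		if (s->cached)
			s->drains++;
		s->cached = objcg;
	}
	/* Same memcg on another node: keep the cached entry, no drain. */
}

typedef void (*refill_fn)(struct stock_model *, const struct objcg *);

/* Alternate the node0 and node1 objcgs of one memcg on one CPU. */
static unsigned long pingpong(refill_fn refill, int rounds)
{
	struct memcg cg = { .id = 1 };
	struct objcg node0 = { .memcg = &cg, .node = 0 };
	struct objcg node1 = { .memcg = &cg, .node = 1 };
	struct stock_model s = { .cached = NULL, .drains = 0 };

	for (int i = 0; i < rounds; i++) {
		refill(&s, &node0);	/* local allocation: node0 objcg */
		refill(&s, &node1);	/* free of an object tagged node1 */
	}
	return s.drains;
}
```

Keying the slot by objcg pointer drains on every event after the first: `pingpong(refill_by_objcg, 1000)` returns 1999. Keying it by memcg never drains. This matches the direction of Qi's drain_obj_stock() probe, which showed about 11.16M drains per round on the bad kernel and about 0 with the fix. The model ignores how bytes that belong to a different objcg of the same memcg are accounted, because the quoted patch excerpt does not show that part.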
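The %change columns in these comparisons appear to use the usual relative difference, (value - base) / base * 100. The small C sketch below recomputes several of the quoted figures from the raw numbers as a consistency check. The helper names `pct_change()` and `matches_printed()` are hypothetical.

```c
#include <assert.h>

/* Relative change in percent, as in the "%change" columns. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/*
 * Return nonzero if x rounds to `printed` at `decimals` decimal places,
 * i.e. x lies within half a unit of the last printed digit.
 */
static int matches_printed(double x, double printed, int decimals)
{
	double half_unit = 0.5;
	double d = x - printed;

	for (int i = 0; i < decimals; i++)
		half_unit /= 10.0;
	return d <= half_unit && d >= -half_unit;
}
```

For example, `pct_change(38288993, 12355813)` is about -67.73, which the report prints as -67.7%. Qi's 7.821M vs 9.743M works out to about -19.73%, matching the table.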
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression [not found] ` <agtPMpQK2jXdQAY4@linux.dev> @ 2026-05-19 5:04 ` Oliver Sang 2026-05-19 14:22 ` Shakeel Butt 0 siblings, 1 reply; 13+ messages in thread From: Oliver Sang @ 2026-05-19 5:04 UTC (permalink / raw) To: Shakeel Butt Cc: Qi Zheng, oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier, Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou, Chen Ridong, David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins, Imran Khan, Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett, Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport, Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin, Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed, Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm, oliver.sang hi, Shakeel, On Mon, May 18, 2026 at 10:54:20AM -0700, Shakeel Butt wrote: > On Mon, May 18, 2026 at 09:39:00AM -0700, Shakeel Butt wrote: > > On Sun, May 17, 2026 at 12:38:48PM -0700, Shakeel Butt wrote: > > > On Sun, May 17, 2026 at 08:55:50PM +0800, Oliver Sang wrote: > > > > hi, Shakeel, hi, Qi, > > > > > > > > #2: when we test above patch, we found the server easy to crash while running > > > > tests. we try to run up to 20 times, only 2 of them run successfully (above > > > > 37739220 is just the average data from these 2 runs, since the data is stable, > > > > we think maybe it's ok to report to you with this data). > > > > we also noticed for [1] there is a [syzbot ci] report in [2]. since we don't > > > > have serial output for our test server in this report which is for performance > > > > tests, we cannot say if other 18 runs failed due to similar reason. just FYI. > > > > > > > > > > The syzbot report is simply a rcu warning which will be fixed in v2. Do you > > > have more details on the crash you are seeing? Is it page counter underflow > > > warning? > > > > > > Thanks again for the help. 
> >
> > Hi Oliver, it seems like sashiko found another issue with v2, so, if you have
> > not yet started the test, you can skip it.

firstly, let me still give you an update about v2. I applied it directly on top
of 01b9da291c, found it can recover the performance.

=========================================================================================
compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/mq/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s

commit:
  8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
  01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
  8da1b1ea43 ("memcg: cache obj_stock by memcg, not by objcg pointer")   <---- v2

8285917d6f383aef 01b9da291c4969354807b52956f 8da1b1ea4344c152a3892cbb132
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      5849          +210.2%      18145 ±  3%      +0.8%       5896        stress-ng.switch.nanosecs_per_context_switch_mq_method
 2.296e+09           -67.7%  7.408e+08 ±  3%      -0.8%  2.278e+09        stress-ng.switch.ops
  38288993           -67.7%   12355813 ±  3%      -0.8%   37987427        stress-ng.switch.ops_per_sec

but since this version is out-of-date now, I won't give out the full
comparison. if you still want it, please let me know.

> >
> > Also I am rethinking the approach, so I will send a prototype in response on
> > this email for which I will need your help in testing.
>
> Hi Oliver, can you please test the following patch?

got it. will change to test following patch. and this looks quite different
with v2 or v3, so if you still want us to test v3, please let me know. thanks!
>
> From: Shakeel Butt <shakeel.butt@linux.dev>
> Subject: [PATCH] memcg: shrink obj_stock_pcp and cache multiple objcgs
>
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  mm/memcontrol.c | 213 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 156 insertions(+), 57 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d978e18b9b2d..2a9e5136a956 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -150,14 +150,14 @@ static void obj_cgroup_release(struct percpu_ref *ref)
>  	 * However, it can be PAGE_SIZE or (x * PAGE_SIZE).
>  	 *
>  	 * The following sequence can lead to it:
> -	 * 1) CPU0: objcg == stock->cached_objcg
> +	 * 1) CPU0: objcg cached in one of stock->cached[i]
>  	 * 2) CPU1: we do a small allocation (e.g. 92 bytes),
>  	 *          PAGE_SIZE bytes are charged
>  	 * 3) CPU1: a process from another memcg is allocating something,
>  	 *          the stock if flushed,
>  	 *          objcg->nr_charged_bytes = PAGE_SIZE - 92
>  	 * 5) CPU0: we do release this object,
> -	 *          92 bytes are added to stock->nr_bytes
> +	 *          92 bytes are added to stock->nr_bytes[i]
>  	 * 6) CPU0: stock is flushed,
>  	 *          92 bytes are added to objcg->nr_charged_bytes
>  	 *
> @@ -2017,13 +2017,25 @@ static DEFINE_PER_CPU_ALIGNED(struct memcg_stock_pcp, memcg_stock) = {
>  	.lock = INIT_LOCAL_TRYLOCK(lock),
>  };
>
> +/*
> + * NR_OBJ_STOCK is sized so the entire hot path of obj_stock_pcp
> + * (lock, accounting metadata, nr_bytes[] and cached[]) fits within a
> + * single 64-byte cache line on non-debug 64-bit builds. With 5 slots:
> + * lock(1) + index(1) + node_id(2) + slab stats(4) + nr_bytes(10)
> + * + pad(6) + cached(40) == 64 bytes.
> + * A CPU can thus consume/refill/account against five different objcgs
> + * (typically per-node variants of the same memcg) while incurring at
> + * most one cache miss on the stock.
> + */
> +#define NR_OBJ_STOCK 5
>  struct obj_stock_pcp {
>  	local_trylock_t lock;
> -	unsigned int nr_bytes;
> -	struct obj_cgroup *cached_objcg;
> -	struct pglist_data *cached_pgdat;
> -	int nr_slab_reclaimable_b;
> -	int nr_slab_unreclaimable_b;
> +	int8_t index;
> +	int16_t node_id;
> +	int16_t nr_slab_reclaimable_b;
> +	int16_t nr_slab_unreclaimable_b;
> +	uint16_t nr_bytes[NR_OBJ_STOCK];
> +	struct obj_cgroup *cached[NR_OBJ_STOCK];
>
>  	struct work_struct work;
>  	unsigned long flags;
> @@ -2031,10 +2043,13 @@ struct obj_stock_pcp {
>
>  static DEFINE_PER_CPU_ALIGNED(struct obj_stock_pcp, obj_stock) = {
>  	.lock = INIT_LOCAL_TRYLOCK(lock),
> +	.index = -1,
> +	.node_id = NUMA_NO_NODE,
>  };
>
>  static DEFINE_MUTEX(percpu_charge_mutex);
>
> +static void drain_obj_stock_slot(struct obj_stock_pcp *stock, int i);
>  static void drain_obj_stock(struct obj_stock_pcp *stock);
>  static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
>  				     struct mem_cgroup *root_memcg);
> @@ -3152,39 +3167,68 @@ static void unlock_stock(struct obj_stock_pcp *stock)
>  		local_unlock(&obj_stock.lock);
>  }
>
> -/* Call after __refill_obj_stock() to ensure stock->cached_objg == objcg */
> +/* Call after __refill_obj_stock() so a slot for objcg exists in the stock */
>  static void __account_obj_stock(struct obj_cgroup *objcg,
>  				struct obj_stock_pcp *stock, int nr,
>  				struct pglist_data *pgdat, enum node_stat_item idx)
>  {
> -	int *bytes;
> +	int16_t *bytes;
> +	int i;
>
> -	if (!stock || READ_ONCE(stock->cached_objcg) != objcg)
> +	/*
> +	 * node_id is stored as int16_t and -1 is used as the "no pgdat
> +	 * cached" sentinel, so MAX_NUMNODES must fit in a positive int16_t.
> +	 */
> +	BUILD_BUG_ON(MAX_NUMNODES >= S16_MAX);
> +
> +	if (!stock)
> +		goto direct;
> +
> +	for (i = 0; i < NR_OBJ_STOCK; ++i) {
> +		if (READ_ONCE(stock->cached[i]) == objcg)
> +			break;
> +	}
> +	if (i == NR_OBJ_STOCK)
>  		goto direct;
>
>  	/*
>  	 * Save vmstat data in stock and skip vmstat array update unless
> -	 * accumulating over a page of vmstat data or when pgdat changes.
> +	 * accumulating over a page of vmstat data or when the objcg slot or
> +	 * pgdat the stats belong to changes.
>  	 */
> -	if (stock->cached_pgdat != pgdat) {
> -		/* Flush the existing cached vmstat data */
> -		struct pglist_data *oldpg = stock->cached_pgdat;
> +	if (stock->index < 0) {
> +		stock->index = i;
> +		stock->node_id = pgdat->node_id;
> +	} else if (stock->index != i || stock->node_id != pgdat->node_id) {
> +		struct obj_cgroup *old = READ_ONCE(stock->cached[stock->index]);
> +		struct pglist_data *oldpg = NODE_DATA(stock->node_id);
>
>  		if (stock->nr_slab_reclaimable_b) {
> -			mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
> +			mod_objcg_mlstate(old, oldpg, NR_SLAB_RECLAIMABLE_B,
>  					  stock->nr_slab_reclaimable_b);
>  			stock->nr_slab_reclaimable_b = 0;
>  		}
>  		if (stock->nr_slab_unreclaimable_b) {
> -			mod_objcg_mlstate(objcg, oldpg, NR_SLAB_UNRECLAIMABLE_B,
> +			mod_objcg_mlstate(old, oldpg, NR_SLAB_UNRECLAIMABLE_B,
>  					  stock->nr_slab_unreclaimable_b);
>  			stock->nr_slab_unreclaimable_b = 0;
>  		}
> -		stock->cached_pgdat = pgdat;
> +		stock->index = i;
> +		stock->node_id = pgdat->node_id;
>  	}
>
>  	bytes = (idx == NR_SLAB_RECLAIMABLE_B) ? &stock->nr_slab_reclaimable_b
>  					       : &stock->nr_slab_unreclaimable_b;
> +	/*
> +	 * Cached stats are int16_t; flush directly if accumulating @nr would
> +	 * overflow or underflow the cache.
> +	 */
> +	if (abs(nr + *bytes) >= S16_MAX) {
> +		nr += *bytes;
> +		*bytes = 0;
> +		goto direct;
> +	}
> +
>  	/*
>  	 * Even for large object >= PAGE_SIZE, the vmstat data will still be
>  	 * cached locally at least once before pushing it out.
> @@ -3210,10 +3254,16 @@ static bool __consume_obj_stock(struct obj_cgroup *objcg,
>  				struct obj_stock_pcp *stock,
>  				unsigned int nr_bytes)
>  {
> -	if (objcg == READ_ONCE(stock->cached_objcg) &&
> -	    stock->nr_bytes >= nr_bytes) {
> -		stock->nr_bytes -= nr_bytes;
> -		return true;
> +	int i;
> +
> +	for (i = 0; i < NR_OBJ_STOCK; ++i) {
> +		if (READ_ONCE(stock->cached[i]) != objcg)
> +			continue;
> +		if (stock->nr_bytes[i] >= nr_bytes) {
> +			stock->nr_bytes[i] -= nr_bytes;
> +			return true;
> +		}
> +		return false;
>  	}
>
>  	return false;
> @@ -3234,16 +3284,42 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
>  	return ret;
>  }
>
> -static void drain_obj_stock(struct obj_stock_pcp *stock)
> +/* Flush the cached slab stats (if any) back to their owning objcg/pgdat. */
> +static void drain_obj_stock_stats(struct obj_stock_pcp *stock)
> +{
> +	struct obj_cgroup *old;
> +	struct pglist_data *oldpg;
> +
> +	if (stock->index < 0)
> +		return;
> +
> +	old = READ_ONCE(stock->cached[stock->index]);
> +	oldpg = NODE_DATA(stock->node_id);
> +
> +	if (stock->nr_slab_reclaimable_b) {
> +		mod_objcg_mlstate(old, oldpg, NR_SLAB_RECLAIMABLE_B,
> +				  stock->nr_slab_reclaimable_b);
> +		stock->nr_slab_reclaimable_b = 0;
> +	}
> +	if (stock->nr_slab_unreclaimable_b) {
> +		mod_objcg_mlstate(old, oldpg, NR_SLAB_UNRECLAIMABLE_B,
> +				  stock->nr_slab_unreclaimable_b);
> +		stock->nr_slab_unreclaimable_b = 0;
> +	}
> +	stock->index = -1;
> +	stock->node_id = NUMA_NO_NODE;
> +}
> +
> +static void drain_obj_stock_slot(struct obj_stock_pcp *stock, int i)
>  {
> -	struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);
> +	struct obj_cgroup *old = READ_ONCE(stock->cached[i]);
>
>  	if (!old)
>  		return;
>
> -	if (stock->nr_bytes) {
> -		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
> -		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
> +	if (stock->nr_bytes[i]) {
> +		unsigned int nr_pages = stock->nr_bytes[i] >> PAGE_SHIFT;
> +		unsigned int nr_bytes = stock->nr_bytes[i] & (PAGE_SIZE - 1);
>
>  		if (nr_pages) {
>  			struct mem_cgroup *memcg;
> @@ -3269,44 +3345,43 @@ static void drain_obj_stock(struct obj_stock_pcp *stock)
>  		 * so it might be changed in the future.
>  		 */
>  		atomic_add(nr_bytes, &old->nr_charged_bytes);
> -		stock->nr_bytes = 0;
> +		stock->nr_bytes[i] = 0;
>  	}
>
> -	/*
> -	 * Flush the vmstat data in current stock
> -	 */
> -	if (stock->nr_slab_reclaimable_b || stock->nr_slab_unreclaimable_b) {
> -		if (stock->nr_slab_reclaimable_b) {
> -			mod_objcg_mlstate(old, stock->cached_pgdat,
> -					  NR_SLAB_RECLAIMABLE_B,
> -					  stock->nr_slab_reclaimable_b);
> -			stock->nr_slab_reclaimable_b = 0;
> -		}
> -		if (stock->nr_slab_unreclaimable_b) {
> -			mod_objcg_mlstate(old, stock->cached_pgdat,
> -					  NR_SLAB_UNRECLAIMABLE_B,
> -					  stock->nr_slab_unreclaimable_b);
> -			stock->nr_slab_unreclaimable_b = 0;
> -		}
> -		stock->cached_pgdat = NULL;
> -	}
> +	/* Flush vmstat data when its owning slot is being drained. */
> +	if (stock->index == i)
> +		drain_obj_stock_stats(stock);
>
> -	WRITE_ONCE(stock->cached_objcg, NULL);
> +	WRITE_ONCE(stock->cached[i], NULL);
>  	obj_cgroup_put(old);
>  }
>
> +static void drain_obj_stock(struct obj_stock_pcp *stock)
> +{
> +	int i;
> +
> +	for (i = 0; i < NR_OBJ_STOCK; ++i)
> +		drain_obj_stock_slot(stock, i);
> +}
> +
>  static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
>  				     struct mem_cgroup *root_memcg)
>  {
> -	struct obj_cgroup *objcg = READ_ONCE(stock->cached_objcg);
> +	struct obj_cgroup *objcg;
>  	struct mem_cgroup *memcg;
>  	bool flush = false;
> +	int i;
>
>  	rcu_read_lock();
> -	if (objcg) {
> +	for (i = 0; i < NR_OBJ_STOCK; ++i) {
> +		objcg = READ_ONCE(stock->cached[i]);
> +		if (!objcg)
> +			continue;
>  		memcg = obj_cgroup_memcg(objcg);
> -		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
> +		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg)) {
>  			flush = true;
> +			break;
> +		}
>  	}
>  	rcu_read_unlock();
>
> @@ -3319,6 +3394,8 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
>  			       bool allow_uncharge)
>  {
>  	unsigned int nr_pages = 0;
> +	unsigned int stock_nr_bytes;
> +	int i, slot = -1, empty_slot = -1;
>
>  	if (!stock) {
>  		nr_pages = nr_bytes >> PAGE_SHIFT;
> @@ -3327,21 +3404,43 @@ static void __refill_obj_stock(struct obj_cgroup *objcg,
>  		goto out;
>  	}
>
> -	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> -		drain_obj_stock(stock);
> +	for (i = 0; i < NR_OBJ_STOCK; ++i) {
> +		struct obj_cgroup *cached = READ_ONCE(stock->cached[i]);
> +
> +		if (!cached) {
> +			if (empty_slot == -1)
> +				empty_slot = i;
> +			continue;
> +		}
> +		if (cached == objcg) {
> +			slot = i;
> +			break;
> +		}
> +	}
> +
> +	if (slot == -1) {
> +		slot = empty_slot;
> +		if (slot == -1) {
> +			slot = get_random_u32_below(NR_OBJ_STOCK);
> +			drain_obj_stock_slot(stock, slot);
> +		}
>  		obj_cgroup_get(objcg);
> -		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> +		stock->nr_bytes[slot] = atomic_read(&objcg->nr_charged_bytes)
>  				? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
> -		WRITE_ONCE(stock->cached_objcg, objcg);
> +		WRITE_ONCE(stock->cached[slot], objcg);
>
>  		allow_uncharge = true;	/* Allow uncharge when objcg changes */
>  	}
> -	stock->nr_bytes += nr_bytes;
>
> -	if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
> -		nr_pages = stock->nr_bytes >> PAGE_SHIFT;
> -		stock->nr_bytes &= (PAGE_SIZE - 1);
> +	stock_nr_bytes = (unsigned int)stock->nr_bytes[slot] + nr_bytes;
> +
> +	/* nr_bytes[] is uint16_t; flush if we would refill >= U16_MAX. */
> +	if ((allow_uncharge && (stock_nr_bytes > PAGE_SIZE)) ||
> +	    stock_nr_bytes >= U16_MAX) {
> +		nr_pages = stock_nr_bytes >> PAGE_SHIFT;
> +		stock_nr_bytes &= (PAGE_SIZE - 1);
>  	}
> +	stock->nr_bytes[slot] = stock_nr_bytes;
>
>  out:
>  	if (nr_pages)
> -- 
> 2.53.0-Meta
>
* Re: [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression
2026-05-19 5:04 ` Oliver Sang
@ 2026-05-19 14:22 ` Shakeel Butt
0 siblings, 0 replies; 13+ messages in thread
From: Shakeel Butt @ 2026-05-19 14:22 UTC
To: Oliver Sang
Cc: Qi Zheng, oe-lkp, lkp, linux-kernel, Andrew Morton, David Carlier,
Allen Pais, Axel Rasmussen, Baoquan He, Chengming Zhou, Chen Ridong,
David Hildenbrand, Hamza Mahfooz, Harry Yoo, Hugh Dickins, Imran Khan,
Johannes Weiner, Kamalesh Babulal, Lance Yang, Liam Howlett,
Lorenzo Stoakes, Michal Hocko, Michal Koutný, Mike Rapoport,
Muchun Song, Muchun Song, Nhat Pham, Roman Gushchin,
Suren Baghdasaryan, Usama Arif, Vlastimil Babka, Wei Xu, Yosry Ahmed,
Yuanchu Xie, Zi Yan, Usama Arif, cgroups, linux-mm

Hi Oliver,

On Tue, May 19, 2026 at 01:04:04PM +0800, Oliver Sang wrote:
> hi, Shakeel,
>
> On Mon, May 18, 2026 at 10:54:20AM -0700, Shakeel Butt wrote:
> > On Mon, May 18, 2026 at 09:39:00AM -0700, Shakeel Butt wrote:
> > > On Sun, May 17, 2026 at 12:38:48PM -0700, Shakeel Butt wrote:
> > > > On Sun, May 17, 2026 at 08:55:50PM +0800, Oliver Sang wrote:
> > > > > hi, Shakeel, hi, Qi,
> > > > >
> > > > > #2: when we test above patch, we found the server easy to crash while running
> > > > > tests. we try to run up to 20 times, only 2 of them run successfully (above
> > > > > 37739220 is just the average data from these 2 runs, since the data is stable,
> > > > > we think maybe it's ok to report to you with this data).
> > > > > we also noticed for [1] there is a [syzbot ci] report in [2]. since we don't
> > > > > have serial output for our test server in this report which is for performance
> > > > > tests, we cannot say if other 18 runs failed due to similar reason. just FYI.
> > > > >
> > > >
> > > > The syzbot report is simply a rcu warning which will be fixed in v2. Do you
> > > > have more details on the crash you are seeing? Is it page counter underflow
> > > > warning?
> > > >
> > > > Thanks again for the help.
> > >
> > > Hi Oliver, it seems like sashiko found another issue with v2, so, if you have
> > > not yet started the test, you can skip it.
>
> firstly, let me still give you an update about v2. I applied it directly on top
> of 01b9da291c, found it can recover the performance.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/method/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-14/performance/x86_64-rhel-9.4/mq/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/switch/stress-ng/60s
>
> commit:
>   8285917d6f ("mm: memcontrol: prepare for reparenting non-hierarchical stats")
>   01b9da291c ("mm: memcontrol: convert objcg to be per-memcg per-node type")
>   8da1b1ea43 ("memcg: cache obj_stock by memcg, not by objcg pointer")   <---- v2
>
> 8285917d6f383aef 01b9da291c4969354807b52956f 8da1b1ea4344c152a3892cbb132
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>       5849          +210.2%      18145 ±  3%      +0.8%       5896        stress-ng.switch.nanosecs_per_context_switch_mq_method
>  2.296e+09           -67.7%  7.408e+08 ±  3%      -0.8%  2.278e+09        stress-ng.switch.ops
>   38288993           -67.7%   12355813 ±  3%      -0.8%   37987427        stress-ng.switch.ops_per_sec
>
> but since this version is out-of-date now, I won't give out the full
> comparison. if you still want it, please let me know.
>

Thanks a lot and this is good enough.

> > >
> > > Also I am rethinking the approach, so I will send a prototype in response on
> > > this email for which I will need your help in testing.
> >
> > Hi Oliver, can you please test the following patch?
>
> got it. will change to test following patch. and this looks quite different
> with v2 or v3, so if you still want us to test v3, please let me know. thanks!
>

No need to test v3 as it is similar to v2.
Please test the following patch as it is a direction I want to pursue and
wanted an early signal if this is the right direction.
end of thread

Thread overview: 13+ messages
2026-05-12 12:56 [linus:master] [mm] 01b9da291c: stress-ng.switch.ops_per_sec 67.7% regression kernel test robot
2026-05-12 16:03 ` Shakeel Butt
2026-05-13 2:10 ` Qi Zheng
2026-05-13 13:49 ` Shakeel Butt
2026-05-13 14:27 ` Shakeel Butt
2026-05-14 7:46 ` Qi Zheng
2026-05-14 13:40 ` Shakeel Butt
2026-05-15 7:37 ` Qi Zheng
2026-05-15 17:09 ` Shakeel Butt
2026-05-17 12:55 ` Oliver Sang
2026-05-17 19:38 ` Shakeel Butt
[not found] ` <agtATZG9mIlYzMUl@linux.dev>
[not found] ` <agtPMpQK2jXdQAY4@linux.dev>
2026-05-19 5:04 ` Oliver Sang
2026-05-19 14:22 ` Shakeel Butt