* [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
From: kernel test robot @ 2024-11-28 14:57 UTC (permalink / raw)
To: Rik van Riel
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman, oliver.sang
Hello,
kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
[test failed on linux-next/master 6f3d2b5299b0a8bcb8a9405a8d3fceb24f79c4f0]
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:
nr_task: 100%
mode: thread
test: tlb_flush2
cpufreq_governor: performance
In addition to that, the commit also has a significant impact on the following test:
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput 40.7% regression |
| test machine | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory |
| test parameters | cpufreq_governor=performance |
| | nr_ssd=1 |
| | nr_task=32 |
| | priority=1 |
| | runtime=300 |
| | test=swap-w-seq-mt |
| | thp_defrag=always |
| | thp_enabled=never |
+------------------+------------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241128/202411282207.6bd28eae-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20
---------------- ---------------------------
%stddev %change %stddev
\ | \
3743 ± 6% -9.6% 3383 ± 10% numa-meminfo.node1.PageTables
18158 ± 2% -9.9% 16367 ± 2% uptime.idle
36.77 -1.2% 36.34 boot-time.boot
3503 -1.3% 3458 boot-time.idle
1.421e+10 -9.6% 1.284e+10 ± 2% cpuidle..time
2.595e+08 -12.9% 2.26e+08 cpuidle..usage
3.14 ± 4% +27.4% 4.00 ± 6% sched_debug.cfs_rq:/.load_avg.min
1598 ± 3% +53.8% 2458 ± 16% sched_debug.cpu.clock_task.stddev
695438 -8.1% 638790 vmstat.system.cs
4553480 -3.5% 4393928 vmstat.system.in
20954 ± 17% -17.0% 17391 ± 2% perf-c2c.DRAM.remote
18165 ± 17% -17.7% 14957 ± 2% perf-c2c.HITM.remote
44864 ± 17% -15.4% 37953 perf-c2c.HITM.total
44.57 -4.3 40.31 mpstat.cpu.all.idle%
9.85 +6.1 15.94 mpstat.cpu.all.irq%
0.10 +0.0 0.12 mpstat.cpu.all.soft%
2.34 ± 2% -0.3 2.02 mpstat.cpu.all.usr%
1.139e+08 ± 2% -14.5% 97376097 ± 3% numa-numastat.node0.local_node
1.139e+08 ± 2% -14.5% 97404595 ± 3% numa-numastat.node0.numa_hit
1.146e+08 -11.6% 1.013e+08 ± 2% numa-numastat.node1.local_node
1.146e+08 -11.6% 1.013e+08 ± 2% numa-numastat.node1.numa_hit
756738 -13.2% 656838 will-it-scale.104.threads
43.82 -9.5% 39.67 will-it-scale.104.threads_idle
7276 -13.2% 6315 will-it-scale.per_thread_ops
756738 -13.2% 656838 will-it-scale.workload
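The headline per-thread figure follows directly from the totals above. As a quick sanity check of the reported percentages (a standalone sketch using only numbers quoted in this report; the variable names are ad hoc, not lkp identifiers):

```python
# Sanity-check the reported will-it-scale regression from the raw totals.
# Numbers are taken verbatim from the table above.
ops_before = 756738   # will-it-scale.workload, commit 7e33001b8b
ops_after  = 656838   # will-it-scale.workload, commit 209954cbc7

change_pct = (ops_after - ops_before) / ops_before * 100
print(f"{change_pct:+.1f}%")        # -13.2%, matching the report

# per_thread_ops is the total divided across the 104 test threads
threads = 104
print(ops_before // threads)        # 7276, as reported
print(ops_after // threads)         # 6315, as reported
```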
1.139e+08 ± 2% -14.5% 97404162 ± 3% numa-vmstat.node0.numa_hit
1.139e+08 ± 2% -14.5% 97375664 ± 3% numa-vmstat.node0.numa_local
936.25 ± 6% -9.7% 845.81 ± 10% numa-vmstat.node1.nr_page_table_pages
1.146e+08 -11.6% 1.013e+08 ± 2% numa-vmstat.node1.numa_hit
1.146e+08 -11.6% 1.012e+08 ± 2% numa-vmstat.node1.numa_local
0.17 ± 5% -14.8% 0.14 ± 11% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.13 ± 5% -17.0% 0.11 ± 9% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
41283 ± 5% +22.8% 50696 ± 13% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
8372 ± 13% +22.1% 10221 ± 9% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.16 ± 5% -15.3% 0.14 ± 12% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.12 ± 6% -17.5% 0.10 ± 8% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
432388 ± 2% -2.1% 423334 proc-vmstat.nr_active_anon
261209 ± 3% -3.7% 251458 proc-vmstat.nr_shmem
432388 ± 2% -2.1% 423334 proc-vmstat.nr_zone_active_anon
2.286e+08 -13.1% 1.987e+08 proc-vmstat.numa_hit
2.285e+08 -13.1% 1.986e+08 proc-vmstat.numa_local
2.287e+08 -13.0% 1.988e+08 proc-vmstat.pgalloc_normal
4.559e+08 -13.1% 3.962e+08 proc-vmstat.pgfault
2.283e+08 -13.1% 1.985e+08 proc-vmstat.pgfree
5.74 -5.3% 5.43 perf-stat.i.MPKI
5.392e+09 -6.9% 5.019e+09 perf-stat.i.branch-instructions
1.509e+08 -5.8% 1.421e+08 perf-stat.i.branch-misses
24.36 -1.4 22.92 perf-stat.i.cache-miss-rate%
1.538e+08 -12.1% 1.351e+08 perf-stat.i.cache-misses
6.321e+08 -6.4% 5.915e+08 perf-stat.i.cache-references
702183 -8.3% 644080 perf-stat.i.context-switches
6.24 +19.0% 7.42 perf-stat.i.cpi
1.672e+11 +10.1% 1.841e+11 perf-stat.i.cpu-cycles
550.50 +2.6% 565.02 perf-stat.i.cpu-migrations
1085 +25.0% 1356 perf-stat.i.cycles-between-cache-misses
2.683e+10 -7.0% 2.494e+10 perf-stat.i.instructions
0.17 -14.7% 0.14 ± 2% perf-stat.i.ipc
0.00 ±141% +265.0% 0.00 ± 33% perf-stat.i.major-faults
35.60 -12.2% 31.27 perf-stat.i.metric.K/sec
1500379 -13.1% 1304071 perf-stat.i.minor-faults
1500379 -13.1% 1304071 perf-stat.i.page-faults
5.19 ± 44% +42.2% 7.37 perf-stat.overall.cpi
905.91 ± 44% +50.4% 1362 perf-stat.overall.cycles-between-cache-misses
8967486 ± 44% +28.7% 11541403 perf-stat.overall.path-length
1.387e+11 ± 44% +32.2% 1.835e+11 perf-stat.ps.cpu-cycles
457.08 ± 44% +23.1% 562.85 perf-stat.ps.cpu-migrations
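The CPI movement above is consistent with the raw counters: cycles rose while retired instructions fell, so cycles-per-instruction climbs on both ends. A rough cross-check from the interval averages quoted above (small deviations from the reported `perf-stat.i.cpi` are expected, since lkp averages per-sample ratios rather than dividing the averaged totals):

```python
# Cross-check perf-stat.i.cpi from the perf-stat.i.cpu-cycles and
# perf-stat.i.instructions averages quoted above.
cycles_before, cycles_after = 1.672e11, 1.841e11
insns_before,  insns_after  = 2.683e10, 2.494e10

cpi_before = cycles_before / insns_before   # ~6.23 (reported 6.24)
cpi_after  = cycles_after  / insns_after    # ~7.38 (reported 7.42)
print(f"{cpi_before:.2f} -> {cpi_after:.2f} "
      f"({(cpi_after / cpi_before - 1) * 100:+.1f}%)")  # close to the reported +19.0%
```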
70.53 -6.7 63.83 perf-profile.calltrace.cycles-pp.__madvise
68.82 -6.4 62.40 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
68.63 -6.4 62.23 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
68.38 -6.4 62.02 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
68.36 -6.4 62.01 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
54.54 -3.9 50.68 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.49 -3.8 50.64 perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
20.77 -3.8 16.93 ± 2% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
48.74 -3.8 44.99 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
42.51 -3.4 39.15 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
42.90 -3.4 39.54 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
43.33 -3.3 40.07 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
12.96 -2.3 10.63 ± 3% perf-profile.calltrace.cycles-pp.down_read.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.32 -2.2 10.12 ± 3% perf-profile.calltrace.cycles-pp.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
9.65 ± 2% -1.9 7.75 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
9.46 ± 2% -1.9 7.57 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.rwsem_down_read_slowpath.down_read.do_madvise
6.54 ± 2% -1.5 5.08 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
6.29 -1.1 5.16 perf-profile.calltrace.cycles-pp.testcase
4.34 -0.9 3.41 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
3.94 -0.9 3.07 ± 2% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
3.79 -0.8 2.95 ± 2% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
3.74 -0.8 2.91 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
4.88 ± 2% -0.8 4.11 ± 3% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.28 ± 2% -0.8 0.52 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
4.63 -0.7 3.92 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
3.82 -0.7 3.13 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
3.34 ± 2% -0.6 2.72 ± 2% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
3.23 ± 2% -0.6 2.63 ± 2% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
5.07 -0.4 4.67 ± 2% perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
5.05 -0.4 4.65 ± 2% perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
3.31 -0.4 2.92 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
3.48 -0.4 3.09 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
3.35 -0.4 2.96 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
3.82 -0.4 3.44 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
4.88 -0.4 4.51 ± 2% perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single
2.13 -0.3 1.84 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
1.64 -0.3 1.38 ± 2% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.38 -0.2 1.16 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
1.40 -0.2 1.18 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.89 ± 6% -0.2 0.69 ± 9% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
3.23 ± 2% -0.2 3.06 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
1.13 -0.2 0.97 ± 3% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.92 ± 2% -0.1 2.79 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
2.85 ± 2% -0.1 2.74 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
2.74 ± 2% -0.1 2.64 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
0.69 -0.1 0.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
0.69 -0.1 0.60 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.70 -0.1 0.60 perf-profile.calltrace.cycles-pp.__munmap
0.69 -0.1 0.60 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.68 -0.1 0.60 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.62 ± 3% -0.1 0.54 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.88 ± 2% -0.1 0.82 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
0.81 ± 2% -0.1 0.75 ± 2% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
1.48 -0.1 1.43 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read
0.77 ± 2% -0.1 0.72 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folios_put_refs
0.78 ± 2% -0.1 0.73 ± 2% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache
1.48 -0.0 1.43 perf-profile.calltrace.cycles-pp.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
1.48 -0.0 1.43 perf-profile.calltrace.cycles-pp.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
0.51 +0.0 0.56 perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.54 +0.1 0.64 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
0.60 ± 2% +0.1 0.72 ± 3% perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
2.98 +0.2 3.21 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
2.90 ± 2% +0.2 3.15 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
2.81 ± 2% +0.2 3.06 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
0.72 ± 2% +0.6 1.29 ± 2% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.70 ± 2% +0.6 1.28 ± 2% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.00 +0.6 0.63 ± 2% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule_idle.do_idle.cpu_startup_entry
9.12 +1.0 10.12 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.70 ± 2% +1.4 2.06 ± 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state.cpuidle_enter
0.61 ± 2% +1.4 2.00 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state
0.60 ± 2% +1.4 1.99 ± 4% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq
19.23 +7.0 26.22 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
19.34 +7.0 26.36 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
20.03 +7.2 27.19 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
21.50 +7.9 29.39 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
21.51 +7.9 29.40 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
21.51 +7.9 29.40 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
21.72 +8.0 29.70 perf-profile.calltrace.cycles-pp.common_startup_64
1.04 +8.2 9.20 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state
1.05 +8.2 9.23 ± 2% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter
1.17 +8.2 9.38 ± 2% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.28 +8.2 9.51 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.91 +9.3 11.17 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
70.58 -6.7 63.87 perf-profile.children.cycles-pp.__madvise
69.89 -6.5 63.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
69.68 -6.5 63.20 perf-profile.children.cycles-pp.do_syscall_64
68.38 -6.4 62.02 perf-profile.children.cycles-pp.__x64_sys_madvise
68.37 -6.4 62.01 perf-profile.children.cycles-pp.do_madvise
21.28 -3.9 17.39 ± 2% perf-profile.children.cycles-pp.llist_add_batch
54.54 -3.9 50.68 perf-profile.children.cycles-pp.madvise_vma_behavior
54.50 -3.8 50.65 perf-profile.children.cycles-pp.zap_page_range_single
48.89 -3.8 45.11 perf-profile.children.cycles-pp.tlb_finish_mmu
43.03 -3.4 39.65 perf-profile.children.cycles-pp.smp_call_function_many_cond
43.03 -3.4 39.65 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
43.48 -3.3 40.19 perf-profile.children.cycles-pp.flush_tlb_mm_range
8.38 ± 2% -2.4 5.97 ± 2% perf-profile.children.cycles-pp.intel_idle_irq
12.98 -2.3 10.64 ± 3% perf-profile.children.cycles-pp.down_read
12.41 -2.2 10.19 ± 3% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
9.72 ± 2% -1.9 7.81 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irq
15.12 -1.7 13.38 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
8.04 -1.4 6.63 perf-profile.children.cycles-pp.llist_reverse_order
6.87 -1.2 5.64 perf-profile.children.cycles-pp.testcase
4.16 -0.7 3.41 perf-profile.children.cycles-pp.asm_exc_page_fault
3.34 ± 2% -0.6 2.73 ± 2% perf-profile.children.cycles-pp.exc_page_fault
3.30 ± 2% -0.6 2.69 ± 2% perf-profile.children.cycles-pp.do_user_addr_fault
5.07 -0.4 4.67 ± 2% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
5.05 -0.4 4.66 ± 2% perf-profile.children.cycles-pp.free_pages_and_swap_cache
5.06 -0.4 4.67 ± 2% perf-profile.children.cycles-pp.folios_put_refs
2.22 -0.3 1.92 perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
1.65 -0.3 1.39 ± 2% perf-profile.children.cycles-pp.handle_mm_fault
1.44 -0.2 1.20 perf-profile.children.cycles-pp.__irqentry_text_end
1.41 -0.2 1.19 ± 2% perf-profile.children.cycles-pp.__handle_mm_fault
0.90 ± 6% -0.2 0.70 ± 9% perf-profile.children.cycles-pp.lock_vma_under_rcu
1.23 -0.2 1.05 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.24 -0.2 3.06 perf-profile.children.cycles-pp.__page_cache_release
1.14 -0.2 0.98 ± 3% perf-profile.children.cycles-pp.do_anonymous_page
0.26 ± 5% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.poll_idle
0.79 -0.1 0.66 perf-profile.children.cycles-pp.error_entry
0.62 ± 3% -0.1 0.49 ± 8% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.45 ± 8% -0.1 0.33 ± 16% perf-profile.children.cycles-pp.mas_walk
0.88 -0.1 0.76 perf-profile.children.cycles-pp.native_irq_return_iret
0.54 ± 3% -0.1 0.42 ± 5% perf-profile.children.cycles-pp.page_counter_uncharge
0.50 ± 3% -0.1 0.39 ± 4% perf-profile.children.cycles-pp.page_counter_cancel
0.54 ± 3% -0.1 0.43 ± 6% perf-profile.children.cycles-pp.uncharge_batch
0.55 ± 2% -0.1 0.45 ± 3% perf-profile.children.cycles-pp.up_read
0.70 -0.1 0.60 perf-profile.children.cycles-pp.__munmap
0.69 -0.1 0.60 perf-profile.children.cycles-pp.__vm_munmap
0.69 -0.1 0.60 perf-profile.children.cycles-pp.__x64_sys_munmap
0.67 ± 3% -0.1 0.58 perf-profile.children.cycles-pp.unmap_page_range
0.52 ± 2% -0.1 0.44 ± 4% perf-profile.children.cycles-pp.alloc_anon_folio
0.57 -0.1 0.50 ± 4% perf-profile.children.cycles-pp.zap_pmd_range
0.54 -0.1 0.48 ± 2% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.54 -0.1 0.48 ± 2% perf-profile.children.cycles-pp.do_vmi_munmap
0.38 ± 2% -0.1 0.32 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
0.53 ± 3% -0.1 0.47 ± 4% perf-profile.children.cycles-pp.zap_pte_range
1.51 -0.1 1.45 perf-profile.children.cycles-pp.schedule_preempt_disabled
0.52 ± 2% -0.1 0.47 ± 2% perf-profile.children.cycles-pp.vms_complete_munmap_vmas
0.48 -0.1 0.42 perf-profile.children.cycles-pp.native_flush_tlb_local
0.31 ± 2% -0.1 0.26 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
0.27 ± 5% -0.1 0.22 ± 8% perf-profile.children.cycles-pp.tlb_gather_mmu
0.31 ± 2% -0.1 0.26 ± 2% perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
1.51 -0.1 1.46 perf-profile.children.cycles-pp.schedule
0.34 -0.1 0.29 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.32 -0.0 0.27 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.50 -0.0 0.45 perf-profile.children.cycles-pp.dequeue_task_fair
0.40 -0.0 0.35 ± 2% perf-profile.children.cycles-pp.__pte_offset_map_lock
0.48 -0.0 0.43 perf-profile.children.cycles-pp.dequeue_entities
0.28 ± 2% -0.0 0.24 ± 2% perf-profile.children.cycles-pp.__alloc_pages_noprof
0.28 -0.0 0.24 ± 4% perf-profile.children.cycles-pp.lru_gen_del_folio
0.24 ± 5% -0.0 0.20 ± 7% perf-profile.children.cycles-pp.find_vma_prev
0.22 ± 3% -0.0 0.18 ± 4% perf-profile.children.cycles-pp.__perf_sw_event
0.32 -0.0 0.28 perf-profile.children.cycles-pp.irqtime_account_irq
0.24 ± 4% -0.0 0.20 ± 7% perf-profile.children.cycles-pp.down_read_trylock
0.19 ± 3% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.vms_clear_ptes
0.14 ± 9% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.flush_tlb_batched_pending
0.22 -0.0 0.19 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist
0.24 -0.0 0.20 ± 3% perf-profile.children.cycles-pp.sync_regs
0.21 ± 2% -0.0 0.18 ± 4% perf-profile.children.cycles-pp.___perf_sw_event
0.23 ± 3% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.down_write_killable
0.29 -0.0 0.26 perf-profile.children.cycles-pp.dequeue_entity
0.22 ± 3% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
0.06 -0.0 0.03 ± 81% perf-profile.children.cycles-pp.__cond_resched
0.29 ± 2% -0.0 0.26 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
0.27 -0.0 0.24 ± 4% perf-profile.children.cycles-pp.lru_gen_add_folio
0.09 ± 4% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.call_function_single_prep_ipi
0.23 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.sched_clock_cpu
0.20 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.native_sched_clock
0.15 ± 4% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.rwsem_mark_wake
0.33 ± 2% -0.0 0.31 perf-profile.children.cycles-pp.downgrade_write
0.14 ± 4% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.34 -0.0 0.32 ± 3% perf-profile.children.cycles-pp.lru_add
0.42 ± 2% -0.0 0.40 ± 2% perf-profile.children.cycles-pp.try_to_wake_up
0.25 -0.0 0.23 perf-profile.children.cycles-pp.update_process_times
0.12 ± 6% -0.0 0.10 ± 9% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.20 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.sched_clock
0.37 -0.0 0.35 perf-profile.children.cycles-pp.__hrtimer_run_queues
0.14 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.update_curr
0.13 ± 5% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.rwsem_optimistic_spin
0.11 -0.0 0.09 ± 4% perf-profile.children.cycles-pp.clear_page_erms
0.15 -0.0 0.13 ± 3% perf-profile.children.cycles-pp.__smp_call_single_queue
0.14 ± 3% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.ktime_get
0.43 -0.0 0.41 perf-profile.children.cycles-pp.hrtimer_interrupt
0.22 ± 2% -0.0 0.20 perf-profile.children.cycles-pp.enqueue_entity
0.14 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.__hrtimer_start_range_ns
0.18 ± 2% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.update_load_avg
0.12 ± 3% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.10 ± 3% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.free_unref_folios
0.19 ± 2% -0.0 0.18 perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.08 ± 6% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.rmqueue
0.07 ± 7% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.get_nohz_timer_target
0.09 -0.0 0.08 ± 5% perf-profile.children.cycles-pp.read_tsc
0.14 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.idle_cpu
0.08 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.prepare_task_switch
0.06 ± 7% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.mm_cid_get
0.07 -0.0 0.06 perf-profile.children.cycles-pp.native_apic_mem_eoi
0.07 -0.0 0.06 perf-profile.children.cycles-pp.rwsem_spin_on_owner
0.16 ± 2% +0.0 0.18 ± 4% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.05 ± 8% +0.0 0.07 ± 5% perf-profile.children.cycles-pp.__switch_to
0.10 ± 3% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.hrtimer_try_to_cancel
0.66 +0.0 0.68 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.11 ± 3% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.start_dl_timer
0.05 +0.0 0.08 ± 6% perf-profile.children.cycles-pp.task_contending
0.03 ± 70% +0.0 0.07 perf-profile.children.cycles-pp.hrtimer_next_event_without
0.52 +0.0 0.56 perf-profile.children.cycles-pp.menu_select
0.19 ± 5% +0.0 0.23 perf-profile.children.cycles-pp.handle_softirqs
0.18 ± 2% +0.0 0.22 ± 3% perf-profile.children.cycles-pp.enqueue_dl_entity
0.18 ± 3% +0.1 0.23 perf-profile.children.cycles-pp.dl_server_start
0.11 ± 3% +0.1 0.16 ± 2% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.46 +0.1 0.52 perf-profile.children.cycles-pp.enqueue_task_fair
0.00 +0.1 0.06 perf-profile.children.cycles-pp.call_cpuidle
0.47 +0.1 0.53 perf-profile.children.cycles-pp.enqueue_task
0.48 +0.1 0.55 perf-profile.children.cycles-pp.ttwu_do_activate
0.62 +0.1 0.70 ± 4% perf-profile.children.cycles-pp._find_next_bit
0.18 ± 2% +0.1 0.26 perf-profile.children.cycles-pp.__sysvec_call_function_single
0.20 +0.1 0.28 ± 3% perf-profile.children.cycles-pp.sysvec_call_function_single
0.64 +0.1 0.72 perf-profile.children.cycles-pp.sched_ttwu_pending
0.22 +0.1 0.30 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.21 ± 2% +0.1 0.30 ± 8% perf-profile.children.cycles-pp.rest_init
0.21 ± 2% +0.1 0.30 ± 8% perf-profile.children.cycles-pp.start_kernel
0.21 ± 2% +0.1 0.30 ± 8% perf-profile.children.cycles-pp.x86_64_start_kernel
0.21 ± 2% +0.1 0.30 ± 8% perf-profile.children.cycles-pp.x86_64_start_reservations
0.10 ± 3% +0.1 0.20 ± 2% perf-profile.children.cycles-pp.tick_nohz_next_event
0.00 +0.1 0.10 ± 7% perf-profile.children.cycles-pp.__bitmap_and
0.06 ± 6% +0.1 0.16 ± 4% perf-profile.children.cycles-pp.__get_next_timer_interrupt
0.48 +0.1 0.58 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
0.16 ± 3% +0.1 0.28 perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.61 +0.1 0.73 ± 3% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.28 ± 2% +0.1 0.42 ± 4% perf-profile.children.cycles-pp.finish_task_switch
0.00 +0.2 0.15 ± 3% perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.2 0.16 ± 3% perf-profile.children.cycles-pp.ct_idle_exit
0.02 ± 99% +0.2 0.24 ± 2% perf-profile.children.cycles-pp.ct_kernel_exit_state
0.37 ± 4% +0.4 0.73 ± 2% perf-profile.children.cycles-pp.switch_mm_irqs_off
2.22 +0.5 2.76 perf-profile.children.cycles-pp.__schedule
0.73 ± 2% +0.6 1.31 ± 2% perf-profile.children.cycles-pp.schedule_idle
9.24 +1.0 10.25 perf-profile.children.cycles-pp.intel_idle
18.85 +6.3 25.12 perf-profile.children.cycles-pp.asm_sysvec_call_function
19.52 +7.1 26.62 perf-profile.children.cycles-pp.cpuidle_enter_state
19.53 +7.1 26.63 perf-profile.children.cycles-pp.cpuidle_enter
20.22 +7.2 27.47 perf-profile.children.cycles-pp.cpuidle_idle_call
14.43 +7.8 22.23 perf-profile.children.cycles-pp.sysvec_call_function
13.73 +7.9 21.59 perf-profile.children.cycles-pp.__sysvec_call_function
21.51 +7.9 29.40 perf-profile.children.cycles-pp.start_secondary
21.72 +8.0 29.69 perf-profile.children.cycles-pp.do_idle
21.72 +8.0 29.70 perf-profile.children.cycles-pp.common_startup_64
21.72 +8.0 29.70 perf-profile.children.cycles-pp.cpu_startup_entry
14.36 +8.1 22.42 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
3.58 +9.4 12.97 perf-profile.children.cycles-pp.flush_tlb_func
7.43 ± 2% -3.8 3.66 perf-profile.self.cycles-pp.intel_idle_irq
16.93 -2.9 13.99 ± 2% perf-profile.self.cycles-pp.llist_add_batch
15.11 -1.7 13.38 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
8.01 -1.4 6.57 perf-profile.self.cycles-pp.llist_reverse_order
1.44 -0.2 1.19 perf-profile.self.cycles-pp.__irqentry_text_end
1.69 -0.2 1.50 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.24 ± 6% -0.1 0.10 ± 10% perf-profile.self.cycles-pp.poll_idle
0.78 -0.1 0.66 perf-profile.self.cycles-pp.error_entry
0.87 -0.1 0.76 perf-profile.self.cycles-pp.native_irq_return_iret
0.65 ± 2% -0.1 0.54 ± 2% perf-profile.self.cycles-pp.testcase
0.46 ± 2% -0.1 0.37 ± 5% perf-profile.self.cycles-pp.down_read
0.38 ± 8% -0.1 0.29 ± 16% perf-profile.self.cycles-pp.mas_walk
0.41 ± 3% -0.1 0.32 ± 4% perf-profile.self.cycles-pp.page_counter_cancel
0.32 ± 12% -0.1 0.24 ± 4% perf-profile.self.cycles-pp.zap_page_range_single
0.46 ± 2% -0.1 0.38 ± 3% perf-profile.self.cycles-pp.up_read
0.28 ± 5% -0.1 0.21 ± 5% perf-profile.self.cycles-pp.tlb_finish_mmu
0.56 -0.1 0.49 ± 2% perf-profile.self.cycles-pp.rwsem_down_read_slowpath
0.32 ± 11% -0.1 0.25 ± 16% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.30 ± 2% -0.1 0.24 perf-profile.self.cycles-pp.menu_select
0.33 ± 6% -0.1 0.27 ± 7% perf-profile.self.cycles-pp.flush_tlb_mm_range
0.46 -0.1 0.41 perf-profile.self.cycles-pp.native_flush_tlb_local
0.34 -0.1 0.29 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.32 -0.0 0.27 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.18 ± 9% -0.0 0.13 ± 12% perf-profile.self.cycles-pp.__handle_mm_fault
0.22 ± 5% -0.0 0.18 ± 7% perf-profile.self.cycles-pp.tlb_gather_mmu
0.12 ± 9% -0.0 0.08 ± 9% perf-profile.self.cycles-pp.flush_tlb_batched_pending
0.24 -0.0 0.20 ± 3% perf-profile.self.cycles-pp.sync_regs
0.14 ± 4% -0.0 0.11 ± 9% perf-profile.self.cycles-pp.do_madvise
0.22 ± 2% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.lru_gen_del_folio
0.21 ± 2% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.13 ± 7% -0.0 0.10 ± 6% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.19 ± 4% -0.0 0.17 ± 6% perf-profile.self.cycles-pp.down_read_trylock
0.09 ± 5% -0.0 0.06 perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.20 ± 3% -0.0 0.17 perf-profile.self.cycles-pp.native_sched_clock
0.13 ± 4% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.15 ± 2% -0.0 0.13 ± 3% perf-profile.self.cycles-pp.___perf_sw_event
0.11 ± 4% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.do_user_addr_fault
0.22 ± 2% -0.0 0.20 ± 5% perf-profile.self.cycles-pp.lru_gen_add_folio
0.19 ± 3% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.folio_batch_move_lru
0.22 -0.0 0.20 ± 3% perf-profile.self.cycles-pp.folios_put_refs
0.14 ± 3% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.irqtime_account_irq
0.09 ± 5% -0.0 0.07 perf-profile.self.cycles-pp.__madvise
0.12 ± 4% -0.0 0.10 perf-profile.self.cycles-pp.rwsem_mark_wake
0.20 ± 4% -0.0 0.19 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.09 ± 5% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.read_tsc
0.09 -0.0 0.08 ± 5% perf-profile.self.cycles-pp.clear_page_erms
0.07 -0.0 0.06 ± 6% perf-profile.self.cycles-pp.native_apic_mem_eoi
0.06 ± 7% -0.0 0.05 ± 9% perf-profile.self.cycles-pp.mm_cid_get
0.10 -0.0 0.09 perf-profile.self.cycles-pp.asm_sysvec_call_function
0.06 -0.0 0.05 perf-profile.self.cycles-pp.handle_mm_fault
0.05 ± 7% +0.0 0.07 ± 5% perf-profile.self.cycles-pp.__switch_to
0.17 ± 3% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.cpuidle_enter_state
0.06 +0.1 0.11 ± 3% perf-profile.self.cycles-pp.cpuidle_idle_call
0.02 ±141% +0.1 0.07 ± 5% perf-profile.self.cycles-pp.do_idle
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.call_cpuidle
0.48 +0.1 0.56 ± 4% perf-profile.self.cycles-pp._find_next_bit
0.00 +0.1 0.09 ± 5% perf-profile.self.cycles-pp.__bitmap_and
0.38 ± 3% +0.1 0.49 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.28 +0.1 0.39 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±223% +0.2 0.24 ± 3% perf-profile.self.cycles-pp.ct_kernel_exit_state
0.36 ± 4% +0.4 0.73 ± 2% perf-profile.self.cycles-pp.switch_mm_irqs_off
9.24 +1.0 10.25 perf-profile.self.cycles-pp.intel_idle
15.13 +1.2 16.34 perf-profile.self.cycles-pp.smp_call_function_many_cond
3.07 +9.5 12.52 perf-profile.self.cycles-pp.flush_tlb_func
***************************************************************************************************
lkp-icl-2sp4: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq-mt/vm-scalability/always/never
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
7e33001b8b9a7806            209954cbc7d0ce1a190fc725d20
----------------            ---------------------------
         %stddev     %change         %stddev
             \          |                \
3.016e+10 ± 5% +23.7% 3.732e+10 cpuidle..time
2394210 ± 5% +1598.8% 40671711 cpuidle..usage
280.01 ± 3% +24.8% 349.50 uptime.boot
27411 ± 3% +21.2% 33234 uptime.idle
73.97 -2.3% 72.29 iostat.cpu.idle
23.61 -7.6% 21.82 iostat.cpu.iowait
2.11 ± 2% +169.7% 5.68 ± 4% iostat.cpu.system
0.26 ± 3% -0.0 0.23 ± 2% mpstat.cpu.all.irq%
0.08 -0.0 0.04 ± 4% mpstat.cpu.all.soft%
1.78 ± 2% +3.7 5.45 ± 4% mpstat.cpu.all.sys%
0.31 ± 7% -0.1 0.21 ± 5% mpstat.cpu.all.usr%
16661751 ± 42% -59.8% 6694161 ± 70% numa-numastat.node0.numa_miss
16734663 ± 41% -59.5% 6770170 ± 69% numa-numastat.node0.other_node
26857269 ± 23% -60.2% 10694204 ± 50% numa-numastat.node1.local_node
16665351 ± 42% -59.8% 6694094 ± 70% numa-numastat.node1.numa_foreign
26918278 ± 23% -60.1% 10751098 ± 49% numa-numastat.node1.numa_hit
368.92 ± 36% -55.2% 165.39 ± 74% vmstat.io.bi
409795 -51.0% 200717 ± 3% vmstat.io.bo
4.14 ± 7% +100.0% 8.28 ± 6% vmstat.procs.r
359.98 ± 37% -56.0% 158.48 ± 77% vmstat.swap.si
409786 -51.0% 200710 ± 3% vmstat.swap.so
5382 -28.9% 3825 ± 2% vmstat.system.cs
339018 ± 2% -33.0% 227081 ± 3% vmstat.system.in
54162177 ± 11% -32.5% 36537092 ± 17% meminfo.Active
54162037 ± 11% -32.5% 36536947 ± 17% meminfo.Active(anon)
66576747 ± 9% +24.3% 82748036 ± 9% meminfo.Inactive
66575517 ± 9% +24.3% 82746881 ± 9% meminfo.Inactive(anon)
333831 -11.8% 294280 meminfo.PageTables
33487 ± 3% +199.2% 100210 ± 26% meminfo.Shmem
1.627e+08 +11.4% 1.812e+08 meminfo.SwapFree
1644 ± 11% -41.9% 955.75 ± 10% meminfo.Writeback
12758 ± 8% +171.6% 34650 ± 68% numa-meminfo.node0.Shmem
31281633 ± 6% -37.6% 19515816 ± 25% numa-meminfo.node1.Active
31281573 ± 6% -37.6% 19515752 ± 25% numa-meminfo.node1.Active(anon)
28059820 ± 5% +40.6% 39461616 ± 15% numa-meminfo.node1.Inactive
28059215 ± 5% +40.6% 39461201 ± 15% numa-meminfo.node1.Inactive(anon)
178279 -16.5% 148899 ± 2% numa-meminfo.node1.PageTables
20873 ± 7% +215.2% 65784 ± 13% numa-meminfo.node1.Shmem
1296 ± 6% -49.8% 650.40 ± 9% numa-meminfo.node1.Writeback
38311 ± 5% -40.8% 22667 vm-scalability.median
1234132 ± 4% -40.7% 732265 vm-scalability.throughput
239.68 ± 5% +28.4% 307.63 vm-scalability.time.elapsed_time
239.68 ± 5% +28.4% 307.63 vm-scalability.time.elapsed_time.max
6297 ± 5% +60.9% 10134 vm-scalability.time.involuntary_context_switches
62687446 -24.8% 47163918 ± 2% vm-scalability.time.minor_page_faults
224.00 ± 3% +166.7% 597.33 ± 3% vm-scalability.time.percent_of_cpu_this_job_got
474.22 ± 3% +276.7% 1786 ± 3% vm-scalability.time.system_time
63.58 -17.2% 52.64 ± 3% vm-scalability.time.user_time
347556 ± 6% -26.4% 255772 ± 5% vm-scalability.time.voluntary_context_switches
2.821e+08 -22.0% 2.2e+08 vm-scalability.workload
427.78 ± 20% +93.8% 828.91 ± 64% sched_debug.cfs_rq:/.load_avg.max
13.46 ± 23% -63.0% 4.98 ± 47% sched_debug.cfs_rq:/.removed.load_avg.avg
59.42 ± 17% -48.6% 30.57 ± 34% sched_debug.cfs_rq:/.removed.load_avg.stddev
114.05 ± 16% +23.9% 141.35 ± 2% sched_debug.cfs_rq:/.runnable_avg.stddev
113.70 ± 16% +23.9% 140.92 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
10.59 ± 25% +131.9% 24.56 ± 10% sched_debug.cfs_rq:/.util_est.avg
49.96 ± 28% +71.7% 85.76 ± 5% sched_debug.cfs_rq:/.util_est.stddev
130266 ± 19% +38.9% 180973 ± 8% sched_debug.cpu.clock.avg
130457 ± 19% +38.9% 181208 ± 8% sched_debug.cpu.clock.max
130028 ± 19% +38.9% 180638 ± 8% sched_debug.cpu.clock.min
129816 ± 19% +39.0% 180459 ± 8% sched_debug.cpu.clock_task.avg
130389 ± 19% +38.9% 181122 ± 8% sched_debug.cpu.clock_task.max
121562 ± 20% +40.5% 170799 ± 8% sched_debug.cpu.clock_task.min
573.18 ± 25% +47.9% 847.53 ± 8% sched_debug.cpu.nr_switches.min
4.07 ± 14% +43.8% 5.86 ± 5% sched_debug.cpu.nr_uninterruptible.stddev
130026 ± 19% +38.9% 180621 ± 8% sched_debug.cpu_clk
129318 ± 19% +39.1% 179912 ± 8% sched_debug.ktime
130797 ± 19% +38.7% 181392 ± 8% sched_debug.sched_clk
3197 ± 8% +172.3% 8706 ± 67% numa-vmstat.node0.nr_shmem
4670410 ± 6% -22.0% 3641951 ± 8% numa-vmstat.node0.nr_vmscan_write
9127776 ± 5% -21.5% 7169554 ± 5% numa-vmstat.node0.nr_written
16661751 ± 42% -59.8% 6694161 ± 70% numa-vmstat.node0.numa_miss
16734663 ± 41% -59.5% 6770170 ± 69% numa-vmstat.node0.numa_other
7829198 ± 6% -36.9% 4941489 ± 24% numa-vmstat.node1.nr_active_anon
718935 ± 14% +53.9% 1106567 ± 31% numa-vmstat.node1.nr_free_pages
6977014 ± 5% +39.1% 9704494 ± 15% numa-vmstat.node1.nr_inactive_anon
44508 ± 2% -16.6% 37117 ± 2% numa-vmstat.node1.nr_page_table_pages
5222 ± 8% +218.1% 16612 ± 13% numa-vmstat.node1.nr_shmem
8007802 ± 6% -47.6% 4196794 ± 7% numa-vmstat.node1.nr_vmscan_write
352.06 ± 7% -50.7% 173.73 ± 16% numa-vmstat.node1.nr_writeback
15775556 ± 6% -45.5% 8590752 ± 5% numa-vmstat.node1.nr_written
7829176 ± 6% -36.9% 4941484 ± 24% numa-vmstat.node1.nr_zone_active_anon
6977031 ± 5% +39.1% 9704497 ± 15% numa-vmstat.node1.nr_zone_inactive_anon
346.80 ± 7% -49.9% 173.73 ± 16% numa-vmstat.node1.nr_zone_write_pending
16665351 ± 42% -59.8% 6694094 ± 70% numa-vmstat.node1.numa_foreign
26917054 ± 23% -60.1% 10749940 ± 49% numa-vmstat.node1.numa_hit
26856045 ± 23% -60.2% 10693045 ± 50% numa-vmstat.node1.numa_local
191035 ± 7% -29.3% 135009 ± 4% proc-vmstat.allocstall_movable
3850 ± 11% +78.5% 6872 ± 12% proc-vmstat.allocstall_normal
13525554 ± 10% -32.5% 9125751 ± 17% proc-vmstat.nr_active_anon
16631565 ± 8% +23.8% 20588362 ± 9% proc-vmstat.nr_inactive_anon
83457 -12.1% 73319 proc-vmstat.nr_page_table_pages
8392 ± 3% +198.4% 25047 ± 26% proc-vmstat.nr_shmem
12629057 ± 5% -39.1% 7691618 ± 5% proc-vmstat.nr_vmscan_write
440.92 ± 10% -42.0% 255.71 ± 15% proc-vmstat.nr_writeback
24903332 ± 5% -36.7% 15760306 ± 4% proc-vmstat.nr_written
13525564 ± 10% -32.5% 9125755 ± 17% proc-vmstat.nr_zone_active_anon
16631569 ± 8% +23.8% 20588365 ± 9% proc-vmstat.nr_zone_inactive_anon
443.01 ± 10% -42.0% 257.00 ± 16% proc-vmstat.nr_zone_write_pending
24485570 ± 3% -15.4% 20714438 ± 3% proc-vmstat.numa_foreign
39260606 ± 2% -30.1% 27457969 ± 4% proc-vmstat.numa_hit
39098081 ± 2% -30.1% 27325222 ± 4% proc-vmstat.numa_local
24482446 ± 3% -15.5% 20696329 ± 3% proc-vmstat.numa_miss
24643161 ± 3% -15.5% 20828939 ± 3% proc-vmstat.numa_other
7478080 ± 19% +140.2% 17959948 ± 8% proc-vmstat.numa_pte_updates
63140512 -24.7% 47553512 proc-vmstat.pgalloc_normal
63461017 -24.5% 47896127 ± 2% proc-vmstat.pgfault
64134373 -24.6% 48331932 ± 2% proc-vmstat.pgfree
2796 ± 78% -70.9% 815.00 ± 50% proc-vmstat.pgmigrate_fail
99615377 ± 5% -36.7% 63043276 ± 4% proc-vmstat.pgpgout
34932 ± 3% -7.8% 32198 ± 2% proc-vmstat.pgreuse
21507042 ± 5% -36.0% 13775181 ± 4% proc-vmstat.pgrotated
58427243 ± 10% -43.5% 32993860 ± 12% proc-vmstat.pgscan_anon
44324880 ± 10% -37.2% 27839440 ± 10% proc-vmstat.pgscan_direct
14102763 ± 23% -63.4% 5154838 ± 27% proc-vmstat.pgscan_kswapd
2666 ± 88% -90.7% 248.33 ±137% proc-vmstat.pgskip_normal
24911061 ± 5% -36.7% 15767491 ± 4% proc-vmstat.pgsteal_anon
17074863 ± 8% -25.3% 12754191 ± 5% proc-vmstat.pgsteal_direct
7836517 ± 8% -61.5% 3013661 ± 7% proc-vmstat.pgsteal_kswapd
24903332 ± 5% -36.7% 15760306 ± 4% proc-vmstat.pswpout
78185 ± 27% -82.8% 13463 ± 52% proc-vmstat.workingset_nodereclaim
1.85 ± 4% -31.7% 1.26 perf-stat.i.MPKI
1.992e+09 ± 3% -18.9% 1.615e+09 ± 2% perf-stat.i.branch-instructions
0.93 ± 6% +0.6 1.55 ± 3% perf-stat.i.branch-miss-rate%
14377927 ± 11% +29.7% 18645141 ± 5% perf-stat.i.branch-misses
13.97 ± 3% -9.0 4.95 perf-stat.i.cache-miss-rate%
15782867 ± 3% -34.3% 10364434 ± 2% perf-stat.i.cache-misses
79049148 +92.6% 1.522e+08 ± 2% perf-stat.i.cache-references
5344 -29.2% 3783 ± 2% perf-stat.i.context-switches
1.31 ± 2% +316.3% 5.46 ± 3% perf-stat.i.cpi
8.392e+09 ± 3% +197.0% 2.492e+10 ± 3% perf-stat.i.cpu-cycles
150.26 +14.1% 171.44 ± 3% perf-stat.i.cpu-migrations
737.89 ± 5% +500.7% 4432 ± 4% perf-stat.i.cycles-between-cache-misses
7.732e+09 ± 3% -17.2% 6.405e+09 ± 2% perf-stat.i.instructions
0.80 -69.8% 0.24 ± 5% perf-stat.i.ipc
23.75 ± 27% -52.9% 11.19 ± 69% perf-stat.i.major-faults
2.55 ± 8% -38.4% 1.57 ± 4% perf-stat.i.metric.K/sec
265295 ± 5% -42.5% 152670 ± 2% perf-stat.i.minor-faults
265319 ± 5% -42.5% 152681 ± 2% perf-stat.i.page-faults
2.04 ± 2% -20.6% 1.62 ± 2% perf-stat.overall.MPKI
0.72 ± 12% +0.4 1.15 ± 4% perf-stat.overall.branch-miss-rate%
19.95 ± 2% -13.1 6.84 perf-stat.overall.cache-miss-rate%
1.09 ± 2% +257.6% 3.88 ± 3% perf-stat.overall.cpi
532.42 ± 2% +350.1% 2396 ± 4% perf-stat.overall.cycles-between-cache-misses
0.92 -72.0% 0.26 ± 4% perf-stat.overall.ipc
6551 ± 2% +38.5% 9072 perf-stat.overall.path-length
1.982e+09 ± 3% -18.5% 1.616e+09 perf-stat.ps.branch-instructions
14325844 ± 11% +29.7% 18584702 ± 5% perf-stat.ps.branch-misses
15697779 ± 3% -33.9% 10379452 ± 2% perf-stat.ps.cache-misses
78678984 +93.0% 1.518e+08 ± 2% perf-stat.ps.cache-references
5321 -29.1% 3771 ± 2% perf-stat.ps.context-switches
8.355e+09 ± 3% +197.6% 2.487e+10 ± 3% perf-stat.ps.cpu-cycles
149.59 +14.2% 170.85 ± 3% perf-stat.ps.cpu-migrations
7.693e+09 ± 3% -16.8% 6.404e+09 perf-stat.ps.instructions
23.73 ± 27% -52.9% 11.18 ± 69% perf-stat.ps.major-faults
263785 ± 5% -41.9% 153177 perf-stat.ps.minor-faults
263809 ± 5% -41.9% 153188 perf-stat.ps.page-faults
1.848e+12 ± 2% +8.0% 1.995e+12 perf-stat.total.instructions
0.09 ± 3% +316.6% 0.37 ±135% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.04 ± 15% -34.4% 0.02 ± 16% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.05 ± 17% +600.7% 0.34 ±172% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.01 ± 26% +2198.6% 0.27 ±152% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.06 ± 8% +41.6% 0.09 ± 17% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.07 ± 8% +36.7% 0.10 ± 15% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1.18 ± 45% +14663.2% 173.84 ±219% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.16 ± 7% +81.8% 0.29 ± 22% perf-sched.sch_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
0.13 ± 13% -55.3% 0.06 ± 83% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.folio_alloc_swap.add_to_swap.shrink_folio_list
0.18 ± 11% +8754.6% 15.60 ±219% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.35 ±107% +2644.1% 256.70 ±154% perf-sched.sch_delay.max.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
0.15 ± 25% +100.0% 0.31 ± 52% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.17 ± 10% +74.6% 0.30 ± 15% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.16 ± 12% +391.4% 0.80 ±148% perf-sched.sch_delay.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
0.11 ± 94% +1386.1% 1.62 ± 66% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.15 ± 18% +45.5% 0.22 ± 13% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
87.69 ± 2% +57.5% 138.09 ± 4% perf-sched.total_wait_and_delay.average.ms
87.52 ± 2% +57.6% 137.91 ± 4% perf-sched.total_wait_time.average.ms
7.23 ±142% +493.8% 42.93 ± 11% perf-sched.wait_and_delay.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
89.03 ± 56% -97.9% 1.83 ±152% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
21.44 ± 3% +88.0% 40.32 ± 7% perf-sched.wait_and_delay.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
383.35 ± 3% +8.2% 414.60 ± 3% perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
40.29 ± 34% +602.5% 283.08 ± 58% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
4.06 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
338.75 ± 23% -65.9% 115.54 ± 72% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
20.91 ± 4% +64.7% 34.43 ± 6% perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
5.97 ± 8% -27.5% 4.33 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
527.31 ± 2% +20.9% 637.26 ± 10% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
159.81 ± 2% +78.2% 284.75 ± 9% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
640.33 ± 11% +33.5% 854.67 ± 16% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
26.83 ±141% +777.6% 235.50 ± 20% perf-sched.wait_and_delay.count.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
5.00 +43.3% 7.17 ± 12% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
7206 ± 4% -28.5% 5149 ± 12% perf-sched.wait_and_delay.count.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
8.67 ± 10% +38.5% 12.00 ± 16% perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
160.17 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
112.17 ± 33% +279.8% 426.00 ± 13% perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
639.00 ± 11% +120.6% 1409 ± 18% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
15.52 ±141% +4400.0% 698.48 ± 63% perf-sched.wait_and_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
3425 ± 44% -99.4% 22.13 ±141% perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
1212 ± 4% +81.2% 2197 ± 12% perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
6.49 ± 46% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
81.27 ± 26% -60.3% 32.25 ± 60% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3448 ± 12% +48.4% 5119 ± 23% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
21.71 ± 7% +97.3% 42.83 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
6.41 ± 96% +426.5% 33.77 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
9.59 ± 52% +792.6% 85.58 ± 27% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
88.96 ± 56% -90.5% 8.44 ± 53% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
21.33 ± 3% +88.1% 40.12 ± 6% perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
383.33 ± 3% +8.2% 414.58 ± 3% perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
40.28 ± 34% +602.1% 282.81 ± 59% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
3.97 -16.8% 3.30 ± 3% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
338.69 ± 23% -54.8% 153.25 ± 23% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
22.02 ± 23% +462.9% 123.94 ± 26% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
20.81 ± 4% +65.0% 34.33 ± 6% perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
5.87 ± 8% -28.1% 4.22 perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
527.30 ± 2% +20.9% 637.25 ± 10% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
159.22 ± 2% +78.8% 284.72 ± 9% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
42.83 ± 9% +1530.6% 698.38 ± 63% perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
12.83 ± 82% +344.6% 57.05 ± 10% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
124.59 ± 77% +333.7% 540.35 ± 31% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
1212 ± 4% +81.2% 2197 ± 12% perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
220.10 ± 74% +283.8% 844.67 ± 11% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
81.17 ± 26% -60.4% 32.15 ± 60% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3385 ± 10% +51.2% 5119 ± 23% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
79.77 -10.1 69.65 perf-profile.calltrace.cycles-pp.do_access
77.33 -7.8 69.51 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
7.43 ± 2% -6.8 0.66 ± 13% perf-profile.calltrace.cycles-pp.add_to_swap.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one
6.76 ± 5% -5.8 0.95 ± 5% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
6.24 ± 2% -5.7 0.58 ± 12% perf-profile.calltrace.cycles-pp.folio_alloc_swap.add_to_swap.shrink_folio_list.evict_folios.try_to_shrink_lruvec
5.73 ± 4% -5.6 0.17 ±141% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
74.64 -5.4 69.25 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
74.54 -5.3 69.25 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
5.79 ± 3% -5.2 0.55 ± 11% perf-profile.calltrace.cycles-pp.__mem_cgroup_try_charge_swap.folio_alloc_swap.add_to_swap.shrink_folio_list.evict_folios
5.51 ± 4% -5.1 0.37 ± 72% perf-profile.calltrace.cycles-pp.do_rw_once
73.45 -4.3 69.17 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.92 ± 2% -3.5 0.44 ± 44% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
72.77 -2.9 69.91 perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
74.31 -1.9 72.37 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +0.6 0.59 ± 7% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
0.00 +0.7 0.66 ± 5% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state
0.00 +0.7 0.69 ± 5% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter
0.00 +0.8 0.81 ± 26% perf-profile.calltrace.cycles-pp.__get_user_pages.get_user_pages_remote.get_arg_page.copy_string_kernel.do_execveat_common
0.00 +0.8 0.81 ± 26% perf-profile.calltrace.cycles-pp.copy_string_kernel.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.81 ± 26% perf-profile.calltrace.cycles-pp.get_arg_page.copy_string_kernel.do_execveat_common.__x64_sys_execve.do_syscall_64
0.00 +0.8 0.81 ± 26% perf-profile.calltrace.cycles-pp.get_user_pages_remote.get_arg_page.copy_string_kernel.do_execveat_common.__x64_sys_execve
0.00 +0.8 0.81 ± 26% perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.get_user_pages_remote.get_arg_page.copy_string_kernel
0.00 +0.8 0.81 ± 12% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
0.00 +0.8 0.81 ± 12% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
0.00 +0.8 0.81 ± 12% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
0.00 +0.8 0.81 ± 12% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.9 0.95 ± 19% perf-profile.calltrace.cycles-pp._Fork
0.08 ±223% +1.0 1.06 ± 12% perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy
0.08 ±223% +1.0 1.06 ± 12% perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault
0.08 ±223% +1.0 1.06 ± 12% perf-profile.calltrace.cycles-pp.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault.handle_mm_fault
0.08 ±223% +1.0 1.06 ± 12% perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.09 ±223% +1.0 1.06 ± 12% perf-profile.calltrace.cycles-pp.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00 +1.3 1.26 ± 7% perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00 +1.4 1.36 ± 30% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__get_user_pages.get_user_pages_remote.get_arg_page
0.00 +1.6 1.64 ± 6% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.21 ± 46% +2.0 3.22 ± 29% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.20 ± 46% +2.0 3.22 ± 29% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
1.20 ± 46% +2.0 3.22 ± 29% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.18 ± 47% +2.0 3.22 ± 29% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +2.1 2.09 ± 7% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.30 ±100% +2.2 2.48 ± 11% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% perf-profile.calltrace.cycles-pp.execve
67.34 +2.3 69.67 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
67.27 +2.4 69.67 perf-profile.calltrace.cycles-pp.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
67.22 +2.4 69.66 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
67.12 +2.5 69.64 perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio
5.78 +3.1 8.85 ± 6% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
5.78 +3.1 8.85 ± 6% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
5.78 +3.1 8.85 ± 6% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
2.21 ± 6% +3.2 5.36 ± 11% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
4.87 +3.7 8.62 ± 6% perf-profile.calltrace.cycles-pp.balance_pgdat.kswapd.kthread.ret_from_fork.ret_from_fork_asm
4.87 +3.7 8.62 ± 6% perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork.ret_from_fork_asm
4.87 +3.7 8.62 ± 6% perf-profile.calltrace.cycles-pp.shrink_many.shrink_node.balance_pgdat.kswapd.kthread
4.87 +3.7 8.62 ± 6% perf-profile.calltrace.cycles-pp.shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork
4.87 +3.7 8.62 ± 6% perf-profile.calltrace.cycles-pp.shrink_one.shrink_many.shrink_node.balance_pgdat.kswapd
4.87 +3.8 8.62 ± 6% perf-profile.calltrace.cycles-pp.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node.balance_pgdat
66.53 +4.1 70.63 perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
6.72 ± 2% +4.5 11.21 ± 9% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
6.72 ± 2% +4.5 11.21 ± 9% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
6.72 ± 2% +4.5 11.20 ± 9% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
6.94 ± 2% +4.6 11.50 ± 9% perf-profile.calltrace.cycles-pp.common_startup_64
3.57 ± 2% +5.1 8.64 ± 9% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
3.65 ± 2% +5.5 9.14 ± 9% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
4.52 ± 2% +6.3 10.82 ± 8% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
64.31 +6.9 71.18 perf-profile.calltrace.cycles-pp.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof
64.17 +7.4 71.56 perf-profile.calltrace.cycles-pp.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof
64.15 +7.4 71.56 perf-profile.calltrace.cycles-pp.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof
62.53 +9.0 71.48 perf-profile.calltrace.cycles-pp.shrink_many.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath
62.50 +9.0 71.48 perf-profile.calltrace.cycles-pp.shrink_one.shrink_many.shrink_node.do_try_to_free_pages.try_to_free_pages
62.03 +9.4 71.45 perf-profile.calltrace.cycles-pp.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node.do_try_to_free_pages
66.79 +13.3 80.06 perf-profile.calltrace.cycles-pp.evict_folios.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node
63.11 +16.6 79.70 perf-profile.calltrace.cycles-pp.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one.shrink_many
42.45 ± 2% +35.3 77.74 ± 2% perf-profile.calltrace.cycles-pp.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one
42.43 ± 2% +35.3 77.74 ± 2% perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios.try_to_shrink_lruvec
42.34 ± 2% +35.4 77.73 ± 2% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios
41.73 ± 2% +35.9 77.58 ± 2% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list
15.56 ± 2% -12.5 3.03 ± 4% perf-profile.children.cycles-pp.asm_sysvec_call_function
80.28 -10.4 69.87 perf-profile.children.cycles-pp.do_access
11.47 ± 4% -10.1 1.37 ± 4% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
10.79 ± 4% -9.5 1.32 ± 4% perf-profile.children.cycles-pp.__sysvec_call_function
11.76 ± 3% -9.4 2.34 ± 3% perf-profile.children.cycles-pp.sysvec_call_function
8.04 ± 2% -7.2 0.79 ± 11% perf-profile.children.cycles-pp.add_to_swap
7.85 ± 4% -6.7 1.19 ± 7% perf-profile.children.cycles-pp.llist_add_batch
6.84 ± 2% -6.2 0.69 ± 9% perf-profile.children.cycles-pp.folio_alloc_swap
6.38 ± 2% -5.7 0.65 ± 8% perf-profile.children.cycles-pp.__mem_cgroup_try_charge_swap
5.83 ± 7% -5.3 0.57 ± 50% perf-profile.children.cycles-pp.rmap_walk_anon
5.72 ± 4% -5.1 0.62 ± 15% perf-profile.children.cycles-pp.do_rw_once
5.03 ± 6% -4.6 0.47 ± 4% perf-profile.children.cycles-pp.flush_tlb_func
4.76 ± 4% -4.3 0.41 ±143% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
4.83 ± 2% -4.1 0.73 ± 3% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
4.27 ± 7% -3.9 0.34 ± 9% perf-profile.children.cycles-pp.try_to_unmap
4.31 ± 3% -3.9 0.40 ± 17% perf-profile.children.cycles-pp.pageout
4.46 ± 2% -3.8 0.64 ± 4% perf-profile.children.cycles-pp.llist_reverse_order
78.01 -3.6 74.42 perf-profile.children.cycles-pp.asm_exc_page_fault
3.88 ± 8% -3.6 0.30 ± 6% perf-profile.children.cycles-pp.try_to_unmap_one
3.94 ± 3% -3.6 0.37 ± 17% perf-profile.children.cycles-pp.swap_writepage
3.10 ± 4% -2.7 0.36 ± 77% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
73.30 -2.3 70.98 perf-profile.children.cycles-pp.do_anonymous_page
2.48 ± 6% -2.3 0.19 ± 10% perf-profile.children.cycles-pp.get_page_from_freelist
2.36 ± 7% -2.1 0.25 ± 20% perf-profile.children.cycles-pp.swap_cgroup_record
2.31 ± 3% -2.1 0.25 ± 5% perf-profile.children.cycles-pp.page_counter_try_charge
2.39 -2.1 0.34 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
2.26 ± 5% -2.0 0.25 ±115% perf-profile.children.cycles-pp.folio_batch_move_lru
76.12 -2.0 74.16 perf-profile.children.cycles-pp.exc_page_fault
76.08 -1.9 74.15 perf-profile.children.cycles-pp.do_user_addr_fault
2.20 ± 4% -1.9 0.34 ± 12% perf-profile.children.cycles-pp._raw_spin_lock
1.85 ± 3% -1.8 0.09 ± 10% perf-profile.children.cycles-pp.native_flush_tlb_local
2.09 ± 2% -1.7 0.36 ± 23% perf-profile.children.cycles-pp.handle_softirqs
1.76 ± 6% -1.5 0.22 ± 8% perf-profile.children.cycles-pp.__pte_offset_map_lock
1.78 ± 8% -1.5 0.25 ±103% perf-profile.children.cycles-pp.folio_referenced
1.68 ± 3% -1.5 0.19 ± 36% perf-profile.children.cycles-pp.blk_complete_reqs
1.57 ± 6% -1.5 0.12 ± 31% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.54 ± 15% -1.4 0.11 ± 8% perf-profile.children.cycles-pp.set_tlb_ubc_flush_pending
1.61 ± 3% -1.4 0.18 ± 34% perf-profile.children.cycles-pp.scsi_end_request
1.61 ± 3% -1.4 0.18 ± 34% perf-profile.children.cycles-pp.scsi_io_completion
1.57 ± 4% -1.4 0.16 ± 11% perf-profile.children.cycles-pp.__mem_cgroup_charge
1.48 ± 4% -1.3 0.16 ± 35% perf-profile.children.cycles-pp.blk_update_request
1.44 ± 8% -1.3 0.13 ± 12% perf-profile.children.cycles-pp.__swap_writepage
1.38 ± 7% -1.3 0.08 ± 40% perf-profile.children.cycles-pp.__remove_mapping
1.33 ± 6% -1.3 0.07 ± 40% perf-profile.children.cycles-pp.do_softirq
1.32 ± 11% -1.2 0.08 ± 6% perf-profile.children.cycles-pp.rmqueue
1.34 ± 11% -1.2 0.12 ± 15% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
1.43 ± 7% -1.2 0.23 ±106% perf-profile.children.cycles-pp.__folio_batch_add_and_move
1.18 ± 12% -1.1 0.06 ± 7% perf-profile.children.cycles-pp.__rmqueue_pcplist
1.25 ± 4% -1.1 0.14 ± 50% perf-profile.children.cycles-pp.isolate_folios
1.24 ± 4% -1.1 0.14 ± 48% perf-profile.children.cycles-pp.scan_folios
1.19 ± 6% -1.1 0.12 ± 10% perf-profile.children.cycles-pp.try_charge_memcg
1.12 ± 13% -1.1 0.06 ± 9% perf-profile.children.cycles-pp.rmqueue_bulk
1.18 ± 4% -1.1 0.12 ± 25% perf-profile.children.cycles-pp.submit_bio_noacct_nocheck
1.25 ± 9% -1.0 0.20 ±122% perf-profile.children.cycles-pp.folio_referenced_one
1.14 ± 7% -1.0 0.11 ± 6% perf-profile.children.cycles-pp.mem_cgroup_id_get_online
1.13 ± 3% -1.0 0.12 ± 42% perf-profile.children.cycles-pp.end_swap_bio_write
1.08 ± 5% -1.0 0.09 ± 22% perf-profile.children.cycles-pp.add_to_swap_cache
1.09 ± 4% -1.0 0.11 ± 26% perf-profile.children.cycles-pp.__submit_bio
1.10 ± 3% -1.0 0.12 ± 43% perf-profile.children.cycles-pp.folio_end_writeback
1.06 ± 4% -1.0 0.11 ± 24% perf-profile.children.cycles-pp.blk_mq_submit_bio
1.04 ± 6% -0.9 0.12 ± 6% perf-profile.children.cycles-pp._find_next_bit
1.00 ± 2% -0.9 0.11 ± 47% perf-profile.children.cycles-pp.isolate_folio
0.96 ± 12% -0.9 0.08 ± 34% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.94 ± 3% -0.9 0.08 ± 38% perf-profile.children.cycles-pp.page_vma_mapped_walk
1.28 ± 9% -0.8 0.46 ± 15% perf-profile.children.cycles-pp.__irq_exit_rcu
0.95 ± 4% -0.8 0.15 ± 17% perf-profile.children.cycles-pp.__schedule
0.85 ± 3% -0.7 0.13 ± 8% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.75 ± 4% -0.7 0.08 ± 17% perf-profile.children.cycles-pp.sync_regs
1.16 ± 4% -0.6 0.52 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.70 ± 4% -0.6 0.08 ± 61% perf-profile.children.cycles-pp.lru_gen_del_folio
0.70 ± 5% -0.6 0.08 ± 52% perf-profile.children.cycles-pp.lru_gen_add_folio
0.66 ± 8% -0.6 0.05 ± 48% perf-profile.children.cycles-pp.__folio_start_writeback
1.08 ± 4% -0.6 0.47 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.69 ± 7% -0.6 0.11 ± 18% perf-profile.children.cycles-pp.schedule
0.75 ± 6% -0.6 0.18 ± 19% perf-profile.children.cycles-pp.worker_thread
0.65 ± 11% -0.5 0.10 ± 20% perf-profile.children.cycles-pp.__drain_all_pages
0.64 ± 4% -0.5 0.11 ± 6% perf-profile.children.cycles-pp.sysvec_call_function_single
0.64 ± 16% -0.5 0.12 ± 25% perf-profile.children.cycles-pp.asm_common_interrupt
0.64 ± 16% -0.5 0.12 ± 25% perf-profile.children.cycles-pp.common_interrupt
0.54 ± 6% -0.5 0.04 ± 75% perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests
0.54 ± 6% -0.5 0.04 ± 75% perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests
0.53 ± 6% -0.5 0.04 ± 73% perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched
0.56 ± 12% -0.5 0.07 ± 23% perf-profile.children.cycles-pp.drain_pages_zone
0.54 ± 4% -0.5 0.06 ± 19% perf-profile.children.cycles-pp.__blk_flush_plug
0.52 ± 7% -0.5 0.03 ± 70% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.54 ± 4% -0.5 0.06 ± 19% perf-profile.children.cycles-pp.blk_mq_flush_plug_list
0.54 ± 3% -0.5 0.06 ± 19% perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list
0.51 ± 10% -0.4 0.08 ± 24% perf-profile.children.cycles-pp.free_pcppages_bulk
0.49 ± 7% -0.4 0.08 ± 17% perf-profile.children.cycles-pp.__pick_next_task
0.45 ± 5% -0.4 0.04 ± 75% perf-profile.children.cycles-pp.__rq_qos_throttle
0.62 ± 4% -0.4 0.22 ± 8% perf-profile.children.cycles-pp.irqtime_account_irq
0.44 ± 5% -0.4 0.04 ± 75% perf-profile.children.cycles-pp.wbt_wait
0.42 ± 6% -0.4 0.04 ± 75% perf-profile.children.cycles-pp.rq_qos_wait
0.42 ± 5% -0.4 0.04 ± 45% perf-profile.children.cycles-pp.bio_alloc_bioset
0.66 ± 3% -0.3 0.31 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.65 ± 3% -0.3 0.31 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.48 ± 3% -0.3 0.15 ± 8% perf-profile.children.cycles-pp.sched_clock_cpu
0.40 ± 10% -0.3 0.06 ± 17% perf-profile.children.cycles-pp.pick_next_task_fair
0.46 ± 6% -0.3 0.14 ± 23% perf-profile.children.cycles-pp.process_one_work
0.38 ± 10% -0.3 0.06 ± 19% perf-profile.children.cycles-pp.sched_balance_newidle
0.44 ± 6% -0.3 0.12 ± 15% perf-profile.children.cycles-pp.tick_nohz_stop_tick
0.56 ± 3% -0.3 0.24 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.42 ± 10% -0.3 0.11 ± 13% perf-profile.children.cycles-pp.sched_balance_rq
0.42 ± 4% -0.3 0.12 ± 7% perf-profile.children.cycles-pp.sched_clock
0.45 ± 6% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick
0.37 ± 9% -0.3 0.09 ± 12% perf-profile.children.cycles-pp.sched_balance_find_src_group
0.50 ± 3% -0.3 0.23 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler
0.36 ± 8% -0.3 0.09 ± 13% perf-profile.children.cycles-pp.update_sd_lb_stats
0.31 ± 8% -0.3 0.04 ± 71% perf-profile.children.cycles-pp.tlb_is_not_lazy
0.33 ± 11% -0.3 0.08 ± 15% perf-profile.children.cycles-pp.update_sg_lb_stats
0.44 ± 4% -0.2 0.20 ± 4% perf-profile.children.cycles-pp.update_process_times
0.30 ± 7% -0.2 0.07 ± 10% perf-profile.children.cycles-pp.error_entry
0.29 ± 4% -0.2 0.09 ± 5% perf-profile.children.cycles-pp.irq_work_run_list
0.39 ± 6% -0.2 0.19 ± 3% perf-profile.children.cycles-pp.native_sched_clock
0.28 ± 5% -0.2 0.09 ± 5% perf-profile.children.cycles-pp.__sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 5% perf-profile.children.cycles-pp._printk
0.28 ± 5% -0.2 0.09 ± 5% perf-profile.children.cycles-pp.asm_sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 5% perf-profile.children.cycles-pp.irq_work_run
0.28 ± 5% -0.2 0.09 ± 5% perf-profile.children.cycles-pp.irq_work_single
0.28 ± 5% -0.2 0.09 ± 5% perf-profile.children.cycles-pp.sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 10% perf-profile.children.cycles-pp.console_flush_all
0.28 ± 5% -0.2 0.09 ± 10% perf-profile.children.cycles-pp.console_unlock
0.28 ± 5% -0.2 0.09 ± 10% perf-profile.children.cycles-pp.vprintk_emit
0.28 ± 4% -0.2 0.09 ± 7% perf-profile.children.cycles-pp.serial8250_console_write
0.28 ± 5% -0.2 0.09 ± 9% perf-profile.children.cycles-pp.wait_for_lsr
0.23 ± 15% -0.2 0.06 ± 65% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.24 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
0.24 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.24 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.drm_fbdev_shmem_helper_fb_dirty
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.drm_atomic_commit
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.drm_atomic_helper_commit
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.commit_tail
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail
0.24 ± 7% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.drm_fb_memcpy
0.23 ± 8% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.memcpy_toio
0.19 ± 11% -0.1 0.06 ± 13% perf-profile.children.cycles-pp.io_serial_in
0.19 ± 7% -0.1 0.06 ± 98% perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
0.20 ± 10% -0.1 0.12 ± 6% perf-profile.children.cycles-pp.sched_tick
0.11 ± 13% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.sched_balance_domains
0.10 ± 4% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.sched_core_idle_cpu
0.14 ± 5% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.irqentry_enter
0.09 ± 14% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.clockevents_program_event
0.09 ± 11% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.task_tick_fair
0.03 ± 70% +0.0 0.08 ± 11% perf-profile.children.cycles-pp.read_tsc
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.ct_kernel_exit
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.07 ± 7% perf-profile.children.cycles-pp.tmigr_cpu_new_timer
0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.get_cpu_device
0.21 ± 9% +0.1 0.29 ± 10% perf-profile.children.cycles-pp.rest_init
0.21 ± 9% +0.1 0.29 ± 10% perf-profile.children.cycles-pp.start_kernel
0.21 ± 9% +0.1 0.29 ± 10% perf-profile.children.cycles-pp.x86_64_start_kernel
0.21 ± 9% +0.1 0.29 ± 10% perf-profile.children.cycles-pp.x86_64_start_reservations
0.00 +0.1 0.09 ± 13% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.1 0.09 ± 18% perf-profile.children.cycles-pp.intel_idle_irq
0.01 ±223% +0.1 0.10 ± 32% perf-profile.children.cycles-pp.load_elf_interp
0.00 +0.1 0.10 ± 12% perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.1 0.10 ± 15% perf-profile.children.cycles-pp.tsc_verify_tsc_adjust
0.12 ± 9% +0.1 0.22 ± 8% perf-profile.children.cycles-pp.ktime_get
0.00 +0.1 0.10 ± 7% perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.1 0.11 ± 14% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.00 +0.1 0.11 ± 15% perf-profile.children.cycles-pp.tick_nohz_stop_idle
0.01 ±223% +0.1 0.14 ± 47% perf-profile.children.cycles-pp._IO_setvbuf
0.00 +0.1 0.13 ± 9% perf-profile.children.cycles-pp.ct_idle_exit
0.01 ±223% +0.1 0.14 ± 83% perf-profile.children.cycles-pp._copy_to_iter
0.02 ±142% +0.1 0.16 ± 8% perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.01 ±223% +0.1 0.15 ± 8% perf-profile.children.cycles-pp.local_clock_noinstr
0.01 ±223% +0.2 0.16 ± 40% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.01 ±223% +0.2 0.16 ± 40% perf-profile.children.cycles-pp.rseq_ip_fixup
0.08 ± 41% +0.2 0.23 ± 24% perf-profile.children.cycles-pp.write
0.01 ±223% +0.2 0.16 ± 39% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.07 ± 63% +0.2 0.23 ± 24% perf-profile.children.cycles-pp.ksys_write
0.06 ± 60% +0.2 0.23 ± 24% perf-profile.children.cycles-pp.vfs_write
0.00 +0.2 0.17 ± 33% perf-profile.children.cycles-pp.copy_p4d_range
0.00 +0.2 0.17 ± 33% perf-profile.children.cycles-pp.copy_page_range
0.00 +0.2 0.18 ± 32% perf-profile.children.cycles-pp.dup_mmap
0.12 ± 15% +0.2 0.30 ± 6% perf-profile.children.cycles-pp.__get_next_timer_interrupt
0.00 +0.2 0.18 ± 38% perf-profile.children.cycles-pp.__do_fault
0.00 +0.2 0.18 ± 14% perf-profile.children.cycles-pp.__pmd_alloc
0.01 ±223% +0.2 0.20 ± 36% perf-profile.children.cycles-pp.__libc_fork
0.02 ±141% +0.2 0.21 ±130% perf-profile.children.cycles-pp.__cmd_record
0.04 ± 72% +0.2 0.27 ± 30% perf-profile.children.cycles-pp.dup_mm
0.07 ± 16% +0.2 0.30 ± 13% perf-profile.children.cycles-pp.elf_load
0.05 ± 82% +0.3 0.31 ± 44% perf-profile.children.cycles-pp.schedule_tail
0.15 ± 16% +0.3 0.41 ± 17% perf-profile.children.cycles-pp.__vfork
0.14 ± 17% +0.3 0.41 ± 17% perf-profile.children.cycles-pp.__x64_sys_vfork
0.03 ±101% +0.3 0.30 ± 13% perf-profile.children.cycles-pp.rep_stos_alternative
0.04 ± 71% +0.3 0.32 ± 19% perf-profile.children.cycles-pp.poll_idle
0.01 ±223% +0.3 0.29 ± 30% perf-profile.children.cycles-pp.___kmalloc_large_node
0.01 ±223% +0.3 0.29 ± 30% perf-profile.children.cycles-pp.__kmalloc_large_node_noprof
0.01 ±223% +0.3 0.29 ± 30% perf-profile.children.cycles-pp.__kmalloc_node_noprof
0.04 ±112% +0.3 0.32 ± 41% perf-profile.children.cycles-pp.__put_user_4
0.12 ± 26% +0.3 0.42 ± 17% perf-profile.children.cycles-pp.alloc_pages_bulk_noprof
0.09 ± 28% +0.3 0.40 ± 37% perf-profile.children.cycles-pp.__p4d_alloc
0.09 ± 28% +0.3 0.40 ± 37% perf-profile.children.cycles-pp.get_zeroed_page_noprof
0.10 ± 21% +0.3 0.43 ± 39% perf-profile.children.cycles-pp.__x64_sys_openat
0.10 ± 19% +0.3 0.43 ± 39% perf-profile.children.cycles-pp.do_sys_openat2
0.01 ±223% +0.3 0.34 ± 26% perf-profile.children.cycles-pp.__kvmalloc_node_noprof
0.01 ±223% +0.3 0.34 ± 26% perf-profile.children.cycles-pp.single_open_size
0.09 ± 22% +0.3 0.43 ± 39% perf-profile.children.cycles-pp.do_filp_open
0.09 ± 22% +0.3 0.43 ± 39% perf-profile.children.cycles-pp.path_openat
0.12 ± 6% +0.3 0.47 ± 8% perf-profile.children.cycles-pp.irq_enter_rcu
0.04 ± 45% +0.3 0.39 ± 34% perf-profile.children.cycles-pp.perf_evlist__poll
0.04 ± 45% +0.3 0.39 ± 34% perf-profile.children.cycles-pp.perf_evlist__poll_thread
0.04 ± 44% +0.4 0.39 ± 34% perf-profile.children.cycles-pp.perf_poll
0.04 ± 45% +0.4 0.40 ± 33% perf-profile.children.cycles-pp.do_poll
0.04 ± 45% +0.4 0.40 ± 33% perf-profile.children.cycles-pp.__poll
0.04 ± 45% +0.4 0.40 ± 33% perf-profile.children.cycles-pp.__x64_sys_poll
0.04 ± 45% +0.4 0.40 ± 33% perf-profile.children.cycles-pp.do_sys_poll
0.02 ±141% +0.4 0.38 ± 32% perf-profile.children.cycles-pp.do_open
0.02 ±141% +0.4 0.38 ± 32% perf-profile.children.cycles-pp.vfs_open
0.07 ± 14% +0.4 0.44 ± 9% perf-profile.children.cycles-pp.tick_irq_enter
0.02 ±141% +0.4 0.39 ± 34% perf-profile.children.cycles-pp.__pollwait
0.01 ±223% +0.4 0.38 ± 32% perf-profile.children.cycles-pp.do_dentry_open
0.10 ± 11% +0.4 0.47 ± 7% perf-profile.children.cycles-pp.tick_nohz_next_event
0.04 ± 72% +0.4 0.43 ± 10% perf-profile.children.cycles-pp.alloc_new_pud
0.22 ± 21% +0.4 0.62 ± 7% perf-profile.children.cycles-pp.do_pte_missing
0.06 ± 51% +0.4 0.47 ± 15% perf-profile.children.cycles-pp.setup_arg_pages
0.01 ±223% +0.4 0.42 ± 40% perf-profile.children.cycles-pp.open64
0.06 ± 50% +0.4 0.47 ± 15% perf-profile.children.cycles-pp.relocate_vma_down
0.05 ± 73% +0.4 0.47 ± 15% perf-profile.children.cycles-pp.move_page_tables
0.14 ± 10% +0.5 0.61 ± 7% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.10 ± 13% +0.5 0.56 ± 23% perf-profile.children.cycles-pp.__do_sys_clone
0.21 ± 26% +0.5 0.74 ± 29% perf-profile.children.cycles-pp.get_free_pages_noprof
0.16 ± 17% +0.5 0.69 ± 13% perf-profile.children.cycles-pp.alloc_thread_stack_node
0.16 ± 18% +0.5 0.70 ± 15% perf-profile.children.cycles-pp.dup_task_struct
0.08 ± 41% +0.5 0.62 ± 19% perf-profile.children.cycles-pp.copy_strings
0.15 ± 20% +0.6 0.73 ± 16% perf-profile.children.cycles-pp.__vmalloc_area_node
0.16 ± 17% +0.6 0.74 ± 15% perf-profile.children.cycles-pp.__vmalloc_node_range_noprof
0.19 ± 19% +0.6 0.81 ± 12% perf-profile.children.cycles-pp.bprm_execve
0.18 ± 20% +0.6 0.81 ± 12% perf-profile.children.cycles-pp.exec_binprm
0.18 ± 20% +0.6 0.81 ± 12% perf-profile.children.cycles-pp.search_binary_handler
0.18 ± 21% +0.6 0.81 ± 12% perf-profile.children.cycles-pp.load_elf_binary
0.10 ± 54% +0.7 0.81 ± 26% perf-profile.children.cycles-pp.copy_string_kernel
0.24 ± 10% +0.7 0.98 ± 15% perf-profile.children.cycles-pp.kernel_clone
0.23 ± 12% +0.7 0.98 ± 15% perf-profile.children.cycles-pp.copy_process
0.16 ± 22% +0.8 0.95 ± 19% perf-profile.children.cycles-pp._Fork
0.09 ± 28% +0.9 0.96 ± 16% perf-profile.children.cycles-pp.__pud_alloc
0.32 ± 8% +0.9 1.27 ± 7% perf-profile.children.cycles-pp.menu_select
0.18 ± 38% +1.3 1.43 ± 20% perf-profile.children.cycles-pp.get_arg_page
0.18 ± 37% +1.3 1.43 ± 20% perf-profile.children.cycles-pp.__get_user_pages
0.18 ± 37% +1.3 1.43 ± 20% perf-profile.children.cycles-pp.get_user_pages_remote
0.39 ± 43% +1.4 1.82 ± 8% perf-profile.children.cycles-pp.wp_page_copy
0.53 ± 15% +1.9 2.48 ± 11% perf-profile.children.cycles-pp.do_execveat_common
0.53 ± 14% +1.9 2.48 ± 11% perf-profile.children.cycles-pp.execve
0.53 ± 15% +1.9 2.48 ± 11% perf-profile.children.cycles-pp.__x64_sys_execve
5.78 +3.1 8.85 ± 6% perf-profile.children.cycles-pp.kthread
2.28 ± 5% +3.2 5.48 ± 11% perf-profile.children.cycles-pp.intel_idle
5.84 +3.3 9.16 ± 7% perf-profile.children.cycles-pp.ret_from_fork
5.84 +3.4 9.20 ± 6% perf-profile.children.cycles-pp.ret_from_fork_asm
1.46 ± 7% +3.5 4.97 ± 8% perf-profile.children.cycles-pp.do_syscall_64
1.46 ± 7% +3.5 4.97 ± 8% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
4.87 +3.7 8.62 ± 6% perf-profile.children.cycles-pp.balance_pgdat
4.87 +3.7 8.62 ± 6% perf-profile.children.cycles-pp.kswapd
68.16 +4.2 72.34 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
6.72 ± 2% +4.5 11.21 ± 9% perf-profile.children.cycles-pp.start_secondary
6.94 ± 2% +4.6 11.50 ± 9% perf-profile.children.cycles-pp.common_startup_64
6.94 ± 2% +4.6 11.50 ± 9% perf-profile.children.cycles-pp.cpu_startup_entry
6.93 ± 2% +4.6 11.50 ± 9% perf-profile.children.cycles-pp.do_idle
68.51 +5.0 73.50 perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
3.78 +5.5 9.32 ± 9% perf-profile.children.cycles-pp.cpuidle_enter_state
3.80 +5.6 9.39 ± 9% perf-profile.children.cycles-pp.cpuidle_enter
4.70 +6.4 11.09 ± 8% perf-profile.children.cycles-pp.cpuidle_idle_call
69.40 +7.4 76.85 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
69.44 +8.1 77.55 perf-profile.children.cycles-pp.__alloc_pages_noprof
68.72 +8.7 77.46 perf-profile.children.cycles-pp.__alloc_pages_slowpath
65.97 +11.3 77.23 perf-profile.children.cycles-pp.try_to_free_pages
65.66 +11.5 77.18 perf-profile.children.cycles-pp.do_try_to_free_pages
70.51 +15.3 85.80 perf-profile.children.cycles-pp.shrink_node
68.95 +16.8 85.72 perf-profile.children.cycles-pp.shrink_many
68.92 +16.8 85.71 perf-profile.children.cycles-pp.shrink_one
68.42 +17.3 85.68 perf-profile.children.cycles-pp.try_to_shrink_lruvec
68.37 +17.3 85.67 perf-profile.children.cycles-pp.evict_folios
64.64 +20.7 85.30 ± 2% perf-profile.children.cycles-pp.shrink_folio_list
43.46 +39.8 83.30 ± 2% perf-profile.children.cycles-pp.try_to_unmap_flush_dirty
43.44 +39.9 83.30 ± 2% perf-profile.children.cycles-pp.arch_tlbbatch_flush
43.35 +40.0 83.33 ± 2% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
43.34 +40.0 83.33 ± 2% perf-profile.children.cycles-pp.smp_call_function_many_cond
5.95 ± 4% -4.9 1.04 ± 7% perf-profile.self.cycles-pp.llist_add_batch
4.70 ± 4% -4.3 0.41 ±142% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
4.45 ± 2% -3.8 0.64 ± 4% perf-profile.self.cycles-pp.llist_reverse_order
4.31 ± 5% -3.8 0.53 ± 14% perf-profile.self.cycles-pp.do_rw_once
3.65 ± 2% -3.0 0.66 ± 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
3.14 ± 9% -2.8 0.36 ± 4% perf-profile.self.cycles-pp.flush_tlb_func
2.56 ± 5% -2.3 0.30 ± 16% perf-profile.self.cycles-pp.do_access
2.35 ± 4% -2.1 0.27 ± 6% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
2.39 -2.1 0.34 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
1.83 ± 3% -1.7 0.09 ± 10% perf-profile.self.cycles-pp.native_flush_tlb_local
1.92 ± 3% -1.7 0.24 ± 4% perf-profile.self.cycles-pp.page_counter_try_charge
1.69 ± 4% -1.4 0.31 ± 11% perf-profile.self.cycles-pp._raw_spin_lock
1.12 ± 14% -1.0 0.10 ± 6% perf-profile.self.cycles-pp.set_tlb_ubc_flush_pending
0.99 ± 6% -0.9 0.10 ± 10% perf-profile.self.cycles-pp.try_to_unmap_one
0.98 ± 7% -0.9 0.12 ± 11% perf-profile.self.cycles-pp.try_charge_memcg
0.94 ± 5% -0.8 0.10 ± 4% perf-profile.self.cycles-pp.mem_cgroup_id_get_online
0.75 ± 13% -0.7 0.07 ± 34% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.75 ± 5% -0.7 0.09 ± 27% perf-profile.self.cycles-pp.shrink_folio_list
0.74 ± 4% -0.7 0.08 ± 17% perf-profile.self.cycles-pp.sync_regs
0.76 ± 9% -0.7 0.10 ± 9% perf-profile.self.cycles-pp._find_next_bit
0.63 ± 3% -0.6 0.07 ± 16% perf-profile.self.cycles-pp.swap_writepage
0.57 ± 9% -0.5 0.04 ± 72% perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
0.47 ± 7% -0.4 0.02 ± 99% perf-profile.self.cycles-pp.rmqueue_bulk
0.56 ± 4% -0.4 0.13 ± 5% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.48 ± 6% -0.4 0.05 ± 7% perf-profile.self.cycles-pp.swap_cgroup_record
0.45 ± 6% -0.4 0.04 ±115% perf-profile.self.cycles-pp.lru_gen_add_folio
0.46 ± 2% -0.4 0.05 ± 90% perf-profile.self.cycles-pp.lru_gen_del_folio
0.40 ± 6% -0.4 0.04 ± 71% perf-profile.self.cycles-pp.get_page_from_freelist
0.38 ± 3% -0.3 0.04 ± 45% perf-profile.self.cycles-pp.do_anonymous_page
0.28 ± 8% -0.2 0.07 ± 12% perf-profile.self.cycles-pp.error_entry
0.38 ± 6% -0.2 0.19 ± 5% perf-profile.self.cycles-pp.native_sched_clock
0.25 ± 10% -0.2 0.06 ± 13% perf-profile.self.cycles-pp.update_sg_lb_stats
0.23 ± 7% -0.1 0.08 ± 10% perf-profile.self.cycles-pp.memcpy_toio
0.19 ± 11% -0.1 0.06 ± 13% perf-profile.self.cycles-pp.io_serial_in
0.16 ± 8% -0.1 0.06 ± 17% perf-profile.self.cycles-pp.asm_sysvec_call_function
0.11 ± 11% -0.1 0.04 ± 44% perf-profile.self.cycles-pp.irqentry_enter
0.17 ± 8% -0.1 0.10 ± 8% perf-profile.self.cycles-pp.irqtime_account_irq
0.09 ± 7% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.sched_core_idle_cpu
0.03 ± 70% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.read_tsc
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.tick_nohz_stop_tick
0.00 +0.1 0.06 ± 14% perf-profile.self.cycles-pp.cpuidle_enter
0.07 ± 12% +0.1 0.14 ± 11% perf-profile.self.cycles-pp.ktime_get
0.00 +0.1 0.06 ± 14% perf-profile.self.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.07 ± 11% perf-profile.self.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.1 0.07 ± 13% perf-profile.self.cycles-pp.ct_kernel_enter
0.00 +0.1 0.07 ± 10% perf-profile.self.cycles-pp.tick_nohz_next_event
0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.get_cpu_device
0.00 +0.1 0.09 ± 4% perf-profile.self.cycles-pp.tick_irq_enter
0.01 ±223% +0.1 0.10 ± 9% perf-profile.self.cycles-pp.__get_next_timer_interrupt
0.00 +0.1 0.10 ± 11% perf-profile.self.cycles-pp.cpuidle_idle_call
0.00 +0.1 0.10 ± 9% perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.02 ± 99% +0.3 0.30 ± 19% perf-profile.self.cycles-pp.poll_idle
0.13 ± 8% +0.4 0.50 ± 8% perf-profile.self.cycles-pp.menu_select
0.11 ± 13% +0.6 0.69 ± 9% perf-profile.self.cycles-pp.cpuidle_enter_state
2.28 ± 5% +3.2 5.48 ± 11% perf-profile.self.cycles-pp.intel_idle
24.40 +56.1 80.53 ± 2% perf-profile.self.cycles-pp.smp_call_function_many_cond
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-11-28 14:57 [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression kernel test robot
@ 2024-11-28 16:21 ` Peter Zijlstra
2024-11-29 1:44 ` Oliver Sang
2024-11-28 19:46 ` Mathieu Desnoyers
` (2 subsequent siblings)
3 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2024-11-28 16:21 UTC (permalink / raw)
To: kernel test robot
Cc: Rik van Riel, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Mel Gorman
On Thu, Nov 28, 2024 at 10:57:35PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
>
>
> commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
>
> [test failed on linux-next/master 6f3d2b5299b0a8bcb8a9405a8d3fceb24f79c4f0]
>
> testcase: will-it-scale
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 104 threads 2 sockets (Skylake) with 192G memory
> parameters:
>
> nr_task: 100%
> mode: thread
> test: tlb_flush2
> cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput 40.7% regression |
> | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory |
> | test parameters | cpufreq_governor=performance |
> | | nr_ssd=1 |
> | | nr_task=32 |
> | | priority=1 |
> | | runtime=300 |
> | | test=swap-w-seq-mt |
> | | thp_defrag=always |
> | | thp_enabled=never |
> +------------------+------------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20241128/202411282207.6bd28eae-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
>
> commit:
> 7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
> 209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
I got a bit lost in the actual job descriptions. Are you running this
test with or without affinity? AFAICT will-it-scale itself defaults to
being affine (-n means No Affinity).
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-11-28 14:57 [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression kernel test robot
2024-11-28 16:21 ` Peter Zijlstra
@ 2024-11-28 19:46 ` Mathieu Desnoyers
2024-11-29 2:52 ` Rik van Riel
2024-12-02 16:50 ` Dave Hansen
2024-12-03 0:43 ` [PATCH] x86,mm: only trim the mm_cpumask once a second Rik van Riel
2024-12-03 1:22 ` [PATCH -tip] x86,mm: only " Rik van Riel
3 siblings, 2 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2024-11-28 19:46 UTC (permalink / raw)
To: Peter Zijlstra, Rik van Riel
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Mel Gorman
On 28-Nov-2024 10:57:35 PM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
>
>
> commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
AFAIU, this commit changes the way TLB flushes are inhibited when
context-switching away from an mm. It means one additional TLB
flush is sent to a given CPU even after it has context-switched
away from the mm, and only then is the mm_cpumask bit cleared for that CPU.
This could result in additional TLB flush IPI overhead in specific
scenarios where the IPIs are typically triggered after a thread has
context-switched out.
May I recommend looking into a scheme similar to rseq's mm_cid for
this? We are already adding per-mm per-cpu data there:
mm_struct:

	/**
	 * @pcpu_cid: Per-cpu current cid.
	 *
	 * Keep track of the currently allocated mm_cid for each cpu.
	 * The per-cpu mm_cid values are serialized by their respective
	 * runqueue locks.
	 */
	struct mm_cid __percpu *pcpu_cid;

struct mm_cid {
	u64 time;
	int cid;
	int recent_cid;
};
I suspect you could use a similar per-cpu data structure, per mm,
to keep track of the pending TLB flush mask, and update it with plain
loads/stores to per-CPU data rather than having to cacheline-bounce all
over the place due to frequent mm_cpumask atomic updates.
Then you get all the benefits without introducing a window where useless
TLB flush IPIs get triggered.
Of course it's slightly less compact in terms of memory footprint than a
cpumask, but you gain a lot by removing cache line bouncing on this
frequent context-switch code path.
Thoughts?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-11-28 16:21 ` Peter Zijlstra
@ 2024-11-29 1:44 ` Oliver Sang
0 siblings, 0 replies; 24+ messages in thread
From: Oliver Sang @ 2024-11-29 1:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rik van Riel, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Mel Gorman, oliver.sang
hi, Peter Zijlstra,
On Thu, Nov 28, 2024 at 05:21:28PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 10:57:35PM +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
> >
> >
> > commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
> >
[...]
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20241128/202411282207.6bd28eae-lkp@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> > gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
> >
> > commit:
> > 7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
> > 209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
>
> I got a bit lost in the actual jobs descriptions. Are you running this
> test with or without affinity? AFAICT will-it-scale itself defaults to
> being affine (-n is No Affinity).
With affinity; we don't change this default setting. The command is

	python3 ./runtest.py tlb_flush2 295 thread 0 0 104

The first '0' keeps affinity at 'yes'; from runtest.py:

	print('Usage: runtest.py <testcase> <duration> <mode> <no_affinity> <smt> <threads...>', file=sys.stderr)
	sys.exit(1)
	...
	affinity = " "
	if int(sys.argv[4]) == 1:
	    affinity = " -n "
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-11-28 19:46 ` Mathieu Desnoyers
@ 2024-11-29 2:52 ` Rik van Riel
2024-12-02 16:30 ` Mathieu Desnoyers
2024-12-02 16:50 ` Dave Hansen
1 sibling, 1 reply; 24+ messages in thread
From: Rik van Riel @ 2024-11-29 2:52 UTC (permalink / raw)
To: Mathieu Desnoyers, Peter Zijlstra
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Mel Gorman
On Thu, 2024-11-28 at 14:46 -0500, Mathieu Desnoyers wrote:
>
> I suspect you could use a similar per-cpu data structure per-mm
> to keep track of the pending TLB flush mask, and update it simply
> with
> load/store to per-CPU data rather than have to cacheline-bounce all
> over
> the place due to frequent mm_cpumask atomic updates.
>
> Then you get all the benefits without introducing a window where
> useless
> TLB flush IPIs get triggered.
>
> Of course it's slightly less compact in terms of memory footprint
> than a
> cpumask, but you gain a lot by removing cache line bouncing on this
> frequent context switch code path.
>
> Thoughts ?
The first thought that comes to mind is that we already
have a per-CPU variable indicating which is the currently
loaded mm on that CPU.
We could probably just skip sending IPIs to CPUs that do
not have the mm_struct currently loaded.
This can race against switch_mm_irqs_off() on a CPU
switching to that mm simultaneously with the TLB flush,
which should be fine because that CPU cannot load TLB
entries from previously cleared page tables.
However, it does mean we cannot safely clear bits
out of the mm_cpumask, because a race between clearing
the bit on one CPU, and setting it on another would not
be something we could easily catch at all, unless we
can figure out some clever memory ordering thing there.
--
All Rights Reversed.
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-11-29 2:52 ` Rik van Riel
@ 2024-12-02 16:30 ` Mathieu Desnoyers
2024-12-02 18:10 ` Rik van Riel
0 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2024-12-02 16:30 UTC (permalink / raw)
To: Rik van Riel, Peter Zijlstra
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Mel Gorman
On 2024-11-28 21:52, Rik van Riel wrote:
> On Thu, 2024-11-28 at 14:46 -0500, Mathieu Desnoyers wrote:
>>
>> I suspect you could use a similar per-cpu data structure per-mm
>> to keep track of the pending TLB flush mask, and update it simply
>> with
>> load/store to per-CPU data rather than have to cacheline-bounce all
>> over
>> the place due to frequent mm_cpumask atomic updates.
>>
>> Then you get all the benefits without introducing a window where
>> useless
>> TLB flush IPIs get triggered.
>>
>> Of course it's slightly less compact in terms of memory footprint
>> than a
>> cpumask, but you gain a lot by removing cache line bouncing on this
>> frequent context switch code path.
>>
>> Thoughts ?
>
> The first thought that comes to mind is that we already
> have a per-CPU variable indicating which is the currently
> loaded mm on that CPU.
Only on x86 though.
>
> We could probably just skip sending IPIs to CPUs that do
> not have the mm_struct currently loaded.
>
> This can race against switch_mm_irqs_off() on a CPU
> switching to that mm simultaneously with the TLB flush,
> which should be fine because that CPU cannot load TLB
> entries from previously cleared page tables.
>
> However, it does mean we cannot safely clear bits
> out of the mm_cpumask, because a race between clearing
> the bit on one CPU, and setting it on another would not
> be something we could easily catch at all, unless we
> can figure out some clever memory ordering thing there.
>
Or we just build a per-cpu mm_cpumask from per-CPU state
every time we want to use the mm_cpumask. But AFAIU this
is going to be a tradeoff between:
- Overhead of context switch at scale
(e.g. will-it-scale:)
for a in $(seq 1 2); do (./context_switch1_threads -t 192 -s 20 &); done
For reference, my POC reaches 50% performance improvement with this.
vs
- Overhead of TLB flush
(e.g. will-it-scale:)
./tlb_flush2_threads -t 192 -s 20
For reference, my POC has about 33% regression on that test case due
to extra work when using mm_cpumask.
So I guess what we end up doing really depends on which scenario we
consider most frequent.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-11-28 19:46 ` Mathieu Desnoyers
2024-11-29 2:52 ` Rik van Riel
@ 2024-12-02 16:50 ` Dave Hansen
1 sibling, 0 replies; 24+ messages in thread
From: Dave Hansen @ 2024-12-02 16:50 UTC (permalink / raw)
To: Mathieu Desnoyers, Peter Zijlstra, Rik van Riel
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Linus Torvalds, Mel Gorman
On 11/28/24 11:46, Mathieu Desnoyers wrote:
> AFAIU, this commit changes the way TLB flushes are inhibited when
> context switching away from a mm. This means that one additional TLB
> flush is performed to a given CPU even after it has context switched
> away from the mm, and only then is the mm_cpumask cleared for that CPU.
>
> This could result in additional TLB flush IPI overhead in specific
> scenarios where the IPIs are typically triggered after a thread has
> context-switched out.
I can see how that might generally be a problem, but for this particular
workload:
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/tlb_flush2.c
I'm not sure how it would apply.
will-it-scale should create one big process that runs for quite a long
while and is bound to one CPU. The threads can get scheduled out as
other things run, but they should pop right back on to the CPU.
There shouldn't be a lot of context switching in this workload.
It would be great if someone could reproduce this and double-check that
the theory about what's making it regress is really holding in practice.
* Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression
2024-12-02 16:30 ` Mathieu Desnoyers
@ 2024-12-02 18:10 ` Rik van Riel
0 siblings, 0 replies; 24+ messages in thread
From: Rik van Riel @ 2024-12-02 18:10 UTC (permalink / raw)
To: Mathieu Desnoyers, Peter Zijlstra
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Mel Gorman
On Mon, 2024-12-02 at 11:30 -0500, Mathieu Desnoyers wrote:
>
> Or we just build a per-cpu mm_cpumask from per-CPU state
> every time we want to use the mm_cpumask. But AFAIU this
> is going to be a tradeoff between:
>
> - Overhead of context switch at scale
>
> vs
>
> - Overhead of TLB flush
>
>
> So I guess what we end up doing really depends which scenario we
> consider
> most frequent.
>
I think that is going to be more workload dependent than
anything else.
If you're doing a kernel compile, or running a bunch of
shell scripts and simple Unix commands, you are dealing
mostly with single threaded programs, where not sending
IPIs is the best thing to do.
If you're running a long-lived, heavily multithreaded
program, you will benefit from reducing the context
switch overhead more than anything else.
Both seem like equally valid use cases.
I'm playing around with a patch now that builds on
my previous patches, but only trims the mm_cpumask
once a second.
Hopefully that can give us a reasonable medium between
the two.
--
All Rights Reversed.
* [PATCH] x86,mm: only trim the mm_cpumask once a second
2024-11-28 14:57 [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression kernel test robot
2024-11-28 16:21 ` Peter Zijlstra
2024-11-28 19:46 ` Mathieu Desnoyers
@ 2024-12-03 0:43 ` Rik van Riel
2024-12-04 13:15 ` Oliver Sang
2024-12-03 1:22 ` [PATCH -tip] x86,mm: only " Rik van Riel
3 siblings, 1 reply; 24+ messages in thread
From: Rik van Riel @ 2024-12-03 0:43 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman, Oliver Sang,
Mathieu Desnoyers
On Thu, 28 Nov 2024 22:57:35 +0800
kernel test robot <oliver.sang@intel.com> wrote:
> Hello,
>
> kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
>
>
> commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
The patch below should fix the will-it-scale performance regression,
while still allowing us to keep the lazy mm_cpumask updates that help
workloads in other ways.
I do not have the same hardware as the Intel guys have access to, and
could only test this on one two-socket system, but hopefully this
provides a simple (enough) compromise that allows us to keep both
the lazier context switch code and a limited mm_cpumask to keep
TLB flushing work bounded.
---8<---
From b639c1f16ddf4bcfc44dbaa2b8077220f88b1876 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Mon, 2 Dec 2024 09:57:31 -0800
Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.
Synchronization is handled by switch_mm_irqs_off blocking interrupts.
Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program causes a regression in the will-it-scale
tlbflush2 test. This test is contrived, but a large regression here
might cause a small regression in some real world workload.
Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.
The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.
On a two-socket system with 20 CPU cores on each socket (80 CPUs total),
this patch, on top of the other context switch patches, shows a 3.6%
speedup in the total runtime of will-it-scale tlbflush2 -t 40 -s 100000.
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
arch/x86/include/asm/mmu.h | 2 ++
arch/x86/include/asm/mmu_context.h | 1 +
arch/x86/mm/tlb.c | 25 ++++++++++++++++++++++---
3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..2c7e3855b88b 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
*/
atomic64_t tlb_gen;
+ unsigned long last_trimmed_cpumask;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct rw_semaphore ldt_usr_sem;
struct ldt_struct *ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8dac45a2c7fc..428fd190477a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -145,6 +145,7 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
+ mm->context.last_trimmed_cpumask = jiffies;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index fcea29e07eed..0ce5f2ed7825 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -766,6 +766,7 @@ static void flush_tlb_func(void *info)
*/
if (f->mm && f->mm != loaded_mm) {
cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
+ f->mm->context.last_trimmed_cpumask = jiffies;
return;
}
}
@@ -897,9 +898,27 @@ static void flush_tlb_func(void *info)
nr_invalidate);
}
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
{
- return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+ struct flush_tlb_info *info = data;
+
+ /* Lazy TLB will get flushed at the next context switch. */
+ if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+ return false;
+
+ /* No mm means kernel memory flush. */
+ if (!info->mm)
+ return true;
+
+ /* The target mm is loaded, and the CPU is not lazy. */
+ if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ return true;
+
+ /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+ if (jiffies > info->mm->context.last_trimmed_cpumask + HZ)
+ return true;
+
+ return false;
}
DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -933,7 +952,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
if (info->freed_tables)
on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
else
- on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+ on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
}
--
2.43.5
* [PATCH -tip] x86,mm: only trim the mm_cpumask once a second
2024-11-28 14:57 [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression kernel test robot
` (2 preceding siblings ...)
2024-12-03 0:43 ` [PATCH] x86,mm: only trim the mm_cpumask once a second Rik van Riel
@ 2024-12-03 1:22 ` Rik van Riel
2024-12-03 14:57 ` Mathieu Desnoyers
3 siblings, 1 reply; 24+ messages in thread
From: Rik van Riel @ 2024-12-03 1:22 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman, Mathieu Desnoyers
On Thu, 28 Nov 2024 22:57:35 +0800
kernel test robot <oliver.sang@intel.com> wrote:
> Hello,
>
> kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
>
>
> commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
[UGH - of course I mailed out the version I tested with, rather than
the version that I merged into -tip. Here's the right one.]
The patch below should fix the will-it-scale performance regression,
while still allowing us to keep the lazy mm_cpumask updates that help
workloads in other ways.
I do not have the same hardware as the Intel guys have access to, and
could only test this on one two-socket system, but hopefully this
provides a simple (enough) compromise that allows us to keep both
the lazier context switch code and a limited mm_cpumask to keep
TLB flushing work bounded.
---8<---
From dec4a588077563b86dbfe547737018b881e1f6c2 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Mon, 2 Dec 2024 09:57:31 -0800
Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.
Synchronization is handled by switch_mm_irqs_off blocking interrupts.
Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program causes a regression in the will-it-scale
tlbflush2 test. This test is contrived, but a large regression here
might cause a small regression in some real world workload.
Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.
The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
arch/x86/include/asm/mmu.h | 2 ++
arch/x86/include/asm/mmu_context.h | 1 +
arch/x86/mm/tlb.c | 25 ++++++++++++++++++++++---
3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..2c7e3855b88b 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
*/
atomic64_t tlb_gen;
+ unsigned long last_trimmed_cpumask;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct rw_semaphore ldt_usr_sem;
struct ldt_struct *ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2886cb668d7f..086af641d19a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
+ mm->context.last_trimmed_cpumask = jiffies;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1aac4fa90d3d..19ae8ca34cb8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -761,6 +761,7 @@ static void flush_tlb_func(void *info)
if (f->mm && f->mm != loaded_mm) {
cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
+ f->mm->context.last_trimmed_cpumask = jiffies;
return;
}
}
@@ -892,9 +893,27 @@ static void flush_tlb_func(void *info)
nr_invalidate);
}
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
{
- return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+ struct flush_tlb_info *info = data;
+
+ /* Lazy TLB will get flushed at the next context switch. */
+ if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+ return false;
+
+ /* No mm means kernel memory flush. */
+ if (!info->mm)
+ return true;
+
+ /* The target mm is loaded, and the CPU is not lazy. */
+ if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ return true;
+
+ /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+ if (jiffies > info->mm->context.last_trimmed_cpumask + HZ)
+ return true;
+
+ return false;
}
DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -928,7 +947,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
if (info->freed_tables)
on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
else
- on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+ on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
}
--
2.47.0
* Re: [PATCH -tip] x86,mm: only trim the mm_cpumask once a second
2024-12-03 1:22 ` [PATCH -tip] x86,mm: only " Rik van Riel
@ 2024-12-03 14:57 ` Mathieu Desnoyers
2024-12-03 19:48 ` [PATCH v2] " Rik van Riel
0 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2024-12-03 14:57 UTC (permalink / raw)
To: Rik van Riel, kernel test robot
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman
On 2024-12-02 20:22, Rik van Riel wrote:
> On Thu, 28 Nov 2024 22:57:35 +0800
> kernel test robot <oliver.sang@intel.com> wrote:
>
>> Hello,
>>
>> kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
>>
>>
>> commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
>> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
>
> [UGH - of course I mailed out the version I tested with, rather than
> the version that I merged into -tip. Here's the right one.]
>
> The patch below should fix the will-it-scale performance regression,
> while still allowing us to keep the lazy mm_cpumask updates that help
> workloads in other ways.
>
> I do not have the same hardware as the Intel guys have access to, and
> could only test this on one two socket system, but hopefully this
> provides a simple (enough) compromise that allows us to keep both
> the lazier context switch code, and a limited mm_cpumask to keep
> TLB flushing work bounded.
>
> ---8<---
>
> From dec4a588077563b86dbfe547737018b881e1f6c2 Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@fb.com>
> Date: Mon, 2 Dec 2024 09:57:31 -0800
> Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
>
> Setting and clearing CPU bits in the mm_cpumask is only ever done
> by the CPU itself, from the context switch code or the TLB flush
> code.
>
> Synchronization is handled by switch_mm_irqs_off blocking interrupts.
>
> Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
> longer running the program causes a regression in the will-it-scale
> tlbflush2 test. This test is contrived, but a large regression here
> might cause a small regression in some real world workload.
>
> Instead of always sending IPIs to CPUs that are in the mm_cpumask,
> but no longer running the program, send these IPIs only once a second.
>
> The rest of the time we can skip over CPUs where the loaded_mm is
> different from the target mm.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
> ---
> arch/x86/include/asm/mmu.h | 2 ++
> arch/x86/include/asm/mmu_context.h | 1 +
> arch/x86/mm/tlb.c | 25 ++++++++++++++++++++++---
> 3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index ce4677b8b735..2c7e3855b88b 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -37,6 +37,8 @@ typedef struct {
> */
> atomic64_t tlb_gen;
>
> + unsigned long last_trimmed_cpumask;
I'd recommend to rename "last_trimmed_cpumask" to "next_trim_cpumask",
and always update it to "jiffies + HZ". Then we can remove the addition
from the comparison in the should_flush_tlb() fast-path:
if (time_after(jiffies, READ_ONCE(info->mm->context.next_trim_cpumask)))
return true;
> +
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> struct rw_semaphore ldt_usr_sem;
> struct ldt_struct *ldt;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 2886cb668d7f..086af641d19a 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
>
> mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
> atomic64_set(&mm->context.tlb_gen, 0);
> + mm->context.last_trimmed_cpumask = jiffies;
mm->context.next_trim_cpumask = jiffies + HZ;
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 1aac4fa90d3d..19ae8ca34cb8 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -761,6 +761,7 @@ static void flush_tlb_func(void *info)
> if (f->mm && f->mm != loaded_mm) {
> cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
> trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
> + f->mm->context.last_trimmed_cpumask = jiffies;
mm->context.next_trim_cpumask is stored/loaded without any locks.
READ_ONCE()/WRITE_ONCE() would be relevant here.
This is likely introducing a burst of mostly useless
f->mm->context.next_trim_cpumask updates. When reaching the threshold
where trimming is permitted, IPIs are sent to a set of CPUs which are
allowed to trim. Each CPU performing the tlb flush for trimming
will end up updating the f->mm->context.next_trim_cpumask
concurrently, when in fact we only care about the first update.
We should change this to
unsigned long next_jiffies = jiffies + HZ;
if (time_after(next_jiffies, READ_ONCE(f->mm->context.next_trim_cpumask)))
	WRITE_ONCE(f->mm->context.next_trim_cpumask, next_jiffies);
> return;
> }
> }
> @@ -892,9 +893,27 @@ static void flush_tlb_func(void *info)
> nr_invalidate);
> }
>
> -static bool tlb_is_not_lazy(int cpu, void *data)
> +static bool should_flush_tlb(int cpu, void *data)
> {
> - return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
> + struct flush_tlb_info *info = data;
> +
> + /* Lazy TLB will get flushed at the next context switch. */
> + if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
> + return false;
> +
> + /* No mm means kernel memory flush. */
> + if (!info->mm)
> + return true;
> +
> + /* The target mm is loaded, and the CPU is not lazy. */
> + if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
> + return true;
> +
> + /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
> + if (jiffies > info->mm->context.last_trimmed_cpumask + HZ)
When jiffies overflow on 32-bit architectures, it will go back to a
near-zero value and chances are that
info->mm->context.last_trimmed_cpumask + HZ
is a near-overflow large value. Therefore, in that state, the comparison
will stay false for quite a while, which is unexpected.
This will prevent trimming the mm_cpumask for as long as that state
persists.
I'd recommend using the following overflow-safe comparison instead:
if (time_after(jiffies, info->mm->context.next_trim_cpumask))
	return true;
> + return true;
> +
> + return false;
> }
>
> DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
> @@ -928,7 +947,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
> if (info->freed_tables)
> on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
> else
> - on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
> + on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
> (void *)info, 1, cpumask);
I'm concerned about the following race in smp_call_function_many_cond():
1) cond_func() is called for all remote cpus,
2) IPIs are sent.
3) cond_func() is called for the local cpu.
(3) is loading the next_trim_cpumask value after other cpus had a chance to
trim, and thus bump the next_trim_cpumask. This appears to be unwanted.
I would be tempted to move the evaluation of cond_func() before sending the
IPIs to other cpus in that function.
Thanks,
Mathieu
> }
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* [PATCH v2] x86,mm: only trim the mm_cpumask once a second
2024-12-03 14:57 ` Mathieu Desnoyers
@ 2024-12-03 19:48 ` Rik van Riel
2024-12-03 20:05 ` Dave Hansen
2024-12-03 23:27 ` Mathieu Desnoyers
0 siblings, 2 replies; 24+ messages in thread
From: Rik van Riel @ 2024-12-03 19:48 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Mel Gorman
On Tue, 3 Dec 2024 09:57:55 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> I'd recommend to rename "last_trimmed_cpumask" to "next_trim_cpumask",
> and always update it to "jiffies + HZ". Then we can remove the addition
> from the comparison in the should_flush_tlb() fast-path:
Thanks Mathieu, I have applied your suggested improvements,
except for the one you posted as a separate patch earlier.
---8<---
From c7d04233f15ba217ce6ebd0dcf12fab91c437e96 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Mon, 2 Dec 2024 09:57:31 -0800
Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.
Synchronization is handled by switch_mm_irqs_off blocking interrupts.
Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program causes a regression in the will-it-scale
tlbflush2 test. This test is contrived, but a large regression here
might cause a small regression in some real world workload.
Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.
The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
arch/x86/include/asm/mmu.h | 2 ++
arch/x86/include/asm/mmu_context.h | 1 +
arch/x86/mm/tlb.c | 27 ++++++++++++++++++++++++---
3 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..3b496cdcb74b 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
*/
atomic64_t tlb_gen;
+ unsigned long next_trim_cpumask;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct rw_semaphore ldt_usr_sem;
struct ldt_struct *ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2886cb668d7f..795fdd53bd0a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
+ mm->context.next_trim_cpumask = jiffies + HZ;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1aac4fa90d3d..e90edbbf0188 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -759,8 +759,11 @@ static void flush_tlb_func(void *info)
/* Can only happen on remote CPUs */
if (f->mm && f->mm != loaded_mm) {
+ unsigned long next_jiffies = jiffies + HZ;
cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
+ if (time_after(next_jiffies, READ_ONCE(f->mm->context.next_trim_cpumask)))
+ WRITE_ONCE(f->mm->context.next_trim_cpumask, next_jiffies);
return;
}
}
@@ -892,9 +895,27 @@ static void flush_tlb_func(void *info)
nr_invalidate);
}
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
{
- return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+ struct flush_tlb_info *info = data;
+
+ /* Lazy TLB will get flushed at the next context switch. */
+ if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+ return false;
+
+ /* No mm means kernel memory flush. */
+ if (!info->mm)
+ return true;
+
+ /* The target mm is loaded, and the CPU is not lazy. */
+ if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ return true;
+
+ /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+ if (time_after(jiffies, info->mm->context.next_trim_cpumask))
+ return true;
+
+ return false;
}
DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -928,7 +949,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
if (info->freed_tables)
on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
else
- on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+ on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
}
--
2.47.0
* Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second
2024-12-03 19:48 ` [PATCH v2] " Rik van Riel
@ 2024-12-03 20:05 ` Dave Hansen
2024-12-03 20:07 ` Rik van Riel
2024-12-03 23:27 ` Mathieu Desnoyers
1 sibling, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2024-12-03 20:05 UTC (permalink / raw)
To: Rik van Riel, Mathieu Desnoyers
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Linus Torvalds, Peter Zijlstra, Mel Gorman
On 12/3/24 11:48, Rik van Riel wrote:
> Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
> longer running the program causes a regression in the will-it-scale
> tlbflush2 test. This test is contrived, but a large regression here
> might cause a small regression in some real world workload.
The patch seems OK in theory, but this explanation doesn't sit right
with me.
Most of the will-it-scale tests including tlbflush2 have long-lived
CPU-bound threads. They shouldn't schedule out much at all during the
benchmark. I don't see how they could drive a significant increase in
IPIs to cause a 10%+ regression.
I'd much prefer that we understand the regression in detail before
throwing more code at fixing it.
* Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second
2024-12-03 20:05 ` Dave Hansen
@ 2024-12-03 20:07 ` Rik van Riel
2024-12-04 0:46 ` Dave Hansen
0 siblings, 1 reply; 24+ messages in thread
From: Rik van Riel @ 2024-12-03 20:07 UTC (permalink / raw)
To: Dave Hansen, Mathieu Desnoyers
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Linus Torvalds, Peter Zijlstra, Mel Gorman
On Tue, 2024-12-03 at 12:05 -0800, Dave Hansen wrote:
> On 12/3/24 11:48, Rik van Riel wrote:
> > Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
> > longer running the program causes a regression in the will-it-scale
> > tlbflush2 test. This test is contrived, but a large regression here
> > might cause a small regression in some real world workload.
>
> The patch seems OK in theory, but this explanation doesn't sit right
> with me.
>
> Most of the will-it-scale tests including tlbflush2 have long-lived
> CPU-bound threads. They shouldn't schedule out much at all during the
> benchmark. I don't see how they could drive a significant increase in
> IPIs to cause a 10%+ regression.
>
> I'd much prefer that we understand the regression in detail before
> throwing more code at fixing it.
>
The tlb_flush2 threaded test does not only madvise in a
loop, but also mmap and munmap from inside every thread.
This should create massive contention on the mmap_lock,
resulting in threads going to sleep while waiting in
mmap and munmap.
https://github.com/antonblanchard/will-it-scale/blob/master/tests/tlb_flush2.c
--
All Rights Reversed.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second
2024-12-03 19:48 ` [PATCH v2] " Rik van Riel
2024-12-03 20:05 ` Dave Hansen
@ 2024-12-03 23:27 ` Mathieu Desnoyers
1 sibling, 0 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2024-12-03 23:27 UTC (permalink / raw)
To: Rik van Riel
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Mel Gorman
On 2024-12-03 14:48, Rik van Riel wrote:
> On Tue, 3 Dec 2024 09:57:55 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
>
>> I'd recommend to rename "last_trimmed_cpumask" to "next_trim_cpumask",
>> and always update it to "jiffies + HZ". Then we can remove the addition
>> from the comparison in the should_flush_tlb() fast-path:
>
> Thanks Mathieu, I have applied your suggested improvements,
> except for the one you posted as a separate patch earlier.
>
> ---8<---
>
> From c7d04233f15ba217ce6ebd0dcf12fab91c437e96 Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@fb.com>
> Date: Mon, 2 Dec 2024 09:57:31 -0800
> Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
>
> Setting and clearing CPU bits in the mm_cpumask is only ever done
> by the CPU itself, from the context switch code or the TLB flush
> code.
>
> Synchronization is handled by switch_mm_irqs_off blocking interrupts.
>
> Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
> longer running the program causes a regression in the will-it-scale
> tlbflush2 test. This test is contrived, but a large regression here
> might cause a small regression in some real world workload.
We should add information detailing why tlbflush2 ends up
contending on the mmap_sem, and thus schedules often.
>
> Instead of always sending IPIs to CPUs that are in the mm_cpumask,
> but no longer running the program, send these IPIs only once a second.
>
> The rest of the time we can skip over CPUs where the loaded_mm is
> different from the target mm.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
Much better!
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
> ---
> arch/x86/include/asm/mmu.h | 2 ++
> arch/x86/include/asm/mmu_context.h | 1 +
> arch/x86/mm/tlb.c | 27 ++++++++++++++++++++++++---
> 3 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index ce4677b8b735..3b496cdcb74b 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -37,6 +37,8 @@ typedef struct {
> */
> atomic64_t tlb_gen;
>
> + unsigned long next_trim_cpumask;
> +
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> struct rw_semaphore ldt_usr_sem;
> struct ldt_struct *ldt;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 2886cb668d7f..795fdd53bd0a 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
>
> mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
> atomic64_set(&mm->context.tlb_gen, 0);
> + mm->context.next_trim_cpumask = jiffies + HZ;
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 1aac4fa90d3d..e90edbbf0188 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -759,8 +759,11 @@ static void flush_tlb_func(void *info)
>
> /* Can only happen on remote CPUs */
> if (f->mm && f->mm != loaded_mm) {
> + unsigned long next_jiffies = jiffies + HZ;
> cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
> trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
> + if (time_after(next_jiffies, READ_ONCE(f->mm->context.next_trim_cpumask)))
> + WRITE_ONCE(f->mm->context.next_trim_cpumask, next_jiffies);
> return;
> }
> }
> @@ -892,9 +895,27 @@ static void flush_tlb_func(void *info)
> nr_invalidate);
> }
>
> -static bool tlb_is_not_lazy(int cpu, void *data)
> +static bool should_flush_tlb(int cpu, void *data)
> {
> - return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
> + struct flush_tlb_info *info = data;
> +
> + /* Lazy TLB will get flushed at the next context switch. */
> + if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
> + return false;
> +
> + /* No mm means kernel memory flush. */
> + if (!info->mm)
> + return true;
> +
> + /* The target mm is loaded, and the CPU is not lazy. */
> + if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
> + return true;
> +
> + /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
> + if (time_after(jiffies, info->mm->context.next_trim_cpumask))
> + return true;
> +
> + return false;
> }
>
> DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
> @@ -928,7 +949,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
> if (info->freed_tables)
> on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
> else
> - on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
> + on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
> (void *)info, 1, cpumask);
> }
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second
2024-12-03 20:07 ` Rik van Riel
@ 2024-12-04 0:46 ` Dave Hansen
2024-12-04 1:43 ` Rik van Riel
0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2024-12-04 0:46 UTC (permalink / raw)
To: Rik van Riel, Mathieu Desnoyers
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Linus Torvalds, Peter Zijlstra, Mel Gorman
On 12/3/24 12:07, Rik van Riel wrote:
> The tlb_flush2 threaded test does not only madvise in a
> loop, but also mmap and munmap from inside every thread.
>
> This should create massive contention on the mmap_lock,
> resulting in threads going to sleep while waiting in
> mmap and munmap.
>
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/tlb_flush2.c
Oh, wow, it only madvise()'s a 1MB allocation before doing the
munmap()/mmap(). I somehow remembered it being a lot larger. And, yeah,
I see a ton of idle time which would be 100% explained by mmap_lock
contention.
Did the original workload that you care about have idle time?
I'm wondering if trimming mm_cpumask() on the way to idle but leaving it
alone on a context switch to another thread is a good idea.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] x86,mm: only trim the mm_cpumask once a second
2024-12-04 0:46 ` Dave Hansen
@ 2024-12-04 1:43 ` Rik van Riel
0 siblings, 0 replies; 24+ messages in thread
From: Rik van Riel @ 2024-12-04 1:43 UTC (permalink / raw)
To: Dave Hansen, Mathieu Desnoyers
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Linus Torvalds, Peter Zijlstra, Mel Gorman
On Tue, 2024-12-03 at 16:46 -0800, Dave Hansen wrote:
> On 12/3/24 12:07, Rik van Riel wrote:
> > The tlb_flush2 threaded test does not only madvise in a
> > loop, but also mmap and munmap from inside every thread.
> >
> > This should create massive contention on the mmap_lock,
> > resulting in threads going to sleep while waiting in
> > mmap and munmap.
> >
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/tlb_flush2.c
>
> Oh, wow, it only madvise()'s a 1MB allocation before doing the
> munmap()/mmap(). I somehow remembered it being a lot larger. And,
> yeah,
> I see a ton of idle time which would be 100% explained by mmap_lock
> contention.
>
> Did the original workload that you care about have idle time?
>
The workloads that I care about are things like memcache,
web servers, web proxies, and other workloads that typically
handle very short requests before going idle again.
These programs have a LOT of context switches to and from
the idle task.
> I'm wondering if trimming mm_cpumask() on the way to idle but leaving
> it
> alone on a context switch to another thread is a good idea.
>
The problem with that is that you then have to set the bit
again when switching back to the program, which creates
contention when a number of CPUs are transitioning to and
from idle at the same time.
Atomic operations on a contended cache line from the
context switch code end up being quite visible when
profiling some workloads :)
--
All Rights Reversed.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86,mm: only trim the mm_cpumask once a second
2024-12-03 0:43 ` [PATCH] x86,mm: only trim the mm_cpumask once a second Rik van Riel
@ 2024-12-04 13:15 ` Oliver Sang
2024-12-04 16:07 ` Rik van Riel
2024-12-04 16:56 ` [PATCH v3] " Rik van Riel
0 siblings, 2 replies; 24+ messages in thread
From: Oliver Sang @ 2024-12-04 13:15 UTC (permalink / raw)
To: Rik van Riel
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman, Mathieu Desnoyers,
oliver.sang
hi, Rik van Riel,
On Mon, Dec 02, 2024 at 07:43:58PM -0500, Rik van Riel wrote:
> On Thu, 28 Nov 2024 22:57:35 +0800
> kernel test robot <oliver.sang@intel.com> wrote:
>
> > Hello,
> >
> > kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
> >
> >
> > commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
>
> The patch below should fix the will-it-scale performance regression,
> while still allowing us to keep the lazy mm_cpumask updates that help
> workloads in other ways.
>
> I do not have the same hardware as the Intel guys have access to, and
> could only test this on one two socket system, but hopefully this
> provides a simple (enough) compromise that allows us to keep both
> the lazier context switch code, and a limited mm_cpumask to keep
> TLB flushing work bounded.
we tested this patch; unfortunately, we found an even bigger regression in our
will-it-scale tests. For the vm-scalability test, it also causes slightly
worse performance.
we noticed there is a v2 of this patch; we are not sure whether it contains
significant changes that could impact performance. If so, please let us know
and we can test further. Thanks.
Details are below.
our bot applied this patch automatically on top of tip/x86/mm, as below:
* 40036730a9566a x86,mm: only trim the mm_cpumask once a second
* 2815a56e4b7252 (tip/x86/mm) x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
will-it-scale now shows a ~20% regression compared to 7e33001b8b
(full comparison is as [1])
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
40036730a9 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 40036730a9566a8abe36ffe2bf4
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
7276 -13.2% 6315 -13.0% 6328 -20.1% 5816 will-it-scale.per_thread_ops
vm-scalability still shows a ~40% regression (slightly worse) compared to 7e33001b8b
(full comparison is as [2])
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq-mt/vm-scalability/always/never
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
40036730a9 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 40036730a9566a8abe36ffe2bf4
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
1234132 ± 4% -40.7% 732265 -40.8% 730989 ± 3% -41.3% 724324 vm-scalability.throughput
[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
40036730a9 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 40036730a9566a8abe36ffe2bf4
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
3743 ± 6% -9.6% 3383 ± 10% -2.0% 3669 ± 6% +0.2% 3750 ± 10% numa-meminfo.node1.PageTables
18158 ± 2% -9.9% 16367 ± 2% -12.6% 15874 -19.7% 14573 uptime.idle
36.77 -1.2% 36.34 -1.3% 36.28 -0.7% 36.52 boot-time.boot
3503 -1.3% 3458 -1.5% 3452 -0.8% 3477 boot-time.idle
1.421e+10 -9.6% 1.284e+10 ± 2% -11.4% 1.259e+10 -21.6% 1.114e+10 cpuidle..time
2.595e+08 -12.9% 2.26e+08 -13.2% 2.251e+08 -35.3% 1.678e+08 cpuidle..usage
44.91 -9.3% 40.74 -10.5% 40.20 -17.6% 37.01 vmstat.cpu.id
695438 -8.1% 638790 -7.4% 644221 -12.7% 607113 vmstat.system.cs
4553480 -3.5% 4393928 -2.6% 4436365 -9.6% 4116587 vmstat.system.in
1.139e+08 ± 2% -14.5% 97376097 ± 3% -12.7% 99390724 ± 3% -19.6% 91532740 ± 6% numa-numastat.node0.local_node
1.139e+08 ± 2% -14.5% 97404595 ± 3% -12.7% 99439249 ± 3% -19.6% 91590554 ± 6% numa-numastat.node0.numa_hit
1.146e+08 -11.6% 1.013e+08 ± 2% -13.0% 99664033 ± 3% -20.1% 91533454 ± 4% numa-numastat.node1.local_node
1.146e+08 -11.6% 1.013e+08 ± 2% -13.0% 99724983 ± 3% -20.1% 91583377 ± 4% numa-numastat.node1.numa_hit
20954 ± 17% -17.0% 17391 ± 2% -14.2% 17979 ± 12% -21.1% 16534 perf-c2c.DRAM.remote
26698 ± 17% -13.9% 22995 -10.5% 23907 ± 12% -16.5% 22292 perf-c2c.HITM.local
18165 ± 17% -17.7% 14957 ± 2% -15.1% 15413 ± 12% -22.4% 14089 perf-c2c.HITM.remote
44864 ± 17% -15.4% 37953 -12.4% 39320 ± 12% -18.9% 36381 perf-c2c.HITM.total
756738 -13.2% 656838 -13.0% 658224 -20.1% 604938 will-it-scale.104.threads
43.82 -9.5% 39.67 -9.6% 39.62 -17.0% 36.39 will-it-scale.104.threads_idle
7276 -13.2% 6315 -13.0% 6328 -20.1% 5816 will-it-scale.per_thread_ops
756738 -13.2% 656838 -13.0% 658224 -20.1% 604938 will-it-scale.workload
44.57 -4.3 40.31 -4.7 39.83 -8.0 36.58 mpstat.cpu.all.idle%
9.85 +6.1 15.94 +6.3 16.17 +12.7 22.51 mpstat.cpu.all.irq%
0.10 +0.0 0.12 +0.0 0.13 +0.0 0.12 mpstat.cpu.all.soft%
43.14 -1.5 41.60 -1.3 41.82 -4.4 38.79 mpstat.cpu.all.sys%
2.34 ± 2% -0.3 2.02 -0.3 2.06 -0.4 1.99 mpstat.cpu.all.usr%
1.139e+08 ± 2% -14.5% 97404162 ± 3% -12.7% 99438988 ± 3% -19.6% 91590389 ± 6% numa-vmstat.node0.numa_hit
1.139e+08 ± 2% -14.5% 97375664 ± 3% -12.7% 99390464 ± 3% -19.6% 91532575 ± 6% numa-vmstat.node0.numa_local
936.25 ± 6% -9.7% 845.81 ± 10% -2.0% 917.10 ± 6% +0.1% 937.31 ± 10% numa-vmstat.node1.nr_page_table_pages
1.146e+08 -11.6% 1.013e+08 ± 2% -13.0% 99724221 ± 3% -20.1% 91582434 ± 4% numa-vmstat.node1.numa_hit
1.146e+08 -11.6% 1.012e+08 ± 2% -13.0% 99663271 ± 3% -20.1% 91532511 ± 4% numa-vmstat.node1.numa_local
432388 ± 2% -2.1% 423334 -1.7% 425005 -2.2% 422800 proc-vmstat.nr_active_anon
261209 ± 3% -3.7% 251458 -2.9% 253702 ± 2% -3.8% 251380 proc-vmstat.nr_shmem
105930 +0.2% 106091 -0.0% 105903 -2.3% 103535 proc-vmstat.nr_slab_unreclaimable
432388 ± 2% -2.1% 423334 -1.7% 425005 -2.2% 422800 proc-vmstat.nr_zone_active_anon
2.286e+08 -13.1% 1.987e+08 -12.9% 1.992e+08 -19.9% 1.831e+08 proc-vmstat.numa_hit
2.285e+08 -13.1% 1.986e+08 -12.9% 1.99e+08 -19.9% 1.83e+08 proc-vmstat.numa_local
2.287e+08 -13.0% 1.988e+08 -12.9% 1.993e+08 -19.9% 1.832e+08 proc-vmstat.pgalloc_normal
4.559e+08 -13.1% 3.962e+08 -12.9% 3.971e+08 -20.0% 3.649e+08 proc-vmstat.pgfault
2.283e+08 -13.1% 1.985e+08 -12.9% 1.989e+08 -19.9% 1.828e+08 proc-vmstat.pgfree
35873620 ± 30% -22.4% 27821786 ± 33% -40.9% 21206791 ± 26% -37.6% 22398860 ± 27% sched_debug.cfs_rq:/.avg_vruntime.max
2545764 +2.1% 2599814 +0.8% 2566152 ± 2% -10.5% 2278449 sched_debug.cfs_rq:/.avg_vruntime.min
3.14 ± 4% +27.4% 4.00 ± 6% +21.1% 3.80 ± 12% +2.5% 3.22 ± 8% sched_debug.cfs_rq:/.load_avg.min
35873620 ± 30% -22.4% 27821786 ± 33% -40.9% 21206791 ± 26% -37.6% 22398860 ± 27% sched_debug.cfs_rq:/.min_vruntime.max
2545764 +2.1% 2599814 +0.8% 2566152 ± 2% -10.5% 2278449 sched_debug.cfs_rq:/.min_vruntime.min
127.33 ± 32% +24.4% 158.37 ± 23% +16.2% 147.90 ± 38% +46.3% 186.32 ± 16% sched_debug.cfs_rq:/.runnable_avg.min
126.56 ± 32% +22.5% 155.00 ± 24% +14.7% 145.12 ± 40% +45.8% 184.53 ± 16% sched_debug.cfs_rq:/.util_avg.min
82954 ± 8% +9.9% 91143 ± 7% +9.2% 90574 ± 7% +20.4% 99910 ± 3% sched_debug.cpu.avg_idle.min
175242 -4.2% 167899 ± 2% -5.5% 165629 -10.4% 157068 sched_debug.cpu.clock_task.avg
166599 -4.7% 158832 ± 2% -5.7% 157034 -11.2% 147873 sched_debug.cpu.clock_task.min
1598 ± 3% +53.8% 2458 ± 16% +50.1% 2398 ± 12% +175.8% 4409 ± 7% sched_debug.cpu.clock_task.stddev
1005828 -7.9% 925907 -7.9% 926693 -13.2% 873191 sched_debug.cpu.nr_switches.avg
1040675 -7.3% 964963 -7.3% 965061 -11.9% 917129 sched_debug.cpu.nr_switches.max
958802 -8.9% 873208 -8.8% 874819 -15.0% 815123 sched_debug.cpu.nr_switches.min
0.01 ± 45% -12.2% 0.01 ± 17% -61.2% 0.00 ±163% -100.0% 0.00 perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.04 ± 98% -25.9% 0.03 ± 56% -50.3% 0.02 ±196% -100.0% 0.00 perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
400.03 +0.0% 400.07 -66.7% 133.34 ±141% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.17 ± 5% -14.8% 0.14 ± 11% -9.2% 0.15 ± 6% -9.8% 0.15 ± 6% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.13 ± 5% -17.0% 0.11 ± 9% -11.2% 0.11 ± 8% -16.2% 0.11 ± 6% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
10.00 +0.0% 10.00 -66.7% 3.33 ±141% -100.0% 0.00 perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
41283 ± 5% +22.8% 50696 ± 13% +13.2% 46723 ± 7% +14.2% 47150 ± 8% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
8372 ± 13% +22.1% 10221 ± 9% +18.0% 9882 ± 8% +27.2% 10652 ± 8% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
1000 -0.0% 999.77 -66.7% 333.42 ±141% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.47 ± 5% +3.9% 0.49 ± 7% +8.3% 0.51 ± 11% +15.6% 0.55 ± 10% perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.part
400.01 +0.0% 400.06 -66.7% 133.33 ±141% -100.0% 0.00 perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.16 ± 5% -15.3% 0.14 ± 12% -9.6% 0.14 ± 6% -10.2% 0.14 ± 6% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.12 ± 6% -17.5% 0.10 ± 8% -11.9% 0.11 ± 8% -17.2% 0.10 ± 6% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
1.09 ± 21% +25.0% 1.36 ± 32% +21.4% 1.32 ± 29% +73.7% 1.89 ± 50% perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.part
1000 -0.0% 999.76 -66.7% 333.42 ±141% -100.0% 0.00 perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
5.74 -5.3% 5.43 -3.1% 5.56 ± 2% -1.7% 5.64 ± 2% perf-stat.i.MPKI
5.392e+09 -6.9% 5.019e+09 -5.7% 5.084e+09 -12.0% 4.747e+09 perf-stat.i.branch-instructions
2.80 +0.0 2.83 +0.0 2.83 +0.0 2.83 perf-stat.i.branch-miss-rate%
1.509e+08 -5.8% 1.421e+08 -4.4% 1.443e+08 -10.6% 1.349e+08 perf-stat.i.branch-misses
24.36 -1.4 22.92 -1.1 23.28 -1.5 22.86 perf-stat.i.cache-miss-rate%
1.538e+08 -12.1% 1.351e+08 -9.2% 1.396e+08 ± 2% -14.3% 1.318e+08 ± 2% perf-stat.i.cache-misses
6.321e+08 -6.4% 5.915e+08 -4.4% 6.041e+08 -8.0% 5.817e+08 perf-stat.i.cache-references
702183 -8.3% 644080 -7.6% 648563 -12.9% 611459 perf-stat.i.context-switches
6.24 +19.0% 7.42 +18.9% 7.42 +34.5% 8.39 perf-stat.i.cpi
1.672e+11 +10.1% 1.841e+11 +10.9% 1.854e+11 +16.8% 1.953e+11 perf-stat.i.cpu-cycles
550.50 +2.6% 565.02 +3.0% 566.88 +5.2% 579.40 perf-stat.i.cpu-migrations
1085 +25.0% 1356 +22.1% 1325 ± 2% +36.2% 1478 ± 2% perf-stat.i.cycles-between-cache-misses
2.683e+10 -7.0% 2.494e+10 -5.8% 2.528e+10 -12.0% 2.361e+10 perf-stat.i.instructions
0.17 -14.7% 0.14 ± 2% -15.0% 0.14 -24.4% 0.13 perf-stat.i.ipc
0.00 ±141% +265.0% 0.00 ± 33% +348.2% 0.00 ± 78% +360.0% 0.01 ±134% perf-stat.i.major-faults
35.60 -12.2% 31.27 -11.5% 31.52 -18.2% 29.13 perf-stat.i.metric.K/sec
1500379 -13.1% 1304071 -12.4% 1314966 -19.4% 1209204 perf-stat.i.minor-faults
1500379 -13.1% 1304071 -12.4% 1314966 -19.4% 1209204 perf-stat.i.page-faults
2.33 ± 44% +0.5 2.83 +0.5 2.84 +0.5 2.84 perf-stat.overall.branch-miss-rate%
5.19 ± 44% +42.2% 7.37 +41.3% 7.33 +59.4% 8.27 perf-stat.overall.cpi
905.91 ± 44% +50.4% 1362 +46.7% 1328 ± 2% +63.7% 1482 ± 2% perf-stat.overall.cycles-between-cache-misses
8967486 ± 44% +28.7% 11541403 +29.1% 11576850 +31.0% 11750089 perf-stat.overall.path-length
1.387e+11 ± 44% +32.2% 1.835e+11 +33.2% 1.848e+11 +40.3% 1.947e+11 perf-stat.ps.cpu-cycles
457.08 ± 44% +23.1% 562.85 +23.6% 564.75 +26.3% 577.31 perf-stat.ps.cpu-migrations
70.53 -6.7 63.83 -6.8 63.71 -10.3 60.22 perf-profile.calltrace.cycles-pp.__madvise
68.82 -6.4 62.40 -6.5 62.29 -10.0 58.78 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
68.63 -6.4 62.23 -6.5 62.12 -10.0 58.62 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
68.38 -6.4 62.02 -6.5 61.92 -10.0 58.42 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
68.36 -6.4 62.01 -6.5 61.90 -10.0 58.40 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
54.54 -3.9 50.68 -3.7 50.80 -5.5 49.00 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.49 -3.8 50.64 -3.7 50.76 -5.5 48.96 perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
20.77 -3.8 16.93 ± 2% -3.7 17.07 -5.4 15.33 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
48.74 -3.8 44.99 -3.7 45.00 -5.1 43.64 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
42.51 -3.4 39.15 -3.3 39.18 -4.3 38.25 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
42.90 -3.4 39.54 -3.3 39.56 -4.3 38.63 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
43.33 -3.3 40.07 -3.2 40.12 -4.0 39.33 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
12.96 -2.3 10.63 ± 3% -2.6 10.40 ± 2% -4.2 8.75 ± 3% perf-profile.calltrace.cycles-pp.down_read.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.32 -2.2 10.12 ± 3% -2.4 9.91 ± 2% -4.0 8.30 ± 3% perf-profile.calltrace.cycles-pp.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
9.65 ± 2% -1.9 7.75 ± 3% -2.2 7.49 ± 2% -3.6 6.05 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
9.46 ± 2% -1.9 7.57 ± 3% -2.1 7.33 ± 2% -3.5 5.91 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.rwsem_down_read_slowpath.down_read.do_madvise
6.54 ± 2% -1.5 5.08 ± 2% -1.5 5.03 ± 4% -4.2 2.32 ± 14% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
6.29 -1.1 5.16 -1.0 5.28 ± 2% -1.4 4.93 perf-profile.calltrace.cycles-pp.testcase
4.34 -0.9 3.41 ± 2% -0.9 3.45 ± 2% -1.2 3.15 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
3.94 -0.9 3.07 ± 2% -0.8 3.11 ± 2% -1.1 2.85 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
3.79 -0.8 2.95 ± 2% -0.8 2.99 ± 2% -1.1 2.74 ± 2% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
3.74 -0.8 2.91 ± 2% -0.8 2.95 ± 2% -1.0 2.71 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
4.88 ± 2% -0.8 4.11 ± 3% -0.7 4.13 ± 3% -2.4 2.48 ± 4% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.28 ± 2% -0.8 0.52 -0.8 0.45 ± 39% -1.3 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
4.63 -0.7 3.92 -0.7 3.94 -1.0 3.65 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
3.82 -0.7 3.13 -0.6 3.24 ± 4% -0.9 2.94 ± 3% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
3.34 ± 2% -0.6 2.72 ± 2% -0.5 2.83 ± 5% -0.8 2.57 ± 3% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
3.23 ± 2% -0.6 2.63 ± 2% -0.5 2.75 ± 5% -0.7 2.50 ± 4% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
5.07 -0.4 4.67 ± 2% -0.4 4.63 ± 2% -1.0 4.07 ± 2% perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
5.05 -0.4 4.65 ± 2% -0.4 4.61 ± 2% -1.0 4.06 ± 2% perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
3.31 -0.4 2.92 -0.4 2.94 -0.3 3.00 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
3.48 -0.4 3.09 -0.4 3.11 -0.3 3.17 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
3.35 -0.4 2.96 -0.4 2.98 -0.3 3.04 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
3.82 -0.4 3.44 -0.4 3.46 -0.3 3.52 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
4.88 -0.4 4.51 ± 2% -0.4 4.47 ± 2% -0.9 3.93 ± 2% perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single
2.13 -0.3 1.84 -0.3 1.84 -0.4 1.70 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
1.64 -0.3 1.38 ± 2% -0.2 1.44 ± 5% -0.3 1.33 ± 2% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.38 -0.2 1.16 -0.2 1.16 -0.3 1.13 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
1.40 -0.2 1.18 -0.2 1.23 ± 5% -0.3 1.15 ± 2% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.89 ± 6% -0.2 0.69 ± 9% -0.2 0.74 ± 10% -0.2 0.64 ± 10% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
3.23 ± 2% -0.2 3.06 -0.2 3.03 ± 3% -0.6 2.66 ± 3% perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
1.13 -0.2 0.97 ± 3% -0.1 0.99 ± 2% -0.2 0.94 perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.92 ± 2% -0.1 2.79 -0.2 2.77 ± 3% -0.5 2.42 ± 3% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
2.85 ± 2% -0.1 2.74 -0.1 2.71 ± 3% -0.5 2.37 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
2.74 ± 2% -0.1 2.64 -0.1 2.61 ± 3% -0.5 2.28 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.70 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.calltrace.cycles-pp.__munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.68 -0.1 0.60 -0.1 0.61 -0.1 0.57 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.62 ± 3% -0.1 0.54 -0.0 0.58 ± 6% -0.1 0.53 ± 3% perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.88 ± 2% -0.1 0.82 ± 2% -0.1 0.82 ± 3% -0.2 0.70 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
0.81 ± 2% -0.1 0.75 ± 2% -0.1 0.75 ± 3% -0.2 0.64 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
1.48 -0.1 1.43 -0.0 1.47 ± 2% -0.1 1.38 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read
0.77 ± 2% -0.1 0.72 ± 3% -0.1 0.72 ± 3% -0.2 0.62 ± 4% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folios_put_refs
0.78 ± 2% -0.1 0.73 ± 2% -0.1 0.73 ± 3% -0.2 0.62 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache
1.48 -0.0 1.43 -0.0 1.47 ± 2% -0.1 1.38 perf-profile.calltrace.cycles-pp.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
1.48 -0.0 1.43 -0.0 1.47 ± 2% -0.1 1.39 perf-profile.calltrace.cycles-pp.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
0.83 ± 2% -0.0 0.79 ± 3% -0.0 0.79 ± 3% -0.1 0.74 ± 4% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
0.75 -0.0 0.71 ± 3% -0.0 0.72 ± 3% -0.1 0.67 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folio_batch_move_lru.lru_add_drain_cpu
0.77 -0.0 0.74 ± 3% -0.0 0.74 ± 3% -0.1 0.69 ± 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
0.74 -0.0 0.71 ± 3% -0.0 0.71 ± 3% -0.1 0.67 ± 4% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folio_batch_move_lru
0.00 +0.0 0.00 +0.0 0.00 +0.6 0.55 ± 7% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.finish_task_switch
0.00 +0.0 0.00 +0.0 0.00 +0.6 0.55 ± 7% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.finish_task_switch.__schedule
0.00 +0.0 0.00 +0.0 0.00 +0.6 0.56 ± 7% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.finish_task_switch.__schedule.schedule_idle
0.00 +0.0 0.00 +0.0 0.00 +0.6 0.58 ± 7% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.finish_task_switch.__schedule.schedule_idle.do_idle
0.00 +0.0 0.00 +0.0 0.00 +0.6 0.61 ± 7% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.51 +0.0 0.56 +0.0 0.55 -0.5 0.00 perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.54 +0.1 0.64 ± 3% +0.1 0.65 -0.1 0.41 ± 50% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
0.60 ± 2% +0.1 0.72 ± 3% +0.1 0.73 -0.0 0.58 ± 3% perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
2.98 +0.2 3.21 ± 2% +0.2 3.22 ± 3% -0.0 2.96 ± 3% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
2.90 ± 2% +0.2 3.15 ± 2% +0.2 3.15 ± 3% -0.0 2.90 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
2.81 ± 2% +0.2 3.06 ± 2% +0.2 3.06 ± 3% -0.0 2.81 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
0.72 ± 2% +0.6 1.29 ± 2% +0.6 1.36 ± 3% +1.2 1.88 ± 3% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.70 ± 2% +0.6 1.28 ± 2% +0.6 1.35 ± 3% +1.2 1.87 ± 3% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.00 +0.6 0.63 ± 2% +0.7 0.67 ± 4% +0.8 0.80 ± 4% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule_idle.do_idle.cpu_startup_entry
9.12 +1.0 10.12 +1.0 10.08 +0.8 9.88 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.70 ± 2% +1.4 2.06 ± 4% +1.4 2.11 ± 3% +0.8 1.47 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state.cpuidle_enter
0.61 ± 2% +1.4 2.00 ± 4% +1.4 2.05 ± 3% +0.8 1.42 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state
0.60 ± 2% +1.4 1.99 ± 4% +1.4 2.04 ± 3% +0.8 1.42 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq
19.23 +7.0 26.22 +6.9 26.15 +10.7 29.91 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
19.34 +7.0 26.36 +7.0 26.30 +10.7 30.05 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
20.03 +7.2 27.19 +7.1 27.11 +10.6 30.64 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
21.50 +7.9 29.39 +7.9 29.39 +11.8 33.26 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
21.51 +7.9 29.40 +7.9 29.40 +11.8 33.27 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
21.51 +7.9 29.40 +7.9 29.40 +11.8 33.28 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
21.72 +8.0 29.70 +8.0 29.69 +11.9 33.60 perf-profile.calltrace.cycles-pp.common_startup_64
1.04 +8.2 9.20 ± 2% +8.1 9.19 ± 2% +14.6 15.66 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state
1.05 +8.2 9.23 ± 2% +8.2 9.22 ± 2% +14.6 15.68 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter
1.17 +8.2 9.38 ± 2% +8.2 9.36 ± 2% +14.8 16.01 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.28 +8.2 9.51 ± 2% +8.2 9.49 ± 2% +14.9 16.14 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.91 +9.3 11.17 +9.3 11.23 +15.5 17.41 ± 2% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
70.58 -6.7 63.87 -6.8 63.75 -10.3 60.26 perf-profile.children.cycles-pp.__madvise
69.89 -6.5 63.37 -6.6 63.26 -10.2 59.72 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
69.68 -6.5 63.20 -6.6 63.10 -10.1 59.56 perf-profile.children.cycles-pp.do_syscall_64
68.38 -6.4 62.02 -6.5 61.92 -10.0 58.42 perf-profile.children.cycles-pp.__x64_sys_madvise
68.37 -6.4 62.01 -6.5 61.91 -10.0 58.41 perf-profile.children.cycles-pp.do_madvise
21.28 -3.9 17.39 ± 2% -3.8 17.52 -5.5 15.75 perf-profile.children.cycles-pp.llist_add_batch
54.54 -3.9 50.68 -3.7 50.80 -5.5 49.01 perf-profile.children.cycles-pp.madvise_vma_behavior
54.50 -3.8 50.65 -3.7 50.77 -5.5 48.97 perf-profile.children.cycles-pp.zap_page_range_single
48.89 -3.8 45.11 -3.8 45.12 -5.1 43.75 perf-profile.children.cycles-pp.tlb_finish_mmu
43.03 -3.4 39.65 -3.4 39.67 -4.3 38.73 perf-profile.children.cycles-pp.smp_call_function_many_cond
43.03 -3.4 39.65 -3.4 39.67 -4.3 38.73 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
43.48 -3.3 40.19 -3.2 40.24 -4.0 39.45 perf-profile.children.cycles-pp.flush_tlb_mm_range
8.38 ± 2% -2.4 5.97 ± 2% -2.4 5.95 ± 3% -5.1 3.26 ± 4% perf-profile.children.cycles-pp.intel_idle_irq
12.98 -2.3 10.64 ± 3% -2.6 10.40 ± 2% -4.2 8.75 ± 3% perf-profile.children.cycles-pp.down_read
12.41 -2.2 10.19 ± 3% -2.4 9.97 ± 2% -4.1 8.36 ± 3% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
9.72 ± 2% -1.9 7.81 ± 3% -2.2 7.55 ± 2% -3.6 6.10 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irq
15.12 -1.7 13.38 ± 2% -2.0 13.10 ± 2% -4.0 11.09 ± 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
8.04 -1.4 6.63 -1.4 6.68 -2.0 6.02 perf-profile.children.cycles-pp.llist_reverse_order
6.87 -1.2 5.64 -1.1 5.76 ± 2% -1.5 5.37 perf-profile.children.cycles-pp.testcase
4.16 -0.7 3.41 -0.6 3.52 ± 4% -1.0 3.21 ± 3% perf-profile.children.cycles-pp.asm_exc_page_fault
3.34 ± 2% -0.6 2.73 ± 2% -0.5 2.84 ± 5% -0.8 2.58 ± 3% perf-profile.children.cycles-pp.exc_page_fault
3.30 ± 2% -0.6 2.69 ± 2% -0.5 2.80 ± 5% -0.8 2.55 ± 3% perf-profile.children.cycles-pp.do_user_addr_fault
5.07 -0.4 4.67 ± 2% -0.4 4.63 ± 2% -1.0 4.08 ± 2% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
5.05 -0.4 4.66 ± 2% -0.4 4.61 ± 2% -1.0 4.06 ± 2% perf-profile.children.cycles-pp.free_pages_and_swap_cache
5.06 -0.4 4.67 ± 2% -0.4 4.63 ± 2% -1.0 4.07 ± 2% perf-profile.children.cycles-pp.folios_put_refs
2.22 -0.3 1.92 -0.3 1.92 -0.5 1.76 perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
1.65 -0.3 1.39 ± 2% -0.2 1.45 ± 5% -0.3 1.35 ± 2% perf-profile.children.cycles-pp.handle_mm_fault
1.44 -0.2 1.20 -0.2 1.20 -0.2 1.19 perf-profile.children.cycles-pp.__irqentry_text_end
1.41 -0.2 1.19 ± 2% -0.2 1.25 ± 5% -0.3 1.16 ± 2% perf-profile.children.cycles-pp.__handle_mm_fault
0.90 ± 6% -0.2 0.70 ± 9% -0.2 0.75 ± 9% -0.3 0.65 ± 10% perf-profile.children.cycles-pp.lock_vma_under_rcu
1.23 -0.2 1.05 -0.2 1.05 -0.2 0.98 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.24 -0.2 3.06 -0.2 3.03 ± 3% -0.6 2.66 ± 3% perf-profile.children.cycles-pp.__page_cache_release
1.14 -0.2 0.98 ± 3% -0.1 1.00 ± 2% -0.2 0.95 perf-profile.children.cycles-pp.do_anonymous_page
0.26 ± 5% -0.1 0.12 ± 8% -0.2 0.11 ± 12% -0.2 0.02 ±122% perf-profile.children.cycles-pp.poll_idle
0.79 -0.1 0.66 -0.1 0.67 ± 2% -0.1 0.66 ± 2% perf-profile.children.cycles-pp.error_entry
0.62 ± 3% -0.1 0.49 ± 8% -0.1 0.49 ± 5% -0.2 0.44 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.45 ± 8% -0.1 0.33 ± 16% -0.1 0.39 ± 21% -0.1 0.34 ± 18% perf-profile.children.cycles-pp.mas_walk
0.88 -0.1 0.76 -0.1 0.76 -0.0 0.87 perf-profile.children.cycles-pp.native_irq_return_iret
0.54 ± 3% -0.1 0.42 ± 5% -0.1 0.42 ± 3% -0.2 0.38 perf-profile.children.cycles-pp.page_counter_uncharge
0.50 ± 3% -0.1 0.39 ± 4% -0.1 0.39 ± 3% -0.2 0.35 perf-profile.children.cycles-pp.page_counter_cancel
0.54 ± 3% -0.1 0.43 ± 6% -0.1 0.43 ± 5% -0.2 0.38 perf-profile.children.cycles-pp.uncharge_batch
0.55 ± 2% -0.1 0.45 ± 3% -0.1 0.44 ± 2% -0.2 0.39 ± 3% perf-profile.children.cycles-pp.up_read
0.70 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.children.cycles-pp.__munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.children.cycles-pp.__vm_munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.58 perf-profile.children.cycles-pp.__x64_sys_munmap
0.67 ± 3% -0.1 0.58 -0.1 0.62 ± 6% -0.1 0.57 ± 2% perf-profile.children.cycles-pp.unmap_page_range
0.52 ± 2% -0.1 0.44 ± 4% -0.1 0.45 ± 3% -0.1 0.42 ± 2% perf-profile.children.cycles-pp.alloc_anon_folio
0.57 -0.1 0.50 ± 4% -0.1 0.52 ± 4% -0.1 0.49 ± 2% perf-profile.children.cycles-pp.zap_pmd_range
0.54 -0.1 0.48 ± 2% -0.1 0.49 -0.1 0.47 ± 2% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.54 -0.1 0.48 ± 2% -0.1 0.49 -0.1 0.47 ± 2% perf-profile.children.cycles-pp.do_vmi_munmap
0.38 ± 2% -0.1 0.32 -0.1 0.32 ± 2% -0.1 0.29 ± 2% perf-profile.children.cycles-pp.vma_alloc_folio_noprof
0.53 ± 3% -0.1 0.47 ± 4% -0.0 0.49 ± 5% -0.1 0.46 ± 2% perf-profile.children.cycles-pp.zap_pte_range
1.51 -0.1 1.45 -0.0 1.50 ± 2% -0.1 1.41 perf-profile.children.cycles-pp.schedule_preempt_disabled
0.52 ± 2% -0.1 0.47 ± 2% -0.1 0.47 -0.1 0.46 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
0.48 -0.1 0.42 -0.1 0.42 -0.1 0.39 ± 2% perf-profile.children.cycles-pp.native_flush_tlb_local
0.31 ± 2% -0.1 0.26 -0.0 0.26 ± 2% -0.1 0.24 ± 2% perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
0.27 ± 5% -0.1 0.22 ± 8% -0.0 0.23 ± 8% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.tlb_gather_mmu
0.31 ± 2% -0.1 0.26 ± 2% -0.0 0.27 -0.1 0.24 ± 2% perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
0.34 -0.1 0.29 ± 2% -0.1 0.29 ± 2% -0.0 0.30 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
1.51 -0.1 1.46 -0.0 1.50 ± 2% -0.1 1.42 perf-profile.children.cycles-pp.schedule
0.32 -0.0 0.27 -0.0 0.28 -0.0 0.28 ± 2% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.50 -0.0 0.45 -0.0 0.45 ± 2% -0.1 0.43 perf-profile.children.cycles-pp.dequeue_task_fair
0.40 -0.0 0.35 ± 2% -0.0 0.36 ± 2% -0.1 0.33 perf-profile.children.cycles-pp.__pte_offset_map_lock
0.48 -0.0 0.43 -0.0 0.43 -0.1 0.42 perf-profile.children.cycles-pp.dequeue_entities
0.28 ± 2% -0.0 0.24 ± 2% -0.0 0.24 ± 2% -0.1 0.22 ± 3% perf-profile.children.cycles-pp.__alloc_pages_noprof
0.28 -0.0 0.24 ± 4% -0.0 0.24 ± 3% -0.1 0.22 perf-profile.children.cycles-pp.lru_gen_del_folio
0.24 ± 5% -0.0 0.20 ± 7% -0.0 0.22 ± 13% -0.0 0.20 ± 14% perf-profile.children.cycles-pp.find_vma_prev
0.22 ± 3% -0.0 0.18 ± 4% -0.0 0.18 ± 3% -0.1 0.16 ± 2% perf-profile.children.cycles-pp.__perf_sw_event
0.32 -0.0 0.28 -0.0 0.28 ± 2% -0.0 0.27 ± 2% perf-profile.children.cycles-pp.irqtime_account_irq
0.24 ± 4% -0.0 0.20 ± 7% -0.0 0.19 ± 6% -0.1 0.18 ± 6% perf-profile.children.cycles-pp.down_read_trylock
0.19 ± 3% -0.0 0.15 ± 3% -0.0 0.16 ± 3% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.vms_clear_ptes
0.14 ± 9% -0.0 0.10 ± 8% -0.0 0.11 ± 12% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.flush_tlb_batched_pending
0.22 -0.0 0.19 ± 2% -0.0 0.18 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist
0.24 -0.0 0.20 ± 3% -0.0 0.20 ± 2% -0.1 0.18 ± 3% perf-profile.children.cycles-pp.sync_regs
0.21 ± 2% -0.0 0.18 ± 4% -0.0 0.18 ± 3% -0.0 0.17 perf-profile.children.cycles-pp.___perf_sw_event
0.23 ± 3% -0.0 0.19 ± 2% -0.0 0.20 ± 3% -0.1 0.17 ± 2% perf-profile.children.cycles-pp.down_write_killable
0.29 -0.0 0.26 -0.0 0.26 -0.1 0.24 ± 3% perf-profile.children.cycles-pp.dequeue_entity
0.22 ± 3% -0.0 0.19 ± 2% -0.0 0.19 ± 2% -0.1 0.17 ± 2% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
0.06 -0.0 0.03 ± 81% -0.0 0.05 -0.0 0.02 ±122% perf-profile.children.cycles-pp.__cond_resched
0.29 ± 2% -0.0 0.26 ± 2% -0.0 0.25 ± 8% -0.0 0.27 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
0.27 -0.0 0.24 ± 4% -0.0 0.24 ± 2% -0.0 0.22 ± 2% perf-profile.children.cycles-pp.lru_gen_add_folio
0.09 ± 4% -0.0 0.06 ± 6% -0.0 0.07 ± 7% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.call_function_single_prep_ipi
0.23 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 2% -0.0 0.19 ± 3% perf-profile.children.cycles-pp.sched_clock_cpu
0.15 ± 4% -0.0 0.13 ± 13% -0.0 0.13 ± 13% -0.0 0.12 ± 8% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
0.20 ± 2% -0.0 0.18 ± 2% -0.0 0.17 ± 2% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.native_sched_clock
0.15 ± 4% -0.0 0.12 ± 3% -0.0 0.13 ± 3% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.rwsem_mark_wake
0.33 ± 2% -0.0 0.31 -0.0 0.31 -0.0 0.31 perf-profile.children.cycles-pp.downgrade_write
0.14 ± 4% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.11 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.14 ± 7% -0.0 0.12 ± 16% -0.0 0.12 ± 14% -0.0 0.11 ± 9% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.34 -0.0 0.32 ± 3% -0.0 0.32 ± 2% -0.0 0.30 perf-profile.children.cycles-pp.lru_add
0.42 ± 2% -0.0 0.40 ± 2% -0.0 0.40 ± 2% -0.0 0.40 ± 2% perf-profile.children.cycles-pp.try_to_wake_up
0.25 -0.0 0.23 -0.0 0.22 ± 8% -0.0 0.23 ± 2% perf-profile.children.cycles-pp.update_process_times
0.12 ± 6% -0.0 0.10 ± 9% -0.0 0.10 ± 10% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.20 ± 2% -0.0 0.17 ± 2% -0.0 0.17 ± 2% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.sched_clock
0.37 -0.0 0.35 -0.0 0.35 ± 3% -0.0 0.35 perf-profile.children.cycles-pp.__hrtimer_run_queues
0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 2% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.update_curr
0.13 ± 5% -0.0 0.11 ± 3% -0.0 0.11 ± 2% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.rwsem_optimistic_spin
0.11 -0.0 0.09 ± 4% -0.0 0.09 ± 4% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.clear_page_erms
0.15 -0.0 0.13 ± 3% -0.0 0.14 ± 2% -0.0 0.13 ± 4% perf-profile.children.cycles-pp.__smp_call_single_queue
0.14 ± 3% -0.0 0.13 ± 3% -0.0 0.13 ± 15% +0.0 0.17 ± 2% perf-profile.children.cycles-pp.ktime_get
0.10 ± 4% -0.0 0.08 ± 13% -0.0 0.08 ± 11% -0.0 0.07 ± 12% perf-profile.children.cycles-pp.__folio_mod_stat
0.43 -0.0 0.41 -0.0 0.41 ± 2% -0.0 0.41 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
0.16 ± 2% -0.0 0.15 ± 5% -0.0 0.15 ± 3% -0.0 0.15 perf-profile.children.cycles-pp.dl_server_stop
0.22 ± 2% -0.0 0.20 -0.0 0.20 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.enqueue_entity
0.18 ± 2% -0.0 0.16 ± 2% -0.0 0.16 ± 3% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.update_load_avg
0.14 ± 2% -0.0 0.12 ± 3% -0.0 0.13 ± 2% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.__hrtimer_start_range_ns
0.12 ± 3% -0.0 0.10 ± 4% -0.0 0.11 ± 4% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.10 ± 3% -0.0 0.09 ± 4% -0.0 0.09 ± 4% -0.0 0.08 ± 3% perf-profile.children.cycles-pp.free_unref_folios
0.19 ± 2% -0.0 0.18 ± 4% -0.0 0.18 ± 4% -0.0 0.17 ± 3% perf-profile.children.cycles-pp.zap_present_ptes
0.19 ± 2% -0.0 0.18 -0.0 0.18 ± 2% -0.0 0.18 ± 3% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.08 ± 6% -0.0 0.06 ± 7% -0.0 0.06 ± 4% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.rmqueue
0.07 ± 7% -0.0 0.05 ± 9% -0.0 0.05 ± 4% -0.0 0.06 ± 5% perf-profile.children.cycles-pp.get_nohz_timer_target
0.09 -0.0 0.08 ± 5% -0.0 0.07 ± 6% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.read_tsc
0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.idle_cpu
0.08 ± 5% -0.0 0.07 ± 6% -0.0 0.08 ± 6% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.prepare_task_switch
0.06 ± 7% -0.0 0.05 ± 9% -0.0 0.05 ± 29% -0.0 0.05 ± 35% perf-profile.children.cycles-pp.mm_cid_get
0.07 -0.0 0.06 -0.0 0.06 ± 4% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.rwsem_spin_on_owner
0.07 -0.0 0.06 -0.0 0.06 ± 7% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.native_apic_mem_eoi
0.09 ± 5% -0.0 0.08 ± 5% -0.0 0.07 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.irq_enter_rcu
0.08 ± 4% -0.0 0.07 -0.0 0.07 ± 3% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.__switch_to_asm
0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.08 -0.0 0.07 ± 6% -0.0 0.08 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.set_next_task_fair
0.32 -0.0 0.31 -0.0 0.31 ± 2% -0.0 0.30 perf-profile.children.cycles-pp.__mmap
0.06 ± 7% -0.0 0.06 -0.0 0.06 ± 9% -0.0 0.04 ± 33% perf-profile.children.cycles-pp.tick_irq_enter
0.06 -0.0 0.06 ± 6% -0.0 0.06 ± 4% +0.0 0.07 perf-profile.children.cycles-pp.set_next_entity
0.00 +0.0 0.00 +0.0 0.00 +0.2 0.21 ± 4% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.09 ± 7% +0.0 0.09 ± 5% +0.0 0.10 ± 4% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.sched_core_idle_cpu
0.07 ± 6% +0.0 0.08 ± 7% +0.0 0.08 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.irqentry_enter
0.16 ± 2% +0.0 0.18 ± 4% +0.0 0.18 ± 2% +0.0 0.19 ± 2% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.05 ± 8% +0.0 0.07 ± 5% +0.0 0.07 ± 4% +0.0 0.06 ± 4% perf-profile.children.cycles-pp.__switch_to
0.10 ± 3% +0.0 0.11 ± 4% +0.0 0.12 ± 5% +0.0 0.12 perf-profile.children.cycles-pp.hrtimer_try_to_cancel
0.67 ± 3% +0.0 0.69 ± 2% +0.0 0.70 ± 2% +0.1 0.73 perf-profile.children.cycles-pp.__pick_next_task
0.50 +0.0 0.52 +0.0 0.53 ± 2% +0.1 0.57 perf-profile.children.cycles-pp.__irq_exit_rcu
0.66 +0.0 0.68 +0.0 0.69 +0.1 0.72 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.62 ± 2% +0.0 0.64 ± 2% +0.0 0.65 ± 2% +0.1 0.68 perf-profile.children.cycles-pp.pick_next_task_fair
0.11 ± 3% +0.0 0.14 ± 5% +0.0 0.13 ± 4% +0.0 0.14 ± 2% perf-profile.children.cycles-pp.start_dl_timer
0.05 +0.0 0.08 ± 6% +0.0 0.09 ± 5% +0.0 0.08 ± 4% perf-profile.children.cycles-pp.task_contending
0.47 ± 3% +0.0 0.50 ± 2% +0.0 0.51 ± 3% +0.0 0.50 perf-profile.children.cycles-pp.sched_balance_newidle
0.45 ± 3% +0.0 0.48 ± 3% +0.0 0.49 ± 3% +0.0 0.48 perf-profile.children.cycles-pp.sched_balance_rq
0.03 ± 70% +0.0 0.07 +0.0 0.07 ± 5% -0.0 0.03 ± 81% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.52 +0.0 0.56 +0.0 0.55 -0.1 0.41 perf-profile.children.cycles-pp.menu_select
0.19 ± 5% +0.0 0.23 +0.1 0.24 ± 3% +0.1 0.27 ± 3% perf-profile.children.cycles-pp.handle_softirqs
0.18 ± 2% +0.0 0.22 ± 3% +0.0 0.22 ± 2% +0.1 0.23 perf-profile.children.cycles-pp.enqueue_dl_entity
0.18 ± 3% +0.1 0.23 +0.1 0.23 +0.1 0.24 perf-profile.children.cycles-pp.dl_server_start
0.11 ± 3% +0.1 0.16 ± 2% +0.1 0.17 ± 3% +0.2 0.26 ± 4% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.46 +0.1 0.52 +0.1 0.51 +0.0 0.49 perf-profile.children.cycles-pp.enqueue_task_fair
0.00 +0.1 0.06 +0.1 0.06 ± 6% +0.0 0.00 perf-profile.children.cycles-pp.call_cpuidle
0.47 +0.1 0.53 +0.1 0.53 +0.0 0.50 perf-profile.children.cycles-pp.enqueue_task
0.48 +0.1 0.55 +0.1 0.54 +0.0 0.51 perf-profile.children.cycles-pp.ttwu_do_activate
0.62 +0.1 0.70 ± 4% +0.1 0.70 ± 2% +0.1 0.67 perf-profile.children.cycles-pp._find_next_bit
0.18 ± 2% +0.1 0.26 +0.1 0.27 ± 3% +0.2 0.34 ± 3% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.20 +0.1 0.28 ± 3% +0.1 0.29 ± 3% +0.2 0.36 ± 4% perf-profile.children.cycles-pp.sysvec_call_function_single
0.64 +0.1 0.72 +0.1 0.72 +0.1 0.78 perf-profile.children.cycles-pp.sched_ttwu_pending
0.22 +0.1 0.30 +0.1 0.31 ± 3% +0.2 0.38 ± 4% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.rest_init
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.start_kernel
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.x86_64_start_kernel
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.x86_64_start_reservations
0.10 ± 3% +0.1 0.20 ± 2% +0.1 0.20 ± 2% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.tick_nohz_next_event
0.00 +0.1 0.10 ± 7% +0.1 0.11 ± 5% +0.1 0.11 ± 8% perf-profile.children.cycles-pp.__bitmap_and
0.06 ± 6% +0.1 0.16 ± 4% +0.1 0.16 ± 2% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.__get_next_timer_interrupt
0.48 +0.1 0.58 ± 2% +0.1 0.58 ± 2% +0.1 0.60 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
5.91 ± 2% +0.1 6.01 ± 2% +0.1 5.99 ± 3% -0.5 5.38 ± 3% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.16 ± 3% +0.1 0.28 +0.1 0.28 ± 2% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.61 +0.1 0.73 ± 3% +0.1 0.74 -0.0 0.59 ± 3% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.28 ± 2% +0.1 0.42 ± 4% +0.2 0.46 ± 4% +0.5 0.78 ± 5% perf-profile.children.cycles-pp.finish_task_switch
0.00 +0.2 0.15 ± 3% +0.2 0.15 ± 5% +0.0 0.00 perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.2 0.16 ± 3% +0.2 0.16 ± 5% +0.0 0.00 perf-profile.children.cycles-pp.ct_idle_exit
0.02 ± 99% +0.2 0.24 ± 2% +0.2 0.24 ± 4% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.ct_kernel_exit_state
5.84 ± 2% +0.3 6.09 +0.2 6.07 ± 3% -0.4 5.42 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.37 ± 4% +0.4 0.73 ± 2% +0.4 0.78 ± 3% +0.6 0.96 ± 3% perf-profile.children.cycles-pp.switch_mm_irqs_off
2.22 +0.5 2.76 +0.6 2.86 ± 2% +1.1 3.30 ± 2% perf-profile.children.cycles-pp.__schedule
0.73 ± 2% +0.6 1.31 ± 2% +0.6 1.38 ± 3% +1.2 1.90 ± 3% perf-profile.children.cycles-pp.schedule_idle
9.24 +1.0 10.25 +1.0 10.21 +0.8 9.99 perf-profile.children.cycles-pp.intel_idle
18.85 +6.3 25.12 +6.4 25.25 +11.3 30.15 perf-profile.children.cycles-pp.asm_sysvec_call_function
19.52 +7.1 26.62 +7.0 26.53 +10.8 30.33 perf-profile.children.cycles-pp.cpuidle_enter_state
19.53 +7.1 26.63 +7.0 26.55 +10.8 30.34 perf-profile.children.cycles-pp.cpuidle_enter
20.22 +7.2 27.47 +7.2 27.38 +10.7 30.94 perf-profile.children.cycles-pp.cpuidle_idle_call
14.43 +7.8 22.23 +8.0 22.41 +13.9 28.34 perf-profile.children.cycles-pp.sysvec_call_function
13.73 +7.9 21.59 +8.0 21.77 +13.8 27.55 perf-profile.children.cycles-pp.__sysvec_call_function
21.51 +7.9 29.40 +7.9 29.40 +11.8 33.28 perf-profile.children.cycles-pp.start_secondary
21.72 +8.0 29.69 +8.0 29.68 +11.9 33.59 perf-profile.children.cycles-pp.do_idle
21.72 +8.0 29.70 +8.0 29.69 +11.9 33.60 perf-profile.children.cycles-pp.common_startup_64
21.72 +8.0 29.70 +8.0 29.69 +11.9 33.60 perf-profile.children.cycles-pp.cpu_startup_entry
14.36 +8.1 22.42 +8.2 22.61 +14.0 28.33 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
3.58 +9.4 12.97 +9.5 13.09 +16.0 19.62 perf-profile.children.cycles-pp.flush_tlb_func
7.43 ± 2% -3.8 3.66 -3.8 3.58 ± 4% -5.8 1.59 ± 5% perf-profile.self.cycles-pp.intel_idle_irq
16.93 -2.9 13.99 ± 2% -2.9 14.07 -4.3 12.59 perf-profile.self.cycles-pp.llist_add_batch
15.11 -1.7 13.38 ± 2% -2.0 13.09 ± 2% -4.0 11.09 ± 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
8.01 -1.4 6.57 -1.4 6.62 -2.0 5.97 perf-profile.self.cycles-pp.llist_reverse_order
1.44 -0.2 1.19 -0.2 1.20 -0.2 1.19 perf-profile.self.cycles-pp.__irqentry_text_end
1.69 -0.2 1.50 -0.2 1.51 -0.3 1.37 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.24 ± 6% -0.1 0.10 ± 10% -0.2 0.09 ± 15% -0.2 0.00 perf-profile.self.cycles-pp.poll_idle
0.78 -0.1 0.66 -0.1 0.66 ± 2% -0.1 0.65 ± 2% perf-profile.self.cycles-pp.error_entry
0.87 -0.1 0.76 -0.1 0.76 -0.0 0.87 perf-profile.self.cycles-pp.native_irq_return_iret
0.65 ± 2% -0.1 0.54 ± 2% -0.1 0.53 -0.2 0.50 perf-profile.self.cycles-pp.testcase
0.46 ± 2% -0.1 0.37 ± 5% -0.1 0.35 ± 3% -0.1 0.32 ± 4% perf-profile.self.cycles-pp.down_read
0.38 ± 8% -0.1 0.29 ± 16% -0.1 0.33 ± 21% -0.1 0.29 ± 17% perf-profile.self.cycles-pp.mas_walk
0.41 ± 3% -0.1 0.32 ± 4% -0.1 0.32 ± 3% -0.1 0.29 ± 2% perf-profile.self.cycles-pp.page_counter_cancel
0.32 ± 12% -0.1 0.24 ± 4% -0.0 0.28 ± 14% -0.0 0.27 ± 10% perf-profile.self.cycles-pp.zap_page_range_single
0.46 ± 2% -0.1 0.38 ± 3% -0.1 0.37 ± 3% -0.1 0.33 ± 4% perf-profile.self.cycles-pp.up_read
0.28 ± 5% -0.1 0.21 ± 5% -0.1 0.22 ± 6% -0.1 0.20 ± 5% perf-profile.self.cycles-pp.tlb_finish_mmu
0.56 -0.1 0.49 ± 2% -0.1 0.48 ± 2% -0.1 0.41 ± 2% perf-profile.self.cycles-pp.rwsem_down_read_slowpath
0.32 ± 11% -0.1 0.25 ± 16% -0.0 0.27 ± 11% -0.1 0.23 ± 15% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.30 ± 2% -0.1 0.24 -0.1 0.22 ± 2% -0.1 0.18 ± 2% perf-profile.self.cycles-pp.menu_select
0.33 ± 6% -0.1 0.27 ± 7% -0.0 0.28 ± 6% +0.1 0.41 ± 5% perf-profile.self.cycles-pp.flush_tlb_mm_range
0.46 -0.1 0.41 -0.0 0.42 ± 2% -0.1 0.38 ± 2% perf-profile.self.cycles-pp.native_flush_tlb_local
0.34 -0.1 0.29 ± 2% -0.1 0.28 ± 2% -0.0 0.29 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.32 -0.0 0.27 -0.0 0.28 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.18 ± 9% -0.0 0.13 ± 12% -0.0 0.16 ± 28% -0.0 0.13 ± 19% perf-profile.self.cycles-pp.__handle_mm_fault
0.22 ± 5% -0.0 0.18 ± 7% -0.0 0.20 ± 8% -0.1 0.17 ± 5% perf-profile.self.cycles-pp.tlb_gather_mmu
0.24 -0.0 0.20 ± 3% -0.0 0.20 ± 2% -0.1 0.18 ± 3% perf-profile.self.cycles-pp.sync_regs
0.12 ± 9% -0.0 0.08 ± 9% -0.0 0.09 ± 13% -0.0 0.08 ± 7% perf-profile.self.cycles-pp.flush_tlb_batched_pending
0.14 ± 4% -0.0 0.11 ± 9% -0.0 0.12 ± 8% -0.0 0.11 ± 6% perf-profile.self.cycles-pp.do_madvise
0.22 ± 2% -0.0 0.18 ± 2% -0.0 0.18 ± 3% -0.0 0.17 ± 3% perf-profile.self.cycles-pp.lru_gen_del_folio
0.21 ± 2% -0.0 0.17 ± 4% -0.0 0.17 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.13 ± 7% -0.0 0.10 ± 6% -0.0 0.11 ± 10% -0.0 0.10 ± 6% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.19 ± 4% -0.0 0.17 ± 6% -0.0 0.16 ± 7% -0.0 0.14 ± 6% perf-profile.self.cycles-pp.down_read_trylock
0.09 ± 5% -0.0 0.06 -0.0 0.06 ± 6% -0.0 0.06 ± 8% perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.20 ± 3% -0.0 0.17 -0.0 0.17 ± 2% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.native_sched_clock
0.13 ± 4% -0.0 0.10 ± 4% -0.0 0.11 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.15 ± 2% -0.0 0.13 ± 3% -0.0 0.13 ± 4% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.___perf_sw_event
0.11 ± 4% -0.0 0.08 ± 5% -0.0 0.08 ± 4% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.do_user_addr_fault
0.22 ± 2% -0.0 0.20 ± 5% -0.0 0.20 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.lru_gen_add_folio
0.19 ± 3% -0.0 0.17 ± 4% -0.0 0.17 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.folio_batch_move_lru
0.22 -0.0 0.20 ± 3% -0.0 0.20 ± 2% -0.0 0.18 ± 3% perf-profile.self.cycles-pp.folios_put_refs
0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.irqtime_account_irq
0.09 ± 5% -0.0 0.07 -0.0 0.07 ± 4% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.__madvise
0.12 ± 4% -0.0 0.10 -0.0 0.10 ± 4% -0.0 0.10 ± 5% perf-profile.self.cycles-pp.rwsem_mark_wake
0.20 ± 4% -0.0 0.19 ± 4% -0.0 0.18 ± 3% -0.1 0.15 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.09 ± 5% -0.0 0.07 ± 6% -0.0 0.07 ± 4% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.read_tsc
0.11 ± 8% -0.0 0.09 ± 19% -0.0 0.10 ± 16% -0.0 0.08 ± 9% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.09 -0.0 0.08 ± 5% -0.0 0.07 ± 5% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.clear_page_erms
0.07 -0.0 0.06 ± 6% -0.0 0.06 -0.0 0.05 perf-profile.self.cycles-pp.native_apic_mem_eoi
0.06 ± 7% -0.0 0.05 ± 9% -0.0 0.05 ± 29% -0.0 0.05 ± 35% perf-profile.self.cycles-pp.mm_cid_get
0.10 ± 6% -0.0 0.09 ± 4% -0.0 0.09 ± 5% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.sysvec_call_function
0.10 -0.0 0.09 -0.0 0.09 ± 4% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.asm_sysvec_call_function
0.06 -0.0 0.05 -0.0 0.05 ± 8% -0.0 0.05 ± 5% perf-profile.self.cycles-pp.handle_mm_fault
0.08 ± 4% -0.0 0.07 -0.0 0.07 ± 3% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.__switch_to_asm
0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.04 ± 50% perf-profile.self.cycles-pp.free_pages_and_swap_cache
0.07 -0.0 0.06 ± 7% -0.0 0.06 ± 7% -0.0 0.06 perf-profile.self.cycles-pp.sched_ttwu_pending
0.06 ± 6% -0.0 0.05 ± 9% -0.0 0.06 ± 31% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.ktime_get
0.00 +0.0 0.00 +0.0 0.00 +0.1 0.14 ± 4% perf-profile.self.cycles-pp.tick_nohz_irq_exit
0.09 ± 6% +0.0 0.09 ± 4% +0.0 0.09 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.sched_core_idle_cpu
0.05 ± 7% +0.0 0.07 ± 5% +0.0 0.07 ± 3% +0.0 0.06 perf-profile.self.cycles-pp.__switch_to
0.17 ± 3% +0.0 0.19 ± 2% +0.0 0.19 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.cpuidle_enter_state
2.10 +0.0 2.14 +0.1 2.17 -0.2 1.92 perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.06 +0.1 0.11 ± 3% +0.1 0.12 ± 4% +0.0 0.07 ± 6% perf-profile.self.cycles-pp.cpuidle_idle_call
0.02 ±141% +0.1 0.07 ± 5% +0.1 0.07 ± 6% +0.0 0.05 perf-profile.self.cycles-pp.do_idle
0.00 +0.1 0.06 ± 6% +0.1 0.06 ± 9% +0.0 0.00 perf-profile.self.cycles-pp.call_cpuidle
0.48 +0.1 0.56 ± 4% +0.1 0.57 ± 2% +0.1 0.54 perf-profile.self.cycles-pp._find_next_bit
0.00 +0.1 0.09 ± 5% +0.1 0.10 ± 6% +0.1 0.09 ± 9% perf-profile.self.cycles-pp.__bitmap_and
0.38 ± 3% +0.1 0.49 ± 2% +0.1 0.49 ± 2% +0.1 0.52 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.28 +0.1 0.39 ± 3% +0.1 0.38 ± 2% +0.0 0.32 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±223% +0.2 0.24 ± 3% +0.2 0.24 ± 3% +0.1 0.07 ± 6% perf-profile.self.cycles-pp.ct_kernel_exit_state
0.36 ± 4% +0.4 0.73 ± 2% +0.4 0.77 ± 3% +0.6 0.95 ± 3% perf-profile.self.cycles-pp.switch_mm_irqs_off
9.24 +1.0 10.25 +1.0 10.21 +0.8 9.99 perf-profile.self.cycles-pp.intel_idle
15.13 +1.2 16.34 +1.1 16.20 +2.0 17.11 perf-profile.self.cycles-pp.smp_call_function_many_cond
3.07 +9.5 12.52 +9.6 12.63 +16.1 19.20 perf-profile.self.cycles-pp.flush_tlb_func
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq-mt/vm-scalability/always/never
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
40036730a9 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 40036730a9566a8abe36ffe2bf4
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
368.00 ±114% +237.8% 1243 ± 32% +253.5% 1300 ± 22% +288.3% 1429 ± 24% perf-c2c.HITM.remote
3.016e+10 ± 5% +23.7% 3.732e+10 +23.3% 3.717e+10 +22.4% 3.69e+10 cpuidle..time
2394210 ± 5% +1598.8% 40671711 +1582.1% 40273563 +1598.2% 40659360 cpuidle..usage
280.01 ± 3% +24.8% 349.50 +24.2% 347.79 +23.4% 345.49 uptime.boot
27411 ± 3% +21.2% 33234 +20.4% 32994 +19.2% 32681 uptime.idle
73.97 -2.3% 72.29 -2.5% 72.14 -2.8% 71.93 iostat.cpu.idle
23.61 -7.6% 21.82 -7.4% 21.87 -7.0% 21.96 iostat.cpu.iowait
2.11 ± 2% +169.7% 5.68 ± 4% +174.3% 5.78 ± 2% +180.1% 5.90 iostat.cpu.system
0.26 ± 3% -0.0 0.23 ± 2% -0.0 0.23 ± 2% -0.0 0.23 mpstat.cpu.all.irq%
0.08 -0.0 0.04 ± 4% -0.0 0.04 ± 8% -0.0 0.03 ± 5% mpstat.cpu.all.soft%
1.78 ± 2% +3.7 5.45 ± 4% +3.8 5.55 ± 2% +3.9 5.66 mpstat.cpu.all.sys%
0.31 ± 7% -0.1 0.21 ± 5% -0.1 0.21 ± 4% -0.1 0.21 mpstat.cpu.all.usr%
16661751 ± 42% -59.8% 6694161 ± 70% -29.8% 11689134 ± 53% -43.7% 9376628 ± 67% numa-numastat.node0.numa_miss
16734663 ± 41% -59.5% 6770170 ± 69% -29.9% 11733904 ± 53% -43.5% 9461102 ± 67% numa-numastat.node0.other_node
26857269 ± 23% -60.2% 10694204 ± 50% -39.9% 16138920 ± 41% -48.3% 13893649 ± 47% numa-numastat.node1.local_node
16665351 ± 42% -59.8% 6694094 ± 70% -29.9% 11678532 ± 53% -43.7% 9378675 ± 67% numa-numastat.node1.numa_foreign
26918278 ± 23% -60.1% 10751098 ± 49% -39.8% 16205404 ± 41% -48.2% 13955632 ± 47% numa-numastat.node1.numa_hit
368.92 ± 36% -55.2% 165.39 ± 74% -25.3% 275.44 ± 51% -43.7% 207.84 ± 67% vmstat.io.bi
409795 -51.0% 200717 ± 3% -51.3% 199570 ± 7% -52.2% 195922 ± 5% vmstat.io.bo
4.14 ± 7% +100.0% 8.28 ± 6% +108.5% 8.63 ± 3% +113.2% 8.82 ± 2% vmstat.procs.r
359.98 ± 37% -56.0% 158.48 ± 77% -25.4% 268.51 ± 52% -44.2% 200.87 ± 70% vmstat.swap.si
409786 -51.0% 200710 ± 3% -51.3% 199563 ± 7% -52.2% 195915 ± 5% vmstat.swap.so
5382 -28.9% 3825 ± 2% -29.6% 3788 ± 3% -30.1% 3764 ± 2% vmstat.system.cs
339018 ± 2% -33.0% 227081 ± 3% -32.6% 228406 -31.1% 233426 vmstat.system.in
54162177 ± 11% -32.5% 36537092 ± 17% -21.5% 42515388 ± 26% -29.6% 38135635 ± 32% meminfo.Active
54162037 ± 11% -32.5% 36536947 ± 17% -21.5% 42515235 ± 26% -29.6% 38135282 ± 32% meminfo.Active(anon)
66576747 ± 9% +24.3% 82748036 ± 9% +16.2% 77343432 ± 14% +24.2% 82686648 ± 14% meminfo.Inactive
66575517 ± 9% +24.3% 82746881 ± 9% +16.2% 77342282 ± 14% +24.2% 82685451 ± 14% meminfo.Inactive(anon)
333831 -11.8% 294280 -11.6% 295003 ± 2% -11.3% 296156 meminfo.PageTables
33487 ± 3% +199.2% 100210 ± 26% +184.1% 95125 ± 12% +186.0% 95758 ± 4% meminfo.Shmem
1.627e+08 +11.4% 1.812e+08 +11.5% 1.814e+08 +11.7% 1.818e+08 meminfo.SwapFree
1644 ± 11% -41.9% 955.75 ± 10% -41.9% 954.97 ± 10% -44.0% 920.58 ± 8% meminfo.Writeback
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.5% 307.88 time.elapsed_time
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.5% 307.88 time.elapsed_time.max
6297 ± 5% +60.9% 10134 +63.0% 10265 +64.3% 10345 time.involuntary_context_switches
62687446 -24.8% 47163918 ± 2% -24.9% 47061196 ± 3% -25.7% 46587420 time.minor_page_faults
224.00 ± 3% +166.7% 597.33 ± 3% +170.7% 606.31 ± 2% +174.0% 613.70 ± 2% time.percent_of_cpu_this_job_got
474.22 ± 3% +276.7% 1786 ± 3% +282.9% 1815 ± 2% +288.1% 1840 ± 2% time.system_time
63.58 -17.2% 52.64 ± 3% -17.8% 52.24 ± 5% -20.1% 50.77 ± 2% time.user_time
347556 ± 6% -26.4% 255772 ± 5% -28.1% 249724 ± 5% -29.4% 245293 ± 4% time.voluntary_context_switches
155577 ± 2% -7.2% 144386 ± 3% -9.3% 141161 ± 2% -8.4% 142466 ± 2% numa-meminfo.node0.PageTables
12758 ± 8% +171.6% 34650 ± 68% +117.5% 27743 ± 38% +96.0% 25011 ± 12% numa-meminfo.node0.Shmem
31281633 ± 6% -37.6% 19515816 ± 25% -36.1% 19979648 ± 27% -33.2% 20896590 ± 34% numa-meminfo.node1.Active
31281573 ± 6% -37.6% 19515752 ± 25% -36.1% 19979597 ± 27% -33.2% 20896527 ± 34% numa-meminfo.node1.Active(anon)
28059820 ± 5% +40.6% 39461616 ± 15% +44.6% 40576407 ± 14% +42.1% 39867187 ± 18% numa-meminfo.node1.Inactive
28059215 ± 5% +40.6% 39461201 ± 15% +44.6% 40575754 ± 14% +42.1% 39867128 ± 18% numa-meminfo.node1.Inactive(anon)
178279 -16.5% 148899 ± 2% -14.1% 153180 ± 3% -14.1% 153217 ± 2% numa-meminfo.node1.PageTables
20873 ± 7% +215.2% 65784 ± 13% +225.2% 67888 ± 5% +240.4% 71058 ± 4% numa-meminfo.node1.Shmem
268530 ± 12% -14.6% 229358 ± 12% -19.3% 216639 ± 16% -24.1% 203946 ± 15% numa-meminfo.node1.Slab
32068 ± 55% -54.9% 14471 ± 69% -22.1% 24988 ± 58% -46.1% 17280 ± 75% numa-meminfo.node1.SwapCached
1296 ± 6% -49.8% 650.40 ± 9% -50.0% 647.65 ± 10% -51.7% 626.11 ± 7% numa-meminfo.node1.Writeback
38311 ± 5% -40.8% 22667 -41.1% 22583 ± 2% -41.7% 22338 vm-scalability.median
1234132 ± 4% -40.7% 732265 -40.8% 730989 ± 3% -41.3% 724324 vm-scalability.throughput
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.5% 307.88 vm-scalability.time.elapsed_time
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.5% 307.88 vm-scalability.time.elapsed_time.max
6297 ± 5% +60.9% 10134 +63.0% 10265 +64.3% 10345 vm-scalability.time.involuntary_context_switches
62687446 -24.8% 47163918 ± 2% -24.9% 47061196 ± 3% -25.7% 46587420 vm-scalability.time.minor_page_faults
224.00 ± 3% +166.7% 597.33 ± 3% +170.7% 606.31 ± 2% +174.0% 613.70 ± 2% vm-scalability.time.percent_of_cpu_this_job_got
474.22 ± 3% +276.7% 1786 ± 3% +282.9% 1815 ± 2% +288.1% 1840 ± 2% vm-scalability.time.system_time
63.58 -17.2% 52.64 ± 3% -17.8% 52.24 ± 5% -20.1% 50.77 ± 2% vm-scalability.time.user_time
347556 ± 6% -26.4% 255772 ± 5% -28.1% 249724 ± 5% -29.4% 245293 ± 4% vm-scalability.time.voluntary_context_switches
2.821e+08 -22.0% 2.2e+08 -22.2% 2.196e+08 ± 2% -22.8% 2.177e+08 vm-scalability.workload
38829 ± 2% -7.3% 35978 ± 3% -9.5% 35150 ± 3% -8.5% 35527 ± 2% numa-vmstat.node0.nr_page_table_pages
3197 ± 8% +172.3% 8706 ± 67% +117.4% 6953 ± 37% +97.2% 6307 ± 12% numa-vmstat.node0.nr_shmem
4670410 ± 6% -22.0% 3641951 ± 8% -27.6% 3382558 ± 4% -29.7% 3285604 ± 7% numa-vmstat.node0.nr_vmscan_write
9127776 ± 5% -21.5% 7169554 ± 5% -25.3% 6819419 ± 4% -26.5% 6711641 ± 6% numa-vmstat.node0.nr_written
16661751 ± 42% -59.8% 6694161 ± 70% -29.8% 11689134 ± 53% -43.7% 9376628 ± 67% numa-vmstat.node0.numa_miss
16734663 ± 41% -59.5% 6770170 ± 69% -29.9% 11733904 ± 53% -43.5% 9461102 ± 67% numa-vmstat.node0.numa_other
7829198 ± 6% -36.9% 4941489 ± 24% -35.9% 5014994 ± 27% -32.8% 5264597 ± 33% numa-vmstat.node1.nr_active_anon
718935 ± 14% +53.9% 1106567 ± 31% +32.7% 954186 ± 37% +15.5% 830207 ± 15% numa-vmstat.node1.nr_free_pages
6977014 ± 5% +39.1% 9704494 ± 15% +44.1% 10053122 ± 14% +41.4% 9863333 ± 18% numa-vmstat.node1.nr_inactive_anon
44508 ± 2% -16.6% 37117 ± 2% -14.3% 38152 ± 3% -14.1% 38219 ± 2% numa-vmstat.node1.nr_page_table_pages
5222 ± 8% +218.1% 16612 ± 13% +225.6% 17003 ± 5% +242.6% 17889 ± 4% numa-vmstat.node1.nr_shmem
8026 ± 55% -55.0% 3611 ± 69% -22.4% 6228 ± 58% -46.0% 4336 ± 74% numa-vmstat.node1.nr_swapcached
8007802 ± 6% -47.6% 4196794 ± 7% -46.6% 4278458 ± 8% -48.7% 4108232 ± 6% numa-vmstat.node1.nr_vmscan_write
352.06 ± 7% -50.7% 173.73 ± 16% -54.4% 160.39 ± 16% -55.8% 155.72 ± 9% numa-vmstat.node1.nr_writeback
15775556 ± 6% -45.5% 8590752 ± 5% -44.3% 8782043 ± 9% -46.0% 8516460 ± 5% numa-vmstat.node1.nr_written
7829176 ± 6% -36.9% 4941484 ± 24% -35.9% 5014989 ± 27% -32.8% 5264593 ± 33% numa-vmstat.node1.nr_zone_active_anon
6977031 ± 5% +39.1% 9704497 ± 15% +44.1% 10053126 ± 14% +41.4% 9863337 ± 18% numa-vmstat.node1.nr_zone_inactive_anon
346.80 ± 7% -49.9% 173.73 ± 16% -53.9% 159.98 ± 16% -55.1% 155.79 ± 9% numa-vmstat.node1.nr_zone_write_pending
16665351 ± 42% -59.8% 6694094 ± 70% -29.9% 11678532 ± 53% -43.7% 9378675 ± 67% numa-vmstat.node1.numa_foreign
26917054 ± 23% -60.1% 10749940 ± 49% -39.8% 16204771 ± 41% -48.2% 13954353 ± 47% numa-vmstat.node1.numa_hit
26856045 ± 23% -60.2% 10693045 ± 50% -39.9% 16138287 ± 41% -48.3% 13892370 ± 47% numa-vmstat.node1.numa_local
23.52 ± 13% -18.5% 19.16 ± 24% -16.4% 19.66 ± 18% -28.6% 16.78 ± 15% sched_debug.cfs_rq:/.load_avg.avg
427.78 ± 20% +93.8% 828.91 ± 64% +36.8% 585.39 ± 15% +33.6% 571.37 ± 17% sched_debug.cfs_rq:/.load_avg.max
13.46 ± 23% -63.0% 4.98 ± 47% -50.7% 6.64 ± 46% -67.7% 4.35 ± 59% sched_debug.cfs_rq:/.removed.load_avg.avg
59.42 ± 17% -48.6% 30.57 ± 34% -42.0% 34.44 ± 23% -50.7% 29.27 ± 57% sched_debug.cfs_rq:/.removed.load_avg.stddev
152.11 ± 26% -36.2% 96.98 ± 50% -35.2% 98.50 ± 14% -29.9% 106.61 ± 73% sched_debug.cfs_rq:/.removed.runnable_avg.max
152.11 ± 26% -36.3% 96.87 ± 50% -35.3% 98.34 ± 15% -29.9% 106.61 ± 73% sched_debug.cfs_rq:/.removed.util_avg.max
81.74 ± 14% +11.4% 91.02 ± 5% +24.1% 101.46 ± 8% +21.6% 99.42 ± 6% sched_debug.cfs_rq:/.runnable_avg.avg
114.05 ± 16% +23.9% 141.35 ± 2% +31.5% 150.02 ± 5% +30.6% 149.00 ± 5% sched_debug.cfs_rq:/.runnable_avg.stddev
81.31 ± 14% +11.5% 90.70 ± 5% +24.2% 100.99 ± 8% +21.8% 99.07 ± 6% sched_debug.cfs_rq:/.util_avg.avg
113.70 ± 16% +23.9% 140.92 ± 2% +31.5% 149.52 ± 5% +30.7% 148.55 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
10.59 ± 25% +131.9% 24.56 ± 10% +142.6% 25.70 ± 29% +125.1% 23.85 ± 18% sched_debug.cfs_rq:/.util_est.avg
49.96 ± 28% +71.7% 85.76 ± 5% +77.0% 88.41 ± 16% +74.7% 87.25 ± 9% sched_debug.cfs_rq:/.util_est.stddev
130266 ± 19% +38.9% 180973 ± 8% +28.9% 167881 ± 7% +29.2% 168240 ± 7% sched_debug.cpu.clock.avg
130457 ± 19% +38.9% 181208 ± 8% +28.9% 168141 ± 7% +29.1% 168445 ± 7% sched_debug.cpu.clock.max
130028 ± 19% +38.9% 180638 ± 8% +28.8% 167497 ± 7% +29.2% 167949 ± 7% sched_debug.cpu.clock.min
129816 ± 19% +39.0% 180459 ± 8% +29.0% 167404 ± 7% +29.2% 167757 ± 7% sched_debug.cpu.clock_task.avg
130389 ± 19% +38.9% 181122 ± 8% +28.9% 168056 ± 7% +29.1% 168374 ± 7% sched_debug.cpu.clock_task.max
121562 ± 20% +40.5% 170799 ± 8% +29.8% 157738 ± 8% +30.2% 158264 ± 8% sched_debug.cpu.clock_task.min
573.18 ± 25% +47.9% 847.53 ± 8% +28.1% 734.25 ± 11% +39.9% 801.78 ± 12% sched_debug.cpu.nr_switches.min
4.07 ± 14% +43.8% 5.86 ± 5% +41.2% 5.75 ± 10% +39.6% 5.68 ± 12% sched_debug.cpu.nr_uninterruptible.stddev
130026 ± 19% +38.9% 180621 ± 8% +28.8% 167481 ± 7% +29.2% 167941 ± 7% sched_debug.cpu_clk
129318 ± 19% +39.1% 179912 ± 8% +29.0% 166774 ± 7% +29.3% 167233 ± 8% sched_debug.ktime
130797 ± 19% +38.7% 181392 ± 8% +28.6% 168261 ± 7% +29.0% 168713 ± 7% sched_debug.sched_clk
191035 ± 7% -29.3% 135009 ± 4% -29.8% 134046 ± 8% -31.5% 130777 ± 5% proc-vmstat.allocstall_movable
3850 ± 11% +78.5% 6872 ± 12% +69.7% 6532 ± 10% +79.2% 6898 ± 10% proc-vmstat.allocstall_normal
13525554 ± 10% -32.5% 9125751 ± 17% -21.4% 10625542 ± 26% -29.1% 9588869 ± 32% proc-vmstat.nr_active_anon
16631565 ± 8% +23.8% 20588362 ± 9% +15.8% 19252926 ± 14% +23.7% 20573150 ± 14% proc-vmstat.nr_inactive_anon
83457 -12.1% 73319 -11.8% 73637 ± 2% -11.3% 74009 proc-vmstat.nr_page_table_pages
8392 ± 3% +198.4% 25047 ± 26% +184.6% 23884 ± 12% +186.8% 24069 ± 4% proc-vmstat.nr_shmem
79380 -0.4% 79057 -0.3% 79108 -5.2% 75241 proc-vmstat.nr_slab_unreclaimable
12629057 ± 5% -39.1% 7691618 ± 5% -39.8% 7607004 ± 5% -41.8% 7348143 ± 5% proc-vmstat.nr_vmscan_write
440.92 ± 10% -42.0% 255.71 ± 15% -46.9% 234.18 ± 13% -47.8% 230.18 ± 6% proc-vmstat.nr_writeback
24903332 ± 5% -36.7% 15760306 ± 4% -37.4% 15601462 ± 7% -38.9% 15228101 ± 5% proc-vmstat.nr_written
13525564 ± 10% -32.5% 9125755 ± 17% -21.4% 10625546 ± 26% -29.1% 9588876 ± 32% proc-vmstat.nr_zone_active_anon
16631569 ± 8% +23.8% 20588365 ± 9% +15.8% 19252929 ± 14% +23.7% 20573151 ± 14% proc-vmstat.nr_zone_inactive_anon
443.01 ± 10% -42.0% 257.00 ± 16% -47.0% 234.79 ± 13% -47.4% 232.82 ± 6% proc-vmstat.nr_zone_write_pending
24485570 ± 3% -15.4% 20714438 ± 3% -15.2% 20753649 ± 3% -16.7% 20385384 ± 2% proc-vmstat.numa_foreign
39260606 ± 2% -30.1% 27457969 ± 4% -30.4% 27338587 ± 6% -30.4% 27320376 ± 3% proc-vmstat.numa_hit
39098081 ± 2% -30.1% 27325222 ± 4% -30.4% 27205934 ± 6% -30.6% 27133908 ± 3% proc-vmstat.numa_local
24482446 ± 3% -15.5% 20696329 ± 3% -15.2% 20764313 ± 3% -16.8% 20366343 ± 2% proc-vmstat.numa_miss
24643161 ± 3% -15.5% 20828939 ± 3% -15.2% 20886221 ± 3% -16.7% 20515986 ± 2% proc-vmstat.numa_other
7478080 ± 19% +140.2% 17959948 ± 8% +149.2% 18637853 ± 14% +134.4% 17526799 ± 11% proc-vmstat.numa_pte_updates
63140512 -24.7% 47553512 -24.7% 47523846 ± 3% -25.7% 46943355 ± 2% proc-vmstat.pgalloc_normal
63461017 -24.5% 47896127 ± 2% -24.7% 47801824 ± 3% -25.4% 47311970 proc-vmstat.pgfault
64134373 -24.6% 48331932 ± 2% -25.1% 48010799 ± 3% -25.7% 47630853 ± 2% proc-vmstat.pgfree
2796 ± 78% -70.9% 815.00 ± 50% -57.0% 1202 ± 64% -54.2% 1279 ±134% proc-vmstat.pgmigrate_fail
99615377 ± 5% -36.7% 63043276 ± 4% -37.4% 62407899 ± 7% -38.9% 60914455 ± 5% proc-vmstat.pgpgout
34932 ± 3% -7.8% 32198 ± 2% -8.1% 32104 ± 3% -8.6% 31924 ± 2% proc-vmstat.pgreuse
21507042 ± 5% -36.0% 13775181 ± 4% -36.7% 13623024 ± 7% -38.2% 13284109 ± 6% proc-vmstat.pgrotated
58427243 ± 10% -43.5% 32993860 ± 12% -39.1% 35582889 ± 16% -43.3% 33121601 ± 20% proc-vmstat.pgscan_anon
44324880 ± 10% -37.2% 27839440 ± 10% -34.2% 29186311 ± 14% -37.2% 27856917 ± 18% proc-vmstat.pgscan_direct
14102763 ± 23% -63.4% 5154838 ± 27% -54.6% 6396957 ± 40% -62.7% 5264972 ± 31% proc-vmstat.pgscan_kswapd
2666 ± 88% -90.7% 248.33 ±137% -78.1% 583.38 ± 97% -79.3% 551.70 ±123% proc-vmstat.pgskip_normal
24911061 ± 5% -36.7% 15767491 ± 4% -37.3% 15611299 ± 7% -38.8% 15235326 ± 5% proc-vmstat.pgsteal_anon
17074863 ± 8% -25.3% 12754191 ± 5% -26.2% 12608140 ± 7% -27.7% 12345693 ± 5% proc-vmstat.pgsteal_direct
7836517 ± 8% -61.5% 3013661 ± 7% -61.7% 3003472 ± 8% -63.1% 2889863 ± 8% proc-vmstat.pgsteal_kswapd
24903332 ± 5% -36.7% 15760306 ± 4% -37.4% 15601462 ± 7% -38.9% 15228101 ± 5% proc-vmstat.pswpout
703925 ± 38% -1.8% 690947 ±100% -3.0% 682699 ± 90% -49.2% 357704 ± 84% proc-vmstat.slabs_scanned
78185 ± 27% -82.8% 13463 ± 52% -74.5% 19910 ± 68% -80.9% 14968 ±114% proc-vmstat.workingset_nodereclaim
1.85 ± 4% -31.7% 1.26 -32.9% 1.24 ± 2% -32.8% 1.24 ± 2% perf-stat.i.MPKI
1.992e+09 ± 3% -18.9% 1.615e+09 ± 2% -17.6% 1.641e+09 ± 2% -17.0% 1.652e+09 perf-stat.i.branch-instructions
0.93 ± 6% +0.6 1.55 ± 3% +0.6 1.55 ± 3% +0.6 1.53 ± 3% perf-stat.i.branch-miss-rate%
14377927 ± 11% +29.7% 18645141 ± 5% +33.1% 19132687 +34.9% 19401230 perf-stat.i.branch-misses
13.97 ± 3% -9.0 4.95 -9.0 4.98 ± 2% -9.0 4.96 ± 3% perf-stat.i.cache-miss-rate%
15782867 ± 3% -34.3% 10364434 ± 2% -33.6% 10475081 ± 2% -33.7% 10462489 perf-stat.i.cache-misses
79049148 +92.6% 1.522e+08 ± 2% +93.5% 1.53e+08 +94.7% 1.539e+08 perf-stat.i.cache-references
5344 -29.2% 3783 ± 2% -29.8% 3752 ± 3% -30.2% 3727 ± 2% perf-stat.i.context-switches
1.31 ± 2% +316.3% 5.46 ± 3% +317.0% 5.47 ± 3% +325.9% 5.58 ± 2% perf-stat.i.cpi
8.392e+09 ± 3% +197.0% 2.492e+10 ± 3% +201.9% 2.534e+10 ± 2% +209.1% 2.594e+10 perf-stat.i.cpu-cycles
150.26 +14.1% 171.44 ± 3% +15.3% 173.26 ± 4% +15.2% 173.07 ± 4% perf-stat.i.cpu-migrations
737.89 ± 5% +500.7% 4432 ± 4% +514.1% 4531 ± 3% +524.0% 4604 ± 3% perf-stat.i.cycles-between-cache-misses
7.732e+09 ± 3% -17.2% 6.405e+09 ± 2% -15.9% 6.502e+09 ± 2% -15.3% 6.547e+09 perf-stat.i.instructions
0.80 -69.8% 0.24 ± 5% -69.9% 0.24 ± 3% -70.9% 0.23 ± 2% perf-stat.i.ipc
23.75 ± 27% -52.9% 11.19 ± 69% -23.7% 18.12 ± 42% -40.8% 14.07 ± 59% perf-stat.i.major-faults
2.55 ± 8% -38.4% 1.57 ± 4% -36.8% 1.61 ± 2% -36.8% 1.61 perf-stat.i.metric.K/sec
265295 ± 5% -42.5% 152670 ± 2% -41.6% 155041 ± 3% -41.9% 154115 perf-stat.i.minor-faults
265319 ± 5% -42.5% 152681 ± 2% -41.6% 155059 ± 3% -41.9% 154129 perf-stat.i.page-faults
2.04 ± 2% -20.6% 1.62 ± 2% -21.2% 1.61 -21.8% 1.60 perf-stat.overall.MPKI
0.72 ± 12% +0.4 1.15 ± 4% +0.4 1.17 ± 2% +0.5 1.18 perf-stat.overall.branch-miss-rate%
19.95 ± 2% -13.1 6.84 -13.1 6.82 ± 2% -13.2 6.77 ± 2% perf-stat.overall.cache-miss-rate%
1.09 ± 2% +257.6% 3.88 ± 3% +260.0% 3.91 ± 3% +265.6% 3.97 ± 2% perf-stat.overall.cpi
532.42 ± 2% +350.1% 2396 ± 4% +356.9% 2432 ± 3% +367.5% 2488 ± 2% perf-stat.overall.cycles-between-cache-misses
0.92 -72.0% 0.26 ± 4% -72.2% 0.26 ± 3% -72.6% 0.25 ± 2% perf-stat.overall.ipc
6551 ± 2% +38.5% 9072 +39.5% 9138 +40.9% 9230 perf-stat.overall.path-length
1.982e+09 ± 3% -18.5% 1.616e+09 -17.8% 1.629e+09 ± 2% -17.1% 1.642e+09 perf-stat.ps.branch-instructions
14325844 ± 11% +29.7% 18584702 ± 5% +33.0% 19054101 +34.9% 19329679 perf-stat.ps.branch-misses
15697779 ± 3% -33.9% 10379452 ± 2% -33.8% 10385651 ± 2% -33.8% 10384782 perf-stat.ps.cache-misses
78678984 +93.0% 1.518e+08 ± 2% +93.7% 1.524e+08 +94.9% 1.533e+08 perf-stat.ps.cache-references
5321 -29.1% 3771 ± 2% -29.7% 3740 ± 3% -30.2% 3714 ± 2% perf-stat.ps.context-switches
8.355e+09 ± 3% +197.6% 2.487e+10 ± 3% +202.1% 2.524e+10 ± 2% +209.2% 2.584e+10 perf-stat.ps.cpu-cycles
149.59 +14.2% 170.85 ± 3% +15.4% 172.66 ± 4% +15.3% 172.42 ± 4% perf-stat.ps.cpu-migrations
7.693e+09 ± 3% -16.8% 6.404e+09 -16.0% 6.459e+09 ± 2% -15.4% 6.508e+09 perf-stat.ps.instructions
23.73 ± 27% -52.9% 11.18 ± 69% -23.5% 18.15 ± 43% -40.4% 14.15 ± 59% perf-stat.ps.major-faults
263785 ± 5% -41.9% 153177 -41.8% 153437 ± 3% -42.1% 152747 perf-stat.ps.minor-faults
263809 ± 5% -41.9% 153188 -41.8% 153455 ± 3% -42.1% 152761 perf-stat.ps.page-faults
1.848e+12 ± 2% +8.0% 1.995e+12 +8.5% 2.006e+12 +8.7% 2.009e+12 perf-stat.total.instructions
0.09 ± 3% +316.6% 0.37 ±135% +154.1% 0.22 ±143% +168.1% 0.24 ±113% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.04 ± 15% -34.4% 0.02 ± 16% -38.3% 0.02 ± 44% -45.9% 0.02 ± 17% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.05 ± 17% +600.7% 0.34 ±172% +199.1% 0.15 ±192% +46.9% 0.07 ± 19% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.01 ± 26% +2198.6% 0.27 ±152% +15606.1% 1.83 ±182% +796.6% 0.10 ± 13% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.06 ± 8% +41.6% 0.09 ± 17% +34.7% 0.09 ± 16% +73.4% 0.11 ± 24% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.07 ± 8% +36.7% 0.10 ± 15% +16.9% 0.08 ± 12% +28.5% 0.09 ± 9% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1.18 ± 45% +14663.2% 173.84 ±219% +5633.0% 67.51 ±366% +9341.9% 111.18 ±275% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.16 ± 7% +81.8% 0.29 ± 22% +7048.0% 11.56 ±377% +3327.9% 5.54 ±286% perf-sched.sch_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
0.13 ± 13% -55.3% 0.06 ± 83% -55.6% 0.06 ±104% -63.2% 0.05 ±124% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.folio_alloc_swap.add_to_swap.shrink_folio_list
0.18 ± 11% +8754.6% 15.60 ±219% +3486.4% 6.32 ±370% +51.3% 0.27 ± 10% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.35 ±107% +2644.1% 256.70 ±154% +1144.2% 116.39 ±206% +201.1% 28.16 ± 98% perf-sched.sch_delay.max.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
0.15 ± 25% +100.0% 0.31 ± 52% +78.6% 0.27 ± 25% +250.8% 0.54 ± 98% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.17 ± 10% +74.6% 0.30 ± 15% +48.4% 0.26 ± 16% +56.9% 0.27 ± 11% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.16 ± 12% +391.4% 0.80 ±148% +11691.7% 19.20 ±272% +3960.3% 6.61 ±162% perf-sched.sch_delay.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
0.11 ± 94% +1386.1% 1.62 ± 66% +2315.9% 2.63 ±235% +998.6% 1.20 ± 66% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.15 ± 18% +45.5% 0.22 ± 13% +42.4% 0.22 ± 19% +48.2% 0.23 ± 17% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
87.69 ± 2% +57.5% 138.09 ± 4% +64.0% 143.78 ± 6% +69.6% 148.70 ± 5% perf-sched.total_wait_and_delay.average.ms
87.52 ± 2% +57.6% 137.91 ± 4% +64.1% 143.59 ± 6% +69.8% 148.60 ± 5% perf-sched.total_wait_time.average.ms
5.16 ± 8% +20.1% 6.20 ± 15% +9.7% 5.66 ± 14% +17.1% 6.04 ± 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
7.23 ±142% +493.8% 42.93 ± 11% +581.8% 49.29 ± 12% +582.5% 49.34 ± 10% perf-sched.wait_and_delay.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
89.03 ± 56% -97.9% 1.83 ±152% -89.9% 9.00 ±166% -92.9% 6.31 ± 87% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
21.44 ± 3% +88.0% 40.32 ± 7% +102.6% 43.43 ± 7% +101.0% 43.09 ± 6% perf-sched.wait_and_delay.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
383.35 ± 3% +8.2% 414.60 ± 3% +6.6% 408.50 ± 3% +8.8% 417.24 ± 3% perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
40.29 ± 34% +602.5% 283.08 ± 58% +804.7% 364.55 ± 27% +863.0% 388.03 ± 6% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
4.06 -100.0% 0.00 -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
338.75 ± 23% -65.9% 115.54 ± 72% -59.4% 137.63 ± 81% -51.9% 162.94 ± 66% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
20.91 ± 4% +64.7% 34.43 ± 6% +80.9% 37.82 ± 8% +82.4% 38.12 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
5.97 ± 8% -27.5% 4.33 -26.1% 4.42 ± 3% -26.0% 4.42 ± 3% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
527.31 ± 2% +20.9% 637.26 ± 10% +18.4% 624.19 ± 5% +21.8% 642.29 ± 7% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
159.81 ± 2% +78.2% 284.75 ± 9% +76.7% 282.46 ± 11% +83.7% 293.61 ± 10% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
640.33 ± 11% +33.5% 854.67 ± 16% +25.1% 800.88 ± 17% +44.2% 923.10 ± 19% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
26.83 ±141% +777.6% 235.50 ± 20% +724.8% 221.31 ± 21% +829.4% 249.40 ± 17% perf-sched.wait_and_delay.count.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
5.00 +43.3% 7.17 ± 12% +33.8% 6.69 ± 15% +48.0% 7.40 ± 19% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
7206 ± 4% -28.5% 5149 ± 12% -40.5% 4290 ± 14% -30.5% 5010 ± 23% perf-sched.wait_and_delay.count.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
8.67 ± 10% +38.5% 12.00 ± 16% +29.8% 11.25 ± 18% +45.4% 12.60 ± 21% perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
1392 ± 25% -57.1% 596.67 ±113% -82.4% 245.19 ±158% -88.2% 164.00 ± 26% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
160.17 ± 10% -100.0% 0.00 -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
112.17 ± 33% +279.8% 426.00 ± 13% +241.2% 382.69 ± 26% +270.8% 415.90 ± 38% perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
639.00 ± 11% +120.6% 1409 ± 18% +97.7% 1263 ± 16% +128.2% 1458 ± 22% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
15.52 ±141% +4400.0% 698.48 ± 63% +5577.1% 881.17 ± 46% +5774.3% 911.78 ± 31% perf-sched.wait_and_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
3425 ± 44% -99.4% 22.13 ±141% -92.9% 243.04 ±282% -98.5% 50.38 ± 86% perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
1212 ± 4% +81.2% 2197 ± 12% +100.1% 2426 ± 23% +142.1% 2935 ± 37% perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
6.49 ± 46% -100.0% 0.00 -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
499.99 +0.0% 500.07 +50.7% 753.50 ±109% +255.1% 1775 ± 91% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
59.14 ± 24% +280.8% 225.17 ±153% +178.2% 164.54 ±133% +209.8% 183.22 ±149% perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
81.27 ± 26% -60.3% 32.25 ± 60% -6.6% 75.94 ± 87% -12.4% 71.19 ± 59% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3448 ± 12% +48.4% 5119 ± 23% +31.3% 4528 ± 22% +49.6% 5159 ± 25% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
5.07 ± 8% +15.0% 5.83 ± 9% +7.2% 5.44 ± 10% +14.4% 5.80 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
21.71 ± 7% +97.3% 42.83 ± 11% +126.1% 49.07 ± 12% +126.7% 49.21 ± 10% perf-sched.wait_time.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
23.06 ± 17% +47.2% 33.94 ± 51% +96.9% 45.41 ± 33% +66.4% 38.36 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.rmap_walk_anon.try_to_unmap.shrink_folio_list.evict_folios
6.41 ± 96% +426.5% 33.77 ± 20% +449.5% 35.25 ± 21% +362.3% 29.65 ± 30% perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
9.59 ± 52% +792.6% 85.58 ± 27% +958.8% 101.51 ± 22% +863.2% 92.35 ± 33% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
88.96 ± 56% -90.5% 8.44 ± 53% -80.2% 17.61 ±113% -86.9% 11.64 ± 43% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
21.33 ± 3% +88.1% 40.12 ± 6% +102.3% 43.16 ± 7% +101.4% 42.97 ± 6% perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
383.33 ± 3% +8.2% 414.58 ± 3% +6.6% 408.48 ± 3% +8.8% 417.21 ± 3% perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
40.28 ± 34% +602.1% 282.81 ± 59% +800.4% 362.72 ± 27% +863.0% 387.93 ± 6% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
3.97 -16.8% 3.30 ± 3% -12.8% 3.46 ± 10% -16.6% 3.31 ± 8% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
338.69 ± 23% -54.8% 153.25 ± 23% -47.4% 178.01 ± 38% -41.6% 197.80 ± 28% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
22.02 ± 23% +462.9% 123.94 ± 26% +370.4% 103.58 ± 20% +435.5% 117.90 ± 25% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
20.81 ± 4% +65.0% 34.33 ± 6% +80.8% 37.63 ± 9% +82.6% 38.01 ± 7% perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
5.87 ± 8% -28.1% 4.22 -26.8% 4.29 ± 3% -26.9% 4.29 ± 3% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
527.30 ± 2% +20.9% 637.25 ± 10% +18.4% 624.18 ± 5% +21.8% 642.29 ± 7% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
159.22 ± 2% +78.8% 284.72 ± 9% +77.4% 282.42 ± 11% +84.4% 293.57 ± 10% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
42.83 ± 9% +1530.6% 698.38 ± 63% +1957.2% 881.10 ± 46% +2028.8% 911.74 ± 31% perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
12.83 ± 82% +344.6% 57.05 ± 10% +364.9% 59.65 ± 14% +357.0% 58.63 ± 25% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
124.59 ± 77% +333.7% 540.35 ± 31% +480.6% 723.45 ± 41% +469.8% 709.94 ± 50% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
1212 ± 4% +81.2% 2197 ± 12% +100.1% 2426 ± 23% +142.1% 2935 ± 37% perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
499.95 +0.0% 500.02 +50.7% 753.45 ±109% +255.1% 1775 ± 91% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
220.10 ± 74% +283.8% 844.67 ± 11% +239.2% 746.66 ± 33% +278.3% 832.59 ± 42% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
59.03 ± 24% +281.3% 225.09 ±154% +163.3% 155.44 ±142% +210.2% 183.14 ±149% perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
81.17 ± 26% -60.4% 32.15 ± 60% -22.6% 62.82 ± 78% -19.2% 65.58 ± 62% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3385 ± 10% +51.2% 5119 ± 23% +33.7% 4528 ± 22% +52.4% 5159 ± 25% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
79.77 -10.1 69.65 -10.8 69.01 ± 3% -9.9 69.88 ± 2% perf-profile.calltrace.cycles-pp.do_access
77.33 -7.8 69.51 -8.4 68.90 ± 3% -7.6 69.76 ± 3% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
7.43 ± 2% -6.8 0.66 ± 13% -6.8 0.66 ± 6% -6.8 0.65 ± 4% perf-profile.calltrace.cycles-pp.add_to_swap.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one
6.76 ± 5% -5.8 0.95 ± 5% -5.8 0.97 ± 4% -5.9 0.87 ± 4% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
6.24 ± 2% -5.7 0.58 ± 12% -5.7 0.58 ± 7% -5.7 0.58 ± 4% perf-profile.calltrace.cycles-pp.folio_alloc_swap.add_to_swap.shrink_folio_list.evict_folios.try_to_shrink_lruvec
5.73 ± 4% -5.6 0.17 ±141% -5.5 0.23 ±113% -5.4 0.37 ± 65% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
74.64 -5.4 69.25 -6.0 68.69 ± 3% -5.1 69.54 ± 3% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
74.54 -5.3 69.25 -5.9 68.69 ± 3% -5.0 69.53 ± 3% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
5.79 ± 3% -5.2 0.55 ± 11% -5.2 0.56 ± 7% -5.2 0.56 ± 5% perf-profile.calltrace.cycles-pp.__mem_cgroup_try_charge_swap.folio_alloc_swap.add_to_swap.shrink_folio_list.evict_folios
5.51 ± 4% -5.1 0.37 ± 72% -5.4 0.10 ±208% -5.5 0.00 perf-profile.calltrace.cycles-pp.do_rw_once
73.45 -4.3 69.17 -4.8 68.63 ± 3% -4.0 69.47 ± 3% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.92 ± 2% -3.5 0.44 ± 44% -3.7 0.26 ±100% -3.6 0.31 ± 81% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
72.77 -2.9 69.91 -3.2 69.54 ± 3% -2.4 70.38 ± 2% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
74.31 -1.9 72.37 -1.6 72.73 -1.1 73.19 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +0.5 0.46 ± 72% +0.8 0.82 ± 23% +0.7 0.71 ± 44% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
0.00 +0.5 0.46 ± 72% +0.8 0.82 ± 23% +0.7 0.71 ± 44% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
0.00 +0.6 0.59 ± 7% +0.6 0.57 ± 5% +0.5 0.51 ± 33% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
0.00 +0.7 0.66 ± 5% +0.6 0.63 ± 6% +0.6 0.62 ± 4% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state
0.00 +0.7 0.69 ± 5% +0.7 0.66 ± 5% +0.6 0.65 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 23% +0.8 0.78 ± 42% perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.get_user_pages_remote.get_arg_page.copy_string_kernel
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.78 ± 42% perf-profile.calltrace.cycles-pp.__get_user_pages.get_user_pages_remote.get_arg_page.copy_string_kernel.do_execveat_common
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.78 ± 42% perf-profile.calltrace.cycles-pp.copy_string_kernel.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.78 ± 42% perf-profile.calltrace.cycles-pp.get_arg_page.copy_string_kernel.do_execveat_common.__x64_sys_execve.do_syscall_64
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.78 ± 42% perf-profile.calltrace.cycles-pp.get_user_pages_remote.get_arg_page.copy_string_kernel.do_execveat_common.__x64_sys_execve
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 20% +0.8 0.81 ± 43% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 20% +0.8 0.81 ± 43% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 20% +0.8 0.81 ± 43% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 19% +0.8 0.81 ± 43% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.9 0.95 ± 19% +1.3 1.26 ± 17% +1.1 1.11 ± 19% perf-profile.calltrace.cycles-pp._Fork
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +0.9 0.98 ± 17% perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +0.9 0.98 ± 17% perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +0.9 0.98 ± 17% perf-profile.calltrace.cycles-pp.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault.handle_mm_fault
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +0.9 0.98 ± 17% perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.09 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +0.9 0.98 ± 17% perf-profile.calltrace.cycles-pp.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00 +1.3 1.26 ± 7% +1.2 1.21 ± 4% +1.2 1.18 ± 5% perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00 +1.4 1.36 ± 30% +1.2 1.18 ± 35% +1.2 1.17 ± 32% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__get_user_pages.get_user_pages_remote.get_arg_page
0.00 +1.6 1.64 ± 6% +1.5 1.55 ± 5% +1.5 1.51 ± 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.21 ± 46% +2.0 3.22 ± 29% +2.7 3.94 ± 34% +2.5 3.75 ± 37% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.20 ± 46% +2.0 3.22 ± 29% +2.7 3.94 ± 34% +2.6 3.75 ± 37% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
1.20 ± 46% +2.0 3.22 ± 29% +2.7 3.94 ± 34% +2.6 3.75 ± 37% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.18 ± 47% +2.0 3.22 ± 29% +2.8 3.93 ± 34% +2.6 3.75 ± 37% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +2.1 2.09 ± 7% +2.0 1.97 ± 5% +1.9 1.92 ± 4% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.3 2.58 ± 17% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.3 2.58 ± 17% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.3 2.58 ± 17% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.3 2.58 ± 17% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.4 2.65 ± 13% +2.3 2.58 ± 17% perf-profile.calltrace.cycles-pp.execve
67.34 +2.3 69.67 +1.8 69.15 ± 2% +2.5 69.79 ± 2% perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
67.27 +2.4 69.67 +1.9 69.14 ± 2% +2.5 69.78 ± 2% perf-profile.calltrace.cycles-pp.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
67.22 +2.4 69.66 +1.9 69.13 ± 2% +2.6 69.77 ± 2% perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
67.12 +2.5 69.64 +2.0 69.13 ± 2% +2.6 69.77 ± 2% perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.1 8.88 ± 3% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.1 8.88 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.1 8.88 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
2.21 ± 6% +3.2 5.36 ± 11% +2.9 5.07 ± 6% +2.8 4.99 ± 4% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.calltrace.cycles-pp.shrink_many.shrink_node.balance_pgdat.kswapd.kthread
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.calltrace.cycles-pp.shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.calltrace.cycles-pp.shrink_one.shrink_many.shrink_node.balance_pgdat.kswapd
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.calltrace.cycles-pp.balance_pgdat.kswapd.kthread.ret_from_fork.ret_from_fork_asm
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork.ret_from_fork_asm
4.87 +3.8 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.calltrace.cycles-pp.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node.balance_pgdat
66.53 +4.1 70.63 +3.6 70.13 ± 2% +4.1 70.68 ± 2% perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
6.72 ± 2% +4.5 11.21 ± 9% +3.9 10.60 ± 4% +3.7 10.38 ± 3% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
6.72 ± 2% +4.5 11.21 ± 9% +3.9 10.60 ± 4% +3.7 10.38 ± 3% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
6.72 ± 2% +4.5 11.20 ± 9% +3.9 10.59 ± 4% +3.7 10.37 ± 3% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
6.94 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.69 ± 3% perf-profile.calltrace.cycles-pp.common_startup_64
3.57 ± 2% +5.1 8.64 ± 9% +4.6 8.16 ± 5% +4.4 8.02 ± 4% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
3.65 ± 2% +5.5 9.14 ± 9% +5.0 8.62 ± 5% +4.8 8.46 ± 4% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
4.52 ± 2% +6.3 10.82 ± 8% +5.7 10.24 ± 5% +5.5 10.03 ± 4% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
64.31 +6.9 71.18 +7.3 71.57 +7.4 71.70 ± 2% perf-profile.calltrace.cycles-pp.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof
64.17 +7.4 71.56 +8.2 72.39 +8.6 72.78 perf-profile.calltrace.cycles-pp.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof
64.15 +7.4 71.56 +8.3 72.42 +8.6 72.78 perf-profile.calltrace.cycles-pp.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof
62.53 +9.0 71.48 +9.8 72.37 +10.2 72.72 perf-profile.calltrace.cycles-pp.shrink_many.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath
62.50 +9.0 71.48 +9.9 72.37 +10.2 72.72 perf-profile.calltrace.cycles-pp.shrink_one.shrink_many.shrink_node.do_try_to_free_pages.try_to_free_pages
62.03 +9.4 71.45 +10.3 72.34 +10.7 72.69 perf-profile.calltrace.cycles-pp.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node.do_try_to_free_pages
66.79 +13.3 80.06 +14.4 81.18 +14.6 81.35 perf-profile.calltrace.cycles-pp.evict_folios.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node
63.11 +16.6 79.70 +17.9 81.02 +18.0 81.14 perf-profile.calltrace.cycles-pp.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one.shrink_many
42.45 ± 2% +35.3 77.74 ± 2% +36.9 79.33 +37.0 79.48 perf-profile.calltrace.cycles-pp.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one
42.43 ± 2% +35.3 77.74 ± 2% +36.9 79.33 +37.1 79.48 perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios.try_to_shrink_lruvec
42.34 ± 2% +35.4 77.73 ± 2% +37.0 79.32 +37.1 79.48 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios
41.73 ± 2% +35.9 77.58 ± 2% +37.4 79.18 +37.6 79.32 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list
15.56 ± 2% -12.5 3.03 ± 4% -12.6 2.94 ± 2% -12.7 2.90 ± 3% perf-profile.children.cycles-pp.asm_sysvec_call_function
80.28 -10.4 69.87 -11.1 69.17 ± 3% -10.2 70.07 ± 2% perf-profile.children.cycles-pp.do_access
11.47 ± 4% -10.1 1.37 ± 4% -10.1 1.35 ± 3% -10.1 1.35 ± 2% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
10.79 ± 4% -9.5 1.32 ± 4% -9.5 1.31 ± 3% -9.5 1.31 ± 2% perf-profile.children.cycles-pp.__sysvec_call_function
11.76 ± 3% -9.4 2.34 ± 3% -9.5 2.28 ± 3% -9.5 2.25 ± 3% perf-profile.children.cycles-pp.sysvec_call_function
8.04 ± 2% -7.2 0.79 ± 11% -7.2 0.80 ± 5% -7.3 0.78 ± 4% perf-profile.children.cycles-pp.add_to_swap
7.85 ± 4% -6.7 1.19 ± 7% -6.6 1.24 ± 2% -6.7 1.12 ± 3% perf-profile.children.cycles-pp.llist_add_batch
6.84 ± 2% -6.2 0.69 ± 9% -6.1 0.70 ± 5% -6.1 0.70 ± 4% perf-profile.children.cycles-pp.folio_alloc_swap
6.38 ± 2% -5.7 0.65 ± 8% -5.7 0.67 ± 6% -5.7 0.67 ± 4% perf-profile.children.cycles-pp.__mem_cgroup_try_charge_swap
5.83 ± 7% -5.3 0.57 ± 50% -5.4 0.44 ± 3% -5.4 0.44 ± 3% perf-profile.children.cycles-pp.rmap_walk_anon
5.72 ± 4% -5.1 0.62 ± 15% -5.2 0.48 ± 17% -5.2 0.54 ± 12% perf-profile.children.cycles-pp.do_rw_once
5.03 ± 6% -4.6 0.47 ± 4% -4.6 0.46 ± 4% -4.6 0.45 ± 3% perf-profile.children.cycles-pp.flush_tlb_func
4.76 ± 4% -4.3 0.41 ±143% -4.6 0.15 ± 11% -4.6 0.15 ± 7% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
4.83 ± 2% -4.1 0.73 ± 3% -4.1 0.72 ± 3% -4.1 0.71 ± 3% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
4.27 ± 7% -3.9 0.34 ± 9% -3.9 0.32 ± 4% -3.9 0.32 ± 4% perf-profile.children.cycles-pp.try_to_unmap
4.31 ± 3% -3.9 0.40 ± 17% -4.0 0.34 ± 4% -4.0 0.35 ± 6% perf-profile.children.cycles-pp.pageout
4.46 ± 2% -3.8 0.64 ± 4% -3.8 0.65 ± 4% -3.8 0.65 ± 3% perf-profile.children.cycles-pp.llist_reverse_order
78.01 -3.6 74.42 -3.4 74.60 -3.0 74.97 perf-profile.children.cycles-pp.asm_exc_page_fault
3.88 ± 8% -3.6 0.30 ± 6% -3.6 0.29 ± 4% -3.6 0.29 ± 4% perf-profile.children.cycles-pp.try_to_unmap_one
3.94 ± 3% -3.6 0.37 ± 17% -3.6 0.32 ± 5% -3.6 0.32 ± 6% perf-profile.children.cycles-pp.swap_writepage
3.10 ± 4% -2.7 0.36 ± 77% -2.9 0.24 ± 6% -2.9 0.24 ± 6% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
73.30 -2.3 70.98 -2.8 70.49 ± 2% -2.0 71.28 ± 2% perf-profile.children.cycles-pp.do_anonymous_page
2.48 ± 6% -2.3 0.19 ± 10% -2.3 0.16 ± 10% -2.3 0.17 ± 11% perf-profile.children.cycles-pp.get_page_from_freelist
2.36 ± 7% -2.1 0.25 ± 20% -2.1 0.25 ± 8% -2.1 0.24 ± 5% perf-profile.children.cycles-pp.swap_cgroup_record
2.31 ± 3% -2.1 0.25 ± 5% -2.0 0.27 ± 6% -2.0 0.27 ± 7% perf-profile.children.cycles-pp.page_counter_try_charge
2.39 -2.1 0.34 ± 5% -2.1 0.31 ± 6% -2.1 0.32 ± 4% perf-profile.children.cycles-pp.native_irq_return_iret
2.26 ± 5% -2.0 0.25 ±115% -2.2 0.10 ± 7% -2.2 0.11 ± 9% perf-profile.children.cycles-pp.folio_batch_move_lru
76.12 -2.0 74.16 -1.7 74.39 -1.4 74.74 perf-profile.children.cycles-pp.exc_page_fault
76.08 -1.9 74.15 -1.7 74.38 -1.3 74.74 perf-profile.children.cycles-pp.do_user_addr_fault
2.20 ± 4% -1.9 0.34 ± 12% -1.9 0.29 ± 9% -1.9 0.31 ± 9% perf-profile.children.cycles-pp._raw_spin_lock
1.85 ± 3% -1.8 0.09 ± 10% -1.8 0.09 ± 7% -1.8 0.09 ± 8% perf-profile.children.cycles-pp.native_flush_tlb_local
2.09 ± 2% -1.7 0.36 ± 23% -1.8 0.32 ± 8% -1.8 0.34 ± 11% perf-profile.children.cycles-pp.handle_softirqs
1.76 ± 6% -1.5 0.22 ± 8% -1.6 0.17 ± 17% -1.6 0.19 ± 14% perf-profile.children.cycles-pp.__pte_offset_map_lock
1.78 ± 8% -1.5 0.25 ±103% -1.6 0.13 ± 6% -1.6 0.13 ± 7% perf-profile.children.cycles-pp.folio_referenced
1.68 ± 3% -1.5 0.19 ± 36% -1.5 0.15 ± 6% -1.5 0.15 ± 5% perf-profile.children.cycles-pp.blk_complete_reqs
1.57 ± 6% -1.5 0.12 ± 31% -1.5 0.09 ± 7% -1.5 0.10 ± 8% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.54 ± 15% -1.4 0.11 ± 8% -1.4 0.12 ± 6% -1.4 0.11 ± 6% perf-profile.children.cycles-pp.set_tlb_ubc_flush_pending
1.61 ± 3% -1.4 0.18 ± 34% -1.5 0.14 ± 6% -1.5 0.15 ± 5% perf-profile.children.cycles-pp.scsi_end_request
1.61 ± 3% -1.4 0.18 ± 34% -1.5 0.14 ± 6% -1.5 0.15 ± 5% perf-profile.children.cycles-pp.scsi_io_completion
1.57 ± 4% -1.4 0.16 ± 11% -1.4 0.13 ± 12% -1.4 0.14 ± 11% perf-profile.children.cycles-pp.__mem_cgroup_charge
1.48 ± 4% -1.3 0.16 ± 35% -1.4 0.13 ± 6% -1.3 0.13 ± 4% perf-profile.children.cycles-pp.blk_update_request
1.44 ± 8% -1.3 0.13 ± 12% -1.3 0.12 ± 8% -1.3 0.11 ± 9% perf-profile.children.cycles-pp.__swap_writepage
1.38 ± 7% -1.3 0.08 ± 40% -1.3 0.07 ± 11% -1.3 0.07 ± 10% perf-profile.children.cycles-pp.__remove_mapping
1.33 ± 6% -1.3 0.07 ± 40% -1.3 0.05 ± 28% -1.3 0.06 ± 36% perf-profile.children.cycles-pp.do_softirq
1.32 ± 11% -1.2 0.08 ± 6% -1.3 0.07 ± 8% -1.3 0.07 ± 16% perf-profile.children.cycles-pp.rmqueue
1.34 ± 11% -1.2 0.12 ± 15% -1.2 0.09 ± 9% -1.2 0.10 ± 9% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
1.43 ± 7% -1.2 0.23 ±106% -1.3 0.10 ± 8% -1.3 0.11 ± 10% perf-profile.children.cycles-pp.__folio_batch_add_and_move
1.18 ± 12% -1.1 0.06 ± 7% -1.1 0.06 ± 10% -1.1 0.06 ± 17% perf-profile.children.cycles-pp.__rmqueue_pcplist
1.25 ± 4% -1.1 0.14 ± 50% -1.1 0.11 ± 6% -1.1 0.11 ± 8% perf-profile.children.cycles-pp.isolate_folios
1.24 ± 4% -1.1 0.14 ± 48% -1.1 0.11 ± 6% -1.1 0.11 ± 9% perf-profile.children.cycles-pp.scan_folios
1.19 ± 6% -1.1 0.12 ± 10% -1.1 0.11 ± 12% -1.1 0.12 ± 12% perf-profile.children.cycles-pp.try_charge_memcg
1.12 ± 13% -1.1 0.06 ± 9% -1.1 0.04 ± 58% -1.1 0.05 ± 37% perf-profile.children.cycles-pp.rmqueue_bulk
1.18 ± 4% -1.1 0.12 ± 25% -1.1 0.10 ± 8% -1.1 0.10 ± 6% perf-profile.children.cycles-pp.submit_bio_noacct_nocheck
1.25 ± 9% -1.0 0.20 ±122% -1.2 0.09 ± 9% -1.2 0.09 ± 7% perf-profile.children.cycles-pp.folio_referenced_one
1.14 ± 7% -1.0 0.11 ± 6% -1.0 0.11 ± 9% -1.0 0.11 ± 13% perf-profile.children.cycles-pp.mem_cgroup_id_get_online
1.13 ± 3% -1.0 0.12 ± 42% -1.0 0.10 ± 6% -1.0 0.10 ± 6% perf-profile.children.cycles-pp.end_swap_bio_write
1.08 ± 5% -1.0 0.09 ± 22% -1.0 0.09 ± 9% -1.0 0.08 ± 8% perf-profile.children.cycles-pp.add_to_swap_cache
1.10 ± 3% -1.0 0.12 ± 43% -1.0 0.09 ± 6% -1.0 0.10 ± 8% perf-profile.children.cycles-pp.folio_end_writeback
1.09 ± 4% -1.0 0.11 ± 26% -1.0 0.09 ± 7% -1.0 0.09 ± 6% perf-profile.children.cycles-pp.__submit_bio
1.06 ± 4% -1.0 0.11 ± 24% -1.0 0.09 ± 7% -1.0 0.09 ± 6% perf-profile.children.cycles-pp.blk_mq_submit_bio
1.04 ± 6% -0.9 0.12 ± 6% -0.9 0.12 ± 7% -0.9 0.12 ± 4% perf-profile.children.cycles-pp._find_next_bit
1.00 ± 2% -0.9 0.11 ± 47% -0.9 0.09 ± 7% -0.9 0.08 ± 9% perf-profile.children.cycles-pp.isolate_folio
0.96 ± 12% -0.9 0.08 ± 34% -0.9 0.06 ± 10% -0.9 0.06 ± 14% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.94 ± 3% -0.9 0.08 ± 38% -0.9 0.07 ± 12% -0.9 0.07 ± 17% perf-profile.children.cycles-pp.page_vma_mapped_walk
1.28 ± 9% -0.8 0.46 ± 15% -0.9 0.43 ± 5% -0.8 0.44 ± 8% perf-profile.children.cycles-pp.__irq_exit_rcu
0.95 ± 4% -0.8 0.15 ± 17% -0.8 0.13 ± 8% -0.8 0.13 ± 3% perf-profile.children.cycles-pp.__schedule
0.85 ± 3% -0.7 0.13 ± 8% -0.7 0.11 ± 9% -0.7 0.12 ± 9% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.75 ± 4% -0.7 0.08 ± 17% -0.7 0.06 ± 14% -0.7 0.07 ± 12% perf-profile.children.cycles-pp.sync_regs
1.16 ± 4% -0.6 0.52 ± 3% -0.7 0.50 ± 5% -0.6 0.52 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.70 ± 4% -0.6 0.08 ± 61% -0.6 0.06 ± 10% -0.6 0.06 ± 14% perf-profile.children.cycles-pp.lru_gen_del_folio
0.70 ± 5% -0.6 0.08 ± 52% -0.6 0.06 ± 9% -0.6 0.06 ± 11% perf-profile.children.cycles-pp.lru_gen_add_folio
0.66 ± 8% -0.6 0.05 ± 48% -0.6 0.05 ± 26% -0.6 0.02 ±100% perf-profile.children.cycles-pp.__folio_start_writeback
1.08 ± 4% -0.6 0.47 ± 3% -0.6 0.46 ± 5% -0.6 0.48 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.69 ± 7% -0.6 0.11 ± 18% -0.6 0.10 ± 8% -0.6 0.10 ± 6% perf-profile.children.cycles-pp.schedule
0.75 ± 6% -0.6 0.18 ± 19% -0.5 0.21 ± 60% -0.6 0.18 ± 24% perf-profile.children.cycles-pp.worker_thread
0.65 ± 11% -0.5 0.10 ± 20% -0.6 0.09 ± 12% -0.6 0.10 ± 10% perf-profile.children.cycles-pp.__drain_all_pages
0.64 ± 4% -0.5 0.11 ± 6% -0.5 0.10 ± 9% -0.5 0.10 ± 10% perf-profile.children.cycles-pp.sysvec_call_function_single
0.64 ± 16% -0.5 0.12 ± 25% -0.5 0.11 ± 10% -0.5 0.10 ± 13% perf-profile.children.cycles-pp.asm_common_interrupt
0.64 ± 16% -0.5 0.12 ± 25% -0.5 0.11 ± 10% -0.5 0.10 ± 13% perf-profile.children.cycles-pp.common_interrupt
0.54 ± 6% -0.5 0.04 ± 75% -0.5 0.04 ± 58% -0.5 0.05 ± 33% perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests
0.54 ± 6% -0.5 0.04 ± 75% -0.5 0.04 ± 58% -0.5 0.05 ± 33% perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests
0.53 ± 6% -0.5 0.04 ± 73% -0.5 0.04 ± 58% -0.5 0.05 ± 33% perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched
0.56 ± 12% -0.5 0.07 ± 23% -0.5 0.07 ± 15% -0.5 0.07 ± 16% perf-profile.children.cycles-pp.drain_pages_zone
0.54 ± 4% -0.5 0.06 ± 19% -0.5 0.05 ± 10% -0.5 0.05 ± 34% perf-profile.children.cycles-pp.__blk_flush_plug
0.52 ± 7% -0.5 0.03 ± 70% -0.5 0.00 -0.5 0.01 ±201% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.54 ± 4% -0.5 0.06 ± 19% -0.5 0.05 ± 10% -0.5 0.05 ± 34% perf-profile.children.cycles-pp.blk_mq_flush_plug_list
0.54 ± 3% -0.5 0.06 ± 19% -0.5 0.05 ± 28% -0.5 0.05 ± 34% perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list
0.51 ± 10% -0.4 0.08 ± 24% -0.4 0.07 ± 13% -0.4 0.08 ± 16% perf-profile.children.cycles-pp.free_pcppages_bulk
0.45 ± 5% -0.4 0.04 ± 75% -0.4 0.01 ±173% -0.4 0.02 ±100% perf-profile.children.cycles-pp.__rq_qos_throttle
0.49 ± 7% -0.4 0.08 ± 17% -0.4 0.07 ± 11% -0.4 0.07 ± 5% perf-profile.children.cycles-pp.__pick_next_task
0.62 ± 4% -0.4 0.22 ± 8% -0.4 0.22 ± 6% -0.4 0.22 ± 5% perf-profile.children.cycles-pp.irqtime_account_irq
0.44 ± 5% -0.4 0.04 ± 75% -0.4 0.01 ±173% -0.4 0.02 ±100% perf-profile.children.cycles-pp.wbt_wait
0.42 ± 6% -0.4 0.04 ± 75% -0.4 0.01 ±264% -0.4 0.02 ±122% perf-profile.children.cycles-pp.rq_qos_wait
0.42 ± 5% -0.4 0.04 ± 45% -0.4 0.03 ± 77% -0.4 0.02 ±122% perf-profile.children.cycles-pp.bio_alloc_bioset
0.66 ± 3% -0.3 0.31 ± 3% -0.4 0.31 ± 5% -0.3 0.32 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.65 ± 3% -0.3 0.31 ± 3% -0.3 0.31 ± 5% -0.3 0.32 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.48 ± 3% -0.3 0.15 ± 8% -0.3 0.15 ± 5% -0.3 0.14 ± 7% perf-profile.children.cycles-pp.sched_clock_cpu
0.40 ± 10% -0.3 0.06 ± 17% -0.3 0.05 ± 39% -0.3 0.06 ± 9% perf-profile.children.cycles-pp.pick_next_task_fair
0.46 ± 6% -0.3 0.14 ± 23% -0.3 0.17 ± 73% -0.3 0.14 ± 34% perf-profile.children.cycles-pp.process_one_work
0.44 ± 6% -0.3 0.12 ± 15% -0.3 0.12 ± 4% -0.3 0.11 ± 4% perf-profile.children.cycles-pp.tick_nohz_stop_tick
0.38 ± 10% -0.3 0.06 ± 19% -0.3 0.06 ± 11% -0.3 0.06 ± 8% perf-profile.children.cycles-pp.sched_balance_newidle
0.56 ± 3% -0.3 0.24 ± 4% -0.3 0.24 ± 5% -0.3 0.25 ± 5% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.42 ± 10% -0.3 0.11 ± 13% -0.3 0.10 ± 7% -0.3 0.11 ± 9% perf-profile.children.cycles-pp.sched_balance_rq
0.32 ± 13% -0.3 0.01 ±223% -0.3 0.01 ±209% -0.3 0.02 ±100% perf-profile.children.cycles-pp.__free_one_page
0.42 ± 4% -0.3 0.12 ± 7% -0.3 0.12 ± 5% -0.3 0.12 ± 7% perf-profile.children.cycles-pp.sched_clock
0.45 ± 6% -0.3 0.16 ± 13% -0.3 0.15 ± 5% -0.3 0.14 ± 5% perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick
0.37 ± 9% -0.3 0.09 ± 12% -0.3 0.09 ± 8% -0.3 0.09 ± 9% perf-profile.children.cycles-pp.sched_balance_find_src_group
0.50 ± 3% -0.3 0.23 ± 3% -0.3 0.22 ± 4% -0.3 0.24 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler
0.36 ± 8% -0.3 0.09 ± 13% -0.3 0.08 ± 9% -0.3 0.09 ± 7% perf-profile.children.cycles-pp.update_sd_lb_stats
0.31 ± 8% -0.3 0.04 ± 71% -0.3 0.06 ± 11% -0.3 0.00 perf-profile.children.cycles-pp.tlb_is_not_lazy
0.33 ± 11% -0.3 0.08 ± 15% -0.3 0.08 ± 10% -0.3 0.08 ± 9% perf-profile.children.cycles-pp.update_sg_lb_stats
0.44 ± 4% -0.2 0.20 ± 4% -0.2 0.19 ± 5% -0.2 0.20 ± 5% perf-profile.children.cycles-pp.update_process_times
0.30 ± 7% -0.2 0.07 ± 10% -0.2 0.07 ± 12% -0.2 0.06 ± 7% perf-profile.children.cycles-pp.error_entry
0.29 ± 4% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.irq_work_run_list
0.39 ± 6% -0.2 0.19 ± 3% -0.2 0.19 ± 7% -0.2 0.18 ± 7% perf-profile.children.cycles-pp.native_sched_clock
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.07 ± 13% perf-profile.children.cycles-pp.__sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.07 ± 13% perf-profile.children.cycles-pp._printk
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.07 ± 13% perf-profile.children.cycles-pp.asm_sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.07 ± 13% perf-profile.children.cycles-pp.irq_work_run
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.07 ± 13% perf-profile.children.cycles-pp.irq_work_single
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.07 ± 13% perf-profile.children.cycles-pp.sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 10% -0.2 0.11 ± 88% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.console_flush_all
0.28 ± 5% -0.2 0.09 ± 10% -0.2 0.11 ± 88% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.console_unlock
0.28 ± 5% -0.2 0.09 ± 10% -0.2 0.11 ± 88% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.vprintk_emit
0.28 ± 4% -0.2 0.09 ± 7% -0.2 0.11 ± 86% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.serial8250_console_write
0.28 ± 5% -0.2 0.09 ± 9% -0.2 0.10 ± 77% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.wait_for_lsr
0.23 ± 15% -0.2 0.06 ± 65% -0.2 0.04 ± 83% -0.2 0.02 ±124% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.24 ± 7% -0.2 0.08 ± 13% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
0.24 ± 7% -0.2 0.08 ± 13% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.24 ± 7% -0.2 0.08 ± 13% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_fbdev_shmem_helper_fb_dirty
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_commit
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_atomic_commit
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.commit_tail
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.07 ± 14% perf-profile.children.cycles-pp.drm_fb_memcpy
0.23 ± 8% -0.2 0.08 ± 11% -0.1 0.11 ±113% -0.2 0.07 ± 12% perf-profile.children.cycles-pp.memcpy_toio
0.19 ± 11% -0.1 0.06 ± 13% -0.1 0.07 ±105% -0.1 0.05 ± 9% perf-profile.children.cycles-pp.io_serial_in
0.19 ± 7% -0.1 0.06 ± 98% -0.1 0.06 ±133% -0.1 0.06 ±108% perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
0.11 ± 12% -0.1 0.02 ±141% -0.1 0.01 ±173% -0.1 0.03 ±102% perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
0.20 ± 10% -0.1 0.12 ± 6% -0.1 0.11 ± 9% -0.1 0.12 ± 5% perf-profile.children.cycles-pp.sched_tick
0.11 ± 13% -0.0 0.06 ± 11% -0.0 0.06 ± 13% -0.0 0.07 ± 15% perf-profile.children.cycles-pp.sched_balance_domains
0.10 ± 4% -0.0 0.06 ± 13% -0.1 0.05 ± 27% -0.1 0.05 ± 5% perf-profile.children.cycles-pp.sched_core_idle_cpu
0.14 ± 5% -0.0 0.09 ± 7% -0.1 0.09 ± 9% -0.1 0.08 ± 9% perf-profile.children.cycles-pp.irqentry_enter
0.09 ± 14% -0.0 0.06 ± 9% -0.0 0.05 ± 27% -0.0 0.05 ± 7% perf-profile.children.cycles-pp.clockevents_program_event
0.09 ± 11% -0.0 0.06 ± 9% -0.0 0.06 ± 13% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.task_tick_fair
0.00 +0.0 0.00 +0.0 0.00 +0.1 0.11 ± 8% perf-profile.children.cycles-pp.should_flush_tlb
0.03 ± 70% +0.0 0.08 ± 11% +0.0 0.08 ± 6% +0.0 0.07 ± 8% perf-profile.children.cycles-pp.read_tsc
0.00 +0.0 0.05 ± 45% +0.1 0.06 ± 12% +0.0 0.04 ± 50% perf-profile.children.cycles-pp.menu_reflect
0.00 +0.1 0.06 ± 6% +0.1 0.06 ± 10% +0.1 0.05 ± 9% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 8% +0.1 0.06 ± 6% perf-profile.children.cycles-pp.ct_kernel_exit
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 7% +0.1 0.06 ± 8% perf-profile.children.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.06 ± 7% +0.1 0.06 ± 9% +0.1 0.06 ± 12% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.00 +0.1 0.07 ± 7% +0.1 0.07 ± 9% +0.1 0.07 ± 11% perf-profile.children.cycles-pp.tmigr_cpu_new_timer
0.00 +0.1 0.07 ± 10% +0.1 0.07 ± 9% +0.1 0.07 ± 8% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.02 ±142% +0.1 0.10 ± 52% +0.1 0.10 ± 77% +0.1 0.11 ± 56% perf-profile.children.cycles-pp.generic_perform_write
0.02 ±142% +0.1 0.10 ± 52% +0.1 0.10 ± 76% +0.1 0.11 ± 56% perf-profile.children.cycles-pp.shmem_file_write_iter
0.00 +0.1 0.08 ± 11% +0.1 0.08 ± 8% +0.1 0.08 ± 6% perf-profile.children.cycles-pp.get_cpu_device
0.15 ± 35% +0.1 0.24 ± 62% +0.2 0.36 ± 35% +0.2 0.36 ± 36% perf-profile.children.cycles-pp.alloc_bprm
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.32 ± 19% perf-profile.children.cycles-pp.rest_init
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.32 ± 19% perf-profile.children.cycles-pp.start_kernel
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.32 ± 19% perf-profile.children.cycles-pp.x86_64_start_kernel
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.32 ± 19% perf-profile.children.cycles-pp.x86_64_start_reservations
0.00 +0.1 0.09 ± 13% +0.1 0.08 ± 10% +0.1 0.08 ± 7% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.1 0.09 ± 18% +0.1 0.09 ± 12% +0.1 0.08 ± 10% perf-profile.children.cycles-pp.intel_idle_irq
0.01 ±223% +0.1 0.10 ± 32% +0.1 0.10 ± 72% +0.0 0.05 ±114% perf-profile.children.cycles-pp.load_elf_interp
0.00 +0.1 0.10 ± 12% +0.1 0.09 ± 9% +0.1 0.09 ± 5% perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.1 0.10 ± 15% +0.1 0.10 ± 9% +0.1 0.09 ± 7% perf-profile.children.cycles-pp.tsc_verify_tsc_adjust
0.12 ± 9% +0.1 0.22 ± 8% +0.1 0.21 ± 5% +0.1 0.20 ± 3% perf-profile.children.cycles-pp.ktime_get
0.00 +0.1 0.10 ± 7% +0.1 0.10 ± 10% +0.1 0.10 ± 10% perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.1 0.11 ± 15% +0.1 0.10 ± 10% +0.1 0.10 ± 6% perf-profile.children.cycles-pp.tick_nohz_stop_idle
0.00 +0.1 0.11 ± 14% +0.1 0.10 ± 8% +0.1 0.10 ± 5% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.01 ±223% +0.1 0.14 ± 47% +0.1 0.09 ± 63% +0.0 0.03 ±127% perf-profile.children.cycles-pp._IO_setvbuf
0.00 +0.1 0.13 ± 9% +0.1 0.12 ± 9% +0.1 0.12 ± 7% perf-profile.children.cycles-pp.ct_idle_exit
0.01 ±223% +0.1 0.14 ± 83% +0.1 0.11 ± 64% +0.1 0.09 ± 75% perf-profile.children.cycles-pp._copy_to_iter
0.01 ±223% +0.1 0.15 ± 8% +0.1 0.13 ± 6% +0.1 0.13 ± 7% perf-profile.children.cycles-pp.local_clock_noinstr
0.02 ±142% +0.1 0.16 ± 8% +0.1 0.15 ± 6% +0.1 0.15 ± 5% perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.01 ±223% +0.2 0.16 ± 40% +0.1 0.14 ± 62% +0.1 0.14 ± 78% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.01 ±223% +0.2 0.16 ± 40% +0.1 0.14 ± 62% +0.1 0.14 ± 78% perf-profile.children.cycles-pp.rseq_ip_fixup
0.08 ± 41% +0.2 0.23 ± 24% +0.2 0.27 ± 49% +0.1 0.21 ± 34% perf-profile.children.cycles-pp.write
0.01 ±223% +0.2 0.16 ± 39% +0.2 0.16 ± 59% +0.1 0.14 ± 84% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.16 ± 34% +0.2 0.32 ± 53% +0.3 0.47 ± 32% +0.3 0.49 ± 36% perf-profile.children.cycles-pp.mm_init
0.16 ± 36% +0.2 0.32 ± 53% +0.3 0.47 ± 32% +0.3 0.49 ± 36% perf-profile.children.cycles-pp.pgd_alloc
0.07 ± 63% +0.2 0.23 ± 24% +0.2 0.26 ± 50% +0.1 0.21 ± 34% perf-profile.children.cycles-pp.ksys_write
0.06 ± 60% +0.2 0.23 ± 24% +0.2 0.26 ± 51% +0.1 0.21 ± 34% perf-profile.children.cycles-pp.vfs_write
0.00 +0.2 0.17 ± 33% +0.3 0.26 ± 33% +0.2 0.21 ± 62% perf-profile.children.cycles-pp.copy_p4d_range
0.00 +0.2 0.17 ± 33% +0.3 0.26 ± 33% +0.2 0.21 ± 62% perf-profile.children.cycles-pp.copy_page_range
0.00 +0.2 0.18 ± 32% +0.3 0.27 ± 32% +0.2 0.22 ± 59% perf-profile.children.cycles-pp.dup_mmap
0.12 ± 15% +0.2 0.30 ± 6% +0.2 0.29 ± 5% +0.2 0.28 ± 7% perf-profile.children.cycles-pp.__get_next_timer_interrupt
0.00 +0.2 0.18 ± 38% +0.2 0.20 ± 44% +0.2 0.20 ± 60% perf-profile.children.cycles-pp.__do_fault
0.00 +0.2 0.18 ± 14% +0.2 0.23 ± 55% +0.2 0.20 ± 45% perf-profile.children.cycles-pp.__pmd_alloc
0.01 ±223% +0.2 0.20 ± 36% +0.3 0.33 ± 34% +0.2 0.22 ± 34% perf-profile.children.cycles-pp.__libc_fork
0.02 ±141% +0.2 0.21 ±130% +0.2 0.24 ±156% +0.4 0.37 ±100% perf-profile.children.cycles-pp.__cmd_record
0.04 ± 72% +0.2 0.27 ± 30% +0.3 0.38 ± 20% +0.3 0.36 ± 49% perf-profile.children.cycles-pp.dup_mm
0.07 ± 16% +0.2 0.30 ± 13% +0.2 0.30 ± 33% +0.2 0.24 ± 47% perf-profile.children.cycles-pp.elf_load
0.05 ± 82% +0.3 0.31 ± 44% +0.4 0.40 ± 27% +0.3 0.30 ± 27% perf-profile.children.cycles-pp.schedule_tail
0.15 ± 16% +0.3 0.41 ± 17% +0.3 0.41 ± 36% +0.3 0.48 ± 26% perf-profile.children.cycles-pp.__vfork
0.14 ± 17% +0.3 0.41 ± 17% +0.3 0.41 ± 36% +0.3 0.48 ± 26% perf-profile.children.cycles-pp.__x64_sys_vfork
0.03 ±101% +0.3 0.30 ± 13% +0.3 0.30 ± 33% +0.2 0.24 ± 43% perf-profile.children.cycles-pp.rep_stos_alternative
0.04 ± 71% +0.3 0.32 ± 19% +0.3 0.31 ± 15% +0.3 0.32 ± 13% perf-profile.children.cycles-pp.poll_idle
0.01 ±223% +0.3 0.29 ± 30% +0.3 0.29 ± 22% +0.3 0.28 ± 26% perf-profile.children.cycles-pp.__kmalloc_large_node_noprof
0.01 ±223% +0.3 0.29 ± 30% +0.3 0.29 ± 22% +0.3 0.29 ± 27% perf-profile.children.cycles-pp.___kmalloc_large_node
0.01 ±223% +0.3 0.29 ± 30% +0.3 0.30 ± 22% +0.3 0.28 ± 26% perf-profile.children.cycles-pp.__kmalloc_node_noprof
0.04 ±112% +0.3 0.32 ± 41% +0.4 0.40 ± 27% +0.3 0.30 ± 27% perf-profile.children.cycles-pp.__put_user_4
0.12 ± 26% +0.3 0.42 ± 17% +0.3 0.46 ± 27% +0.3 0.45 ± 36% perf-profile.children.cycles-pp.alloc_pages_bulk_noprof
0.09 ± 28% +0.3 0.40 ± 37% +0.5 0.60 ± 46% +0.5 0.59 ± 42% perf-profile.children.cycles-pp.__p4d_alloc
0.09 ± 28% +0.3 0.40 ± 37% +0.5 0.60 ± 46% +0.5 0.59 ± 42% perf-profile.children.cycles-pp.get_zeroed_page_noprof
0.10 ± 21% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.45 ± 37% perf-profile.children.cycles-pp.__x64_sys_openat
0.10 ± 19% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.45 ± 37% perf-profile.children.cycles-pp.do_sys_openat2
0.01 ±223% +0.3 0.34 ± 26% +0.4 0.36 ± 24% +0.3 0.34 ± 27% perf-profile.children.cycles-pp.__kvmalloc_node_noprof
0.01 ±223% +0.3 0.34 ± 26% +0.4 0.36 ± 24% +0.3 0.34 ± 27% perf-profile.children.cycles-pp.single_open_size
0.09 ± 22% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.45 ± 37% perf-profile.children.cycles-pp.do_filp_open
0.09 ± 22% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.45 ± 37% perf-profile.children.cycles-pp.path_openat
0.12 ± 6% +0.3 0.47 ± 8% +0.3 0.44 ± 6% +0.3 0.43 ± 5% perf-profile.children.cycles-pp.irq_enter_rcu
0.04 ± 45% +0.3 0.39 ± 34% +0.4 0.41 ± 27% +0.3 0.35 ± 22% perf-profile.children.cycles-pp.perf_evlist__poll
0.04 ± 45% +0.3 0.39 ± 34% +0.4 0.41 ± 27% +0.3 0.35 ± 22% perf-profile.children.cycles-pp.perf_evlist__poll_thread
0.04 ± 44% +0.4 0.39 ± 34% +0.4 0.42 ± 27% +0.3 0.36 ± 22% perf-profile.children.cycles-pp.perf_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.42 ± 27% +0.3 0.36 ± 22% perf-profile.children.cycles-pp.do_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.42 ± 27% +0.3 0.36 ± 22% perf-profile.children.cycles-pp.__x64_sys_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.42 ± 27% +0.3 0.36 ± 22% perf-profile.children.cycles-pp.do_sys_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.43 ± 27% +0.3 0.36 ± 22% perf-profile.children.cycles-pp.__poll
0.02 ±141% +0.4 0.38 ± 32% +0.4 0.39 ± 25% +0.4 0.41 ± 36% perf-profile.children.cycles-pp.vfs_open
0.02 ±141% +0.4 0.38 ± 32% +0.4 0.39 ± 25% +0.4 0.41 ± 36% perf-profile.children.cycles-pp.do_open
0.07 ± 14% +0.4 0.44 ± 9% +0.3 0.42 ± 6% +0.3 0.40 ± 5% perf-profile.children.cycles-pp.tick_irq_enter
0.01 ±223% +0.4 0.38 ± 32% +0.4 0.39 ± 25% +0.4 0.41 ± 36% perf-profile.children.cycles-pp.do_dentry_open
0.02 ±141% +0.4 0.39 ± 34% +0.4 0.41 ± 28% +0.3 0.35 ± 22% perf-profile.children.cycles-pp.__pollwait
0.10 ± 11% +0.4 0.47 ± 7% +0.4 0.46 ± 5% +0.3 0.45 ± 6% perf-profile.children.cycles-pp.tick_nohz_next_event
0.04 ± 72% +0.4 0.43 ± 10% +0.5 0.55 ± 32% +0.5 0.55 ± 33% perf-profile.children.cycles-pp.alloc_new_pud
0.22 ± 21% +0.4 0.62 ± 7% +0.3 0.56 ± 26% +0.4 0.60 ± 33% perf-profile.children.cycles-pp.do_pte_missing
0.06 ± 51% +0.4 0.47 ± 15% +0.5 0.60 ± 29% +0.5 0.59 ± 30% perf-profile.children.cycles-pp.setup_arg_pages
0.01 ±223% +0.4 0.42 ± 40% +0.4 0.44 ± 32% +0.4 0.45 ± 38% perf-profile.children.cycles-pp.open64
0.06 ± 50% +0.4 0.47 ± 15% +0.5 0.60 ± 30% +0.5 0.59 ± 30% perf-profile.children.cycles-pp.relocate_vma_down
0.05 ± 73% +0.4 0.47 ± 15% +0.6 0.60 ± 30% +0.5 0.58 ± 30% perf-profile.children.cycles-pp.move_page_tables
0.14 ± 10% +0.5 0.61 ± 7% +0.4 0.58 ± 5% +0.4 0.57 ± 6% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.10 ± 13% +0.5 0.56 ± 23% +0.6 0.74 ± 23% +0.6 0.70 ± 29% perf-profile.children.cycles-pp.__do_sys_clone
0.21 ± 26% +0.5 0.74 ± 29% +0.7 0.92 ± 18% +0.7 0.88 ± 19% perf-profile.children.cycles-pp.get_free_pages_noprof
0.16 ± 17% +0.5 0.69 ± 13% +0.6 0.76 ± 24% +0.6 0.81 ± 23% perf-profile.children.cycles-pp.alloc_thread_stack_node
0.16 ± 18% +0.5 0.70 ± 15% +0.6 0.77 ± 24% +0.6 0.81 ± 23% perf-profile.children.cycles-pp.dup_task_struct
0.08 ± 41% +0.5 0.62 ± 19% +0.5 0.58 ± 26% +0.5 0.54 ± 45% perf-profile.children.cycles-pp.copy_strings
0.15 ± 20% +0.6 0.73 ± 16% +0.7 0.81 ± 24% +0.7 0.86 ± 22% perf-profile.children.cycles-pp.__vmalloc_area_node
0.16 ± 17% +0.6 0.74 ± 15% +0.7 0.82 ± 24% +0.7 0.87 ± 22% perf-profile.children.cycles-pp.__vmalloc_node_range_noprof
0.19 ± 19% +0.6 0.81 ± 12% +0.8 0.96 ± 19% +0.7 0.86 ± 31% perf-profile.children.cycles-pp.bprm_execve
0.18 ± 20% +0.6 0.81 ± 12% +0.8 0.95 ± 20% +0.7 0.85 ± 31% perf-profile.children.cycles-pp.exec_binprm
0.18 ± 20% +0.6 0.81 ± 12% +0.8 0.95 ± 20% +0.7 0.85 ± 31% perf-profile.children.cycles-pp.search_binary_handler
0.18 ± 21% +0.6 0.81 ± 12% +0.8 0.95 ± 20% +0.7 0.85 ± 31% perf-profile.children.cycles-pp.load_elf_binary
0.10 ± 54% +0.7 0.81 ± 26% +0.6 0.75 ± 22% +0.7 0.82 ± 29% perf-profile.children.cycles-pp.copy_string_kernel
0.44 ±141% +0.7 1.15 ±100% +1.4 1.87 ± 71% +1.1 1.57 ± 86% perf-profile.children.cycles-pp.do_swap_page
0.24 ± 10% +0.7 0.98 ± 15% +0.9 1.15 ± 18% +0.9 1.17 ± 22% perf-profile.children.cycles-pp.kernel_clone
0.23 ± 12% +0.7 0.98 ± 15% +0.9 1.15 ± 18% +0.9 1.17 ± 22% perf-profile.children.cycles-pp.copy_process
0.16 ± 22% +0.8 0.95 ± 19% +1.1 1.26 ± 17% +1.0 1.11 ± 19% perf-profile.children.cycles-pp._Fork
0.09 ± 28% +0.9 0.96 ± 16% +0.8 0.89 ± 19% +0.8 0.91 ± 28% perf-profile.children.cycles-pp.__pud_alloc
0.32 ± 8% +0.9 1.27 ± 7% +0.9 1.23 ± 4% +0.9 1.19 ± 5% perf-profile.children.cycles-pp.menu_select
0.18 ± 38% +1.3 1.43 ± 20% +1.2 1.33 ± 19% +1.2 1.36 ± 17% perf-profile.children.cycles-pp.get_arg_page
0.18 ± 37% +1.3 1.43 ± 20% +1.2 1.33 ± 19% +1.2 1.36 ± 17% perf-profile.children.cycles-pp.__get_user_pages
0.18 ± 37% +1.3 1.43 ± 20% +1.2 1.33 ± 19% +1.2 1.36 ± 17% perf-profile.children.cycles-pp.get_user_pages_remote
0.39 ± 43% +1.4 1.82 ± 8% +1.5 1.93 ± 13% +1.3 1.68 ± 10% perf-profile.children.cycles-pp.wp_page_copy
0.53 ± 14% +1.9 2.48 ± 11% +2.1 2.65 ± 13% +2.0 2.58 ± 17% perf-profile.children.cycles-pp.execve
0.53 ± 15% +1.9 2.48 ± 11% +2.1 2.65 ± 13% +2.1 2.58 ± 17% perf-profile.children.cycles-pp.do_execveat_common
0.53 ± 15% +1.9 2.48 ± 11% +2.1 2.65 ± 13% +2.1 2.58 ± 17% perf-profile.children.cycles-pp.__x64_sys_execve
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.1 8.88 ± 3% perf-profile.children.cycles-pp.kthread
2.28 ± 5% +3.2 5.48 ± 11% +2.9 5.20 ± 6% +2.8 5.11 ± 4% perf-profile.children.cycles-pp.intel_idle
5.84 +3.3 9.16 ± 7% +3.7 9.50 ± 3% +3.4 9.19 ± 3% perf-profile.children.cycles-pp.ret_from_fork
5.84 +3.4 9.20 ± 6% +3.7 9.54 ± 3% +3.4 9.25 ± 3% perf-profile.children.cycles-pp.ret_from_fork_asm
1.46 ± 7% +3.5 4.97 ± 8% +3.9 5.32 ± 11% +3.7 5.12 ± 12% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
1.46 ± 7% +3.5 4.97 ± 8% +3.9 5.32 ± 11% +3.7 5.12 ± 12% perf-profile.children.cycles-pp.do_syscall_64
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.children.cycles-pp.balance_pgdat
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +3.8 8.66 ± 3% perf-profile.children.cycles-pp.kswapd
68.16 +4.2 72.34 +3.7 71.89 ± 2% +4.1 72.25 ± 2% perf-profile.children.cycles-pp.vma_alloc_folio_noprof
6.72 ± 2% +4.5 11.21 ± 9% +3.9 10.60 ± 4% +3.7 10.38 ± 3% perf-profile.children.cycles-pp.start_secondary
6.94 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.69 ± 3% perf-profile.children.cycles-pp.common_startup_64
6.94 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.69 ± 3% perf-profile.children.cycles-pp.cpu_startup_entry
6.93 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.69 ± 3% perf-profile.children.cycles-pp.do_idle
68.51 +5.0 73.50 +5.2 73.70 +5.3 73.82 perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
3.78 +5.5 9.32 ± 9% +5.0 8.82 ± 5% +4.9 8.67 ± 3% perf-profile.children.cycles-pp.cpuidle_enter_state
3.80 +5.6 9.39 ± 9% +5.1 8.87 ± 5% +4.9 8.74 ± 3% perf-profile.children.cycles-pp.cpuidle_enter
4.70 +6.4 11.09 ± 8% +5.8 10.52 ± 4% +5.6 10.33 ± 3% perf-profile.children.cycles-pp.cpuidle_idle_call
69.40 +7.4 76.85 +8.2 77.62 +8.5 77.86 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
69.44 +8.1 77.55 +8.9 78.37 +9.1 78.58 perf-profile.children.cycles-pp.__alloc_pages_noprof
68.72 +8.7 77.46 +9.6 78.30 +9.8 78.51 perf-profile.children.cycles-pp.__alloc_pages_slowpath
65.97 +11.3 77.23 +12.1 78.09 +12.3 78.30 perf-profile.children.cycles-pp.try_to_free_pages
65.66 +11.5 77.18 +12.4 78.05 +12.6 78.26 perf-profile.children.cycles-pp.do_try_to_free_pages
70.51 +15.3 85.80 +16.4 86.90 +16.4 86.92 perf-profile.children.cycles-pp.shrink_node
68.95 +16.8 85.72 +17.9 86.84 +17.9 86.86 perf-profile.children.cycles-pp.shrink_many
68.92 +16.8 85.71 +17.9 86.83 +17.9 86.86 perf-profile.children.cycles-pp.shrink_one
68.42 +17.3 85.68 +18.4 86.80 +18.4 86.83 perf-profile.children.cycles-pp.try_to_shrink_lruvec
68.37 +17.3 85.67 +18.4 86.80 +18.5 86.83 perf-profile.children.cycles-pp.evict_folios
64.64 +20.7 85.30 ± 2% +22.0 86.63 +22.0 86.66 perf-profile.children.cycles-pp.shrink_folio_list
43.46 +39.8 83.30 ± 2% +41.4 84.85 +41.4 84.89 perf-profile.children.cycles-pp.try_to_unmap_flush_dirty
43.44 +39.9 83.30 ± 2% +41.4 84.85 +41.5 84.89 perf-profile.children.cycles-pp.arch_tlbbatch_flush
43.35 +40.0 83.33 ± 2% +41.5 84.87 +41.6 84.93 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
43.34 +40.0 83.33 ± 2% +41.5 84.87 +41.6 84.93 perf-profile.children.cycles-pp.smp_call_function_many_cond
5.95 ± 4% -4.9 1.04 ± 7% -4.9 1.08 ± 2% -5.0 0.97 ± 3% perf-profile.self.cycles-pp.llist_add_batch
4.70 ± 4% -4.3 0.41 ±142% -4.6 0.15 ± 11% -4.6 0.15 ± 8% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
4.45 ± 2% -3.8 0.64 ± 4% -3.8 0.65 ± 4% -3.8 0.65 ± 3% perf-profile.self.cycles-pp.llist_reverse_order
4.31 ± 5% -3.8 0.53 ± 14% -3.9 0.40 ± 16% -3.9 0.46 ± 12% perf-profile.self.cycles-pp.do_rw_once
3.65 ± 2% -3.0 0.66 ± 4% -3.0 0.65 ± 3% -3.0 0.63 ± 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
3.14 ± 9% -2.8 0.36 ± 4% -2.8 0.36 ± 4% -2.8 0.35 ± 3% perf-profile.self.cycles-pp.flush_tlb_func
2.56 ± 5% -2.3 0.30 ± 16% -2.3 0.23 ± 18% -2.3 0.26 ± 12% perf-profile.self.cycles-pp.do_access
2.35 ± 4% -2.1 0.27 ± 6% -2.1 0.26 ± 5% -2.1 0.26 ± 5% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
2.39 -2.1 0.34 ± 5% -2.1 0.31 ± 6% -2.1 0.32 ± 4% perf-profile.self.cycles-pp.native_irq_return_iret
1.83 ± 3% -1.7 0.09 ± 10% -1.7 0.09 ± 7% -1.7 0.09 ± 8% perf-profile.self.cycles-pp.native_flush_tlb_local
1.92 ± 3% -1.7 0.24 ± 4% -1.7 0.25 ± 7% -1.7 0.25 ± 7% perf-profile.self.cycles-pp.page_counter_try_charge
1.69 ± 4% -1.4 0.31 ± 11% -1.4 0.26 ± 9% -1.4 0.28 ± 7% perf-profile.self.cycles-pp._raw_spin_lock
1.12 ± 14% -1.0 0.10 ± 6% -1.0 0.10 ± 7% -1.0 0.10 ± 8% perf-profile.self.cycles-pp.set_tlb_ubc_flush_pending
0.99 ± 6% -0.9 0.10 ± 10% -0.9 0.09 ± 8% -0.9 0.09 ± 8% perf-profile.self.cycles-pp.try_to_unmap_one
0.98 ± 7% -0.9 0.12 ± 11% -0.9 0.10 ± 16% -0.9 0.11 ± 11% perf-profile.self.cycles-pp.try_charge_memcg
0.94 ± 5% -0.8 0.10 ± 4% -0.8 0.11 ± 8% -0.8 0.11 ± 11% perf-profile.self.cycles-pp.mem_cgroup_id_get_online
0.75 ± 13% -0.7 0.07 ± 34% -0.7 0.04 ± 57% -0.7 0.04 ± 66% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.74 ± 4% -0.7 0.08 ± 17% -0.7 0.06 ± 14% -0.7 0.07 ± 12% perf-profile.self.cycles-pp.sync_regs
0.75 ± 5% -0.7 0.09 ± 27% -0.7 0.07 ± 11% -0.7 0.07 ± 6% perf-profile.self.cycles-pp.shrink_folio_list
0.76 ± 9% -0.7 0.10 ± 9% -0.7 0.10 ± 7% -0.7 0.10 ± 8% perf-profile.self.cycles-pp._find_next_bit
0.63 ± 3% -0.6 0.07 ± 16% -0.6 0.06 ± 10% -0.6 0.06 ± 10% perf-profile.self.cycles-pp.swap_writepage
0.57 ± 9% -0.5 0.04 ± 72% -0.5 0.02 ±100% -0.5 0.03 ± 81% perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
0.47 ± 7% -0.4 0.02 ± 99% -0.5 0.02 ±129% -0.5 0.02 ±122% perf-profile.self.cycles-pp.rmqueue_bulk
0.56 ± 4% -0.4 0.13 ± 5% -0.4 0.13 ± 6% -0.4 0.12 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.48 ± 6% -0.4 0.05 ± 7% -0.4 0.06 ± 26% -0.4 0.06 ± 9% perf-profile.self.cycles-pp.swap_cgroup_record
0.45 ± 6% -0.4 0.04 ±115% -0.4 0.00 -0.4 0.01 ±200% perf-profile.self.cycles-pp.lru_gen_add_folio
0.46 ± 2% -0.4 0.05 ± 90% -0.4 0.01 ±173% -0.4 0.02 ±122% perf-profile.self.cycles-pp.lru_gen_del_folio
0.40 ± 6% -0.4 0.04 ± 71% -0.4 0.01 ±173% -0.4 0.02 ±152% perf-profile.self.cycles-pp.get_page_from_freelist
0.38 ± 3% -0.3 0.04 ± 45% -0.4 0.00 ±387% -0.4 0.01 ±299% perf-profile.self.cycles-pp.do_anonymous_page
0.28 ± 8% -0.2 0.07 ± 12% -0.2 0.07 ± 11% -0.2 0.06 ± 7% perf-profile.self.cycles-pp.error_entry
0.38 ± 6% -0.2 0.19 ± 5% -0.2 0.18 ± 6% -0.2 0.17 ± 6% perf-profile.self.cycles-pp.native_sched_clock
0.25 ± 10% -0.2 0.06 ± 13% -0.2 0.05 ± 28% -0.2 0.06 ± 10% perf-profile.self.cycles-pp.update_sg_lb_stats
0.23 ± 7% -0.1 0.08 ± 10% -0.1 0.10 ±113% -0.2 0.07 ± 12% perf-profile.self.cycles-pp.memcpy_toio
0.19 ± 11% -0.1 0.06 ± 13% -0.1 0.07 ±105% -0.1 0.05 ± 9% perf-profile.self.cycles-pp.io_serial_in
0.16 ± 8% -0.1 0.06 ± 17% -0.1 0.06 ± 6% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.asm_sysvec_call_function
0.11 ± 11% -0.1 0.04 ± 44% -0.1 0.01 ±173% -0.1 0.01 ±200% perf-profile.self.cycles-pp.irqentry_enter
0.17 ± 8% -0.1 0.10 ± 8% -0.1 0.10 ± 8% -0.1 0.10 ± 5% perf-profile.self.cycles-pp.irqtime_account_irq
0.09 ± 7% -0.0 0.05 ± 8% -0.0 0.05 ± 38% -0.0 0.04 ± 33% perf-profile.self.cycles-pp.sched_core_idle_cpu
0.00 +0.0 0.00 +0.0 0.00 +0.1 0.09 ± 8% perf-profile.self.cycles-pp.should_flush_tlb
0.03 ± 70% +0.0 0.08 ± 6% +0.0 0.08 ± 7% +0.0 0.07 ± 6% perf-profile.self.cycles-pp.read_tsc
0.00 +0.1 0.05 ± 49% +0.1 0.06 ± 15% +0.1 0.06 ± 9% perf-profile.self.cycles-pp.intel_idle_irq
0.00 +0.1 0.05 ± 8% +0.0 0.04 ± 48% +0.0 0.04 ± 50% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.00 +0.1 0.06 ± 11% +0.1 0.06 ± 9% +0.1 0.05 ± 8% perf-profile.self.cycles-pp.tick_nohz_stop_tick
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 8% +0.1 0.06 ± 11% perf-profile.self.cycles-pp.cpuidle_enter
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 8% +0.1 0.06 ± 8% perf-profile.self.cycles-pp.nr_iowait_cpu
0.07 ± 12% +0.1 0.14 ± 11% +0.1 0.13 ± 8% +0.1 0.13 ± 5% perf-profile.self.cycles-pp.ktime_get
0.00 +0.1 0.06 ± 11% +0.1 0.06 ± 9% +0.1 0.06 ± 12% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.07 ± 11% +0.1 0.06 ± 10% +0.1 0.06 ± 8% perf-profile.self.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.1 0.07 ± 13% +0.1 0.06 ± 13% +0.1 0.06 ± 12% perf-profile.self.cycles-pp.ct_kernel_enter
0.00 +0.1 0.07 ± 10% +0.1 0.07 ± 9% +0.1 0.07 ± 7% perf-profile.self.cycles-pp.tick_nohz_next_event
0.00 +0.1 0.08 ± 10% +0.1 0.08 ± 9% +0.1 0.08 ± 6% perf-profile.self.cycles-pp.get_cpu_device
0.00 +0.1 0.09 ± 4% +0.1 0.09 ± 9% +0.1 0.08 ± 11% perf-profile.self.cycles-pp.tick_irq_enter
0.01 ±223% +0.1 0.10 ± 9% +0.1 0.10 ± 9% +0.1 0.10 ± 6% perf-profile.self.cycles-pp.__get_next_timer_interrupt
0.00 +0.1 0.10 ± 11% +0.1 0.09 ± 9% +0.1 0.09 ± 6% perf-profile.self.cycles-pp.cpuidle_idle_call
0.00 +0.1 0.10 ± 9% +0.1 0.10 ± 8% +0.1 0.10 ± 9% perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.02 ± 99% +0.3 0.30 ± 19% +0.3 0.30 ± 14% +0.3 0.30 ± 13% perf-profile.self.cycles-pp.poll_idle
0.13 ± 8% +0.4 0.50 ± 8% +0.4 0.49 ± 4% +0.3 0.47 ± 4% perf-profile.self.cycles-pp.menu_select
0.11 ± 13% +0.6 0.69 ± 9% +0.5 0.64 ± 4% +0.5 0.60 ± 5% perf-profile.self.cycles-pp.cpuidle_enter_state
2.28 ± 5% +3.2 5.48 ± 11% +2.9 5.20 ± 6% +2.8 5.11 ± 4% perf-profile.self.cycles-pp.intel_idle
24.40 +56.1 80.53 ± 2% +57.6 82.01 +57.7 82.14 perf-profile.self.cycles-pp.smp_call_function_many_cond
>
> ---8<---
>
> From b639c1f16ddf4bcfc44dbaa2b8077220f88b1876 Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@fb.com>
> Date: Mon, 2 Dec 2024 09:57:31 -0800
> Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
>
> Setting and clearing CPU bits in the mm_cpumask is only ever done
> by the CPU itself, from the context switch code or the TLB flush
> code.
>
> Synchronization is handled by switch_mm_irqs_off blocking interrupts.
>
> Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
> longer running the program causes a regression in the will-it-scale
> tlbflush2 test. This test is contrived, but a large regression here
> might cause a small regression in some real world workload.
>
> Instead of always sending IPIs to CPUs that are in the mm_cpumask,
> but no longer running the program, send these IPIs only once a second.
>
> The rest of the time we can skip over CPUs where the loaded_mm is
> different from the target mm.
>
> On a two socket system with 20 CPU cores on each socket (80 CPUs total),
> this patch, on top of the other context switch patches shows a 3.6%
> speedup in the total runtime of will-it-scale tlbflush2 -t 40 -s 100000.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
> ---
> arch/x86/include/asm/mmu.h | 2 ++
> arch/x86/include/asm/mmu_context.h | 1 +
> arch/x86/mm/tlb.c | 25 ++++++++++++++++++++++---
> 3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index ce4677b8b735..2c7e3855b88b 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -37,6 +37,8 @@ typedef struct {
> */
> atomic64_t tlb_gen;
>
> + unsigned long last_trimmed_cpumask;
> +
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> struct rw_semaphore ldt_usr_sem;
> struct ldt_struct *ldt;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 8dac45a2c7fc..428fd190477a 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -145,6 +145,7 @@ static inline int init_new_context(struct task_struct *tsk,
>
> mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
> atomic64_set(&mm->context.tlb_gen, 0);
> + mm->context.last_trimmed_cpumask = jiffies;
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index fcea29e07eed..0ce5f2ed7825 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -766,6 +766,7 @@ static void flush_tlb_func(void *info)
> */
> if (f->mm && f->mm != loaded_mm) {
> cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
> + f->mm->context.last_trimmed_cpumask = jiffies;
> return;
> }
> }
> @@ -897,9 +898,27 @@ static void flush_tlb_func(void *info)
> nr_invalidate);
> }
>
> -static bool tlb_is_not_lazy(int cpu, void *data)
> +static bool should_flush_tlb(int cpu, void *data)
> {
> - return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
> + struct flush_tlb_info *info = data;
> +
> + /* Lazy TLB will get flushed at the next context switch. */
> + if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
> + return false;
> +
> + /* No mm means kernel memory flush. */
> + if (!info->mm)
> + return true;
> +
> + /* The target mm is loaded, and the CPU is not lazy. */
> + if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
> + return true;
> +
> + /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
> + if (jiffies > info->mm->context.last_trimmed_cpumask + HZ)
> + return true;
> +
> + return false;
> }
>
> DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
> @@ -933,7 +952,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
> if (info->freed_tables)
> on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
> else
> - on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
> + on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
> (void *)info, 1, cpumask);
> }
>
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86,mm: only trim the mm_cpumask once a second
2024-12-04 13:15 ` Oliver Sang
@ 2024-12-04 16:07 ` Rik van Riel
2024-12-04 16:56 ` [PATCH v3] " Rik van Riel
1 sibling, 0 replies; 24+ messages in thread
From: Rik van Riel @ 2024-12-04 16:07 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman, Mathieu Desnoyers
On Wed, 2024-12-04 at 21:15 +0800, Oliver Sang wrote:
>
> we tested this patch, unfortunately, we found even bigger regression
> in our
> will-it-scale tests. and for another vm-scalability test, it also
> causes a
> little worse performance.
>
> we noticed there is the v2 for this patch, not sure if any
> significant changes
> which could impact performance? if so, please notify us and we could
> test
> further. thanks
>
> below is details.
Looking at the profile, it looks like:
1) switch_mm_irqs_off is somehow taking more
CPU time after these changes, despite
removing an unconditional atomic set_bit.
I have no good explanation for this.
2) Moving some overhead from the fast path
in the context switch patch (switch_mm_irqs_off)
to the slower path in flush_tlb_func isn't
right for the tlb_flush2 threaded test,
which basically only does madvise and
TLB flushes :)
However, I think we can reduce the overhead
in the TLB flush side a little more, by moving
the jiffies test from flush_tlb_func into the
calling function flush_tlb_mm_range, so the
jiffies comparison is only ever done on the
calling CPU, not on all the CPUs that receive
the IPIs.
Let me send over a v3 in a little bit.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3] x86,mm: only trim the mm_cpumask once a second
2024-12-04 13:15 ` Oliver Sang
2024-12-04 16:07 ` Rik van Riel
@ 2024-12-04 16:56 ` Rik van Riel
2024-12-04 20:19 ` Mathieu Desnoyers
1 sibling, 1 reply; 24+ messages in thread
From: Rik van Riel @ 2024-12-04 16:56 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman, Mathieu Desnoyers
On Wed, 4 Dec 2024 21:15:24 +0800
Oliver Sang <oliver.sang@intel.com> wrote:
> we noticed there is the v2 for this patch, not sure if any significant changes
> which could impact performance? if so, please notify us and we could test
> further. thanks
To some extent, I suspect we should expect some regressions with the
will-it-scale tlb_flush2 threaded test, since for "normal" workloads
the context switch code is the fast path, and madvise is much less
common.
However, v3 of the patch (below) shifts a lot less work into
flush_tlb_func, where it is done by all CPUs, and does more of
that work on the calling CPU, where it is done only once, instead.
For performance, I'm just going to throw it over to you, because
the largest 2 socket systems I have access to do not seem to behave
like your (much larger) 2 socket system.
---8<---
From 3118ddb2260bd92a8b0679b7e6fd51ee494c17c9 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Mon, 2 Dec 2024 09:57:31 -0800
Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.
Synchronization is handled by switch_mm_irqs_off blocking interrupts.
Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program causes a regression in the will-it-scale
tlbflush2 test. This test is contrived, but a large regression here
might cause a small regression in some real world workload.
Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.
The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
arch/x86/include/asm/mmu.h | 2 ++
arch/x86/include/asm/mmu_context.h | 1 +
arch/x86/include/asm/tlbflush.h | 1 +
arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++---
4 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..3b496cdcb74b 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
*/
atomic64_t tlb_gen;
+ unsigned long next_trim_cpumask;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct rw_semaphore ldt_usr_sem;
struct ldt_struct *ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2886cb668d7f..795fdd53bd0a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
+ mm->context.next_trim_cpumask = jiffies + HZ;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..02fc2aa06e9e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -222,6 +222,7 @@ struct flush_tlb_info {
unsigned int initiating_cpu;
u8 stride_shift;
u8 freed_tables;
+ u8 trim_cpumask;
};
void flush_tlb_local(void);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1aac4fa90d3d..a758143afa01 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -892,9 +892,36 @@ static void flush_tlb_func(void *info)
nr_invalidate);
}
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
{
- return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+ struct flush_tlb_info *info = data;
+
+ /* Lazy TLB will get flushed at the next context switch. */
+ if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+ return false;
+
+ /* No mm means kernel memory flush. */
+ if (!info->mm)
+ return true;
+
+ /* The target mm is loaded, and the CPU is not lazy. */
+ if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ return true;
+
+ /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+ if (info->trim_cpumask)
+ return true;
+
+ return false;
+}
+
+static bool should_trim_cpumask(struct mm_struct *mm)
+{
+ if (time_after(jiffies, mm->context.next_trim_cpumask)) {
+ mm->context.next_trim_cpumask = jiffies + HZ;
+ return true;
+ }
+ return false;
}
DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -928,7 +955,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
if (info->freed_tables)
on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
else
- on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+ on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
}
@@ -979,6 +1006,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
info->freed_tables = freed_tables;
info->new_tlb_gen = new_tlb_gen;
info->initiating_cpu = smp_processor_id();
+ info->trim_cpumask = 0;
return info;
}
@@ -1021,6 +1049,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
* flush_tlb_func_local() directly in this case.
*/
if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
+ info->trim_cpumask = should_trim_cpumask(mm);
flush_tlb_multi(mm_cpumask(mm), info);
} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
lockdep_assert_irqs_enabled();
--
2.47.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3] x86,mm: only trim the mm_cpumask once a second
2024-12-04 16:56 ` [PATCH v3] " Rik van Riel
@ 2024-12-04 20:19 ` Mathieu Desnoyers
2024-12-05 2:03 ` [PATCH v4] " Rik van Riel
0 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2024-12-04 20:19 UTC (permalink / raw)
To: Rik van Riel, Oliver Sang
Cc: oe-lkp, lkp, linux-kernel, x86, Ingo Molnar, Dave Hansen,
Linus Torvalds, Peter Zijlstra, Mel Gorman
On 2024-12-04 11:56, Rik van Riel wrote:
[...]
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
> ---
> arch/x86/include/asm/mmu.h | 2 ++
> arch/x86/include/asm/mmu_context.h | 1 +
> arch/x86/include/asm/tlbflush.h | 1 +
> arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++---
> 4 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index ce4677b8b735..3b496cdcb74b 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -37,6 +37,8 @@ typedef struct {
> */
> atomic64_t tlb_gen;
>
> + unsigned long next_trim_cpumask;
> +
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> struct rw_semaphore ldt_usr_sem;
> struct ldt_struct *ldt;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 2886cb668d7f..795fdd53bd0a 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
>
> mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
> atomic64_set(&mm->context.tlb_gen, 0);
> + mm->context.next_trim_cpumask = jiffies + HZ;
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 69e79fff41b8..02fc2aa06e9e 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -222,6 +222,7 @@ struct flush_tlb_info {
> unsigned int initiating_cpu;
> u8 stride_shift;
> u8 freed_tables;
> + u8 trim_cpumask;
> };
>
> void flush_tlb_local(void);
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 1aac4fa90d3d..a758143afa01 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -892,9 +892,36 @@ static void flush_tlb_func(void *info)
> nr_invalidate);
> }
>
> -static bool tlb_is_not_lazy(int cpu, void *data)
> +static bool should_flush_tlb(int cpu, void *data)
> {
> - return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
> + struct flush_tlb_info *info = data;
> +
> + /* Lazy TLB will get flushed at the next context switch. */
> + if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
> + return false;
> +
> + /* No mm means kernel memory flush. */
> + if (!info->mm)
> + return true;
> +
> + /* The target mm is loaded, and the CPU is not lazy. */
> + if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
> + return true;
> +
> + /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
> + if (info->trim_cpumask)
> + return true;
> +
> + return false;
> +}
> +
> +static bool should_trim_cpumask(struct mm_struct *mm)
> +{
> + if (time_after(jiffies, mm->context.next_trim_cpumask)) {
> + mm->context.next_trim_cpumask = jiffies + HZ;
AFAIU this should_trim_cpumask can be called from many cpus
concurrently for a given mm, so we'd want READ_ONCE/WRITE_ONCE
on the next_trim_cpumask.
Thanks,
Mathieu
> + return true;
> + }
> + return false;
> }
>
> DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
> @@ -928,7 +955,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
> if (info->freed_tables)
> on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
> else
> - on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
> + on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
> (void *)info, 1, cpumask);
> }
>
> @@ -979,6 +1006,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
> info->freed_tables = freed_tables;
> info->new_tlb_gen = new_tlb_gen;
> info->initiating_cpu = smp_processor_id();
> + info->trim_cpumask = 0;
>
> return info;
> }
> @@ -1021,6 +1049,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> * flush_tlb_func_local() directly in this case.
> */
> if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
> + info->trim_cpumask = should_trim_cpumask(mm);
> flush_tlb_multi(mm_cpumask(mm), info);
> } else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
> lockdep_assert_irqs_enabled();
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* [PATCH v4] x86,mm: only trim the mm_cpumask once a second
2024-12-04 20:19 ` Mathieu Desnoyers
@ 2024-12-05 2:03 ` Rik van Riel
2024-12-06 1:30 ` Oliver Sang
2024-12-06 9:40 ` [tip: x86/mm] x86/mm/tlb: Only " tip-bot2 for Rik van Riel
0 siblings, 2 replies; 24+ messages in thread
From: Rik van Riel @ 2024-12-05 2:03 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Oliver Sang, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Mel Gorman
On Wed, 4 Dec 2024 15:19:46 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> AFAIU this should_trim_cpumask can be called from many cpus
> concurrently for a given mm, so we'd want READ_ONCE/WRITE_ONCE
> on the next_trim_cpumask.
Here is v4, which is identical to v3 except for READ_ONCE/WRITE_ONCE.
Looking forward to the test bot results, since the hardware I have
available does not seem to behave in quite the same way :)
---8<---
From 49af9b203e971d00c87b2d020f48602936870576 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@fb.com>
Date: Mon, 2 Dec 2024 09:57:31 -0800
Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.
Synchronization is handled by switch_mm_irqs_off blocking interrupts.
Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program, causes a regression in the will-it-scale
tlb_flush2 test. This test is contrived, but a large regression here
might cause a small regression in some real-world workload.
Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.
The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
arch/x86/include/asm/mmu.h | 2 ++
arch/x86/include/asm/mmu_context.h | 1 +
arch/x86/include/asm/tlbflush.h | 1 +
arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++---
4 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b8b735..3b496cdcb74b 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
*/
atomic64_t tlb_gen;
+ unsigned long next_trim_cpumask;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct rw_semaphore ldt_usr_sem;
struct ldt_struct *ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2886cb668d7f..795fdd53bd0a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
+ mm->context.next_trim_cpumask = jiffies + HZ;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..02fc2aa06e9e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -222,6 +222,7 @@ struct flush_tlb_info {
unsigned int initiating_cpu;
u8 stride_shift;
u8 freed_tables;
+ u8 trim_cpumask;
};
void flush_tlb_local(void);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1aac4fa90d3d..0507a6773a37 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -892,9 +892,36 @@ static void flush_tlb_func(void *info)
nr_invalidate);
}
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
{
- return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+ struct flush_tlb_info *info = data;
+
+ /* Lazy TLB will get flushed at the next context switch. */
+ if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+ return false;
+
+ /* No mm means kernel memory flush. */
+ if (!info->mm)
+ return true;
+
+ /* The target mm is loaded, and the CPU is not lazy. */
+ if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ return true;
+
+ /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+ if (info->trim_cpumask)
+ return true;
+
+ return false;
+}
+
+static bool should_trim_cpumask(struct mm_struct *mm)
+{
+ if (time_after(jiffies, READ_ONCE(mm->context.next_trim_cpumask))) {
+ WRITE_ONCE(mm->context.next_trim_cpumask, jiffies + HZ);
+ return true;
+ }
+ return false;
}
DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -928,7 +955,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
if (info->freed_tables)
on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
else
- on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+ on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
}
@@ -979,6 +1006,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
info->freed_tables = freed_tables;
info->new_tlb_gen = new_tlb_gen;
info->initiating_cpu = smp_processor_id();
+ info->trim_cpumask = 0;
return info;
}
@@ -1021,6 +1049,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
* flush_tlb_func_local() directly in this case.
*/
if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
+ info->trim_cpumask = should_trim_cpumask(mm);
flush_tlb_multi(mm_cpumask(mm), info);
} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
lockdep_assert_irqs_enabled();
--
2.47.0
* Re: [PATCH v4] x86,mm: only trim the mm_cpumask once a second
2024-12-05 2:03 ` [PATCH v4] " Rik van Riel
@ 2024-12-06 1:30 ` Oliver Sang
2024-12-06 9:40 ` [tip: x86/mm] x86/mm/tlb: Only " tip-bot2 for Rik van Riel
1 sibling, 0 replies; 24+ messages in thread
From: Oliver Sang @ 2024-12-06 1:30 UTC (permalink / raw)
To: Rik van Riel
Cc: Mathieu Desnoyers, oe-lkp, lkp, linux-kernel, x86, Ingo Molnar,
Dave Hansen, Linus Torvalds, Peter Zijlstra, Mel Gorman,
oliver.sang
hi, Rik van Riel,
On Wed, Dec 04, 2024 at 09:03:16PM -0500, Rik van Riel wrote:
> On Wed, 4 Dec 2024 15:19:46 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> > AFAIU this should_trim_cpumask can be called from many cpus
> > concurrently for a given mm, so we'd want READ_ONCE/WRITE_ONCE
> > on the next_trim_cpumask.
>
> Here is v4, which is identical to v3 except for READ_ONCE/WRITE_ONCE.
>
> Looking forward to the test bot results, since the hardware I have
> available does not seem to behave in quite the same way :)
thanks for waiting for our results!
unfortunately, we did not see the regression recover with this v4 patch,
for either test.
our bot still applies this patch on top of 2815a56e4b725, as below.
* 852ff7f2f791a x86,mm: only trim the mm_cpumask once a second <--- v4
* 2815a56e4b725 (tip/x86/mm) x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
* 209954cbc7d0c x86/mm/tlb: Update mm_cpumask lazily
* 7e33001b8b9a7 x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM
for will-it-scale (full comparison is in [1])
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
852ff7f2f7 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 852ff7f2f791aadd04317d1a53f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
7276 -13.2% 6315 -13.0% 6328 -14.9% 6191 will-it-scale.per_thread_ops
for vm-scalability (full comparison is in [2])
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq-mt/vm-scalability/always/never
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
852ff7f2f7 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 852ff7f2f791aadd04317d1a53f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
38311 ± 5% -40.8% 22667 -41.1% 22583 ± 2% -41.3% 22494 vm-scalability.median
1234132 ± 4% -40.7% 732265 -40.8% 730989 ± 3% -40.9% 729108 vm-scalability.throughput
[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
852ff7f2f7 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 852ff7f2f791aadd04317d1a53f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
3743 ± 6% -9.6% 3383 ± 10% -2.0% 3669 ± 6% -4.9% 3561 ± 11% numa-meminfo.node1.PageTables
18158 ± 2% -9.9% 16367 ± 2% -12.6% 15874 -14.9% 15449 uptime.idle
36.77 -1.2% 36.34 -1.3% 36.28 -0.9% 36.42 boot-time.boot
3503 -1.3% 3458 -1.5% 3452 -1.0% 3467 boot-time.idle
1.421e+10 -9.6% 1.284e+10 ± 2% -11.4% 1.259e+10 -14.2% 1.219e+10 cpuidle..time
2.595e+08 -12.9% 2.26e+08 -13.2% 2.251e+08 -18.7% 2.109e+08 ± 2% cpuidle..usage
20954 ± 17% -17.0% 17391 ± 2% -14.2% 17979 ± 12% -17.6% 17255 perf-c2c.DRAM.remote
18165 ± 17% -17.7% 14957 ± 2% -15.1% 15413 ± 12% -18.7% 14774 perf-c2c.HITM.remote
44864 ± 17% -15.4% 37953 -12.4% 39320 ± 12% -15.9% 37727 perf-c2c.HITM.total
44.91 -9.3% 40.74 -10.5% 40.20 -12.5% 39.29 vmstat.cpu.id
695438 -8.1% 638790 -7.4% 644221 -8.5% 636573 vmstat.system.cs
4553480 -3.5% 4393928 -2.6% 4436365 -4.0% 4371742 vmstat.system.in
44.57 -4.3 40.31 -4.7 39.83 -5.7 38.86 mpstat.cpu.all.idle%
9.85 +6.1 15.94 +6.3 16.17 +7.6 17.41 mpstat.cpu.all.irq%
0.10 +0.0 0.12 +0.0 0.13 +0.0 0.12 mpstat.cpu.all.soft%
2.34 ± 2% -0.3 2.02 -0.3 2.06 -0.3 2.03 mpstat.cpu.all.usr%
1.139e+08 ± 2% -14.5% 97376097 ± 3% -12.7% 99390724 ± 3% -13.3% 98785729 ± 3% numa-numastat.node0.local_node
1.139e+08 ± 2% -14.5% 97404595 ± 3% -12.7% 99439249 ± 3% -13.2% 98835438 ± 3% numa-numastat.node0.numa_hit
1.146e+08 -11.6% 1.013e+08 ± 2% -13.0% 99664033 ± 3% -16.3% 95955617 ± 3% numa-numastat.node1.local_node
1.146e+08 -11.6% 1.013e+08 ± 2% -13.0% 99724983 ± 3% -16.3% 96015424 ± 3% numa-numastat.node1.numa_hit
756738 -13.2% 656838 -13.0% 658224 -14.9% 643961 will-it-scale.104.threads
43.82 -9.5% 39.67 -9.6% 39.62 -11.8% 38.64 will-it-scale.104.threads_idle
7276 -13.2% 6315 -13.0% 6328 -14.9% 6191 will-it-scale.per_thread_ops
756738 -13.2% 656838 -13.0% 658224 -14.9% 643961 will-it-scale.workload
1.139e+08 ± 2% -14.5% 97404162 ± 3% -12.7% 99438988 ± 3% -13.2% 98835133 ± 3% numa-vmstat.node0.numa_hit
1.139e+08 ± 2% -14.5% 97375664 ± 3% -12.7% 99390464 ± 3% -13.3% 98785428 ± 3% numa-vmstat.node0.numa_local
936.25 ± 6% -9.7% 845.81 ± 10% -2.0% 917.10 ± 6% -4.9% 890.29 ± 11% numa-vmstat.node1.nr_page_table_pages
1.146e+08 -11.6% 1.013e+08 ± 2% -13.0% 99724221 ± 3% -16.3% 96014486 ± 3% numa-vmstat.node1.numa_hit
1.146e+08 -11.6% 1.012e+08 ± 2% -13.0% 99663271 ± 3% -16.3% 95954678 ± 3% numa-vmstat.node1.numa_local
0.17 ± 5% -14.8% 0.14 ± 11% -9.2% 0.15 ± 6% -9.6% 0.15 ± 9% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.13 ± 5% -17.0% 0.11 ± 9% -11.2% 0.11 ± 8% -14.9% 0.11 ± 14% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
41283 ± 5% +22.8% 50696 ± 13% +13.2% 46723 ± 7% +14.6% 47323 ± 11% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
8372 ± 13% +22.1% 10221 ± 9% +18.0% 9882 ± 8% +26.4% 10587 ± 16% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.16 ± 5% -15.3% 0.14 ± 12% -9.6% 0.14 ± 6% -10.2% 0.14 ± 10% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.12 ± 6% -17.5% 0.10 ± 8% -11.9% 0.11 ± 8% -15.9% 0.10 ± 14% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
35873620 ± 30% -22.4% 27821786 ± 33% -40.9% 21206791 ± 26% -19.3% 28944541 ± 12% sched_debug.cfs_rq:/.avg_vruntime.max
3.14 ± 4% +27.4% 4.00 ± 6% +21.1% 3.80 ± 12% +26.4% 3.97 ± 9% sched_debug.cfs_rq:/.load_avg.min
35873620 ± 30% -22.4% 27821786 ± 33% -40.9% 21206791 ± 26% -19.3% 28944541 ± 12% sched_debug.cfs_rq:/.min_vruntime.max
82954 ± 8% +9.9% 91143 ± 7% +9.2% 90574 ± 7% +16.9% 96981 ± 5% sched_debug.cpu.avg_idle.min
1598 ± 3% +53.8% 2458 ± 16% +50.1% 2398 ± 12% +93.2% 3087 ± 8% sched_debug.cpu.clock_task.stddev
1005828 -7.9% 925907 -7.9% 926693 -9.1% 914560 sched_debug.cpu.nr_switches.avg
958802 -8.9% 873208 -8.8% 874819 -10.7% 856154 sched_debug.cpu.nr_switches.min
432388 ± 2% -2.1% 423334 -1.7% 425005 -2.0% 423845 proc-vmstat.nr_active_anon
261209 ± 3% -3.7% 251458 -2.9% 253702 ± 2% -3.3% 252550 proc-vmstat.nr_shmem
105930 +0.2% 106091 -0.0% 105903 -2.2% 103599 proc-vmstat.nr_slab_unreclaimable
432388 ± 2% -2.1% 423334 -1.7% 425005 -2.0% 423845 proc-vmstat.nr_zone_active_anon
2.286e+08 -13.1% 1.987e+08 -12.9% 1.992e+08 -14.8% 1.947e+08 proc-vmstat.numa_hit
2.285e+08 -13.1% 1.986e+08 -12.9% 1.99e+08 -14.8% 1.946e+08 proc-vmstat.numa_local
2.287e+08 -13.0% 1.988e+08 -12.9% 1.993e+08 -14.8% 1.948e+08 proc-vmstat.pgalloc_normal
4.559e+08 -13.1% 3.962e+08 -12.9% 3.971e+08 -14.8% 3.882e+08 proc-vmstat.pgfault
2.283e+08 -13.1% 1.985e+08 -12.9% 1.989e+08 -14.8% 1.945e+08 proc-vmstat.pgfree
5.74 -5.3% 5.43 -3.1% 5.56 ± 2% -6.0% 5.39 ± 2% perf-stat.i.MPKI
5.392e+09 -6.9% 5.019e+09 -5.7% 5.084e+09 -6.4% 5.049e+09 perf-stat.i.branch-instructions
2.80 +0.0 2.83 +0.0 2.83 +0.0 2.85 perf-stat.i.branch-miss-rate%
1.509e+08 -5.8% 1.421e+08 -4.4% 1.443e+08 -4.2% 1.445e+08 perf-stat.i.branch-misses
24.36 -1.4 22.92 -1.1 23.28 -1.8 22.61 perf-stat.i.cache-miss-rate%
1.538e+08 -12.1% 1.351e+08 -9.2% 1.396e+08 ± 2% -12.7% 1.343e+08 ± 2% perf-stat.i.cache-misses
6.321e+08 -6.4% 5.915e+08 -4.4% 6.041e+08 -5.2% 5.993e+08 perf-stat.i.cache-references
702183 -8.3% 644080 -7.6% 648563 -8.7% 641354 perf-stat.i.context-switches
6.24 +19.0% 7.42 +18.9% 7.42 +21.8% 7.60 perf-stat.i.cpi
1.672e+11 +10.1% 1.841e+11 +10.9% 1.854e+11 +12.8% 1.886e+11 perf-stat.i.cpu-cycles
550.50 +2.6% 565.02 +3.0% 566.88 +4.6% 575.75 perf-stat.i.cpu-migrations
1085 +25.0% 1356 +22.1% 1325 ± 2% +29.1% 1401 ± 2% perf-stat.i.cycles-between-cache-misses
2.683e+10 -7.0% 2.494e+10 -5.8% 2.528e+10 -6.4% 2.511e+10 perf-stat.i.instructions
0.17 -14.7% 0.14 ± 2% -15.0% 0.14 -17.1% 0.14 perf-stat.i.ipc
0.00 ±141% +265.0% 0.00 ± 33% +348.2% 0.00 ± 78% +451.7% 0.01 ± 59% perf-stat.i.major-faults
35.60 -12.2% 31.27 -11.5% 31.52 -13.2% 30.91 perf-stat.i.metric.K/sec
1500379 -13.1% 1304071 -12.4% 1314966 -14.3% 1286573 perf-stat.i.minor-faults
1500379 -13.1% 1304071 -12.4% 1314966 -14.3% 1286573 perf-stat.i.page-faults
2.33 ± 44% +0.5 2.83 +0.5 2.84 +0.5 2.86 perf-stat.overall.branch-miss-rate%
5.19 ± 44% +42.2% 7.37 +41.3% 7.33 +44.7% 7.51 perf-stat.overall.cpi
905.91 ± 44% +50.4% 1362 +46.7% 1328 ± 2% +55.1% 1405 ± 2% perf-stat.overall.cycles-between-cache-misses
8967486 ± 44% +28.7% 11541403 +29.1% 11576850 +30.9% 11738346 perf-stat.overall.path-length
1.387e+11 ± 44% +32.2% 1.835e+11 +33.2% 1.848e+11 +35.5% 1.88e+11 perf-stat.ps.cpu-cycles
457.08 ± 44% +23.1% 562.85 +23.6% 564.75 +25.5% 573.58 perf-stat.ps.cpu-migrations
70.53 -6.7 63.83 -6.8 63.71 -6.8 63.78 perf-profile.calltrace.cycles-pp.__madvise
68.82 -6.4 62.40 -6.5 62.29 -6.5 62.35 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
68.63 -6.4 62.23 -6.5 62.12 -6.4 62.18 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
68.38 -6.4 62.02 -6.5 61.92 -6.4 61.97 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
68.36 -6.4 62.01 -6.5 61.90 -6.4 61.95 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
54.54 -3.9 50.68 -3.7 50.80 -3.5 51.02 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.49 -3.8 50.64 -3.7 50.76 -3.5 50.98 perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
20.77 -3.8 16.93 ± 2% -3.7 17.07 -3.8 16.93 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
48.74 -3.8 44.99 -3.7 45.00 -3.6 45.17 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
42.51 -3.4 39.15 -3.3 39.18 -3.3 39.21 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
42.90 -3.4 39.54 -3.3 39.56 -3.3 39.59 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
43.33 -3.3 40.07 -3.2 40.12 -3.1 40.21 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
12.96 -2.3 10.63 ± 3% -2.6 10.40 ± 2% -2.7 10.23 ± 3% perf-profile.calltrace.cycles-pp.down_read.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.32 -2.2 10.12 ± 3% -2.4 9.91 ± 2% -2.6 9.74 ± 2% perf-profile.calltrace.cycles-pp.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
9.65 ± 2% -1.9 7.75 ± 3% -2.2 7.49 ± 2% -2.3 7.33 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
9.46 ± 2% -1.9 7.57 ± 3% -2.1 7.33 ± 2% -2.3 7.17 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.rwsem_down_read_slowpath.down_read.do_madvise
6.54 ± 2% -1.5 5.08 ± 2% -1.5 5.03 ± 4% -1.6 4.95 ± 5% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
6.29 -1.1 5.16 -1.0 5.28 ± 2% -1.1 5.22 ± 3% perf-profile.calltrace.cycles-pp.testcase
4.34 -0.9 3.41 ± 2% -0.9 3.45 ± 2% -0.9 3.40 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
3.94 -0.9 3.07 ± 2% -0.8 3.11 ± 2% -0.9 3.06 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
3.79 -0.8 2.95 ± 2% -0.8 2.99 ± 2% -0.8 2.95 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
3.74 -0.8 2.91 ± 2% -0.8 2.95 ± 2% -0.8 2.91 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
4.88 ± 2% -0.8 4.11 ± 3% -0.7 4.13 ± 3% -0.8 4.06 ± 3% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.28 ± 2% -0.8 0.52 -0.8 0.45 ± 39% -1.0 0.31 ± 81% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
4.63 -0.7 3.92 -0.7 3.94 -0.7 3.91 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
3.82 -0.7 3.13 -0.6 3.24 ± 4% -0.6 3.19 ± 5% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
3.34 ± 2% -0.6 2.72 ± 2% -0.5 2.83 ± 5% -0.6 2.78 ± 6% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
3.23 ± 2% -0.6 2.63 ± 2% -0.5 2.75 ± 5% -0.5 2.70 ± 6% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.68 -0.6 0.10 ±200% -0.5 0.21 ±122% -0.4 0.31 ± 81% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
5.07 -0.4 4.67 ± 2% -0.4 4.63 ± 2% -0.4 4.71 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
5.05 -0.4 4.65 ± 2% -0.4 4.61 ± 2% -0.4 4.69 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
3.31 -0.4 2.92 -0.4 2.94 -0.4 2.94 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
3.48 -0.4 3.09 -0.4 3.11 -0.4 3.11 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
3.35 -0.4 2.96 -0.4 2.98 -0.4 2.97 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
3.82 -0.4 3.44 -0.4 3.46 -0.4 3.46 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
4.88 -0.4 4.51 ± 2% -0.4 4.47 ± 2% -0.3 4.56 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.zap_page_range_single
2.13 -0.3 1.84 -0.3 1.84 -0.3 1.85 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
1.64 -0.3 1.38 ± 2% -0.2 1.44 ± 5% -0.2 1.44 ± 4% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.38 -0.2 1.16 -0.2 1.16 -0.2 1.15 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
1.40 -0.2 1.18 -0.2 1.23 ± 5% -0.2 1.24 ± 5% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.89 ± 6% -0.2 0.69 ± 9% -0.2 0.74 ± 10% -0.2 0.70 ± 15% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
3.23 ± 2% -0.2 3.06 -0.2 3.03 ± 3% -0.1 3.09 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
1.13 -0.2 0.97 ± 3% -0.1 0.99 ± 2% -0.1 1.00 ± 2% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.92 ± 2% -0.1 2.79 -0.2 2.77 ± 3% -0.1 2.83 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
2.85 ± 2% -0.1 2.74 -0.1 2.71 ± 3% -0.1 2.78 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
2.74 ± 2% -0.1 2.64 -0.1 2.61 ± 3% -0.1 2.67 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.60 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.70 -0.1 0.60 -0.1 0.61 -0.1 0.61 perf-profile.calltrace.cycles-pp.__munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.60 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.68 -0.1 0.60 -0.1 0.61 -0.1 0.60 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.62 ± 3% -0.1 0.54 -0.0 0.58 ± 6% -0.1 0.56 ± 7% perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.88 ± 2% -0.1 0.82 ± 2% -0.1 0.82 ± 3% -0.0 0.83 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
0.81 ± 2% -0.1 0.75 ± 2% -0.1 0.75 ± 3% -0.0 0.76 ± 2% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
1.48 -0.1 1.43 -0.0 1.47 ± 2% -0.0 1.46 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read
0.77 ± 2% -0.1 0.72 ± 3% -0.1 0.72 ± 3% -0.0 0.74 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folios_put_refs
0.78 ± 2% -0.1 0.73 ± 2% -0.1 0.73 ± 3% -0.0 0.74 ± 2% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.folios_put_refs.free_pages_and_swap_cache
1.48 -0.0 1.43 -0.0 1.47 ± 2% -0.0 1.46 perf-profile.calltrace.cycles-pp.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
1.48 -0.0 1.43 -0.0 1.47 ± 2% -0.0 1.47 perf-profile.calltrace.cycles-pp.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise.__x64_sys_madvise
0.83 ± 2% -0.0 0.79 ± 3% -0.0 0.79 ± 3% -0.0 0.82 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
0.51 +0.0 0.56 +0.0 0.55 +0.0 0.55 perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.54 +0.1 0.64 ± 3% +0.1 0.65 +0.1 0.64 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
0.43 ± 44% +0.1 0.53 ± 3% +0.1 0.55 ± 3% +0.1 0.56 ± 2% perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_preempt_disabled.rwsem_down_read_slowpath
0.60 ± 2% +0.1 0.72 ± 3% +0.1 0.73 +0.1 0.73 ± 3% perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.common_startup_64
4.47 +0.1 4.61 ± 2% +0.1 4.62 ± 3% +0.3 4.73 perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
4.47 +0.1 4.61 ± 2% +0.1 4.61 ± 3% +0.3 4.73 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
4.39 +0.1 4.54 ± 2% +0.2 4.54 ± 3% +0.3 4.66 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
2.98 +0.2 3.21 ± 2% +0.2 3.22 ± 3% +0.3 3.30 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
2.90 ± 2% +0.2 3.15 ± 2% +0.2 3.15 ± 3% +0.3 3.24 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
2.81 ± 2% +0.2 3.06 ± 2% +0.2 3.06 ± 3% +0.3 3.14 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
0.72 ± 2% +0.6 1.29 ± 2% +0.6 1.36 ± 3% +0.7 1.37 ± 5% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.70 ± 2% +0.6 1.28 ± 2% +0.6 1.35 ± 3% +0.7 1.36 ± 5% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.00 +0.6 0.63 ± 2% +0.7 0.67 ± 4% +0.7 0.66 ± 7% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule_idle.do_idle.cpu_startup_entry
9.12 +1.0 10.12 +1.0 10.08 +0.9 10.06 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.70 ± 2% +1.4 2.06 ± 4% +1.4 2.11 ± 3% +1.4 2.11 ± 3% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state.cpuidle_enter
0.61 ± 2% +1.4 2.00 ± 4% +1.4 2.05 ± 3% +1.4 2.05 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq.cpuidle_enter_state
0.60 ± 2% +1.4 1.99 ± 4% +1.4 2.04 ± 3% +1.4 2.04 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.intel_idle_irq
19.23 +7.0 26.22 +6.9 26.15 +7.0 26.19 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
19.34 +7.0 26.36 +7.0 26.30 +7.0 26.32 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
20.03 +7.2 27.19 +7.1 27.11 +7.1 27.13 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
21.50 +7.9 29.39 +7.9 29.39 +7.9 29.40 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
21.51 +7.9 29.40 +7.9 29.40 +7.9 29.41 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
21.51 +7.9 29.40 +7.9 29.40 +7.9 29.41 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
21.72 +8.0 29.70 +8.0 29.69 +8.0 29.68 perf-profile.calltrace.cycles-pp.common_startup_64
1.04 +8.2 9.20 ± 2% +8.1 9.19 ± 2% +8.3 9.37 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state
1.05 +8.2 9.23 ± 2% +8.2 9.22 ± 2% +8.3 9.40 ± 2% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter
1.17 +8.2 9.38 ± 2% +8.2 9.36 ± 2% +8.4 9.55 ± 2% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.28 +8.2 9.51 ± 2% +8.2 9.49 ± 2% +8.4 9.68 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.91 +9.3 11.17 +9.3 11.23 +9.5 11.40 ± 2% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
70.58 -6.7 63.87 -6.8 63.75 -6.8 63.82 perf-profile.children.cycles-pp.__madvise
69.89 -6.5 63.37 -6.6 63.26 -6.6 63.33 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
69.68 -6.5 63.20 -6.6 63.10 -6.5 63.16 perf-profile.children.cycles-pp.do_syscall_64
68.38 -6.4 62.02 -6.5 61.92 -6.4 61.97 perf-profile.children.cycles-pp.__x64_sys_madvise
68.37 -6.4 62.01 -6.5 61.91 -6.4 61.95 perf-profile.children.cycles-pp.do_madvise
21.28 -3.9 17.39 ± 2% -3.8 17.52 -3.9 17.39 perf-profile.children.cycles-pp.llist_add_batch
54.54 -3.9 50.68 -3.7 50.80 -3.5 51.02 perf-profile.children.cycles-pp.madvise_vma_behavior
54.50 -3.8 50.65 -3.7 50.77 -3.5 50.99 perf-profile.children.cycles-pp.zap_page_range_single
48.89 -3.8 45.11 -3.8 45.12 -3.6 45.29 perf-profile.children.cycles-pp.tlb_finish_mmu
43.03 -3.4 39.65 -3.4 39.67 -3.3 39.70 perf-profile.children.cycles-pp.smp_call_function_many_cond
43.03 -3.4 39.65 -3.4 39.67 -3.3 39.70 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
43.48 -3.3 40.19 -3.2 40.24 -3.1 40.33 perf-profile.children.cycles-pp.flush_tlb_mm_range
8.38 ± 2% -2.4 5.97 ± 2% -2.4 5.95 ± 3% -2.6 5.82 ± 4% perf-profile.children.cycles-pp.intel_idle_irq
12.98 -2.3 10.64 ± 3% -2.6 10.40 ± 2% -2.7 10.24 ± 3% perf-profile.children.cycles-pp.down_read
12.41 -2.2 10.19 ± 3% -2.4 9.97 ± 2% -2.6 9.81 ± 2% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
9.72 ± 2% -1.9 7.81 ± 3% -2.2 7.55 ± 2% -2.3 7.39 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irq
15.12 -1.7 13.38 ± 2% -2.0 13.10 ± 2% -2.0 13.09 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
8.04 -1.4 6.63 -1.4 6.68 -1.4 6.66 perf-profile.children.cycles-pp.llist_reverse_order
6.87 -1.2 5.64 -1.1 5.76 ± 2% -1.2 5.70 ± 2% perf-profile.children.cycles-pp.testcase
4.16 -0.7 3.41 -0.6 3.52 ± 4% -0.7 3.48 ± 4% perf-profile.children.cycles-pp.asm_exc_page_fault
3.34 ± 2% -0.6 2.73 ± 2% -0.5 2.84 ± 5% -0.6 2.79 ± 6% perf-profile.children.cycles-pp.exc_page_fault
3.30 ± 2% -0.6 2.69 ± 2% -0.5 2.80 ± 5% -0.5 2.76 ± 6% perf-profile.children.cycles-pp.do_user_addr_fault
5.07 -0.4 4.67 ± 2% -0.4 4.63 ± 2% -0.4 4.71 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
5.05 -0.4 4.66 ± 2% -0.4 4.61 ± 2% -0.4 4.70 perf-profile.children.cycles-pp.free_pages_and_swap_cache
5.06 -0.4 4.67 ± 2% -0.4 4.63 ± 2% -0.3 4.71 perf-profile.children.cycles-pp.folios_put_refs
2.22 -0.3 1.92 -0.3 1.92 -0.3 1.92 perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
1.65 -0.3 1.39 ± 2% -0.2 1.45 ± 5% -0.2 1.45 ± 4% perf-profile.children.cycles-pp.handle_mm_fault
1.44 -0.2 1.20 -0.2 1.20 -0.2 1.19 perf-profile.children.cycles-pp.__irqentry_text_end
1.41 -0.2 1.19 ± 2% -0.2 1.25 ± 5% -0.2 1.25 ± 5% perf-profile.children.cycles-pp.__handle_mm_fault
0.90 ± 6% -0.2 0.70 ± 9% -0.2 0.75 ± 9% -0.2 0.71 ± 15% perf-profile.children.cycles-pp.lock_vma_under_rcu
1.23 -0.2 1.05 -0.2 1.05 -0.2 1.05 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.24 -0.2 3.06 -0.2 3.03 ± 3% -0.1 3.10 perf-profile.children.cycles-pp.__page_cache_release
1.14 -0.2 0.98 ± 3% -0.1 1.00 ± 2% -0.1 1.01 ± 2% perf-profile.children.cycles-pp.do_anonymous_page
0.26 ± 5% -0.1 0.12 ± 8% -0.2 0.11 ± 12% -0.2 0.10 ± 13% perf-profile.children.cycles-pp.poll_idle
0.79 -0.1 0.66 -0.1 0.67 ± 2% -0.1 0.66 perf-profile.children.cycles-pp.error_entry
0.62 ± 3% -0.1 0.49 ± 8% -0.1 0.49 ± 5% -0.1 0.48 ± 5% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.45 ± 8% -0.1 0.33 ± 16% -0.1 0.39 ± 21% -0.1 0.38 ± 28% perf-profile.children.cycles-pp.mas_walk
0.88 -0.1 0.76 -0.1 0.76 -0.1 0.77 perf-profile.children.cycles-pp.native_irq_return_iret
0.54 ± 3% -0.1 0.42 ± 5% -0.1 0.42 ± 3% -0.1 0.42 ± 4% perf-profile.children.cycles-pp.page_counter_uncharge
0.50 ± 3% -0.1 0.39 ± 4% -0.1 0.39 ± 3% -0.1 0.39 ± 4% perf-profile.children.cycles-pp.page_counter_cancel
0.54 ± 3% -0.1 0.43 ± 6% -0.1 0.43 ± 5% -0.1 0.42 ± 5% perf-profile.children.cycles-pp.uncharge_batch
0.55 ± 2% -0.1 0.45 ± 3% -0.1 0.44 ± 2% -0.1 0.44 ± 2% perf-profile.children.cycles-pp.up_read
0.70 -0.1 0.60 -0.1 0.61 -0.1 0.61 perf-profile.children.cycles-pp.__munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.60 perf-profile.children.cycles-pp.__vm_munmap
0.69 -0.1 0.60 -0.1 0.61 -0.1 0.60 perf-profile.children.cycles-pp.__x64_sys_munmap
0.67 ± 3% -0.1 0.58 -0.1 0.62 ± 6% -0.1 0.60 ± 7% perf-profile.children.cycles-pp.unmap_page_range
0.52 ± 2% -0.1 0.44 ± 4% -0.1 0.45 ± 3% -0.1 0.45 ± 2% perf-profile.children.cycles-pp.alloc_anon_folio
0.57 -0.1 0.50 ± 4% -0.1 0.52 ± 4% -0.1 0.52 ± 4% perf-profile.children.cycles-pp.zap_pmd_range
0.54 -0.1 0.48 ± 2% -0.1 0.49 -0.1 0.48 perf-profile.children.cycles-pp.do_vmi_align_munmap
0.54 -0.1 0.48 ± 2% -0.1 0.49 -0.1 0.48 perf-profile.children.cycles-pp.do_vmi_munmap
0.38 ± 2% -0.1 0.32 -0.1 0.32 ± 2% -0.1 0.31 ± 2% perf-profile.children.cycles-pp.vma_alloc_folio_noprof
0.53 ± 3% -0.1 0.47 ± 4% -0.0 0.49 ± 5% -0.0 0.48 ± 4% perf-profile.children.cycles-pp.zap_pte_range
1.51 -0.1 1.45 -0.0 1.50 ± 2% -0.0 1.49 perf-profile.children.cycles-pp.schedule_preempt_disabled
0.52 ± 2% -0.1 0.47 ± 2% -0.1 0.47 -0.1 0.47 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
0.48 -0.1 0.42 -0.1 0.42 -0.0 0.42 perf-profile.children.cycles-pp.native_flush_tlb_local
0.31 ± 2% -0.1 0.26 -0.0 0.26 ± 2% -0.1 0.25 ± 2% perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
0.27 ± 5% -0.1 0.22 ± 8% -0.0 0.23 ± 8% -0.0 0.22 ± 7% perf-profile.children.cycles-pp.tlb_gather_mmu
0.31 ± 2% -0.1 0.26 ± 2% -0.0 0.27 -0.1 0.26 ± 2% perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
0.34 -0.1 0.29 ± 2% -0.1 0.29 ± 2% -0.1 0.29 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
1.51 -0.1 1.46 -0.0 1.50 ± 2% -0.0 1.50 perf-profile.children.cycles-pp.schedule
0.32 -0.0 0.27 -0.0 0.28 -0.1 0.27 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.50 -0.0 0.45 -0.0 0.45 ± 2% -0.0 0.45 perf-profile.children.cycles-pp.dequeue_task_fair
0.40 -0.0 0.35 ± 2% -0.0 0.36 ± 2% -0.0 0.35 perf-profile.children.cycles-pp.__pte_offset_map_lock
0.48 -0.0 0.43 -0.0 0.43 -0.0 0.44 perf-profile.children.cycles-pp.dequeue_entities
0.28 ± 2% -0.0 0.24 ± 2% -0.0 0.24 ± 2% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.__alloc_pages_noprof
0.28 -0.0 0.24 ± 4% -0.0 0.24 ± 3% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.lru_gen_del_folio
0.24 ± 5% -0.0 0.20 ± 7% -0.0 0.22 ± 13% -0.0 0.21 ± 17% perf-profile.children.cycles-pp.find_vma_prev
0.22 ± 3% -0.0 0.18 ± 4% -0.0 0.18 ± 3% -0.0 0.18 ± 3% perf-profile.children.cycles-pp.__perf_sw_event
0.32 -0.0 0.28 -0.0 0.28 ± 2% -0.0 0.28 perf-profile.children.cycles-pp.irqtime_account_irq
0.24 ± 4% -0.0 0.20 ± 7% -0.0 0.19 ± 6% -0.0 0.20 ± 6% perf-profile.children.cycles-pp.down_read_trylock
0.19 ± 3% -0.0 0.15 ± 3% -0.0 0.16 ± 3% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.vms_clear_ptes
0.14 ± 9% -0.0 0.10 ± 8% -0.0 0.11 ± 12% -0.0 0.10 ± 13% perf-profile.children.cycles-pp.flush_tlb_batched_pending
0.22 -0.0 0.19 ± 2% -0.0 0.18 ± 2% -0.0 0.18 ± 3% perf-profile.children.cycles-pp.get_page_from_freelist
0.24 -0.0 0.20 ± 3% -0.0 0.20 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.sync_regs
0.21 ± 2% -0.0 0.18 ± 4% -0.0 0.18 ± 3% -0.0 0.18 ± 3% perf-profile.children.cycles-pp.___perf_sw_event
0.23 ± 3% -0.0 0.19 ± 2% -0.0 0.20 ± 3% -0.0 0.19 ± 3% perf-profile.children.cycles-pp.down_write_killable
0.29 -0.0 0.26 -0.0 0.26 -0.0 0.26 perf-profile.children.cycles-pp.dequeue_entity
0.22 ± 3% -0.0 0.19 ± 2% -0.0 0.19 ± 2% -0.0 0.19 ± 4% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
0.06 -0.0 0.03 ± 81% -0.0 0.05 -0.0 0.03 ± 81% perf-profile.children.cycles-pp.__cond_resched
0.29 ± 2% -0.0 0.26 ± 2% -0.0 0.25 ± 8% -0.0 0.27 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
0.27 -0.0 0.24 ± 4% -0.0 0.24 ± 2% -0.0 0.24 ± 2% perf-profile.children.cycles-pp.lru_gen_add_folio
0.09 ± 4% -0.0 0.06 ± 6% -0.0 0.07 ± 7% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.call_function_single_prep_ipi
0.23 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.sched_clock_cpu
0.20 ± 2% -0.0 0.18 ± 2% -0.0 0.17 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.native_sched_clock
0.15 ± 4% -0.0 0.12 ± 3% -0.0 0.13 ± 3% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.rwsem_mark_wake
0.33 ± 2% -0.0 0.31 -0.0 0.31 -0.0 0.31 ± 2% perf-profile.children.cycles-pp.downgrade_write
0.14 ± 4% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.12 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.14 ± 7% -0.0 0.12 ± 16% -0.0 0.12 ± 14% -0.0 0.12 ± 9% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.34 -0.0 0.32 ± 3% -0.0 0.32 ± 2% -0.0 0.32 ± 2% perf-profile.children.cycles-pp.lru_add
0.42 ± 2% -0.0 0.40 ± 2% -0.0 0.40 ± 2% -0.0 0.40 perf-profile.children.cycles-pp.try_to_wake_up
0.25 -0.0 0.23 -0.0 0.22 ± 8% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.update_process_times
0.12 ± 6% -0.0 0.10 ± 9% -0.0 0.10 ± 10% -0.0 0.10 ± 6% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.20 ± 2% -0.0 0.17 ± 2% -0.0 0.17 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.sched_clock
0.37 -0.0 0.35 -0.0 0.35 ± 3% -0.0 0.35 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.update_curr
0.13 ± 5% -0.0 0.11 ± 3% -0.0 0.11 ± 2% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.rwsem_optimistic_spin
0.11 -0.0 0.09 ± 4% -0.0 0.09 ± 4% -0.0 0.09 ± 3% perf-profile.children.cycles-pp.clear_page_erms
0.15 -0.0 0.13 ± 3% -0.0 0.14 ± 2% -0.0 0.14 ± 4% perf-profile.children.cycles-pp.__smp_call_single_queue
0.14 ± 3% -0.0 0.13 ± 3% -0.0 0.13 ± 15% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.ktime_get
0.10 ± 4% -0.0 0.08 ± 13% -0.0 0.08 ± 11% -0.0 0.08 ± 8% perf-profile.children.cycles-pp.__folio_mod_stat
0.43 -0.0 0.41 -0.0 0.41 ± 2% -0.0 0.42 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
0.22 ± 2% -0.0 0.20 -0.0 0.20 ± 2% -0.0 0.19 perf-profile.children.cycles-pp.enqueue_entity
0.18 ± 2% -0.0 0.16 ± 2% -0.0 0.16 ± 3% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.update_load_avg
0.14 ± 2% -0.0 0.12 ± 3% -0.0 0.13 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.__hrtimer_start_range_ns
0.12 ± 3% -0.0 0.10 ± 4% -0.0 0.11 ± 4% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.10 ± 3% -0.0 0.09 ± 4% -0.0 0.09 ± 4% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.free_unref_folios
0.19 ± 2% -0.0 0.18 -0.0 0.18 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.08 ± 6% -0.0 0.06 ± 7% -0.0 0.06 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.rmqueue
0.07 ± 7% -0.0 0.05 ± 9% -0.0 0.05 ± 4% -0.0 0.05 ± 5% perf-profile.children.cycles-pp.get_nohz_timer_target
0.09 -0.0 0.08 ± 5% -0.0 0.07 ± 6% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.read_tsc
0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.idle_cpu
0.08 ± 5% -0.0 0.07 ± 6% -0.0 0.08 ± 6% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.prepare_task_switch
0.06 ± 7% -0.0 0.05 ± 9% -0.0 0.05 ± 29% -0.0 0.05 ± 37% perf-profile.children.cycles-pp.mm_cid_get
0.07 -0.0 0.06 -0.0 0.06 ± 4% -0.0 0.06 perf-profile.children.cycles-pp.rwsem_spin_on_owner
0.07 -0.0 0.06 -0.0 0.06 ± 7% -0.0 0.06 perf-profile.children.cycles-pp.native_apic_mem_eoi
0.09 ± 5% -0.0 0.08 ± 5% -0.0 0.07 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.irq_enter_rcu
0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.16 ± 2% +0.0 0.18 ± 4% +0.0 0.18 ± 2% +0.0 0.19 perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.05 ± 8% +0.0 0.07 ± 5% +0.0 0.07 ± 4% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.__switch_to
0.10 ± 3% +0.0 0.11 ± 4% +0.0 0.12 ± 5% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.hrtimer_try_to_cancel
0.67 ± 3% +0.0 0.69 ± 2% +0.0 0.70 ± 2% +0.0 0.72 ± 2% perf-profile.children.cycles-pp.__pick_next_task
0.50 +0.0 0.52 +0.0 0.53 ± 2% +0.0 0.54 perf-profile.children.cycles-pp.__irq_exit_rcu
0.66 +0.0 0.68 +0.0 0.69 +0.0 0.70 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.62 ± 2% +0.0 0.64 ± 2% +0.0 0.65 ± 2% +0.1 0.67 ± 3% perf-profile.children.cycles-pp.pick_next_task_fair
0.11 ± 3% +0.0 0.14 ± 5% +0.0 0.13 ± 4% +0.0 0.14 perf-profile.children.cycles-pp.start_dl_timer
0.05 +0.0 0.08 ± 6% +0.0 0.09 ± 5% +0.0 0.08 ± 4% perf-profile.children.cycles-pp.task_contending
0.47 ± 3% +0.0 0.50 ± 2% +0.0 0.51 ± 3% +0.0 0.52 ± 3% perf-profile.children.cycles-pp.sched_balance_newidle
0.45 ± 3% +0.0 0.48 ± 3% +0.0 0.49 ± 3% +0.1 0.50 ± 3% perf-profile.children.cycles-pp.sched_balance_rq
0.03 ± 70% +0.0 0.07 +0.0 0.07 ± 5% +0.0 0.07 perf-profile.children.cycles-pp.hrtimer_next_event_without
0.52 +0.0 0.56 +0.0 0.55 +0.0 0.55 perf-profile.children.cycles-pp.menu_select
0.19 ± 5% +0.0 0.23 +0.1 0.24 ± 3% +0.1 0.24 ± 2% perf-profile.children.cycles-pp.handle_softirqs
0.18 ± 2% +0.0 0.22 ± 3% +0.0 0.22 ± 2% +0.1 0.23 ± 2% perf-profile.children.cycles-pp.enqueue_dl_entity
0.18 ± 3% +0.1 0.23 +0.1 0.23 +0.1 0.24 ± 2% perf-profile.children.cycles-pp.dl_server_start
0.11 ± 3% +0.1 0.16 ± 2% +0.1 0.17 ± 3% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.46 +0.1 0.52 +0.1 0.51 +0.1 0.52 perf-profile.children.cycles-pp.enqueue_task_fair
0.00 +0.1 0.06 +0.1 0.06 ± 6% +0.1 0.06 ± 6% perf-profile.children.cycles-pp.call_cpuidle
0.47 +0.1 0.53 +0.1 0.53 +0.1 0.53 perf-profile.children.cycles-pp.enqueue_task
0.48 +0.1 0.55 +0.1 0.54 +0.1 0.54 perf-profile.children.cycles-pp.ttwu_do_activate
0.62 +0.1 0.70 ± 4% +0.1 0.70 ± 2% +0.1 0.70 perf-profile.children.cycles-pp._find_next_bit
0.18 ± 2% +0.1 0.26 +0.1 0.27 ± 3% +0.1 0.28 ± 5% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.20 +0.1 0.28 ± 3% +0.1 0.29 ± 3% +0.1 0.29 ± 5% perf-profile.children.cycles-pp.sysvec_call_function_single
0.64 +0.1 0.72 +0.1 0.72 +0.1 0.71 perf-profile.children.cycles-pp.sched_ttwu_pending
0.22 +0.1 0.30 +0.1 0.31 ± 3% +0.1 0.32 ± 4% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.27 ± 8% perf-profile.children.cycles-pp.rest_init
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.27 ± 8% perf-profile.children.cycles-pp.start_kernel
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.27 ± 8% perf-profile.children.cycles-pp.x86_64_start_kernel
0.21 ± 2% +0.1 0.30 ± 8% +0.1 0.28 ± 8% +0.1 0.27 ± 8% perf-profile.children.cycles-pp.x86_64_start_reservations
0.10 ± 3% +0.1 0.20 ± 2% +0.1 0.20 ± 2% +0.1 0.20 ± 4% perf-profile.children.cycles-pp.tick_nohz_next_event
0.00 +0.1 0.10 ± 7% +0.1 0.11 ± 5% +0.1 0.07 ± 6% perf-profile.children.cycles-pp.__bitmap_and
0.06 ± 6% +0.1 0.16 ± 4% +0.1 0.16 ± 2% +0.1 0.16 ± 3% perf-profile.children.cycles-pp.__get_next_timer_interrupt
0.48 +0.1 0.58 ± 2% +0.1 0.58 ± 2% +0.1 0.58 perf-profile.children.cycles-pp._raw_spin_lock
0.16 ± 3% +0.1 0.28 +0.1 0.28 ± 2% +0.1 0.28 ± 3% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.61 +0.1 0.73 ± 3% +0.1 0.74 +0.1 0.74 ± 3% perf-profile.children.cycles-pp.flush_smp_call_function_queue
4.49 +0.1 4.62 ± 2% +0.1 4.63 ± 3% +0.3 4.74 perf-profile.children.cycles-pp.lru_add_drain
4.48 +0.1 4.62 ± 2% +0.1 4.62 ± 3% +0.3 4.74 perf-profile.children.cycles-pp.lru_add_drain_cpu
4.45 +0.1 4.59 ± 2% +0.2 4.60 ± 3% +0.3 4.72 perf-profile.children.cycles-pp.folio_batch_move_lru
0.28 ± 2% +0.1 0.42 ± 4% +0.2 0.46 ± 4% +0.2 0.45 ± 6% perf-profile.children.cycles-pp.finish_task_switch
0.00 +0.2 0.15 ± 3% +0.2 0.15 ± 5% +0.2 0.16 ± 3% perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.2 0.16 ± 3% +0.2 0.16 ± 5% +0.2 0.16 ± 4% perf-profile.children.cycles-pp.ct_idle_exit
0.02 ± 99% +0.2 0.24 ± 2% +0.2 0.24 ± 4% +0.2 0.25 ± 4% perf-profile.children.cycles-pp.ct_kernel_exit_state
5.84 ± 2% +0.3 6.09 +0.2 6.07 ± 3% +0.4 6.23 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.37 ± 4% +0.4 0.73 ± 2% +0.4 0.78 ± 3% +0.4 0.76 ± 6% perf-profile.children.cycles-pp.switch_mm_irqs_off
2.22 +0.5 2.76 +0.6 2.86 ± 2% +0.6 2.86 ± 3% perf-profile.children.cycles-pp.__schedule
0.73 ± 2% +0.6 1.31 ± 2% +0.6 1.38 ± 3% +0.7 1.38 ± 5% perf-profile.children.cycles-pp.schedule_idle
9.24 +1.0 10.25 +1.0 10.21 +0.9 10.18 perf-profile.children.cycles-pp.intel_idle
18.85 +6.3 25.12 +6.4 25.25 +6.5 25.36 perf-profile.children.cycles-pp.asm_sysvec_call_function
19.52 +7.1 26.62 +7.0 26.53 +7.0 26.56 perf-profile.children.cycles-pp.cpuidle_enter_state
19.53 +7.1 26.63 +7.0 26.55 +7.0 26.57 perf-profile.children.cycles-pp.cpuidle_enter
20.22 +7.2 27.47 +7.2 27.38 +7.2 27.39 perf-profile.children.cycles-pp.cpuidle_idle_call
14.43 +7.8 22.23 +8.0 22.41 +8.1 22.56 perf-profile.children.cycles-pp.sysvec_call_function
13.73 +7.9 21.59 +8.0 21.77 +8.2 21.93 perf-profile.children.cycles-pp.__sysvec_call_function
21.51 +7.9 29.40 +7.9 29.40 +7.9 29.41 perf-profile.children.cycles-pp.start_secondary
21.72 +8.0 29.69 +8.0 29.68 +8.0 29.68 perf-profile.children.cycles-pp.do_idle
21.72 +8.0 29.70 +8.0 29.69 +8.0 29.68 perf-profile.children.cycles-pp.common_startup_64
21.72 +8.0 29.70 +8.0 29.69 +8.0 29.68 perf-profile.children.cycles-pp.cpu_startup_entry
14.36 +8.1 22.42 +8.2 22.61 +8.4 22.78 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
3.58 +9.4 12.97 +9.5 13.09 +9.7 13.27 ± 2% perf-profile.children.cycles-pp.flush_tlb_func
7.43 ± 2% -3.8 3.66 -3.8 3.58 ± 4% -4.0 3.47 ± 7% perf-profile.self.cycles-pp.intel_idle_irq
16.93 -2.9 13.99 ± 2% -2.9 14.07 -2.9 13.99 perf-profile.self.cycles-pp.llist_add_batch
15.11 -1.7 13.38 ± 2% -2.0 13.09 ± 2% -2.0 13.08 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
8.01 -1.4 6.57 -1.4 6.62 -1.4 6.59 perf-profile.self.cycles-pp.llist_reverse_order
1.44 -0.2 1.19 -0.2 1.20 -0.2 1.19 perf-profile.self.cycles-pp.__irqentry_text_end
1.69 -0.2 1.50 -0.2 1.51 -0.2 1.51 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.24 ± 6% -0.1 0.10 ± 10% -0.2 0.09 ± 15% -0.2 0.08 ± 16% perf-profile.self.cycles-pp.poll_idle
0.78 -0.1 0.66 -0.1 0.66 ± 2% -0.1 0.66 perf-profile.self.cycles-pp.error_entry
0.87 -0.1 0.76 -0.1 0.76 -0.1 0.76 perf-profile.self.cycles-pp.native_irq_return_iret
0.65 ± 2% -0.1 0.54 ± 2% -0.1 0.53 -0.1 0.54 perf-profile.self.cycles-pp.testcase
0.46 ± 2% -0.1 0.37 ± 5% -0.1 0.35 ± 3% -0.1 0.35 ± 5% perf-profile.self.cycles-pp.down_read
0.38 ± 8% -0.1 0.29 ± 16% -0.1 0.33 ± 21% -0.1 0.32 ± 28% perf-profile.self.cycles-pp.mas_walk
0.41 ± 3% -0.1 0.32 ± 4% -0.1 0.32 ± 3% -0.1 0.32 ± 4% perf-profile.self.cycles-pp.page_counter_cancel
0.32 ± 12% -0.1 0.24 ± 4% -0.0 0.28 ± 14% -0.1 0.25 ± 20% perf-profile.self.cycles-pp.zap_page_range_single
0.46 ± 2% -0.1 0.38 ± 3% -0.1 0.37 ± 3% -0.1 0.37 ± 3% perf-profile.self.cycles-pp.up_read
0.28 ± 5% -0.1 0.21 ± 5% -0.1 0.22 ± 6% -0.1 0.20 ± 6% perf-profile.self.cycles-pp.tlb_finish_mmu
0.56 -0.1 0.49 ± 2% -0.1 0.48 ± 2% -0.1 0.48 ± 2% perf-profile.self.cycles-pp.rwsem_down_read_slowpath
0.32 ± 11% -0.1 0.25 ± 16% -0.0 0.27 ± 11% -0.1 0.23 ± 16% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.30 ± 2% -0.1 0.24 -0.1 0.22 ± 2% -0.1 0.22 ± 3% perf-profile.self.cycles-pp.menu_select
0.33 ± 6% -0.1 0.27 ± 7% -0.0 0.28 ± 6% +0.0 0.35 ± 8% perf-profile.self.cycles-pp.flush_tlb_mm_range
0.46 -0.1 0.41 -0.0 0.42 ± 2% -0.0 0.42 ± 2% perf-profile.self.cycles-pp.native_flush_tlb_local
0.34 -0.1 0.29 ± 2% -0.1 0.28 ± 2% -0.1 0.28 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.32 -0.0 0.27 -0.0 0.28 -0.1 0.27 ± 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.18 ± 9% -0.0 0.13 ± 12% -0.0 0.16 ± 28% -0.0 0.15 ± 35% perf-profile.self.cycles-pp.__handle_mm_fault
0.22 ± 5% -0.0 0.18 ± 7% -0.0 0.20 ± 8% -0.0 0.19 ± 8% perf-profile.self.cycles-pp.tlb_gather_mmu
0.24 -0.0 0.20 ± 3% -0.0 0.20 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.sync_regs
0.12 ± 9% -0.0 0.08 ± 9% -0.0 0.09 ± 13% -0.0 0.09 ± 14% perf-profile.self.cycles-pp.flush_tlb_batched_pending
0.14 ± 4% -0.0 0.11 ± 9% -0.0 0.12 ± 8% -0.0 0.11 ± 9% perf-profile.self.cycles-pp.do_madvise
0.22 ± 2% -0.0 0.18 ± 2% -0.0 0.18 ± 3% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.lru_gen_del_folio
0.21 ± 2% -0.0 0.17 ± 4% -0.0 0.17 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.13 ± 7% -0.0 0.10 ± 6% -0.0 0.11 ± 10% -0.0 0.11 ± 6% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.19 ± 4% -0.0 0.17 ± 6% -0.0 0.16 ± 7% -0.0 0.16 ± 6% perf-profile.self.cycles-pp.down_read_trylock
0.09 ± 5% -0.0 0.06 -0.0 0.06 ± 6% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.20 ± 3% -0.0 0.17 -0.0 0.17 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.native_sched_clock
0.13 ± 4% -0.0 0.10 ± 4% -0.0 0.11 ± 4% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.15 ± 2% -0.0 0.13 ± 3% -0.0 0.13 ± 4% -0.0 0.13 ± 4% perf-profile.self.cycles-pp.___perf_sw_event
0.11 ± 4% -0.0 0.08 ± 5% -0.0 0.08 ± 4% -0.0 0.08 ± 3% perf-profile.self.cycles-pp.do_user_addr_fault
0.22 ± 2% -0.0 0.20 ± 5% -0.0 0.20 ± 3% -0.0 0.20 ± 3% perf-profile.self.cycles-pp.lru_gen_add_folio
0.19 ± 3% -0.0 0.17 ± 4% -0.0 0.17 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.folio_batch_move_lru
0.22 -0.0 0.20 ± 3% -0.0 0.20 ± 2% -0.0 0.20 perf-profile.self.cycles-pp.folios_put_refs
0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.12 perf-profile.self.cycles-pp.irqtime_account_irq
0.09 ± 5% -0.0 0.07 -0.0 0.07 ± 4% -0.0 0.07 perf-profile.self.cycles-pp.__madvise
0.12 ± 4% -0.0 0.10 -0.0 0.10 ± 4% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.rwsem_mark_wake
0.20 ± 4% -0.0 0.19 ± 4% -0.0 0.18 ± 3% -0.0 0.18 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.09 ± 5% -0.0 0.07 ± 6% -0.0 0.07 ± 4% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.read_tsc
0.09 -0.0 0.08 ± 5% -0.0 0.07 ± 5% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.clear_page_erms
0.07 -0.0 0.06 ± 6% -0.0 0.06 -0.0 0.06 ± 9% perf-profile.self.cycles-pp.native_apic_mem_eoi
0.06 ± 7% -0.0 0.05 ± 9% -0.0 0.05 ± 29% -0.0 0.05 ± 37% perf-profile.self.cycles-pp.mm_cid_get
0.10 -0.0 0.09 -0.0 0.09 ± 4% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.asm_sysvec_call_function
0.06 -0.0 0.05 -0.0 0.05 ± 8% -0.0 0.05 ± 9% perf-profile.self.cycles-pp.handle_mm_fault
0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.04 ± 33% perf-profile.self.cycles-pp.free_pages_and_swap_cache
0.05 ± 7% +0.0 0.07 ± 5% +0.0 0.07 ± 3% +0.0 0.07 ± 6% perf-profile.self.cycles-pp.__switch_to
0.17 ± 3% +0.0 0.19 ± 2% +0.0 0.19 ± 2% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.cpuidle_enter_state
2.10 +0.0 2.14 +0.1 2.17 +0.1 2.19 perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.06 +0.1 0.11 ± 3% +0.1 0.12 ± 4% +0.1 0.12 ± 4% perf-profile.self.cycles-pp.cpuidle_idle_call
0.02 ±141% +0.1 0.07 ± 5% +0.1 0.07 ± 6% +0.1 0.07 ± 6% perf-profile.self.cycles-pp.do_idle
0.00 +0.1 0.06 ± 6% +0.1 0.06 ± 9% +0.1 0.06 ± 8% perf-profile.self.cycles-pp.call_cpuidle
0.48 +0.1 0.56 ± 4% +0.1 0.57 ± 2% +0.1 0.57 ± 2% perf-profile.self.cycles-pp._find_next_bit
0.00 +0.1 0.09 ± 5% +0.1 0.10 ± 6% +0.1 0.06 ± 9% perf-profile.self.cycles-pp.__bitmap_and
0.38 ± 3% +0.1 0.49 ± 2% +0.1 0.49 ± 2% +0.1 0.49 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.28 +0.1 0.39 ± 3% +0.1 0.38 ± 2% +0.1 0.40 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±223% +0.2 0.24 ± 3% +0.2 0.24 ± 3% +0.2 0.25 ± 4% perf-profile.self.cycles-pp.ct_kernel_exit_state
0.36 ± 4% +0.4 0.73 ± 2% +0.4 0.77 ± 3% +0.4 0.76 ± 6% perf-profile.self.cycles-pp.switch_mm_irqs_off
9.24 +1.0 10.25 +1.0 10.21 +0.9 10.18 perf-profile.self.cycles-pp.intel_idle
15.13 +1.2 16.34 +1.1 16.20 +1.3 16.39 perf-profile.self.cycles-pp.smp_call_function_many_cond
3.07 +9.5 12.52 +9.6 12.63 +9.7 12.82 ± 2% perf-profile.self.cycles-pp.flush_tlb_func
[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq-mt/vm-scalability/always/never
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
2815a56e4b ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU")
852ff7f2f7 ("x86,mm: only trim the mm_cpumask once a second")
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 2815a56e4b7252a836969f5674e 852ff7f2f791aadd04317d1a53f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
368.00 ±114% +237.8% 1243 ± 32% +253.5% 1300 ± 22% +230.3% 1215 ± 22% perf-c2c.HITM.remote
3.016e+10 ± 5% +23.7% 3.732e+10 +23.3% 3.717e+10 +22.4% 3.693e+10 cpuidle..time
2394210 ± 5% +1598.8% 40671711 +1582.1% 40273563 +1588.4% 40423635 cpuidle..usage
280.01 ± 3% +24.8% 349.50 +24.2% 347.79 +23.4% 345.66 uptime.boot
27411 ± 3% +21.2% 33234 +20.4% 32994 +19.4% 32716 uptime.idle
73.97 -2.3% 72.29 -2.5% 72.14 -2.7% 71.96 iostat.cpu.idle
23.61 -7.6% 21.82 -7.4% 21.87 -6.8% 22.00 iostat.cpu.iowait
2.11 ± 2% +169.7% 5.68 ± 4% +174.3% 5.78 ± 2% +176.2% 5.82 ± 2% iostat.cpu.system
0.26 ± 3% -0.0 0.23 ± 2% -0.0 0.23 ± 2% -0.0 0.23 mpstat.cpu.all.irq%
0.08 -0.0 0.04 ± 4% -0.0 0.04 ± 8% -0.0 0.04 ± 4% mpstat.cpu.all.soft%
1.78 ± 2% +3.7 5.45 ± 4% +3.8 5.55 ± 2% +3.8 5.58 ± 2% mpstat.cpu.all.sys%
0.31 ± 7% -0.1 0.21 ± 5% -0.1 0.21 ± 4% -0.1 0.21 mpstat.cpu.all.usr%
16661751 ± 42% -59.8% 6694161 ± 70% -29.8% 11689134 ± 53% -43.1% 9474687 ± 66% numa-numastat.node0.numa_miss
16734663 ± 41% -59.5% 6770170 ± 69% -29.9% 11733904 ± 53% -43.0% 9533441 ± 66% numa-numastat.node0.other_node
26857269 ± 23% -60.2% 10694204 ± 50% -39.9% 16138920 ± 41% -47.7% 14040582 ± 47% numa-numastat.node1.local_node
16665351 ± 42% -59.8% 6694094 ± 70% -29.9% 11678532 ± 53% -43.3% 9441584 ± 66% numa-numastat.node1.numa_foreign
26918278 ± 23% -60.1% 10751098 ± 49% -39.8% 16205404 ± 41% -47.6% 14114601 ± 47% numa-numastat.node1.numa_hit
368.92 ± 36% -55.2% 165.39 ± 74% -25.3% 275.44 ± 51% -47.2% 194.83 ± 54% vmstat.io.bi
409795 -51.0% 200717 ± 3% -51.3% 199570 ± 7% -51.4% 199148 ± 5% vmstat.io.bo
4.14 ± 7% +100.0% 8.28 ± 6% +108.5% 8.63 ± 3% +110.8% 8.72 ± 4% vmstat.procs.r
359.98 ± 37% -56.0% 158.48 ± 77% -25.4% 268.51 ± 52% -47.8% 187.86 ± 56% vmstat.swap.si
409786 -51.0% 200710 ± 3% -51.3% 199563 ± 7% -51.4% 199142 ± 5% vmstat.swap.so
5382 -28.9% 3825 ± 2% -29.6% 3788 ± 3% -29.6% 3791 ± 2% vmstat.system.cs
339018 ± 2% -33.0% 227081 ± 3% -32.6% 228406 -32.2% 229751 vmstat.system.in
54162177 ± 11% -32.5% 36537092 ± 17% -21.5% 42515388 ± 26% -29.1% 38427326 ± 29% meminfo.Active
54162037 ± 11% -32.5% 36536947 ± 17% -21.5% 42515235 ± 26% -29.1% 38427028 ± 29% meminfo.Active(anon)
66576747 ± 9% +24.3% 82748036 ± 9% +16.2% 77343432 ± 14% +23.9% 82492139 ± 13% meminfo.Inactive
66575517 ± 9% +24.3% 82746881 ± 9% +16.2% 77342282 ± 14% +23.9% 82490802 ± 13% meminfo.Inactive(anon)
333831 -11.8% 294280 -11.6% 295003 ± 2% -11.0% 297103 meminfo.PageTables
33487 ± 3% +199.2% 100210 ± 26% +184.1% 95125 ± 12% +196.7% 99362 ± 11% meminfo.Shmem
1.627e+08 +11.4% 1.812e+08 +11.5% 1.814e+08 +11.5% 1.814e+08 meminfo.SwapFree
1644 ± 11% -41.9% 955.75 ± 10% -41.9% 954.97 ± 10% -42.4% 947.74 ± 13% meminfo.Writeback
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.4% 307.77 time.elapsed_time
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.4% 307.77 time.elapsed_time.max
6297 ± 5% +60.9% 10134 +63.0% 10265 +63.5% 10295 time.involuntary_context_switches
62687446 -24.8% 47163918 ± 2% -24.9% 47061196 ± 3% -25.1% 46934963 time.minor_page_faults
224.00 ± 3% +166.7% 597.33 ± 3% +170.7% 606.31 ± 2% +170.7% 606.40 ± 2% time.percent_of_cpu_this_job_got
474.22 ± 3% +276.7% 1786 ± 3% +282.9% 1815 ± 2% +282.9% 1815 ± 2% time.system_time
63.58 -17.2% 52.64 ± 3% -17.8% 52.24 ± 5% -18.4% 51.86 ± 3% time.user_time
347556 ± 6% -26.4% 255772 ± 5% -28.1% 249724 ± 5% -28.4% 248893 ± 5% time.voluntary_context_switches
155577 ± 2% -7.2% 144386 ± 3% -9.3% 141161 ± 2% -7.6% 143717 ± 2% numa-meminfo.node0.PageTables
12758 ± 8% +171.6% 34650 ± 68% +117.5% 27743 ± 38% +122.3% 28364 ± 32% numa-meminfo.node0.Shmem
31281633 ± 6% -37.6% 19515816 ± 25% -36.1% 19979648 ± 27% -42.1% 18122110 ± 25% numa-meminfo.node1.Active
31281573 ± 6% -37.6% 19515752 ± 25% -36.1% 19979597 ± 27% -42.1% 18122082 ± 25% numa-meminfo.node1.Active(anon)
28059820 ± 5% +40.6% 39461616 ± 15% +44.6% 40576407 ± 14% +50.4% 42208234 ± 12% numa-meminfo.node1.Inactive
28059215 ± 5% +40.6% 39461201 ± 15% +44.6% 40575754 ± 14% +50.4% 42207926 ± 12% numa-meminfo.node1.Inactive(anon)
178279 -16.5% 148899 ± 2% -14.1% 153180 ± 3% -14.4% 152652 ± 2% numa-meminfo.node1.PageTables
20873 ± 7% +215.2% 65784 ± 13% +225.2% 67888 ± 5% +243.6% 71729 ± 9% numa-meminfo.node1.Shmem
32068 ± 55% -54.9% 14471 ± 69% -22.1% 24988 ± 58% -46.4% 17195 ± 65% numa-meminfo.node1.SwapCached
1296 ± 6% -49.8% 650.40 ± 9% -50.0% 647.65 ± 10% -50.5% 641.72 ± 14% numa-meminfo.node1.Writeback
38311 ± 5% -40.8% 22667 -41.1% 22583 ± 2% -41.3% 22494 vm-scalability.median
1234132 ± 4% -40.7% 732265 -40.8% 730989 ± 3% -40.9% 729108 vm-scalability.throughput
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.4% 307.77 vm-scalability.time.elapsed_time
239.68 ± 5% +28.4% 307.63 +28.4% 307.85 +28.4% 307.77 vm-scalability.time.elapsed_time.max
6297 ± 5% +60.9% 10134 +63.0% 10265 +63.5% 10295 vm-scalability.time.involuntary_context_switches
62687446 -24.8% 47163918 ± 2% -24.9% 47061196 ± 3% -25.1% 46934963 vm-scalability.time.minor_page_faults
224.00 ± 3% +166.7% 597.33 ± 3% +170.7% 606.31 ± 2% +170.7% 606.40 ± 2% vm-scalability.time.percent_of_cpu_this_job_got
474.22 ± 3% +276.7% 1786 ± 3% +282.9% 1815 ± 2% +282.9% 1815 ± 2% vm-scalability.time.system_time
63.58 -17.2% 52.64 ± 3% -17.8% 52.24 ± 5% -18.4% 51.86 ± 3% vm-scalability.time.user_time
347556 ± 6% -26.4% 255772 ± 5% -28.1% 249724 ± 5% -28.4% 248893 ± 5% vm-scalability.time.voluntary_context_switches
2.821e+08 -22.0% 2.2e+08 -22.2% 2.196e+08 ± 2% -22.3% 2.191e+08 vm-scalability.workload
38829 ± 2% -7.3% 35978 ± 3% -9.5% 35150 ± 3% -7.8% 35819 ± 2% numa-vmstat.node0.nr_page_table_pages
3197 ± 8% +172.3% 8706 ± 67% +117.4% 6953 ± 37% +121.9% 7095 ± 32% numa-vmstat.node0.nr_shmem
4670410 ± 6% -22.0% 3641951 ± 8% -27.6% 3382558 ± 4% -29.1% 3313347 ± 7% numa-vmstat.node0.nr_vmscan_write
9127776 ± 5% -21.5% 7169554 ± 5% -25.3% 6819419 ± 4% -26.6% 6702778 ± 7% numa-vmstat.node0.nr_written
16661751 ± 42% -59.8% 6694161 ± 70% -29.8% 11689134 ± 53% -43.1% 9474687 ± 66% numa-vmstat.node0.numa_miss
16734663 ± 41% -59.5% 6770170 ± 69% -29.9% 11733904 ± 53% -43.0% 9533441 ± 66% numa-vmstat.node0.numa_other
7829198 ± 6% -36.9% 4941489 ± 24% -35.9% 5014994 ± 27% -42.1% 4531460 ± 25% numa-vmstat.node1.nr_active_anon
718935 ± 14% +53.9% 1106567 ± 31% +32.7% 954186 ± 37% +15.6% 830825 ± 13% numa-vmstat.node1.nr_free_pages
6977014 ± 5% +39.1% 9704494 ± 15% +44.1% 10053122 ± 14% +50.5% 10503788 ± 12% numa-vmstat.node1.nr_inactive_anon
44508 ± 2% -16.6% 37117 ± 2% -14.3% 38152 ± 3% -14.5% 38046 ± 2% numa-vmstat.node1.nr_page_table_pages
5222 ± 8% +218.1% 16612 ± 13% +225.6% 17003 ± 5% +243.9% 17957 ± 9% numa-vmstat.node1.nr_shmem
8026 ± 55% -55.0% 3611 ± 69% -22.4% 6228 ± 58% -46.6% 4289 ± 65% numa-vmstat.node1.nr_swapcached
8007802 ± 6% -47.6% 4196794 ± 7% -46.6% 4278458 ± 8% -47.8% 4179885 ± 6% numa-vmstat.node1.nr_vmscan_write
352.06 ± 7% -50.7% 173.73 ± 16% -54.4% 160.39 ± 16% -54.8% 158.99 ± 15% numa-vmstat.node1.nr_writeback
15775556 ± 6% -45.5% 8590752 ± 5% -44.3% 8782043 ± 9% -44.4% 8775781 ± 4% numa-vmstat.node1.nr_written
7829176 ± 6% -36.9% 4941484 ± 24% -35.9% 5014989 ± 27% -42.1% 4531456 ± 25% numa-vmstat.node1.nr_zone_active_anon
6977031 ± 5% +39.1% 9704497 ± 15% +44.1% 10053126 ± 14% +50.5% 10503791 ± 12% numa-vmstat.node1.nr_zone_inactive_anon
346.80 ± 7% -49.9% 173.73 ± 16% -53.9% 159.98 ± 16% -54.1% 159.14 ± 15% numa-vmstat.node1.nr_zone_write_pending
16665351 ± 42% -59.8% 6694094 ± 70% -29.9% 11678532 ± 53% -43.3% 9441584 ± 66% numa-vmstat.node1.numa_foreign
26917054 ± 23% -60.1% 10749940 ± 49% -39.8% 16204771 ± 41% -47.6% 14113544 ± 47% numa-vmstat.node1.numa_hit
26856045 ± 23% -60.2% 10693045 ± 50% -39.9% 16138287 ± 41% -47.7% 14039526 ± 47% numa-vmstat.node1.numa_local
427.78 ± 20% +93.8% 828.91 ± 64% +36.8% 585.39 ± 15% +25.7% 537.74 ± 15% sched_debug.cfs_rq:/.load_avg.max
13.46 ± 23% -63.0% 4.98 ± 47% -50.7% 6.64 ± 46% -56.0% 5.92 ± 43% sched_debug.cfs_rq:/.removed.load_avg.avg
59.42 ± 17% -48.6% 30.57 ± 34% -42.0% 34.44 ± 23% -43.9% 33.35 ± 32% sched_debug.cfs_rq:/.removed.load_avg.stddev
152.11 ± 26% -36.2% 96.98 ± 50% -35.2% 98.50 ± 14% -28.0% 109.46 ± 31% sched_debug.cfs_rq:/.removed.runnable_avg.max
152.11 ± 26% -36.3% 96.87 ± 50% -35.3% 98.34 ± 15% -28.1% 109.40 ± 31% sched_debug.cfs_rq:/.removed.util_avg.max
81.74 ± 14% +11.4% 91.02 ± 5% +24.1% 101.46 ± 8% +19.8% 97.92 ± 5% sched_debug.cfs_rq:/.runnable_avg.avg
114.05 ± 16% +23.9% 141.35 ± 2% +31.5% 150.02 ± 5% +29.2% 147.38 ± 4% sched_debug.cfs_rq:/.runnable_avg.stddev
81.31 ± 14% +11.5% 90.70 ± 5% +24.2% 100.99 ± 8% +19.9% 97.52 ± 5% sched_debug.cfs_rq:/.util_avg.avg
113.70 ± 16% +23.9% 140.92 ± 2% +31.5% 149.52 ± 5% +29.3% 147.02 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
10.59 ± 25% +131.9% 24.56 ± 10% +142.6% 25.70 ± 29% +142.5% 25.69 ± 12% sched_debug.cfs_rq:/.util_est.avg
49.96 ± 28% +71.7% 85.76 ± 5% +77.0% 88.41 ± 16% +86.3% 93.05 ± 8% sched_debug.cfs_rq:/.util_est.stddev
130266 ± 19% +38.9% 180973 ± 8% +28.9% 167881 ± 7% +29.6% 168787 ± 8% sched_debug.cpu.clock.avg
130457 ± 19% +38.9% 181208 ± 8% +28.9% 168141 ± 7% +29.5% 169004 ± 7% sched_debug.cpu.clock.max
130028 ± 19% +38.9% 180638 ± 8% +28.8% 167497 ± 7% +29.6% 168482 ± 8% sched_debug.cpu.clock.min
129816 ± 19% +39.0% 180459 ± 8% +29.0% 167404 ± 7% +29.6% 168307 ± 8% sched_debug.cpu.clock_task.avg
130389 ± 19% +38.9% 181122 ± 8% +28.9% 168056 ± 7% +29.6% 168923 ± 7% sched_debug.cpu.clock_task.max
121562 ± 20% +40.5% 170799 ± 8% +29.8% 157738 ± 8% +30.5% 158608 ± 8% sched_debug.cpu.clock_task.min
573.18 ± 25% +47.9% 847.53 ± 8% +28.1% 734.25 ± 11% +27.2% 728.81 ± 15% sched_debug.cpu.nr_switches.min
0.15 ± 11% +8.3% 0.17 ± 11% +15.9% 0.18 ± 15% +20.3% 0.18 ± 7% sched_debug.cpu.nr_uninterruptible.avg
4.07 ± 14% +43.8% 5.86 ± 5% +41.2% 5.75 ± 10% +42.1% 5.79 ± 9% sched_debug.cpu.nr_uninterruptible.stddev
130026 ± 19% +38.9% 180621 ± 8% +28.8% 167481 ± 7% +29.6% 168466 ± 8% sched_debug.cpu_clk
129318 ± 19% +39.1% 179912 ± 8% +29.0% 166774 ± 7% +29.7% 167759 ± 8% sched_debug.ktime
130797 ± 19% +38.7% 181392 ± 8% +28.6% 168261 ± 7% +29.4% 169239 ± 8% sched_debug.sched_clk
191035 ± 7% -29.3% 135009 ± 4% -29.8% 134046 ± 8% -29.9% 133825 ± 4% proc-vmstat.allocstall_movable
3850 ± 11% +78.5% 6872 ± 12% +69.7% 6532 ± 10% +76.3% 6786 ± 10% proc-vmstat.allocstall_normal
13525554 ± 10% -32.5% 9125751 ± 17% -21.4% 10625542 ± 26% -29.1% 9588171 ± 29% proc-vmstat.nr_active_anon
16631565 ± 8% +23.8% 20588362 ± 9% +15.8% 19252926 ± 14% +23.8% 20585579 ± 13% proc-vmstat.nr_inactive_anon
83457 -12.1% 73319 -11.8% 73637 ± 2% -11.2% 74151 proc-vmstat.nr_page_table_pages
8392 ± 3% +198.4% 25047 ± 26% +184.6% 23884 ± 12% +196.1% 24854 ± 10% proc-vmstat.nr_shmem
79380 -0.4% 79057 -0.3% 79108 -5.1% 75299 proc-vmstat.nr_slab_unreclaimable
12629057 ± 5% -39.1% 7691618 ± 5% -39.8% 7607004 ± 5% -41.2% 7422326 ± 5% proc-vmstat.nr_vmscan_write
440.92 ± 10% -42.0% 255.71 ± 15% -46.9% 234.18 ± 13% -43.9% 247.16 ± 15% proc-vmstat.nr_writeback
24903332 ± 5% -36.7% 15760306 ± 4% -37.4% 15601462 ± 7% -37.8% 15478560 ± 5% proc-vmstat.nr_written
13525564 ± 10% -32.5% 9125755 ± 17% -21.4% 10625546 ± 26% -29.1% 9588177 ± 29% proc-vmstat.nr_zone_active_anon
16631569 ± 8% +23.8% 20588365 ± 9% +15.8% 19252929 ± 14% +23.8% 20585582 ± 13% proc-vmstat.nr_zone_inactive_anon
443.01 ± 10% -42.0% 257.00 ± 16% -47.0% 234.79 ± 13% -43.4% 250.68 ± 14% proc-vmstat.nr_zone_write_pending
24485570 ± 3% -15.4% 20714438 ± 3% -15.2% 20753649 ± 3% -16.4% 20472331 ± 3% proc-vmstat.numa_foreign
39260606 ± 2% -30.1% 27457969 ± 4% -30.4% 27338587 ± 6% -30.0% 27473772 ± 2% proc-vmstat.numa_hit
39098081 ± 2% -30.1% 27325222 ± 4% -30.4% 27205934 ± 6% -30.1% 27340944 ± 2% proc-vmstat.numa_local
24482446 ± 3% -15.5% 20696329 ± 3% -15.2% 20764313 ± 3% -16.1% 20537690 ± 3% proc-vmstat.numa_miss
24643161 ± 3% -15.5% 20828939 ± 3% -15.2% 20886221 ± 3% -16.3% 20637029 ± 3% proc-vmstat.numa_other
7478080 ± 19% +140.2% 17959948 ± 8% +149.2% 18637853 ± 14% +149.0% 18622062 ± 11% proc-vmstat.numa_pte_updates
63140512 -24.7% 47553512 -24.7% 47523846 ± 3% -25.0% 47327605 ± 2% proc-vmstat.pgalloc_normal
63461017 -24.5% 47896127 ± 2% -24.7% 47801824 ± 3% -24.9% 47669279 proc-vmstat.pgfault
64134373 -24.6% 48331932 ± 2% -25.1% 48010799 ± 3% -25.2% 47988257 proc-vmstat.pgfree
2796 ± 78% -70.9% 815.00 ± 50% -57.0% 1202 ± 64% -72.0% 782.30 ± 64% proc-vmstat.pgmigrate_fail
99615377 ± 5% -36.7% 63043276 ± 4% -37.4% 62407899 ± 7% -37.8% 61916291 ± 5% proc-vmstat.pgpgout
34932 ± 3% -7.8% 32198 ± 2% -8.1% 32104 ± 3% -8.9% 31826 ± 2% proc-vmstat.pgreuse
21507042 ± 5% -36.0% 13775181 ± 4% -36.7% 13623024 ± 7% -37.3% 13487008 ± 5% proc-vmstat.pgrotated
58427243 ± 10% -43.5% 32993860 ± 12% -39.1% 35582889 ± 16% -42.7% 33494232 ± 18% proc-vmstat.pgscan_anon
44324880 ± 10% -37.2% 27839440 ± 10% -34.2% 29186311 ± 14% -36.9% 27972671 ± 15% proc-vmstat.pgscan_direct
14102763 ± 23% -63.4% 5154838 ± 27% -54.6% 6396957 ± 40% -60.8% 5521838 ± 35% proc-vmstat.pgscan_kswapd
2666 ± 88% -90.7% 248.33 ±137% -78.1% 583.38 ± 97% -83.3% 446.20 ±124% proc-vmstat.pgskip_normal
24911061 ± 5% -36.7% 15767491 ± 4% -37.3% 15611299 ± 7% -37.8% 15487227 ± 5% proc-vmstat.pgsteal_anon
17074863 ± 8% -25.3% 12754191 ± 5% -26.2% 12608140 ± 7% -26.4% 12564844 ± 4% proc-vmstat.pgsteal_direct
7836517 ± 8% -61.5% 3013661 ± 7% -61.7% 3003472 ± 8% -62.7% 2922611 ± 11% proc-vmstat.pgsteal_kswapd
24903332 ± 5% -36.7% 15760306 ± 4% -37.4% 15601462 ± 7% -37.8% 15478560 ± 5% proc-vmstat.pswpout
78185 ± 27% -82.8% 13463 ± 52% -74.5% 19910 ± 68% -71.3% 22474 ± 49% proc-vmstat.workingset_nodereclaim
1.85 ± 4% -31.7% 1.26 -32.9% 1.24 ± 2% -34.2% 1.22 ± 2% perf-stat.i.MPKI
1.992e+09 ± 3% -18.9% 1.615e+09 ± 2% -17.6% 1.641e+09 ± 2% -17.0% 1.653e+09 perf-stat.i.branch-instructions
0.93 ± 6% +0.6 1.55 ± 3% +0.6 1.55 ± 3% +0.6 1.54 ± 2% perf-stat.i.branch-miss-rate%
14377927 ± 11% +29.7% 18645141 ± 5% +33.1% 19132687 +34.0% 19271478 perf-stat.i.branch-misses
13.97 ± 3% -9.0 4.95 -9.0 4.98 ± 2% -9.0 4.92 ± 2% perf-stat.i.cache-miss-rate%
15782867 ± 3% -34.3% 10364434 ± 2% -33.6% 10475081 ± 2% -33.7% 10458719 ± 2% perf-stat.i.cache-misses
79049148 +92.6% 1.522e+08 ± 2% +93.5% 1.53e+08 +94.1% 1.534e+08 ± 2% perf-stat.i.cache-references
5344 -29.2% 3783 ± 2% -29.8% 3752 ± 3% -29.8% 3750 ± 2% perf-stat.i.context-switches
1.31 ± 2% +316.3% 5.46 ± 3% +317.0% 5.47 ± 3% +319.5% 5.50 ± 2% perf-stat.i.cpi
8.392e+09 ± 3% +197.0% 2.492e+10 ± 3% +201.9% 2.534e+10 ± 2% +204.2% 2.553e+10 ± 2% perf-stat.i.cpu-cycles
150.26 +14.1% 171.44 ± 3% +15.3% 173.26 ± 4% +15.4% 173.37 ± 3% perf-stat.i.cpu-migrations
737.89 ± 5% +500.7% 4432 ± 4% +514.1% 4531 ± 3% +529.4% 4644 ± 3% perf-stat.i.cycles-between-cache-misses
7.732e+09 ± 3% -17.2% 6.405e+09 ± 2% -15.9% 6.502e+09 ± 2% -15.3% 6.551e+09 perf-stat.i.instructions
0.80 -69.8% 0.24 ± 5% -69.9% 0.24 ± 3% -70.5% 0.24 ± 2% perf-stat.i.ipc
23.75 ± 27% -52.9% 11.19 ± 69% -23.7% 18.12 ± 42% -39.8% 14.31 ± 47% perf-stat.i.major-faults
2.55 ± 8% -38.4% 1.57 ± 4% -36.8% 1.61 ± 2% -36.3% 1.62 perf-stat.i.metric.K/sec
265295 ± 5% -42.5% 152670 ± 2% -41.6% 155041 ± 3% -41.4% 155453 perf-stat.i.minor-faults
265319 ± 5% -42.5% 152681 ± 2% -41.6% 155059 ± 3% -41.4% 155468 perf-stat.i.page-faults
2.04 ± 2% -20.6% 1.62 ± 2% -21.2% 1.61 -21.9% 1.59 ± 2% perf-stat.overall.MPKI
0.72 ± 12% +0.4 1.15 ± 4% +0.4 1.17 ± 2% +0.4 1.17 perf-stat.overall.branch-miss-rate%
19.95 ± 2% -13.1 6.84 -13.1 6.82 ± 2% -13.2 6.79 ± 2% perf-stat.overall.cache-miss-rate%
1.09 ± 2% +257.6% 3.88 ± 3% +260.0% 3.91 ± 3% +259.9% 3.91 ± 2% perf-stat.overall.cpi
532.42 ± 2% +350.1% 2396 ± 4% +356.9% 2432 ± 3% +360.8% 2453 ± 3% perf-stat.overall.cycles-between-cache-misses
0.92 -72.0% 0.26 ± 4% -72.2% 0.26 ± 3% -72.2% 0.26 ± 2% perf-stat.overall.ipc
6551 ± 2% +38.5% 9072 +39.5% 9138 +40.0% 9171 perf-stat.overall.path-length
1.982e+09 ± 3% -18.5% 1.616e+09 -17.8% 1.629e+09 ± 2% -17.2% 1.641e+09 perf-stat.ps.branch-instructions
14325844 ± 11% +29.7% 18584702 ± 5% +33.0% 19054101 +33.9% 19184930 perf-stat.ps.branch-misses
15697779 ± 3% -33.9% 10379452 ± 2% -33.8% 10385651 ± 2% -33.9% 10369490 ± 2% perf-stat.ps.cache-misses
78678984 +93.0% 1.518e+08 ± 2% +93.7% 1.524e+08 +94.3% 1.528e+08 ± 2% perf-stat.ps.cache-references
5321 -29.1% 3771 ± 2% -29.7% 3740 ± 3% -29.8% 3737 ± 2% perf-stat.ps.context-switches
8.355e+09 ± 3% +197.6% 2.487e+10 ± 3% +202.1% 2.524e+10 ± 2% +204.3% 2.543e+10 ± 2% perf-stat.ps.cpu-cycles
149.59 +14.2% 170.85 ± 3% +15.4% 172.66 ± 4% +15.5% 172.78 ± 3% perf-stat.ps.cpu-migrations
7.693e+09 ± 3% -16.8% 6.404e+09 -16.0% 6.459e+09 ± 2% -15.4% 6.507e+09 perf-stat.ps.instructions
23.73 ± 27% -52.9% 11.18 ± 69% -23.5% 18.15 ± 43% -39.7% 14.31 ± 48% perf-stat.ps.major-faults
263785 ± 5% -41.9% 153177 -41.8% 153437 ± 3% -41.7% 153864 perf-stat.ps.minor-faults
263809 ± 5% -41.9% 153188 -41.8% 153455 ± 3% -41.7% 153879 perf-stat.ps.page-faults
1.848e+12 ± 2% +8.0% 1.995e+12 +8.5% 2.006e+12 +8.7% 2.009e+12 perf-stat.total.instructions
0.09 ± 3% +316.6% 0.37 ±135% +154.1% 0.22 ±143% +255.1% 0.31 ±151% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.11 ± 13% -45.7% 0.06 ± 83% -51.4% 0.05 ±105% -62.6% 0.04 ±118% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.folio_alloc_swap.add_to_swap.shrink_folio_list
0.04 ± 15% -34.4% 0.02 ± 16% -38.3% 0.02 ± 44% -34.9% 0.02 ± 26% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.05 ± 17% +600.7% 0.34 ±172% +199.1% 0.15 ±192% +905.3% 0.49 ±255% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.01 ± 26% +2198.6% 0.27 ±152% +15606.1% 1.83 ±182% +681.7% 0.09 ± 28% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.06 ± 8% +41.6% 0.09 ± 17% +34.7% 0.09 ± 16% +31.1% 0.08 ± 18% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.07 ± 8% +36.7% 0.10 ± 15% +16.9% 0.08 ± 12% +225.8% 0.23 ±182% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.01 ± 4% -5.6% 0.01 ± 29% +6.8% 0.01 ± 91% -21.7% 0.01 ± 16% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.18 ± 45% +14663.2% 173.84 ±219% +5633.0% 67.51 ±366% +9459.9% 112.57 ±271% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.16 ± 7% +81.8% 0.29 ± 22% +7048.0% 11.56 ±377% +37.8% 0.22 ± 16% perf-sched.sch_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
0.13 ± 13% -55.3% 0.06 ± 83% -55.6% 0.06 ±104% -65.7% 0.04 ±116% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.folio_alloc_swap.add_to_swap.shrink_folio_list
0.10 ± 23% +14.1% 0.12 ± 65% +56.1% 0.16 ± 53% +102.9% 0.21 ± 25% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.18 ± 11% +8754.6% 15.60 ±219% +3486.4% 6.32 ±370% +21811.5% 38.60 ±297% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.35 ±107% +2644.1% 256.70 ±154% +1144.2% 116.39 ±206% +667.0% 71.75 ±138% perf-sched.sch_delay.max.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
0.15 ± 25% +100.0% 0.31 ± 52% +78.6% 0.27 ± 25% +146.6% 0.38 ± 72% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.17 ± 10% +74.6% 0.30 ± 15% +48.4% 0.26 ± 16% +3855.3% 6.83 ±288% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.16 ± 12% +391.4% 0.80 ±148% +11691.7% 19.20 ±272% +7032.4% 11.61 ±155% perf-sched.sch_delay.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
0.11 ± 94% +1386.1% 1.62 ± 66% +2315.9% 2.63 ±235% +971.1% 1.17 ± 75% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.15 ± 18% +45.5% 0.22 ± 13% +42.4% 0.22 ± 19% +46.8% 0.22 ± 21% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
87.69 ± 2% +57.5% 138.09 ± 4% +64.0% 143.78 ± 6% +69.1% 148.27 ± 5% perf-sched.total_wait_and_delay.average.ms
87.52 ± 2% +57.6% 137.91 ± 4% +64.1% 143.59 ± 6% +69.3% 148.15 ± 5% perf-sched.total_wait_time.average.ms
5.16 ± 8% +20.1% 6.20 ± 15% +9.7% 5.66 ± 14% +26.9% 6.54 ± 13% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
7.23 ±142% +493.8% 42.93 ± 11% +581.8% 49.29 ± 12% +581.8% 49.29 ± 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
89.03 ± 56% -97.9% 1.83 ±152% -89.9% 9.00 ±166% -93.5% 5.80 ± 69% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
21.44 ± 3% +88.0% 40.32 ± 7% +102.6% 43.43 ± 7% +99.2% 42.71 ± 5% perf-sched.wait_and_delay.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
383.35 ± 3% +8.2% 414.60 ± 3% +6.6% 408.50 ± 3% +10.1% 422.07 ± 4% perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
40.29 ± 34% +602.5% 283.08 ± 58% +804.7% 364.55 ± 27% +691.2% 318.81 ± 28% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
4.06 -100.0% 0.00 -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
338.75 ± 23% -65.9% 115.54 ± 72% -59.4% 137.63 ± 81% -41.1% 199.64 ± 55% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
20.91 ± 4% +64.7% 34.43 ± 6% +80.9% 37.82 ± 8% +75.6% 36.71 ± 5% perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
5.97 ± 8% -27.5% 4.33 -26.1% 4.42 ± 3% -25.8% 4.43 ± 3% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
527.31 ± 2% +20.9% 637.26 ± 10% +18.4% 624.19 ± 5% +22.5% 645.76 ± 8% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
159.81 ± 2% +78.2% 284.75 ± 9% +76.7% 282.46 ± 11% +83.8% 293.80 ± 12% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
640.33 ± 11% +33.5% 854.67 ± 16% +25.1% 800.88 ± 17% +56.1% 999.80 ± 31% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
26.83 ±141% +777.6% 235.50 ± 20% +724.8% 221.31 ± 21% +886.8% 264.80 ± 33% perf-sched.wait_and_delay.count.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
5.00 +43.3% 7.17 ± 12% +33.8% 6.69 ± 15% +58.0% 7.90 ± 32% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
7206 ± 4% -28.5% 5149 ± 12% -40.5% 4290 ± 14% -25.6% 5361 ± 35% perf-sched.wait_and_delay.count.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
8.67 ± 10% +38.5% 12.00 ± 16% +29.8% 11.25 ± 18% +61.5% 14.00 ± 33% perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
160.17 ± 10% -100.0% 0.00 -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
112.17 ± 33% +279.8% 426.00 ± 13% +241.2% 382.69 ± 26% +339.2% 492.60 ± 49% perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
639.00 ± 11% +120.6% 1409 ± 18% +97.7% 1263 ± 16% +141.3% 1542 ± 35% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
15.52 ±141% +4400.0% 698.48 ± 63% +5577.1% 881.17 ± 46% +5092.6% 805.97 ± 58% perf-sched.wait_and_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
3425 ± 44% -99.4% 22.13 ±141% -92.9% 243.04 ±282% -97.5% 84.27 ±127% perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
1212 ± 4% +81.2% 2197 ± 12% +100.1% 2426 ± 23% +172.1% 3300 ± 76% perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
6.49 ± 46% -100.0% 0.00 -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
59.14 ± 24% +280.8% 225.17 ±153% +178.2% 164.54 ±133% +72.1% 101.75 ± 23% perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
81.27 ± 26% -60.3% 32.25 ± 60% -6.6% 75.94 ± 87% -27.1% 59.23 ± 48% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3448 ± 12% +48.4% 5119 ± 23% +31.3% 4528 ± 22% +55.0% 5346 ± 33% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
5.07 ± 8% +15.0% 5.83 ± 9% +7.2% 5.44 ± 10% +22.9% 6.23 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
21.71 ± 7% +97.3% 42.83 ± 11% +126.1% 49.07 ± 12% +126.6% 49.19 ± 15% perf-sched.wait_time.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
23.06 ± 17% +47.2% 33.94 ± 51% +96.9% 45.41 ± 33% +83.6% 42.33 ± 17% perf-sched.wait_time.avg.ms.__cond_resched.rmap_walk_anon.try_to_unmap.shrink_folio_list.evict_folios
6.41 ± 96% +426.5% 33.77 ± 20% +449.5% 35.25 ± 21% +449.5% 35.25 ± 37% perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
9.59 ± 52% +792.6% 85.58 ± 27% +958.8% 101.51 ± 22% +875.0% 93.48 ± 24% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
88.96 ± 56% -90.5% 8.44 ± 53% -80.2% 17.61 ±113% -89.0% 9.76 ± 37% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__folio_lock_or_retry.do_swap_page
21.33 ± 3% +88.1% 40.12 ± 6% +102.3% 43.16 ± 7% +99.5% 42.56 ± 5% perf-sched.wait_time.avg.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle
383.33 ± 3% +8.2% 414.58 ± 3% +6.6% 408.48 ± 3% +10.1% 422.05 ± 4% perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
20.96 ± 16% +67.9% 35.18 ± 50% +121.1% 46.33 ± 46% +111.4% 44.29 ± 23% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
40.28 ± 34% +602.1% 282.81 ± 59% +800.4% 362.72 ± 27% +691.2% 318.72 ± 28% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
3.97 -16.8% 3.30 ± 3% -12.8% 3.46 ± 10% -11.9% 3.50 ± 5% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
338.69 ± 23% -54.8% 153.25 ± 23% -47.4% 178.01 ± 38% -33.6% 224.74 ± 30% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
22.02 ± 23% +462.9% 123.94 ± 26% +370.4% 103.58 ± 20% +430.7% 116.84 ± 30% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
20.81 ± 4% +65.0% 34.33 ± 6% +80.8% 37.63 ± 9% +75.8% 36.58 ± 5% perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
5.87 ± 8% -28.1% 4.22 -26.8% 4.29 ± 3% -26.3% 4.33 ± 3% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
527.30 ± 2% +20.9% 637.25 ± 10% +18.4% 624.18 ± 5% +22.5% 645.75 ± 8% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
159.22 ± 2% +78.8% 284.72 ± 9% +77.4% 282.42 ± 11% +84.5% 293.76 ± 12% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
42.83 ± 9% +1530.6% 698.38 ± 63% +1957.2% 881.10 ± 46% +1781.5% 805.85 ± 58% perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.__swap_writepage.swap_writepage
12.83 ± 82% +344.6% 57.05 ± 10% +364.9% 59.65 ± 14% +437.6% 68.98 ± 42% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
124.59 ± 77% +333.7% 540.35 ± 31% +480.6% 723.45 ± 41% +647.9% 931.82 ± 57% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
35.21 ± 27% +27.5% 44.90 ± 49% +62.6% 57.25 ± 54% +62.0% 57.03 ± 20% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
1212 ± 4% +81.2% 2197 ± 12% +100.1% 2426 ± 23% +172.1% 3299 ± 76% perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
220.10 ± 74% +283.8% 844.67 ± 11% +239.2% 746.66 ± 33% +262.0% 796.72 ± 31% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
59.03 ± 24% +281.3% 225.09 ±154% +163.3% 155.44 ±142% +72.2% 101.64 ± 24% perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.mempool_alloc_noprof.bio_alloc_bioset
81.17 ± 26% -60.4% 32.15 ± 60% -22.6% 62.82 ± 78% -27.2% 59.11 ± 48% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3385 ± 10% +51.2% 5119 ± 23% +33.7% 4528 ± 22% +57.9% 5346 ± 33% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
79.77 -10.1 69.65 -10.8 69.01 ± 3% -10.3 69.43 ± 3% perf-profile.calltrace.cycles-pp.do_access
77.33 -7.8 69.51 -8.4 68.90 ± 3% -8.0 69.31 ± 3% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
7.43 ± 2% -6.8 0.66 ± 13% -6.8 0.66 ± 6% -6.8 0.67 ± 4% perf-profile.calltrace.cycles-pp.add_to_swap.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one
6.76 ± 5% -5.8 0.95 ± 5% -5.8 0.97 ± 4% -5.9 0.90 ± 4% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
6.24 ± 2% -5.7 0.58 ± 12% -5.7 0.58 ± 7% -5.6 0.59 ± 4% perf-profile.calltrace.cycles-pp.folio_alloc_swap.add_to_swap.shrink_folio_list.evict_folios.try_to_shrink_lruvec
5.73 ± 4% -5.6 0.17 ±141% -5.5 0.23 ±113% -5.5 0.26 ±100% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
74.64 -5.4 69.25 -6.0 68.69 ± 3% -5.5 69.11 ± 3% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
74.54 -5.3 69.25 -5.9 68.69 ± 3% -5.4 69.10 ± 3% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
5.79 ± 3% -5.2 0.55 ± 11% -5.2 0.56 ± 7% -5.2 0.57 ± 4% perf-profile.calltrace.cycles-pp.__mem_cgroup_try_charge_swap.folio_alloc_swap.add_to_swap.shrink_folio_list.evict_folios
5.51 ± 4% -5.1 0.37 ± 72% -5.4 0.10 ±208% -5.5 0.05 ±299% perf-profile.calltrace.cycles-pp.do_rw_once
73.45 -4.3 69.17 -4.8 68.63 ± 3% -4.4 69.04 ± 3% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.92 ± 2% -3.5 0.44 ± 44% -3.7 0.26 ±100% -3.7 0.22 ±122% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty
72.77 -2.9 69.91 -3.2 69.54 ± 3% -2.8 69.97 ± 3% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
74.31 -1.9 72.37 -1.6 72.73 -1.8 72.54 ± 2% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +0.4 0.42 ± 72% +0.7 0.71 ± 33% +0.7 0.73 ± 17% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.4 0.43 ± 72% +0.7 0.71 ± 34% +0.7 0.73 ± 17% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
0.00 +0.4 0.43 ± 72% +0.7 0.71 ± 34% +0.7 0.73 ± 17% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
0.00 +0.5 0.46 ± 72% +0.8 0.82 ± 23% +0.8 0.79 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
0.00 +0.5 0.46 ± 72% +0.8 0.82 ± 23% +0.8 0.79 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
0.00 +0.6 0.59 ± 7% +0.6 0.57 ± 5% +0.6 0.57 ± 6% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
0.00 +0.7 0.66 ± 5% +0.6 0.63 ± 6% +0.6 0.62 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state
0.00 +0.7 0.69 ± 5% +0.7 0.66 ± 5% +0.7 0.65 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 23% +0.8 0.83 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.get_user_pages_remote.get_arg_page.copy_string_kernel
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.83 ± 13% perf-profile.calltrace.cycles-pp.__get_user_pages.get_user_pages_remote.get_arg_page.copy_string_kernel.do_execveat_common
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.83 ± 13% perf-profile.calltrace.cycles-pp.copy_string_kernel.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.83 ± 13% perf-profile.calltrace.cycles-pp.get_arg_page.copy_string_kernel.do_execveat_common.__x64_sys_execve.do_syscall_64
0.00 +0.8 0.81 ± 26% +0.7 0.75 ± 22% +0.8 0.83 ± 13% perf-profile.calltrace.cycles-pp.get_user_pages_remote.get_arg_page.copy_string_kernel.do_execveat_common.__x64_sys_execve
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 20% +0.9 0.92 ± 44% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 20% +0.9 0.92 ± 44% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 20% +0.9 0.92 ± 44% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve
0.00 +0.8 0.81 ± 12% +1.0 0.95 ± 19% +0.9 0.92 ± 44% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.9 0.95 ± 19% +1.3 1.26 ± 17% +1.2 1.20 ± 16% perf-profile.calltrace.cycles-pp._Fork
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +1.0 1.09 ± 26% perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +1.0 1.09 ± 26% perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +1.0 1.09 ± 26% perf-profile.calltrace.cycles-pp.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault.handle_mm_fault
0.08 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +1.0 1.09 ± 26% perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.09 ±223% +1.0 1.06 ± 12% +1.0 1.06 ± 20% +1.0 1.09 ± 26% perf-profile.calltrace.cycles-pp.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00 +1.3 1.26 ± 7% +1.2 1.21 ± 4% +1.2 1.21 ± 6% perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00 +1.4 1.36 ± 30% +1.2 1.18 ± 35% +1.3 1.33 ± 20% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__get_user_pages.get_user_pages_remote.get_arg_page
0.00 +1.6 1.64 ± 6% +1.5 1.55 ± 5% +1.5 1.54 ± 5% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
1.21 ± 46% +2.0 3.22 ± 29% +2.7 3.94 ± 34% +2.2 3.40 ± 29% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.20 ± 46% +2.0 3.22 ± 29% +2.7 3.94 ± 34% +2.2 3.40 ± 29% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
1.20 ± 46% +2.0 3.22 ± 29% +2.7 3.94 ± 34% +2.2 3.40 ± 29% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.18 ± 47% +2.0 3.22 ± 29% +2.8 3.93 ± 34% +2.2 3.39 ± 29% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +2.1 2.09 ± 7% +2.0 1.97 ± 5% +1.9 1.95 ± 4% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.4 2.69 ± 14% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.4 2.69 ± 14% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.4 2.69 ± 14% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.3 2.65 ± 13% +2.4 2.69 ± 14% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
0.30 ±100% +2.2 2.48 ± 11% +2.4 2.65 ± 13% +2.4 2.69 ± 14% perf-profile.calltrace.cycles-pp.execve
67.34 +2.3 69.67 +1.8 69.15 ± 2% +2.2 69.57 ± 2% perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
67.27 +2.4 69.67 +1.9 69.14 ± 2% +2.3 69.57 ± 2% perf-profile.calltrace.cycles-pp.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
67.22 +2.4 69.66 +1.9 69.13 ± 2% +2.3 69.56 ± 2% perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
67.12 +2.5 69.64 +2.0 69.13 ± 2% +2.4 69.55 ± 2% perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof.alloc_anon_folio
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.4 9.17 ± 4% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.4 9.17 ± 4% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.4 9.17 ± 4% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
2.21 ± 6% +3.2 5.36 ± 11% +2.9 5.07 ± 6% +2.8 5.01 ± 5% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.calltrace.cycles-pp.balance_pgdat.kswapd.kthread.ret_from_fork.ret_from_fork_asm
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork.ret_from_fork_asm
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.calltrace.cycles-pp.shrink_many.shrink_node.balance_pgdat.kswapd.kthread
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.calltrace.cycles-pp.shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.calltrace.cycles-pp.shrink_one.shrink_many.shrink_node.balance_pgdat.kswapd
4.87 +3.8 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.calltrace.cycles-pp.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node.balance_pgdat
66.53 +4.1 70.63 +3.6 70.13 ± 2% +4.1 70.63 ± 2% perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
6.72 ± 2% +4.5 11.21 ± 9% +3.9 10.60 ± 4% +3.8 10.50 ± 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
6.72 ± 2% +4.5 11.21 ± 9% +3.9 10.60 ± 4% +3.8 10.50 ± 5% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
6.72 ± 2% +4.5 11.20 ± 9% +3.9 10.59 ± 4% +3.8 10.50 ± 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
6.94 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.78 ± 5% perf-profile.calltrace.cycles-pp.common_startup_64
3.57 ± 2% +5.1 8.64 ± 9% +4.6 8.16 ± 5% +4.5 8.08 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
3.65 ± 2% +5.5 9.14 ± 9% +5.0 8.62 ± 5% +4.9 8.54 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
4.52 ± 2% +6.3 10.82 ± 8% +5.7 10.24 ± 5% +5.6 10.15 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
64.31 +6.9 71.18 +7.3 71.57 +7.1 71.43 ± 2% perf-profile.calltrace.cycles-pp.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof
64.17 +7.4 71.56 +8.2 72.39 +8.2 72.40 perf-profile.calltrace.cycles-pp.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof.alloc_pages_mpol_noprof
64.15 +7.4 71.56 +8.3 72.42 +8.3 72.40 perf-profile.calltrace.cycles-pp.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_noprof
62.53 +9.0 71.48 +9.8 72.37 +9.8 72.35 perf-profile.calltrace.cycles-pp.shrink_many.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath
62.50 +9.0 71.48 +9.9 72.37 +9.8 72.35 perf-profile.calltrace.cycles-pp.shrink_one.shrink_many.shrink_node.do_try_to_free_pages.try_to_free_pages
62.03 +9.4 71.45 +10.3 72.34 +10.3 72.32 perf-profile.calltrace.cycles-pp.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node.do_try_to_free_pages
66.79 +13.3 80.06 +14.4 81.18 +14.5 81.29 perf-profile.calltrace.cycles-pp.evict_folios.try_to_shrink_lruvec.shrink_one.shrink_many.shrink_node
63.11 +16.6 79.70 +17.9 81.02 +18.0 81.13 perf-profile.calltrace.cycles-pp.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one.shrink_many
42.45 ± 2% +35.3 77.74 ± 2% +36.9 79.33 +37.0 79.46 perf-profile.calltrace.cycles-pp.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios.try_to_shrink_lruvec.shrink_one
42.43 ± 2% +35.3 77.74 ± 2% +36.9 79.33 +37.0 79.45 perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios.try_to_shrink_lruvec
42.34 ± 2% +35.4 77.73 ± 2% +37.0 79.32 +37.1 79.45 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list.evict_folios
41.73 ± 2% +35.9 77.58 ± 2% +37.4 79.18 +37.6 79.30 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_folio_list
15.56 ± 2% -12.5 3.03 ± 4% -12.6 2.94 ± 2% -12.6 2.91 ± 2% perf-profile.children.cycles-pp.asm_sysvec_call_function
80.28 -10.4 69.87 -11.1 69.17 ± 3% -10.7 69.60 ± 3% perf-profile.children.cycles-pp.do_access
11.47 ± 4% -10.1 1.37 ± 4% -10.1 1.35 ± 3% -10.1 1.35 ± 2% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
10.79 ± 4% -9.5 1.32 ± 4% -9.5 1.31 ± 3% -9.5 1.31 ± 3% perf-profile.children.cycles-pp.__sysvec_call_function
11.76 ± 3% -9.4 2.34 ± 3% -9.5 2.28 ± 3% -9.5 2.26 ± 2% perf-profile.children.cycles-pp.sysvec_call_function
8.04 ± 2% -7.2 0.79 ± 11% -7.2 0.80 ± 5% -7.2 0.80 ± 2% perf-profile.children.cycles-pp.add_to_swap
7.85 ± 4% -6.7 1.19 ± 7% -6.6 1.24 ± 2% -6.7 1.14 ± 2% perf-profile.children.cycles-pp.llist_add_batch
6.84 ± 2% -6.2 0.69 ± 9% -6.1 0.70 ± 5% -6.1 0.71 ± 3% perf-profile.children.cycles-pp.folio_alloc_swap
6.38 ± 2% -5.7 0.65 ± 8% -5.7 0.67 ± 6% -5.7 0.68 ± 3% perf-profile.children.cycles-pp.__mem_cgroup_try_charge_swap
5.83 ± 7% -5.3 0.57 ± 50% -5.4 0.44 ± 3% -5.4 0.45 ± 6% perf-profile.children.cycles-pp.rmap_walk_anon
5.72 ± 4% -5.1 0.62 ± 15% -5.2 0.48 ± 17% -5.2 0.49 ± 18% perf-profile.children.cycles-pp.do_rw_once
5.03 ± 6% -4.6 0.47 ± 4% -4.6 0.46 ± 4% -4.6 0.44 ± 3% perf-profile.children.cycles-pp.flush_tlb_func
4.76 ± 4% -4.3 0.41 ±143% -4.6 0.15 ± 11% -4.6 0.15 ± 12% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
4.83 ± 2% -4.1 0.73 ± 3% -4.1 0.72 ± 3% -4.1 0.72 ± 3% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
4.27 ± 7% -3.9 0.34 ± 9% -3.9 0.32 ± 4% -3.9 0.33 ± 5% perf-profile.children.cycles-pp.try_to_unmap
4.31 ± 3% -3.9 0.40 ± 17% -4.0 0.34 ± 4% -4.0 0.35 ± 6% perf-profile.children.cycles-pp.pageout
4.46 ± 2% -3.8 0.64 ± 4% -3.8 0.65 ± 4% -3.8 0.66 ± 4% perf-profile.children.cycles-pp.llist_reverse_order
78.01 -3.6 74.42 -3.4 74.60 -3.6 74.39 perf-profile.children.cycles-pp.asm_exc_page_fault
3.88 ± 8% -3.6 0.30 ± 6% -3.6 0.29 ± 4% -3.6 0.30 ± 6% perf-profile.children.cycles-pp.try_to_unmap_one
3.94 ± 3% -3.6 0.37 ± 17% -3.6 0.32 ± 5% -3.6 0.32 ± 6% perf-profile.children.cycles-pp.swap_writepage
3.10 ± 4% -2.7 0.36 ± 77% -2.9 0.24 ± 6% -2.9 0.24 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
73.30 -2.3 70.98 -2.8 70.49 ± 2% -2.3 70.98 ± 2% perf-profile.children.cycles-pp.do_anonymous_page
2.48 ± 6% -2.3 0.19 ± 10% -2.3 0.16 ± 10% -2.3 0.16 ± 10% perf-profile.children.cycles-pp.get_page_from_freelist
2.36 ± 7% -2.1 0.25 ± 20% -2.1 0.25 ± 8% -2.1 0.24 ± 7% perf-profile.children.cycles-pp.swap_cgroup_record
2.31 ± 3% -2.1 0.25 ± 5% -2.0 0.27 ± 6% -2.0 0.27 ± 4% perf-profile.children.cycles-pp.page_counter_try_charge
2.39 -2.1 0.34 ± 5% -2.1 0.31 ± 6% -2.1 0.31 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
2.26 ± 5% -2.0 0.25 ±115% -2.2 0.10 ± 7% -2.2 0.10 ± 10% perf-profile.children.cycles-pp.folio_batch_move_lru
76.12 -2.0 74.16 -1.7 74.39 -1.9 74.18 perf-profile.children.cycles-pp.exc_page_fault
76.08 -1.9 74.15 -1.7 74.38 -1.9 74.18 perf-profile.children.cycles-pp.do_user_addr_fault
2.20 ± 4% -1.9 0.34 ± 12% -1.9 0.29 ± 9% -1.9 0.29 ± 12% perf-profile.children.cycles-pp._raw_spin_lock
1.85 ± 3% -1.8 0.09 ± 10% -1.8 0.09 ± 7% -1.8 0.09 ± 9% perf-profile.children.cycles-pp.native_flush_tlb_local
2.09 ± 2% -1.7 0.36 ± 23% -1.8 0.32 ± 8% -1.8 0.31 ± 6% perf-profile.children.cycles-pp.handle_softirqs
1.76 ± 6% -1.5 0.22 ± 8% -1.6 0.17 ± 17% -1.6 0.17 ± 20% perf-profile.children.cycles-pp.__pte_offset_map_lock
1.78 ± 8% -1.5 0.25 ±103% -1.6 0.13 ± 6% -1.6 0.14 ± 8% perf-profile.children.cycles-pp.folio_referenced
1.68 ± 3% -1.5 0.19 ± 36% -1.5 0.15 ± 6% -1.5 0.15 ± 4% perf-profile.children.cycles-pp.blk_complete_reqs
1.57 ± 6% -1.5 0.12 ± 31% -1.5 0.09 ± 7% -1.5 0.09 ± 7% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.54 ± 15% -1.4 0.11 ± 8% -1.4 0.12 ± 6% -1.4 0.12 ± 5% perf-profile.children.cycles-pp.set_tlb_ubc_flush_pending
1.61 ± 3% -1.4 0.18 ± 34% -1.5 0.14 ± 6% -1.5 0.15 ± 5% perf-profile.children.cycles-pp.scsi_end_request
1.61 ± 3% -1.4 0.18 ± 34% -1.5 0.14 ± 6% -1.5 0.15 ± 5% perf-profile.children.cycles-pp.scsi_io_completion
1.57 ± 4% -1.4 0.16 ± 11% -1.4 0.13 ± 12% -1.4 0.13 ± 16% perf-profile.children.cycles-pp.__mem_cgroup_charge
1.48 ± 4% -1.3 0.16 ± 35% -1.4 0.13 ± 6% -1.4 0.13 ± 4% perf-profile.children.cycles-pp.blk_update_request
1.44 ± 8% -1.3 0.13 ± 12% -1.3 0.12 ± 8% -1.3 0.12 ± 7% perf-profile.children.cycles-pp.__swap_writepage
1.38 ± 7% -1.3 0.08 ± 40% -1.3 0.07 ± 11% -1.3 0.07 ± 11% perf-profile.children.cycles-pp.__remove_mapping
1.33 ± 6% -1.3 0.07 ± 40% -1.3 0.05 ± 28% -1.3 0.06 ± 11% perf-profile.children.cycles-pp.do_softirq
1.32 ± 11% -1.2 0.08 ± 6% -1.3 0.07 ± 8% -1.3 0.06 ± 15% perf-profile.children.cycles-pp.rmqueue
1.34 ± 11% -1.2 0.12 ± 15% -1.2 0.09 ± 9% -1.2 0.09 ± 14% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
1.43 ± 7% -1.2 0.23 ±106% -1.3 0.10 ± 8% -1.3 0.11 ± 10% perf-profile.children.cycles-pp.__folio_batch_add_and_move
1.18 ± 12% -1.1 0.06 ± 7% -1.1 0.06 ± 10% -1.1 0.06 ± 11% perf-profile.children.cycles-pp.__rmqueue_pcplist
1.25 ± 4% -1.1 0.14 ± 50% -1.1 0.11 ± 6% -1.1 0.11 ± 4% perf-profile.children.cycles-pp.isolate_folios
1.24 ± 4% -1.1 0.14 ± 48% -1.1 0.11 ± 6% -1.1 0.11 ± 4% perf-profile.children.cycles-pp.scan_folios
1.19 ± 6% -1.1 0.12 ± 10% -1.1 0.11 ± 12% -1.1 0.11 ± 16% perf-profile.children.cycles-pp.try_charge_memcg
1.12 ± 13% -1.1 0.06 ± 9% -1.1 0.04 ± 58% -1.1 0.05 ± 35% perf-profile.children.cycles-pp.rmqueue_bulk
1.18 ± 4% -1.1 0.12 ± 25% -1.1 0.10 ± 8% -1.1 0.10 ± 7% perf-profile.children.cycles-pp.submit_bio_noacct_nocheck
1.25 ± 9% -1.0 0.20 ±122% -1.2 0.09 ± 9% -1.2 0.10 ± 9% perf-profile.children.cycles-pp.folio_referenced_one
1.14 ± 7% -1.0 0.11 ± 6% -1.0 0.11 ± 9% -1.0 0.12 ± 8% perf-profile.children.cycles-pp.mem_cgroup_id_get_online
1.13 ± 3% -1.0 0.12 ± 42% -1.0 0.10 ± 6% -1.0 0.10 ± 5% perf-profile.children.cycles-pp.end_swap_bio_write
1.08 ± 5% -1.0 0.09 ± 22% -1.0 0.09 ± 9% -1.0 0.08 ± 9% perf-profile.children.cycles-pp.add_to_swap_cache
1.10 ± 3% -1.0 0.12 ± 43% -1.0 0.09 ± 6% -1.0 0.10 ± 5% perf-profile.children.cycles-pp.folio_end_writeback
1.09 ± 4% -1.0 0.11 ± 26% -1.0 0.09 ± 7% -1.0 0.10 ± 6% perf-profile.children.cycles-pp.__submit_bio
1.06 ± 4% -1.0 0.11 ± 24% -1.0 0.09 ± 7% -1.0 0.10 ± 7% perf-profile.children.cycles-pp.blk_mq_submit_bio
1.04 ± 6% -0.9 0.12 ± 6% -0.9 0.12 ± 7% -0.9 0.12 ± 6% perf-profile.children.cycles-pp._find_next_bit
1.00 ± 2% -0.9 0.11 ± 47% -0.9 0.09 ± 7% -0.9 0.09 ± 7% perf-profile.children.cycles-pp.isolate_folio
0.96 ± 12% -0.9 0.08 ± 34% -0.9 0.06 ± 10% -0.9 0.06 ± 11% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.94 ± 3% -0.9 0.08 ± 38% -0.9 0.07 ± 12% -0.9 0.07 ± 19% perf-profile.children.cycles-pp.page_vma_mapped_walk
1.28 ± 9% -0.8 0.46 ± 15% -0.9 0.43 ± 5% -0.9 0.42 ± 6% perf-profile.children.cycles-pp.__irq_exit_rcu
0.95 ± 4% -0.8 0.15 ± 17% -0.8 0.13 ± 8% -0.8 0.13 ± 8% perf-profile.children.cycles-pp.__schedule
0.85 ± 3% -0.7 0.13 ± 8% -0.7 0.11 ± 9% -0.7 0.12 ± 11% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.75 ± 4% -0.7 0.08 ± 17% -0.7 0.06 ± 14% -0.7 0.06 ± 15% perf-profile.children.cycles-pp.sync_regs
1.16 ± 4% -0.6 0.52 ± 3% -0.7 0.50 ± 5% -0.7 0.51 ± 6% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.70 ± 4% -0.6 0.08 ± 61% -0.6 0.06 ± 10% -0.6 0.06 ± 14% perf-profile.children.cycles-pp.lru_gen_del_folio
0.70 ± 5% -0.6 0.08 ± 52% -0.6 0.06 ± 9% -0.6 0.05 ± 35% perf-profile.children.cycles-pp.lru_gen_add_folio
0.66 ± 8% -0.6 0.05 ± 48% -0.6 0.05 ± 26% -0.6 0.04 ± 50% perf-profile.children.cycles-pp.__folio_start_writeback
1.08 ± 4% -0.6 0.47 ± 3% -0.6 0.46 ± 5% -0.6 0.46 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.69 ± 7% -0.6 0.11 ± 18% -0.6 0.10 ± 8% -0.6 0.10 ± 10% perf-profile.children.cycles-pp.schedule
0.75 ± 6% -0.6 0.18 ± 19% -0.5 0.21 ± 60% -0.6 0.16 ± 15% perf-profile.children.cycles-pp.worker_thread
0.65 ± 11% -0.5 0.10 ± 20% -0.6 0.09 ± 12% -0.6 0.10 ± 8% perf-profile.children.cycles-pp.__drain_all_pages
0.64 ± 4% -0.5 0.11 ± 6% -0.5 0.10 ± 9% -0.5 0.10 ± 9% perf-profile.children.cycles-pp.sysvec_call_function_single
0.64 ± 16% -0.5 0.12 ± 25% -0.5 0.11 ± 10% -0.5 0.11 ± 8% perf-profile.children.cycles-pp.asm_common_interrupt
0.64 ± 16% -0.5 0.12 ± 25% -0.5 0.11 ± 10% -0.5 0.11 ± 8% perf-profile.children.cycles-pp.common_interrupt
0.54 ± 6% -0.5 0.04 ± 75% -0.5 0.04 ± 58% -0.5 0.04 ± 50% perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests
0.54 ± 6% -0.5 0.04 ± 75% -0.5 0.04 ± 58% -0.5 0.04 ± 50% perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests
0.53 ± 6% -0.5 0.04 ± 73% -0.5 0.04 ± 58% -0.5 0.04 ± 50% perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched
0.56 ± 12% -0.5 0.07 ± 23% -0.5 0.07 ± 15% -0.5 0.07 ± 10% perf-profile.children.cycles-pp.drain_pages_zone
0.54 ± 4% -0.5 0.06 ± 19% -0.5 0.05 ± 10% -0.5 0.05 ± 34% perf-profile.children.cycles-pp.__blk_flush_plug
0.52 ± 7% -0.5 0.03 ± 70% -0.5 0.00 -0.5 0.02 ±153% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.54 ± 4% -0.5 0.06 ± 19% -0.5 0.05 ± 10% -0.5 0.05 ± 34% perf-profile.children.cycles-pp.blk_mq_flush_plug_list
0.54 ± 3% -0.5 0.06 ± 19% -0.5 0.05 ± 28% -0.5 0.04 ± 50% perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list
0.51 ± 10% -0.4 0.08 ± 24% -0.4 0.07 ± 13% -0.4 0.07 ± 9% perf-profile.children.cycles-pp.free_pcppages_bulk
0.45 ± 5% -0.4 0.04 ± 75% -0.4 0.01 ±173% -0.4 0.03 ± 82% perf-profile.children.cycles-pp.__rq_qos_throttle
0.49 ± 7% -0.4 0.08 ± 17% -0.4 0.07 ± 11% -0.4 0.07 ± 8% perf-profile.children.cycles-pp.__pick_next_task
0.62 ± 4% -0.4 0.22 ± 8% -0.4 0.22 ± 6% -0.4 0.22 ± 6% perf-profile.children.cycles-pp.irqtime_account_irq
0.44 ± 5% -0.4 0.04 ± 75% -0.4 0.01 ±173% -0.4 0.03 ±100% perf-profile.children.cycles-pp.wbt_wait
0.42 ± 6% -0.4 0.04 ± 75% -0.4 0.01 ±264% -0.4 0.02 ±123% perf-profile.children.cycles-pp.rq_qos_wait
0.42 ± 5% -0.4 0.04 ± 45% -0.4 0.03 ± 77% -0.4 0.03 ± 82% perf-profile.children.cycles-pp.bio_alloc_bioset
0.66 ± 3% -0.3 0.31 ± 3% -0.4 0.31 ± 5% -0.3 0.32 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.65 ± 3% -0.3 0.31 ± 3% -0.3 0.31 ± 5% -0.3 0.32 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.48 ± 3% -0.3 0.15 ± 8% -0.3 0.15 ± 5% -0.3 0.15 ± 7% perf-profile.children.cycles-pp.sched_clock_cpu
0.40 ± 10% -0.3 0.06 ± 17% -0.3 0.05 ± 39% -0.3 0.05 ± 9% perf-profile.children.cycles-pp.pick_next_task_fair
0.46 ± 6% -0.3 0.14 ± 23% -0.3 0.17 ± 73% -0.3 0.12 ± 19% perf-profile.children.cycles-pp.process_one_work
0.44 ± 6% -0.3 0.12 ± 15% -0.3 0.12 ± 4% -0.3 0.12 ± 6% perf-profile.children.cycles-pp.tick_nohz_stop_tick
0.38 ± 10% -0.3 0.06 ± 19% -0.3 0.06 ± 11% -0.3 0.06 ± 10% perf-profile.children.cycles-pp.sched_balance_newidle
0.56 ± 3% -0.3 0.24 ± 4% -0.3 0.24 ± 5% -0.3 0.25 ± 5% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.42 ± 10% -0.3 0.11 ± 13% -0.3 0.10 ± 7% -0.3 0.10 ± 10% perf-profile.children.cycles-pp.sched_balance_rq
0.42 ± 4% -0.3 0.12 ± 7% -0.3 0.12 ± 5% -0.3 0.12 ± 8% perf-profile.children.cycles-pp.sched_clock
0.45 ± 6% -0.3 0.16 ± 13% -0.3 0.15 ± 5% -0.3 0.15 ± 6% perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick
0.37 ± 9% -0.3 0.09 ± 12% -0.3 0.09 ± 8% -0.3 0.08 ± 9% perf-profile.children.cycles-pp.sched_balance_find_src_group
0.50 ± 3% -0.3 0.23 ± 3% -0.3 0.22 ± 4% -0.3 0.23 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler
0.36 ± 8% -0.3 0.09 ± 13% -0.3 0.08 ± 9% -0.3 0.08 ± 10% perf-profile.children.cycles-pp.update_sd_lb_stats
0.31 ± 8% -0.3 0.04 ± 71% -0.3 0.06 ± 11% -0.3 0.00 perf-profile.children.cycles-pp.tlb_is_not_lazy
0.33 ± 11% -0.3 0.08 ± 15% -0.3 0.08 ± 10% -0.3 0.08 ± 13% perf-profile.children.cycles-pp.update_sg_lb_stats
0.44 ± 4% -0.2 0.20 ± 4% -0.2 0.19 ± 5% -0.2 0.20 ± 5% perf-profile.children.cycles-pp.update_process_times
0.30 ± 7% -0.2 0.07 ± 10% -0.2 0.07 ± 12% -0.2 0.07 ± 10% perf-profile.children.cycles-pp.error_entry
0.29 ± 4% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.irq_work_run_list
0.39 ± 6% -0.2 0.19 ± 3% -0.2 0.19 ± 7% -0.2 0.18 ± 7% perf-profile.children.cycles-pp.native_sched_clock
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.__sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp._printk
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.asm_sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.irq_work_run
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.irq_work_single
0.28 ± 5% -0.2 0.09 ± 5% -0.2 0.08 ± 7% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.sysvec_irq_work
0.28 ± 5% -0.2 0.09 ± 10% -0.2 0.11 ± 88% -0.2 0.09 ± 15% perf-profile.children.cycles-pp.console_flush_all
0.28 ± 5% -0.2 0.09 ± 10% -0.2 0.11 ± 88% -0.2 0.09 ± 15% perf-profile.children.cycles-pp.console_unlock
0.28 ± 5% -0.2 0.09 ± 10% -0.2 0.11 ± 88% -0.2 0.09 ± 15% perf-profile.children.cycles-pp.vprintk_emit
0.28 ± 4% -0.2 0.09 ± 7% -0.2 0.11 ± 86% -0.2 0.08 ± 15% perf-profile.children.cycles-pp.serial8250_console_write
0.28 ± 5% -0.2 0.09 ± 9% -0.2 0.10 ± 77% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.wait_for_lsr
0.23 ± 15% -0.2 0.06 ± 65% -0.2 0.04 ± 83% -0.2 0.02 ±127% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.24 ± 7% -0.2 0.08 ± 13% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
0.24 ± 7% -0.2 0.08 ± 13% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.24 ± 7% -0.2 0.08 ± 13% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_fbdev_shmem_helper_fb_dirty
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_atomic_commit
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_commit
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.commit_tail
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail
0.24 ± 7% -0.2 0.08 ± 11% -0.1 0.11 ±115% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.drm_fb_memcpy
0.23 ± 8% -0.2 0.08 ± 11% -0.1 0.11 ±113% -0.2 0.08 ± 14% perf-profile.children.cycles-pp.memcpy_toio
0.19 ± 11% -0.1 0.06 ± 13% -0.1 0.07 ±105% -0.1 0.04 ± 51% perf-profile.children.cycles-pp.io_serial_in
0.19 ± 7% -0.1 0.06 ± 98% -0.1 0.06 ±133% -0.1 0.08 ±114% perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
0.20 ± 10% -0.1 0.12 ± 6% -0.1 0.11 ± 9% -0.1 0.11 ± 5% perf-profile.children.cycles-pp.sched_tick
0.11 ± 13% -0.0 0.06 ± 11% -0.0 0.06 ± 13% -0.1 0.06 ± 9% perf-profile.children.cycles-pp.sched_balance_domains
0.10 ± 4% -0.0 0.06 ± 13% -0.1 0.05 ± 27% -0.1 0.05 ± 7% perf-profile.children.cycles-pp.sched_core_idle_cpu
0.14 ± 5% -0.0 0.09 ± 7% -0.1 0.09 ± 9% -0.1 0.08 ± 8% perf-profile.children.cycles-pp.irqentry_enter
0.09 ± 14% -0.0 0.06 ± 9% -0.0 0.05 ± 27% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.clockevents_program_event
0.09 ± 11% -0.0 0.06 ± 9% -0.0 0.06 ± 13% -0.0 0.06 ± 10% perf-profile.children.cycles-pp.task_tick_fair
0.12 ± 14% -0.0 0.10 ± 14% -0.0 0.10 ± 24% -0.0 0.09 ± 17% perf-profile.children.cycles-pp._nohz_idle_balance
0.00 +0.0 0.00 +0.0 0.00 +0.1 0.12 ± 5% perf-profile.children.cycles-pp.should_flush_tlb
0.03 ± 70% +0.0 0.08 ± 11% +0.0 0.08 ± 6% +0.0 0.08 ± 10% perf-profile.children.cycles-pp.read_tsc
0.00 +0.0 0.05 ± 45% +0.1 0.06 ± 12% +0.1 0.05 ± 8% perf-profile.children.cycles-pp.menu_reflect
0.00 +0.1 0.06 ± 6% +0.1 0.06 ± 10% +0.1 0.06 ± 11% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 8% +0.1 0.06 ± 8% perf-profile.children.cycles-pp.ct_kernel_exit
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 7% +0.1 0.06 ± 12% perf-profile.children.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.06 ± 7% +0.1 0.06 ± 9% +0.1 0.06 ± 10% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.00 +0.1 0.07 ± 7% +0.1 0.07 ± 9% +0.1 0.07 ± 14% perf-profile.children.cycles-pp.tmigr_cpu_new_timer
0.00 +0.1 0.07 ± 10% +0.1 0.07 ± 9% +0.1 0.08 ± 6% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.08 ± 11% +0.1 0.08 ± 8% +0.1 0.08 ± 10% perf-profile.children.cycles-pp.get_cpu_device
0.15 ± 35% +0.1 0.24 ± 62% +0.2 0.36 ± 35% +0.2 0.33 ± 25% perf-profile.children.cycles-pp.alloc_bprm
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.28 ± 15% perf-profile.children.cycles-pp.rest_init
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.28 ± 15% perf-profile.children.cycles-pp.start_kernel
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.28 ± 15% perf-profile.children.cycles-pp.x86_64_start_kernel
0.21 ± 9% +0.1 0.29 ± 10% +0.1 0.30 ± 21% +0.1 0.28 ± 15% perf-profile.children.cycles-pp.x86_64_start_reservations
0.00 +0.1 0.09 ± 13% +0.1 0.08 ± 10% +0.1 0.09 ± 10% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.1 0.09 ± 18% +0.1 0.09 ± 12% +0.1 0.09 ± 13% perf-profile.children.cycles-pp.intel_idle_irq
0.01 ±223% +0.1 0.10 ± 32% +0.1 0.10 ± 72% +0.1 0.10 ± 64% perf-profile.children.cycles-pp.load_elf_interp
0.00 +0.1 0.10 ± 12% +0.1 0.09 ± 9% +0.1 0.09 ± 9% perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.1 0.10 ± 15% +0.1 0.10 ± 9% +0.1 0.09 ± 8% perf-profile.children.cycles-pp.tsc_verify_tsc_adjust
0.12 ± 9% +0.1 0.22 ± 8% +0.1 0.21 ± 5% +0.1 0.21 ± 7% perf-profile.children.cycles-pp.ktime_get
0.00 +0.1 0.10 ± 7% +0.1 0.10 ± 10% +0.1 0.10 ± 6% perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.1 0.11 ± 15% +0.1 0.10 ± 10% +0.1 0.10 ± 10% perf-profile.children.cycles-pp.tick_nohz_stop_idle
0.00 +0.1 0.11 ± 14% +0.1 0.10 ± 8% +0.1 0.10 ± 8% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.01 ±223% +0.1 0.14 ± 47% +0.1 0.09 ± 63% +0.0 0.05 ± 90% perf-profile.children.cycles-pp._IO_setvbuf
0.00 +0.1 0.13 ± 9% +0.1 0.12 ± 9% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.ct_idle_exit
0.01 ±223% +0.1 0.14 ± 83% +0.1 0.11 ± 64% +0.1 0.14 ± 63% perf-profile.children.cycles-pp._copy_to_iter
0.01 ±223% +0.1 0.15 ± 8% +0.1 0.13 ± 6% +0.1 0.13 ± 8% perf-profile.children.cycles-pp.local_clock_noinstr
0.02 ±142% +0.1 0.16 ± 8% +0.1 0.15 ± 6% +0.1 0.15 ± 7% perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.01 ±223% +0.2 0.16 ± 40% +0.1 0.14 ± 62% +0.1 0.14 ± 49% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.01 ±223% +0.2 0.16 ± 40% +0.1 0.14 ± 62% +0.1 0.14 ± 49% perf-profile.children.cycles-pp.rseq_ip_fixup
0.08 ± 41% +0.2 0.23 ± 24% +0.2 0.27 ± 49% +0.2 0.32 ± 71% perf-profile.children.cycles-pp.write
0.01 ±223% +0.2 0.16 ± 39% +0.2 0.16 ± 59% +0.1 0.14 ± 40% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.16 ± 34% +0.2 0.32 ± 53% +0.3 0.47 ± 32% +0.3 0.45 ± 25% perf-profile.children.cycles-pp.mm_init
0.16 ± 36% +0.2 0.32 ± 53% +0.3 0.47 ± 32% +0.3 0.45 ± 26% perf-profile.children.cycles-pp.pgd_alloc
0.07 ± 63% +0.2 0.23 ± 24% +0.2 0.26 ± 50% +0.3 0.32 ± 71% perf-profile.children.cycles-pp.ksys_write
0.06 ± 60% +0.2 0.23 ± 24% +0.2 0.26 ± 51% +0.3 0.32 ± 71% perf-profile.children.cycles-pp.vfs_write
0.00 +0.2 0.17 ± 33% +0.3 0.26 ± 33% +0.2 0.23 ± 33% perf-profile.children.cycles-pp.copy_p4d_range
0.00 +0.2 0.17 ± 33% +0.3 0.26 ± 33% +0.2 0.23 ± 33% perf-profile.children.cycles-pp.copy_page_range
0.00 +0.2 0.18 ± 32% +0.3 0.27 ± 32% +0.2 0.24 ± 33% perf-profile.children.cycles-pp.dup_mmap
0.12 ± 15% +0.2 0.30 ± 6% +0.2 0.29 ± 5% +0.2 0.29 ± 8% perf-profile.children.cycles-pp.__get_next_timer_interrupt
0.00 +0.2 0.18 ± 38% +0.2 0.20 ± 44% +0.2 0.20 ± 48% perf-profile.children.cycles-pp.__do_fault
0.00 +0.2 0.18 ± 14% +0.2 0.23 ± 55% +0.2 0.16 ± 26% perf-profile.children.cycles-pp.__pmd_alloc
0.01 ±223% +0.2 0.20 ± 36% +0.3 0.33 ± 34% +0.2 0.23 ± 38% perf-profile.children.cycles-pp.__libc_fork
0.02 ±141% +0.2 0.21 ±130% +0.2 0.24 ±156% +0.5 0.50 ±124% perf-profile.children.cycles-pp.__cmd_record
0.04 ± 72% +0.2 0.27 ± 30% +0.3 0.38 ± 20% +0.3 0.36 ± 33% perf-profile.children.cycles-pp.dup_mm
0.07 ± 16% +0.2 0.30 ± 13% +0.2 0.30 ± 33% +0.2 0.27 ± 43% perf-profile.children.cycles-pp.elf_load
0.05 ± 82% +0.3 0.31 ± 44% +0.4 0.40 ± 27% +0.3 0.37 ± 31% perf-profile.children.cycles-pp.schedule_tail
0.15 ± 16% +0.3 0.41 ± 17% +0.3 0.41 ± 36% +0.3 0.42 ± 28% perf-profile.children.cycles-pp.__vfork
0.14 ± 17% +0.3 0.41 ± 17% +0.3 0.41 ± 36% +0.3 0.42 ± 28% perf-profile.children.cycles-pp.__x64_sys_vfork
0.03 ±101% +0.3 0.30 ± 13% +0.3 0.30 ± 33% +0.2 0.27 ± 44% perf-profile.children.cycles-pp.rep_stos_alternative
0.04 ± 71% +0.3 0.32 ± 19% +0.3 0.31 ± 15% +0.3 0.30 ± 15% perf-profile.children.cycles-pp.poll_idle
0.01 ±223% +0.3 0.29 ± 30% +0.3 0.29 ± 22% +0.3 0.32 ± 17% perf-profile.children.cycles-pp.___kmalloc_large_node
0.01 ±223% +0.3 0.29 ± 30% +0.3 0.29 ± 22% +0.3 0.32 ± 17% perf-profile.children.cycles-pp.__kmalloc_large_node_noprof
0.01 ±223% +0.3 0.29 ± 30% +0.3 0.30 ± 22% +0.3 0.32 ± 17% perf-profile.children.cycles-pp.__kmalloc_node_noprof
0.04 ±112% +0.3 0.32 ± 41% +0.4 0.40 ± 27% +0.3 0.37 ± 28% perf-profile.children.cycles-pp.__put_user_4
0.12 ± 26% +0.3 0.42 ± 17% +0.3 0.46 ± 27% +0.4 0.51 ± 25% perf-profile.children.cycles-pp.alloc_pages_bulk_noprof
0.09 ± 28% +0.3 0.40 ± 37% +0.5 0.60 ± 46% +0.6 0.68 ± 44% perf-profile.children.cycles-pp.__p4d_alloc
0.09 ± 28% +0.3 0.40 ± 37% +0.5 0.60 ± 46% +0.6 0.68 ± 44% perf-profile.children.cycles-pp.get_zeroed_page_noprof
0.10 ± 21% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.52 ± 28% perf-profile.children.cycles-pp.__x64_sys_openat
0.10 ± 19% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.52 ± 28% perf-profile.children.cycles-pp.do_sys_openat2
0.01 ±223% +0.3 0.34 ± 26% +0.4 0.36 ± 24% +0.4 0.38 ± 23% perf-profile.children.cycles-pp.__kvmalloc_node_noprof
0.01 ±223% +0.3 0.34 ± 26% +0.4 0.36 ± 24% +0.4 0.38 ± 23% perf-profile.children.cycles-pp.single_open_size
0.09 ± 22% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.52 ± 28% perf-profile.children.cycles-pp.do_filp_open
0.09 ± 22% +0.3 0.43 ± 39% +0.4 0.45 ± 32% +0.4 0.52 ± 28% perf-profile.children.cycles-pp.path_openat
0.12 ± 6% +0.3 0.47 ± 8% +0.3 0.44 ± 6% +0.3 0.44 ± 7% perf-profile.children.cycles-pp.irq_enter_rcu
0.04 ± 45% +0.3 0.39 ± 34% +0.4 0.41 ± 27% +0.4 0.41 ± 29% perf-profile.children.cycles-pp.perf_evlist__poll
0.04 ± 45% +0.3 0.39 ± 34% +0.4 0.41 ± 27% +0.4 0.41 ± 28% perf-profile.children.cycles-pp.perf_evlist__poll_thread
0.04 ± 44% +0.4 0.39 ± 34% +0.4 0.42 ± 27% +0.4 0.41 ± 28% perf-profile.children.cycles-pp.perf_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.42 ± 27% +0.4 0.42 ± 28% perf-profile.children.cycles-pp.do_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.42 ± 27% +0.4 0.42 ± 28% perf-profile.children.cycles-pp.__x64_sys_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.42 ± 27% +0.4 0.42 ± 28% perf-profile.children.cycles-pp.do_sys_poll
0.04 ± 45% +0.4 0.40 ± 33% +0.4 0.43 ± 27% +0.4 0.42 ± 28% perf-profile.children.cycles-pp.__poll
0.02 ±141% +0.4 0.38 ± 32% +0.4 0.39 ± 25% +0.4 0.46 ± 23% perf-profile.children.cycles-pp.vfs_open
0.02 ±141% +0.4 0.38 ± 32% +0.4 0.39 ± 25% +0.4 0.46 ± 23% perf-profile.children.cycles-pp.do_open
0.07 ± 14% +0.4 0.44 ± 9% +0.3 0.42 ± 6% +0.3 0.41 ± 7% perf-profile.children.cycles-pp.tick_irq_enter
0.01 ±223% +0.4 0.38 ± 32% +0.4 0.39 ± 25% +0.4 0.46 ± 23% perf-profile.children.cycles-pp.do_dentry_open
0.02 ±141% +0.4 0.39 ± 34% +0.4 0.41 ± 28% +0.4 0.41 ± 29% perf-profile.children.cycles-pp.__pollwait
0.10 ± 11% +0.4 0.47 ± 7% +0.4 0.46 ± 5% +0.4 0.46 ± 6% perf-profile.children.cycles-pp.tick_nohz_next_event
0.04 ± 72% +0.4 0.43 ± 10% +0.5 0.55 ± 32% +0.6 0.64 ± 45% perf-profile.children.cycles-pp.alloc_new_pud
0.22 ± 21% +0.4 0.62 ± 7% +0.3 0.56 ± 26% +0.4 0.59 ± 28% perf-profile.children.cycles-pp.do_pte_missing
0.06 ± 51% +0.4 0.47 ± 15% +0.5 0.60 ± 29% +0.6 0.67 ± 42% perf-profile.children.cycles-pp.setup_arg_pages
0.01 ±223% +0.4 0.42 ± 40% +0.4 0.44 ± 32% +0.5 0.52 ± 29% perf-profile.children.cycles-pp.open64
0.06 ± 50% +0.4 0.47 ± 15% +0.5 0.60 ± 30% +0.6 0.67 ± 42% perf-profile.children.cycles-pp.relocate_vma_down
0.05 ± 73% +0.4 0.47 ± 15% +0.6 0.60 ± 30% +0.6 0.66 ± 42% perf-profile.children.cycles-pp.move_page_tables
0.14 ± 10% +0.5 0.61 ± 7% +0.4 0.58 ± 5% +0.4 0.58 ± 6% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.10 ± 13% +0.5 0.56 ± 23% +0.6 0.74 ± 23% +0.6 0.73 ± 17% perf-profile.children.cycles-pp.__do_sys_clone
0.21 ± 26% +0.5 0.74 ± 29% +0.7 0.92 ± 18% +0.7 0.87 ± 17% perf-profile.children.cycles-pp.get_free_pages_noprof
0.16 ± 17% +0.5 0.69 ± 13% +0.6 0.76 ± 24% +0.6 0.78 ± 13% perf-profile.children.cycles-pp.alloc_thread_stack_node
0.16 ± 18% +0.5 0.70 ± 15% +0.6 0.77 ± 24% +0.6 0.79 ± 13% perf-profile.children.cycles-pp.dup_task_struct
0.08 ± 41% +0.5 0.62 ± 19% +0.5 0.58 ± 26% +0.5 0.56 ± 47% perf-profile.children.cycles-pp.copy_strings
0.15 ± 20% +0.6 0.73 ± 16% +0.7 0.81 ± 24% +0.7 0.84 ± 14% perf-profile.children.cycles-pp.__vmalloc_area_node
0.16 ± 17% +0.6 0.74 ± 15% +0.7 0.82 ± 24% +0.7 0.85 ± 13% perf-profile.children.cycles-pp.__vmalloc_node_range_noprof
0.19 ± 19% +0.6 0.81 ± 12% +0.8 0.96 ± 19% +0.8 0.97 ± 32% perf-profile.children.cycles-pp.bprm_execve
0.18 ± 20% +0.6 0.81 ± 12% +0.8 0.95 ± 20% +0.8 0.97 ± 33% perf-profile.children.cycles-pp.exec_binprm
0.18 ± 20% +0.6 0.81 ± 12% +0.8 0.95 ± 20% +0.8 0.97 ± 33% perf-profile.children.cycles-pp.search_binary_handler
0.18 ± 21% +0.6 0.81 ± 12% +0.8 0.95 ± 20% +0.8 0.97 ± 33% perf-profile.children.cycles-pp.load_elf_binary
0.10 ± 54% +0.7 0.81 ± 26% +0.6 0.75 ± 22% +0.7 0.83 ± 13% perf-profile.children.cycles-pp.copy_string_kernel
0.44 ±141% +0.7 1.15 ±100% +1.4 1.87 ± 71% +0.8 1.21 ± 83% perf-profile.children.cycles-pp.do_swap_page
0.24 ± 10% +0.7 0.98 ± 15% +0.9 1.15 ± 18% +0.9 1.15 ± 17% perf-profile.children.cycles-pp.kernel_clone
0.23 ± 12% +0.7 0.98 ± 15% +0.9 1.15 ± 18% +0.9 1.15 ± 17% perf-profile.children.cycles-pp.copy_process
0.16 ± 22% +0.8 0.95 ± 19% +1.1 1.26 ± 17% +1.0 1.20 ± 16% perf-profile.children.cycles-pp._Fork
0.09 ± 28% +0.9 0.96 ± 16% +0.8 0.89 ± 19% +0.9 0.98 ± 20% perf-profile.children.cycles-pp.__pud_alloc
0.32 ± 8% +0.9 1.27 ± 7% +0.9 1.23 ± 4% +0.9 1.22 ± 6% perf-profile.children.cycles-pp.menu_select
0.18 ± 38% +1.3 1.43 ± 20% +1.2 1.33 ± 19% +1.2 1.38 ± 14% perf-profile.children.cycles-pp.get_arg_page
0.18 ± 37% +1.3 1.43 ± 20% +1.2 1.33 ± 19% +1.2 1.38 ± 14% perf-profile.children.cycles-pp.__get_user_pages
0.18 ± 37% +1.3 1.43 ± 20% +1.2 1.33 ± 19% +1.2 1.38 ± 14% perf-profile.children.cycles-pp.get_user_pages_remote
0.39 ± 43% +1.4 1.82 ± 8% +1.5 1.93 ± 13% +1.4 1.81 ± 15% perf-profile.children.cycles-pp.wp_page_copy
0.53 ± 14% +1.9 2.48 ± 11% +2.1 2.65 ± 13% +2.2 2.69 ± 14% perf-profile.children.cycles-pp.execve
0.53 ± 15% +1.9 2.48 ± 11% +2.1 2.65 ± 13% +2.2 2.69 ± 14% perf-profile.children.cycles-pp.do_execveat_common
0.53 ± 15% +1.9 2.48 ± 11% +2.1 2.65 ± 13% +2.2 2.69 ± 14% perf-profile.children.cycles-pp.__x64_sys_execve
5.78 +3.1 8.85 ± 6% +3.3 9.10 ± 3% +3.4 9.17 ± 4% perf-profile.children.cycles-pp.kthread
2.28 ± 5% +3.2 5.48 ± 11% +2.9 5.20 ± 6% +2.8 5.12 ± 5% perf-profile.children.cycles-pp.intel_idle
5.84 +3.3 9.16 ± 7% +3.7 9.50 ± 3% +3.7 9.54 ± 4% perf-profile.children.cycles-pp.ret_from_fork
5.84 +3.4 9.20 ± 6% +3.7 9.54 ± 3% +3.7 9.58 ± 4% perf-profile.children.cycles-pp.ret_from_fork_asm
1.46 ± 7% +3.5 4.97 ± 8% +3.9 5.32 ± 11% +4.0 5.50 ± 11% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
1.46 ± 7% +3.5 4.97 ± 8% +3.9 5.32 ± 11% +4.0 5.50 ± 11% perf-profile.children.cycles-pp.do_syscall_64
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.children.cycles-pp.balance_pgdat
4.87 +3.7 8.62 ± 6% +4.0 8.84 ± 3% +4.1 8.97 ± 4% perf-profile.children.cycles-pp.kswapd
68.16 +4.2 72.34 +3.7 71.89 ± 2% +4.0 72.20 ± 2% perf-profile.children.cycles-pp.vma_alloc_folio_noprof
6.72 ± 2% +4.5 11.21 ± 9% +3.9 10.60 ± 4% +3.8 10.50 ± 5% perf-profile.children.cycles-pp.start_secondary
6.94 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.78 ± 5% perf-profile.children.cycles-pp.common_startup_64
6.94 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.8 10.78 ± 5% perf-profile.children.cycles-pp.cpu_startup_entry
6.93 ± 2% +4.6 11.50 ± 9% +4.0 10.90 ± 4% +3.9 10.78 ± 5% perf-profile.children.cycles-pp.do_idle
68.51 +5.0 73.50 +5.2 73.70 +5.0 73.46 perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
3.78 +5.5 9.32 ± 9% +5.0 8.82 ± 5% +4.9 8.70 ± 5% perf-profile.children.cycles-pp.cpuidle_enter_state
3.80 +5.6 9.39 ± 9% +5.1 8.87 ± 5% +5.0 8.76 ± 5% perf-profile.children.cycles-pp.cpuidle_enter
4.70 +6.4 11.09 ± 8% +5.8 10.52 ± 4% +5.7 10.40 ± 5% perf-profile.children.cycles-pp.cpuidle_idle_call
69.40 +7.4 76.85 +8.2 77.62 +8.1 77.54 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
69.44 +8.1 77.55 +8.9 78.37 +8.9 78.36 perf-profile.children.cycles-pp.__alloc_pages_noprof
68.72 +8.7 77.46 +9.6 78.30 +9.6 78.30 perf-profile.children.cycles-pp.__alloc_pages_slowpath
65.97 +11.3 77.23 +12.1 78.09 +12.1 78.09 perf-profile.children.cycles-pp.try_to_free_pages
65.66 +11.5 77.18 +12.4 78.05 +12.4 78.05 perf-profile.children.cycles-pp.do_try_to_free_pages
70.51 +15.3 85.80 +16.4 86.90 +16.5 87.02 perf-profile.children.cycles-pp.shrink_node
68.95 +16.8 85.72 +17.9 86.84 +18.0 86.97 perf-profile.children.cycles-pp.shrink_many
68.92 +16.8 85.71 +17.9 86.83 +18.0 86.96 perf-profile.children.cycles-pp.shrink_one
68.42 +17.3 85.68 +18.4 86.80 +18.5 86.94 perf-profile.children.cycles-pp.try_to_shrink_lruvec
68.37 +17.3 85.67 +18.4 86.80 +18.6 86.93 perf-profile.children.cycles-pp.evict_folios
64.64 +20.7 85.30 ± 2% +22.0 86.63 +22.1 86.76 perf-profile.children.cycles-pp.shrink_folio_list
43.46 +39.8 83.30 ± 2% +41.4 84.85 +41.5 84.97 perf-profile.children.cycles-pp.try_to_unmap_flush_dirty
43.44 +39.9 83.30 ± 2% +41.4 84.85 +41.5 84.97 perf-profile.children.cycles-pp.arch_tlbbatch_flush
43.35 +40.0 83.33 ± 2% +41.5 84.87 +41.6 84.99 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
43.34 +40.0 83.33 ± 2% +41.5 84.87 +41.6 84.99 perf-profile.children.cycles-pp.smp_call_function_many_cond
5.95 ± 4% -4.9 1.04 ± 7% -4.9 1.08 ± 2% -5.0 1.00 ± 2% perf-profile.self.cycles-pp.llist_add_batch
4.70 ± 4% -4.3 0.41 ±142% -4.6 0.15 ± 11% -4.6 0.15 ± 11% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
4.45 ± 2% -3.8 0.64 ± 4% -3.8 0.65 ± 4% -3.8 0.66 ± 4% perf-profile.self.cycles-pp.llist_reverse_order
4.31 ± 5% -3.8 0.53 ± 14% -3.9 0.40 ± 16% -3.9 0.42 ± 19% perf-profile.self.cycles-pp.do_rw_once
3.65 ± 2% -3.0 0.66 ± 4% -3.0 0.65 ± 3% -3.0 0.65 ± 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
3.14 ± 9% -2.8 0.36 ± 4% -2.8 0.36 ± 4% -2.8 0.35 ± 2% perf-profile.self.cycles-pp.flush_tlb_func
2.56 ± 5% -2.3 0.30 ± 16% -2.3 0.23 ± 18% -2.3 0.24 ± 15% perf-profile.self.cycles-pp.do_access
2.35 ± 4% -2.1 0.27 ± 6% -2.1 0.26 ± 5% -2.1 0.26 ± 3% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
2.39 -2.1 0.34 ± 5% -2.1 0.31 ± 6% -2.1 0.31 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
1.83 ± 3% -1.7 0.09 ± 10% -1.7 0.09 ± 7% -1.7 0.09 ± 10% perf-profile.self.cycles-pp.native_flush_tlb_local
1.92 ± 3% -1.7 0.24 ± 4% -1.7 0.25 ± 7% -1.7 0.26 ± 3% perf-profile.self.cycles-pp.page_counter_try_charge
1.69 ± 4% -1.4 0.31 ± 11% -1.4 0.26 ± 9% -1.4 0.26 ± 12% perf-profile.self.cycles-pp._raw_spin_lock
1.12 ± 14% -1.0 0.10 ± 6% -1.0 0.10 ± 7% -1.0 0.10 ± 5% perf-profile.self.cycles-pp.set_tlb_ubc_flush_pending
0.99 ± 6% -0.9 0.10 ± 10% -0.9 0.09 ± 8% -0.9 0.09 ± 9% perf-profile.self.cycles-pp.try_to_unmap_one
0.98 ± 7% -0.9 0.12 ± 11% -0.9 0.10 ± 16% -0.9 0.10 ± 13% perf-profile.self.cycles-pp.try_charge_memcg
0.94 ± 5% -0.8 0.10 ± 4% -0.8 0.11 ± 8% -0.8 0.11 ± 5% perf-profile.self.cycles-pp.mem_cgroup_id_get_online
0.75 ± 13% -0.7 0.07 ± 34% -0.7 0.04 ± 57% -0.7 0.04 ± 66% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.74 ± 4% -0.7 0.08 ± 17% -0.7 0.06 ± 14% -0.7 0.06 ± 15% perf-profile.self.cycles-pp.sync_regs
0.75 ± 5% -0.7 0.09 ± 27% -0.7 0.07 ± 11% -0.7 0.07 ± 7% perf-profile.self.cycles-pp.shrink_folio_list
0.76 ± 9% -0.7 0.10 ± 9% -0.7 0.10 ± 7% -0.7 0.10 ± 7% perf-profile.self.cycles-pp._find_next_bit
0.63 ± 3% -0.6 0.07 ± 16% -0.6 0.06 ± 10% -0.6 0.06 ± 9% perf-profile.self.cycles-pp.swap_writepage
0.57 ± 9% -0.5 0.04 ± 72% -0.5 0.02 ±100% -0.5 0.03 ±100% perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
0.47 ± 7% -0.4 0.02 ± 99% -0.5 0.02 ±129% -0.5 0.01 ±300% perf-profile.self.cycles-pp.rmqueue_bulk
0.56 ± 4% -0.4 0.13 ± 5% -0.4 0.13 ± 6% -0.4 0.13 ± 8% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.48 ± 6% -0.4 0.05 ± 7% -0.4 0.06 ± 26% -0.4 0.06 ± 11% perf-profile.self.cycles-pp.swap_cgroup_record
0.45 ± 6% -0.4 0.04 ±115% -0.4 0.00 -0.4 0.00 perf-profile.self.cycles-pp.lru_gen_add_folio
0.46 ± 2% -0.4 0.05 ± 90% -0.4 0.01 ±173% -0.4 0.02 ±152% perf-profile.self.cycles-pp.lru_gen_del_folio
0.40 ± 6% -0.4 0.04 ± 71% -0.4 0.01 ±173% -0.4 0.02 ±152% perf-profile.self.cycles-pp.get_page_from_freelist
0.38 ± 3% -0.3 0.04 ± 45% -0.4 0.00 ±387% -0.4 0.00 perf-profile.self.cycles-pp.do_anonymous_page
0.28 ± 8% -0.2 0.07 ± 12% -0.2 0.07 ± 11% -0.2 0.06 ± 10% perf-profile.self.cycles-pp.error_entry
0.38 ± 6% -0.2 0.19 ± 5% -0.2 0.18 ± 6% -0.2 0.18 ± 7% perf-profile.self.cycles-pp.native_sched_clock
0.25 ± 10% -0.2 0.06 ± 13% -0.2 0.05 ± 28% -0.2 0.05 ± 35% perf-profile.self.cycles-pp.update_sg_lb_stats
0.23 ± 7% -0.1 0.08 ± 10% -0.1 0.10 ±113% -0.2 0.08 ± 13% perf-profile.self.cycles-pp.memcpy_toio
0.19 ± 11% -0.1 0.06 ± 13% -0.1 0.07 ±105% -0.1 0.04 ± 51% perf-profile.self.cycles-pp.io_serial_in
0.16 ± 8% -0.1 0.06 ± 17% -0.1 0.06 ± 6% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.asm_sysvec_call_function
0.11 ± 11% -0.1 0.04 ± 44% -0.1 0.01 ±173% -0.1 0.01 ±300% perf-profile.self.cycles-pp.irqentry_enter
0.17 ± 8% -0.1 0.10 ± 8% -0.1 0.10 ± 8% -0.1 0.10 ± 9% perf-profile.self.cycles-pp.irqtime_account_irq
0.09 ± 7% -0.0 0.05 ± 8% -0.0 0.05 ± 38% -0.0 0.05 ± 34% perf-profile.self.cycles-pp.sched_core_idle_cpu
0.00 +0.0 0.00 +0.0 0.00 +0.1 0.10 ± 6% perf-profile.self.cycles-pp.should_flush_tlb
0.03 ± 70% +0.0 0.08 ± 6% +0.0 0.08 ± 7% +0.0 0.07 ± 10% perf-profile.self.cycles-pp.read_tsc
0.00 +0.1 0.05 ± 49% +0.1 0.06 ± 15% +0.1 0.05 ± 34% perf-profile.self.cycles-pp.intel_idle_irq
0.00 +0.1 0.05 ± 8% +0.0 0.04 ± 48% +0.0 0.04 ± 51% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.00 +0.1 0.06 ± 11% +0.1 0.06 ± 9% +0.1 0.06 ± 5% perf-profile.self.cycles-pp.tick_nohz_stop_tick
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 8% +0.1 0.06 ± 9% perf-profile.self.cycles-pp.cpuidle_enter
0.00 +0.1 0.06 ± 14% +0.1 0.06 ± 8% +0.1 0.06 ± 12% perf-profile.self.cycles-pp.nr_iowait_cpu
0.07 ± 12% +0.1 0.14 ± 11% +0.1 0.13 ± 8% +0.1 0.13 ± 6% perf-profile.self.cycles-pp.ktime_get
0.00 +0.1 0.06 ± 11% +0.1 0.06 ± 9% +0.1 0.06 ± 7% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.07 ± 11% +0.1 0.06 ± 10% +0.1 0.06 ± 11% perf-profile.self.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.1 0.07 ± 13% +0.1 0.06 ± 13% +0.1 0.06 ± 12% perf-profile.self.cycles-pp.ct_kernel_enter
0.00 +0.1 0.07 ± 10% +0.1 0.07 ± 9% +0.1 0.07 ± 10% perf-profile.self.cycles-pp.tick_nohz_next_event
0.00 +0.1 0.08 ± 10% +0.1 0.08 ± 9% +0.1 0.08 ± 10% perf-profile.self.cycles-pp.get_cpu_device
0.00 +0.1 0.09 ± 4% +0.1 0.09 ± 9% +0.1 0.08 ± 10% perf-profile.self.cycles-pp.tick_irq_enter
0.01 ±223% +0.1 0.10 ± 9% +0.1 0.10 ± 9% +0.1 0.10 ± 9% perf-profile.self.cycles-pp.__get_next_timer_interrupt
0.00 +0.1 0.10 ± 11% +0.1 0.09 ± 9% +0.1 0.09 ± 6% perf-profile.self.cycles-pp.cpuidle_idle_call
0.00 +0.1 0.10 ± 9% +0.1 0.10 ± 8% +0.1 0.10 ± 6% perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.02 ± 99% +0.3 0.30 ± 19% +0.3 0.30 ± 14% +0.3 0.29 ± 14% perf-profile.self.cycles-pp.poll_idle
0.13 ± 8% +0.4 0.50 ± 8% +0.4 0.49 ± 4% +0.3 0.48 ± 6% perf-profile.self.cycles-pp.menu_select
0.11 ± 13% +0.6 0.69 ± 9% +0.5 0.64 ± 4% +0.5 0.63 ± 5% perf-profile.self.cycles-pp.cpuidle_enter_state
2.28 ± 5% +3.2 5.48 ± 11% +2.9 5.20 ± 6% +2.8 5.12 ± 5% perf-profile.self.cycles-pp.intel_idle
24.40 +56.1 80.53 ± 2% +57.6 82.01 +57.8 82.17 perf-profile.self.cycles-pp.smp_call_function_many_cond
>
> ---8<---
>
> From 49af9b203e971d00c87b2d020f48602936870576 Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@fb.com>
> Date: Mon, 2 Dec 2024 09:57:31 -0800
> Subject: [PATCH] x86,mm: only trim the mm_cpumask once a second
>
> Setting and clearing CPU bits in the mm_cpumask is only ever done
> by the CPU itself, from the context switch code or the TLB flush
> code.
>
> Synchronization is handled by switch_mm_irqs_off blocking interrupts.
>
> Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
> longer running the program causes a regression in the will-it-scale
> tlbflush2 test. This test is contrived, but a large regression here
> might cause a small regression in some real world workload.
>
> Instead of always sending IPIs to CPUs that are in the mm_cpumask,
> but no longer running the program, send these IPIs only once a second.
>
> The rest of the time we can skip over CPUs where the loaded_mm is
> different from the target mm.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
> ---
> arch/x86/include/asm/mmu.h | 2 ++
> arch/x86/include/asm/mmu_context.h | 1 +
> arch/x86/include/asm/tlbflush.h | 1 +
> arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++---
> 4 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index ce4677b8b735..3b496cdcb74b 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -37,6 +37,8 @@ typedef struct {
> */
> atomic64_t tlb_gen;
>
> + unsigned long next_trim_cpumask;
> +
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> struct rw_semaphore ldt_usr_sem;
> struct ldt_struct *ldt;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 2886cb668d7f..795fdd53bd0a 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
>
> mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
> atomic64_set(&mm->context.tlb_gen, 0);
> + mm->context.next_trim_cpumask = jiffies + HZ;
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 69e79fff41b8..02fc2aa06e9e 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -222,6 +222,7 @@ struct flush_tlb_info {
> unsigned int initiating_cpu;
> u8 stride_shift;
> u8 freed_tables;
> + u8 trim_cpumask;
> };
>
> void flush_tlb_local(void);
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 1aac4fa90d3d..0507a6773a37 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -892,9 +892,36 @@ static void flush_tlb_func(void *info)
> nr_invalidate);
> }
>
> -static bool tlb_is_not_lazy(int cpu, void *data)
> +static bool should_flush_tlb(int cpu, void *data)
> {
> - return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
> + struct flush_tlb_info *info = data;
> +
> + /* Lazy TLB will get flushed at the next context switch. */
> + if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
> + return false;
> +
> + /* No mm means kernel memory flush. */
> + if (!info->mm)
> + return true;
> +
> + /* The target mm is loaded, and the CPU is not lazy. */
> + if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
> + return true;
> +
> + /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
> + if (info->trim_cpumask)
> + return true;
> +
> + return false;
> +}
> +
> +static bool should_trim_cpumask(struct mm_struct *mm)
> +{
> + if (time_after(jiffies, READ_ONCE(mm->context.next_trim_cpumask))) {
> + WRITE_ONCE(mm->context.next_trim_cpumask, jiffies + HZ);
> + return true;
> + }
> + return false;
> }
>
> DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
> @@ -928,7 +955,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
> if (info->freed_tables)
> on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
> else
> - on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
> + on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
> (void *)info, 1, cpumask);
> }
>
> @@ -979,6 +1006,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
> info->freed_tables = freed_tables;
> info->new_tlb_gen = new_tlb_gen;
> info->initiating_cpu = smp_processor_id();
> + info->trim_cpumask = 0;
>
> return info;
> }
> @@ -1021,6 +1049,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> * flush_tlb_func_local() directly in this case.
> */
> if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
> + info->trim_cpumask = should_trim_cpumask(mm);
> flush_tlb_multi(mm_cpumask(mm), info);
> } else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
> lockdep_assert_irqs_enabled();
> --
> 2.47.0
>
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [tip: x86/mm] x86/mm/tlb: Only trim the mm_cpumask once a second
2024-12-05 2:03 ` [PATCH v4] " Rik van Riel
2024-12-06 1:30 ` Oliver Sang
@ 2024-12-06 9:40 ` tip-bot2 for Rik van Riel
1 sibling, 0 replies; 24+ messages in thread
From: tip-bot2 for Rik van Riel @ 2024-12-06 9:40 UTC (permalink / raw)
To: linux-tip-commits
Cc: kernel test robot, Rik van Riel, Ingo Molnar, Dave Hansen,
Andy Lutomirski, Mathieu Desnoyers, Peter Zijlstra,
Linus Torvalds, x86, linux-kernel
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 6db2526c1d694c91c6e05e2f186c085e9460f202
Gitweb: https://git.kernel.org/tip/6db2526c1d694c91c6e05e2f186c085e9460f202
Author: Rik van Riel <riel@fb.com>
AuthorDate: Wed, 04 Dec 2024 21:03:16 -05:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Dec 2024 10:26:20 +01:00
x86/mm/tlb: Only trim the mm_cpumask once a second
Setting and clearing CPU bits in the mm_cpumask is only ever done
by the CPU itself, from the context switch code or the TLB flush
code.
Synchronization is handled by switch_mm_irqs_off() blocking interrupts.
Sending TLB flush IPIs to CPUs that are in the mm_cpumask, but no
longer running the program causes a regression in the will-it-scale
tlbflush2 test. This test is contrived, but a large regression here
might cause a small regression in some real world workload.
Instead of always sending IPIs to CPUs that are in the mm_cpumask,
but no longer running the program, send these IPIs only once a second.
The rest of the time we can skip over CPUs where the loaded_mm is
different from the target mm.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241204210316.612ee573@fangorn
Closes: https://lore.kernel.org/oe-lkp/202411282207.6bd28eae-lkp@intel.com/
---
arch/x86/include/asm/mmu.h | 2 ++-
arch/x86/include/asm/mmu_context.h | 1 +-
arch/x86/include/asm/tlbflush.h | 1 +-
arch/x86/mm/tlb.c | 35 ++++++++++++++++++++++++++---
4 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index ce4677b..3b496cd 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -37,6 +37,8 @@ typedef struct {
*/
atomic64_t tlb_gen;
+ unsigned long next_trim_cpumask;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct rw_semaphore ldt_usr_sem;
struct ldt_struct *ldt;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2886cb6..795fdd5 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,7 @@ static inline int init_new_context(struct task_struct *tsk,
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
+ mm->context.next_trim_cpumask = jiffies + HZ;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79ff..02fc2aa 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -222,6 +222,7 @@ struct flush_tlb_info {
unsigned int initiating_cpu;
u8 stride_shift;
u8 freed_tables;
+ u8 trim_cpumask;
};
void flush_tlb_local(void);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3c30817..458a5d5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -892,9 +892,36 @@ done:
nr_invalidate);
}
-static bool tlb_is_not_lazy(int cpu, void *data)
+static bool should_flush_tlb(int cpu, void *data)
{
- return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
+ struct flush_tlb_info *info = data;
+
+ /* Lazy TLB will get flushed at the next context switch. */
+ if (per_cpu(cpu_tlbstate_shared.is_lazy, cpu))
+ return false;
+
+ /* No mm means kernel memory flush. */
+ if (!info->mm)
+ return true;
+
+ /* The target mm is loaded, and the CPU is not lazy. */
+ if (per_cpu(cpu_tlbstate.loaded_mm, cpu) == info->mm)
+ return true;
+
+ /* In cpumask, but not the loaded mm? Periodically remove by flushing. */
+ if (info->trim_cpumask)
+ return true;
+
+ return false;
+}
+
+static bool should_trim_cpumask(struct mm_struct *mm)
+{
+ if (time_after(jiffies, READ_ONCE(mm->context.next_trim_cpumask))) {
+ WRITE_ONCE(mm->context.next_trim_cpumask, jiffies + HZ);
+ return true;
+ }
+ return false;
}
DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
@@ -928,7 +955,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
if (info->freed_tables)
on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
else
- on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+ on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
(void *)info, 1, cpumask);
}
@@ -979,6 +1006,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
info->freed_tables = freed_tables;
info->new_tlb_gen = new_tlb_gen;
info->initiating_cpu = smp_processor_id();
+ info->trim_cpumask = 0;
return info;
}
@@ -1021,6 +1049,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
* flush_tlb_func_local() directly in this case.
*/
if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
+ info->trim_cpumask = should_trim_cpumask(mm);
flush_tlb_multi(mm_cpumask(mm), info);
} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
lockdep_assert_irqs_enabled();
^ permalink raw reply related [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-12-06 9:40 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-28 14:57 [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression kernel test robot
2024-11-28 16:21 ` Peter Zijlstra
2024-11-29 1:44 ` Oliver Sang
2024-11-28 19:46 ` Mathieu Desnoyers
2024-11-29 2:52 ` Rik van Riel
2024-12-02 16:30 ` Mathieu Desnoyers
2024-12-02 18:10 ` Rik van Riel
2024-12-02 16:50 ` Dave Hansen
2024-12-03 0:43 ` [PATCH] x86,mm: only trim the mm_cpumask once a second Rik van Riel
2024-12-04 13:15 ` Oliver Sang
2024-12-04 16:07 ` Rik van Riel
2024-12-04 16:56 ` [PATCH v3] " Rik van Riel
2024-12-04 20:19 ` Mathieu Desnoyers
2024-12-05 2:03 ` [PATCH v4] " Rik van Riel
2024-12-06 1:30 ` Oliver Sang
2024-12-06 9:40 ` [tip: x86/mm] x86/mm/tlb: Only " tip-bot2 for Rik van Riel
2024-12-03 1:22 ` [PATCH -tip] x86,mm: only " Rik van Riel
2024-12-03 14:57 ` Mathieu Desnoyers
2024-12-03 19:48 ` [PATCH v2] " Rik van Riel
2024-12-03 20:05 ` Dave Hansen
2024-12-03 20:07 ` Rik van Riel
2024-12-04 0:46 ` Dave Hansen
2024-12-04 1:43 ` Rik van Riel
2024-12-03 23:27 ` Mathieu Desnoyers