* [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression
@ 2024-10-14 7:55 kernel test robot
  2024-10-14 16:34 ` Brian Foster
  0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2024-10-14 7:55 UTC (permalink / raw)
To: Brian Foster
Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, Darrick J. Wong,
	Josef Bacik, linux-xfs, linux-fsdevel, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

Hello,

kernel test robot noticed a -98.4% regression of stress-ng.metamix.ops_per_sec on:

commit: c5c810b94cfd818fc2f58c96feee58a9e5ead96d ("iomap: fix handling of dirty folios over unwritten extents")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: xfs
	test: metamix
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410141536.1167190b-oliver.sang@intel.com

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241014/202410141536.1167190b-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/xfs/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/metamix/stress-ng/60s

commit:
  6f634eb080 ("filemap: fix htmldoc warning for mapping_align_index()")
  c5c810b94c ("iomap: fix handling of dirty folios over unwritten extents")

6f634eb080161baa c5c810b94cfd818fc2f58c96fee
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  5.762e+09 ±  6%   +1784.6%  1.086e+11 ±  3%  cpuidle..time
     537216 ±  8%    +846.9%    5087100 ±  3%  cpuidle..usage
       5.83 ± 13%     -72.0%       1.63 ±  3%  iostat.cpu.idle
      93.48            +5.1%      98.28        iostat.cpu.iowait
     134.32 ±  4%   +1197.6%       1742 ±  3%  uptime.boot
       2789 ±  3%     +54.0%       4295 ±  4%  uptime.idle
      47.17 ±144%     -81.6%       8.67 ± 24%  perf-c2c.DRAM.local
      89.00 ± 77%     -58.4%      37.00 ±  9%  perf-c2c.HITM.local
      42.33 ± 86%     -63.0%      15.67 ± 26%  perf-c2c.HITM.remote
     609419 ± 10%    +144.5%    1489941 ± 18%  numa-numastat.node0.local_node
     628123 ± 10%    +142.2%    1521154 ± 17%  numa-numastat.node0.numa_hit
     537937 ±  4%    +288.1%    2087519 ± 10%  numa-numastat.node1.local_node
     585710 ±  4%    +262.4%    2122854 ± 10%  numa-numastat.node1.numa_hit
      33057 ±  5%     -94.2%       1926 ±  2%  vmstat.io.bo
       1.72 ±  6%     -37.5%       1.08        vmstat.procs.r
       5454 ±  6%     -45.2%       2990        vmstat.system.cs
       5999 ±  4%     -53.5%       2790        vmstat.system.in
       3.79 ± 19%      -2.3        1.52 ±  3%  mpstat.cpu.all.idle%
       0.02 ± 23%      -0.0        0.01 ±  3%  mpstat.cpu.all.irq%
       0.01 ±  4%      -0.0        0.00        mpstat.cpu.all.soft%
       0.25 ±  6%      -0.2        0.05 ±  4%  mpstat.cpu.all.sys%
       0.40 ±  5%      -0.4        0.03 ±  7%  mpstat.cpu.all.usr%
       6.44 ±  5%     -13.8%       5.55 ±  3%  mpstat.max_utilization_pct
       1991 ± 14%     -68.8%     621.17 ±  9%  stress-ng.metamix.ops
      23.12 ± 10%     -98.4%       0.37 ±  7%  stress-ng.metamix.ops_per_sec
      87.77 ±  6%   +1831.7%       1695 ±  3%  stress-ng.time.elapsed_time
      87.77 ±  6%   +1831.7%       1695 ±  3%  stress-ng.time.elapsed_time.max
     120134 ±  2%      -5.1%     114001        stress-ng.time.minor_page_faults
       5.67 ±  8%     -82.4%       1.00
stress-ng.time.percent_of_cpu_this_job_got 4.90 ± 7% +331.2% 21.13 ± 4% stress-ng.time.system_time 63630 ± 9% +1332.9% 911761 ± 3% stress-ng.time.voluntary_context_switches 25272 ± 6% +100.3% 50631 meminfo.Active 20787 ± 10% +96.7% 40898 ± 2% meminfo.Active(anon) 4485 ± 15% +117.0% 9733 meminfo.Active(file) 207516 ± 6% +216.6% 656984 ± 10% meminfo.AnonHugePages 377749 ± 4% +31.5% 496804 meminfo.Dirty 1808866 -20.6% 1436964 meminfo.Inactive 902066 -43.7% 508085 meminfo.Inactive(file) 6425133 -10.6% 5746563 meminfo.Memused 11.47 +17676.6% 2038 meminfo.Mlocked 102534 +22.7% 125834 meminfo.Shmem 119495 -95.0% 5925 ± 3% meminfo.Writeback 7239789 -14.4% 6193951 meminfo.max_used_kB 2230 ± 16% +72.8% 3853 ± 5% numa-meminfo.node0.Active(file) 84262 ± 27% +532.6% 533075 ± 25% numa-meminfo.node0.AnonHugePages 401080 ± 65% +81.0% 726026 ± 10% numa-meminfo.node0.AnonPages.max 450689 ± 3% -59.7% 181817 ± 4% numa-meminfo.node0.Inactive(file) 15857 ± 4% -12.5% 13880 ± 4% numa-meminfo.node0.KernelStack 5.73 ±100% +14883.5% 858.83 ± 96% numa-meminfo.node0.Mlocked 59744 ± 3% -96.6% 2047 ± 4% numa-meminfo.node0.Writeback 16267 ± 11% +121.2% 35987 ± 16% numa-meminfo.node1.Active 14010 ± 13% +114.9% 30108 ± 20% numa-meminfo.node1.Active(anon) 2257 ± 15% +160.4% 5879 ± 4% numa-meminfo.node1.Active(file) 188486 ± 7% +69.0% 318533 ± 3% numa-meminfo.node1.Dirty 956395 ± 30% -43.4% 541330 ± 20% numa-meminfo.node1.Inactive 452002 ± 5% -27.8% 326319 ± 3% numa-meminfo.node1.Inactive(file) 150109 ± 13% +16.7% 175200 ± 5% numa-meminfo.node1.Slab 59749 ± 4% -93.5% 3891 ± 5% numa-meminfo.node1.Writeback 556.82 ± 16% +73.0% 963.40 ± 5% numa-vmstat.node0.nr_active_file 41.21 ± 27% +531.7% 260.29 ± 25% numa-vmstat.node0.nr_anon_transparent_hugepages 379768 ± 11% -23.8% 289236 ± 3% numa-vmstat.node0.nr_dirtied 112900 ± 2% -59.7% 45451 ± 4% numa-vmstat.node0.nr_inactive_file 15877 ± 4% -12.6% 13883 ± 3% numa-vmstat.node0.nr_kernel_stack 1.44 ±100% +14824.7% 214.67 ± 96% numa-vmstat.node0.nr_mlock 14977 ± 3% -96.6% 512.83 ± 4% numa-vmstat.node0.nr_writeback 379768 ± 11% -25.6% 282645 ± 3% numa-vmstat.node0.nr_written 556.84 ± 16% +73.0% 963.40 ± 5% numa-vmstat.node0.nr_zone_active_file 112900 ± 2% -59.7% 45451 ± 4% numa-vmstat.node0.nr_zone_inactive_file 62482 ± 3% -27.8% 45088 ± 4% numa-vmstat.node0.nr_zone_write_pending 625814 ± 10% +143.0% 1520756 ± 17% numa-vmstat.node0.numa_hit 607109 ± 10% +145.4% 1489543 ± 18% numa-vmstat.node0.numa_local 3496 ± 13% +115.3% 7527 ± 20% numa-vmstat.node1.nr_active_anon 563.82 ± 16% +160.7% 1469 ± 4% numa-vmstat.node1.nr_active_file 380179 ± 9% +38.2% 525240 ± 5% numa-vmstat.node1.nr_dirtied 47231 ± 7% +68.6% 79622 ± 3% numa-vmstat.node1.nr_dirty 113239 ± 5% -28.0% 81571 ± 3% numa-vmstat.node1.nr_inactive_file 14977 ± 4% -93.5% 974.17 ± 5% numa-vmstat.node1.nr_writeback 380179 ± 9% +35.0% 513207 ± 5% numa-vmstat.node1.nr_written 3496 ± 13% +115.3% 7527 ± 20% numa-vmstat.node1.nr_zone_active_anon 563.82 ± 16% +160.7% 1469 ± 4% numa-vmstat.node1.nr_zone_active_file 113239 ± 5% -28.0% 81571 ± 3% numa-vmstat.node1.nr_zone_inactive_file 62209 ± 6% +29.6% 80597 ± 3% numa-vmstat.node1.nr_zone_write_pending 583795 ± 4% +263.5% 2121826 ± 10% numa-vmstat.node1.numa_hit 535988 ± 4% +289.3% 2086491 ± 10% numa-vmstat.node1.numa_local 5190 ± 10% +97.0% 10224 ± 2% proc-vmstat.nr_active_anon 1122 ± 14% +116.7% 2433 proc-vmstat.nr_active_file 208668 +2.2% 213362 proc-vmstat.nr_anon_pages 101.32 ± 6% +216.6% 320.79 ± 10% proc-vmstat.nr_anon_transparent_hugepages 94628 ± 4% +31.3% 124208 proc-vmstat.nr_dirty 
1051112 -8.7% 959304 proc-vmstat.nr_file_pages 226820 +2.4% 232205 proc-vmstat.nr_inactive_anon 225925 -43.8% 127025 proc-vmstat.nr_inactive_file 29400 -6.6% 27458 proc-vmstat.nr_kernel_stack 2.88 +17612.4% 509.76 proc-vmstat.nr_mlock 22780 -6.0% 21412 ± 3% proc-vmstat.nr_page_table_pages 25696 ± 2% +22.4% 31461 proc-vmstat.nr_shmem 26966 +2.2% 27573 proc-vmstat.nr_slab_reclaimable 63926 +2.0% 65209 proc-vmstat.nr_slab_unreclaimable 29903 -95.0% 1484 ± 3% proc-vmstat.nr_writeback 5190 ± 10% +97.0% 10224 ± 2% proc-vmstat.nr_zone_active_anon 1122 ± 14% +116.7% 2433 proc-vmstat.nr_zone_active_file 226820 +2.4% 232205 proc-vmstat.nr_zone_inactive_anon 225925 -43.8% 127025 proc-vmstat.nr_zone_inactive_file 1215783 ± 6% +199.9% 3646420 ± 3% proc-vmstat.numa_hit 1149305 ± 7% +211.5% 3579877 ± 3% proc-vmstat.numa_local 89633 ± 7% +349.7% 403114 ± 3% proc-vmstat.pgactivate 1312602 ± 6% +200.5% 3944776 ± 3% proc-vmstat.pgalloc_normal 415149 ± 3% +897.4% 4140633 ± 3% proc-vmstat.pgfault 1306213 ± 6% +200.3% 3923139 ± 3% proc-vmstat.pgfree 16256 ± 3% +1057.0% 188090 ± 3% proc-vmstat.pgreuse 1.82 ± 4% -9.6% 1.65 perf-stat.i.MPKI 3.76e+08 ± 6% -89.1% 40978627 ± 4% perf-stat.i.branch-instructions 2.30 ± 3% -0.6 1.66 perf-stat.i.branch-miss-rate% 17129191 ± 7% -93.0% 1196180 ± 7% perf-stat.i.branch-misses 16.84 ± 3% -6.5 10.36 perf-stat.i.cache-miss-rate% 1341312 ± 4% -81.2% 251609 perf-stat.i.cache-misses 7592299 ± 3% -73.3% 2030865 perf-stat.i.cache-references 5488 ± 6% -45.6% 2986 perf-stat.i.context-switches 1.35 ± 4% +8.7% 1.47 perf-stat.i.cpi 1.867e+09 ± 5% -86.0% 2.607e+08 ± 6% perf-stat.i.cpu-cycles 199.44 ± 2% -59.4% 80.94 perf-stat.i.cpu-migrations 1655 ± 13% -41.4% 969.67 perf-stat.i.cycles-between-cache-misses 1.841e+09 ± 6% -89.2% 1.994e+08 ± 4% perf-stat.i.instructions 0.87 -18.5% 0.71 perf-stat.i.ipc 0.57 ± 40% -97.6% 0.01 ± 43% perf-stat.i.major-faults 0.01 ±141% +30993.6% 2.48 ± 5% perf-stat.i.metric.K/sec 3665 ± 3% -36.0% 2345 perf-stat.i.minor-faults 3666 ± 3% -36.0% 2345 perf-stat.i.page-faults 0.73 ± 5% +72.7% 1.26 ± 4% perf-stat.overall.MPKI 4.55 -1.6 2.92 ± 3% perf-stat.overall.branch-miss-rate% 17.66 ± 2% -5.3 12.39 perf-stat.overall.cache-miss-rate% 1.02 ± 5% +28.1% 1.31 ± 3% perf-stat.overall.cpi 1397 ± 5% -25.8% 1036 ± 5% perf-stat.overall.cycles-between-cache-misses 0.98 ± 5% -22.1% 0.77 ± 3% perf-stat.overall.ipc 3.728e+08 ± 6% -89.0% 41014226 ± 4% perf-stat.ps.branch-instructions 16957001 ± 7% -92.9% 1198002 ± 7% perf-stat.ps.branch-misses 1332233 ± 4% -81.1% 251650 perf-stat.ps.cache-misses 7543069 ± 3% -73.1% 2030809 perf-stat.ps.cache-references 5443 ± 6% -45.2% 2985 perf-stat.ps.context-switches 63285 +1.1% 63962 perf-stat.ps.cpu-clock 1.859e+09 ± 5% -86.0% 2.611e+08 ± 6% perf-stat.ps.cpu-cycles 198.47 ± 3% -59.2% 80.91 perf-stat.ps.cpu-migrations 1.826e+09 ± 6% -89.1% 1.996e+08 ± 4% perf-stat.ps.instructions 0.58 ± 41% -97.6% 0.01 ± 43% perf-stat.ps.major-faults 3640 ± 3% -35.6% 2344 perf-stat.ps.minor-faults 3640 ± 3% -35.6% 2344 perf-stat.ps.page-faults 63285 +1.1% 63962 perf-stat.ps.task-clock 1.637e+11 ± 5% +106.8% 3.387e+11 ± 4% perf-stat.total.instructions 0.01 ± 31% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map 0.01 ± 11% -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra 0.00 ± 16% -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.xfsaild.kthread.ret_from_fork 0.00 ± 17% -100.0% 0.00 
perf-sched.sch_delay.avg.ms.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 0.01 ± 11% -100.0% 0.00 perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 0.00 ± 12% -100.0% 0.00 perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 0.01 ± 8% -100.0% 0.00 perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread 0.02 ±135% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map 0.14 ± 79% -88.5% 0.02 ±103% perf-sched.sch_delay.max.ms.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle 0.02 ± 77% -67.9% 0.01 ± 5% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll 0.02 ± 38% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra 0.01 ± 44% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_timeout.xfsaild.kthread.ret_from_fork 0.00 ± 26% -100.0% 0.00 perf-sched.sch_delay.max.ms.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 0.01 ±103% -100.0% 0.00 perf-sched.sch_delay.max.ms.xlog_wait_on_iclog.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 0.00 ± 11% -100.0% 0.00 perf-sched.sch_delay.max.ms.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 0.01 ± 11% -100.0% 0.00 perf-sched.sch_delay.max.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread 8119 ± 9% -50.9% 3990 ± 15% perf-sched.total_wait_and_delay.count.ms 765.01 ± 48% -100.0% 0.01 perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range 8.46 ± 6% +27.6% 10.79 ± 16% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 50.66 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.xfsaild.kthread.ret_from_fork 134.86 ± 13% +158.8% 348.98 ± 18% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 65.33 ± 70% +353.1% 296.00 ± 64% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range 1366 ± 18% -78.6% 293.00 ± 64% perf-sched.wait_and_delay.count.io_schedule.rq_qos_wait.wbt_wait.__rq_qos_throttle 97.17 -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.xfsaild.kthread.ret_from_fork 3007 ± 18% -77.6% 674.17 ± 29% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 2858 ± 50% -100.0% 0.04 ± 72% perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range 103.17 ± 10% +73.5% 179.00 ± 24% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 59.66 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.xfsaild.kthread.ret_from_fork 765.00 ± 48% -100.0% 0.00 perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range 93.17 ± 32% -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra 8.45 ± 6% +27.6% 10.78 ± 16% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 50.65 -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_timeout.xfsaild.kthread.ret_from_fork 134.64 ± 13% +158.9% 348.54 ± 18% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 0.01 ± 22% -100.0% 0.00 
perf-sched.wait_time.avg.ms.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 202.39 ± 62% -100.0% 0.00 perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 248.16 ± 50% -100.0% 0.00 perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 219.73 ± 46% -100.0% 0.00 perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread 2858 ± 50% -100.0% 0.00 perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range 468.55 ± 29% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra 103.16 ± 10% +73.5% 179.00 ± 24% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 59.65 ± 10% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_timeout.xfsaild.kthread.ret_from_fork 0.02 ± 25% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 348.84 ± 58% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_wait_on_iclog.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 486.25 ± 23% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync 473.19 ± 38% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread 7346 ± 4% +23.0% 9033 ± 7% sched_debug.cfs_rq:/.avg_vruntime.avg 848.55 ± 8% +123.3% 1894 ± 11% sched_debug.cfs_rq:/.avg_vruntime.min 0.14 ± 14% -71.2% 0.04 ± 2% sched_debug.cfs_rq:/.h_nr_running.avg 0.32 ± 9% -41.3% 0.19 ± 2% sched_debug.cfs_rq:/.h_nr_running.stddev 139317 ±191% -88.9% 15437 ± 4% sched_debug.cfs_rq:/.load.avg 8499863 ±201% -92.8% 613636 ± 4% sched_debug.cfs_rq:/.load.max 247.33 ± 32% -85.6% 35.54 ± 46% sched_debug.cfs_rq:/.load_avg.avg 632.63 ± 71% -74.1% 163.67 ± 68% sched_debug.cfs_rq:/.load_avg.stddev 7346 ± 4% +23.0% 9033 ± 7% sched_debug.cfs_rq:/.min_vruntime.avg 848.55 ± 8% +123.3% 1894 ± 11% sched_debug.cfs_rq:/.min_vruntime.min 0.14 ± 14% -71.2% 0.04 ± 2% sched_debug.cfs_rq:/.nr_running.avg 0.32 ± 9% -41.3% 0.19 ± 2% sched_debug.cfs_rq:/.nr_running.stddev 62.04 ± 22% -89.5% 6.51 ± 42% sched_debug.cfs_rq:/.removed.load_avg.avg 597.33 ± 31% -63.0% 220.77 ± 53% sched_debug.cfs_rq:/.removed.load_avg.max 171.33 ± 9% -80.1% 34.06 ± 47% sched_debug.cfs_rq:/.removed.load_avg.stddev 18.96 ± 25% -85.8% 2.70 ± 45% sched_debug.cfs_rq:/.removed.runnable_avg.avg 308.25 ± 31% -64.0% 111.09 ± 54% sched_debug.cfs_rq:/.removed.runnable_avg.max 60.15 ± 17% -74.0% 15.67 ± 51% sched_debug.cfs_rq:/.removed.runnable_avg.stddev 18.96 ± 25% -85.8% 2.70 ± 45% sched_debug.cfs_rq:/.removed.util_avg.avg 308.25 ± 31% -64.0% 111.09 ± 54% sched_debug.cfs_rq:/.removed.util_avg.max 60.15 ± 17% -74.0% 15.67 ± 51% sched_debug.cfs_rq:/.removed.util_avg.stddev 237.66 ± 5% -85.8% 33.72 ± 3% sched_debug.cfs_rq:/.runnable_avg.avg 1104 ± 12% -45.6% 600.94 ± 2% sched_debug.cfs_rq:/.runnable_avg.max 226.39 ± 5% -57.8% 95.63 ± 2% sched_debug.cfs_rq:/.runnable_avg.stddev 236.68 ± 5% -85.9% 33.42 ± 4% sched_debug.cfs_rq:/.util_avg.avg 1103 ± 12% -46.3% 591.99 sched_debug.cfs_rq:/.util_avg.max 226.37 ± 5% -58.2% 94.62 ± 2% sched_debug.cfs_rq:/.util_avg.stddev 18.18 ± 24% -83.7% 2.96 ± 17% sched_debug.cfs_rq:/.util_est.avg 338.67 ± 13% -67.8% 109.03 ± 15% sched_debug.cfs_rq:/.util_est.max 67.93 ± 17% -76.8% 15.79 ± 14% sched_debug.cfs_rq:/.util_est.stddev 803668 ± 2% +18.9% 955672 
sched_debug.cpu.avg_idle.avg 198446 ± 5% -41.6% 115938 ± 11% sched_debug.cpu.avg_idle.stddev 74790 +1064.4% 870887 ± 3% sched_debug.cpu.clock.avg 74795 +1064.4% 870893 ± 3% sched_debug.cpu.clock.max 74786 +1064.5% 870879 ± 3% sched_debug.cpu.clock.min 2.48 ± 4% +26.3% 3.14 ± 8% sched_debug.cpu.clock.stddev 74525 +1068.1% 870538 ± 3% sched_debug.cpu.clock_task.avg 74776 +1064.6% 870850 ± 3% sched_debug.cpu.clock_task.max 66679 +1193.0% 862189 ± 3% sched_debug.cpu.clock_task.min 339.14 ± 15% +27.9% 433.69 ± 4% sched_debug.cpu.curr->pid.avg 4069 +498.1% 24338 ± 3% sched_debug.cpu.curr->pid.max 895.15 ± 7% +243.7% 3076 ± 3% sched_debug.cpu.curr->pid.stddev 0.14 ± 15% -74.6% 0.04 ± 8% sched_debug.cpu.nr_running.avg 0.31 ± 11% -44.7% 0.17 ± 6% sched_debug.cpu.nr_running.stddev 4679 ± 4% +794.9% 41879 ± 4% sched_debug.cpu.nr_switches.avg 27295 ± 11% +571.5% 183299 ± 4% sched_debug.cpu.nr_switches.max 1238 ± 20% +985.0% 13436 ± 12% sched_debug.cpu.nr_switches.min 4550 ± 12% +581.7% 31022 ± 7% sched_debug.cpu.nr_switches.stddev 8.07 +60.9% 12.99 ± 2% sched_debug.cpu.nr_uninterruptible.avg 35.50 ± 10% +49.5% 53.06 ± 14% sched_debug.cpu.nr_uninterruptible.max 10.63 ± 7% +33.5% 14.19 ± 12% sched_debug.cpu.nr_uninterruptible.stddev 74787 +1064.5% 870884 ± 3% sched_debug.cpu_clk 73624 +1081.3% 869721 ± 3% sched_debug.ktime 75383 +1056.2% 871587 ± 3% sched_debug.sched_clk 12.22 ± 39% -12.2 0.00 perf-profile.calltrace.cycles-pp.fdatasync.stress_metamix 12.15 ± 39% -12.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fdatasync.stress_metamix 12.15 ± 39% -12.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fdatasync.stress_metamix 12.02 ± 40% -12.0 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_fdatasync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fdatasync.stress_metamix 12.02 ± 40% -12.0 0.00 perf-profile.calltrace.cycles-pp.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fdatasync 9.67 ± 50% -9.2 0.44 ±112% perf-profile.calltrace.cycles-pp.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64 8.34 ± 50% -7.9 0.41 ±108% perf-profile.calltrace.cycles-pp.iomap_write_iter.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write 7.12 ± 30% -7.1 0.00 perf-profile.calltrace.cycles-pp.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64.entry_SYSCALL_64_after_hwframe 6.81 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 6.81 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fdatasync 6.80 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync 6.80 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range 6.44 ± 49% -6.3 0.12 ±223% perf-profile.calltrace.cycles-pp.iomap_write_begin.iomap_write_iter.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write 5.25 ± 45% -5.2 0.00 perf-profile.calltrace.cycles-pp.iomap_writepage_map.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc 4.70 ± 47% -4.7 0.00 perf-profile.calltrace.cycles-pp.read.stress_metamix 4.69 ± 46% -4.7 0.00 
perf-profile.calltrace.cycles-pp.iomap_writepage_map_blocks.iomap_writepage_map.iomap_writepages.xfs_vm_writepages.do_writepages 4.68 ± 63% -4.7 0.00 perf-profile.calltrace.cycles-pp.unlink.stress_metamix 4.66 ± 63% -4.7 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink.stress_metamix 4.66 ± 63% -4.7 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink.stress_metamix 4.64 ± 63% -4.6 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink.stress_metamix 4.64 ± 63% -4.6 0.00 perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink 4.12 ± 48% -4.1 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read.stress_metamix 4.08 ± 48% -4.1 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.stress_metamix 4.01 ± 64% -4.0 0.00 perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe 4.00 ± 64% -4.0 0.00 perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64 3.96 ± 48% -4.0 0.00 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.stress_metamix 3.88 ± 52% -3.9 0.00 perf-profile.calltrace.cycles-pp.iomap_submit_ioend.iomap_writepage_map_blocks.iomap_writepage_map.iomap_writepages.xfs_vm_writepages 3.87 ± 53% -3.9 0.00 perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepage_map_blocks.iomap_writepage_map.iomap_writepages 3.86 ± 53% -3.9 0.00 perf-profile.calltrace.cycles-pp.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepage_map_blocks.iomap_writepage_map 6.96 ± 11% -3.3 3.62 ± 29% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 6.29 ± 14% -3.1 3.16 ± 28% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 0.65 ± 64% +0.5 1.12 ± 25% perf-profile.calltrace.cycles-pp._nohz_idle_balance.handle_softirqs.__irq_exit_rcu.sysvec_call_function_single.asm_sysvec_call_function_single 0.22 ±141% +0.6 0.85 ± 28% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.11 ±223% +0.6 0.74 ± 28% perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.22 ±141% +0.7 0.88 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.22 ±141% +0.7 0.88 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 0.50 ± 76% +0.7 1.15 ± 25% perf-profile.calltrace.cycles-pp.__sysvec_posted_msi_notification.sysvec_posted_msi_notification.asm_sysvec_posted_msi_notification.acpi_safe_halt.acpi_idle_enter 0.22 ±142% +0.7 0.94 ± 25% perf-profile.calltrace.cycles-pp.__open64_nocancel.setlocale 0.27 ±141% +0.8 1.07 ± 28% perf-profile.calltrace.cycles-pp.write 0.33 ±102% +0.8 1.14 ± 22% perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 0.22 ±145% +0.8 1.05 ± 24% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault 0.43 ±108% +0.9 1.35 ± 24% perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 0.16 ±223% +1.0 1.17 ± 36% 
perf-profile.calltrace.cycles-pp.blk_mq_submit_bio.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages 0.15 ±223% +1.1 1.20 ± 38% perf-profile.calltrace.cycles-pp.blk_mq_dispatch_rq_list.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue 0.66 ± 98% +1.1 1.76 ± 33% perf-profile.calltrace.cycles-pp.cmd_stat.run_builtin.handle_internal_command.main 0.66 ± 98% +1.1 1.76 ± 33% perf-profile.calltrace.cycles-pp.dispatch_events.cmd_stat.run_builtin.handle_internal_command.main 0.66 ± 98% +1.1 1.76 ± 33% perf-profile.calltrace.cycles-pp.process_interval.dispatch_events.cmd_stat.run_builtin.handle_internal_command 0.54 ±119% +1.1 1.68 ± 34% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.exit_mm.do_exit.do_group_exit 0.54 ±118% +1.2 1.70 ± 33% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call 0.54 ±118% +1.2 1.70 ± 33% perf-profile.calltrace.cycles-pp.mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group 1.56 ± 19% +1.2 2.73 ± 22% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.00 +1.2 1.18 ± 40% perf-profile.calltrace.cycles-pp.scsi_queue_rq.blk_mq_dispatch_rq_list.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests 0.45 ±148% +1.2 1.66 ± 33% perf-profile.calltrace.cycles-pp.read_counters.process_interval.dispatch_events.cmd_stat.run_builtin 0.72 ± 98% +1.3 1.99 ± 32% perf-profile.calltrace.cycles-pp.handle_internal_command.main 0.72 ± 98% +1.3 1.99 ± 32% perf-profile.calltrace.cycles-pp.main 0.72 ± 98% +1.3 1.99 ± 32% perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main 0.48 ±151% +1.3 1.76 ± 32% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.80 ± 20% +1.3 3.10 ± 23% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.82 ± 20% +1.3 3.12 ± 23% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault 0.64 ±114% +1.4 2.00 ± 32% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.64 ±114% +1.4 2.00 ± 32% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64 0.64 ±114% +1.4 2.00 ± 32% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.52 ±109% +1.4 1.88 ± 27% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.53 ±109% +1.4 1.92 ± 28% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.93 ± 83% +1.4 2.32 ± 32% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.91 ± 89% +1.5 2.41 ± 42% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.71 ±110% +1.6 2.32 ± 42% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64 0.66 ±112% +1.7 2.35 ± 30% perf-profile.calltrace.cycles-pp.setlocale 2.08 ± 26% +1.7 3.82 ± 29% perf-profile.calltrace.cycles-pp.asm_exc_page_fault 1.05 ± 78% +1.9 2.94 ± 28% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 1.46 ± 49% +1.9 3.35 ± 19% 
perf-profile.calltrace.cycles-pp.asm_sysvec_posted_msi_notification.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter 1.10 ± 78% +1.9 3.02 ± 28% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read 1.10 ± 78% +1.9 3.02 ± 28% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 1.30 ± 69% +2.0 3.28 ± 32% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common 1.32 ± 68% +2.0 3.30 ± 31% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64 1.32 ± 68% +2.0 3.30 ± 31% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve 1.43 ± 70% +2.0 3.45 ± 30% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.11 ± 78% +2.1 3.24 ± 27% perf-profile.calltrace.cycles-pp.read 2.01 ± 71% +3.0 4.99 ± 22% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve 2.01 ± 71% +3.0 5.00 ± 22% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve 2.01 ± 71% +3.0 5.00 ± 22% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve 2.01 ± 71% +3.0 5.00 ± 22% perf-profile.calltrace.cycles-pp.execve 2.00 ± 70% +3.0 4.99 ± 22% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve 1.01 ± 54% +3.3 4.27 ± 31% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests 1.01 ± 54% +3.4 4.36 ± 30% perf-profile.calltrace.cycles-pp._raw_spin_lock.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests 1.05 ± 55% +3.4 4.47 ± 31% perf-profile.calltrace.cycles-pp.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue 2.33 ± 39% +4.0 6.29 ± 28% perf-profile.calltrace.cycles-pp.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list 2.33 ± 39% +4.0 6.31 ± 28% perf-profile.calltrace.cycles-pp.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list.__blk_flush_plug 2.35 ± 39% +4.0 6.35 ± 27% perf-profile.calltrace.cycles-pp.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list.__blk_flush_plug.__submit_bio 1.71 ± 50% +4.5 6.23 ± 29% perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list 3.24 ± 66% +4.8 8.01 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe 3.24 ± 66% +4.8 8.04 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe 3.55 ± 40% +5.2 8.77 ± 29% perf-profile.calltrace.cycles-pp.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list.__blk_flush_plug.__submit_bio.submit_bio_noacct_nocheck 3.55 ± 40% +5.2 8.77 ± 29% perf-profile.calltrace.cycles-pp.blk_mq_flush_plug_list.__blk_flush_plug.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend 3.98 ± 55% +8.1 12.11 ± 30% perf-profile.calltrace.cycles-pp.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64 0.56 ±164% +8.2 8.77 ± 29% 
perf-profile.calltrace.cycles-pp.__blk_flush_plug.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages 3.33 ± 56% +8.3 11.60 ± 31% perf-profile.calltrace.cycles-pp.iomap_zero_range.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write 0.91 ±108% +9.0 9.94 ± 29% perf-profile.calltrace.cycles-pp.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages 0.91 ±108% +9.0 9.94 ± 29% perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages.do_writepages 0.92 ±107% +9.0 9.95 ± 29% perf-profile.calltrace.cycles-pp.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc 0.00 +10.3 10.29 ± 30% perf-profile.calltrace.cycles-pp.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.filemap_write_and_wait_range.iomap_zero_range 0.00 +10.3 10.29 ± 30% perf-profile.calltrace.cycles-pp.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.filemap_write_and_wait_range 0.00 +10.4 10.40 ± 30% perf-profile.calltrace.cycles-pp.__filemap_fdatawrite_range.filemap_write_and_wait_range.iomap_zero_range.xfs_file_write_checks.xfs_file_buffered_write 0.00 +10.4 10.40 ± 30% perf-profile.calltrace.cycles-pp.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.filemap_write_and_wait_range.iomap_zero_range.xfs_file_write_checks 0.00 +11.2 11.22 ± 30% perf-profile.calltrace.cycles-pp.filemap_write_and_wait_range.iomap_zero_range.xfs_file_write_checks.xfs_file_buffered_write.vfs_write 12.24 ± 39% -12.2 0.00 perf-profile.children.cycles-pp.fdatasync 12.02 ± 40% -12.0 0.00 perf-profile.children.cycles-pp.__x64_sys_fdatasync 12.02 ± 40% -12.0 0.00 perf-profile.children.cycles-pp.xfs_file_fsync 9.68 ± 50% -9.1 0.57 ± 69% perf-profile.children.cycles-pp.iomap_file_buffered_write 8.37 ± 50% -7.8 0.53 ± 66% perf-profile.children.cycles-pp.iomap_write_iter 7.12 ± 30% -7.1 0.00 perf-profile.children.cycles-pp.file_write_and_wait_range 6.48 ± 49% -6.1 0.35 ± 65% perf-profile.children.cycles-pp.iomap_write_begin 5.25 ± 45% -5.0 0.22 ± 74% perf-profile.children.cycles-pp.iomap_writepage_map 4.74 ± 60% -4.7 0.00 perf-profile.children.cycles-pp.unlink 4.69 ± 60% -4.7 0.00 perf-profile.children.cycles-pp.__x64_sys_unlink 4.69 ± 60% -4.7 0.00 perf-profile.children.cycles-pp.do_unlinkat 4.70 ± 46% -4.6 0.14 ± 97% perf-profile.children.cycles-pp.iomap_writepage_map_blocks 4.31 ± 55% -4.2 0.16 ±108% perf-profile.children.cycles-pp.iomap_iter 4.13 ± 61% -4.1 0.00 perf-profile.children.cycles-pp.truncate_inode_pages_range 4.06 ± 61% -4.1 0.01 ±223% perf-profile.children.cycles-pp.evict 3.90 ± 48% -3.8 0.11 ± 82% perf-profile.children.cycles-pp.__iomap_write_begin 3.82 ± 56% -3.7 0.08 ± 80% perf-profile.children.cycles-pp.xfs_buffered_write_iomap_begin 3.42 ± 47% -3.4 0.07 ±111% perf-profile.children.cycles-pp.zero_user_segments 6.97 ± 11% -3.4 3.62 ± 29% perf-profile.children.cycles-pp.worker_thread 3.45 ± 46% -3.2 0.24 ± 35% perf-profile.children.cycles-pp.memset_orig 3.25 ± 45% -3.1 0.11 ± 88% perf-profile.children.cycles-pp.filemap_read 6.29 ± 14% -3.1 3.16 ± 28% perf-profile.children.cycles-pp.process_one_work 3.18 ± 54% -2.6 0.59 ± 50% perf-profile.children.cycles-pp.folios_put_refs 2.46 ± 67% -2.4 0.10 ± 65% perf-profile.children.cycles-pp.__page_cache_release 2.51 ± 50% -2.2 0.28 ± 52% perf-profile.children.cycles-pp.__filemap_get_folio 1.58 ± 51% -1.5 0.10 ± 84% perf-profile.children.cycles-pp.filemap_add_folio 1.52 ± 45% -1.4 0.10 ±119% 
perf-profile.children.cycles-pp.copy_page_to_iter 1.48 ± 44% -1.4 0.12 ± 75% perf-profile.children.cycles-pp._copy_to_iter 2.55 ± 16% -1.1 1.41 ± 36% perf-profile.children.cycles-pp.pick_next_task_fair 1.22 ± 24% -1.1 0.10 ± 71% perf-profile.children.cycles-pp.mod_delayed_work_on 1.16 ± 50% -1.0 0.13 ± 78% perf-profile.children.cycles-pp.open64 0.95 ± 31% -0.9 0.06 ± 84% perf-profile.children.cycles-pp.try_to_grab_pending 0.99 ± 45% -0.9 0.10 ± 71% perf-profile.children.cycles-pp.kblockd_mod_delayed_work_on 0.88 ± 50% -0.8 0.06 ±111% perf-profile.children.cycles-pp.filemap_get_pages 0.81 ± 48% -0.7 0.06 ±113% perf-profile.children.cycles-pp.filemap_get_read_batch 1.08 ± 28% -0.6 0.45 ± 39% perf-profile.children.cycles-pp.clear_bhb_loop 0.70 ± 51% -0.6 0.08 ±125% perf-profile.children.cycles-pp.folio_alloc_noprof 0.66 ± 53% -0.6 0.06 ± 84% perf-profile.children.cycles-pp.__filemap_add_folio 0.59 ± 32% -0.6 0.04 ±100% perf-profile.children.cycles-pp.xfs_map_blocks 0.88 ± 14% -0.5 0.38 ± 51% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio 0.58 ± 38% -0.5 0.13 ± 81% perf-profile.children.cycles-pp.writeback_iter 0.71 ± 29% -0.4 0.26 ± 51% perf-profile.children.cycles-pp.entry_SYSCALL_64 0.52 ± 43% -0.4 0.12 ± 91% perf-profile.children.cycles-pp.writeback_get_folio 0.44 ± 53% -0.4 0.06 ±130% perf-profile.children.cycles-pp.__folio_start_writeback 0.49 ± 20% -0.4 0.12 ± 62% perf-profile.children.cycles-pp.percpu_counter_add_batch 0.35 ± 38% -0.3 0.08 ± 54% perf-profile.children.cycles-pp.touch_atime 0.63 ± 23% -0.3 0.37 ± 66% perf-profile.children.cycles-pp.get_page_from_freelist 0.47 ± 33% -0.3 0.21 ± 79% perf-profile.children.cycles-pp.xas_load 0.30 ± 35% -0.2 0.09 ± 86% perf-profile.children.cycles-pp.rmqueue 0.24 ± 29% -0.2 0.05 ± 71% perf-profile.children.cycles-pp.__mod_node_page_state 0.13 ± 22% -0.1 0.04 ±101% perf-profile.children.cycles-pp.lock_timer_base 0.16 ± 31% -0.1 0.08 ± 85% perf-profile.children.cycles-pp.xas_find 0.16 ± 22% -0.1 0.08 ± 78% perf-profile.children.cycles-pp.xfs_trans_reserve 0.05 ± 78% +0.1 0.12 ± 37% perf-profile.children.cycles-pp.hrtimer_try_to_cancel 0.00 +0.1 0.08 ± 10% perf-profile.children.cycles-pp.change_protection_range 0.13 ± 33% +0.1 0.21 ± 38% perf-profile.children.cycles-pp.wakeup_preempt 0.05 ±108% +0.1 0.14 ± 24% perf-profile.children.cycles-pp.perf_event_read 0.05 ±108% +0.1 0.14 ± 24% perf-profile.children.cycles-pp.smp_call_function_single 0.04 ±118% +0.1 0.14 ± 37% perf-profile.children.cycles-pp.run_ksoftirqd 0.02 ±143% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.__poll 0.02 ±143% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.__x64_sys_poll 0.02 ±143% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.do_sys_poll 0.01 ±223% +0.1 0.11 ± 48% perf-profile.children.cycles-pp.lockref_put_return 0.02 ±142% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.do_poll 0.07 ± 75% +0.1 0.18 ± 34% perf-profile.children.cycles-pp.switch_fpu_return 0.04 ±112% +0.1 0.15 ± 66% perf-profile.children.cycles-pp.getenv 0.01 ±223% +0.1 0.12 ± 39% perf-profile.children.cycles-pp.folio_putback_lru 0.04 ±114% +0.1 0.16 ± 37% perf-profile.children.cycles-pp.generic_exec_single 0.02 ±223% +0.1 0.14 ± 38% perf-profile.children.cycles-pp.expand_downwards 0.12 ± 44% +0.1 0.24 ± 15% perf-profile.children.cycles-pp.__check_object_size 0.14 ± 29% +0.1 0.27 ± 19% perf-profile.children.cycles-pp.vma_alloc_folio_noprof 0.01 ±223% +0.1 0.14 ± 37% perf-profile.children.cycles-pp.xfsaild 0.01 ±223% +0.1 0.14 ± 37% perf-profile.children.cycles-pp.xfsaild_push 0.01 ±223% 
+0.1 0.14 ± 39% perf-profile.children.cycles-pp.get_cpu_sleep_time_us 0.06 ± 83% +0.1 0.19 ± 62% perf-profile.children.cycles-pp.__d_add 0.10 ± 92% +0.1 0.24 ± 15% perf-profile.children.cycles-pp.shift_arg_pages 0.00 +0.1 0.14 ± 37% perf-profile.children.cycles-pp.get_idle_time 0.13 ± 37% +0.1 0.27 ± 19% perf-profile.children.cycles-pp.folio_alloc_mpol_noprof 0.11 ± 80% +0.1 0.26 ± 39% perf-profile.children.cycles-pp.mm_init 0.07 ± 49% +0.1 0.22 ± 35% perf-profile.children.cycles-pp.pte_alloc_one 0.22 ± 47% +0.2 0.37 ± 33% perf-profile.children.cycles-pp.scsi_mq_get_budget 0.06 ±103% +0.2 0.21 ± 53% perf-profile.children.cycles-pp.unlink_anon_vmas 0.14 ± 79% +0.2 0.30 ± 30% perf-profile.children.cycles-pp.mas_wr_node_store 0.04 ±107% +0.2 0.20 ± 70% perf-profile.children.cycles-pp.dyntick_save_progress_counter 0.05 ±132% +0.2 0.22 ± 66% perf-profile.children.cycles-pp.sysfs_kf_seq_show 0.05 ±136% +0.2 0.22 ± 66% perf-profile.children.cycles-pp.dev_attr_show 0.07 ± 89% +0.2 0.24 ± 28% perf-profile.children.cycles-pp.__cmd_record 0.07 ± 89% +0.2 0.24 ± 28% perf-profile.children.cycles-pp.cmd_record 0.05 ±125% +0.2 0.22 ± 43% perf-profile.children.cycles-pp.move_queued_task 0.08 ± 14% +0.2 0.26 ± 51% perf-profile.children.cycles-pp.flush_smp_call_function_queue 0.07 ±127% +0.2 0.25 ± 31% perf-profile.children.cycles-pp.__dentry_kill 0.04 ± 77% +0.2 0.22 ± 62% perf-profile.children.cycles-pp.mas_split 0.08 ± 27% +0.2 0.27 ± 47% perf-profile.children.cycles-pp.mas_alloc_nodes 0.14 ± 62% +0.2 0.34 ± 29% perf-profile.children.cycles-pp.create_elf_tables 0.01 ±223% +0.2 0.21 ± 60% perf-profile.children.cycles-pp.__put_user_8 0.12 ± 64% +0.2 0.32 ± 48% perf-profile.children.cycles-pp.seq_printf 0.21 ± 36% +0.2 0.43 ± 37% perf-profile.children.cycles-pp.vfs_statx 0.23 ± 73% +0.2 0.45 ± 38% perf-profile.children.cycles-pp.sbitmap_get 0.15 ±104% +0.2 0.38 ± 52% perf-profile.children.cycles-pp.get_arg_page 0.24 ± 70% +0.2 0.47 ± 42% perf-profile.children.cycles-pp.load_elf_interp 0.08 ± 52% +0.2 0.31 ± 44% perf-profile.children.cycles-pp.__get_user_8 0.13 ± 48% +0.3 0.38 ± 51% perf-profile.children.cycles-pp.rcu_gp_fqs_loop 0.21 ± 66% +0.3 0.46 ± 46% perf-profile.children.cycles-pp.vsnprintf 0.12 ± 61% +0.3 0.38 ± 30% perf-profile.children.cycles-pp.slab_show 0.15 ± 65% +0.3 0.40 ± 50% perf-profile.children.cycles-pp.rep_stos_alternative 0.23 ± 32% +0.3 0.49 ± 32% perf-profile.children.cycles-pp.path_lookupat 0.13 ± 82% +0.3 0.40 ± 50% perf-profile.children.cycles-pp.perf_evsel__read 0.13 ± 62% +0.3 0.40 ± 24% perf-profile.children.cycles-pp.step_into 0.14 ± 70% +0.3 0.42 ± 35% perf-profile.children.cycles-pp.alloc_anon_folio 0.23 ± 32% +0.3 0.51 ± 28% perf-profile.children.cycles-pp.filename_lookup 0.06 ± 88% +0.3 0.34 ± 48% perf-profile.children.cycles-pp.rseq_ip_fixup 0.18 ± 39% +0.3 0.47 ± 38% perf-profile.children.cycles-pp.dput 0.16 ± 43% +0.3 0.45 ± 37% perf-profile.children.cycles-pp.rcu_gp_kthread 0.28 ± 69% +0.3 0.57 ± 32% perf-profile.children.cycles-pp.__vfork 0.24 ± 68% +0.3 0.54 ± 29% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity 0.09 ± 65% +0.3 0.42 ± 38% perf-profile.children.cycles-pp.__rseq_handle_notify_resume 0.36 ± 40% +0.3 0.68 ± 27% perf-profile.children.cycles-pp.__do_sys_newfstatat 0.23 ± 66% +0.3 0.58 ± 56% perf-profile.children.cycles-pp.free_pgtables 0.32 ± 42% +0.4 0.67 ± 30% perf-profile.children.cycles-pp.tick_irq_enter 0.43 ± 39% +0.4 0.78 ± 19% perf-profile.children.cycles-pp.irq_enter_rcu 0.21 ± 55% +0.4 0.57 ± 41% perf-profile.children.cycles-pp.readn 
0.16 ± 33% +0.4 0.52 ± 50% perf-profile.children.cycles-pp.dup_mmap 0.15 ± 77% +0.4 0.51 ± 39% perf-profile.children.cycles-pp.show_stat 0.34 ± 69% +0.4 0.71 ± 42% perf-profile.children.cycles-pp.zap_present_ptes 0.21 ± 67% +0.4 0.60 ± 23% perf-profile.children.cycles-pp.i2c_outb 0.23 ± 79% +0.4 0.62 ± 36% perf-profile.children.cycles-pp.seq_read 0.18 ± 40% +0.4 0.59 ± 46% perf-profile.children.cycles-pp.dup_mm 0.30 ± 67% +0.4 0.72 ± 36% perf-profile.children.cycles-pp.tmigr_handle_remote_up 0.20 ± 65% +0.4 0.63 ± 58% perf-profile.children.cycles-pp.sync_regs 0.25 ± 66% +0.4 0.68 ± 23% perf-profile.children.cycles-pp.try_address 0.27 ± 74% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.output_poll_execute 0.28 ± 70% +0.5 0.74 ± 28% perf-profile.children.cycles-pp.pipe_read 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.__i2c_transfer 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.bit_xfer 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_connector_helper_detect_from_ddc 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_do_probe_ddc_edid 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_helper_probe_detect_ctx 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_probe_ddc 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.i2c_transfer 0.14 ± 61% +0.5 0.61 ± 35% perf-profile.children.cycles-pp.balance_fair 0.23 ± 68% +0.5 0.69 ± 34% perf-profile.children.cycles-pp.folio_wait_bit_common 0.28 ± 81% +0.5 0.77 ± 29% perf-profile.children.cycles-pp.pipe_write 0.35 ± 68% +0.5 0.84 ± 41% perf-profile.children.cycles-pp.tmigr_handle_remote 0.24 ± 81% +0.5 0.74 ± 34% perf-profile.children.cycles-pp.copy_strings 0.27 ± 72% +0.5 0.78 ± 50% perf-profile.children.cycles-pp.exec_mmap 0.19 ± 83% +0.5 0.70 ± 34% perf-profile.children.cycles-pp.folio_wait_writeback 0.30 ± 67% +0.5 0.83 ± 31% perf-profile.children.cycles-pp.do_anonymous_page 0.44 ± 59% +0.6 1.00 ± 46% perf-profile.children.cycles-pp.zap_pmd_range 0.35 ± 86% +0.6 0.93 ± 31% perf-profile.children.cycles-pp.collapse_huge_page 0.36 ± 84% +0.6 0.93 ± 30% perf-profile.children.cycles-pp.khugepaged 0.35 ± 85% +0.6 0.93 ± 30% perf-profile.children.cycles-pp.hpage_collapse_scan_pmd 0.35 ± 85% +0.6 0.93 ± 30% perf-profile.children.cycles-pp.khugepaged_scan_mm_slot 0.34 ± 63% +0.6 0.92 ± 37% perf-profile.children.cycles-pp.evlist_cpu_iterator__next 0.24 ± 42% +0.6 0.89 ± 47% perf-profile.children.cycles-pp.scsi_dispatch_cmd 0.32 ± 62% +0.7 0.99 ± 45% perf-profile.children.cycles-pp.begin_new_exec 0.41 ± 58% +0.7 1.10 ± 21% perf-profile.children.cycles-pp.__open64_nocancel 0.37 ± 48% +0.7 1.06 ± 34% perf-profile.children.cycles-pp._Fork 0.54 ± 66% +0.7 1.27 ± 29% perf-profile.children.cycles-pp.sched_setaffinity 0.78 ± 51% +0.8 1.56 ± 28% perf-profile.children.cycles-pp.link_path_walk 0.40 ± 35% +0.8 1.19 ± 40% perf-profile.children.cycles-pp.scsi_queue_rq 0.51 ± 74% +0.8 1.30 ± 39% perf-profile.children.cycles-pp.elf_load 0.34 ± 54% +0.8 1.15 ± 34% perf-profile.children.cycles-pp.smpboot_thread_fn 0.72 ± 69% +0.9 1.66 ± 33% perf-profile.children.cycles-pp.read_counters 0.77 ± 70% +1.0 1.76 ± 33% perf-profile.children.cycles-pp.cmd_stat 0.77 ± 70% +1.0 1.76 ± 33% perf-profile.children.cycles-pp.dispatch_events 0.77 ± 70% +1.0 1.76 ± 33% perf-profile.children.cycles-pp.process_interval 1.00 ± 19% +1.0 2.01 ± 26% perf-profile.children.cycles-pp.filemap_map_pages 1.39 ± 34% +1.1 2.48 ± 22% perf-profile.children.cycles-pp.asm_sysvec_posted_msi_notification 0.85 ± 68% +1.1 1.99 
± 32% perf-profile.children.cycles-pp.handle_internal_command 0.85 ± 68% +1.1 1.99 ± 32% perf-profile.children.cycles-pp.main 0.85 ± 68% +1.1 1.99 ± 32% perf-profile.children.cycles-pp.run_builtin 0.88 ± 64% +1.2 2.04 ± 32% perf-profile.children.cycles-pp.do_group_exit 0.88 ± 64% +1.2 2.05 ± 32% perf-profile.children.cycles-pp.__x64_sys_exit_group 0.64 ± 66% +1.2 1.87 ± 37% perf-profile.children.cycles-pp.seq_read_iter 1.24 ± 54% +1.2 2.48 ± 31% perf-profile.children.cycles-pp.x64_sys_call 1.03 ± 66% +1.3 2.30 ± 30% perf-profile.children.cycles-pp.do_exit 0.88 ± 64% +1.5 2.35 ± 30% perf-profile.children.cycles-pp.setlocale 1.08 ± 71% +1.5 2.56 ± 26% perf-profile.children.cycles-pp.mmput 1.05 ± 70% +1.5 2.55 ± 27% perf-profile.children.cycles-pp.exit_mmap 3.54 ± 34% +1.8 5.33 ± 20% perf-profile.children.cycles-pp.handle_softirqs 1.61 ± 55% +1.8 3.40 ± 35% perf-profile.children.cycles-pp.mmap_region 1.30 ± 69% +2.0 3.28 ± 32% perf-profile.children.cycles-pp.load_elf_binary 1.32 ± 68% +2.0 3.30 ± 31% perf-profile.children.cycles-pp.search_binary_handler 1.32 ± 68% +2.0 3.32 ± 31% perf-profile.children.cycles-pp.exec_binprm 1.45 ± 70% +2.1 3.52 ± 30% perf-profile.children.cycles-pp.bprm_execve 2.01 ± 71% +3.0 5.01 ± 22% perf-profile.children.cycles-pp.__x64_sys_execve 2.01 ± 71% +3.0 5.01 ± 23% perf-profile.children.cycles-pp.execve 2.00 ± 70% +3.0 5.01 ± 22% perf-profile.children.cycles-pp.do_execveat_common 1.18 ± 39% +3.3 4.48 ± 31% perf-profile.children.cycles-pp.dd_dispatch_request 2.77 ± 34% +3.6 6.33 ± 28% perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests 2.77 ± 34% +3.6 6.34 ± 28% perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests 2.59 ± 34% +3.8 6.35 ± 27% perf-profile.children.cycles-pp.blk_mq_run_hw_queue 1.94 ± 37% +4.3 6.24 ± 29% perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched 3.76 ± 36% +5.0 8.79 ± 30% perf-profile.children.cycles-pp.__blk_flush_plug 3.76 ± 36% +5.0 8.79 ± 30% perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list 3.76 ± 36% +5.0 8.79 ± 30% perf-profile.children.cycles-pp.blk_mq_flush_plug_list 4.84 ± 35% +5.1 9.96 ± 29% perf-profile.children.cycles-pp.iomap_submit_ioend 4.01 ± 55% +8.1 12.11 ± 30% perf-profile.children.cycles-pp.xfs_file_write_checks 3.34 ± 56% +8.3 11.60 ± 31% perf-profile.children.cycles-pp.iomap_zero_range 0.00 +11.2 11.22 ± 30% perf-profile.children.cycles-pp.filemap_write_and_wait_range 3.43 ± 46% -3.2 0.24 ± 35% perf-profile.self.cycles-pp.memset_orig 1.42 ± 47% -1.3 0.08 ±121% perf-profile.self.cycles-pp._copy_to_iter 1.07 ± 27% -0.6 0.45 ± 39% perf-profile.self.cycles-pp.clear_bhb_loop 0.38 ± 23% -0.3 0.12 ± 62% perf-profile.self.cycles-pp.percpu_counter_add_batch 0.30 ± 33% -0.2 0.09 ± 88% perf-profile.self.cycles-pp.do_syscall_64 0.22 ± 30% -0.2 0.05 ± 71% perf-profile.self.cycles-pp.__mod_node_page_state 0.13 ± 20% -0.1 0.04 ±103% perf-profile.self.cycles-pp.__page_cache_release 0.01 ±223% +0.1 0.11 ± 48% perf-profile.self.cycles-pp.lockref_put_return 0.02 ±142% +0.1 0.13 ± 65% perf-profile.self.cycles-pp.xfs_ag_block_count 0.01 ±223% +0.1 0.13 ± 45% perf-profile.self.cycles-pp.get_cpu_sleep_time_us 0.04 ±112% +0.2 0.19 ± 68% perf-profile.self.cycles-pp.dyntick_save_progress_counter 0.14 ± 35% +0.2 0.31 ± 35% perf-profile.self.cycles-pp.sched_balance_domains 0.08 ± 52% +0.2 0.31 ± 44% perf-profile.self.cycles-pp.__get_user_8 0.06 ± 92% +0.3 0.34 ± 60% perf-profile.self.cycles-pp.fold_vm_numa_events 0.20 ± 65% +0.4 0.63 ± 58% perf-profile.self.cycles-pp.sync_regs 0.26 ± 34% +0.5 0.75 ± 37% 
perf-profile.self.cycles-pp.filemap_map_pages


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression
  2024-10-14 7:55 [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression kernel test robot
@ 2024-10-14 16:34 ` Brian Foster
  2024-10-15 21:50   ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Foster @ 2024-10-14 16:34 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Christian Brauner, Darrick J. Wong,
	Josef Bacik, linux-xfs, linux-fsdevel, ying.huang, feng.tang,
	fengwei.yin

On Mon, Oct 14, 2024 at 03:55:24PM +0800, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed a -98.4% regression of stress-ng.metamix.ops_per_sec on:
>
> commit: c5c810b94cfd818fc2f58c96feee58a9e5ead96d ("iomap: fix handling of dirty folios over unwritten extents")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: stress-ng
> config: x86_64-rhel-8.3
> compiler: gcc-12
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
> nr_threads: 100%
> disk: 1HDD
> testtime: 60s
> fs: xfs
> test: metamix
> cpufreq_governor: performance
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202410141536.1167190b-oliver.sang@intel.com
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20241014/202410141536.1167190b-oliver.sang@intel.com
>

So I basically just run this on a >64xcpu guest and reproduce the delta:

stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64

The short of it is that with tracing enabled, I see a very large number of
extending writes across unwritten mappings, which basically means XFS eof
zeroing is calling zero range and hitting the newly introduced flush. This
is all pretty much expected given the patch.

I ran a quick experiment to skip the flush on sub-4k ranges in favor of
doing explicit folio zeroing. The idea with that is that the range is
likely restricted to a single folio and since it's dirty, we can assume
unwritten conversion is imminent and just explicitly zero the range. I
still see a decent number of flushes from larger ranges in that
experiment, but that still seems to get things pretty close to my baseline
test (on a 6.10 distro kernel).

That said, the next thing on my todo list in this area is to try out the
revalidation based iteration scheme that was discussed on earlier review
passes of the flush fix, with the idea that it would elide the flush
entirely on filesystems that support revalidation (i.e. XFS). Given that
the current approach was intended to be a simple and correct fallback for
fs' without revalidation support, I'm a little reluctant to introduce a
heuristic just to appease this benchmark on XFS.

So... my current thinking is that I'm going to prioritize hacking on the
reval scheme over optimizing the fallback solution and see how that goes.
If that doesn't pan out for whatever reason and/or if some other more
practical workload comes up where this is a problem, then I suspect it
will be easy enough to work around with a heuristic to balance flushes
vs. explicit folio zeroing.
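For illustration, a rough sketch of what such a sub-4k heuristic might look
like, assuming the usual kernel pagecache helpers; the function name and the
4k cutoff are hypothetical and this is not the actual experiment:

	/*
	 * Hypothetical helper (illustrative only, not the actual patch):
	 * decide whether a zero range over pagecache needs the flush, or
	 * whether explicit folio zeroing is enough.
	 */
	static bool iomap_zero_range_needs_flush(struct inode *inode,
						 loff_t pos, loff_t len)
	{
		struct address_space *mapping = inode->i_mapping;

		/* Nothing dirty or under writeback in the range: no flush. */
		if (!filemap_range_needs_writeback(mapping, pos, pos + len - 1))
			return false;

		/*
		 * A sub-4k range is likely confined to a single dirty folio
		 * whose unwritten extent converts at writeback anyway, so
		 * zeroing the range in pagecache is sufficient.
		 */
		if (len < SZ_4K)
			return false;

		/* Larger dirty ranges: keep the flush as the simple path. */
		return true;
	}

A caller in the zero range path would presumably flush via
filemap_write_and_wait_range() and re-look up the (now converted) mapping when
this returns true, and otherwise fall through to zeroing the dirty folio in
pagecache.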
For example, I still have another prototype lying around that lets
iomap_truncate_page() default to zeroing rather than flushing, since the
operation is by design restricted to a partial range of the EOF folio. With
the caveat that I haven't fully dug into this stress-ng test to know exactly
what it's doing, I suspect something like that could help elide flushes
entirely from this workload with some fairly small tweaks in the fs.

Brian
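As a rough illustration of the iomap_truncate_page() idea above, a hedged
sketch that zeroes the partial EOF block directly in pagecache when the folio
is dirty; the function name is hypothetical, the unwritten-mapping check,
large-block cases and most error handling are elided, and this is not the
actual prototype:

	/*
	 * Illustrative sketch only: zero the partial EOF block in pagecache
	 * instead of flushing, on the assumption the range is confined to
	 * the (dirty) EOF folio.
	 */
	static int iomap_truncate_page_zero_inline(struct inode *inode,
						   loff_t pos)
	{
		struct address_space *mapping = inode->i_mapping;
		unsigned int blocksize = i_blocksize(inode);
		unsigned int off = pos & (blocksize - 1);
		struct folio *folio;

		/* Block aligned? Nothing to zero. */
		if (!off)
			return 0;

		folio = filemap_lock_folio(mapping, pos >> PAGE_SHIFT);
		if (IS_ERR(folio))
			return 0;	/* nothing cached: unwritten reads as zero */

		if (folio_test_dirty(folio)) {
			/*
			 * Dirty pagecache over an unwritten mapping converts
			 * the extent at writeback, so the stale tail must be
			 * zeroed here rather than skipped.
			 */
			folio_zero_range(folio, offset_in_folio(folio, pos),
					 blocksize - off);
			folio_mark_dirty(folio);
		}

		folio_unlock(folio);
		folio_put(folio);
		return 0;
	}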
perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra > 103.16 ± 10% +73.5% 179.00 ± 24% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 59.65 ± 10% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_timeout.xfsaild.kthread.ret_from_fork > 0.02 ± 25% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync > 348.84 ± 58% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_wait_on_iclog.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 > 486.25 ± 23% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.__x64_sys_fdatasync > 473.19 ± 38% -100.0% 0.00 perf-sched.wait_time.max.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread > 7346 ± 4% +23.0% 9033 ± 7% sched_debug.cfs_rq:/.avg_vruntime.avg > 848.55 ± 8% +123.3% 1894 ± 11% sched_debug.cfs_rq:/.avg_vruntime.min > 0.14 ± 14% -71.2% 0.04 ± 2% sched_debug.cfs_rq:/.h_nr_running.avg > 0.32 ± 9% -41.3% 0.19 ± 2% sched_debug.cfs_rq:/.h_nr_running.stddev > 139317 ±191% -88.9% 15437 ± 4% sched_debug.cfs_rq:/.load.avg > 8499863 ±201% -92.8% 613636 ± 4% sched_debug.cfs_rq:/.load.max > 247.33 ± 32% -85.6% 35.54 ± 46% sched_debug.cfs_rq:/.load_avg.avg > 632.63 ± 71% -74.1% 163.67 ± 68% sched_debug.cfs_rq:/.load_avg.stddev > 7346 ± 4% +23.0% 9033 ± 7% sched_debug.cfs_rq:/.min_vruntime.avg > 848.55 ± 8% +123.3% 1894 ± 11% sched_debug.cfs_rq:/.min_vruntime.min > 0.14 ± 14% -71.2% 0.04 ± 2% sched_debug.cfs_rq:/.nr_running.avg > 0.32 ± 9% -41.3% 0.19 ± 2% sched_debug.cfs_rq:/.nr_running.stddev > 62.04 ± 22% -89.5% 6.51 ± 42% sched_debug.cfs_rq:/.removed.load_avg.avg > 597.33 ± 31% -63.0% 220.77 ± 53% sched_debug.cfs_rq:/.removed.load_avg.max > 171.33 ± 9% -80.1% 34.06 ± 47% sched_debug.cfs_rq:/.removed.load_avg.stddev > 18.96 ± 25% -85.8% 2.70 ± 45% sched_debug.cfs_rq:/.removed.runnable_avg.avg > 308.25 ± 31% -64.0% 111.09 ± 54% sched_debug.cfs_rq:/.removed.runnable_avg.max > 60.15 ± 17% -74.0% 15.67 ± 51% sched_debug.cfs_rq:/.removed.runnable_avg.stddev > 18.96 ± 25% -85.8% 2.70 ± 45% sched_debug.cfs_rq:/.removed.util_avg.avg > 308.25 ± 31% -64.0% 111.09 ± 54% sched_debug.cfs_rq:/.removed.util_avg.max > 60.15 ± 17% -74.0% 15.67 ± 51% sched_debug.cfs_rq:/.removed.util_avg.stddev > 237.66 ± 5% -85.8% 33.72 ± 3% sched_debug.cfs_rq:/.runnable_avg.avg > 1104 ± 12% -45.6% 600.94 ± 2% sched_debug.cfs_rq:/.runnable_avg.max > 226.39 ± 5% -57.8% 95.63 ± 2% sched_debug.cfs_rq:/.runnable_avg.stddev > 236.68 ± 5% -85.9% 33.42 ± 4% sched_debug.cfs_rq:/.util_avg.avg > 1103 ± 12% -46.3% 591.99 sched_debug.cfs_rq:/.util_avg.max > 226.37 ± 5% -58.2% 94.62 ± 2% sched_debug.cfs_rq:/.util_avg.stddev > 18.18 ± 24% -83.7% 2.96 ± 17% sched_debug.cfs_rq:/.util_est.avg > 338.67 ± 13% -67.8% 109.03 ± 15% sched_debug.cfs_rq:/.util_est.max > 67.93 ± 17% -76.8% 15.79 ± 14% sched_debug.cfs_rq:/.util_est.stddev > 803668 ± 2% +18.9% 955672 sched_debug.cpu.avg_idle.avg > 198446 ± 5% -41.6% 115938 ± 11% sched_debug.cpu.avg_idle.stddev > 74790 +1064.4% 870887 ± 3% sched_debug.cpu.clock.avg > 74795 +1064.4% 870893 ± 3% sched_debug.cpu.clock.max > 74786 +1064.5% 870879 ± 3% sched_debug.cpu.clock.min > 2.48 ± 4% +26.3% 3.14 ± 8% sched_debug.cpu.clock.stddev > 74525 +1068.1% 870538 ± 3% sched_debug.cpu.clock_task.avg > 74776 +1064.6% 870850 ± 3% sched_debug.cpu.clock_task.max > 66679 +1193.0% 862189 ± 3% sched_debug.cpu.clock_task.min > 339.14 ± 15% +27.9% 433.69 ± 4% sched_debug.cpu.curr->pid.avg > 4069 
+498.1% 24338 ± 3% sched_debug.cpu.curr->pid.max > 895.15 ± 7% +243.7% 3076 ± 3% sched_debug.cpu.curr->pid.stddev > 0.14 ± 15% -74.6% 0.04 ± 8% sched_debug.cpu.nr_running.avg > 0.31 ± 11% -44.7% 0.17 ± 6% sched_debug.cpu.nr_running.stddev > 4679 ± 4% +794.9% 41879 ± 4% sched_debug.cpu.nr_switches.avg > 27295 ± 11% +571.5% 183299 ± 4% sched_debug.cpu.nr_switches.max > 1238 ± 20% +985.0% 13436 ± 12% sched_debug.cpu.nr_switches.min > 4550 ± 12% +581.7% 31022 ± 7% sched_debug.cpu.nr_switches.stddev > 8.07 +60.9% 12.99 ± 2% sched_debug.cpu.nr_uninterruptible.avg > 35.50 ± 10% +49.5% 53.06 ± 14% sched_debug.cpu.nr_uninterruptible.max > 10.63 ± 7% +33.5% 14.19 ± 12% sched_debug.cpu.nr_uninterruptible.stddev > 74787 +1064.5% 870884 ± 3% sched_debug.cpu_clk > 73624 +1081.3% 869721 ± 3% sched_debug.ktime > 75383 +1056.2% 871587 ± 3% sched_debug.sched_clk > 12.22 ± 39% -12.2 0.00 perf-profile.calltrace.cycles-pp.fdatasync.stress_metamix > 12.15 ± 39% -12.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fdatasync.stress_metamix > 12.15 ± 39% -12.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fdatasync.stress_metamix > 12.02 ± 40% -12.0 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_fdatasync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fdatasync.stress_metamix > 12.02 ± 40% -12.0 0.00 perf-profile.calltrace.cycles-pp.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64.entry_SYSCALL_64_after_hwframe.fdatasync > 9.67 ± 50% -9.2 0.44 ±112% perf-profile.calltrace.cycles-pp.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64 > 8.34 ± 50% -7.9 0.41 ±108% perf-profile.calltrace.cycles-pp.iomap_write_iter.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write > 7.12 ± 30% -7.1 0.00 perf-profile.calltrace.cycles-pp.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64.entry_SYSCALL_64_after_hwframe > 6.81 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fdatasync.do_syscall_64 > 6.81 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync.__x64_sys_fdatasync > 6.80 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range.xfs_file_fsync > 6.80 ± 32% -6.8 0.00 perf-profile.calltrace.cycles-pp.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.file_write_and_wait_range > 6.44 ± 49% -6.3 0.12 ±223% perf-profile.calltrace.cycles-pp.iomap_write_begin.iomap_write_iter.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write > 5.25 ± 45% -5.2 0.00 perf-profile.calltrace.cycles-pp.iomap_writepage_map.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc > 4.70 ± 47% -4.7 0.00 perf-profile.calltrace.cycles-pp.read.stress_metamix > 4.69 ± 46% -4.7 0.00 perf-profile.calltrace.cycles-pp.iomap_writepage_map_blocks.iomap_writepage_map.iomap_writepages.xfs_vm_writepages.do_writepages > 4.68 ± 63% -4.7 0.00 perf-profile.calltrace.cycles-pp.unlink.stress_metamix > 4.66 ± 63% -4.7 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink.stress_metamix > 4.66 ± 63% -4.7 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink.stress_metamix > 4.64 ± 63% -4.6 0.00 
perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink.stress_metamix > 4.64 ± 63% -4.6 0.00 perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink > 4.12 ± 48% -4.1 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read.stress_metamix > 4.08 ± 48% -4.1 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.stress_metamix > 4.01 ± 64% -4.0 0.00 perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe > 4.00 ± 64% -4.0 0.00 perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64 > 3.96 ± 48% -4.0 0.00 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.stress_metamix > 3.88 ± 52% -3.9 0.00 perf-profile.calltrace.cycles-pp.iomap_submit_ioend.iomap_writepage_map_blocks.iomap_writepage_map.iomap_writepages.xfs_vm_writepages > 3.87 ± 53% -3.9 0.00 perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepage_map_blocks.iomap_writepage_map.iomap_writepages > 3.86 ± 53% -3.9 0.00 perf-profile.calltrace.cycles-pp.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepage_map_blocks.iomap_writepage_map > 6.96 ± 11% -3.3 3.62 ± 29% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 6.29 ± 14% -3.1 3.16 ± 28% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 0.65 ± 64% +0.5 1.12 ± 25% perf-profile.calltrace.cycles-pp._nohz_idle_balance.handle_softirqs.__irq_exit_rcu.sysvec_call_function_single.asm_sysvec_call_function_single > 0.22 ±141% +0.6 0.85 ± 28% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.11 ±223% +0.6 0.74 ± 28% perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.22 ±141% +0.7 0.88 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.22 ±141% +0.7 0.88 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > 0.50 ± 76% +0.7 1.15 ± 25% perf-profile.calltrace.cycles-pp.__sysvec_posted_msi_notification.sysvec_posted_msi_notification.asm_sysvec_posted_msi_notification.acpi_safe_halt.acpi_idle_enter > 0.22 ±142% +0.7 0.94 ± 25% perf-profile.calltrace.cycles-pp.__open64_nocancel.setlocale > 0.27 ±141% +0.8 1.07 ± 28% perf-profile.calltrace.cycles-pp.write > 0.33 ±102% +0.8 1.14 ± 22% perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault > 0.22 ±145% +0.8 1.05 ± 24% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault > 0.43 ±108% +0.9 1.35 ± 24% perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault > 0.16 ±223% +1.0 1.17 ± 36% perf-profile.calltrace.cycles-pp.blk_mq_submit_bio.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages > 0.15 ±223% +1.1 1.20 ± 38% perf-profile.calltrace.cycles-pp.blk_mq_dispatch_rq_list.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue > 0.66 ± 98% +1.1 1.76 ± 33% perf-profile.calltrace.cycles-pp.cmd_stat.run_builtin.handle_internal_command.main > 0.66 ± 98% +1.1 1.76 ± 33% 
perf-profile.calltrace.cycles-pp.dispatch_events.cmd_stat.run_builtin.handle_internal_command.main > 0.66 ± 98% +1.1 1.76 ± 33% perf-profile.calltrace.cycles-pp.process_interval.dispatch_events.cmd_stat.run_builtin.handle_internal_command > 0.54 ±119% +1.1 1.68 ± 34% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.exit_mm.do_exit.do_group_exit > 0.54 ±118% +1.2 1.70 ± 33% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call > 0.54 ±118% +1.2 1.70 ± 33% perf-profile.calltrace.cycles-pp.mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group > 1.56 ± 19% +1.2 2.73 ± 22% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 0.00 +1.2 1.18 ± 40% perf-profile.calltrace.cycles-pp.scsi_queue_rq.blk_mq_dispatch_rq_list.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests > 0.45 ±148% +1.2 1.66 ± 33% perf-profile.calltrace.cycles-pp.read_counters.process_interval.dispatch_events.cmd_stat.run_builtin > 0.72 ± 98% +1.3 1.99 ± 32% perf-profile.calltrace.cycles-pp.handle_internal_command.main > 0.72 ± 98% +1.3 1.99 ± 32% perf-profile.calltrace.cycles-pp.main > 0.72 ± 98% +1.3 1.99 ± 32% perf-profile.calltrace.cycles-pp.run_builtin.handle_internal_command.main > 0.48 ±151% +1.3 1.76 ± 32% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.80 ± 20% +1.3 3.10 ± 23% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 1.82 ± 20% +1.3 3.12 ± 23% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault > 0.64 ±114% +1.4 2.00 ± 32% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.64 ±114% +1.4 2.00 ± 32% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64 > 0.64 ±114% +1.4 2.00 ± 32% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.52 ±109% +1.4 1.88 ± 27% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.53 ±109% +1.4 1.92 ± 28% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.93 ± 83% +1.4 2.32 ± 32% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.91 ± 89% +1.5 2.41 ± 42% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.71 ±110% +1.6 2.32 ± 42% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64 > 0.66 ±112% +1.7 2.35 ± 30% perf-profile.calltrace.cycles-pp.setlocale > 2.08 ± 26% +1.7 3.82 ± 29% perf-profile.calltrace.cycles-pp.asm_exc_page_fault > 1.05 ± 78% +1.9 2.94 ± 28% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read > 1.46 ± 49% +1.9 3.35 ± 19% perf-profile.calltrace.cycles-pp.asm_sysvec_posted_msi_notification.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter > 1.10 ± 78% +1.9 3.02 ± 28% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read > 1.10 ± 78% +1.9 3.02 ± 28% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read > 1.30 ± 69% +2.0 3.28 ± 32% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common > 1.32 ± 68% +2.0 3.30 ± 31% 
perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64 > 1.32 ± 68% +2.0 3.30 ± 31% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve > 1.43 ± 70% +2.0 3.45 ± 30% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.11 ± 78% +2.1 3.24 ± 27% perf-profile.calltrace.cycles-pp.read > 2.01 ± 71% +3.0 4.99 ± 22% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve > 2.01 ± 71% +3.0 5.00 ± 22% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve > 2.01 ± 71% +3.0 5.00 ± 22% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve > 2.01 ± 71% +3.0 5.00 ± 22% perf-profile.calltrace.cycles-pp.execve > 2.00 ± 70% +3.0 4.99 ± 22% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve > 1.01 ± 54% +3.3 4.27 ± 31% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests > 1.01 ± 54% +3.4 4.36 ± 30% perf-profile.calltrace.cycles-pp._raw_spin_lock.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests > 1.05 ± 55% +3.4 4.47 ± 31% perf-profile.calltrace.cycles-pp.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue > 2.33 ± 39% +4.0 6.29 ± 28% perf-profile.calltrace.cycles-pp.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list > 2.33 ± 39% +4.0 6.31 ± 28% perf-profile.calltrace.cycles-pp.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list.__blk_flush_plug > 2.35 ± 39% +4.0 6.35 ± 27% perf-profile.calltrace.cycles-pp.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list.__blk_flush_plug.__submit_bio > 1.71 ± 50% +4.5 6.23 ± 29% perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_dispatch_plug_list > 3.24 ± 66% +4.8 8.01 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe > 3.24 ± 66% +4.8 8.04 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe > 3.55 ± 40% +5.2 8.77 ± 29% perf-profile.calltrace.cycles-pp.blk_mq_dispatch_plug_list.blk_mq_flush_plug_list.__blk_flush_plug.__submit_bio.submit_bio_noacct_nocheck > 3.55 ± 40% +5.2 8.77 ± 29% perf-profile.calltrace.cycles-pp.blk_mq_flush_plug_list.__blk_flush_plug.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend > 3.98 ± 55% +8.1 12.11 ± 30% perf-profile.calltrace.cycles-pp.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64 > 0.56 ±164% +8.2 8.77 ± 29% perf-profile.calltrace.cycles-pp.__blk_flush_plug.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages > 3.33 ± 56% +8.3 11.60 ± 31% perf-profile.calltrace.cycles-pp.iomap_zero_range.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write > 0.91 ±108% +9.0 9.94 ± 29% perf-profile.calltrace.cycles-pp.__submit_bio.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages > 0.91 ±108% +9.0 9.94 ± 29% 
perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages.do_writepages > 0.92 ±107% +9.0 9.95 ± 29% perf-profile.calltrace.cycles-pp.iomap_submit_ioend.iomap_writepages.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc > 0.00 +10.3 10.29 ± 30% perf-profile.calltrace.cycles-pp.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.filemap_write_and_wait_range.iomap_zero_range > 0.00 +10.3 10.29 ± 30% perf-profile.calltrace.cycles-pp.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.filemap_write_and_wait_range > 0.00 +10.4 10.40 ± 30% perf-profile.calltrace.cycles-pp.__filemap_fdatawrite_range.filemap_write_and_wait_range.iomap_zero_range.xfs_file_write_checks.xfs_file_buffered_write > 0.00 +10.4 10.40 ± 30% perf-profile.calltrace.cycles-pp.filemap_fdatawrite_wbc.__filemap_fdatawrite_range.filemap_write_and_wait_range.iomap_zero_range.xfs_file_write_checks > 0.00 +11.2 11.22 ± 30% perf-profile.calltrace.cycles-pp.filemap_write_and_wait_range.iomap_zero_range.xfs_file_write_checks.xfs_file_buffered_write.vfs_write > 12.24 ± 39% -12.2 0.00 perf-profile.children.cycles-pp.fdatasync > 12.02 ± 40% -12.0 0.00 perf-profile.children.cycles-pp.__x64_sys_fdatasync > 12.02 ± 40% -12.0 0.00 perf-profile.children.cycles-pp.xfs_file_fsync > 9.68 ± 50% -9.1 0.57 ± 69% perf-profile.children.cycles-pp.iomap_file_buffered_write > 8.37 ± 50% -7.8 0.53 ± 66% perf-profile.children.cycles-pp.iomap_write_iter > 7.12 ± 30% -7.1 0.00 perf-profile.children.cycles-pp.file_write_and_wait_range > 6.48 ± 49% -6.1 0.35 ± 65% perf-profile.children.cycles-pp.iomap_write_begin > 5.25 ± 45% -5.0 0.22 ± 74% perf-profile.children.cycles-pp.iomap_writepage_map > 4.74 ± 60% -4.7 0.00 perf-profile.children.cycles-pp.unlink > 4.69 ± 60% -4.7 0.00 perf-profile.children.cycles-pp.__x64_sys_unlink > 4.69 ± 60% -4.7 0.00 perf-profile.children.cycles-pp.do_unlinkat > 4.70 ± 46% -4.6 0.14 ± 97% perf-profile.children.cycles-pp.iomap_writepage_map_blocks > 4.31 ± 55% -4.2 0.16 ±108% perf-profile.children.cycles-pp.iomap_iter > 4.13 ± 61% -4.1 0.00 perf-profile.children.cycles-pp.truncate_inode_pages_range > 4.06 ± 61% -4.1 0.01 ±223% perf-profile.children.cycles-pp.evict > 3.90 ± 48% -3.8 0.11 ± 82% perf-profile.children.cycles-pp.__iomap_write_begin > 3.82 ± 56% -3.7 0.08 ± 80% perf-profile.children.cycles-pp.xfs_buffered_write_iomap_begin > 3.42 ± 47% -3.4 0.07 ±111% perf-profile.children.cycles-pp.zero_user_segments > 6.97 ± 11% -3.4 3.62 ± 29% perf-profile.children.cycles-pp.worker_thread > 3.45 ± 46% -3.2 0.24 ± 35% perf-profile.children.cycles-pp.memset_orig > 3.25 ± 45% -3.1 0.11 ± 88% perf-profile.children.cycles-pp.filemap_read > 6.29 ± 14% -3.1 3.16 ± 28% perf-profile.children.cycles-pp.process_one_work > 3.18 ± 54% -2.6 0.59 ± 50% perf-profile.children.cycles-pp.folios_put_refs > 2.46 ± 67% -2.4 0.10 ± 65% perf-profile.children.cycles-pp.__page_cache_release > 2.51 ± 50% -2.2 0.28 ± 52% perf-profile.children.cycles-pp.__filemap_get_folio > 1.58 ± 51% -1.5 0.10 ± 84% perf-profile.children.cycles-pp.filemap_add_folio > 1.52 ± 45% -1.4 0.10 ±119% perf-profile.children.cycles-pp.copy_page_to_iter > 1.48 ± 44% -1.4 0.12 ± 75% perf-profile.children.cycles-pp._copy_to_iter > 2.55 ± 16% -1.1 1.41 ± 36% perf-profile.children.cycles-pp.pick_next_task_fair > 1.22 ± 24% -1.1 0.10 ± 71% perf-profile.children.cycles-pp.mod_delayed_work_on > 1.16 ± 50% -1.0 0.13 ± 78% perf-profile.children.cycles-pp.open64 > 0.95 ± 31% -0.9 0.06 ± 84% 
perf-profile.children.cycles-pp.try_to_grab_pending > 0.99 ± 45% -0.9 0.10 ± 71% perf-profile.children.cycles-pp.kblockd_mod_delayed_work_on > 0.88 ± 50% -0.8 0.06 ±111% perf-profile.children.cycles-pp.filemap_get_pages > 0.81 ± 48% -0.7 0.06 ±113% perf-profile.children.cycles-pp.filemap_get_read_batch > 1.08 ± 28% -0.6 0.45 ± 39% perf-profile.children.cycles-pp.clear_bhb_loop > 0.70 ± 51% -0.6 0.08 ±125% perf-profile.children.cycles-pp.folio_alloc_noprof > 0.66 ± 53% -0.6 0.06 ± 84% perf-profile.children.cycles-pp.__filemap_add_folio > 0.59 ± 32% -0.6 0.04 ±100% perf-profile.children.cycles-pp.xfs_map_blocks > 0.88 ± 14% -0.5 0.38 ± 51% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio > 0.58 ± 38% -0.5 0.13 ± 81% perf-profile.children.cycles-pp.writeback_iter > 0.71 ± 29% -0.4 0.26 ± 51% perf-profile.children.cycles-pp.entry_SYSCALL_64 > 0.52 ± 43% -0.4 0.12 ± 91% perf-profile.children.cycles-pp.writeback_get_folio > 0.44 ± 53% -0.4 0.06 ±130% perf-profile.children.cycles-pp.__folio_start_writeback > 0.49 ± 20% -0.4 0.12 ± 62% perf-profile.children.cycles-pp.percpu_counter_add_batch > 0.35 ± 38% -0.3 0.08 ± 54% perf-profile.children.cycles-pp.touch_atime > 0.63 ± 23% -0.3 0.37 ± 66% perf-profile.children.cycles-pp.get_page_from_freelist > 0.47 ± 33% -0.3 0.21 ± 79% perf-profile.children.cycles-pp.xas_load > 0.30 ± 35% -0.2 0.09 ± 86% perf-profile.children.cycles-pp.rmqueue > 0.24 ± 29% -0.2 0.05 ± 71% perf-profile.children.cycles-pp.__mod_node_page_state > 0.13 ± 22% -0.1 0.04 ±101% perf-profile.children.cycles-pp.lock_timer_base > 0.16 ± 31% -0.1 0.08 ± 85% perf-profile.children.cycles-pp.xas_find > 0.16 ± 22% -0.1 0.08 ± 78% perf-profile.children.cycles-pp.xfs_trans_reserve > 0.05 ± 78% +0.1 0.12 ± 37% perf-profile.children.cycles-pp.hrtimer_try_to_cancel > 0.00 +0.1 0.08 ± 10% perf-profile.children.cycles-pp.change_protection_range > 0.13 ± 33% +0.1 0.21 ± 38% perf-profile.children.cycles-pp.wakeup_preempt > 0.05 ±108% +0.1 0.14 ± 24% perf-profile.children.cycles-pp.perf_event_read > 0.05 ±108% +0.1 0.14 ± 24% perf-profile.children.cycles-pp.smp_call_function_single > 0.04 ±118% +0.1 0.14 ± 37% perf-profile.children.cycles-pp.run_ksoftirqd > 0.02 ±143% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.__poll > 0.02 ±143% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.__x64_sys_poll > 0.02 ±143% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.do_sys_poll > 0.01 ±223% +0.1 0.11 ± 48% perf-profile.children.cycles-pp.lockref_put_return > 0.02 ±142% +0.1 0.12 ± 60% perf-profile.children.cycles-pp.do_poll > 0.07 ± 75% +0.1 0.18 ± 34% perf-profile.children.cycles-pp.switch_fpu_return > 0.04 ±112% +0.1 0.15 ± 66% perf-profile.children.cycles-pp.getenv > 0.01 ±223% +0.1 0.12 ± 39% perf-profile.children.cycles-pp.folio_putback_lru > 0.04 ±114% +0.1 0.16 ± 37% perf-profile.children.cycles-pp.generic_exec_single > 0.02 ±223% +0.1 0.14 ± 38% perf-profile.children.cycles-pp.expand_downwards > 0.12 ± 44% +0.1 0.24 ± 15% perf-profile.children.cycles-pp.__check_object_size > 0.14 ± 29% +0.1 0.27 ± 19% perf-profile.children.cycles-pp.vma_alloc_folio_noprof > 0.01 ±223% +0.1 0.14 ± 37% perf-profile.children.cycles-pp.xfsaild > 0.01 ±223% +0.1 0.14 ± 37% perf-profile.children.cycles-pp.xfsaild_push > 0.01 ±223% +0.1 0.14 ± 39% perf-profile.children.cycles-pp.get_cpu_sleep_time_us > 0.06 ± 83% +0.1 0.19 ± 62% perf-profile.children.cycles-pp.__d_add > 0.10 ± 92% +0.1 0.24 ± 15% perf-profile.children.cycles-pp.shift_arg_pages > 0.00 +0.1 0.14 ± 37% perf-profile.children.cycles-pp.get_idle_time > 0.13 ± 
37% +0.1 0.27 ± 19% perf-profile.children.cycles-pp.folio_alloc_mpol_noprof > 0.11 ± 80% +0.1 0.26 ± 39% perf-profile.children.cycles-pp.mm_init > 0.07 ± 49% +0.1 0.22 ± 35% perf-profile.children.cycles-pp.pte_alloc_one > 0.22 ± 47% +0.2 0.37 ± 33% perf-profile.children.cycles-pp.scsi_mq_get_budget > 0.06 ±103% +0.2 0.21 ± 53% perf-profile.children.cycles-pp.unlink_anon_vmas > 0.14 ± 79% +0.2 0.30 ± 30% perf-profile.children.cycles-pp.mas_wr_node_store > 0.04 ±107% +0.2 0.20 ± 70% perf-profile.children.cycles-pp.dyntick_save_progress_counter > 0.05 ±132% +0.2 0.22 ± 66% perf-profile.children.cycles-pp.sysfs_kf_seq_show > 0.05 ±136% +0.2 0.22 ± 66% perf-profile.children.cycles-pp.dev_attr_show > 0.07 ± 89% +0.2 0.24 ± 28% perf-profile.children.cycles-pp.__cmd_record > 0.07 ± 89% +0.2 0.24 ± 28% perf-profile.children.cycles-pp.cmd_record > 0.05 ±125% +0.2 0.22 ± 43% perf-profile.children.cycles-pp.move_queued_task > 0.08 ± 14% +0.2 0.26 ± 51% perf-profile.children.cycles-pp.flush_smp_call_function_queue > 0.07 ±127% +0.2 0.25 ± 31% perf-profile.children.cycles-pp.__dentry_kill > 0.04 ± 77% +0.2 0.22 ± 62% perf-profile.children.cycles-pp.mas_split > 0.08 ± 27% +0.2 0.27 ± 47% perf-profile.children.cycles-pp.mas_alloc_nodes > 0.14 ± 62% +0.2 0.34 ± 29% perf-profile.children.cycles-pp.create_elf_tables > 0.01 ±223% +0.2 0.21 ± 60% perf-profile.children.cycles-pp.__put_user_8 > 0.12 ± 64% +0.2 0.32 ± 48% perf-profile.children.cycles-pp.seq_printf > 0.21 ± 36% +0.2 0.43 ± 37% perf-profile.children.cycles-pp.vfs_statx > 0.23 ± 73% +0.2 0.45 ± 38% perf-profile.children.cycles-pp.sbitmap_get > 0.15 ±104% +0.2 0.38 ± 52% perf-profile.children.cycles-pp.get_arg_page > 0.24 ± 70% +0.2 0.47 ± 42% perf-profile.children.cycles-pp.load_elf_interp > 0.08 ± 52% +0.2 0.31 ± 44% perf-profile.children.cycles-pp.__get_user_8 > 0.13 ± 48% +0.3 0.38 ± 51% perf-profile.children.cycles-pp.rcu_gp_fqs_loop > 0.21 ± 66% +0.3 0.46 ± 46% perf-profile.children.cycles-pp.vsnprintf > 0.12 ± 61% +0.3 0.38 ± 30% perf-profile.children.cycles-pp.slab_show > 0.15 ± 65% +0.3 0.40 ± 50% perf-profile.children.cycles-pp.rep_stos_alternative > 0.23 ± 32% +0.3 0.49 ± 32% perf-profile.children.cycles-pp.path_lookupat > 0.13 ± 82% +0.3 0.40 ± 50% perf-profile.children.cycles-pp.perf_evsel__read > 0.13 ± 62% +0.3 0.40 ± 24% perf-profile.children.cycles-pp.step_into > 0.14 ± 70% +0.3 0.42 ± 35% perf-profile.children.cycles-pp.alloc_anon_folio > 0.23 ± 32% +0.3 0.51 ± 28% perf-profile.children.cycles-pp.filename_lookup > 0.06 ± 88% +0.3 0.34 ± 48% perf-profile.children.cycles-pp.rseq_ip_fixup > 0.18 ± 39% +0.3 0.47 ± 38% perf-profile.children.cycles-pp.dput > 0.16 ± 43% +0.3 0.45 ± 37% perf-profile.children.cycles-pp.rcu_gp_kthread > 0.28 ± 69% +0.3 0.57 ± 32% perf-profile.children.cycles-pp.__vfork > 0.24 ± 68% +0.3 0.54 ± 29% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity > 0.09 ± 65% +0.3 0.42 ± 38% perf-profile.children.cycles-pp.__rseq_handle_notify_resume > 0.36 ± 40% +0.3 0.68 ± 27% perf-profile.children.cycles-pp.__do_sys_newfstatat > 0.23 ± 66% +0.3 0.58 ± 56% perf-profile.children.cycles-pp.free_pgtables > 0.32 ± 42% +0.4 0.67 ± 30% perf-profile.children.cycles-pp.tick_irq_enter > 0.43 ± 39% +0.4 0.78 ± 19% perf-profile.children.cycles-pp.irq_enter_rcu > 0.21 ± 55% +0.4 0.57 ± 41% perf-profile.children.cycles-pp.readn > 0.16 ± 33% +0.4 0.52 ± 50% perf-profile.children.cycles-pp.dup_mmap > 0.15 ± 77% +0.4 0.51 ± 39% perf-profile.children.cycles-pp.show_stat > 0.34 ± 69% +0.4 0.71 ± 42% 
perf-profile.children.cycles-pp.zap_present_ptes > 0.21 ± 67% +0.4 0.60 ± 23% perf-profile.children.cycles-pp.i2c_outb > 0.23 ± 79% +0.4 0.62 ± 36% perf-profile.children.cycles-pp.seq_read > 0.18 ± 40% +0.4 0.59 ± 46% perf-profile.children.cycles-pp.dup_mm > 0.30 ± 67% +0.4 0.72 ± 36% perf-profile.children.cycles-pp.tmigr_handle_remote_up > 0.20 ± 65% +0.4 0.63 ± 58% perf-profile.children.cycles-pp.sync_regs > 0.25 ± 66% +0.4 0.68 ± 23% perf-profile.children.cycles-pp.try_address > 0.27 ± 74% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.output_poll_execute > 0.28 ± 70% +0.5 0.74 ± 28% perf-profile.children.cycles-pp.pipe_read > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.__i2c_transfer > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.bit_xfer > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_connector_helper_detect_from_ddc > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_do_probe_ddc_edid > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_helper_probe_detect_ctx > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.drm_probe_ddc > 0.26 ± 70% +0.5 0.72 ± 24% perf-profile.children.cycles-pp.i2c_transfer > 0.14 ± 61% +0.5 0.61 ± 35% perf-profile.children.cycles-pp.balance_fair > 0.23 ± 68% +0.5 0.69 ± 34% perf-profile.children.cycles-pp.folio_wait_bit_common > 0.28 ± 81% +0.5 0.77 ± 29% perf-profile.children.cycles-pp.pipe_write > 0.35 ± 68% +0.5 0.84 ± 41% perf-profile.children.cycles-pp.tmigr_handle_remote > 0.24 ± 81% +0.5 0.74 ± 34% perf-profile.children.cycles-pp.copy_strings > 0.27 ± 72% +0.5 0.78 ± 50% perf-profile.children.cycles-pp.exec_mmap > 0.19 ± 83% +0.5 0.70 ± 34% perf-profile.children.cycles-pp.folio_wait_writeback > 0.30 ± 67% +0.5 0.83 ± 31% perf-profile.children.cycles-pp.do_anonymous_page > 0.44 ± 59% +0.6 1.00 ± 46% perf-profile.children.cycles-pp.zap_pmd_range > 0.35 ± 86% +0.6 0.93 ± 31% perf-profile.children.cycles-pp.collapse_huge_page > 0.36 ± 84% +0.6 0.93 ± 30% perf-profile.children.cycles-pp.khugepaged > 0.35 ± 85% +0.6 0.93 ± 30% perf-profile.children.cycles-pp.hpage_collapse_scan_pmd > 0.35 ± 85% +0.6 0.93 ± 30% perf-profile.children.cycles-pp.khugepaged_scan_mm_slot > 0.34 ± 63% +0.6 0.92 ± 37% perf-profile.children.cycles-pp.evlist_cpu_iterator__next > 0.24 ± 42% +0.6 0.89 ± 47% perf-profile.children.cycles-pp.scsi_dispatch_cmd > 0.32 ± 62% +0.7 0.99 ± 45% perf-profile.children.cycles-pp.begin_new_exec > 0.41 ± 58% +0.7 1.10 ± 21% perf-profile.children.cycles-pp.__open64_nocancel > 0.37 ± 48% +0.7 1.06 ± 34% perf-profile.children.cycles-pp._Fork > 0.54 ± 66% +0.7 1.27 ± 29% perf-profile.children.cycles-pp.sched_setaffinity > 0.78 ± 51% +0.8 1.56 ± 28% perf-profile.children.cycles-pp.link_path_walk > 0.40 ± 35% +0.8 1.19 ± 40% perf-profile.children.cycles-pp.scsi_queue_rq > 0.51 ± 74% +0.8 1.30 ± 39% perf-profile.children.cycles-pp.elf_load > 0.34 ± 54% +0.8 1.15 ± 34% perf-profile.children.cycles-pp.smpboot_thread_fn > 0.72 ± 69% +0.9 1.66 ± 33% perf-profile.children.cycles-pp.read_counters > 0.77 ± 70% +1.0 1.76 ± 33% perf-profile.children.cycles-pp.cmd_stat > 0.77 ± 70% +1.0 1.76 ± 33% perf-profile.children.cycles-pp.dispatch_events > 0.77 ± 70% +1.0 1.76 ± 33% perf-profile.children.cycles-pp.process_interval > 1.00 ± 19% +1.0 2.01 ± 26% perf-profile.children.cycles-pp.filemap_map_pages > 1.39 ± 34% +1.1 2.48 ± 22% perf-profile.children.cycles-pp.asm_sysvec_posted_msi_notification > 0.85 ± 68% +1.1 1.99 ± 32% perf-profile.children.cycles-pp.handle_internal_command > 0.85 ± 68% 
+1.1 1.99 ± 32% perf-profile.children.cycles-pp.main > 0.85 ± 68% +1.1 1.99 ± 32% perf-profile.children.cycles-pp.run_builtin > 0.88 ± 64% +1.2 2.04 ± 32% perf-profile.children.cycles-pp.do_group_exit > 0.88 ± 64% +1.2 2.05 ± 32% perf-profile.children.cycles-pp.__x64_sys_exit_group > 0.64 ± 66% +1.2 1.87 ± 37% perf-profile.children.cycles-pp.seq_read_iter > 1.24 ± 54% +1.2 2.48 ± 31% perf-profile.children.cycles-pp.x64_sys_call > 1.03 ± 66% +1.3 2.30 ± 30% perf-profile.children.cycles-pp.do_exit > 0.88 ± 64% +1.5 2.35 ± 30% perf-profile.children.cycles-pp.setlocale > 1.08 ± 71% +1.5 2.56 ± 26% perf-profile.children.cycles-pp.mmput > 1.05 ± 70% +1.5 2.55 ± 27% perf-profile.children.cycles-pp.exit_mmap > 3.54 ± 34% +1.8 5.33 ± 20% perf-profile.children.cycles-pp.handle_softirqs > 1.61 ± 55% +1.8 3.40 ± 35% perf-profile.children.cycles-pp.mmap_region > 1.30 ± 69% +2.0 3.28 ± 32% perf-profile.children.cycles-pp.load_elf_binary > 1.32 ± 68% +2.0 3.30 ± 31% perf-profile.children.cycles-pp.search_binary_handler > 1.32 ± 68% +2.0 3.32 ± 31% perf-profile.children.cycles-pp.exec_binprm > 1.45 ± 70% +2.1 3.52 ± 30% perf-profile.children.cycles-pp.bprm_execve > 2.01 ± 71% +3.0 5.01 ± 22% perf-profile.children.cycles-pp.__x64_sys_execve > 2.01 ± 71% +3.0 5.01 ± 23% perf-profile.children.cycles-pp.execve > 2.00 ± 70% +3.0 5.01 ± 22% perf-profile.children.cycles-pp.do_execveat_common > 1.18 ± 39% +3.3 4.48 ± 31% perf-profile.children.cycles-pp.dd_dispatch_request > 2.77 ± 34% +3.6 6.33 ± 28% perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests > 2.77 ± 34% +3.6 6.34 ± 28% perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests > 2.59 ± 34% +3.8 6.35 ± 27% perf-profile.children.cycles-pp.blk_mq_run_hw_queue > 1.94 ± 37% +4.3 6.24 ± 29% perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched > 3.76 ± 36% +5.0 8.79 ± 30% perf-profile.children.cycles-pp.__blk_flush_plug > 3.76 ± 36% +5.0 8.79 ± 30% perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list > 3.76 ± 36% +5.0 8.79 ± 30% perf-profile.children.cycles-pp.blk_mq_flush_plug_list > 4.84 ± 35% +5.1 9.96 ± 29% perf-profile.children.cycles-pp.iomap_submit_ioend > 4.01 ± 55% +8.1 12.11 ± 30% perf-profile.children.cycles-pp.xfs_file_write_checks > 3.34 ± 56% +8.3 11.60 ± 31% perf-profile.children.cycles-pp.iomap_zero_range > 0.00 +11.2 11.22 ± 30% perf-profile.children.cycles-pp.filemap_write_and_wait_range > 3.43 ± 46% -3.2 0.24 ± 35% perf-profile.self.cycles-pp.memset_orig > 1.42 ± 47% -1.3 0.08 ±121% perf-profile.self.cycles-pp._copy_to_iter > 1.07 ± 27% -0.6 0.45 ± 39% perf-profile.self.cycles-pp.clear_bhb_loop > 0.38 ± 23% -0.3 0.12 ± 62% perf-profile.self.cycles-pp.percpu_counter_add_batch > 0.30 ± 33% -0.2 0.09 ± 88% perf-profile.self.cycles-pp.do_syscall_64 > 0.22 ± 30% -0.2 0.05 ± 71% perf-profile.self.cycles-pp.__mod_node_page_state > 0.13 ± 20% -0.1 0.04 ±103% perf-profile.self.cycles-pp.__page_cache_release > 0.01 ±223% +0.1 0.11 ± 48% perf-profile.self.cycles-pp.lockref_put_return > 0.02 ±142% +0.1 0.13 ± 65% perf-profile.self.cycles-pp.xfs_ag_block_count > 0.01 ±223% +0.1 0.13 ± 45% perf-profile.self.cycles-pp.get_cpu_sleep_time_us > 0.04 ±112% +0.2 0.19 ± 68% perf-profile.self.cycles-pp.dyntick_save_progress_counter > 0.14 ± 35% +0.2 0.31 ± 35% perf-profile.self.cycles-pp.sched_balance_domains > 0.08 ± 52% +0.2 0.31 ± 44% perf-profile.self.cycles-pp.__get_user_8 > 0.06 ± 92% +0.3 0.34 ± 60% perf-profile.self.cycles-pp.fold_vm_numa_events > 0.20 ± 65% +0.4 0.63 ± 58% perf-profile.self.cycles-pp.sync_regs > 0.26 ± 34% 
+0.5 0.75 ± 37% perf-profile.self.cycles-pp.filemap_map_pages > > > > > Disclaimer: > Results have been estimated based on internal Intel analysis and are provided > for informational purposes only. Any difference in system hardware or software > design or configuration may affect actual performance. > > > -- > 0-DAY CI Kernel Test Service > https://github.com/intel/lkp-tests/wiki > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression 2024-10-14 16:34 ` Brian Foster @ 2024-10-15 21:50 ` Dave Chinner 2024-10-16 15:47 ` Brian Foster 0 siblings, 1 reply; 5+ messages in thread From: Dave Chinner @ 2024-10-15 21:50 UTC (permalink / raw) To: Brian Foster Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Christian Brauner, Darrick J. Wong, Josef Bacik, linux-xfs, linux-fsdevel, ying.huang, feng.tang, fengwei.yin On Mon, Oct 14, 2024 at 12:34:37PM -0400, Brian Foster wrote: > On Mon, Oct 14, 2024 at 03:55:24PM +0800, kernel test robot wrote: > > > > > > Hello, > > > > kernel test robot noticed a -98.4% regression of stress-ng.metamix.ops_per_sec on: > > > > > > commit: c5c810b94cfd818fc2f58c96feee58a9e5ead96d ("iomap: fix handling of dirty folios over unwritten extents") > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > testcase: stress-ng > > config: x86_64-rhel-8.3 > > compiler: gcc-12 > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory > > parameters: > > > > nr_threads: 100% > > disk: 1HDD > > testtime: 60s > > fs: xfs > > test: metamix > > cpufreq_governor: performance > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > the same patch/commit), kindly add following tags > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > | Closes: https://lore.kernel.org/oe-lkp/202410141536.1167190b-oliver.sang@intel.com > > > > > > Details are as below: > > --------------------------------------------------------------------------------------------------> > > > > > > The kernel config and materials to reproduce are available at: > > https://download.01.org/0day-ci/archive/20241014/202410141536.1167190b-oliver.sang@intel.com > > > > So I basically just run this on a >64xcpu guest and reproduce the delta: > > stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64 > > The short of it is that with tracing enabled, I see a very large number > of extending writes across unwritten mappings, which basically means XFS > eof zeroing is calling zero range and hitting the newly introduced > flush. This is all pretty much expected given the patch. Ouch. The conditions required to cause this regression are that we either first use fallocate() to preallocate beyond EOF, or buffered writes trigger specualtive delalloc beyond EOF and they get converted to unwritten beyond EOF through background writeback or fsync operations. Both of these lead to unwritten extents beyond EOF that extending writes will fall into. All we need now is the extending writes to be slightly non-sequential and those non-sequential extending writes will not land at EOF but at some distance beyond it. At this point, we trigger the new flush code. Unfortunately, this is actually a fairly common workload pattern. For example, experience tells me that NFS server processing of async sequential write requests from a client will -always- end up with slightly out of order extending writes because the incoming async write requests are processed concurrently. Hence they always race to extend the file and slightly out of order file extension happens quite frequently. Further, the NFS client will also periodically be sending a write commit request (i.e. 
server side fsync), the NFS server writeback will convert the speculative delalloc that extends beyond EOF into unwritten extents beyond EOF whilst the incoming extending write requests are still incoming from the client. Hence I think that there are common workloads (e.g. large sequential writes on a NFS client) that set up the exact conditions and IO patterns necessary to trigger this performance regression in production systems... > I ran a quick experiment to skip the flush on sub-4k ranges in favor of > doing explicit folio zeroing. The idea with that is that the range is > likely restricted to single folio and since it's dirty, we can assume > unwritten conversion is imminent and just explicitly zero the range. I > still see a decent number of flushes from larger ranges in that > experiment, but that still seems to get things pretty close to my > baseline test (on a 6.10 distro kernel). What filesystems other than XFS actually need this iomap bandaid right now? If there are none (which I think is the case), then we should just revert this change until a more performant fix is available for XFS. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
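To make the triggering pattern described above concrete, a minimal userspace sketch along these lines exercises it: preallocate unwritten blocks past EOF with FALLOC_FL_KEEP_SIZE, then extend the file with slightly non-sequential writes so every write starts beyond the current EOF and lands inside the post-EOF unwritten extent. The path, sizes and fdatasync cadence are arbitrary assumptions, not taken from the report; dropping the fallocate() and relying on speculative delalloc being converted by the periodic fdatasync() exercises the other variant mentioned above.

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	off_t off = 0;
	int fd, i;

	memset(buf, 0xab, sizeof(buf));
	fd = open("/mnt/scratch/testfile", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0)
		return 1;

	/* unwritten preallocation beyond EOF; i_size stays at 0 */
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 64 << 20) < 0)
		return 1;

	for (i = 0; i < 1024; i++) {
		/* skip ahead so each write starts beyond the current EOF */
		off += 2 * (off_t)sizeof(buf);
		if (pwrite(fd, buf, sizeof(buf), off) < 0)
			return 1;
		if ((i % 64) == 0)
			fdatasync(fd);	/* rough stand-in for a periodic commit/fsync */
	}
	close(fd);
	return 0;
}

With that pattern, each extending write begins past EOF in an unwritten mapping with a dirty folio left behind by the previous write, so the EOF zeroing done from xfs_file_write_checks() repeatedly falls into the newly introduced flush, matching the behaviour reported for the stress-ng metamix test.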
* Re: [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression 2024-10-15 21:50 ` Dave Chinner @ 2024-10-16 15:47 ` Brian Foster 2024-10-23 14:15 ` Brian Foster 0 siblings, 1 reply; 5+ messages in thread From: Brian Foster @ 2024-10-16 15:47 UTC (permalink / raw) To: Dave Chinner Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Christian Brauner, Darrick J. Wong, Josef Bacik, linux-xfs, linux-fsdevel, ying.huang, feng.tang, fengwei.yin On Wed, Oct 16, 2024 at 08:50:58AM +1100, Dave Chinner wrote: > On Mon, Oct 14, 2024 at 12:34:37PM -0400, Brian Foster wrote: > > On Mon, Oct 14, 2024 at 03:55:24PM +0800, kernel test robot wrote: > > > > > > > > > Hello, > > > > > > kernel test robot noticed a -98.4% regression of stress-ng.metamix.ops_per_sec on: > > > > > > > > > commit: c5c810b94cfd818fc2f58c96feee58a9e5ead96d ("iomap: fix handling of dirty folios over unwritten extents") > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > testcase: stress-ng > > > config: x86_64-rhel-8.3 > > > compiler: gcc-12 > > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory > > > parameters: > > > > > > nr_threads: 100% > > > disk: 1HDD > > > testtime: 60s > > > fs: xfs > > > test: metamix > > > cpufreq_governor: performance > > > > > > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > the same patch/commit), kindly add following tags > > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > > | Closes: https://lore.kernel.org/oe-lkp/202410141536.1167190b-oliver.sang@intel.com > > > > > > > > > Details are as below: > > > --------------------------------------------------------------------------------------------------> > > > > > > > > > The kernel config and materials to reproduce are available at: > > > https://download.01.org/0day-ci/archive/20241014/202410141536.1167190b-oliver.sang@intel.com > > > > > > > So I basically just run this on a >64xcpu guest and reproduce the delta: > > > > stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64 > > > > The short of it is that with tracing enabled, I see a very large number > > of extending writes across unwritten mappings, which basically means XFS > > eof zeroing is calling zero range and hitting the newly introduced > > flush. This is all pretty much expected given the patch. > > Ouch. > > The conditions required to cause this regression are that we either > first use fallocate() to preallocate beyond EOF, or buffered writes > trigger specualtive delalloc beyond EOF and they get converted to > unwritten beyond EOF through background writeback or fsync > operations. Both of these lead to unwritten extents beyond EOF that > extending writes will fall into. > > All we need now is the extending writes to be slightly > non-sequential and those non-sequential extending writes will not > land at EOF but at some distance beyond it. At this point, we > trigger the new flush code. Unfortunately, this is actually a fairly > common workload pattern. > > For example, experience tells me that NFS server processing of async > sequential write requests from a client will -always- end up with > slightly out of order extending writes because the incoming async > write requests are processed concurrently. Hence they always race to > extend the file and slightly out of order file extension happens > quite frequently. 
> > Further, the NFS client will also periodically be sending a write > commit request (i.e. server side fsync), the > NFS server writeback will convert the speculative delalloc that > extends beyond EOF into unwritten extents beyond EOF whilst the > incoming extending write requests are still incoming from the > client. > > Hence I think that there are common workloads (e.g. large sequential > writes on a NFS client) that set up the exact conditions and IO > patterns necessary to trigger this performance regression in > production systems... > It's not clear to me that purely out of order writeback via NFS would produce the same sort of hit here because we'd only flush on write extensions. I think the pathological case would have to be something like reordering such that every other write lands sequentially to maximize the number of post-eof write extensions, and then going back and filling in the gaps. That seems rather suboptimal to start, and short of that the cost of the flushes will start to amortize to some degree (including with commit requests, etc.). That said, I don't have much experience with NFS and I think this is a reasonable enough argument to try and optimize here. If you or anybody has an NFS test/workload that might exacerbate this condition, let me know and I'll try to play around with it. > > I ran a quick experiment to skip the flush on sub-4k ranges in favor of > > doing explicit folio zeroing. The idea with that is that the range is > > likely restricted to single folio and since it's dirty, we can assume > > unwritten conversion is imminent and just explicitly zero the range. I > > still see a decent number of flushes from larger ranges in that > > experiment, but that still seems to get things pretty close to my > > baseline test (on a 6.10 distro kernel). > > What filesystems other than XFS actually need this iomap bandaid > right now? If there are none (which I think is the case), then we > should just revert this change it until a more performant fix is > available for XFS. > I think that's a bit hasty. I had one or two ideas/prototypes to work around this sort of problem before the flush patches even landed, it just wasn't clear to me they were worth the extra logic. I'd prefer to try and iterate on performance from a baseline of functional correctness rather than the other way around, if possible. A quick hack to test out some of that on latest master brings the result of this test right back to baseline in my local env. Let me play around with trying to work that into something more production worthy before we break out the pitchforks.. ;) Brian > -Dave. > -- > Dave Chinner > david@fromorbit.com > ^ permalink raw reply [flat|nested] 5+ messages in thread
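For reference, the sub-4k experiment quoted above might look roughly like the sketch below if expressed as a helper in fs/iomap/buffered-io.c. The function name, the PAGE_SIZE cutoff and the exact call site are assumptions for illustration; this is not the prototype or the quick hack referred to in the thread.

/*
 * Sketch only: a "zero small dirty ranges instead of flushing them"
 * policy.  The helper name and the PAGE_SIZE cutoff are assumptions.
 */
#include <linux/iomap.h>
#include <linux/pagemap.h>

static bool iomap_zero_range_needs_flush(struct address_space *mapping,
					 const struct iomap *srcmap,
					 loff_t pos, loff_t len)
{
	/* Only dirty folios over unwritten mappings are at risk here. */
	if (srcmap->type != IOMAP_UNWRITTEN)
		return false;
	if (!filemap_range_needs_writeback(mapping, pos, pos + len - 1))
		return false;
	/*
	 * A sub-page range is very likely confined to the EOF folio; zero
	 * it through the page cache and let the imminent writeback convert
	 * the extent, rather than forcing a flush from the write path.
	 */
	if (len <= PAGE_SIZE)
		return false;
	return true;
}

The point is only the shape of the policy: keep the flush for larger dirty ranges, but let a small dirty range over an unwritten mapping be zeroed in place on the assumption that writeback will convert that extent shortly anyway.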
* Re: [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression 2024-10-16 15:47 ` Brian Foster @ 2024-10-23 14:15 ` Brian Foster 0 siblings, 0 replies; 5+ messages in thread From: Brian Foster @ 2024-10-23 14:15 UTC (permalink / raw) To: Dave Chinner Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Christian Brauner, Darrick J. Wong, Josef Bacik, linux-xfs, linux-fsdevel, ying.huang, feng.tang, fengwei.yin On Wed, Oct 16, 2024 at 11:47:24AM -0400, Brian Foster wrote: > On Wed, Oct 16, 2024 at 08:50:58AM +1100, Dave Chinner wrote: > > On Mon, Oct 14, 2024 at 12:34:37PM -0400, Brian Foster wrote: > > > On Mon, Oct 14, 2024 at 03:55:24PM +0800, kernel test robot wrote: > > > > > > > > > > > > Hello, > > > > > > > > kernel test robot noticed a -98.4% regression of stress-ng.metamix.ops_per_sec on: > > > > > > > > > > > > commit: c5c810b94cfd818fc2f58c96feee58a9e5ead96d ("iomap: fix handling of dirty folios over unwritten extents") > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > testcase: stress-ng > > > > config: x86_64-rhel-8.3 > > > > compiler: gcc-12 > > > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory > > > > parameters: > > > > > > > > nr_threads: 100% > > > > disk: 1HDD > > > > testtime: 60s > > > > fs: xfs > > > > test: metamix > > > > cpufreq_governor: performance > > > > > > > > > > > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > > the same patch/commit), kindly add following tags > > > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > > > | Closes: https://lore.kernel.org/oe-lkp/202410141536.1167190b-oliver.sang@intel.com > > > > > > > > > > > > Details are as below: > > > > --------------------------------------------------------------------------------------------------> > > > > > > > > > > > > The kernel config and materials to reproduce are available at: > > > > https://download.01.org/0day-ci/archive/20241014/202410141536.1167190b-oliver.sang@intel.com > > > > > > > > > > So I basically just run this on a >64xcpu guest and reproduce the delta: > > > > > > stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64 > > > > > > The short of it is that with tracing enabled, I see a very large number > > > of extending writes across unwritten mappings, which basically means XFS > > > eof zeroing is calling zero range and hitting the newly introduced > > > flush. This is all pretty much expected given the patch. > > > > Ouch. > > > > The conditions required to cause this regression are that we either > > first use fallocate() to preallocate beyond EOF, or buffered writes > > trigger specualtive delalloc beyond EOF and they get converted to > > unwritten beyond EOF through background writeback or fsync > > operations. Both of these lead to unwritten extents beyond EOF that > > extending writes will fall into. > > > > All we need now is the extending writes to be slightly > > non-sequential and those non-sequential extending writes will not > > land at EOF but at some distance beyond it. At this point, we > > trigger the new flush code. Unfortunately, this is actually a fairly > > common workload pattern. > > > > For example, experience tells me that NFS server processing of async > > sequential write requests from a client will -always- end up with > > slightly out of order extending writes because the incoming async > > write requests are processed concurrently. 
> > Hence they always race to
> > extend the file, and slightly out of order file extension happens
> > quite frequently.
> >
> > Further, the NFS client will also periodically be sending a write
> > commit request (i.e. server side fsync), and the NFS server writeback
> > will convert the speculative delalloc that extends beyond EOF into
> > unwritten extents beyond EOF whilst the incoming extending write
> > requests are still incoming from the client.
> >
> > Hence I think that there are common workloads (e.g. large sequential
> > writes on a NFS client) that set up the exact conditions and IO
> > patterns necessary to trigger this performance regression in
> > production systems...
> >
>
> It's not clear to me that purely out of order writeback via NFS would
> produce the same sort of hit here because we'd only flush on write
> extensions. I think the pathological case would have to be something
> like reordering such that every other write lands sequentially to
> maximize the number of post-eof write extensions, and then going back
> and filling in the gaps. That seems rather suboptimal to start with,
> and short of that the cost of the flushes will start to amortize to
> some degree (including with commit requests, etc.).
>
> That said, I don't have much experience with NFS and I think this is a
> reasonable enough argument to try and optimize here. If you or anybody
> else has an NFS test/workload that might exacerbate this condition, let
> me know and I'll try to play around with it.
>
> > > I ran a quick experiment to skip the flush on sub-4k ranges in favor of
> > > doing explicit folio zeroing. The idea with that is that the range is
> > > likely restricted to a single folio and, since it's dirty, we can assume
> > > unwritten conversion is imminent and just explicitly zero the range. I
> > > still see a decent number of flushes from larger ranges in that
> > > experiment, but that still seems to get things pretty close to my
> > > baseline test (on a 6.10 distro kernel).
> >
> > What filesystems other than XFS actually need this iomap bandaid
> > right now? If there are none (which I think is the case), then we
> > should just revert this change until a more performant fix is
> > available for XFS.
> >
>
> I think that's a bit hasty. I had one or two ideas/prototypes to work
> around this sort of problem before the flush patches even landed; it
> just wasn't clear to me they were worth the extra logic. I'd prefer to
> try and iterate on performance from a baseline of functional
> correctness rather than the other way around, if possible.
>
> A quick hack to test out some of that on latest master brings the
> result of this test right back to baseline in my local env. Let me play
> around with trying to work that into something more production worthy
> before we break out the pitchforks.. ;)
>

So it turns out there is a little bit more going on here. The regression
is not so much the flush on its own, but the combination of the flush
and the change in commit 5ce5674187c34 ("xfs: convert delayed extents
to unwritten when zeroing post eof blocks"). That commit changes
post-eof zero range calls on XFS to convert delalloc extents to
unwritten instead of the prior behavior of leaving them as delalloc,
zeroing in memory, and continuing on. IOW, the regression also goes
away by bypassing this particular commit, even with the flushing in
place.
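
(For illustration only, here is a rough sketch of the post-eof zeroing
decision being discussed. This is hypothetical C-style pseudocode, not
the actual iomap/XFS code; range_is_unwritten(), range_is_dirty(),
zero_single_folio() and zero_via_buffered_writes() are made-up helper
names standing in for the real internals.)

/*
 * Illustration only -- not the actual kernel implementation.
 */
static int post_eof_zero_range_sketch(struct inode *inode, loff_t pos,
                                      loff_t len)
{
        struct address_space *mapping = inode->i_mapping;

        /*
         * Without 5ce5674187c34 the post-eof range stays delalloc and
         * is simply zeroed through the pagecache; with it, the zero
         * range begin callback hands back an unwritten mapping instead.
         */
        if (!range_is_unwritten(inode, pos, len))
                return zero_via_buffered_writes(inode, pos, len);

        /* Clean pagecache over an unwritten extent needs no zeroing. */
        if (!range_is_dirty(mapping, pos, len))
                return 0;

        /*
         * c5c810b94c flushes dirty folios over unwritten extents so
         * zeroing doesn't race with the pending unwritten conversion.
         * The experiment mentioned earlier: for a sub-4k (likely single
         * folio) range, zero the dirty folio in place instead.
         */
        if (len < PAGE_SIZE)
                return zero_single_folio(mapping, pos, len);

        return filemap_write_and_wait_range(mapping, pos, pos + len - 1);
}
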
The prealloc change seems fairly reasonable at face value, but the
commit log documents it purely as an i_size bug fix associated with an
internal zero range, which AFAICT isn't relevant any more because
iomap_zero_range() no longer updates i_size. However, it looks like it
did so in the past, and this behavior also swizzled back and forth a
time or two in the same timeframe as this particular commit, so perhaps
it was a problem when this was introduced and then iomap changed again
after (or maybe I'm just missing something?).

On thinking more about this, I'd be a little concerned about whether
this will reduce the effectiveness of speculative preallocation for
write-extending workloads similar to this test (i.e. strided extending
writes). This changes behavior from doing in-memory zeroing between
physical allocations via the writeback path to doing physical
allocation on every write that starts beyond EOF, which feels a little
like going from one extreme to the other. Instead, I'd expect to see
something where this converts larger mappings to avoid excessive
zeroing and zeroes smallish ranges to avoid overly frequent and
unnecessarily small physical allocations, allowing multiple speculative
preallocations to compound. Anyways, I've not dug into this enough to
know whether it's a problem, but since this is documented purely as a
bug fix I don't see any evidence that potential impact on allocation
patterns was tested either. This might be something to evaluate more
closely in XFS.

On the iomap side, it also seems like the current handling of i_size on
zero range is confused. If iomap_zero_range() doesn't update i_size,
then it basically doesn't fully support post-eof ranges. It zeroes
through buffered writes, which writeback will just drop on the floor if
beyond EOF. However, XFS explicitly calls zero range on post-eof ranges
to trigger the aforementioned conversion in its begin callback (but
never expecting to see ranges that need buffered writes). I think this
is a landmine waiting to happen. If iomap decides to be deliberate and
skip post-eof ranges, then this could break current XFS behavior if it
skips the begin callback. OTOH, if XFS were to change back to at least
doing some speculative prealloc delalloc zeroing, IIUC this now
introduces a race between writeback potentially tossing the zeroed
folios over the delalloc preallocation (because they are still beyond
EOF) and the subsequent write operation extending i_size so that
doesn't happen. :/ None of this is particularly obvious. And FWIW, I'm
also skeptical that i_size updates were ever consistent across mapping
types. I.e., if the size was only ever updated via iomap_write_end(),
for example, then behavior is kind of unpredictable. Maybe this is
something that should be configurable via a keepsize flag or some such.
That would at least allow for correct behavior and/or a failure/warning
if we ever fell into doing zeroing for post-eof ranges without updating
i_size.

Thoughts on any of this?

Brian

> Brian
>
> > -Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> >
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-10-23 14:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-14  7:55 [linus:master] [iomap] c5c810b94c: stress-ng.metamix.ops_per_sec -98.4% regression kernel test robot
2024-10-14 16:34 ` Brian Foster
2024-10-15 21:50   ` Dave Chinner
2024-10-16 15:47     ` Brian Foster
2024-10-23 14:15       ` Brian Foster