* [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
@ 2024-05-23 2:58 kernel test robot
2024-05-30 13:27 ` Amir Goldstein
From: kernel test robot @ 2024-05-23 2:58 UTC
To: Amir Goldstein; +Cc: oe-lkp, lkp, oliver.sang
Hello,
kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:
commit: 8829cb6189b7a6b5283b9ffc870df13c085f1cd6 ("fs: hold s_write_srcu for pre-modify permission events on write")
https://github.com/amir73il/linux sb_write_barrier
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: fault
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405231056.66ecbb94-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240523/202405231056.66ecbb94-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/fault/stress-ng/60s
commit:
3f7a9d8157 ("fs: add srcu variants for mnt_{want,drop}_write() helpers")
8829cb6189 ("fs: hold s_write_srcu for pre-modify permission events on write")
3f7a9d815783aeff 8829cb6189b7a6b5283b9ffc870
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 17% 1:6 dmesg.RIP:native_queued_spin_lock_slowpath
:6 17% 1:6 dmesg.RIP:setup_pebs_adaptive_sample_data
:6 17% 1:6 dmesg.WARNING:at_arch/x86/events/intel/ds.c:#setup_pebs_adaptive_sample_data
%stddev %change %stddev
\ | \
155.51 ± 12% +23.3% 191.81 ± 13% sched_debug.cfs_rq:/.util_est.stddev
5270 ±141% +378.6% 25225 ± 79% sched_debug.cpu.max_idle_balance_cost.stddev
0.63 ± 2% -0.0 0.59 perf-stat.i.branch-miss-rate%
2.61 ± 2% +3.5% 2.70 perf-stat.i.cpi
0.40 ± 5% -5.2% 0.38 perf-stat.i.ipc
53250 -2.3% 52032 stress-ng.fault.minor_page_faults_per_sec
51143720 -2.3% 49967689 stress-ng.fault.ops
852394 -2.3% 832793 stress-ng.fault.ops_per_sec
2.046e+08 -2.3% 1.999e+08 stress-ng.time.minor_page_faults
1.157e+08 -2.2% 1.132e+08 proc-vmstat.numa_hit
1.157e+08 -2.2% 1.131e+08 proc-vmstat.numa_local
51220291 -2.4% 49995156 proc-vmstat.pgactivate
1.377e+08 -2.1% 1.349e+08 proc-vmstat.pgalloc_normal
2.053e+08 -2.4% 2.003e+08 proc-vmstat.pgfault
1.368e+08 -2.2% 1.338e+08 proc-vmstat.pgfree
51073893 -2.4% 49869748 proc-vmstat.unevictable_pgs_culled
24.17 ± 2% -1.7 22.46 ± 2% perf-profile.calltrace.cycles-pp.__madvise
23.20 ± 2% -1.7 21.52 ± 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
23.33 ± 2% -1.7 21.65 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
23.24 ± 2% -1.7 21.55 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
23.31 ± 2% -1.7 21.62 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
22.51 ± 2% -1.7 20.83 ± 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
18.38 ± 3% -1.5 16.87 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
18.12 ± 3% -1.5 16.62 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
17.63 -1.2 16.39 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
17.36 -1.2 16.14 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
17.61 -1.2 16.38 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
17.48 -1.2 16.25 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
15.64 ± 2% -1.2 14.49 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
15.03 ± 2% -1.1 13.91 ± 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
13.49 ± 3% -1.1 12.38 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
13.51 ± 3% -1.1 12.41 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
13.51 ± 3% -1.1 12.41 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
12.53 ± 3% -1.0 11.50 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
36.55 -1.0 35.54 perf-profile.calltrace.cycles-pp.__munmap
36.27 -1.0 35.27 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
36.26 -1.0 35.26 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
12.06 ± 2% -1.0 11.08 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
7.33 -0.6 6.72 ± 2% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
7.10 ± 2% -0.6 6.49 ± 2% perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
7.11 ± 2% -0.6 6.50 ± 2% perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
7.02 ± 2% -0.6 6.42 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
7.01 ± 2% -0.6 6.40 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
6.99 ± 3% -0.6 6.42 ± 2% perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise
7.38 ± 2% -0.6 6.82 ± 2% perf-profile.calltrace.cycles-pp.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
6.94 ± 3% -0.6 6.38 ± 2% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout
6.29 ± 2% -0.6 5.73 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache
6.97 ± 2% -0.6 6.41 ± 2% perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior
6.38 ± 2% -0.6 5.82 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
6.16 ± 2% -0.6 5.60 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain
6.92 ± 3% -0.6 6.36 ± 2% perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
7.10 ± 2% -0.6 6.54 ± 2% perf-profile.calltrace.cycles-pp.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise
6.90 ± 3% -0.6 6.34 ± 2% perf-profile.calltrace.cycles-pp.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range
6.88 ± 3% -0.6 6.32 ± 2% perf-profile.calltrace.cycles-pp.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
6.97 -0.6 6.42 perf-profile.calltrace.cycles-pp.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
6.54 ± 3% -0.5 5.99 ± 2% perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
7.84 -0.5 7.29 perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dput.__fput
7.72 -0.5 7.17 perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill.dput
6.27 ± 3% -0.5 5.75 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
6.17 ± 3% -0.5 5.65 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
6.09 ± 3% -0.5 5.57 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
6.42 -0.5 5.90 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict
6.58 ± 3% -0.5 6.07 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap
6.25 -0.5 5.73 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode
6.62 ± 2% -0.5 6.10 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap
6.62 ± 3% -0.5 6.11 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
6.18 -0.5 5.66 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range
6.16 ± 3% -0.5 5.67 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
6.14 ± 2% -0.5 5.68 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict
6.07 ± 2% -0.5 5.60 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range
3.43 ± 2% -0.3 3.17 perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill.dput
3.75 ± 2% -0.2 3.50 perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.__dentry_kill.dput.__fput
3.41 ± 2% -0.2 3.17 perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink
3.14 ± 3% -0.2 2.91 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill
3.74 ± 2% -0.2 3.50 perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
3.13 ± 2% -0.2 2.91 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
0.51 -0.2 0.33 ± 70% perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.stress_fault
5.64 -0.1 5.50 perf-profile.calltrace.cycles-pp.stress_fault
4.70 -0.1 4.60 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_fault
4.16 -0.1 4.07 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_fault
4.12 -0.1 4.03 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
3.59 -0.1 3.52 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
2.13 -0.1 2.08 perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64
0.71 -0.0 0.67 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
1.13 -0.0 1.09 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
0.81 -0.0 0.79 perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
0.75 -0.0 0.73 perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.__shmem_get_inode.__shmem_file_setup.shmem_zero_setup
0.60 -0.0 0.58 perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
18.59 +0.2 18.83 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
18.33 +0.2 18.58 perf-profile.calltrace.cycles-pp.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
17.29 +0.3 17.54 perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
18.08 +0.3 18.34 perf-profile.calltrace.cycles-pp.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.19 +0.3 17.45 perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.task_work_run.syscall_exit_to_user_mode
1.96 +0.3 2.23 perf-profile.calltrace.cycles-pp.__libc_pwrite
1.84 +0.3 2.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
1.85 +0.3 2.13 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
1.82 +0.3 2.10 perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
1.78 +0.3 2.06 perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
15.98 +0.3 16.27 perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.task_work_run
8.24 +0.9 9.15 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
8.24 +0.9 9.16 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
8.36 +0.9 9.27 ± 2% perf-profile.calltrace.cycles-pp.unlink
8.04 +0.9 8.96 ± 2% perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
8.20 +0.9 9.13 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
5.36 +1.0 6.37 ± 3% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.64 ± 6% +1.0 4.68 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
3.88 ± 6% +1.1 4.94 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
1.35 ± 11% +1.2 2.52 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink
1.42 ± 10% +1.2 2.62 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
8.34 ± 5% +2.3 10.62 ± 6% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
8.36 ± 5% +2.3 10.64 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
8.53 ± 5% +2.3 10.81 ± 6% perf-profile.calltrace.cycles-pp.open64
8.39 ± 5% +2.3 10.67 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
8.40 ± 5% +2.3 10.68 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
8.01 ± 5% +2.3 10.30 ± 6% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.98 ± 5% +2.3 10.27 ± 6% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
2.87 ± 10% +2.3 5.19 ± 11% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod
5.99 ± 6% +2.3 8.32 ± 7% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
4.04 ± 7% +2.3 6.38 ± 9% perf-profile.calltrace.cycles-pp.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups
3.06 ± 9% +2.4 5.44 ± 11% perf-profile.calltrace.cycles-pp._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open
5.53 ± 6% +2.4 7.91 ± 8% perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
4.53 ± 7% +2.4 6.92 ± 9% perf-profile.calltrace.cycles-pp.ramfs_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
4.39 ± 7% +2.4 6.79 ± 9% perf-profile.calltrace.cycles-pp.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups.path_openat
37.47 ± 2% -3.1 34.42 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
36.99 ± 2% -3.0 33.94 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
27.20 ± 2% -2.2 24.96 ± 2% perf-profile.children.cycles-pp.lru_add_drain
27.10 ± 2% -2.2 24.88 ± 2% perf-profile.children.cycles-pp.folio_batch_move_lru
24.23 ± 2% -1.7 22.52 ± 2% perf-profile.children.cycles-pp.__madvise
23.22 ± 2% -1.7 21.53 ± 2% perf-profile.children.cycles-pp.do_madvise
23.24 ± 2% -1.7 21.56 ± 2% perf-profile.children.cycles-pp.__x64_sys_madvise
22.52 ± 2% -1.7 20.84 ± 2% perf-profile.children.cycles-pp.madvise_vma_behavior
20.19 ± 3% -1.6 18.56 ± 2% perf-profile.children.cycles-pp.lru_add_drain_cpu
17.60 -1.2 16.37 perf-profile.children.cycles-pp.do_vmi_munmap
17.63 -1.2 16.40 perf-profile.children.cycles-pp.__x64_sys_munmap
17.62 -1.2 16.39 perf-profile.children.cycles-pp.__vm_munmap
17.38 -1.2 16.16 perf-profile.children.cycles-pp.do_vmi_align_munmap
15.65 ± 2% -1.2 14.50 perf-profile.children.cycles-pp.unmap_region
15.04 ± 2% -1.1 13.91 ± 2% perf-profile.children.cycles-pp.zap_page_range_single
14.24 ± 2% -1.1 13.19 perf-profile.children.cycles-pp.folios_put_refs
36.59 -1.0 35.58 perf-profile.children.cycles-pp.__munmap
12.71 ± 2% -1.0 11.72 perf-profile.children.cycles-pp.__page_cache_release
7.75 -0.6 7.13 perf-profile.children.cycles-pp.tlb_finish_mmu
7.33 ± 2% -0.6 6.72 ± 2% perf-profile.children.cycles-pp.free_pages_and_swap_cache
7.44 -0.6 6.82 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
7.39 ± 2% -0.6 6.83 ± 2% perf-profile.children.cycles-pp.madvise_pageout
6.95 ± 3% -0.6 6.38 ± 2% perf-profile.children.cycles-pp.walk_p4d_range
7.10 ± 2% -0.6 6.54 ± 2% perf-profile.children.cycles-pp.walk_page_range
6.98 ± 2% -0.6 6.42 ± 2% perf-profile.children.cycles-pp.walk_pgd_range
6.99 ± 3% -0.6 6.43 ± 2% perf-profile.children.cycles-pp.__walk_page_range
6.92 ± 3% -0.6 6.36 ± 2% perf-profile.children.cycles-pp.walk_pud_range
6.88 ± 3% -0.6 6.32 ± 2% perf-profile.children.cycles-pp.madvise_cold_or_pageout_pte_range
6.90 ± 3% -0.6 6.34 ± 2% perf-profile.children.cycles-pp.walk_pmd_range
6.55 ± 3% -0.6 6.00 ± 2% perf-profile.children.cycles-pp.folio_isolate_lru
7.84 -0.5 7.30 perf-profile.children.cycles-pp.shmem_evict_inode
7.73 -0.5 7.18 perf-profile.children.cycles-pp.shmem_undo_range
6.28 ± 3% -0.5 5.75 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
6.30 ± 3% -0.5 5.78 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irq
7.50 ± 2% -0.5 7.02 perf-profile.children.cycles-pp.truncate_inode_pages_range
6.62 -0.2 6.47 perf-profile.children.cycles-pp.stress_fault
5.72 -0.1 5.60 perf-profile.children.cycles-pp.asm_exc_page_fault
2.28 -0.1 2.15 perf-profile.children.cycles-pp.__do_softirq
2.26 -0.1 2.14 perf-profile.children.cycles-pp.rcu_do_batch
2.26 -0.1 2.15 perf-profile.children.cycles-pp.rcu_core
2.12 -0.1 2.01 perf-profile.children.cycles-pp.irq_exit_rcu
2.00 -0.1 1.91 perf-profile.children.cycles-pp.kmem_cache_free
0.25 ± 2% -0.1 0.16 ± 2% perf-profile.children.cycles-pp.vfs_fallocate
2.34 -0.1 2.25 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
4.17 -0.1 4.08 perf-profile.children.cycles-pp.exc_page_fault
2.32 -0.1 2.23 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.29 ± 2% -0.1 0.20 perf-profile.children.cycles-pp.__x64_sys_fallocate
4.14 -0.1 4.05 perf-profile.children.cycles-pp.do_user_addr_fault
0.42 ± 3% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.posix_fallocate64
3.60 -0.1 3.53 perf-profile.children.cycles-pp.handle_mm_fault
1.70 ± 2% -0.1 1.63 perf-profile.children.cycles-pp.alloc_inode
2.96 -0.1 2.90 perf-profile.children.cycles-pp.do_fault
0.17 ± 3% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.rw_verify_area
1.03 -0.0 0.99 perf-profile.children.cycles-pp.__slab_free
0.92 -0.0 0.88 perf-profile.children.cycles-pp.simple_write_begin
0.64 ± 2% -0.0 0.59 perf-profile.children.cycles-pp.inode_init_always
1.16 -0.0 1.12 perf-profile.children.cycles-pp.generic_perform_write
0.46 ± 3% -0.0 0.42 ± 2% perf-profile.children.cycles-pp.mnt_want_write
0.84 -0.0 0.80 perf-profile.children.cycles-pp.__filemap_get_folio
1.12 -0.0 1.08 perf-profile.children.cycles-pp.perf_event_mmap
1.08 -0.0 1.05 perf-profile.children.cycles-pp.perf_event_mmap_event
0.15 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.__fsnotify_parent
0.23 ± 3% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.may_open
0.58 -0.0 0.55 perf-profile.children.cycles-pp.mas_prev_slot
0.28 -0.0 0.26 ± 4% perf-profile.children.cycles-pp.__count_memcg_events
0.45 ± 2% -0.0 0.42 ± 2% perf-profile.children.cycles-pp.filemap_add_folio
0.18 ± 2% -0.0 0.15 ± 4% perf-profile.children.cycles-pp.security_inode_alloc
0.57 -0.0 0.54 perf-profile.children.cycles-pp.__cond_resched
0.26 -0.0 0.24 ± 2% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.68 -0.0 0.66 perf-profile.children.cycles-pp.flush_tlb_mm_range
0.32 ± 2% -0.0 0.30 perf-profile.children.cycles-pp.generic_file_mmap
0.14 ± 3% -0.0 0.12 ± 7% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
0.31 ± 2% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.touch_atime
0.50 ± 2% -0.0 0.48 perf-profile.children.cycles-pp.mas_rev_awalk
0.32 -0.0 0.30 ± 2% perf-profile.children.cycles-pp.alloc_pages_mpol
0.22 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.shmem_alloc_folio
0.17 ± 2% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.fsnotify
0.12 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.blk_finish_plug
0.42 -0.0 0.40 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.17 ± 2% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.folio_alloc
0.31 -0.0 0.30 perf-profile.children.cycles-pp.mas_ascend
0.18 ± 2% -0.0 0.17 perf-profile.children.cycles-pp.fsnotify_grab_connector
0.10 ± 4% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.kfree
0.19 ± 2% -0.0 0.18 perf-profile.children.cycles-pp.xas_start
0.64 -0.0 0.62 perf-profile.children.cycles-pp.lru_add_fn
0.09 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.prepend_path
0.14 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.simple_getattr
0.20 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.fsnotify_destroy_marks
0.06 ± 6% +0.0 0.08 ± 6% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
0.08 ± 9% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.security_current_getsecid_subj
0.10 ± 7% +0.0 0.12 perf-profile.children.cycles-pp.security_file_post_open
0.09 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.ima_file_check
0.02 ± 99% +0.0 0.06 perf-profile.children.cycles-pp.__x64_sys_fcntl
0.55 ± 2% +0.1 0.62 ± 2% perf-profile.children.cycles-pp.inode_wait_for_writeback
91.01 +0.2 91.25 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
18.74 +0.2 18.98 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
90.84 +0.2 91.08 perf-profile.children.cycles-pp.do_syscall_64
0.00 +0.2 0.24 ± 3% perf-profile.children.cycles-pp.file_start_write_area
18.34 +0.2 18.59 perf-profile.children.cycles-pp.task_work_run
17.46 +0.3 17.72 perf-profile.children.cycles-pp.dput
17.25 +0.3 17.50 perf-profile.children.cycles-pp.__dentry_kill
18.09 +0.3 18.35 perf-profile.children.cycles-pp.__fput
1.98 +0.3 2.25 perf-profile.children.cycles-pp.__libc_pwrite
1.82 +0.3 2.10 perf-profile.children.cycles-pp.vfs_write
1.82 +0.3 2.10 perf-profile.children.cycles-pp.__x64_sys_pwrite64
8.38 +0.9 9.29 ± 2% perf-profile.children.cycles-pp.unlink
8.21 +0.9 9.13 ± 2% perf-profile.children.cycles-pp.__x64_sys_unlink
8.04 +0.9 8.96 ± 2% perf-profile.children.cycles-pp.do_unlinkat
21.35 +1.3 22.65 perf-profile.children.cycles-pp.evict
8.11 ± 5% +2.0 10.10 ± 3% perf-profile.children.cycles-pp.new_inode
8.36 ± 5% +2.3 10.64 ± 6% perf-profile.children.cycles-pp.do_sys_openat2
8.37 ± 5% +2.3 10.65 ± 6% perf-profile.children.cycles-pp.__x64_sys_openat
8.55 ± 5% +2.3 10.83 ± 6% perf-profile.children.cycles-pp.open64
7.99 ± 5% +2.3 10.28 ± 6% perf-profile.children.cycles-pp.path_openat
8.02 ± 5% +2.3 10.31 ± 6% perf-profile.children.cycles-pp.do_filp_open
6.00 ± 6% +2.3 8.33 ± 7% perf-profile.children.cycles-pp.open_last_lookups
5.54 ± 6% +2.4 7.92 ± 8% perf-profile.children.cycles-pp.lookup_open
4.54 ± 7% +2.4 6.93 ± 9% perf-profile.children.cycles-pp.ramfs_mknod
4.40 ± 7% +2.4 6.79 ± 9% perf-profile.children.cycles-pp.ramfs_get_inode
12.62 ± 6% +4.4 16.99 ± 4% perf-profile.children.cycles-pp._raw_spin_lock
1.00 -0.0 0.95 perf-profile.self.cycles-pp.__slab_free
0.10 ± 4% -0.0 0.06 perf-profile.self.cycles-pp.vfs_fallocate
1.25 -0.0 1.21 perf-profile.self.cycles-pp.stress_fault
0.44 ± 5% -0.0 0.41 ± 4% perf-profile.self.cycles-pp.apparmor_file_alloc_security
0.14 ± 3% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.__fsnotify_parent
0.26 -0.0 0.23 ± 4% perf-profile.self.cycles-pp.__count_memcg_events
0.25 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.17 -0.0 0.15 perf-profile.self.cycles-pp.fsnotify
0.21 -0.0 0.19 perf-profile.self.cycles-pp.mas_prev_slot
0.35 -0.0 0.34 perf-profile.self.cycles-pp.__cond_resched
0.17 ± 2% -0.0 0.16 perf-profile.self.cycles-pp.xas_start
0.12 ± 4% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.__srcu_read_lock
0.13 -0.0 0.12 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.09 -0.0 0.08 perf-profile.self.cycles-pp.mas_store_gfp
0.07 -0.0 0.06 perf-profile.self.cycles-pp.unmap_region
0.06 ± 6% +0.0 0.07 perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
0.15 ± 3% +0.0 0.19 perf-profile.self.cycles-pp.ramfs_get_inode
0.12 ± 3% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.vfs_write
1.60 ± 2% +0.2 1.76 perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.2 0.22 ± 3% perf-profile.self.cycles-pp.file_start_write_area
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
2024-05-23 2:58 [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression kernel test robot
@ 2024-05-30 13:27 ` Amir Goldstein
2024-06-03 7:56 ` Oliver Sang
From: Amir Goldstein @ 2024-05-30 13:27 UTC
To: kernel test robot, Jan Kara; +Cc: oe-lkp, lkp
On Thu, May 23, 2024 at 5:59 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:
>
>
> commit: 8829cb6189b7a6b5283b9ffc870df13c085f1cd6 ("fs: hold s_write_srcu for pre-modify permission events on write")
> https://github.com/amir73il/linux sb_write_barrier
>
> testcase: stress-ng
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
> nr_threads: 100%
> testtime: 60s
> test: fault
> cpufreq_governor: performance
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202405231056.66ecbb94-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240523/202405231056.66ecbb94-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/fault/stress-ng/60s
>
> commit:
> 3f7a9d8157 ("fs: add srcu variants for mnt_{want,drop}_write() helpers")
> 8829cb6189 ("fs: hold s_write_srcu for pre-modify permission events on write")
>
> 3f7a9d815783aeff 8829cb6189b7a6b5283b9ffc870
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :6 17% 1:6 dmesg.RIP:native_queued_spin_lock_slowpath
> :6 17% 1:6 dmesg.RIP:setup_pebs_adaptive_sample_data
> :6 17% 1:6 dmesg.WARNING:at_arch/x86/events/intel/ds.c:#setup_pebs_adaptive_sample_data
> %stddev %change %stddev
> \ | \
> 155.51 ± 12% +23.3% 191.81 ± 13% sched_debug.cfs_rq:/.util_est.stddev
> 5270 ±141% +378.6% 25225 ± 79% sched_debug.cpu.max_idle_balance_cost.stddev
> 0.63 ± 2% -0.0 0.59 perf-stat.i.branch-miss-rate%
> 2.61 ± 2% +3.5% 2.70 perf-stat.i.cpi
> 0.40 ± 5% -5.2% 0.38 perf-stat.i.ipc
> 53250 -2.3% 52032 stress-ng.fault.minor_page_faults_per_sec
> 51143720 -2.3% 49967689 stress-ng.fault.ops
> 852394 -2.3% 832793 stress-ng.fault.ops_per_sec
> 2.046e+08 -2.3% 1.999e+08 stress-ng.time.minor_page_faults
> 1.157e+08 -2.2% 1.132e+08 proc-vmstat.numa_hit
> 1.157e+08 -2.2% 1.131e+08 proc-vmstat.numa_local
> 51220291 -2.4% 49995156 proc-vmstat.pgactivate
> 1.377e+08 -2.1% 1.349e+08 proc-vmstat.pgalloc_normal
> 2.053e+08 -2.4% 2.003e+08 proc-vmstat.pgfault
> 1.368e+08 -2.2% 1.338e+08 proc-vmstat.pgfree
> 51073893 -2.4% 49869748 proc-vmstat.unevictable_pgs_culled
> 24.17 ± 2% -1.7 22.46 ± 2% perf-profile.calltrace.cycles-pp.__madvise
> 23.20 ± 2% -1.7 21.52 ± 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> 23.33 ± 2% -1.7 21.65 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
> 23.24 ± 2% -1.7 21.55 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> 23.31 ± 2% -1.7 21.62 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> 22.51 ± 2% -1.7 20.83 ± 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 18.38 ± 3% -1.5 16.87 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
> 18.12 ± 3% -1.5 16.62 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
> 17.63 -1.2 16.39 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 17.36 -1.2 16.14 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
> 17.61 -1.2 16.38 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 17.48 -1.2 16.25 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 15.64 ± 2% -1.2 14.49 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
> 15.03 ± 2% -1.1 13.91 ± 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
> 13.49 ± 3% -1.1 12.38 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
> 13.51 ± 3% -1.1 12.41 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 13.51 ± 3% -1.1 12.41 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
> 12.53 ± 3% -1.0 11.50 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
> 36.55 -1.0 35.54 perf-profile.calltrace.cycles-pp.__munmap
> 36.27 -1.0 35.27 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
> 36.26 -1.0 35.26 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 12.06 ± 2% -1.0 11.08 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
> 7.33 -0.6 6.72 ± 2% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 7.10 ± 2% -0.6 6.49 ± 2% perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
> 7.11 ± 2% -0.6 6.50 ± 2% perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> 7.02 ± 2% -0.6 6.42 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
> 7.01 ± 2% -0.6 6.40 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
> 6.99 ± 3% -0.6 6.42 ± 2% perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise
> 7.38 ± 2% -0.6 6.82 ± 2% perf-profile.calltrace.cycles-pp.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
> 6.94 ± 3% -0.6 6.38 ± 2% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout
> 6.29 ± 2% -0.6 5.73 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache
> 6.97 ± 2% -0.6 6.41 ± 2% perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior
> 6.38 ± 2% -0.6 5.82 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
> 6.16 ± 2% -0.6 5.60 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain
> 6.92 ± 3% -0.6 6.36 ± 2% perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
> 7.10 ± 2% -0.6 6.54 ± 2% perf-profile.calltrace.cycles-pp.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 6.90 ± 3% -0.6 6.34 ± 2% perf-profile.calltrace.cycles-pp.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range
> 6.88 ± 3% -0.6 6.32 ± 2% perf-profile.calltrace.cycles-pp.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
> 6.97 -0.6 6.42 perf-profile.calltrace.cycles-pp.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
> 6.54 ± 3% -0.5 5.99 ± 2% perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
> 7.84 -0.5 7.29 perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dput.__fput
> 7.72 -0.5 7.17 perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill.dput
> 6.27 ± 3% -0.5 5.75 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
> 6.17 ± 3% -0.5 5.65 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
> 6.09 ± 3% -0.5 5.57 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
> 6.42 -0.5 5.90 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict
> 6.58 ± 3% -0.5 6.07 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap
> 6.25 -0.5 5.73 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode
> 6.62 ± 2% -0.5 6.10 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> 6.62 ± 3% -0.5 6.11 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 6.18 -0.5 5.66 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range
> 6.16 ± 3% -0.5 5.67 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
> 6.14 ± 2% -0.5 5.68 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict
> 6.07 ± 2% -0.5 5.60 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range
> 3.43 ± 2% -0.3 3.17 perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill.dput
> 3.75 ± 2% -0.2 3.50 perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.__dentry_kill.dput.__fput
> 3.41 ± 2% -0.2 3.17 perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink
> 3.14 ± 3% -0.2 2.91 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill
> 3.74 ± 2% -0.2 3.50 perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
> 3.13 ± 2% -0.2 2.91 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
> 0.51 -0.2 0.33 ± 70% perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.stress_fault
> 5.64 -0.1 5.50 perf-profile.calltrace.cycles-pp.stress_fault
> 4.70 -0.1 4.60 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_fault
> 4.16 -0.1 4.07 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_fault
> 4.12 -0.1 4.03 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
> 3.59 -0.1 3.52 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
> 2.13 -0.1 2.08 perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> 0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64
> 0.71 -0.0 0.67 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> 1.13 -0.0 1.09 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
> 0.81 -0.0 0.79 perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
> 0.75 -0.0 0.73 perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.__shmem_get_inode.__shmem_file_setup.shmem_zero_setup
> 0.60 -0.0 0.58 perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
> 18.59 +0.2 18.83 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 18.33 +0.2 18.58 perf-profile.calltrace.cycles-pp.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 17.29 +0.3 17.54 perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> 18.08 +0.3 18.34 perf-profile.calltrace.cycles-pp.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 17.19 +0.3 17.45 perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.task_work_run.syscall_exit_to_user_mode
> 1.96 +0.3 2.23 perf-profile.calltrace.cycles-pp.__libc_pwrite
> 1.84 +0.3 2.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 1.85 +0.3 2.13 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 1.82 +0.3 2.10 perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 1.78 +0.3 2.06 perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 15.98 +0.3 16.27 perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.task_work_run
> 8.24 +0.9 9.15 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
> 8.24 +0.9 9.16 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
> 8.36 +0.9 9.27 ± 2% perf-profile.calltrace.cycles-pp.unlink
> 8.04 +0.9 8.96 ± 2% perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
> 8.20 +0.9 9.13 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
> 5.36 +1.0 6.37 ± 3% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.64 ± 6% +1.0 4.68 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
> 3.88 ± 6% +1.1 4.94 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
> 1.35 ± 11% +1.2 2.52 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink
> 1.42 ± 10% +1.2 2.62 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
> 8.34 ± 5% +2.3 10.62 ± 6% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 8.36 ± 5% +2.3 10.64 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 8.53 ± 5% +2.3 10.81 ± 6% perf-profile.calltrace.cycles-pp.open64
> 8.39 ± 5% +2.3 10.67 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 8.40 ± 5% +2.3 10.68 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
> 8.01 ± 5% +2.3 10.30 ± 6% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 7.98 ± 5% +2.3 10.27 ± 6% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
> 2.87 ± 10% +2.3 5.19 ± 11% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod
> 5.99 ± 6% +2.3 8.32 ± 7% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> 4.04 ± 7% +2.3 6.38 ± 9% perf-profile.calltrace.cycles-pp.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups
> 3.06 ± 9% +2.4 5.44 ± 11% perf-profile.calltrace.cycles-pp._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open
> 5.53 ± 6% +2.4 7.91 ± 8% perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
> 4.53 ± 7% +2.4 6.92 ± 9% perf-profile.calltrace.cycles-pp.ramfs_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
> 4.39 ± 7% +2.4 6.79 ± 9% perf-profile.calltrace.cycles-pp.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups.path_openat
> 37.47 ± 2% -3.1 34.42 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
> 36.99 ± 2% -3.0 33.94 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 27.20 ± 2% -2.2 24.96 ± 2% perf-profile.children.cycles-pp.lru_add_drain
> 27.10 ± 2% -2.2 24.88 ± 2% perf-profile.children.cycles-pp.folio_batch_move_lru
> 24.23 ± 2% -1.7 22.52 ± 2% perf-profile.children.cycles-pp.__madvise
> 23.22 ± 2% -1.7 21.53 ± 2% perf-profile.children.cycles-pp.do_madvise
> 23.24 ± 2% -1.7 21.56 ± 2% perf-profile.children.cycles-pp.__x64_sys_madvise
> 22.52 ± 2% -1.7 20.84 ± 2% perf-profile.children.cycles-pp.madvise_vma_behavior
> 20.19 ± 3% -1.6 18.56 ± 2% perf-profile.children.cycles-pp.lru_add_drain_cpu
> 17.60 -1.2 16.37 perf-profile.children.cycles-pp.do_vmi_munmap
> 17.63 -1.2 16.40 perf-profile.children.cycles-pp.__x64_sys_munmap
> 17.62 -1.2 16.39 perf-profile.children.cycles-pp.__vm_munmap
> 17.38 -1.2 16.16 perf-profile.children.cycles-pp.do_vmi_align_munmap
> 15.65 ± 2% -1.2 14.50 perf-profile.children.cycles-pp.unmap_region
> 15.04 ± 2% -1.1 13.91 ± 2% perf-profile.children.cycles-pp.zap_page_range_single
> 14.24 ± 2% -1.1 13.19 perf-profile.children.cycles-pp.folios_put_refs
> 36.59 -1.0 35.58 perf-profile.children.cycles-pp.__munmap
> 12.71 ± 2% -1.0 11.72 perf-profile.children.cycles-pp.__page_cache_release
> 7.75 -0.6 7.13 perf-profile.children.cycles-pp.tlb_finish_mmu
> 7.33 ± 2% -0.6 6.72 ± 2% perf-profile.children.cycles-pp.free_pages_and_swap_cache
> 7.44 -0.6 6.82 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
> 7.39 ± 2% -0.6 6.83 ± 2% perf-profile.children.cycles-pp.madvise_pageout
> 6.95 ± 3% -0.6 6.38 ± 2% perf-profile.children.cycles-pp.walk_p4d_range
> 7.10 ± 2% -0.6 6.54 ± 2% perf-profile.children.cycles-pp.walk_page_range
> 6.98 ± 2% -0.6 6.42 ± 2% perf-profile.children.cycles-pp.walk_pgd_range
> 6.99 ± 3% -0.6 6.43 ± 2% perf-profile.children.cycles-pp.__walk_page_range
> 6.92 ± 3% -0.6 6.36 ± 2% perf-profile.children.cycles-pp.walk_pud_range
> 6.88 ± 3% -0.6 6.32 ± 2% perf-profile.children.cycles-pp.madvise_cold_or_pageout_pte_range
> 6.90 ± 3% -0.6 6.34 ± 2% perf-profile.children.cycles-pp.walk_pmd_range
> 6.55 ± 3% -0.6 6.00 ± 2% perf-profile.children.cycles-pp.folio_isolate_lru
> 7.84 -0.5 7.30 perf-profile.children.cycles-pp.shmem_evict_inode
> 7.73 -0.5 7.18 perf-profile.children.cycles-pp.shmem_undo_range
> 6.28 ± 3% -0.5 5.75 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
> 6.30 ± 3% -0.5 5.78 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irq
> 7.50 ± 2% -0.5 7.02 perf-profile.children.cycles-pp.truncate_inode_pages_range
> 6.62 -0.2 6.47 perf-profile.children.cycles-pp.stress_fault
> 5.72 -0.1 5.60 perf-profile.children.cycles-pp.asm_exc_page_fault
> 2.28 -0.1 2.15 perf-profile.children.cycles-pp.__do_softirq
> 2.26 -0.1 2.14 perf-profile.children.cycles-pp.rcu_do_batch
> 2.26 -0.1 2.15 perf-profile.children.cycles-pp.rcu_core
> 2.12 -0.1 2.01 perf-profile.children.cycles-pp.irq_exit_rcu
> 2.00 -0.1 1.91 perf-profile.children.cycles-pp.kmem_cache_free
> 0.25 ± 2% -0.1 0.16 ± 2% perf-profile.children.cycles-pp.vfs_fallocate
> 2.34 -0.1 2.25 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 4.17 -0.1 4.08 perf-profile.children.cycles-pp.exc_page_fault
> 2.32 -0.1 2.23 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> 0.29 ± 2% -0.1 0.20 perf-profile.children.cycles-pp.__x64_sys_fallocate
> 4.14 -0.1 4.05 perf-profile.children.cycles-pp.do_user_addr_fault
> 0.42 ± 3% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.posix_fallocate64
> 3.60 -0.1 3.53 perf-profile.children.cycles-pp.handle_mm_fault
> 1.70 ± 2% -0.1 1.63 perf-profile.children.cycles-pp.alloc_inode
> 2.96 -0.1 2.90 perf-profile.children.cycles-pp.do_fault
> 0.17 ± 3% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.rw_verify_area
> 1.03 -0.0 0.99 perf-profile.children.cycles-pp.__slab_free
> 0.92 -0.0 0.88 perf-profile.children.cycles-pp.simple_write_begin
> 0.64 ± 2% -0.0 0.59 perf-profile.children.cycles-pp.inode_init_always
> 1.16 -0.0 1.12 perf-profile.children.cycles-pp.generic_perform_write
> 0.46 ± 3% -0.0 0.42 ± 2% perf-profile.children.cycles-pp.mnt_want_write
> 0.84 -0.0 0.80 perf-profile.children.cycles-pp.__filemap_get_folio
> 1.12 -0.0 1.08 perf-profile.children.cycles-pp.perf_event_mmap
> 1.08 -0.0 1.05 perf-profile.children.cycles-pp.perf_event_mmap_event
> 0.15 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.__fsnotify_parent
> 0.23 ± 3% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.may_open
> 0.58 -0.0 0.55 perf-profile.children.cycles-pp.mas_prev_slot
> 0.28 -0.0 0.26 ± 4% perf-profile.children.cycles-pp.__count_memcg_events
> 0.45 ± 2% -0.0 0.42 ± 2% perf-profile.children.cycles-pp.filemap_add_folio
> 0.18 ± 2% -0.0 0.15 ± 4% perf-profile.children.cycles-pp.security_inode_alloc
> 0.57 -0.0 0.54 perf-profile.children.cycles-pp.__cond_resched
> 0.26 -0.0 0.24 ± 2% perf-profile.children.cycles-pp.percpu_counter_add_batch
> 0.68 -0.0 0.66 perf-profile.children.cycles-pp.flush_tlb_mm_range
> 0.32 ± 2% -0.0 0.30 perf-profile.children.cycles-pp.generic_file_mmap
> 0.14 ± 3% -0.0 0.12 ± 7% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
> 0.31 ± 2% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.touch_atime
> 0.50 ± 2% -0.0 0.48 perf-profile.children.cycles-pp.mas_rev_awalk
> 0.32 -0.0 0.30 ± 2% perf-profile.children.cycles-pp.alloc_pages_mpol
> 0.22 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.shmem_alloc_folio
> 0.17 ± 2% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.fsnotify
> 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.blk_finish_plug
> 0.42 -0.0 0.40 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 0.17 ± 2% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.folio_alloc
> 0.31 -0.0 0.30 perf-profile.children.cycles-pp.mas_ascend
> 0.18 ± 2% -0.0 0.17 perf-profile.children.cycles-pp.fsnotify_grab_connector
> 0.10 ± 4% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.kfree
> 0.19 ± 2% -0.0 0.18 perf-profile.children.cycles-pp.xas_start
> 0.64 -0.0 0.62 perf-profile.children.cycles-pp.lru_add_fn
> 0.09 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.prepend_path
> 0.14 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.simple_getattr
> 0.20 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.fsnotify_destroy_marks
> 0.06 ± 6% +0.0 0.08 ± 6% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
> 0.08 ± 9% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.security_current_getsecid_subj
> 0.10 ± 7% +0.0 0.12 perf-profile.children.cycles-pp.security_file_post_open
> 0.09 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.ima_file_check
> 0.02 ± 99% +0.0 0.06 perf-profile.children.cycles-pp.__x64_sys_fcntl
> 0.55 ± 2% +0.1 0.62 ± 2% perf-profile.children.cycles-pp.inode_wait_for_writeback
> 91.01 +0.2 91.25 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 18.74 +0.2 18.98 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 90.84 +0.2 91.08 perf-profile.children.cycles-pp.do_syscall_64
> 0.00 +0.2 0.24 ± 3% perf-profile.children.cycles-pp.file_start_write_area
> 18.34 +0.2 18.59 perf-profile.children.cycles-pp.task_work_run
> 17.46 +0.3 17.72 perf-profile.children.cycles-pp.dput
> 17.25 +0.3 17.50 perf-profile.children.cycles-pp.__dentry_kill
> 18.09 +0.3 18.35 perf-profile.children.cycles-pp.__fput
> 1.98 +0.3 2.25 perf-profile.children.cycles-pp.__libc_pwrite
> 1.82 +0.3 2.10 perf-profile.children.cycles-pp.vfs_write
> 1.82 +0.3 2.10 perf-profile.children.cycles-pp.__x64_sys_pwrite64
> 8.38 +0.9 9.29 ± 2% perf-profile.children.cycles-pp.unlink
> 8.21 +0.9 9.13 ± 2% perf-profile.children.cycles-pp.__x64_sys_unlink
> 8.04 +0.9 8.96 ± 2% perf-profile.children.cycles-pp.do_unlinkat
> 21.35 +1.3 22.65 perf-profile.children.cycles-pp.evict
> 8.11 ± 5% +2.0 10.10 ± 3% perf-profile.children.cycles-pp.new_inode
> 8.36 ± 5% +2.3 10.64 ± 6% perf-profile.children.cycles-pp.do_sys_openat2
> 8.37 ± 5% +2.3 10.65 ± 6% perf-profile.children.cycles-pp.__x64_sys_openat
> 8.55 ± 5% +2.3 10.83 ± 6% perf-profile.children.cycles-pp.open64
> 7.99 ± 5% +2.3 10.28 ± 6% perf-profile.children.cycles-pp.path_openat
> 8.02 ± 5% +2.3 10.31 ± 6% perf-profile.children.cycles-pp.do_filp_open
> 6.00 ± 6% +2.3 8.33 ± 7% perf-profile.children.cycles-pp.open_last_lookups
> 5.54 ± 6% +2.4 7.92 ± 8% perf-profile.children.cycles-pp.lookup_open
> 4.54 ± 7% +2.4 6.93 ± 9% perf-profile.children.cycles-pp.ramfs_mknod
> 4.40 ± 7% +2.4 6.79 ± 9% perf-profile.children.cycles-pp.ramfs_get_inode
> 12.62 ± 6% +4.4 16.99 ± 4% perf-profile.children.cycles-pp._raw_spin_lock
I am scratching my head trying to figure out why these functions are
affected by the regressing commit, which, as far as I can see, only adds an
if (READ_ONCE(sb->s_write_srcu)) test to the write helpers,
and that test should always be false.
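To make the cost of such a test concrete, here is a minimal userspace sketch of the pattern. It is not the code from the patch: TOY_READ_ONCE, struct toy_sb and toy_write_needs_barrier() are hypothetical stand-ins, and treating s_write_srcu as a pointer that stays NULL until a watcher activates the barrier is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE(): a single volatile load.
 * Relies on the GNU __typeof__ extension (gcc/clang).
 */
#define TOY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical miniature superblock holding only the field from the thread. */
struct toy_sb {
	void *s_write_srcu;	/* assumption: NULL unless a barrier is active */
};

/*
 * Hypothetical write helper. Returns true when the slow path would run,
 * i.e. when the real helper would have to enter an SRCU read-side section.
 */
static bool toy_write_needs_barrier(struct toy_sb *sb)
{
	if (TOY_READ_ONCE(sb->s_write_srcu))
		return true;	/* slow path: barrier is active */
	return false;		/* common case: one load, one untaken branch */
}
```

With the field left NULL, the helper costs one load and one predictable branch, which is why a 2.3% regression from this change alone looks surprising.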
The only thing I can think of is that s_write_srcu is on the same cache line as
s_inode_*_lock, which impacts the performance of acquiring those spinlocks,
but this explanation seems far-fetched.
Anyway, I tried moving sb->s_write_srcu next to s_fsnotify_info and other
read-mostly sb members to see if it makes any difference.
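The cache-line hypothesis can be illustrated with a small layout check. This is only a sketch: both structs are hypothetical toy layouts, not the real struct super_block, the 128-byte filler stands in for unrelated members, and a 64-byte cache line and a cache-line-aligned object are assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64	/* assumption: 64-byte cache lines */

/* Hypothetical stand-in for a spinlock; not the kernel's spinlock_t. */
struct toy_lock { unsigned int val; };

/* Toy "before" layout: read-mostly pointer sits right behind a hot lock. */
struct toy_sb_before {
	struct toy_lock s_inode_list_lock;	/* written on inode add/remove */
	void *s_inodes;
	void *s_write_srcu;			/* read on every write */
};

/* Toy "after" layout: pointer moved next to other read-mostly members. */
struct toy_sb_after {
	struct toy_lock s_inode_list_lock;
	void *s_inodes;
	char other_members[128];		/* filler for unrelated fields */
	void *s_fsnotify_info;
	void *s_write_srcu;
};

/* True if two byte offsets into a cache-line-aligned object share a line. */
static bool same_cache_line(size_t a, size_t b)
{
	return a / TOY_CACHE_LINE == b / TOY_CACHE_LINE;
}
```

Passing offsetof() values for the lock and for s_write_srcu to same_cache_line() returns true for the "before" layout and false for the "after" one. On a real build, pahole -C super_block vmlinux shows the actual offsets and cache-line boundaries.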
I also rebased the branch on v6.10-rc1:
* 1d15ffdc12d2 - (sb_write_barrier) fanotify: introduce FAN_MARK_SYNC flag
* 5029c0cbd085 - fanotify: activate sb write barriers for pre-modify event watchers
* fda0270c803d - fs: hold s_write_srcu for pre-modify permission events on aio write
* e34d0ca5cdfd - fs: hold s_write_srcu for pre-modify permission events on write
* afdd0701bfb7 - fs: add srcu variants for mnt_{want,drop}_write() helpers
* 61d0f429d8bf - fs: implement 'vfs write barriers'
Oliver,
Can you please re-test?
Thanks,
Amir.
* Re: [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
2024-05-30 13:27 ` Amir Goldstein
@ 2024-06-03 7:56 ` Oliver Sang
2024-06-03 8:13 ` Amir Goldstein
0 siblings, 1 reply; 4+ messages in thread
From: Oliver Sang @ 2024-06-03 7:56 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
hi, Amir,
On Thu, May 30, 2024 at 04:27:57PM +0300, Amir Goldstein wrote:
> On Thu, May 23, 2024 at 5:59 AM kernel test robot <oliver.sang@intel.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:
[...]
> I am scratching my head to figure out why these functions are affected by
> the regressing commit, which as far as I can see only adds
> if (READ_ONCE(sb->s_write_srcu)) test in write helpers,
> which should always be false.
>
> The only thing I can think of is that s_write_srcu is on the same cache line as
> s_inode_*_lock, which impacts the performance of acquiring those spinlocks,
> but this explanation seems far-fetched.
>
> Anyway, I tried moving sb->s_write_srcu next to s_fsnotify_info and other
> read-mostly sb members to see if it makes any difference.
> Also rebased branch on v6.10-rc1:
>
> * 1d15ffdc12d2 - (sb_write_barrier) fanotify: introduce FAN_MARK_SYNC flag
> * 5029c0cbd085 - fanotify: activate sb write barriers for pre-modify
> event watchers
> * fda0270c803d - fs: hold s_write_srcu for pre-modify permission
> events on aio write
> * e34d0ca5cdfd - fs: hold s_write_srcu for pre-modify permission events on write
> * afdd0701bfb7 - fs: add srcu variants for mnt_{want,drop}_write() helpers
> * 61d0f429d8bf - fs: implement 'vfs write barriers'
>
> Oliver,
>
> Can you please re-test?
I compared the tip 1d15ffdc12d2 with v6.10-rc1 and found that there is no
performance difference now. (If you need the full comparison, please let me know.) Thanks!
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/fault/stress-ng/60s
       v6.10-rc1 1d15ffdc12d22e06ffa9ca34afd
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  49171337            +0.0%   49192831        stress-ng.fault.ops
    819521            +0.0%     819879        stress-ng.fault.ops_per_sec
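As a sanity check on the %change column, the sketch below computes the relative change between the base and head values and formats it to one decimal place. That the robot derives the column exactly this way is an assumption; format_change() is a hypothetical helper, not part of lkp.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change from base to head, in percent. */
static double pct_change(double base, double head)
{
	return (head - base) / base * 100.0;
}

/* Format the change in the style of the table, e.g. "+0.0%" or "-2.4%". */
static void format_change(char *buf, size_t len, double base, double head)
{
	snprintf(buf, len, "%+.1f%%", pct_change(base, head));
}
```

For stress-ng.fault.ops, (49192831 - 49171337) / 49171337 is roughly 0.04%, which rounds to the +0.0% shown. The same formula applied to the proc-vmstat.pgfault values quoted earlier in the thread (2.053e+08 and 2.003e+08) gives about -2.4%, matching that row.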
>
> Thanks,
> Amir.
* Re: [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
2024-06-03 7:56 ` Oliver Sang
@ 2024-06-03 8:13 ` Amir Goldstein
0 siblings, 0 replies; 4+ messages in thread
From: Amir Goldstein @ 2024-06-03 8:13 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Mon, Jun 3, 2024 at 10:57 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Thu, May 30, 2024 at 04:27:57PM +0300, Amir Goldstein wrote:
> > On Thu, May 23, 2024 at 5:59 AM kernel test robot <oliver.sang@intel.com> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:
>
> [...]
>
> > I am scratching my head to figure out why these functions are affected by
> > the regressing commit, which as far as I can see only adds
> > if (READ_ONCE(sb->s_write_srcu)) test in write helpers,
> > which should always be false.
> >
> > The only thing I can think of is that s_write_srcu is on the same cache line as
> > s_inode_*_lock, which impacts the performance of acquiring those spinlocks,
> > but this explanation seems far-fetched.
> >
> > Anyway, I tried moving sb->s_write_srcu next to s_fsnotify_info and other
> > read-mostly sb members to see if it makes any difference.
> > Also rebased branch on v6.10-rc1:
> >
> > * 1d15ffdc12d2 - (sb_write_barrier) fanotify: introduce FAN_MARK_SYNC flag
> > * 5029c0cbd085 - fanotify: activate sb write barriers for pre-modify
> > event watchers
> > * fda0270c803d - fs: hold s_write_srcu for pre-modify permission
> > events on aio write
> > * e34d0ca5cdfd - fs: hold s_write_srcu for pre-modify permission events on write
> > * afdd0701bfb7 - fs: add srcu variants for mnt_{want,drop}_write() helpers
> > * 61d0f429d8bf - fs: implement 'vfs write barriers'
> >
> > Oliver,
> >
> > Can you please re-test?
>
> I compared the tip 1d15ffdc12d2 with v6.10-rc1 and found that there is no
> performance difference now. (If you need the full comparison, please let me know.) Thanks!
>
Excellent, thank you!
Amir.