* [amir73il:sb_write_barrier] [fs]  8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
@ 2024-05-23  2:58 kernel test robot
  2024-05-30 13:27 ` Amir Goldstein
  0 siblings, 1 reply; 4+ messages in thread
From: kernel test robot @ 2024-05-23  2:58 UTC
  To: Amir Goldstein; +Cc: oe-lkp, lkp, oliver.sang



Hello,

kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:


commit: 8829cb6189b7a6b5283b9ffc870df13c085f1cd6 ("fs: hold s_write_srcu for pre-modify permission events on write")
https://github.com/amir73il/linux sb_write_barrier

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: fault
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405231056.66ecbb94-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240523/202405231056.66ecbb94-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/fault/stress-ng/60s

commit: 
  3f7a9d8157 ("fs: add srcu variants for mnt_{want,drop}_write() helpers")
  8829cb6189 ("fs: hold s_write_srcu for pre-modify permission events on write")

3f7a9d815783aeff 8829cb6189b7a6b5283b9ffc870 
---------------- --------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
           :6           17%           1:6     dmesg.RIP:native_queued_spin_lock_slowpath
           :6           17%           1:6     dmesg.RIP:setup_pebs_adaptive_sample_data
           :6           17%           1:6     dmesg.WARNING:at_arch/x86/events/intel/ds.c:#setup_pebs_adaptive_sample_data
         %stddev     %change         %stddev
             \          |                \  
    155.51 ± 12%     +23.3%     191.81 ± 13%  sched_debug.cfs_rq:/.util_est.stddev
      5270 ±141%    +378.6%      25225 ± 79%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.63 ±  2%      -0.0        0.59        perf-stat.i.branch-miss-rate%
      2.61 ±  2%      +3.5%       2.70        perf-stat.i.cpi
      0.40 ±  5%      -5.2%       0.38        perf-stat.i.ipc
     53250            -2.3%      52032        stress-ng.fault.minor_page_faults_per_sec
  51143720            -2.3%   49967689        stress-ng.fault.ops
    852394            -2.3%     832793        stress-ng.fault.ops_per_sec
 2.046e+08            -2.3%  1.999e+08        stress-ng.time.minor_page_faults
 1.157e+08            -2.2%  1.132e+08        proc-vmstat.numa_hit
 1.157e+08            -2.2%  1.131e+08        proc-vmstat.numa_local
  51220291            -2.4%   49995156        proc-vmstat.pgactivate
 1.377e+08            -2.1%  1.349e+08        proc-vmstat.pgalloc_normal
 2.053e+08            -2.4%  2.003e+08        proc-vmstat.pgfault
 1.368e+08            -2.2%  1.338e+08        proc-vmstat.pgfree
  51073893            -2.4%   49869748        proc-vmstat.unevictable_pgs_culled
     24.17 ±  2%      -1.7       22.46 ±  2%  perf-profile.calltrace.cycles-pp.__madvise
     23.20 ±  2%      -1.7       21.52 ±  2%  perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     23.33 ±  2%      -1.7       21.65 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
     23.24 ±  2%      -1.7       21.55 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     23.31 ±  2%      -1.7       21.62 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     22.51 ±  2%      -1.7       20.83 ±  2%  perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.38 ±  3%      -1.5       16.87 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
     18.12 ±  3%      -1.5       16.62 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
     17.63            -1.2       16.39        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     17.36            -1.2       16.14        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     17.61            -1.2       16.38        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     17.48            -1.2       16.25        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.64 ±  2%      -1.2       14.49        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     15.03 ±  2%      -1.1       13.91 ±  2%  perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
     13.49 ±  3%      -1.1       12.38 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
     13.51 ±  3%      -1.1       12.41 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     13.51 ±  3%      -1.1       12.41 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
     12.53 ±  3%      -1.0       11.50 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
     36.55            -1.0       35.54        perf-profile.calltrace.cycles-pp.__munmap
     36.27            -1.0       35.27        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     36.26            -1.0       35.26        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     12.06 ±  2%      -1.0       11.08 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
      7.33            -0.6        6.72 ±  2%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      7.10 ±  2%      -0.6        6.49 ±  2%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      7.11 ±  2%      -0.6        6.50 ±  2%  perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      7.02 ±  2%      -0.6        6.42 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      7.01 ±  2%      -0.6        6.40 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
      6.99 ±  3%      -0.6        6.42 ±  2%  perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise
      7.38 ±  2%      -0.6        6.82 ±  2%  perf-profile.calltrace.cycles-pp.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
      6.94 ±  3%      -0.6        6.38 ±  2%  perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout
      6.29 ±  2%      -0.6        5.73 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache
      6.97 ±  2%      -0.6        6.41 ±  2%  perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior
      6.38 ±  2%      -0.6        5.82 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
      6.16 ±  2%      -0.6        5.60 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain
      6.92 ±  3%      -0.6        6.36 ±  2%  perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
      7.10 ±  2%      -0.6        6.54 ±  2%  perf-profile.calltrace.cycles-pp.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      6.90 ±  3%      -0.6        6.34 ±  2%  perf-profile.calltrace.cycles-pp.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range
      6.88 ±  3%      -0.6        6.32 ±  2%  perf-profile.calltrace.cycles-pp.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
      6.97            -0.6        6.42        perf-profile.calltrace.cycles-pp.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
      6.54 ±  3%      -0.5        5.99 ±  2%  perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
      7.84            -0.5        7.29        perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dput.__fput
      7.72            -0.5        7.17        perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill.dput
      6.27 ±  3%      -0.5        5.75 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
      6.17 ±  3%      -0.5        5.65 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
      6.09 ±  3%      -0.5        5.57 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
      6.42            -0.5        5.90        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict
      6.58 ±  3%      -0.5        6.07 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap
      6.25            -0.5        5.73 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode
      6.62 ±  2%      -0.5        6.10 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      6.62 ±  3%      -0.5        6.11 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      6.18            -0.5        5.66 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range
      6.16 ±  3%      -0.5        5.67 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
      6.14 ±  2%      -0.5        5.68        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict
      6.07 ±  2%      -0.5        5.60 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range
      3.43 ±  2%      -0.3        3.17        perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill.dput
      3.75 ±  2%      -0.2        3.50        perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.__dentry_kill.dput.__fput
      3.41 ±  2%      -0.2        3.17        perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink
      3.14 ±  3%      -0.2        2.91        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill
      3.74 ±  2%      -0.2        3.50        perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
      3.13 ±  2%      -0.2        2.91        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
      0.51            -0.2        0.33 ± 70%  perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.stress_fault
      5.64            -0.1        5.50        perf-profile.calltrace.cycles-pp.stress_fault
      4.70            -0.1        4.60        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_fault
      4.16            -0.1        4.07        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_fault
      4.12            -0.1        4.03        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
      3.59            -0.1        3.52        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
      2.13            -0.1        2.08        perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.92            -0.0        0.88        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64
      0.71            -0.0        0.67        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
      1.13            -0.0        1.09        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
      0.81            -0.0        0.79        perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
      0.75            -0.0        0.73        perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.__shmem_get_inode.__shmem_file_setup.shmem_zero_setup
      0.60            -0.0        0.58        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
     18.59            +0.2       18.83        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     18.33            +0.2       18.58        perf-profile.calltrace.cycles-pp.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     17.29            +0.3       17.54        perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
     18.08            +0.3       18.34        perf-profile.calltrace.cycles-pp.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.19            +0.3       17.45        perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.task_work_run.syscall_exit_to_user_mode
      1.96            +0.3        2.23        perf-profile.calltrace.cycles-pp.__libc_pwrite
      1.84            +0.3        2.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      1.85            +0.3        2.13        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      1.82            +0.3        2.10        perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      1.78            +0.3        2.06        perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     15.98            +0.3       16.27        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.task_work_run
      8.24            +0.9        9.15 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
      8.24            +0.9        9.16 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
      8.36            +0.9        9.27 ±  2%  perf-profile.calltrace.cycles-pp.unlink
      8.04            +0.9        8.96 ±  2%  perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
      8.20            +0.9        9.13 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
      5.36            +1.0        6.37 ±  3%  perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.64 ±  6%      +1.0        4.68 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
      3.88 ±  6%      +1.1        4.94 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
      1.35 ± 11%      +1.2        2.52 ± 10%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink
      1.42 ± 10%      +1.2        2.62 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
      8.34 ±  5%      +2.3       10.62 ±  6%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      8.36 ±  5%      +2.3       10.64 ±  6%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      8.53 ±  5%      +2.3       10.81 ±  6%  perf-profile.calltrace.cycles-pp.open64
      8.39 ±  5%      +2.3       10.67 ±  6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
      8.40 ±  5%      +2.3       10.68 ±  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
      8.01 ±  5%      +2.3       10.30 ±  6%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.98 ±  5%      +2.3       10.27 ±  6%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      2.87 ± 10%      +2.3        5.19 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod
      5.99 ±  6%      +2.3        8.32 ±  7%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
      4.04 ±  7%      +2.3        6.38 ±  9%  perf-profile.calltrace.cycles-pp.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups
      3.06 ±  9%      +2.4        5.44 ± 11%  perf-profile.calltrace.cycles-pp._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open
      5.53 ±  6%      +2.4        7.91 ±  8%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      4.53 ±  7%      +2.4        6.92 ±  9%  perf-profile.calltrace.cycles-pp.ramfs_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      4.39 ±  7%      +2.4        6.79 ±  9%  perf-profile.calltrace.cycles-pp.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups.path_openat
     37.47 ±  2%      -3.1       34.42 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     36.99 ±  2%      -3.0       33.94 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     27.20 ±  2%      -2.2       24.96 ±  2%  perf-profile.children.cycles-pp.lru_add_drain
     27.10 ±  2%      -2.2       24.88 ±  2%  perf-profile.children.cycles-pp.folio_batch_move_lru
     24.23 ±  2%      -1.7       22.52 ±  2%  perf-profile.children.cycles-pp.__madvise
     23.22 ±  2%      -1.7       21.53 ±  2%  perf-profile.children.cycles-pp.do_madvise
     23.24 ±  2%      -1.7       21.56 ±  2%  perf-profile.children.cycles-pp.__x64_sys_madvise
     22.52 ±  2%      -1.7       20.84 ±  2%  perf-profile.children.cycles-pp.madvise_vma_behavior
     20.19 ±  3%      -1.6       18.56 ±  2%  perf-profile.children.cycles-pp.lru_add_drain_cpu
     17.60            -1.2       16.37        perf-profile.children.cycles-pp.do_vmi_munmap
     17.63            -1.2       16.40        perf-profile.children.cycles-pp.__x64_sys_munmap
     17.62            -1.2       16.39        perf-profile.children.cycles-pp.__vm_munmap
     17.38            -1.2       16.16        perf-profile.children.cycles-pp.do_vmi_align_munmap
     15.65 ±  2%      -1.2       14.50        perf-profile.children.cycles-pp.unmap_region
     15.04 ±  2%      -1.1       13.91 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single
     14.24 ±  2%      -1.1       13.19        perf-profile.children.cycles-pp.folios_put_refs
     36.59            -1.0       35.58        perf-profile.children.cycles-pp.__munmap
     12.71 ±  2%      -1.0       11.72        perf-profile.children.cycles-pp.__page_cache_release
      7.75            -0.6        7.13        perf-profile.children.cycles-pp.tlb_finish_mmu
      7.33 ±  2%      -0.6        6.72 ±  2%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      7.44            -0.6        6.82        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
      7.39 ±  2%      -0.6        6.83 ±  2%  perf-profile.children.cycles-pp.madvise_pageout
      6.95 ±  3%      -0.6        6.38 ±  2%  perf-profile.children.cycles-pp.walk_p4d_range
      7.10 ±  2%      -0.6        6.54 ±  2%  perf-profile.children.cycles-pp.walk_page_range
      6.98 ±  2%      -0.6        6.42 ±  2%  perf-profile.children.cycles-pp.walk_pgd_range
      6.99 ±  3%      -0.6        6.43 ±  2%  perf-profile.children.cycles-pp.__walk_page_range
      6.92 ±  3%      -0.6        6.36 ±  2%  perf-profile.children.cycles-pp.walk_pud_range
      6.88 ±  3%      -0.6        6.32 ±  2%  perf-profile.children.cycles-pp.madvise_cold_or_pageout_pte_range
      6.90 ±  3%      -0.6        6.34 ±  2%  perf-profile.children.cycles-pp.walk_pmd_range
      6.55 ±  3%      -0.6        6.00 ±  2%  perf-profile.children.cycles-pp.folio_isolate_lru
      7.84            -0.5        7.30        perf-profile.children.cycles-pp.shmem_evict_inode
      7.73            -0.5        7.18        perf-profile.children.cycles-pp.shmem_undo_range
      6.28 ±  3%      -0.5        5.75 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      6.30 ±  3%      -0.5        5.78 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      7.50 ±  2%      -0.5        7.02        perf-profile.children.cycles-pp.truncate_inode_pages_range
      6.62            -0.2        6.47        perf-profile.children.cycles-pp.stress_fault
      5.72            -0.1        5.60        perf-profile.children.cycles-pp.asm_exc_page_fault
      2.28            -0.1        2.15        perf-profile.children.cycles-pp.__do_softirq
      2.26            -0.1        2.14        perf-profile.children.cycles-pp.rcu_do_batch
      2.26            -0.1        2.15        perf-profile.children.cycles-pp.rcu_core
      2.12            -0.1        2.01        perf-profile.children.cycles-pp.irq_exit_rcu
      2.00            -0.1        1.91        perf-profile.children.cycles-pp.kmem_cache_free
      0.25 ±  2%      -0.1        0.16 ±  2%  perf-profile.children.cycles-pp.vfs_fallocate
      2.34            -0.1        2.25        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      4.17            -0.1        4.08        perf-profile.children.cycles-pp.exc_page_fault
      2.32            -0.1        2.23        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.29 ±  2%      -0.1        0.20        perf-profile.children.cycles-pp.__x64_sys_fallocate
      4.14            -0.1        4.05        perf-profile.children.cycles-pp.do_user_addr_fault
      0.42 ±  3%      -0.1        0.34 ±  2%  perf-profile.children.cycles-pp.posix_fallocate64
      3.60            -0.1        3.53        perf-profile.children.cycles-pp.handle_mm_fault
      1.70 ±  2%      -0.1        1.63        perf-profile.children.cycles-pp.alloc_inode
      2.96            -0.1        2.90        perf-profile.children.cycles-pp.do_fault
      0.17 ±  3%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.rw_verify_area
      1.03            -0.0        0.99        perf-profile.children.cycles-pp.__slab_free
      0.92            -0.0        0.88        perf-profile.children.cycles-pp.simple_write_begin
      0.64 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.inode_init_always
      1.16            -0.0        1.12        perf-profile.children.cycles-pp.generic_perform_write
      0.46 ±  3%      -0.0        0.42 ±  2%  perf-profile.children.cycles-pp.mnt_want_write
      0.84            -0.0        0.80        perf-profile.children.cycles-pp.__filemap_get_folio
      1.12            -0.0        1.08        perf-profile.children.cycles-pp.perf_event_mmap
      1.08            -0.0        1.05        perf-profile.children.cycles-pp.perf_event_mmap_event
      0.15            -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.__fsnotify_parent
      0.23 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.may_open
      0.58            -0.0        0.55        perf-profile.children.cycles-pp.mas_prev_slot
      0.28            -0.0        0.26 ±  4%  perf-profile.children.cycles-pp.__count_memcg_events
      0.45 ±  2%      -0.0        0.42 ±  2%  perf-profile.children.cycles-pp.filemap_add_folio
      0.18 ±  2%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.security_inode_alloc
      0.57            -0.0        0.54        perf-profile.children.cycles-pp.__cond_resched
      0.26            -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.68            -0.0        0.66        perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.32 ±  2%      -0.0        0.30        perf-profile.children.cycles-pp.generic_file_mmap
      0.14 ±  3%      -0.0        0.12 ±  7%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.31 ±  2%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.touch_atime
      0.50 ±  2%      -0.0        0.48        perf-profile.children.cycles-pp.mas_rev_awalk
      0.32            -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.alloc_pages_mpol
      0.22 ±  2%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.shmem_alloc_folio
      0.17 ±  2%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.fsnotify
      0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.blk_finish_plug
      0.42            -0.0        0.40        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.17 ±  2%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.folio_alloc
      0.31            -0.0        0.30        perf-profile.children.cycles-pp.mas_ascend
      0.18 ±  2%      -0.0        0.17        perf-profile.children.cycles-pp.fsnotify_grab_connector
      0.10 ±  4%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.kfree
      0.19 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.xas_start
      0.64            -0.0        0.62        perf-profile.children.cycles-pp.lru_add_fn
      0.09 ±  4%      -0.0        0.08        perf-profile.children.cycles-pp.prepend_path
      0.14 ±  3%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.simple_getattr
      0.20 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.fsnotify_destroy_marks
      0.06 ±  6%      +0.0        0.08 ±  6%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      0.08 ±  9%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.security_current_getsecid_subj
      0.10 ±  7%      +0.0        0.12        perf-profile.children.cycles-pp.security_file_post_open
      0.09 ±  6%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.ima_file_check
      0.02 ± 99%      +0.0        0.06        perf-profile.children.cycles-pp.__x64_sys_fcntl
      0.55 ±  2%      +0.1        0.62 ±  2%  perf-profile.children.cycles-pp.inode_wait_for_writeback
     91.01            +0.2       91.25        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     18.74            +0.2       18.98        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     90.84            +0.2       91.08        perf-profile.children.cycles-pp.do_syscall_64
      0.00            +0.2        0.24 ±  3%  perf-profile.children.cycles-pp.file_start_write_area
     18.34            +0.2       18.59        perf-profile.children.cycles-pp.task_work_run
     17.46            +0.3       17.72        perf-profile.children.cycles-pp.dput
     17.25            +0.3       17.50        perf-profile.children.cycles-pp.__dentry_kill
     18.09            +0.3       18.35        perf-profile.children.cycles-pp.__fput
      1.98            +0.3        2.25        perf-profile.children.cycles-pp.__libc_pwrite
      1.82            +0.3        2.10        perf-profile.children.cycles-pp.vfs_write
      1.82            +0.3        2.10        perf-profile.children.cycles-pp.__x64_sys_pwrite64
      8.38            +0.9        9.29 ±  2%  perf-profile.children.cycles-pp.unlink
      8.21            +0.9        9.13 ±  2%  perf-profile.children.cycles-pp.__x64_sys_unlink
      8.04            +0.9        8.96 ±  2%  perf-profile.children.cycles-pp.do_unlinkat
     21.35            +1.3       22.65        perf-profile.children.cycles-pp.evict
      8.11 ±  5%      +2.0       10.10 ±  3%  perf-profile.children.cycles-pp.new_inode
      8.36 ±  5%      +2.3       10.64 ±  6%  perf-profile.children.cycles-pp.do_sys_openat2
      8.37 ±  5%      +2.3       10.65 ±  6%  perf-profile.children.cycles-pp.__x64_sys_openat
      8.55 ±  5%      +2.3       10.83 ±  6%  perf-profile.children.cycles-pp.open64
      7.99 ±  5%      +2.3       10.28 ±  6%  perf-profile.children.cycles-pp.path_openat
      8.02 ±  5%      +2.3       10.31 ±  6%  perf-profile.children.cycles-pp.do_filp_open
      6.00 ±  6%      +2.3        8.33 ±  7%  perf-profile.children.cycles-pp.open_last_lookups
      5.54 ±  6%      +2.4        7.92 ±  8%  perf-profile.children.cycles-pp.lookup_open
      4.54 ±  7%      +2.4        6.93 ±  9%  perf-profile.children.cycles-pp.ramfs_mknod
      4.40 ±  7%      +2.4        6.79 ±  9%  perf-profile.children.cycles-pp.ramfs_get_inode
     12.62 ±  6%      +4.4       16.99 ±  4%  perf-profile.children.cycles-pp._raw_spin_lock
      1.00            -0.0        0.95        perf-profile.self.cycles-pp.__slab_free
      0.10 ±  4%      -0.0        0.06        perf-profile.self.cycles-pp.vfs_fallocate
      1.25            -0.0        1.21        perf-profile.self.cycles-pp.stress_fault
      0.44 ±  5%      -0.0        0.41 ±  4%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.14 ±  3%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.__fsnotify_parent
      0.26            -0.0        0.23 ±  4%  perf-profile.self.cycles-pp.__count_memcg_events
      0.25            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.17            -0.0        0.15        perf-profile.self.cycles-pp.fsnotify
      0.21            -0.0        0.19        perf-profile.self.cycles-pp.mas_prev_slot
      0.35            -0.0        0.34        perf-profile.self.cycles-pp.__cond_resched
      0.17 ±  2%      -0.0        0.16        perf-profile.self.cycles-pp.xas_start
      0.12 ±  4%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.__srcu_read_lock
      0.13            -0.0        0.12        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.09            -0.0        0.08        perf-profile.self.cycles-pp.mas_store_gfp
      0.07            -0.0        0.06        perf-profile.self.cycles-pp.unmap_region
      0.06 ±  6%      +0.0        0.07        perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      0.15 ±  3%      +0.0        0.19        perf-profile.self.cycles-pp.ramfs_get_inode
      0.12 ±  3%      +0.1        0.26 ±  2%  perf-profile.self.cycles-pp.vfs_write
      1.60 ±  2%      +0.2        1.76        perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.2        0.22 ±  3%  perf-profile.self.cycles-pp.file_start_write_area



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
  2024-05-23  2:58 [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression kernel test robot
@ 2024-05-30 13:27 ` Amir Goldstein
  2024-06-03  7:56   ` Oliver Sang
  0 siblings, 1 reply; 4+ messages in thread
From: Amir Goldstein @ 2024-05-30 13:27 UTC (permalink / raw)
  To: kernel test robot, Jan Kara; +Cc: oe-lkp, lkp

On Thu, May 23, 2024 at 5:59 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:
>
>
> commit: 8829cb6189b7a6b5283b9ffc870df13c085f1cd6 ("fs: hold s_write_srcu for pre-modify permission events on write")
> https://github.com/amir73il/linux sb_write_barrier
>
> testcase: stress-ng
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
>         nr_threads: 100%
>         testtime: 60s
>         test: fault
>         cpufreq_governor: performance
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202405231056.66ecbb94-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240523/202405231056.66ecbb94-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/fault/stress-ng/60s
>
> commit:
>   3f7a9d8157 ("fs: add srcu variants for mnt_{want,drop}_write() helpers")
>   8829cb6189 ("fs: hold s_write_srcu for pre-modify permission events on write")
>
> 3f7a9d815783aeff 8829cb6189b7a6b5283b9ffc870
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :6           17%           1:6     dmesg.RIP:native_queued_spin_lock_slowpath
>            :6           17%           1:6     dmesg.RIP:setup_pebs_adaptive_sample_data
>            :6           17%           1:6     dmesg.WARNING:at_arch/x86/events/intel/ds.c:#setup_pebs_adaptive_sample_data
>          %stddev     %change         %stddev
>              \          |                \
>     155.51 ± 12%     +23.3%     191.81 ± 13%  sched_debug.cfs_rq:/.util_est.stddev
>       5270 ±141%    +378.6%      25225 ± 79%  sched_debug.cpu.max_idle_balance_cost.stddev
>       0.63 ±  2%      -0.0        0.59        perf-stat.i.branch-miss-rate%
>       2.61 ±  2%      +3.5%       2.70        perf-stat.i.cpi
>       0.40 ±  5%      -5.2%       0.38        perf-stat.i.ipc
>      53250            -2.3%      52032        stress-ng.fault.minor_page_faults_per_sec
>   51143720            -2.3%   49967689        stress-ng.fault.ops
>     852394            -2.3%     832793        stress-ng.fault.ops_per_sec
>  2.046e+08            -2.3%  1.999e+08        stress-ng.time.minor_page_faults
>  1.157e+08            -2.2%  1.132e+08        proc-vmstat.numa_hit
>  1.157e+08            -2.2%  1.131e+08        proc-vmstat.numa_local
>   51220291            -2.4%   49995156        proc-vmstat.pgactivate
>  1.377e+08            -2.1%  1.349e+08        proc-vmstat.pgalloc_normal
>  2.053e+08            -2.4%  2.003e+08        proc-vmstat.pgfault
>  1.368e+08            -2.2%  1.338e+08        proc-vmstat.pgfree
>   51073893            -2.4%   49869748        proc-vmstat.unevictable_pgs_culled
>      24.17 ±  2%      -1.7       22.46 ±  2%  perf-profile.calltrace.cycles-pp.__madvise
>      23.20 ±  2%      -1.7       21.52 ±  2%  perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>      23.33 ±  2%      -1.7       21.65 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
>      23.24 ±  2%      -1.7       21.55 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>      23.31 ±  2%      -1.7       21.62 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>      22.51 ±  2%      -1.7       20.83 ±  2%  perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      18.38 ±  3%      -1.5       16.87 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
>      18.12 ±  3%      -1.5       16.62 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
>      17.63            -1.2       16.39        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      17.36            -1.2       16.14        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>      17.61            -1.2       16.38        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      17.48            -1.2       16.25        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      15.64 ±  2%      -1.2       14.49        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>      15.03 ±  2%      -1.1       13.91 ±  2%  perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
>      13.49 ±  3%      -1.1       12.38 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
>      13.51 ±  3%      -1.1       12.41 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>      13.51 ±  3%      -1.1       12.41 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
>      12.53 ±  3%      -1.0       11.50 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
>      36.55            -1.0       35.54        perf-profile.calltrace.cycles-pp.__munmap
>      36.27            -1.0       35.27        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>      36.26            -1.0       35.26        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      12.06 ±  2%      -1.0       11.08 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
>       7.33            -0.6        6.72 ±  2%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       7.10 ±  2%      -0.6        6.49 ±  2%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
>       7.11 ±  2%      -0.6        6.50 ±  2%  perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       7.02 ±  2%      -0.6        6.42 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
>       7.01 ±  2%      -0.6        6.40 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
>       6.99 ±  3%      -0.6        6.42 ±  2%  perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise
>       7.38 ±  2%      -0.6        6.82 ±  2%  perf-profile.calltrace.cycles-pp.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
>       6.94 ±  3%      -0.6        6.38 ±  2%  perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout
>       6.29 ±  2%      -0.6        5.73 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache
>       6.97 ±  2%      -0.6        6.41 ±  2%  perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.madvise_pageout.madvise_vma_behavior
>       6.38 ±  2%      -0.6        5.82 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
>       6.16 ±  2%      -0.6        5.60 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain
>       6.92 ±  3%      -0.6        6.36 ±  2%  perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
>       7.10 ±  2%      -0.6        6.54 ±  2%  perf-profile.calltrace.cycles-pp.walk_page_range.madvise_pageout.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       6.90 ±  3%      -0.6        6.34 ±  2%  perf-profile.calltrace.cycles-pp.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range
>       6.88 ±  3%      -0.6        6.32 ±  2%  perf-profile.calltrace.cycles-pp.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range.walk_pgd_range
>       6.97            -0.6        6.42        perf-profile.calltrace.cycles-pp.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
>       6.54 ±  3%      -0.5        5.99 ±  2%  perf-profile.calltrace.cycles-pp.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range.walk_p4d_range
>       7.84            -0.5        7.29        perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dput.__fput
>       7.72            -0.5        7.17        perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill.dput
>       6.27 ±  3%      -0.5        5.75 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range.walk_pud_range
>       6.17 ±  3%      -0.5        5.65 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range.walk_pmd_range
>       6.09 ±  3%      -0.5        5.57 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.folio_isolate_lru.madvise_cold_or_pageout_pte_range
>       6.42            -0.5        5.90        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode.evict
>       6.58 ±  3%      -0.5        6.07 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap
>       6.25            -0.5        5.73 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_evict_inode
>       6.62 ±  2%      -0.5        6.10 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       6.62 ±  3%      -0.5        6.11 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       6.18            -0.5        5.66 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range
>       6.16 ±  3%      -0.5        5.67 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
>       6.14 ±  2%      -0.5        5.68        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict
>       6.07 ±  2%      -0.5        5.60 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.truncate_inode_pages_range
>       3.43 ±  2%      -0.3        3.17        perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill.dput
>       3.75 ±  2%      -0.2        3.50        perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.__dentry_kill.dput.__fput
>       3.41 ±  2%      -0.2        3.17        perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink
>       3.14 ±  3%      -0.2        2.91        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.__dentry_kill
>       3.74 ±  2%      -0.2        3.50        perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
>       3.13 ±  2%      -0.2        2.91        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
>       0.51            -0.2        0.33 ± 70%  perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.stress_fault
>       5.64            -0.1        5.50        perf-profile.calltrace.cycles-pp.stress_fault
>       4.70            -0.1        4.60        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_fault
>       4.16            -0.1        4.07        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_fault
>       4.12            -0.1        4.03        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
>       3.59            -0.1        3.52        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_fault
>       2.13            -0.1        2.08        perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       0.92            -0.0        0.88        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64
>       0.71            -0.0        0.67        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
>       1.13            -0.0        1.09        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
>       0.81            -0.0        0.79        perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
>       0.75            -0.0        0.73        perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.__shmem_get_inode.__shmem_file_setup.shmem_zero_setup
>       0.60            -0.0        0.58        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
>      18.59            +0.2       18.83        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      18.33            +0.2       18.58        perf-profile.calltrace.cycles-pp.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      17.29            +0.3       17.54        perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64
>      18.08            +0.3       18.34        perf-profile.calltrace.cycles-pp.__fput.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      17.19            +0.3       17.45        perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.task_work_run.syscall_exit_to_user_mode
>       1.96            +0.3        2.23        perf-profile.calltrace.cycles-pp.__libc_pwrite
>       1.84            +0.3        2.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>       1.85            +0.3        2.13        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>       1.82            +0.3        2.10        perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>       1.78            +0.3        2.06        perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>      15.98            +0.3       16.27        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.task_work_run
>       8.24            +0.9        9.15 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
>       8.24            +0.9        9.16 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
>       8.36            +0.9        9.27 ±  2%  perf-profile.calltrace.cycles-pp.unlink
>       8.04            +0.9        8.96 ±  2%  perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
>       8.20            +0.9        9.13 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
>       5.36            +1.0        6.37 ±  3%  perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       3.64 ±  6%      +1.0        4.68 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
>       3.88 ±  6%      +1.1        4.94 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
>       1.35 ± 11%      +1.2        2.52 ± 10%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink
>       1.42 ± 10%      +1.2        2.62 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
>       8.34 ±  5%      +2.3       10.62 ±  6%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>       8.36 ±  5%      +2.3       10.64 ±  6%  perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>       8.53 ±  5%      +2.3       10.81 ±  6%  perf-profile.calltrace.cycles-pp.open64
>       8.39 ±  5%      +2.3       10.67 ±  6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>       8.40 ±  5%      +2.3       10.68 ±  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>       8.01 ±  5%      +2.3       10.30 ±  6%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.98 ±  5%      +2.3       10.27 ±  6%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
>       2.87 ± 10%      +2.3        5.19 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod
>       5.99 ±  6%      +2.3        8.32 ±  7%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
>       4.04 ±  7%      +2.3        6.38 ±  9%  perf-profile.calltrace.cycles-pp.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups
>       3.06 ±  9%      +2.4        5.44 ± 11%  perf-profile.calltrace.cycles-pp._raw_spin_lock.new_inode.ramfs_get_inode.ramfs_mknod.lookup_open
>       5.53 ±  6%      +2.4        7.91 ±  8%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
>       4.53 ±  7%      +2.4        6.92 ±  9%  perf-profile.calltrace.cycles-pp.ramfs_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
>       4.39 ±  7%      +2.4        6.79 ±  9%  perf-profile.calltrace.cycles-pp.ramfs_get_inode.ramfs_mknod.lookup_open.open_last_lookups.path_openat
>      37.47 ±  2%      -3.1       34.42 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
>      36.99 ±  2%      -3.0       33.94 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>      27.20 ±  2%      -2.2       24.96 ±  2%  perf-profile.children.cycles-pp.lru_add_drain
>      27.10 ±  2%      -2.2       24.88 ±  2%  perf-profile.children.cycles-pp.folio_batch_move_lru
>      24.23 ±  2%      -1.7       22.52 ±  2%  perf-profile.children.cycles-pp.__madvise
>      23.22 ±  2%      -1.7       21.53 ±  2%  perf-profile.children.cycles-pp.do_madvise
>      23.24 ±  2%      -1.7       21.56 ±  2%  perf-profile.children.cycles-pp.__x64_sys_madvise
>      22.52 ±  2%      -1.7       20.84 ±  2%  perf-profile.children.cycles-pp.madvise_vma_behavior
>      20.19 ±  3%      -1.6       18.56 ±  2%  perf-profile.children.cycles-pp.lru_add_drain_cpu
>      17.60            -1.2       16.37        perf-profile.children.cycles-pp.do_vmi_munmap
>      17.63            -1.2       16.40        perf-profile.children.cycles-pp.__x64_sys_munmap
>      17.62            -1.2       16.39        perf-profile.children.cycles-pp.__vm_munmap
>      17.38            -1.2       16.16        perf-profile.children.cycles-pp.do_vmi_align_munmap
>      15.65 ±  2%      -1.2       14.50        perf-profile.children.cycles-pp.unmap_region
>      15.04 ±  2%      -1.1       13.91 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single
>      14.24 ±  2%      -1.1       13.19        perf-profile.children.cycles-pp.folios_put_refs
>      36.59            -1.0       35.58        perf-profile.children.cycles-pp.__munmap
>      12.71 ±  2%      -1.0       11.72        perf-profile.children.cycles-pp.__page_cache_release
>       7.75            -0.6        7.13        perf-profile.children.cycles-pp.tlb_finish_mmu
>       7.33 ±  2%      -0.6        6.72 ±  2%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
>       7.44            -0.6        6.82        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
>       7.39 ±  2%      -0.6        6.83 ±  2%  perf-profile.children.cycles-pp.madvise_pageout
>       6.95 ±  3%      -0.6        6.38 ±  2%  perf-profile.children.cycles-pp.walk_p4d_range
>       7.10 ±  2%      -0.6        6.54 ±  2%  perf-profile.children.cycles-pp.walk_page_range
>       6.98 ±  2%      -0.6        6.42 ±  2%  perf-profile.children.cycles-pp.walk_pgd_range
>       6.99 ±  3%      -0.6        6.43 ±  2%  perf-profile.children.cycles-pp.__walk_page_range
>       6.92 ±  3%      -0.6        6.36 ±  2%  perf-profile.children.cycles-pp.walk_pud_range
>       6.88 ±  3%      -0.6        6.32 ±  2%  perf-profile.children.cycles-pp.madvise_cold_or_pageout_pte_range
>       6.90 ±  3%      -0.6        6.34 ±  2%  perf-profile.children.cycles-pp.walk_pmd_range
>       6.55 ±  3%      -0.6        6.00 ±  2%  perf-profile.children.cycles-pp.folio_isolate_lru
>       7.84            -0.5        7.30        perf-profile.children.cycles-pp.shmem_evict_inode
>       7.73            -0.5        7.18        perf-profile.children.cycles-pp.shmem_undo_range
>       6.28 ±  3%      -0.5        5.75 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
>       6.30 ±  3%      -0.5        5.78 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
>       7.50 ±  2%      -0.5        7.02        perf-profile.children.cycles-pp.truncate_inode_pages_range
>       6.62            -0.2        6.47        perf-profile.children.cycles-pp.stress_fault
>       5.72            -0.1        5.60        perf-profile.children.cycles-pp.asm_exc_page_fault
>       2.28            -0.1        2.15        perf-profile.children.cycles-pp.__do_softirq
>       2.26            -0.1        2.14        perf-profile.children.cycles-pp.rcu_do_batch
>       2.26            -0.1        2.15        perf-profile.children.cycles-pp.rcu_core
>       2.12            -0.1        2.01        perf-profile.children.cycles-pp.irq_exit_rcu
>       2.00            -0.1        1.91        perf-profile.children.cycles-pp.kmem_cache_free
>       0.25 ±  2%      -0.1        0.16 ±  2%  perf-profile.children.cycles-pp.vfs_fallocate
>       2.34            -0.1        2.25        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       4.17            -0.1        4.08        perf-profile.children.cycles-pp.exc_page_fault
>       2.32            -0.1        2.23        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>       0.29 ±  2%      -0.1        0.20        perf-profile.children.cycles-pp.__x64_sys_fallocate
>       4.14            -0.1        4.05        perf-profile.children.cycles-pp.do_user_addr_fault
>       0.42 ±  3%      -0.1        0.34 ±  2%  perf-profile.children.cycles-pp.posix_fallocate64
>       3.60            -0.1        3.53        perf-profile.children.cycles-pp.handle_mm_fault
>       1.70 ±  2%      -0.1        1.63        perf-profile.children.cycles-pp.alloc_inode
>       2.96            -0.1        2.90        perf-profile.children.cycles-pp.do_fault
>       0.17 ±  3%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.rw_verify_area
>       1.03            -0.0        0.99        perf-profile.children.cycles-pp.__slab_free
>       0.92            -0.0        0.88        perf-profile.children.cycles-pp.simple_write_begin
>       0.64 ±  2%      -0.0        0.59        perf-profile.children.cycles-pp.inode_init_always
>       1.16            -0.0        1.12        perf-profile.children.cycles-pp.generic_perform_write
>       0.46 ±  3%      -0.0        0.42 ±  2%  perf-profile.children.cycles-pp.mnt_want_write
>       0.84            -0.0        0.80        perf-profile.children.cycles-pp.__filemap_get_folio
>       1.12            -0.0        1.08        perf-profile.children.cycles-pp.perf_event_mmap
>       1.08            -0.0        1.05        perf-profile.children.cycles-pp.perf_event_mmap_event
>       0.15            -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.__fsnotify_parent
>       0.23 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.may_open
>       0.58            -0.0        0.55        perf-profile.children.cycles-pp.mas_prev_slot
>       0.28            -0.0        0.26 ±  4%  perf-profile.children.cycles-pp.__count_memcg_events
>       0.45 ±  2%      -0.0        0.42 ±  2%  perf-profile.children.cycles-pp.filemap_add_folio
>       0.18 ±  2%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.security_inode_alloc
>       0.57            -0.0        0.54        perf-profile.children.cycles-pp.__cond_resched
>       0.26            -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       0.68            -0.0        0.66        perf-profile.children.cycles-pp.flush_tlb_mm_range
>       0.32 ±  2%      -0.0        0.30        perf-profile.children.cycles-pp.generic_file_mmap
>       0.14 ±  3%      -0.0        0.12 ±  7%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
>       0.31 ±  2%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.touch_atime
>       0.50 ±  2%      -0.0        0.48        perf-profile.children.cycles-pp.mas_rev_awalk
>       0.32            -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.alloc_pages_mpol
>       0.22 ±  2%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.shmem_alloc_folio
>       0.17 ±  2%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.fsnotify
>       0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.blk_finish_plug
>       0.42            -0.0        0.40        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       0.17 ±  2%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.folio_alloc
>       0.31            -0.0        0.30        perf-profile.children.cycles-pp.mas_ascend
>       0.18 ±  2%      -0.0        0.17        perf-profile.children.cycles-pp.fsnotify_grab_connector
>       0.10 ±  4%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.kfree
>       0.19 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.xas_start
>       0.64            -0.0        0.62        perf-profile.children.cycles-pp.lru_add_fn
>       0.09 ±  4%      -0.0        0.08        perf-profile.children.cycles-pp.prepend_path
>       0.14 ±  3%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.simple_getattr
>       0.20 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.fsnotify_destroy_marks
>       0.06 ±  6%      +0.0        0.08 ±  6%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
>       0.08 ±  9%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.security_current_getsecid_subj
>       0.10 ±  7%      +0.0        0.12        perf-profile.children.cycles-pp.security_file_post_open
>       0.09 ±  6%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.ima_file_check
>       0.02 ± 99%      +0.0        0.06        perf-profile.children.cycles-pp.__x64_sys_fcntl
>       0.55 ±  2%      +0.1        0.62 ±  2%  perf-profile.children.cycles-pp.inode_wait_for_writeback
>      91.01            +0.2       91.25        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      18.74            +0.2       18.98        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>      90.84            +0.2       91.08        perf-profile.children.cycles-pp.do_syscall_64
>       0.00            +0.2        0.24 ±  3%  perf-profile.children.cycles-pp.file_start_write_area
>      18.34            +0.2       18.59        perf-profile.children.cycles-pp.task_work_run
>      17.46            +0.3       17.72        perf-profile.children.cycles-pp.dput
>      17.25            +0.3       17.50        perf-profile.children.cycles-pp.__dentry_kill
>      18.09            +0.3       18.35        perf-profile.children.cycles-pp.__fput
>       1.98            +0.3        2.25        perf-profile.children.cycles-pp.__libc_pwrite
>       1.82            +0.3        2.10        perf-profile.children.cycles-pp.vfs_write
>       1.82            +0.3        2.10        perf-profile.children.cycles-pp.__x64_sys_pwrite64
>       8.38            +0.9        9.29 ą  2%  perf-profile.children.cycles-pp.unlink
>       8.21            +0.9        9.13 ą  2%  perf-profile.children.cycles-pp.__x64_sys_unlink
>       8.04            +0.9        8.96 ą  2%  perf-profile.children.cycles-pp.do_unlinkat
>      21.35            +1.3       22.65        perf-profile.children.cycles-pp.evict
>       8.11 ą  5%      +2.0       10.10 ą  3%  perf-profile.children.cycles-pp.new_inode
>       8.36 ą  5%      +2.3       10.64 ą  6%  perf-profile.children.cycles-pp.do_sys_openat2
>       8.37 ą  5%      +2.3       10.65 ą  6%  perf-profile.children.cycles-pp.__x64_sys_openat
>       8.55 ą  5%      +2.3       10.83 ą  6%  perf-profile.children.cycles-pp.open64
>       7.99 ą  5%      +2.3       10.28 ą  6%  perf-profile.children.cycles-pp.path_openat
>       8.02 ą  5%      +2.3       10.31 ą  6%  perf-profile.children.cycles-pp.do_filp_open
>       6.00 ą  6%      +2.3        8.33 ą  7%  perf-profile.children.cycles-pp.open_last_lookups
>       5.54 ą  6%      +2.4        7.92 ą  8%  perf-profile.children.cycles-pp.lookup_open
>       4.54 ą  7%      +2.4        6.93 ą  9%  perf-profile.children.cycles-pp.ramfs_mknod
>       4.40 ą  7%      +2.4        6.79 ą  9%  perf-profile.children.cycles-pp.ramfs_get_inode
>      12.62 ą  6%      +4.4       16.99 ą  4%  perf-profile.children.cycles-pp._raw_spin_lock

I am scratching my head trying to figure out why these functions are affected
by the regressing commit, which, as far as I can see, only adds an
if (READ_ONCE(sb->s_write_srcu)) test in the write helpers,
and that test should always be false.

The only thing I can think of is that s_write_srcu is on the same cache line as
s_inode_*_lock, which impacts the performance of acquiring those spinlocks,
but this explanation seems far-fetched.

Anyway, I tried moving sb->s_write_srcu next to s_fsnotify_info and other
read-mostly sb members to see if it makes any difference.
I also rebased the branch on v6.10-rc1:

* 1d15ffdc12d2 - (sb_write_barrier) fanotify: introduce FAN_MARK_SYNC flag
* 5029c0cbd085 - fanotify: activate sb write barriers for pre-modify event watchers
* fda0270c803d - fs: hold s_write_srcu for pre-modify permission events on aio write
* e34d0ca5cdfd - fs: hold s_write_srcu for pre-modify permission events on write
* afdd0701bfb7 - fs: add srcu variants for mnt_{want,drop}_write() helpers
* 61d0f429d8bf - fs: implement 'vfs write barriers'

Oliver,

Can you please re-test?

Thanks,
Amir.


* Re: [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
  2024-05-30 13:27 ` Amir Goldstein
@ 2024-06-03  7:56   ` Oliver Sang
  2024-06-03  8:13     ` Amir Goldstein
  0 siblings, 1 reply; 4+ messages in thread
From: Oliver Sang @ 2024-06-03  7:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

On Thu, May 30, 2024 at 04:27:57PM +0300, Amir Goldstein wrote:
> On Thu, May 23, 2024 at 5:59 AM kernel test robot <oliver.sang@intel.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:

[...]

> I am scratching my head trying to figure out why these functions are affected
> by the regressing commit, which, as far as I can see, only adds an
> if (READ_ONCE(sb->s_write_srcu)) test in the write helpers,
> and that test should always be false.
> 
> The only thing I can think of is that s_write_srcu is on the same cache line as
> s_inode_*_lock, which impacts the performance of acquiring those spinlocks,
> but this explanation seems far-fetched.
> 
> Anyway, I tried moving sb->s_write_srcu next to s_fsnotify_info and other
> read-mostly sb members to see if it makes any difference.
> I also rebased the branch on v6.10-rc1:
> 
> * 1d15ffdc12d2 - (sb_write_barrier) fanotify: introduce FAN_MARK_SYNC flag
> * 5029c0cbd085 - fanotify: activate sb write barriers for pre-modify event watchers
> * fda0270c803d - fs: hold s_write_srcu for pre-modify permission events on aio write
> * e34d0ca5cdfd - fs: hold s_write_srcu for pre-modify permission events on write
> * afdd0701bfb7 - fs: add srcu variants for mnt_{want,drop}_write() helpers
> * 61d0f429d8bf - fs: implement 'vfs write barriers'
> 
> Oliver,
> 
> Can you please re-test?

I compared the tip 1d15ffdc12d2 with v6.10-rc1 and found no performance
difference now. (If you need the full comparison, please let me know.) Thanks!


=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/fault/stress-ng/60s

       v6.10-rc1 1d15ffdc12d22e06ffa9ca34afd
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  49171337            +0.0%   49192831        stress-ng.fault.ops
    819521            +0.0%     819879        stress-ng.fault.ops_per_sec


> 
> Thanks,
> Amir.


* Re: [amir73il:sb_write_barrier] [fs] 8829cb6189: stress-ng.fault.ops_per_sec -2.3% regression
  2024-06-03  7:56   ` Oliver Sang
@ 2024-06-03  8:13     ` Amir Goldstein
  0 siblings, 0 replies; 4+ messages in thread
From: Amir Goldstein @ 2024-06-03  8:13 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Mon, Jun 3, 2024 at 10:57 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Thu, May 30, 2024 at 04:27:57PM +0300, Amir Goldstein wrote:
> > On Thu, May 23, 2024 at 5:59 AM kernel test robot <oliver.sang@intel.com> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -2.3% regression of stress-ng.fault.ops_per_sec on:
>
> [...]
>
> > I am scratching my head trying to figure out why these functions are affected
> > by the regressing commit, which, as far as I can see, only adds an
> > if (READ_ONCE(sb->s_write_srcu)) test in the write helpers,
> > and that test should always be false.
> >
> > The only thing I can think of is that s_write_srcu is on the same cache line as
> > s_inode_*_lock, which impacts the performance of acquiring those spinlocks,
> > but this explanation seems far-fetched.
> >
> > Anyway, I tried moving sb->s_write_srcu next to s_fsnotify_info and other
> > read-mostly sb members to see if it makes any difference.
> > I also rebased the branch on v6.10-rc1:
> >
> > * 1d15ffdc12d2 - (sb_write_barrier) fanotify: introduce FAN_MARK_SYNC flag
> > * 5029c0cbd085 - fanotify: activate sb write barriers for pre-modify event watchers
> > * fda0270c803d - fs: hold s_write_srcu for pre-modify permission events on aio write
> > * e34d0ca5cdfd - fs: hold s_write_srcu for pre-modify permission events on write
> > * afdd0701bfb7 - fs: add srcu variants for mnt_{want,drop}_write() helpers
> > * 61d0f429d8bf - fs: implement 'vfs write barriers'
> >
> > Oliver,
> >
> > Can you please re-test?
>
> I compared the tip 1d15ffdc12d2 with v6.10-rc1 and found no performance
> difference now. (If you need the full comparison, please let me know.) Thanks!
>

Excellent, thank you!

Amir.
