* [amir73il:sb_write_barrier] [fanotify]  9d1fd61f1d: unixbench.throughput -7.9% regression
From: kernel test robot @ 2024-05-29  8:25 UTC
  To: Amir Goldstein; +Cc: oe-lkp, lkp, oliver.sang



Hello,

kernel test robot noticed a -7.9% regression of unixbench.throughput on:


commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
https://github.com/amir73il/linux sb_write_barrier

testcase: unixbench
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	runtime: 300s
	nr_task: 100%
	test: fsbuffer-w
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit: 
  00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
  9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")

00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  1.23e+08            -7.9%  1.133e+08        unixbench.throughput
      6169            -7.7%       5694        unixbench.time.user_time
 4.566e+10            -7.9%  4.206e+10        unixbench.workload
 1.513e+11            -4.5%  1.445e+11        perf-stat.i.branch-instructions
   6891152            +4.8%    7221484        perf-stat.i.branch-misses
  29764445 ±  2%      -7.4%   27565609 ±  3%  perf-stat.i.cache-references
      0.91            +2.0%       0.93        perf-stat.i.cpi
 7.187e+11            -2.7%  6.996e+11        perf-stat.i.instructions
      1.26            -2.6%       1.23        perf-stat.i.ipc
      0.00            +0.0        0.01        perf-stat.overall.branch-miss-rate%
      0.73            +2.7%       0.75        perf-stat.overall.cpi
      1.37            -2.6%       1.34        perf-stat.overall.ipc
      5828            +5.7%       6162        perf-stat.overall.path-length
 1.505e+11            -4.5%  1.437e+11        perf-stat.ps.branch-instructions
   6873687            +4.8%    7203107        perf-stat.ps.branch-misses
  29721957 ±  2%      -7.3%   27538369 ±  3%  perf-stat.ps.cache-references
 7.148e+11            -2.6%   6.96e+11        perf-stat.ps.instructions
 2.662e+14            -2.6%  2.592e+14        perf-stat.total.instructions
     57.79            -2.0       55.78        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.58            -2.0       35.63        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
     13.06            -1.0       12.04        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
     13.81            -1.0       12.83        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.72            -0.9       11.78        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
      7.00            -0.5        6.47        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      6.53            -0.5        6.02        perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      5.36            -0.5        4.89        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      3.66            -0.4        3.28        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      2.68            -0.3        2.36        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
      6.57            -0.2        6.34        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      2.36 ±  2%      -0.2        2.18 ±  2%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.83            -0.2        1.66        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      2.92            -0.2        2.76        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      2.65            -0.2        2.49        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
      3.95            -0.1        3.83        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      1.62            -0.1        1.50        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.74            -0.1        0.64        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.26            -0.1        3.17        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
      3.57            -0.1        3.49        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.61            -0.1        1.53        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.93            -0.1        0.85        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      1.05            -0.1        0.99        perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
      0.61            -0.1        0.55        perf-profile.calltrace.cycles-pp.w_test
      0.64            -0.1        0.58        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.87            -0.1        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
      2.50            -0.1        2.44        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
      0.62            -0.1        0.56        perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
      0.74            -0.0        0.69        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      0.91            -0.0        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.84            -0.0        0.79        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      0.62            -0.0        0.59        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.97            +0.0        1.00        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      0.91            +0.1        0.97        perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      0.86 ±  3%      +0.1        0.94        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
      0.58 ±  2%      +0.1        0.66 ±  7%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
     11.24            +0.1       11.36        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      2.01 ±  2%      +0.1        2.14        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      6.04            +0.2        6.24        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      5.17            +0.2        5.42        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
     96.75            +0.3       97.03        perf-profile.calltrace.cycles-pp.write
      2.57            +0.4        2.92        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
      3.20            +0.4        3.57        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
     84.82            +1.1       85.88        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     83.38            +1.2       84.56        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     78.73            +1.5       80.20        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     74.54            +1.8       76.32        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.00            +4.0        3.99        perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      5.32            +4.2        9.48        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     58.42            -2.0       56.38        perf-profile.children.cycles-pp.generic_file_write_iter
     38.46            -2.0       36.50        perf-profile.children.cycles-pp.generic_perform_write
     13.99            -1.0       13.01        perf-profile.children.cycles-pp.simple_write_begin
     13.11            -1.0       12.15        perf-profile.children.cycles-pp.__filemap_get_folio
      7.23            -0.6        6.66        perf-profile.children.cycles-pp.entry_SYSCALL_64
      7.12            -0.5        6.59        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      6.73            -0.5        6.21        perf-profile.children.cycles-pp.filemap_get_entry
      5.76            -0.5        5.26        perf-profile.children.cycles-pp.simple_write_end
      4.05            -0.4        3.64        perf-profile.children.cycles-pp.security_file_permission
      2.93            -0.3        2.59        perf-profile.children.cycles-pp.apparmor_file_permission
      4.32            -0.3        4.04        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      4.20            -0.3        3.92        perf-profile.children.cycles-pp.__cond_resched
      6.91            -0.2        6.67        perf-profile.children.cycles-pp.file_remove_privs_flags
      2.43            -0.2        2.24        perf-profile.children.cycles-pp.rcu_all_qs
      3.10            -0.2        2.92        perf-profile.children.cycles-pp.xas_load
      2.47 ±  2%      -0.2        2.29 ±  2%  perf-profile.children.cycles-pp.__fdget_pos
      1.92            -0.2        1.74        perf-profile.children.cycles-pp.folio_unlock
      3.11            -0.2        2.94        perf-profile.children.cycles-pp.down_write
      4.18            -0.1        4.04        perf-profile.children.cycles-pp.security_inode_need_killpriv
      1.68            -0.1        1.56        perf-profile.children.cycles-pp.up_write
      3.48            -0.1        3.38        perf-profile.children.cycles-pp.cap_inode_need_killpriv
      1.96            -0.1        1.87        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.28            -0.1        1.18        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.92            -0.1        0.84        perf-profile.children.cycles-pp.w_test
      3.14            -0.1        3.06        perf-profile.children.cycles-pp.__vfs_getxattr
      1.00            -0.1        0.92        perf-profile.children.cycles-pp.aa_file_perm
      1.29            -0.1        1.22        perf-profile.children.cycles-pp.xas_descend
      0.76            -0.1        0.70        perf-profile.children.cycles-pp.x64_sys_call
      0.87            -0.1        0.80        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
      1.07            -0.1        1.01        perf-profile.children.cycles-pp.xattr_resolve_name
      1.10            -0.1        1.04        perf-profile.children.cycles-pp.folio_wait_stable
      1.05            -0.1        1.00        perf-profile.children.cycles-pp.folio_mapping
      0.73            -0.1        0.67        perf-profile.children.cycles-pp.xas_start
      0.93            -0.1        0.88        perf-profile.children.cycles-pp.folio_mark_dirty
      0.50            -0.0        0.46        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.60            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
      0.43            -0.0        0.39        perf-profile.children.cycles-pp.write@plt
      0.36            -0.0        0.33        perf-profile.children.cycles-pp.amd_clear_divider
      0.37            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
      0.33            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
      0.36            -0.0        0.34        perf-profile.children.cycles-pp.is_bad_inode
      0.24            -0.0        0.23 ±  2%  perf-profile.children.cycles-pp.file_remove_privs
      1.18            +0.0        1.21        perf-profile.children.cycles-pp.strcmp
      1.02            +0.1        1.08        perf-profile.children.cycles-pp.timestamp_truncate
     99.01            +0.1       99.09        perf-profile.children.cycles-pp.write
      0.98 ±  3%      +0.1        1.06        perf-profile.children.cycles-pp.generic_write_check_limits
      0.68 ±  2%      +0.1        0.77 ±  6%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
     11.58            +0.1       11.69        perf-profile.children.cycles-pp.__generic_file_write_iter
      2.36 ±  2%      +0.1        2.50        perf-profile.children.cycles-pp.generic_write_checks
      5.57            +0.2        5.75        perf-profile.children.cycles-pp.fault_in_readable
      6.28            +0.2        6.49        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
      2.98            +0.4        3.33        perf-profile.children.cycles-pp.inode_needs_update_time
      3.51            +0.4        3.89        perf-profile.children.cycles-pp.file_update_time
     85.24            +1.1       86.31        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     84.05            +1.2       85.21        perf-profile.children.cycles-pp.do_syscall_64
     79.32            +1.5       80.78        perf-profile.children.cycles-pp.ksys_write
     75.49            +1.7       77.21        perf-profile.children.cycles-pp.vfs_write
      3.64            +4.0        7.64        perf-profile.children.cycles-pp.__fsnotify_parent
      5.68            +4.3       10.03        perf-profile.children.cycles-pp.rw_verify_area
      6.96            -0.5        6.44        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      6.52            -0.5        6.01        perf-profile.self.cycles-pp.write
      6.92            -0.4        6.48        perf-profile.self.cycles-pp.vfs_write
      3.59            -0.3        3.24        perf-profile.self.cycles-pp.filemap_get_entry
      4.41            -0.3        4.09        perf-profile.self.cycles-pp.__filemap_get_folio
      4.23            -0.3        3.95        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      2.79            -0.3        2.52        perf-profile.self.cycles-pp.simple_write_end
      1.76            -0.2        1.52        perf-profile.self.cycles-pp.apparmor_file_permission
      2.32 ±  2%      -0.2        2.16 ±  2%  perf-profile.self.cycles-pp.__fdget_pos
      1.79            -0.2        1.62        perf-profile.self.cycles-pp.folio_unlock
      2.05            -0.2        1.89        perf-profile.self.cycles-pp.down_write
      2.35            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
      1.89            -0.1        1.77        perf-profile.self.cycles-pp.do_syscall_64
      1.38            -0.1        1.26        perf-profile.self.cycles-pp.entry_SYSCALL_64
      1.56            -0.1        1.45        perf-profile.self.cycles-pp.up_write
      1.30            -0.1        1.19        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.42            -0.1        1.31        perf-profile.self.cycles-pp.rcu_all_qs
      1.12            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
      1.46            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
      0.90            -0.1        0.83        perf-profile.self.cycles-pp.aa_file_perm
      1.29            -0.1        1.22        perf-profile.self.cycles-pp.xas_load
      0.74            -0.1        0.67        perf-profile.self.cycles-pp.w_test
      1.08            -0.1        1.01        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.98            -0.1        1.92        perf-profile.self.cycles-pp.file_remove_privs_flags
      1.30            -0.1        1.24        perf-profile.self.cycles-pp.__vfs_getxattr
      1.06            -0.1        1.00        perf-profile.self.cycles-pp.xas_descend
      0.80            -0.1        0.74        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.63            -0.1        0.58        perf-profile.self.cycles-pp.x64_sys_call
      0.74            -0.1        0.69        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
      0.63            -0.0        0.58        perf-profile.self.cycles-pp.xas_start
      0.87            -0.0        0.83        perf-profile.self.cycles-pp.folio_mapping
      0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.60            -0.0        0.57        perf-profile.self.cycles-pp.xattr_resolve_name
      0.48            -0.0        0.44        perf-profile.self.cycles-pp.folio_mark_dirty
      0.68            -0.0        0.65        perf-profile.self.cycles-pp.security_inode_need_killpriv
      0.36            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.inode_to_bdi
      0.52            -0.0        0.49        perf-profile.self.cycles-pp.folio_wait_stable
      0.34            -0.0        0.32        perf-profile.self.cycles-pp.cap_inode_need_killpriv
      0.89            -0.0        0.87        perf-profile.self.cycles-pp.simple_write_begin
      0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
      0.23 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.amd_clear_divider
      0.23 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
      0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.write@plt
      0.24            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.is_bad_inode
      0.62            +0.0        0.65        perf-profile.self.cycles-pp.file_update_time
      0.86            +0.0        0.90        perf-profile.self.cycles-pp.strcmp
      0.69            +0.0        0.74        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
      0.75 ±  3%      +0.1        0.81        perf-profile.self.cycles-pp.generic_write_check_limits
      1.42 ±  2%      +0.1        1.48        perf-profile.self.cycles-pp.generic_write_checks
      0.82            +0.1        0.89        perf-profile.self.cycles-pp.timestamp_truncate
      0.58 ±  3%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      5.44            +0.2        5.60        perf-profile.self.cycles-pp.fault_in_readable
      1.36            +0.2        1.55        perf-profile.self.cycles-pp.inode_needs_update_time
      1.76 ±  3%      +0.9        2.64        perf-profile.self.cycles-pp.rw_verify_area
      3.46            +3.8        7.25        perf-profile.self.cycles-pp.__fsnotify_parent
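
As a cross-check of the summary rows above, the relative changes can be recomputed from the rounded values in the table. The sketch below is plain userspace C and is not part of the lkp tooling; the helper names are made up for illustration, and it assumes that perf-stat.overall.path-length is total instructions divided by unixbench.workload, which the reported figures are consistent with.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from the base commit to the head commit, in percent. */
double pct_change(double base, double head)
{
    assert(base != 0.0);
    return (head - base) / base * 100.0;
}

/* Assumed definition: instructions retired per unit of benchmark work. */
double path_length(double total_instructions, double workload)
{
    assert(workload != 0.0);
    return total_instructions / workload;
}

/* Print the recomputed figures using the rounded values from the report. */
void print_recomputed(void)
{
    printf("throughput change: %.1f%%\n", pct_change(1.23e8, 1.133e8));
    printf("path-length base:  %.0f\n", path_length(2.662e14, 4.566e10));
    printf("path-length head:  %.0f\n", path_length(2.592e14, 4.206e10));
}
```

With the rounded table values this gives a throughput change of about -7.9% and path lengths of roughly 5830 and 6163, close to the reported 5828 and 6162; the small gap presumably comes from rounding in the printed inputs.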




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
From: Amir Goldstein @ 2024-05-29 11:17 UTC
  To: Jan Kara, oe-lkp; +Cc: lkp, kernel test robot

On Wed, May 29, 2024 at 11:26 AM kernel test robot
<oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -7.9% regression of unixbench.throughput on:
>
>
> commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> https://github.com/amir73il/linux sb_write_barrier
>

Jan,

I speculate that the regression comes from storing the path information in a
struct file_range on the stack and passing it down before the optimizations
in fsnotify_parent() get a chance to bail out, so rw_verify_area() pays some
price for the stores and __fsnotify_parent() pays a bigger price for the
fetches.

Luckily, we already have a way to check for this:
fsnotify_sb_has_priority_watchers(inode->i_sb, FSNOTIFY_PRIO_PRE_CONTENT),
so I now use it to optimize out the fsnotify_file_range() inline code
entirely when the superblock has no pre-content watchers.
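
The idea can be sketched in userspace C as follows. This is only a simplified model under stated assumptions: the *_model types, the counter array, and notify_file_range() are hypothetical stand-ins, not the kernel's definitions, and the real fsnotify_sb_has_priority_watchers() may be implemented differently. It shows the range structure being built on the stack only after the cheap watcher check passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical priority classes, loosely modeled on FSNOTIFY_PRIO_*. */
enum prio_model { PRIO_NORMAL, PRIO_CONTENT, PRIO_PRE_CONTENT, PRIO_COUNT };

/* Hypothetical stand-in for the per-superblock watcher bookkeeping. */
struct sb_model {
    long watchers[PRIO_COUNT];
};

/* Hypothetical stand-in for struct file_range. */
struct file_range_model {
    const void *path;
    long long pos;
    size_t count;
};

int slow_path_calls; /* counts how often the expensive path ran */

/* Cheap check: one load and compare, no stack setup. */
bool sb_has_priority_watchers(const struct sb_model *sb, enum prio_model prio)
{
    return sb->watchers[prio] > 0;
}

/* Expensive path: consumes the range (stands in for __fsnotify_parent()). */
int notify_slow(const struct file_range_model *range)
{
    assert(range != NULL);
    slow_path_calls++;
    return 0;
}

int notify_file_range(const struct sb_model *sb, const void *path,
                      long long pos, size_t count)
{
    /* Fast path: nobody listens for pre-content events on this sb. */
    if (!sb_has_priority_watchers(sb, PRIO_PRE_CONTENT))
        return 0;

    /* Only now pay for the stores into the on-stack range. */
    struct file_range_model range = { .path = path, .pos = pos, .count = count };
    return notify_slow(&range);
}
```

In this model, a write on a superblock without pre-content watchers returns before any field of the range is stored, which is the cost the profile attributes to rw_verify_area() and __fsnotify_parent().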

Oliver,

Can you please re-test with the fixed branch (also rebased on v6.10-rc1):

* a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events
* f301cd18006c - fanotify: rename a misnamed constant
* 64108c0b47db - fanotify: pass optional file access range in pre-content event
* 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
* 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
* 83af0c89527a - fsnotify: generate pre-content permission event on exec
* aca408421327 - fsnotify: generate pre-content permission event on open
* 93656e196b00 - fsnotify: introduce pre-content permission event

The optimization was done in the first commit (fsnotify: introduce
pre-content permission event), but it affects the regressing commit
(fanotify: pass optional file access range in pre-content event),
so there is no need to test all of the intermediate commits.

Thanks,
Amir.




>      11.24            +0.1       11.36        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       2.01 ±  2%      +0.1        2.14        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       6.04            +0.2        6.24        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       5.17            +0.2        5.42        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
>      96.75            +0.3       97.03        perf-profile.calltrace.cycles-pp.write
>       2.57            +0.4        2.92        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       3.20            +0.4        3.57        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>      84.82            +1.1       85.88        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
>      83.38            +1.2       84.56        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      78.73            +1.5       80.20        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      74.54            +1.8       76.32        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.00            +4.0        3.99        perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
>       5.32            +4.2        9.48        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      58.42            -2.0       56.38        perf-profile.children.cycles-pp.generic_file_write_iter
>      38.46            -2.0       36.50        perf-profile.children.cycles-pp.generic_perform_write
>      13.99            -1.0       13.01        perf-profile.children.cycles-pp.simple_write_begin
>      13.11            -1.0       12.15        perf-profile.children.cycles-pp.__filemap_get_folio
>       7.23            -0.6        6.66        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       7.12            -0.5        6.59        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
>       6.73            -0.5        6.21        perf-profile.children.cycles-pp.filemap_get_entry
>       5.76            -0.5        5.26        perf-profile.children.cycles-pp.simple_write_end
>       4.05            -0.4        3.64        perf-profile.children.cycles-pp.security_file_permission
>       2.93            -0.3        2.59        perf-profile.children.cycles-pp.apparmor_file_permission
>       4.32            -0.3        4.04        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       4.20            -0.3        3.92        perf-profile.children.cycles-pp.__cond_resched
>       6.91            -0.2        6.67        perf-profile.children.cycles-pp.file_remove_privs_flags
>       2.43            -0.2        2.24        perf-profile.children.cycles-pp.rcu_all_qs
>       3.10            -0.2        2.92        perf-profile.children.cycles-pp.xas_load
>       2.47 ±  2%      -0.2        2.29 ±  2%  perf-profile.children.cycles-pp.__fdget_pos
>       1.92            -0.2        1.74        perf-profile.children.cycles-pp.folio_unlock
>       3.11            -0.2        2.94        perf-profile.children.cycles-pp.down_write
>       4.18            -0.1        4.04        perf-profile.children.cycles-pp.security_inode_need_killpriv
>       1.68            -0.1        1.56        perf-profile.children.cycles-pp.up_write
>       3.48            -0.1        3.38        perf-profile.children.cycles-pp.cap_inode_need_killpriv
>       1.96            -0.1        1.87        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       1.28            -0.1        1.18        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.92            -0.1        0.84        perf-profile.children.cycles-pp.w_test
>       3.14            -0.1        3.06        perf-profile.children.cycles-pp.__vfs_getxattr
>       1.00            -0.1        0.92        perf-profile.children.cycles-pp.aa_file_perm
>       1.29            -0.1        1.22        perf-profile.children.cycles-pp.xas_descend
>       0.76            -0.1        0.70        perf-profile.children.cycles-pp.x64_sys_call
>       0.87            -0.1        0.80        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
>       1.07            -0.1        1.01        perf-profile.children.cycles-pp.xattr_resolve_name
>       1.10            -0.1        1.04        perf-profile.children.cycles-pp.folio_wait_stable
>       1.05            -0.1        1.00        perf-profile.children.cycles-pp.folio_mapping
>       0.73            -0.1        0.67        perf-profile.children.cycles-pp.xas_start
>       0.93            -0.1        0.88        perf-profile.children.cycles-pp.folio_mark_dirty
>       0.50            -0.0        0.46        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.60            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
>       0.43            -0.0        0.39        perf-profile.children.cycles-pp.write@plt
>       0.36            -0.0        0.33        perf-profile.children.cycles-pp.amd_clear_divider
>       0.37            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
>       0.33            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
>       0.36            -0.0        0.34        perf-profile.children.cycles-pp.is_bad_inode
>       0.24            -0.0        0.23 ±  2%  perf-profile.children.cycles-pp.file_remove_privs
>       1.18            +0.0        1.21        perf-profile.children.cycles-pp.strcmp
>       1.02            +0.1        1.08        perf-profile.children.cycles-pp.timestamp_truncate
>      99.01            +0.1       99.09        perf-profile.children.cycles-pp.write
>       0.98 ±  3%      +0.1        1.06        perf-profile.children.cycles-pp.generic_write_check_limits
>       0.68 ±  2%      +0.1        0.77 ±  6%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>      11.58            +0.1       11.69        perf-profile.children.cycles-pp.__generic_file_write_iter
>       2.36 ±  2%      +0.1        2.50        perf-profile.children.cycles-pp.generic_write_checks
>       5.57            +0.2        5.75        perf-profile.children.cycles-pp.fault_in_readable
>       6.28            +0.2        6.49        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
>       2.98            +0.4        3.33        perf-profile.children.cycles-pp.inode_needs_update_time
>       3.51            +0.4        3.89        perf-profile.children.cycles-pp.file_update_time
>      85.24            +1.1       86.31        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      84.05            +1.2       85.21        perf-profile.children.cycles-pp.do_syscall_64
>      79.32            +1.5       80.78        perf-profile.children.cycles-pp.ksys_write
>      75.49            +1.7       77.21        perf-profile.children.cycles-pp.vfs_write
>       3.64            +4.0        7.64        perf-profile.children.cycles-pp.__fsnotify_parent
>       5.68            +4.3       10.03        perf-profile.children.cycles-pp.rw_verify_area
>       6.96            -0.5        6.44        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
>       6.52            -0.5        6.01        perf-profile.self.cycles-pp.write
>       6.92            -0.4        6.48        perf-profile.self.cycles-pp.vfs_write
>       3.59            -0.3        3.24        perf-profile.self.cycles-pp.filemap_get_entry
>       4.41            -0.3        4.09        perf-profile.self.cycles-pp.__filemap_get_folio
>       4.23            -0.3        3.95        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       2.79            -0.3        2.52        perf-profile.self.cycles-pp.simple_write_end
>       1.76            -0.2        1.52        perf-profile.self.cycles-pp.apparmor_file_permission
>       2.32 ±  2%      -0.2        2.16 ±  2%  perf-profile.self.cycles-pp.__fdget_pos
>       1.79            -0.2        1.62        perf-profile.self.cycles-pp.folio_unlock
>       2.05            -0.2        1.89        perf-profile.self.cycles-pp.down_write
>       2.35            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
>       1.89            -0.1        1.77        perf-profile.self.cycles-pp.do_syscall_64
>       1.38            -0.1        1.26        perf-profile.self.cycles-pp.entry_SYSCALL_64
>       1.56            -0.1        1.45        perf-profile.self.cycles-pp.up_write
>       1.30            -0.1        1.19        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       1.42            -0.1        1.31        perf-profile.self.cycles-pp.rcu_all_qs
>       1.12            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
>       1.46            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
>       0.90            -0.1        0.83        perf-profile.self.cycles-pp.aa_file_perm
>       1.29            -0.1        1.22        perf-profile.self.cycles-pp.xas_load
>       0.74            -0.1        0.67        perf-profile.self.cycles-pp.w_test
>       1.08            -0.1        1.01        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>       1.98            -0.1        1.92        perf-profile.self.cycles-pp.file_remove_privs_flags
>       1.30            -0.1        1.24        perf-profile.self.cycles-pp.__vfs_getxattr
>       1.06            -0.1        1.00        perf-profile.self.cycles-pp.xas_descend
>       0.80            -0.1        0.74        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.63            -0.1        0.58        perf-profile.self.cycles-pp.x64_sys_call
>       0.74            -0.1        0.69        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
>       0.63            -0.0        0.58        perf-profile.self.cycles-pp.xas_start
>       0.87            -0.0        0.83        perf-profile.self.cycles-pp.folio_mapping
>       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.60            -0.0        0.57        perf-profile.self.cycles-pp.xattr_resolve_name
>       0.48            -0.0        0.44        perf-profile.self.cycles-pp.folio_mark_dirty
>       0.68            -0.0        0.65        perf-profile.self.cycles-pp.security_inode_need_killpriv
>       0.36            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.inode_to_bdi
>       0.52            -0.0        0.49        perf-profile.self.cycles-pp.folio_wait_stable
>       0.34            -0.0        0.32        perf-profile.self.cycles-pp.cap_inode_need_killpriv
>       0.89            -0.0        0.87        perf-profile.self.cycles-pp.simple_write_begin
>       0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
>       0.23 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.amd_clear_divider
>       0.23 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
>       0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.write@plt
>       0.24            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.is_bad_inode
>       0.62            +0.0        0.65        perf-profile.self.cycles-pp.file_update_time
>       0.86            +0.0        0.90        perf-profile.self.cycles-pp.strcmp
>       0.69            +0.0        0.74        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
>       0.75 ±  3%      +0.1        0.81        perf-profile.self.cycles-pp.generic_write_check_limits
>       1.42 ±  2%      +0.1        1.48        perf-profile.self.cycles-pp.generic_write_checks
>       0.82            +0.1        0.89        perf-profile.self.cycles-pp.timestamp_truncate
>       0.58 ±  3%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>       5.44            +0.2        5.60        perf-profile.self.cycles-pp.fault_in_readable
>       1.36            +0.2        1.55        perf-profile.self.cycles-pp.inode_needs_update_time
>       1.76 ±  3%      +0.9        2.64        perf-profile.self.cycles-pp.rw_verify_area
>       3.46            +3.8        7.25        perf-profile.self.cycles-pp.__fsnotify_parent
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-05-29 11:17 ` Amir Goldstein
@ 2024-05-31  3:15   ` Oliver Sang
  2024-05-31  5:18     ` Amir Goldstein
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-05-31  3:15 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

Hi Amir,

On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> On Wed, May 29, 2024 at 11:26 AM kernel test robot
> <oliver.sang@intel.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> >
> >
> > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > https://github.com/amir73il/linux sb_write_barrier
> >
> 
> Jan,
> 
> I speculate that the regression is due to the fact that we store and pass the
> path information on struct file_range on the stack before the optimizations
> in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> and __fsnotify_parent() pays a bigger price for fetches?
> 
> Luckily, we already have a way to check
> fsnotify_sb_has_priority_watchers(inode->i_sb, FSNOTIFY_PRIO_PRE_CONTENT),
> so now I used it to optimize out the fsnotify_file_range() inline
> code entirely.
> 
> Oliver,
> 
> Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> 
> * a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events
> * f301cd18006c - fanotify: rename a misnamed constant
> * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> * aca408421327 - fsnotify: generate pre-content permission event on open
> * 93656e196b00 - fsnotify: introduce pre-content permission event
> 
> The optimization was done in the first commit (fsnotify: introduce
> pre-content permission event), but it affects the regressing commit
> (fanotify: pass optional file access range in pre-content event).
> There is no need to test all the middle commits.
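
The idea in the quoted paragraph (check for watchers first, and only then build the range structure on the stack) can be pictured with a small userspace sketch. This is not the kernel code: `sb_model`, `range_model`, `notify_slow()` and `notify_range()` are hypothetical stand-ins, and only the ordering of the cheap check before the stack stores is meant to mirror what the mail describes for fsnotify_sb_has_priority_watchers() and fsnotify_file_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real definitions. */
struct sb_model {
	long prio_watchers;	/* number of pre-content watchers on this "sb" */
};

struct range_model {
	const void *path;
	long long pos;
	size_t count;
};

static int slow_path_calls;	/* counts how often the expensive path runs */

/* Models the out-of-line notification call that reads the range fields. */
static int notify_slow(const struct range_model *range)
{
	assert(range != NULL);
	slow_path_calls++;
	return 0;
}

/* Cheap test, analogous in spirit to the watcher check named in the mail. */
static bool sb_has_watchers(const struct sb_model *sb)
{
	return sb->prio_watchers > 0;
}

/* Inline wrapper: bail out before any range fields are stored. */
static int notify_range(const struct sb_model *sb, const void *path,
			long long pos, size_t count)
{
	if (!sb_has_watchers(sb))
		return 0;

	/* Only reached when someone is watching. */
	struct range_model range = { .path = path, .pos = pos, .count = count };

	return notify_slow(&range);
}
```

With no watchers registered, `notify_range()` returns before `range` is initialized, so the common write path pays only for one load and one branch in this model.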

I compared the tip directly against v6.10-rc1. There is still a regression, but it is smaller now:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  a82fd282befc7 ("fanotify: report file range info with pre-content events")

       v6.10-rc1 a82fd282befc71d99106bf31066
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.216e+08            -3.9%  1.168e+08        unixbench.throughput

Full data is below [1].


Then I checked the new commit 64108c0b47db ("fanotify: pass optional file access range in pre-content event").

It also shows a small regression compared to its parent, but again a smaller one than before:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
  64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")

94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.163e+08            -2.4%  1.135e+08        unixbench.throughput

Full data is below [2].
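
The %change values in the two throughput summaries above are consistent with a plain relative difference between the two means, (patched - base) / base. A minimal sketch of that arithmetic follows; `pct_change()` is a hypothetical helper name, and the inputs are the rounded values printed in the tables, so the results only match to the displayed precision.

```c
#include <assert.h>

/*
 * Relative change of 'patched' against 'base', in percent.
 * Hypothetical helper for illustration; not part of lkp-tests.
 */
static double pct_change(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}
```

Calling `pct_change(1.216e8, 1.168e8)` gives about -3.9, and `pct_change(1.163e8, 1.135e8)` gives about -2.4, matching the unixbench.throughput rows. The perf-profile rows are different: their middle column is an absolute difference in percentage points (for example 4.56 to 5.33 is shown as +0.8), not a relative change.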


[1]

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  a82fd282befc7 ("fanotify: report file range info with pre-content events")

       v6.10-rc1 a82fd282befc71d99106bf31066
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      1614 ±  6%    +252.4%       5688 ± 67%  numa-vmstat.node1.nr_mapped
      6199            -5.8%       5841        time.user_time
    220234 ± 13%    +121.4%     487546 ± 41%  numa-meminfo.node0.AnonPages.max
    836146 ±  6%     -36.0%     535267 ± 45%  numa-meminfo.node1.AnonPages.max
      6233 ±  7%    +251.3%      21898 ± 69%  numa-meminfo.node1.Mapped
 1.216e+08            -3.9%  1.168e+08        unixbench.throughput
      6199            -5.8%       5841        unixbench.time.user_time
 4.513e+10            -3.9%  4.338e+10        unixbench.workload
 1.458e+11            -2.7%  1.419e+11        perf-stat.i.branch-instructions
     11.47 ±  6%      +2.6       14.10 ±  9%  perf-stat.i.cache-miss-rate%
   3915539 ±  8%    +510.0%   23884093 ±  9%  perf-stat.i.cache-misses
  32425619 ±  3%    +396.4%   1.61e+08 ±  4%  perf-stat.i.cache-references
    151202 ± 16%     -78.6%      32364 ± 56%  perf-stat.i.cycles-between-cache-misses
 6.961e+11            -1.9%  6.828e+11        perf-stat.i.instructions
      1.22            -1.3%       1.20        perf-stat.i.ipc
      0.01 ±  9%    +519.5%       0.04 ± 10%  perf-stat.overall.MPKI
      0.01            +0.0        0.01        perf-stat.overall.branch-miss-rate%
     12.09 ±  6%      +2.8       14.86 ±  8%  perf-stat.overall.cache-miss-rate%
      0.75            +2.0%       0.77        perf-stat.overall.cpi
    133775 ±  8%     -83.5%      22060 ±  9%  perf-stat.overall.cycles-between-cache-misses
      1.33            -1.9%       1.31        perf-stat.overall.ipc
      5721            +2.0%       5836        perf-stat.overall.path-length
 1.452e+11            -2.7%  1.413e+11        perf-stat.ps.branch-instructions
   3921138 ±  8%    +507.4%   23818053 ±  9%  perf-stat.ps.cache-misses
  32415461 ±  3%    +394.4%  1.603e+08 ±  4%  perf-stat.ps.cache-references
 6.932e+11            -1.9%  6.797e+11        perf-stat.ps.instructions
 2.582e+14            -1.9%  2.532e+14        perf-stat.total.instructions
     13.19            -0.7       12.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      7.01            -0.2        6.80        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      1.11            -0.2        0.91        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
      2.50            -0.1        2.35        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
      1.68            -0.1        1.59        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      3.73            -0.1        3.64        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.62            -0.1        1.55        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.18            -0.1        2.12        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
      0.65            -0.1        0.60 ±  2%  perf-profile.calltrace.cycles-pp.w_test
      0.92            -0.0        0.87        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
      0.86            -0.0        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
      0.92            -0.0        0.88        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.63            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.86            -0.0        0.83        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      3.53            -0.0        3.50        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
      0.68            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.53            -0.0        0.51        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
      0.72            -0.0        0.71        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.75            +0.0        0.77        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      1.13            +0.0        1.17        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      5.30            +0.1        5.36        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      5.30            +0.1        5.38        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
      6.17            +0.1        6.27        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     96.84            +0.1       96.98        perf-profile.calltrace.cycles-pp.write
      0.78 ±  2%      +0.3        1.13 ±  5%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      2.97            +0.6        3.57        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
     12.01            +0.6       12.62        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      3.63            +0.6        4.24        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      4.32            +0.6        4.96        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.28            +0.8       38.12        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
     84.26            +1.0       85.21        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     13.39            +1.0       14.36        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.30            +1.0       13.30        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
     82.83            +1.0       83.86        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     57.94            +1.3       59.20        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.99            +1.3        7.25 ±  3%  perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
     78.13            +1.3       79.41        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     74.26            +1.3       75.59        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.43            -0.4        7.06        perf-profile.children.cycles-pp.entry_SYSCALL_64
      4.42            -0.2        4.18        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.21            -0.2        1.00        perf-profile.children.cycles-pp.syscall_return_via_sysret
      7.14            -0.2        6.94        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      4.18            -0.2        4.00        perf-profile.children.cycles-pp.__cond_resched
      2.74            -0.2        2.58        perf-profile.children.cycles-pp.apparmor_file_permission
      2.42            -0.1        2.30        perf-profile.children.cycles-pp.rcu_all_qs
      3.82            -0.1        3.72        perf-profile.children.cycles-pp.__fsnotify_parent
      1.74            -0.1        1.65        perf-profile.children.cycles-pp.up_write
      1.99            -0.1        1.90        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.99            -0.1        0.91        perf-profile.children.cycles-pp.w_test
      3.71            -0.1        3.64        perf-profile.children.cycles-pp.security_file_permission
      2.47            -0.1        2.41        perf-profile.children.cycles-pp.xas_load
      1.12            -0.1        1.06        perf-profile.children.cycles-pp.folio_wait_stable
      1.26            -0.1        1.21        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.75            -0.0        0.71        perf-profile.children.cycles-pp.x64_sys_call
      0.98            -0.0        0.94        perf-profile.children.cycles-pp.aa_file_perm
      0.46            -0.0        0.42        perf-profile.children.cycles-pp.write@plt
      1.10            -0.0        1.07        perf-profile.children.cycles-pp.xattr_resolve_name
      0.36            -0.0        0.34        perf-profile.children.cycles-pp.amd_clear_divider
      3.76            -0.0        3.73        perf-profile.children.cycles-pp.cap_inode_need_killpriv
      0.59            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
      3.41            -0.0        3.38        perf-profile.children.cycles-pp.__vfs_getxattr
      0.56            -0.0        0.53        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      1.05            -0.0        1.03        perf-profile.children.cycles-pp.folio_mapping
      0.38            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
      0.25            -0.0        0.24        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
      0.36            -0.0        0.35        perf-profile.children.cycles-pp.is_bad_inode
      1.38            +0.0        1.40        perf-profile.children.cycles-pp.strcmp
      0.93            +0.0        0.95        perf-profile.children.cycles-pp.folio_mark_dirty
      1.07            +0.0        1.09        perf-profile.children.cycles-pp.timestamp_truncate
      5.70            +0.0        5.75        perf-profile.children.cycles-pp.simple_write_end
     98.96            +0.1       99.02        perf-profile.children.cycles-pp.write
      5.69            +0.1        5.75        perf-profile.children.cycles-pp.fault_in_readable
      6.42            +0.1        6.53        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
      0.89            +0.3        1.24 ±  4%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
      3.39            +0.6        3.97        perf-profile.children.cycles-pp.inode_needs_update_time
     12.35            +0.6       12.96        perf-profile.children.cycles-pp.__generic_file_write_iter
      3.96            +0.6        4.57        perf-profile.children.cycles-pp.file_update_time
      4.56            +0.8        5.33        perf-profile.children.cycles-pp.rw_verify_area
     38.16            +0.8       39.01        perf-profile.children.cycles-pp.generic_perform_write
     13.58            +1.0       14.54        perf-profile.children.cycles-pp.simple_write_begin
     84.67            +1.0       85.63        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     12.68            +1.0       13.68        perf-profile.children.cycles-pp.__filemap_get_folio
     83.50            +1.0       84.51        perf-profile.children.cycles-pp.do_syscall_64
     58.52            +1.3       59.78        perf-profile.children.cycles-pp.generic_file_write_iter
      6.18            +1.3        7.44 ±  3%  perf-profile.children.cycles-pp.filemap_get_entry
     78.74            +1.3       80.00        perf-profile.children.cycles-pp.ksys_write
     75.13            +1.3       76.42        perf-profile.children.cycles-pp.vfs_write
      7.25            -0.6        6.64        perf-profile.self.cycles-pp.vfs_write
      6.45            -0.4        6.08        perf-profile.self.cycles-pp.write
      4.32            -0.2        4.08        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.21            -0.2        1.00        perf-profile.self.cycles-pp.syscall_return_via_sysret
      6.98            -0.2        6.78        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      4.52            -0.2        4.36        perf-profile.self.cycles-pp.__filemap_get_folio
      2.34            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
      1.90            -0.1        1.78        perf-profile.self.cycles-pp.do_syscall_64
      1.60            -0.1        1.50 ±  2%  perf-profile.self.cycles-pp.apparmor_file_permission
      1.47            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
      1.62            -0.1        1.53        perf-profile.self.cycles-pp.up_write
      3.65            -0.1        3.56        perf-profile.self.cycles-pp.__fsnotify_parent
      1.09            -0.1        1.02        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.80            -0.1        1.74        perf-profile.self.cycles-pp.xas_load
      0.79            -0.1        0.73        perf-profile.self.cycles-pp.w_test
      1.10            -0.1        1.04        perf-profile.self.cycles-pp.security_file_permission
      1.25            -0.1        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.66            -0.1        1.60        perf-profile.self.cycles-pp.entry_SYSCALL_64
      1.41            -0.0        1.36        perf-profile.self.cycles-pp.rcu_all_qs
      0.90            -0.0        0.86        perf-profile.self.cycles-pp.simple_write_begin
      0.88            -0.0        0.84        perf-profile.self.cycles-pp.aa_file_perm
      0.80            -0.0        0.76        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.62            -0.0        0.59        perf-profile.self.cycles-pp.x64_sys_call
      1.39            -0.0        1.36        perf-profile.self.cycles-pp.__vfs_getxattr
      0.53            -0.0        0.51        perf-profile.self.cycles-pp.folio_wait_stable
      0.87            -0.0        0.85        perf-profile.self.cycles-pp.folio_mapping
      0.56            -0.0        0.53        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.24            -0.0        0.22        perf-profile.self.cycles-pp.amd_clear_divider
      0.12 ±  3%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.write@plt
      0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
      0.35            -0.0        0.34        perf-profile.self.cycles-pp.inode_to_bdi
      0.22            -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
      0.66            +0.0        0.69        perf-profile.self.cycles-pp.file_update_time
      1.03            +0.0        1.06        perf-profile.self.cycles-pp.strcmp
      2.75            +0.0        2.79        perf-profile.self.cycles-pp.simple_write_end
      0.72            +0.0        0.77        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
      0.87            +0.1        0.92        perf-profile.self.cycles-pp.timestamp_truncate
      5.54            +0.1        5.59        perf-profile.self.cycles-pp.fault_in_readable
      2.04            +0.1        2.09        perf-profile.self.cycles-pp.file_remove_privs_flags
      1.51            +0.2        1.69        perf-profile.self.cycles-pp.inode_needs_update_time
      0.78 ±  2%      +0.3        1.11 ±  5%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      0.84            +1.0        1.82        perf-profile.self.cycles-pp.rw_verify_area
      3.66            +1.3        4.97 ±  4%  perf-profile.self.cycles-pp.filemap_get_entry


[2]

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
  64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")

94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     38903 ±113%    +313.8%     160973 ± 66%  numa-meminfo.node1.AnonHugePages
   1666466 ±  4%     -12.2%    1462703 ±  9%  numa-numastat.node1.local_node
     18.97 ±113%    +314.3%      78.59 ± 66%  numa-vmstat.node1.nr_anon_transparent_hugepages
      6003            -5.6%       5668        time.user_time
 1.163e+08            -2.4%  1.135e+08        unixbench.throughput
      6003            -5.6%       5668        unixbench.time.user_time
 4.314e+10            -2.3%  4.215e+10        unixbench.workload
    -12.17           +33.7%     -16.26        sched_debug.cpu.nr_uninterruptible.min
      0.00 ± 95%    +600.3%       0.00 ± 88%  sched_debug.rt_rq:.rt_time.avg
      0.02 ± 95%    +600.3%       0.14 ± 88%  sched_debug.rt_rq:.rt_time.max
      0.00 ± 95%    +600.3%       0.01 ± 88%  sched_debug.rt_rq:.rt_time.stddev
 1.407e+11            -2.0%  1.379e+11        perf-stat.i.branch-instructions
      0.55            -0.0        0.51 ±  4%  perf-stat.i.branch-miss-rate%
  55780077           -85.5%    8078438        perf-stat.i.branch-misses
   5029827 ±  6%    +315.5%   20897838 ± 10%  perf-stat.i.cache-misses
  35311245 ±  2%    +328.2%  1.512e+08 ±  6%  perf-stat.i.cache-references
    118639 ± 18%     -61.7%      45421 ± 41%  perf-stat.i.cycles-between-cache-misses
 6.736e+11            -1.5%  6.634e+11        perf-stat.i.instructions
      0.01 ±  6%    +321.2%       0.03 ± 10%  perf-stat.overall.MPKI
      0.04            -0.0        0.01        perf-stat.overall.branch-miss-rate%
      0.78            +1.5%       0.79        perf-stat.overall.cpi
    103942 ±  6%     -75.7%      25208 ± 10%  perf-stat.overall.cycles-between-cache-misses
      1.29            -1.5%       1.27        perf-stat.overall.ipc
   1.4e+11            -1.9%  1.373e+11        perf-stat.ps.branch-instructions
  55517704           -85.5%    8057745        perf-stat.ps.branch-misses
   5026889 ±  6%    +315.3%   20876882 ± 10%  perf-stat.ps.cache-misses
  35229110 ±  2%    +327.6%  1.506e+08 ±  6%  perf-stat.ps.cache-references
 6.701e+11            -1.4%  6.608e+11        perf-stat.ps.instructions
 2.496e+14            -1.4%   2.46e+14        perf-stat.total.instructions
      3.61            -0.5        3.09        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      2.66            -0.5        2.18        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
     12.62            -0.5       12.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      7.29            -0.3        7.03        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      4.98            -0.2        4.74        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.96            -0.2        6.74        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      4.50            -0.2        4.33        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      3.74            -0.2        3.58        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
     12.82            -0.2       12.66        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      1.04            -0.1        0.91        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
      2.87            -0.1        2.78        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
      5.27            -0.1        5.18        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.81            -0.1        0.75        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      2.17            -0.0        2.13        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
      1.66            -0.0        1.62        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.74            -0.0        0.70        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      1.27            -0.0        1.23        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      0.84            -0.0        0.81        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
      0.61            -0.0        0.57        perf-profile.calltrace.cycles-pp.w_test
      0.88            -0.0        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.61            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.68            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.71            -0.0        0.69        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.00 ±  4%      +0.1        1.10 ±  3%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      2.68            +0.1        2.79        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      5.18            +0.1        5.31        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
      6.00            +0.1        6.14        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      3.35            +0.1        3.48        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
      4.00            +0.1        4.14        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
     96.95            +0.2       97.12        perf-profile.calltrace.cycles-pp.write
      3.68            +0.2        3.86        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     84.87            +0.6       85.45        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     83.48            +0.6       84.10        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     78.96            +0.7       79.70        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     75.00            +0.8       75.81        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     58.04            +1.1       59.14        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.40            +1.2       37.55        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
     12.88            +1.3       14.20        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     11.79            +1.3       13.13        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
      5.74            +1.4        7.16 ±  2%  perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      3.98            -0.5        3.44        perf-profile.children.cycles-pp.security_file_permission
      2.90            -0.5        2.40        perf-profile.children.cycles-pp.apparmor_file_permission
      7.66            -0.3        7.38        perf-profile.children.cycles-pp.file_remove_privs_flags
      7.14            -0.3        6.89        perf-profile.children.cycles-pp.entry_SYSCALL_64
      5.34            -0.2        5.10        perf-profile.children.cycles-pp.rw_verify_area
      7.10            -0.2        6.88        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      3.98            -0.2        3.80        perf-profile.children.cycles-pp.cap_inode_need_killpriv
      4.73            -0.2        4.56        perf-profile.children.cycles-pp.security_inode_need_killpriv
     13.16            -0.2       13.00        perf-profile.children.cycles-pp.__generic_file_write_iter
      1.14            -0.1        1.01        perf-profile.children.cycles-pp.syscall_return_via_sysret
      5.67            -0.1        5.56        perf-profile.children.cycles-pp.simple_write_end
      3.56            -0.1        3.46        perf-profile.children.cycles-pp.__vfs_getxattr
      4.04            -0.1        3.95        perf-profile.children.cycles-pp.__cond_resched
      4.21            -0.1        4.14        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      2.31            -0.1        2.24        perf-profile.children.cycles-pp.rcu_all_qs
      0.99            -0.1        0.93        perf-profile.children.cycles-pp.folio_mark_dirty
      2.46            -0.0        2.42        perf-profile.children.cycles-pp.xas_load
      0.93            -0.0        0.88 ±  2%  perf-profile.children.cycles-pp.w_test
      1.50            -0.0        1.46        perf-profile.children.cycles-pp.strcmp
      1.72            -0.0        1.68        perf-profile.children.cycles-pp.up_write
      0.87            -0.0        0.82        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
      1.04            -0.0        1.00        perf-profile.children.cycles-pp.folio_mapping
      0.96            -0.0        0.92        perf-profile.children.cycles-pp.aa_file_perm
      1.23            -0.0        1.20        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.73            -0.0        0.70        perf-profile.children.cycles-pp.x64_sys_call
      1.07            -0.0        1.05        perf-profile.children.cycles-pp.folio_wait_stable
      0.43            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.write@plt
      1.08            -0.0        1.06        perf-profile.children.cycles-pp.xattr_resolve_name
      0.35            -0.0        0.34        perf-profile.children.cycles-pp.__x64_sys_write
     99.01            +0.0       99.04        perf-profile.children.cycles-pp.write
      1.12 ±  4%      +0.1        1.20 ±  2%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
      2.86            +0.1        2.97        perf-profile.children.cycles-pp.down_write
      5.50            +0.1        5.63        perf-profile.children.cycles-pp.fault_in_readable
      3.75            +0.1        3.88        perf-profile.children.cycles-pp.inode_needs_update_time
      4.34            +0.1        4.47        perf-profile.children.cycles-pp.file_update_time
      6.25            +0.1        6.39        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
      3.77            +0.2        3.96        perf-profile.children.cycles-pp.__fsnotify_parent
     85.29            +0.6       85.86        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     84.14            +0.6       84.74        perf-profile.children.cycles-pp.do_syscall_64
     79.57            +0.7       80.28        perf-profile.children.cycles-pp.ksys_write
     75.84            +0.8       76.64        perf-profile.children.cycles-pp.vfs_write
     58.64            +1.1       59.72        perf-profile.children.cycles-pp.generic_file_write_iter
     37.30            +1.1       38.43        perf-profile.children.cycles-pp.generic_perform_write
     13.05            +1.3       14.38        perf-profile.children.cycles-pp.simple_write_begin
     12.18            +1.3       13.52        perf-profile.children.cycles-pp.__filemap_get_folio
      5.94            +1.4        7.35 ±  2%  perf-profile.children.cycles-pp.filemap_get_entry
      1.77            -0.4        1.35 ±  2%  perf-profile.self.cycles-pp.apparmor_file_permission
      6.23            -0.3        5.94        perf-profile.self.cycles-pp.write
      6.94            -0.2        6.71        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      7.14            -0.2        6.93        perf-profile.self.cycles-pp.vfs_write
      1.13            -0.1        1.01        perf-profile.self.cycles-pp.syscall_return_via_sysret
      1.49 ±  3%      -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
      4.12            -0.1        4.03        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.42            -0.1        0.34        perf-profile.self.cycles-pp.cap_inode_need_killpriv
      2.17            -0.1        2.10        perf-profile.self.cycles-pp.file_remove_privs_flags
      1.08            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
      1.83            -0.1        1.77        perf-profile.self.cycles-pp.do_syscall_64
      2.74            -0.1        2.68        perf-profile.self.cycles-pp.simple_write_end
      0.86            -0.0        0.81        perf-profile.self.cycles-pp.aa_file_perm
      1.60            -0.0        1.56        perf-profile.self.cycles-pp.up_write
      1.42            -0.0        1.38        perf-profile.self.cycles-pp.__vfs_getxattr
      0.86            -0.0        0.82        perf-profile.self.cycles-pp.folio_mapping
      1.36            -0.0        1.32        perf-profile.self.cycles-pp.rcu_all_qs
      0.52            -0.0        0.49        perf-profile.self.cycles-pp.folio_mark_dirty
      1.78            -0.0        1.75        perf-profile.self.cycles-pp.xas_load
      1.23            -0.0        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.74            -0.0        0.71        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
      0.74            -0.0        0.71 ±  2%  perf-profile.self.cycles-pp.w_test
      1.14            -0.0        1.11        perf-profile.self.cycles-pp.strcmp
      2.25            -0.0        2.22        perf-profile.self.cycles-pp.__cond_resched
      0.60            -0.0        0.58        perf-profile.self.cycles-pp.x64_sys_call
      0.77            -0.0        0.75        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.61            -0.0        0.60        perf-profile.self.cycles-pp.xattr_resolve_name
      0.74            +0.0        0.76        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
      1.40            +0.1        1.45        perf-profile.self.cycles-pp.generic_write_checks
      1.60            +0.1        1.65        perf-profile.self.cycles-pp.inode_needs_update_time
      1.00 ±  4%      +0.1        1.08 ±  3%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      1.86 ±  2%      +0.1        1.98        perf-profile.self.cycles-pp.down_write
      5.34            +0.1        5.47        perf-profile.self.cycles-pp.fault_in_readable
      3.61            +0.2        3.80        perf-profile.self.cycles-pp.__fsnotify_parent
      1.46            +0.3        1.77        perf-profile.self.cycles-pp.rw_verify_area
      3.43            +1.4        4.88 ±  3%  perf-profile.self.cycles-pp.filemap_get_entry
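
As a reading aid for table [2] above, here is a minimal Python sketch of how
the middle column appears to be derived. It assumes that rows shown with a
"%" sign report the relative change between the two commits, and that
perf-profile rows (whose values are already percentages of cycles) report the
plain difference in percentage points. The function names are hypothetical
and are not part of the lkp tooling; the input values are copied from the
table.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change in percent, as in rows that carry a '%' sign."""
    return (patched - base) / base * 100.0


def point_delta(base: float, patched: float) -> float:
    """Plain difference, for rows whose values are already percentages."""
    return patched - base


# (base commit value, patched commit value), copied from table [2]
rows = {
    "unixbench.throughput": (1.163e+08, 1.135e+08),
    "perf-stat.i.branch-misses": (55780077, 8078438),
}

for name, (base, patched) in rows.items():
    print(f"{name}: {pct_change(base, patched):+.1f}%")

# perf-profile.self.cycles-pp.rw_verify_area: 1.46 -> 1.77
print(f"rw_verify_area self: {point_delta(1.46, 1.77):+.1f}")
```

The printed values match the -2.4%, -85.5% and +0.3 entries in the table. The
table values are rounded means, so recomputing other rows from the displayed
numbers may differ in the last digit.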


> 
> Thanks,
> Amir.
> 
> > testcase: unixbench
> > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > parameters:
> >
> >         runtime: 300s
> >         nr_task: 100%
> >         test: fsbuffer-w
> >         cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
> >   9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
> >
> > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >   1.23e+08            -7.9%  1.133e+08        unixbench.throughput
> >       6169            -7.7%       5694        unixbench.time.user_time
> >  4.566e+10            -7.9%  4.206e+10        unixbench.workload
> >  1.513e+11            -4.5%  1.445e+11        perf-stat.i.branch-instructions
> >    6891152            +4.8%    7221484        perf-stat.i.branch-misses
> >   29764445 ±  2%      -7.4%   27565609 ±  3%  perf-stat.i.cache-references
> >       0.91            +2.0%       0.93        perf-stat.i.cpi
> >  7.187e+11            -2.7%  6.996e+11        perf-stat.i.instructions
> >       1.26            -2.6%       1.23        perf-stat.i.ipc
> >       0.00            +0.0        0.01        perf-stat.overall.branch-miss-rate%
> >       0.73            +2.7%       0.75        perf-stat.overall.cpi
> >       1.37            -2.6%       1.34        perf-stat.overall.ipc
> >       5828            +5.7%       6162        perf-stat.overall.path-length
> >  1.505e+11            -4.5%  1.437e+11        perf-stat.ps.branch-instructions
> >    6873687            +4.8%    7203107        perf-stat.ps.branch-misses
> >   29721957 ±  2%      -7.3%   27538369 ±  3%  perf-stat.ps.cache-references
> >  7.148e+11            -2.6%   6.96e+11        perf-stat.ps.instructions
> >  2.662e+14            -2.6%  2.592e+14        perf-stat.total.instructions
> >      57.79            -2.0       55.78        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      37.58            -2.0       35.63        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> >      13.06            -1.0       12.04        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> >      13.81            -1.0       12.83        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> >      12.72            -0.9       11.78        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> >       7.00            -0.5        6.47        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> >       6.53            -0.5        6.02        perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> >       5.36            -0.5        4.89        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> >       3.66            -0.4        3.28        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> >       2.68            -0.3        2.36        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> >       6.57            -0.2        6.34        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> >       2.36 ±  2%      -0.2        2.18 ±  2%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >       1.83            -0.2        1.66        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> >       2.92            -0.2        2.76        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> >       2.65            -0.2        2.49        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> >       3.95            -0.1        3.83        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> >       1.62            -0.1        1.50        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> >       0.74            -0.1        0.64        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       3.26            -0.1        3.17        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> >       3.57            -0.1        3.49        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       1.61            -0.1        1.53        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >       0.93            -0.1        0.85        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> >       1.05            -0.1        0.99        perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> >       0.61            -0.1        0.55        perf-profile.calltrace.cycles-pp.w_test
> >       0.64            -0.1        0.58        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >       0.87            -0.1        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> >       2.50            -0.1        2.44        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> >       0.62            -0.1        0.56        perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> >       0.74            -0.0        0.69        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> >       0.91            -0.0        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> >       0.84            -0.0        0.79        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> >       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> >       0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> >       0.62            -0.0        0.59        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> >       0.97            +0.0        1.00        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> >       0.91            +0.1        0.97        perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> >       0.86 ±  3%      +0.1        0.94        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
> >       0.58 ±  2%      +0.1        0.66 ±  7%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> >      11.24            +0.1       11.36        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> >       2.01 ±  2%      +0.1        2.14        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> >       6.04            +0.2        6.24        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> >       5.17            +0.2        5.42        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> >      96.75            +0.3       97.03        perf-profile.calltrace.cycles-pp.write
> >       2.57            +0.4        2.92        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> >       3.20            +0.4        3.57        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> >      84.82            +1.1       85.88        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> >      83.38            +1.2       84.56        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >      78.73            +1.5       80.20        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >      74.54            +1.8       76.32        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >       0.00            +4.0        3.99        perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> >       5.32            +4.2        9.48        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      58.42            -2.0       56.38        perf-profile.children.cycles-pp.generic_file_write_iter
> >      38.46            -2.0       36.50        perf-profile.children.cycles-pp.generic_perform_write
> >      13.99            -1.0       13.01        perf-profile.children.cycles-pp.simple_write_begin
> >      13.11            -1.0       12.15        perf-profile.children.cycles-pp.__filemap_get_folio
> >       7.23            -0.6        6.66        perf-profile.children.cycles-pp.entry_SYSCALL_64
> >       7.12            -0.5        6.59        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> >       6.73            -0.5        6.21        perf-profile.children.cycles-pp.filemap_get_entry
> >       5.76            -0.5        5.26        perf-profile.children.cycles-pp.simple_write_end
> >       4.05            -0.4        3.64        perf-profile.children.cycles-pp.security_file_permission
> >       2.93            -0.3        2.59        perf-profile.children.cycles-pp.apparmor_file_permission
> >       4.32            -0.3        4.04        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       4.20            -0.3        3.92        perf-profile.children.cycles-pp.__cond_resched
> >       6.91            -0.2        6.67        perf-profile.children.cycles-pp.file_remove_privs_flags
> >       2.43            -0.2        2.24        perf-profile.children.cycles-pp.rcu_all_qs
> >       3.10            -0.2        2.92        perf-profile.children.cycles-pp.xas_load
> >       2.47 ±  2%      -0.2        2.29 ±  2%  perf-profile.children.cycles-pp.__fdget_pos
> >       1.92            -0.2        1.74        perf-profile.children.cycles-pp.folio_unlock
> >       3.11            -0.2        2.94        perf-profile.children.cycles-pp.down_write
> >       4.18            -0.1        4.04        perf-profile.children.cycles-pp.security_inode_need_killpriv
> >       1.68            -0.1        1.56        perf-profile.children.cycles-pp.up_write
> >       3.48            -0.1        3.38        perf-profile.children.cycles-pp.cap_inode_need_killpriv
> >       1.96            -0.1        1.87        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> >       1.28            -0.1        1.18        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> >       0.92            -0.1        0.84        perf-profile.children.cycles-pp.w_test
> >       3.14            -0.1        3.06        perf-profile.children.cycles-pp.__vfs_getxattr
> >       1.00            -0.1        0.92        perf-profile.children.cycles-pp.aa_file_perm
> >       1.29            -0.1        1.22        perf-profile.children.cycles-pp.xas_descend
> >       0.76            -0.1        0.70        perf-profile.children.cycles-pp.x64_sys_call
> >       0.87            -0.1        0.80        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> >       1.07            -0.1        1.01        perf-profile.children.cycles-pp.xattr_resolve_name
> >       1.10            -0.1        1.04        perf-profile.children.cycles-pp.folio_wait_stable
> >       1.05            -0.1        1.00        perf-profile.children.cycles-pp.folio_mapping
> >       0.73            -0.1        0.67        perf-profile.children.cycles-pp.xas_start
> >       0.93            -0.1        0.88        perf-profile.children.cycles-pp.folio_mark_dirty
> >       0.50            -0.0        0.46        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> >       0.60            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
> >       0.43            -0.0        0.39        perf-profile.children.cycles-pp.write@plt
> >       0.36            -0.0        0.33        perf-profile.children.cycles-pp.amd_clear_divider
> >       0.37            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
> >       0.33            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
> >       0.36            -0.0        0.34        perf-profile.children.cycles-pp.is_bad_inode
> >       0.24            -0.0        0.23 ±  2%  perf-profile.children.cycles-pp.file_remove_privs
> >       1.18            +0.0        1.21        perf-profile.children.cycles-pp.strcmp
> >       1.02            +0.1        1.08        perf-profile.children.cycles-pp.timestamp_truncate
> >      99.01            +0.1       99.09        perf-profile.children.cycles-pp.write
> >       0.98 ±  3%      +0.1        1.06        perf-profile.children.cycles-pp.generic_write_check_limits
> >       0.68 ±  2%      +0.1        0.77 ±  6%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> >      11.58            +0.1       11.69        perf-profile.children.cycles-pp.__generic_file_write_iter
> >       2.36 ±  2%      +0.1        2.50        perf-profile.children.cycles-pp.generic_write_checks
> >       5.57            +0.2        5.75        perf-profile.children.cycles-pp.fault_in_readable
> >       6.28            +0.2        6.49        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> >       2.98            +0.4        3.33        perf-profile.children.cycles-pp.inode_needs_update_time
> >       3.51            +0.4        3.89        perf-profile.children.cycles-pp.file_update_time
> >      85.24            +1.1       86.31        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >      84.05            +1.2       85.21        perf-profile.children.cycles-pp.do_syscall_64
> >      79.32            +1.5       80.78        perf-profile.children.cycles-pp.ksys_write
> >      75.49            +1.7       77.21        perf-profile.children.cycles-pp.vfs_write
> >       3.64            +4.0        7.64        perf-profile.children.cycles-pp.__fsnotify_parent
> >       5.68            +4.3       10.03        perf-profile.children.cycles-pp.rw_verify_area
> >       6.96            -0.5        6.44        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> >       6.52            -0.5        6.01        perf-profile.self.cycles-pp.write
> >       6.92            -0.4        6.48        perf-profile.self.cycles-pp.vfs_write
> >       3.59            -0.3        3.24        perf-profile.self.cycles-pp.filemap_get_entry
> >       4.41            -0.3        4.09        perf-profile.self.cycles-pp.__filemap_get_folio
> >       4.23            -0.3        3.95        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       2.79            -0.3        2.52        perf-profile.self.cycles-pp.simple_write_end
> >       1.76            -0.2        1.52        perf-profile.self.cycles-pp.apparmor_file_permission
> >       2.32 ±  2%      -0.2        2.16 ±  2%  perf-profile.self.cycles-pp.__fdget_pos
> >       1.79            -0.2        1.62        perf-profile.self.cycles-pp.folio_unlock
> >       2.05            -0.2        1.89        perf-profile.self.cycles-pp.down_write
> >       2.35            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
> >       1.89            -0.1        1.77        perf-profile.self.cycles-pp.do_syscall_64
> >       1.38            -0.1        1.26        perf-profile.self.cycles-pp.entry_SYSCALL_64
> >       1.56            -0.1        1.45        perf-profile.self.cycles-pp.up_write
> >       1.30            -0.1        1.19        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       1.42            -0.1        1.31        perf-profile.self.cycles-pp.rcu_all_qs
> >       1.12            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
> >       1.46            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
> >       0.90            -0.1        0.83        perf-profile.self.cycles-pp.aa_file_perm
> >       1.29            -0.1        1.22        perf-profile.self.cycles-pp.xas_load
> >       0.74            -0.1        0.67        perf-profile.self.cycles-pp.w_test
> >       1.08            -0.1        1.01        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> >       1.98            -0.1        1.92        perf-profile.self.cycles-pp.file_remove_privs_flags
> >       1.30            -0.1        1.24        perf-profile.self.cycles-pp.__vfs_getxattr
> >       1.06            -0.1        1.00        perf-profile.self.cycles-pp.xas_descend
> >       0.80            -0.1        0.74        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> >       0.63            -0.1        0.58        perf-profile.self.cycles-pp.x64_sys_call
> >       0.74            -0.1        0.69        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> >       0.63            -0.0        0.58        perf-profile.self.cycles-pp.xas_start
> >       0.87            -0.0        0.83        perf-profile.self.cycles-pp.folio_mapping
> >       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> >       0.60            -0.0        0.57        perf-profile.self.cycles-pp.xattr_resolve_name
> >       0.48            -0.0        0.44        perf-profile.self.cycles-pp.folio_mark_dirty
> >       0.68            -0.0        0.65        perf-profile.self.cycles-pp.security_inode_need_killpriv
> >       0.36            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.inode_to_bdi
> >       0.52            -0.0        0.49        perf-profile.self.cycles-pp.folio_wait_stable
> >       0.34            -0.0        0.32        perf-profile.self.cycles-pp.cap_inode_need_killpriv
> >       0.89            -0.0        0.87        perf-profile.self.cycles-pp.simple_write_begin
> >       0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
> >       0.23 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.amd_clear_divider
> >       0.23 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
> >       0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.write@plt
> >       0.24            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.is_bad_inode
> >       0.62            +0.0        0.65        perf-profile.self.cycles-pp.file_update_time
> >       0.86            +0.0        0.90        perf-profile.self.cycles-pp.strcmp
> >       0.69            +0.0        0.74        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> >       0.75 ±  3%      +0.1        0.81        perf-profile.self.cycles-pp.generic_write_check_limits
> >       1.42 ±  2%      +0.1        1.48        perf-profile.self.cycles-pp.generic_write_checks
> >       0.82            +0.1        0.89        perf-profile.self.cycles-pp.timestamp_truncate
> >       0.58 ±  3%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> >       5.44            +0.2        5.60        perf-profile.self.cycles-pp.fault_in_readable
> >       1.36            +0.2        1.55        perf-profile.self.cycles-pp.inode_needs_update_time
> >       1.76 ±  3%      +0.9        2.64        perf-profile.self.cycles-pp.rw_verify_area
> >       3.46            +3.8        7.25        perf-profile.self.cycles-pp.__fsnotify_parent
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-05-31  3:15   ` Oliver Sang
@ 2024-05-31  5:18     ` Amir Goldstein
  2024-06-03  8:13       ` Oliver Sang
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-05-31  5:18 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > <oliver.sang@intel.com> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > >
> > >
> > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > https://github.com/amir73il/linux sb_write_barrier
> > >
> >
> > Jan,
> >
> > I speculate that the regression is due to the fact that we store and pass the
> > path information on struct file_range on the stack before the optimizations
> > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > and __fsnotify_parent() pays a bigger price for fetches?
> >
> > Luckily, we already have a way to check
> > fsnotify_sb_has_priority_watchers(inode->i_sb,
> >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > so now I used it to optimize out the fsnotify_file_range() inline
> > code entirely.
> >
> > Oliver,
> >
> > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> >
> > * a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events
> > * f301cd18006c - fanotify: rename a misnamed constant
> > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > * aca408421327 - fsnotify: generate pre-content permission event on open
> > * 93656e196b00 - fsnotify: introduce pre-content permission event
> >
> > The optimization was done in the first commit (fsnotify: introduce
> > pre-content permission event),
> > but impacts the regressing commit (fanotify: pass optional file access
> > range in pre-content event).
> > There is no need to test all the intermediate commits.
>
> I directly compared the tip with v6.10-rc1; there is still a regression, but it is smaller now.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   a82fd282befc7 ("fanotify: report file range info with pre-content events")
>
>        v6.10-rc1 a82fd282befc71d99106bf31066
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
>
> full data is as below [1]
>
>
> Then I checked the new "64108c0b47db - fanotify: pass optional file access range in pre-content event".
>
> It also shows a small regression compared to its parent, but a smaller one than before.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
>   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
>
> full data is as below [2]
>

OK, this looks sane; the small overhead in the write path makes sense.
It may have been a tactical mistake to merge this optimization into v6.10-rc1
a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
before the rest of the pre-content infrastructure, because together they
would still be a performance win.
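
For readers following the thread, the idea behind the optimization discussed
in the quoted text (skip the range setup entirely when nobody is watching)
can be sketched in user space roughly as below. This is only a simplified
model under stated assumptions: struct sb_model, struct file_range_model,
notify_slow_path() and file_range_hook() are hypothetical names invented for
illustration, and none of this is the actual kernel code from the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's real definitions. */
struct sb_model {
	unsigned int pre_content_watchers;	/* watchers at pre-content priority */
};

struct file_range_model {
	const void *path;
	long long pos;
	size_t count;
};

/* Counts how often the expensive path runs, for demonstration only. */
static int slow_path_calls;

/* Models the out-of-line notification call that consumes the range. */
static int notify_slow_path(const struct file_range_model *range)
{
	assert(range != NULL);
	slow_path_calls++;
	return 0;
}

/* Cheap check, loosely modeled on the idea of a "has watchers" test. */
static inline bool sb_has_pre_content_watchers(const struct sb_model *sb)
{
	return sb->pre_content_watchers != 0;
}

/* Inline hook: test the cheap condition before filling the range struct. */
static inline int file_range_hook(const struct sb_model *sb, const void *path,
				  long long pos, size_t count)
{
	if (!sb_has_pre_content_watchers(sb))
		return 0;	/* common case: nothing stored, nothing called */

	struct file_range_model range = {
		.path = path,
		.pos = pos,
		.count = count,
	};
	return notify_slow_path(&range);
}
```

With no watchers registered, file_range_hook() returns before the range
structure is written to the stack, which is the cost the guard is meant to
avoid on the hot write path.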

Can you please compare this branch to v6.9?
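
As a rough illustration of why the v6.9 baseline matters, the relative
changes in the quoted reports can be recomputed from the throughput values,
and two successive changes can be combined multiplicatively. This is a
minimal sketch: percent_change() and compound_percent() are hypothetical
helper names, and any gain attributed to the earlier optimization is an
input the caller must supply, since the thread does not report it.

```c
/* Relative change from a baseline value to a new value, in percent. */
static double percent_change(double base, double value)
{
	return (value - base) / base * 100.0;
}

/* Net percent change after applying two successive percent changes. */
static double compound_percent(double first_pct, double second_pct)
{
	return ((1.0 + first_pct / 100.0) * (1.0 + second_pct / 100.0) - 1.0)
		* 100.0;
}
```

For example, percent_change(1.216e8, 1.168e8) reproduces the reported -3.9%
for the branch tip against v6.10-rc1. Whether the combined result against
v6.9 is positive depends on the size of the earlier gain, which is what the
requested comparison would measure.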

Thanks,
Amir.

>
> [1]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   a82fd282befc7 ("fanotify: report file range info with pre-content events")
>
>        v6.10-rc1 a82fd282befc71d99106bf31066
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>       1614 ±  6%    +252.4%       5688 ± 67%  numa-vmstat.node1.nr_mapped
>       6199            -5.8%       5841        time.user_time
>     220234 ± 13%    +121.4%     487546 ± 41%  numa-meminfo.node0.AnonPages.max
>     836146 ±  6%     -36.0%     535267 ± 45%  numa-meminfo.node1.AnonPages.max
>       6233 ±  7%    +251.3%      21898 ± 69%  numa-meminfo.node1.Mapped
>  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
>       6199            -5.8%       5841        unixbench.time.user_time
>  4.513e+10            -3.9%  4.338e+10        unixbench.workload
>  1.458e+11            -2.7%  1.419e+11        perf-stat.i.branch-instructions
>      11.47 ±  6%      +2.6       14.10 ±  9%  perf-stat.i.cache-miss-rate%
>    3915539 ±  8%    +510.0%   23884093 ±  9%  perf-stat.i.cache-misses
>   32425619 ±  3%    +396.4%   1.61e+08 ±  4%  perf-stat.i.cache-references
>     151202 ± 16%     -78.6%      32364 ± 56%  perf-stat.i.cycles-between-cache-misses
>  6.961e+11            -1.9%  6.828e+11        perf-stat.i.instructions
>       1.22            -1.3%       1.20        perf-stat.i.ipc
>       0.01 ±  9%    +519.5%       0.04 ± 10%  perf-stat.overall.MPKI
>       0.01            +0.0        0.01        perf-stat.overall.branch-miss-rate%
>      12.09 ±  6%      +2.8       14.86 ±  8%  perf-stat.overall.cache-miss-rate%
>       0.75            +2.0%       0.77        perf-stat.overall.cpi
>     133775 ±  8%     -83.5%      22060 ±  9%  perf-stat.overall.cycles-between-cache-misses
>       1.33            -1.9%       1.31        perf-stat.overall.ipc
>       5721            +2.0%       5836        perf-stat.overall.path-length
>  1.452e+11            -2.7%  1.413e+11        perf-stat.ps.branch-instructions
>    3921138 ±  8%    +507.4%   23818053 ±  9%  perf-stat.ps.cache-misses
>   32415461 ±  3%    +394.4%  1.603e+08 ±  4%  perf-stat.ps.cache-references
>  6.932e+11            -1.9%  6.797e+11        perf-stat.ps.instructions
>  2.582e+14            -1.9%  2.532e+14        perf-stat.total.instructions
>      13.19            -0.7       12.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
>       7.01            -0.2        6.80        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       1.11            -0.2        0.91        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
>       2.50            -0.1        2.35        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
>       1.68            -0.1        1.59        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       3.73            -0.1        3.64        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.62            -0.1        1.55        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       2.18            -0.1        2.12        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
>       0.65            -0.1        0.60 ±  2%  perf-profile.calltrace.cycles-pp.w_test
>       0.92            -0.0        0.87        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       0.70            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
>       0.86            -0.0        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
>       0.92            -0.0        0.88        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       0.63            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.86            -0.0        0.83        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>       3.53            -0.0        3.50        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
>       0.68            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       0.53            -0.0        0.51        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
>       0.72            -0.0        0.71        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.75            +0.0        0.77        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
>       1.13            +0.0        1.17        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>       5.30            +0.1        5.36        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       5.30            +0.1        5.38        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
>       6.17            +0.1        6.27        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>      96.84            +0.1       96.98        perf-profile.calltrace.cycles-pp.write
>       0.78 ±  2%      +0.3        1.13 ±  5%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
>       2.97            +0.6        3.57        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
>      12.01            +0.6       12.62        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       3.63            +0.6        4.24        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>       4.32            +0.6        4.96        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      37.28            +0.8       38.12        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>      84.26            +1.0       85.21        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
>      13.39            +1.0       14.36        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>      12.30            +1.0       13.30        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
>      82.83            +1.0       83.86        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      57.94            +1.3       59.20        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.99            +1.3        7.25 ±  3%  perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>      78.13            +1.3       79.41        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      74.26            +1.3       75.59        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       7.43            -0.4        7.06        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       4.42            -0.2        4.18        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       1.21            -0.2        1.00        perf-profile.children.cycles-pp.syscall_return_via_sysret
>       7.14            -0.2        6.94        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
>       4.18            -0.2        4.00        perf-profile.children.cycles-pp.__cond_resched
>       2.74            -0.2        2.58        perf-profile.children.cycles-pp.apparmor_file_permission
>       2.42            -0.1        2.30        perf-profile.children.cycles-pp.rcu_all_qs
>       3.82            -0.1        3.72        perf-profile.children.cycles-pp.__fsnotify_parent
>       1.74            -0.1        1.65        perf-profile.children.cycles-pp.up_write
>       1.99            -0.1        1.90        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.99            -0.1        0.91        perf-profile.children.cycles-pp.w_test
>       3.71            -0.1        3.64        perf-profile.children.cycles-pp.security_file_permission
>       2.47            -0.1        2.41        perf-profile.children.cycles-pp.xas_load
>       1.12            -0.1        1.06        perf-profile.children.cycles-pp.folio_wait_stable
>       1.26            -0.1        1.21        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.75            -0.0        0.71        perf-profile.children.cycles-pp.x64_sys_call
>       0.98            -0.0        0.94        perf-profile.children.cycles-pp.aa_file_perm
>       0.46            -0.0        0.42        perf-profile.children.cycles-pp.write@plt
>       1.10            -0.0        1.07        perf-profile.children.cycles-pp.xattr_resolve_name
>       0.36            -0.0        0.34        perf-profile.children.cycles-pp.amd_clear_divider
>       3.76            -0.0        3.73        perf-profile.children.cycles-pp.cap_inode_need_killpriv
>       0.59            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
>       3.41            -0.0        3.38        perf-profile.children.cycles-pp.__vfs_getxattr
>       0.56            -0.0        0.53        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       1.05            -0.0        1.03        perf-profile.children.cycles-pp.folio_mapping
>       0.38            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
>       0.25            -0.0        0.24        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
>       0.36            -0.0        0.35        perf-profile.children.cycles-pp.is_bad_inode
>       1.38            +0.0        1.40        perf-profile.children.cycles-pp.strcmp
>       0.93            +0.0        0.95        perf-profile.children.cycles-pp.folio_mark_dirty
>       1.07            +0.0        1.09        perf-profile.children.cycles-pp.timestamp_truncate
>       5.70            +0.0        5.75        perf-profile.children.cycles-pp.simple_write_end
>      98.96            +0.1       99.02        perf-profile.children.cycles-pp.write
>       5.69            +0.1        5.75        perf-profile.children.cycles-pp.fault_in_readable
>       6.42            +0.1        6.53        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
>       0.89            +0.3        1.24 ±  4%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>       3.39            +0.6        3.97        perf-profile.children.cycles-pp.inode_needs_update_time
>      12.35            +0.6       12.96        perf-profile.children.cycles-pp.__generic_file_write_iter
>       3.96            +0.6        4.57        perf-profile.children.cycles-pp.file_update_time
>       4.56            +0.8        5.33        perf-profile.children.cycles-pp.rw_verify_area
>      38.16            +0.8       39.01        perf-profile.children.cycles-pp.generic_perform_write
>      13.58            +1.0       14.54        perf-profile.children.cycles-pp.simple_write_begin
>      84.67            +1.0       85.63        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      12.68            +1.0       13.68        perf-profile.children.cycles-pp.__filemap_get_folio
>      83.50            +1.0       84.51        perf-profile.children.cycles-pp.do_syscall_64
>      58.52            +1.3       59.78        perf-profile.children.cycles-pp.generic_file_write_iter
>       6.18            +1.3        7.44 ±  3%  perf-profile.children.cycles-pp.filemap_get_entry
>      78.74            +1.3       80.00        perf-profile.children.cycles-pp.ksys_write
>      75.13            +1.3       76.42        perf-profile.children.cycles-pp.vfs_write
>       7.25            -0.6        6.64        perf-profile.self.cycles-pp.vfs_write
>       6.45            -0.4        6.08        perf-profile.self.cycles-pp.write
>       4.32            -0.2        4.08        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       1.21            -0.2        1.00        perf-profile.self.cycles-pp.syscall_return_via_sysret
>       6.98            -0.2        6.78        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
>       4.52            -0.2        4.36        perf-profile.self.cycles-pp.__filemap_get_folio
>       2.34            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
>       1.90            -0.1        1.78        perf-profile.self.cycles-pp.do_syscall_64
>       1.60            -0.1        1.50 ±  2%  perf-profile.self.cycles-pp.apparmor_file_permission
>       1.47            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
>       1.62            -0.1        1.53        perf-profile.self.cycles-pp.up_write
>       3.65            -0.1        3.56        perf-profile.self.cycles-pp.__fsnotify_parent
>       1.09            -0.1        1.02        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>       1.80            -0.1        1.74        perf-profile.self.cycles-pp.xas_load
>       0.79            -0.1        0.73        perf-profile.self.cycles-pp.w_test
>       1.10            -0.1        1.04        perf-profile.self.cycles-pp.security_file_permission
>       1.25            -0.1        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       1.66            -0.1        1.60        perf-profile.self.cycles-pp.entry_SYSCALL_64
>       1.41            -0.0        1.36        perf-profile.self.cycles-pp.rcu_all_qs
>       0.90            -0.0        0.86        perf-profile.self.cycles-pp.simple_write_begin
>       0.88            -0.0        0.84        perf-profile.self.cycles-pp.aa_file_perm
>       0.80            -0.0        0.76        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.62            -0.0        0.59        perf-profile.self.cycles-pp.x64_sys_call
>       1.39            -0.0        1.36        perf-profile.self.cycles-pp.__vfs_getxattr
>       0.53            -0.0        0.51        perf-profile.self.cycles-pp.folio_wait_stable
>       0.87            -0.0        0.85        perf-profile.self.cycles-pp.folio_mapping
>       0.56            -0.0        0.53        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.24            -0.0        0.22        perf-profile.self.cycles-pp.amd_clear_divider
>       0.12 ±  3%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.write@plt
>       0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
>       0.35            -0.0        0.34        perf-profile.self.cycles-pp.inode_to_bdi
>       0.22            -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
>       0.66            +0.0        0.69        perf-profile.self.cycles-pp.file_update_time
>       1.03            +0.0        1.06        perf-profile.self.cycles-pp.strcmp
>       2.75            +0.0        2.79        perf-profile.self.cycles-pp.simple_write_end
>       0.72            +0.0        0.77        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
>       0.87            +0.1        0.92        perf-profile.self.cycles-pp.timestamp_truncate
>       5.54            +0.1        5.59        perf-profile.self.cycles-pp.fault_in_readable
>       2.04            +0.1        2.09        perf-profile.self.cycles-pp.file_remove_privs_flags
>       1.51            +0.2        1.69        perf-profile.self.cycles-pp.inode_needs_update_time
>       0.78 ±  2%      +0.3        1.11 ±  5%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>       0.84            +1.0        1.82        perf-profile.self.cycles-pp.rw_verify_area
>       3.66            +1.3        4.97 ±  4%  perf-profile.self.cycles-pp.filemap_get_entry
>
>
> [2]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
>   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      38903 ±113%    +313.8%     160973 ± 66%  numa-meminfo.node1.AnonHugePages
>    1666466 ±  4%     -12.2%    1462703 ±  9%  numa-numastat.node1.local_node
>      18.97 ±113%    +314.3%      78.59 ± 66%  numa-vmstat.node1.nr_anon_transparent_hugepages
>       6003            -5.6%       5668        time.user_time
>  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
>       6003            -5.6%       5668        unixbench.time.user_time
>  4.314e+10            -2.3%  4.215e+10        unixbench.workload
>     -12.17           +33.7%     -16.26        sched_debug.cpu.nr_uninterruptible.min
>       0.00 ± 95%    +600.3%       0.00 ± 88%  sched_debug.rt_rq:.rt_time.avg
>       0.02 ± 95%    +600.3%       0.14 ± 88%  sched_debug.rt_rq:.rt_time.max
>       0.00 ± 95%    +600.3%       0.01 ± 88%  sched_debug.rt_rq:.rt_time.stddev
>  1.407e+11            -2.0%  1.379e+11        perf-stat.i.branch-instructions
>       0.55            -0.0        0.51 ±  4%  perf-stat.i.branch-miss-rate%
>   55780077           -85.5%    8078438        perf-stat.i.branch-misses
>    5029827 ±  6%    +315.5%   20897838 ± 10%  perf-stat.i.cache-misses
>   35311245 ±  2%    +328.2%  1.512e+08 ±  6%  perf-stat.i.cache-references
>     118639 ± 18%     -61.7%      45421 ± 41%  perf-stat.i.cycles-between-cache-misses
>  6.736e+11            -1.5%  6.634e+11        perf-stat.i.instructions
>       0.01 ±  6%    +321.2%       0.03 ± 10%  perf-stat.overall.MPKI
>       0.04            -0.0        0.01        perf-stat.overall.branch-miss-rate%
>       0.78            +1.5%       0.79        perf-stat.overall.cpi
>     103942 ±  6%     -75.7%      25208 ± 10%  perf-stat.overall.cycles-between-cache-misses
>       1.29            -1.5%       1.27        perf-stat.overall.ipc
>    1.4e+11            -1.9%  1.373e+11        perf-stat.ps.branch-instructions
>   55517704           -85.5%    8057745        perf-stat.ps.branch-misses
>    5026889 ±  6%    +315.3%   20876882 ± 10%  perf-stat.ps.cache-misses
>   35229110 ±  2%    +327.6%  1.506e+08 ±  6%  perf-stat.ps.cache-references
>  6.701e+11            -1.4%  6.608e+11        perf-stat.ps.instructions
>  2.496e+14            -1.4%   2.46e+14        perf-stat.total.instructions
>       3.61            -0.5        3.09        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
>       2.66            -0.5        2.18        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
>      12.62            -0.5       12.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
>       7.29            -0.3        7.03        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>       4.98            -0.2        4.74        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.96            -0.2        6.74        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       4.50            -0.2        4.33        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       3.74            -0.2        3.58        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
>      12.82            -0.2       12.66        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       1.04            -0.1        0.91        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
>       2.87            -0.1        2.78        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
>       5.27            -0.1        5.18        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       0.81            -0.1        0.75        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
>       2.17            -0.0        2.13        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
>       1.66            -0.0        1.62        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       0.74            -0.0        0.70        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       1.27            -0.0        1.23        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>       0.84            -0.0        0.81        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
>       0.61            -0.0        0.57        perf-profile.calltrace.cycles-pp.w_test
>       0.88            -0.0        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       0.90            -0.0        0.87        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       0.61            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.68            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       0.71            -0.0        0.69        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.00 ±  4%      +0.1        1.10 ±  3%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
>       2.68            +0.1        2.79        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       5.18            +0.1        5.31        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
>       6.00            +0.1        6.14        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       3.35            +0.1        3.48        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       4.00            +0.1        4.14        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>      96.95            +0.2       97.12        perf-profile.calltrace.cycles-pp.write
>       3.68            +0.2        3.86        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      84.87            +0.6       85.45        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
>      83.48            +0.6       84.10        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      78.96            +0.7       79.70        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      75.00            +0.8       75.81        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      58.04            +1.1       59.14        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      36.40            +1.2       37.55        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>      12.88            +1.3       14.20        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>      11.79            +1.3       13.13        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
>       5.74            +1.4        7.16 ±  2%  perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       3.98            -0.5        3.44        perf-profile.children.cycles-pp.security_file_permission
>       2.90            -0.5        2.40        perf-profile.children.cycles-pp.apparmor_file_permission
>       7.66            -0.3        7.38        perf-profile.children.cycles-pp.file_remove_privs_flags
>       7.14            -0.3        6.89        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       5.34            -0.2        5.10        perf-profile.children.cycles-pp.rw_verify_area
>       7.10            -0.2        6.88        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
>       3.98            -0.2        3.80        perf-profile.children.cycles-pp.cap_inode_need_killpriv
>       4.73            -0.2        4.56        perf-profile.children.cycles-pp.security_inode_need_killpriv
>      13.16            -0.2       13.00        perf-profile.children.cycles-pp.__generic_file_write_iter
>       1.14            -0.1        1.01        perf-profile.children.cycles-pp.syscall_return_via_sysret
>       5.67            -0.1        5.56        perf-profile.children.cycles-pp.simple_write_end
>       3.56            -0.1        3.46        perf-profile.children.cycles-pp.__vfs_getxattr
>       4.04            -0.1        3.95        perf-profile.children.cycles-pp.__cond_resched
>       4.21            -0.1        4.14        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       2.31            -0.1        2.24        perf-profile.children.cycles-pp.rcu_all_qs
>       0.99            -0.1        0.93        perf-profile.children.cycles-pp.folio_mark_dirty
>       2.46            -0.0        2.42        perf-profile.children.cycles-pp.xas_load
>       0.93            -0.0        0.88 ±  2%  perf-profile.children.cycles-pp.w_test
>       1.50            -0.0        1.46        perf-profile.children.cycles-pp.strcmp
>       1.72            -0.0        1.68        perf-profile.children.cycles-pp.up_write
>       0.87            -0.0        0.82        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
>       1.04            -0.0        1.00        perf-profile.children.cycles-pp.folio_mapping
>       0.96            -0.0        0.92        perf-profile.children.cycles-pp.aa_file_perm
>       1.23            -0.0        1.20        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.73            -0.0        0.70        perf-profile.children.cycles-pp.x64_sys_call
>       1.07            -0.0        1.05        perf-profile.children.cycles-pp.folio_wait_stable
>       0.43            -0.0        0.41 ±  2%  perf-profile.children.cycles-pp.write@plt
>       1.08            -0.0        1.06        perf-profile.children.cycles-pp.xattr_resolve_name
>       0.35            -0.0        0.34        perf-profile.children.cycles-pp.__x64_sys_write
>      99.01            +0.0       99.04        perf-profile.children.cycles-pp.write
>       1.12 ±  4%      +0.1        1.20 ±  2%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>       2.86            +0.1        2.97        perf-profile.children.cycles-pp.down_write
>       5.50            +0.1        5.63        perf-profile.children.cycles-pp.fault_in_readable
>       3.75            +0.1        3.88        perf-profile.children.cycles-pp.inode_needs_update_time
>       4.34            +0.1        4.47        perf-profile.children.cycles-pp.file_update_time
>       6.25            +0.1        6.39        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
>       3.77            +0.2        3.96        perf-profile.children.cycles-pp.__fsnotify_parent
>      85.29            +0.6       85.86        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      84.14            +0.6       84.74        perf-profile.children.cycles-pp.do_syscall_64
>      79.57            +0.7       80.28        perf-profile.children.cycles-pp.ksys_write
>      75.84            +0.8       76.64        perf-profile.children.cycles-pp.vfs_write
>      58.64            +1.1       59.72        perf-profile.children.cycles-pp.generic_file_write_iter
>      37.30            +1.1       38.43        perf-profile.children.cycles-pp.generic_perform_write
>      13.05            +1.3       14.38        perf-profile.children.cycles-pp.simple_write_begin
>      12.18            +1.3       13.52        perf-profile.children.cycles-pp.__filemap_get_folio
>       5.94            +1.4        7.35 ±  2%  perf-profile.children.cycles-pp.filemap_get_entry
>       1.77            -0.4        1.35 ±  2%  perf-profile.self.cycles-pp.apparmor_file_permission
>       6.23            -0.3        5.94        perf-profile.self.cycles-pp.write
>       6.94            -0.2        6.71        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
>       7.14            -0.2        6.93        perf-profile.self.cycles-pp.vfs_write
>       1.13            -0.1        1.01        perf-profile.self.cycles-pp.syscall_return_via_sysret
>       1.49 ±  3%      -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
>       4.12            -0.1        4.03        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.42            -0.1        0.34        perf-profile.self.cycles-pp.cap_inode_need_killpriv
>       2.17            -0.1        2.10        perf-profile.self.cycles-pp.file_remove_privs_flags
>       1.08            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
>       1.83            -0.1        1.77        perf-profile.self.cycles-pp.do_syscall_64
>       2.74            -0.1        2.68        perf-profile.self.cycles-pp.simple_write_end
>       0.86            -0.0        0.81        perf-profile.self.cycles-pp.aa_file_perm
>       1.60            -0.0        1.56        perf-profile.self.cycles-pp.up_write
>       1.42            -0.0        1.38        perf-profile.self.cycles-pp.__vfs_getxattr
>       0.86            -0.0        0.82        perf-profile.self.cycles-pp.folio_mapping
>       1.36            -0.0        1.32        perf-profile.self.cycles-pp.rcu_all_qs
>       0.52            -0.0        0.49        perf-profile.self.cycles-pp.folio_mark_dirty
>       1.78            -0.0        1.75        perf-profile.self.cycles-pp.xas_load
>       1.23            -0.0        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.74            -0.0        0.71        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
>       0.74            -0.0        0.71 ±  2%  perf-profile.self.cycles-pp.w_test
>       1.14            -0.0        1.11        perf-profile.self.cycles-pp.strcmp
>       2.25            -0.0        2.22        perf-profile.self.cycles-pp.__cond_resched
>       0.60            -0.0        0.58        perf-profile.self.cycles-pp.x64_sys_call
>       0.77            -0.0        0.75        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.61            -0.0        0.60        perf-profile.self.cycles-pp.xattr_resolve_name
>       0.74            +0.0        0.76        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
>       1.40            +0.1        1.45        perf-profile.self.cycles-pp.generic_write_checks
>       1.60            +0.1        1.65        perf-profile.self.cycles-pp.inode_needs_update_time
>       1.00 ±  4%      +0.1        1.08 ±  3%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>       1.86 ±  2%      +0.1        1.98        perf-profile.self.cycles-pp.down_write
>       5.34            +0.1        5.47        perf-profile.self.cycles-pp.fault_in_readable
>       3.61            +0.2        3.80        perf-profile.self.cycles-pp.__fsnotify_parent
>       1.46            +0.3        1.77        perf-profile.self.cycles-pp.rw_verify_area
>       3.43            +1.4        4.88 ±  3%  perf-profile.self.cycles-pp.filemap_get_entry
>
>
> >
> > Thanks,
> > Amir.
> >
> >
> >
> >
> > > testcase: unixbench
> > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > > parameters:
> > >
> > >         runtime: 300s
> > >         nr_task: 100%
> > >         test: fsbuffer-w
> > >         cpufreq_governor: performance
> > >
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
> > >
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > >
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > >   9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
> > >
> > > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >   1.23e+08            -7.9%  1.133e+08        unixbench.throughput
> > >       6169            -7.7%       5694        unixbench.time.user_time
> > >  4.566e+10            -7.9%  4.206e+10        unixbench.workload
> > >  1.513e+11            -4.5%  1.445e+11        perf-stat.i.branch-instructions
> > >    6891152            +4.8%    7221484        perf-stat.i.branch-misses
> > >   29764445 ±  2%      -7.4%   27565609 ±  3%  perf-stat.i.cache-references
> > >       0.91            +2.0%       0.93        perf-stat.i.cpi
> > >  7.187e+11            -2.7%  6.996e+11        perf-stat.i.instructions
> > >       1.26            -2.6%       1.23        perf-stat.i.ipc
> > >       0.00            +0.0        0.01        perf-stat.overall.branch-miss-rate%
> > >       0.73            +2.7%       0.75        perf-stat.overall.cpi
> > >       1.37            -2.6%       1.34        perf-stat.overall.ipc
> > >       5828            +5.7%       6162        perf-stat.overall.path-length
> > >  1.505e+11            -4.5%  1.437e+11        perf-stat.ps.branch-instructions
> > >    6873687            +4.8%    7203107        perf-stat.ps.branch-misses
> > >   29721957 ±  2%      -7.3%   27538369 ±  3%  perf-stat.ps.cache-references
> > >  7.148e+11            -2.6%   6.96e+11        perf-stat.ps.instructions
> > >  2.662e+14            -2.6%  2.592e+14        perf-stat.total.instructions
> > >      57.79            -2.0       55.78        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >      37.58            -2.0       35.63        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > >      13.06            -1.0       12.04        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> > >      13.81            -1.0       12.83        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > >      12.72            -0.9       11.78        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> > >       7.00            -0.5        6.47        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > >       6.53            -0.5        6.02        perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > >       5.36            -0.5        4.89        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > >       3.66            -0.4        3.28        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> > >       2.68            -0.3        2.36        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> > >       6.57            -0.2        6.34        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> > >       2.36 ±  2%      -0.2        2.18 ±  2%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > >       1.83            -0.2        1.66        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> > >       2.92            -0.2        2.76        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > >       2.65            -0.2        2.49        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> > >       3.95            -0.1        3.83        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > >       1.62            -0.1        1.50        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > >       0.74            -0.1        0.64        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       3.26            -0.1        3.17        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> > >       3.57            -0.1        3.49        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       1.61            -0.1        1.53        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > >       0.93            -0.1        0.85        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > >       1.05            -0.1        0.99        perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> > >       0.61            -0.1        0.55        perf-profile.calltrace.cycles-pp.w_test
> > >       0.64            -0.1        0.58        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > >       0.87            -0.1        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> > >       2.50            -0.1        2.44        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> > >       0.62            -0.1        0.56        perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> > >       0.74            -0.0        0.69        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > >       0.91            -0.0        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > >       0.84            -0.0        0.79        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> > >       0.68            -0.0        0.64        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > >       0.74            -0.0        0.71        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> > >       0.62            -0.0        0.59        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > >       0.97            +0.0        1.00        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> > >       0.91            +0.1        0.97        perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> > >       0.86 ±  3%      +0.1        0.94        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
> > >       0.58 ±  2%      +0.1        0.66 ±  7%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> > >      11.24            +0.1       11.36        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > >       2.01 ±  2%      +0.1        2.14        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > >       6.04            +0.2        6.24        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > >       5.17            +0.2        5.42        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> > >      96.75            +0.3       97.03        perf-profile.calltrace.cycles-pp.write
> > >       2.57            +0.4        2.92        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > >       3.20            +0.4        3.57        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> > >      84.82            +1.1       85.88        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> > >      83.38            +1.2       84.56        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > >      78.73            +1.5       80.20        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > >      74.54            +1.8       76.32        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > >       0.00            +4.0        3.99        perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> > >       5.32            +4.2        9.48        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >      58.42            -2.0       56.38        perf-profile.children.cycles-pp.generic_file_write_iter
> > >      38.46            -2.0       36.50        perf-profile.children.cycles-pp.generic_perform_write
> > >      13.99            -1.0       13.01        perf-profile.children.cycles-pp.simple_write_begin
> > >      13.11            -1.0       12.15        perf-profile.children.cycles-pp.__filemap_get_folio
> > >       7.23            -0.6        6.66        perf-profile.children.cycles-pp.entry_SYSCALL_64
> > >       7.12            -0.5        6.59        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> > >       6.73            -0.5        6.21        perf-profile.children.cycles-pp.filemap_get_entry
> > >       5.76            -0.5        5.26        perf-profile.children.cycles-pp.simple_write_end
> > >       4.05            -0.4        3.64        perf-profile.children.cycles-pp.security_file_permission
> > >       2.93            -0.3        2.59        perf-profile.children.cycles-pp.apparmor_file_permission
> > >       4.32            -0.3        4.04        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       4.20            -0.3        3.92        perf-profile.children.cycles-pp.__cond_resched
> > >       6.91            -0.2        6.67        perf-profile.children.cycles-pp.file_remove_privs_flags
> > >       2.43            -0.2        2.24        perf-profile.children.cycles-pp.rcu_all_qs
> > >       3.10            -0.2        2.92        perf-profile.children.cycles-pp.xas_load
> > >       2.47 ±  2%      -0.2        2.29 ±  2%  perf-profile.children.cycles-pp.__fdget_pos
> > >       1.92            -0.2        1.74        perf-profile.children.cycles-pp.folio_unlock
> > >       3.11            -0.2        2.94        perf-profile.children.cycles-pp.down_write
> > >       4.18            -0.1        4.04        perf-profile.children.cycles-pp.security_inode_need_killpriv
> > >       1.68            -0.1        1.56        perf-profile.children.cycles-pp.up_write
> > >       3.48            -0.1        3.38        perf-profile.children.cycles-pp.cap_inode_need_killpriv
> > >       1.96            -0.1        1.87        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > >       1.28            -0.1        1.18        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> > >       0.92            -0.1        0.84        perf-profile.children.cycles-pp.w_test
> > >       3.14            -0.1        3.06        perf-profile.children.cycles-pp.__vfs_getxattr
> > >       1.00            -0.1        0.92        perf-profile.children.cycles-pp.aa_file_perm
> > >       1.29            -0.1        1.22        perf-profile.children.cycles-pp.xas_descend
> > >       0.76            -0.1        0.70        perf-profile.children.cycles-pp.x64_sys_call
> > >       0.87            -0.1        0.80        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> > >       1.07            -0.1        1.01        perf-profile.children.cycles-pp.xattr_resolve_name
> > >       1.10            -0.1        1.04        perf-profile.children.cycles-pp.folio_wait_stable
> > >       1.05            -0.1        1.00        perf-profile.children.cycles-pp.folio_mapping
> > >       0.73            -0.1        0.67        perf-profile.children.cycles-pp.xas_start
> > >       0.93            -0.1        0.88        perf-profile.children.cycles-pp.folio_mark_dirty
> > >       0.50            -0.0        0.46        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> > >       0.60            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
> > >       0.43            -0.0        0.39        perf-profile.children.cycles-pp.write@plt
> > >       0.36            -0.0        0.33        perf-profile.children.cycles-pp.amd_clear_divider
> > >       0.37            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
> > >       0.33            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
> > >       0.36            -0.0        0.34        perf-profile.children.cycles-pp.is_bad_inode
> > >       0.24            -0.0        0.23 ±  2%  perf-profile.children.cycles-pp.file_remove_privs
> > >       1.18            +0.0        1.21        perf-profile.children.cycles-pp.strcmp
> > >       1.02            +0.1        1.08        perf-profile.children.cycles-pp.timestamp_truncate
> > >      99.01            +0.1       99.09        perf-profile.children.cycles-pp.write
> > >       0.98 ±  3%      +0.1        1.06        perf-profile.children.cycles-pp.generic_write_check_limits
> > >       0.68 ±  2%      +0.1        0.77 ±  6%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> > >      11.58            +0.1       11.69        perf-profile.children.cycles-pp.__generic_file_write_iter
> > >       2.36 ±  2%      +0.1        2.50        perf-profile.children.cycles-pp.generic_write_checks
> > >       5.57            +0.2        5.75        perf-profile.children.cycles-pp.fault_in_readable
> > >       6.28            +0.2        6.49        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> > >       2.98            +0.4        3.33        perf-profile.children.cycles-pp.inode_needs_update_time
> > >       3.51            +0.4        3.89        perf-profile.children.cycles-pp.file_update_time
> > >      85.24            +1.1       86.31        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > >      84.05            +1.2       85.21        perf-profile.children.cycles-pp.do_syscall_64
> > >      79.32            +1.5       80.78        perf-profile.children.cycles-pp.ksys_write
> > >      75.49            +1.7       77.21        perf-profile.children.cycles-pp.vfs_write
> > >       3.64            +4.0        7.64        perf-profile.children.cycles-pp.__fsnotify_parent
> > >       5.68            +4.3       10.03        perf-profile.children.cycles-pp.rw_verify_area
> > >       6.96            -0.5        6.44        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> > >       6.52            -0.5        6.01        perf-profile.self.cycles-pp.write
> > >       6.92            -0.4        6.48        perf-profile.self.cycles-pp.vfs_write
> > >       3.59            -0.3        3.24        perf-profile.self.cycles-pp.filemap_get_entry
> > >       4.41            -0.3        4.09        perf-profile.self.cycles-pp.__filemap_get_folio
> > >       4.23            -0.3        3.95        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       2.79            -0.3        2.52        perf-profile.self.cycles-pp.simple_write_end
> > >       1.76            -0.2        1.52        perf-profile.self.cycles-pp.apparmor_file_permission
> > >       2.32 ±  2%      -0.2        2.16 ±  2%  perf-profile.self.cycles-pp.__fdget_pos
> > >       1.79            -0.2        1.62        perf-profile.self.cycles-pp.folio_unlock
> > >       2.05            -0.2        1.89        perf-profile.self.cycles-pp.down_write
> > >       2.35            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
> > >       1.89            -0.1        1.77        perf-profile.self.cycles-pp.do_syscall_64
> > >       1.38            -0.1        1.26        perf-profile.self.cycles-pp.entry_SYSCALL_64
> > >       1.56            -0.1        1.45        perf-profile.self.cycles-pp.up_write
> > >       1.30            -0.1        1.19        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> > >       1.42            -0.1        1.31        perf-profile.self.cycles-pp.rcu_all_qs
> > >       1.12            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
> > >       1.46            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
> > >       0.90            -0.1        0.83        perf-profile.self.cycles-pp.aa_file_perm
> > >       1.29            -0.1        1.22        perf-profile.self.cycles-pp.xas_load
> > >       0.74            -0.1        0.67        perf-profile.self.cycles-pp.w_test
> > >       1.08            -0.1        1.01        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> > >       1.98            -0.1        1.92        perf-profile.self.cycles-pp.file_remove_privs_flags
> > >       1.30            -0.1        1.24        perf-profile.self.cycles-pp.__vfs_getxattr
> > >       1.06            -0.1        1.00        perf-profile.self.cycles-pp.xas_descend
> > >       0.80            -0.1        0.74        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> > >       0.63            -0.1        0.58        perf-profile.self.cycles-pp.x64_sys_call
> > >       0.74            -0.1        0.69        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> > >       0.63            -0.0        0.58        perf-profile.self.cycles-pp.xas_start
> > >       0.87            -0.0        0.83        perf-profile.self.cycles-pp.folio_mapping
> > >       0.50            -0.0        0.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> > >       0.60            -0.0        0.57        perf-profile.self.cycles-pp.xattr_resolve_name
> > >       0.48            -0.0        0.44        perf-profile.self.cycles-pp.folio_mark_dirty
> > >       0.68            -0.0        0.65        perf-profile.self.cycles-pp.security_inode_need_killpriv
> > >       0.36            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.inode_to_bdi
> > >       0.52            -0.0        0.49        perf-profile.self.cycles-pp.folio_wait_stable
> > >       0.34            -0.0        0.32        perf-profile.self.cycles-pp.cap_inode_need_killpriv
> > >       0.89            -0.0        0.87        perf-profile.self.cycles-pp.simple_write_begin
> > >       0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
> > >       0.23 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.amd_clear_divider
> > >       0.23 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
> > >       0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.write@plt
> > >       0.24            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.is_bad_inode
> > >       0.62            +0.0        0.65        perf-profile.self.cycles-pp.file_update_time
> > >       0.86            +0.0        0.90        perf-profile.self.cycles-pp.strcmp
> > >       0.69            +0.0        0.74        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> > >       0.75 ±  3%      +0.1        0.81        perf-profile.self.cycles-pp.generic_write_check_limits
> > >       1.42 ±  2%      +0.1        1.48        perf-profile.self.cycles-pp.generic_write_checks
> > >       0.82            +0.1        0.89        perf-profile.self.cycles-pp.timestamp_truncate
> > >       0.58 ±  3%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> > >       5.44            +0.2        5.60        perf-profile.self.cycles-pp.fault_in_readable
> > >       1.36            +0.2        1.55        perf-profile.self.cycles-pp.inode_needs_update_time
> > >       1.76 ±  3%      +0.9        2.64        perf-profile.self.cycles-pp.rw_verify_area
> > >       3.46            +3.8        7.25        perf-profile.self.cycles-pp.__fsnotify_parent
> > >
> > >
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >

* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-05-31  5:18     ` Amir Goldstein
@ 2024-06-03  8:13       ` Oliver Sang
  2024-06-04 12:33         ` Amir Goldstein
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-06-03  8:13 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > <oliver.sang@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > >
> > > >
> > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > https://github.com/amir73il/linux sb_write_barrier
> > > >
> > >
> > > Jan,
> > >
> > > I speculate that the regression is due to the fact that we store and pass the
> > > path information on struct file_range on the stack before the optimizations
> > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > and __fsnotify_parent() pays a bigger price for fetches?
> > >
> > > Luckily, we already have a way to check this:
> > >   fsnotify_sb_has_priority_watchers(inode->i_sb, FSNOTIFY_PRIO_PRE_CONTENT)
> > > so now I used it to optimize out the fsnotify_file_range() inline
> > > code entirely.
> > >
> > > Oliver,
> > >
> > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > >
> > > * a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events
> > > * f301cd18006c - fanotify: rename a misnamed constant
> > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > >
> > > The optimization was done in the first commit (fsnotify: introduce
> > > pre-content permission event), but it impacts the regressing commit
> > > (fanotify: pass optional file access range in pre-content event).
> > > There is no need to test all the intermediate commits.
> >
> > I directly compared the tip with v6.10-rc1; there is still a regression, but it is smaller now
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.10-rc1
> >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> >
> >        v6.10-rc1 a82fd282befc71d99106bf31066
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> >
> > full data is as below [1]
> >
> >
> > then I checked the new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> >
> > it also has a small regression compared to its parent (-2.4%), but it is smaller than the original -7.9%.
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> >
> > 94167e071109d573 64108c0b47db91b20d658a89969
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> >
> > full data is as below [2]
> >
> 
> Ok, this looks sane, the small overhead in the write path makes sense.
> It may have been a "tactical mistake" merging this optimization to v6.10-rc1
> a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> before the rest of the pre-content infrastructure, because together they
> would still be a performance win.
> 
> Can you please compare this branch to v6.9?

there is no obvious difference between v6.9 and v6.10-rc1 for this test in our runs (unixbench.throughput changes by -0.2%).
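
As a rough way to read the %change columns in the table below, here is a small sketch. It assumes every %change column is relative to the first (base) column, which the printed numbers seem to support, and that the perf-profile rows report absolute percentage-point differences rather than relative changes. The helper names pct_change() and point_delta() are made up for this illustration and are not part of the lkp tooling. The inputs are the rounded values as printed, so the last digit can differ slightly from the report.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference in percentage points (perf-profile style rows)."""
    return new - base


# unixbench.throughput as printed in the report (rounded to 4 digits)
v6_9 = 1.219e8
v6_10_rc1 = 1.216e8
tip = 1.168e8  # a82fd282befc7

print(f"v6.10-rc1 vs v6.9:      {pct_change(v6_9, v6_10_rc1):+.2f}%")
print(f"a82fd282befc7 vs v6.9:  {pct_change(v6_9, tip):+.2f}%")
print(f"a82fd282befc7 vs rc1:   {pct_change(v6_10_rc1, tip):+.2f}%")

# perf-profile.children.cycles-pp.rw_verify_area: 4.86 (v6.9) vs 5.33 (tip)
print(f"rw_verify_area delta:   {point_delta(4.86, 5.33):+.2f} points")
```

With the rounded inputs, the tip comes out roughly 4% below both v6.9 and v6.10-rc1, in line with the -4.1% and -3.9% reported for unixbench.throughput.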

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.9
  v6.10-rc1
  a82fd282befc7 ("fanotify: report file range info with pre-content events")

            v6.9                   v6.10-rc1 a82fd282befc71d99106bf31066
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   9218048 ± 19%     +33.1%   12267178 ±  6%     +14.2%   10523306 ±  7%  meminfo.DirectMap2M
    151289           +63.8%     247886 ±  6%     +61.4%     244132 ±  4%  meminfo.DirectMap4k
      0.52            +0.1        0.58            +0.1        0.59        mpstat.cpu.all.irq%
      0.01            -0.0        0.01 ±  4%      -0.0        0.01        mpstat.cpu.all.soft%
     10241            -9.1%       9314 ± 11%     -15.6%       8648 ± 15%  sched_debug.cpu.curr->pid.min
    -35.33           -55.4%     -15.76           -62.3%     -13.31        sched_debug.cpu.nr_uninterruptible.min
    109116 ± 96%     -85.8%      15473 ±125%      +8.6%     118471 ± 81%  numa-meminfo.node0.AnonHugePages
   4803556 ±  2%      -3.5%    4636196 ±  2%     -31.7%    3278497 ± 41%  numa-meminfo.node0.MemUsed
    574474 ± 29%     +45.5%     836146 ±  6%      -6.8%     535267 ± 45%  numa-meminfo.node1.AnonPages.max
   1773634 ±  6%      +8.9%    1931750 ±  5%     +85.6%    3291386 ± 41%  numa-meminfo.node1.MemUsed
     35.33 ± 15%     -73.1%       9.50 ± 24%     -58.5%      14.67 ± 27%  perf-c2c.DRAM.local
    181.67 ± 11%     -74.7%      46.00 ± 12%     -69.6%      55.17 ± 18%  perf-c2c.DRAM.remote
    298.67 ±  7%     -82.2%      53.17 ±  9%     -79.1%      62.33 ± 21%  perf-c2c.HITM.local
    125.67 ± 15%     -77.1%      28.83 ± 15%     -72.9%      34.00 ± 22%  perf-c2c.HITM.remote
    265024            -1.2%     261842            -0.8%     262871        time.involuntary_context_switches
     25.33 ± 16%     -61.2%       9.83 ± 23%     -59.2%      10.33 ± 22%  time.major_page_faults
      7168            +0.9%       7234            +0.9%       7234        time.maximum_resident_set_size
      6286            -1.4%       6199            -7.1%       5841        time.user_time
     70712            -1.7%      69536            -0.9%      70096        proc-vmstat.nr_active_anon
      9037            +1.4%       9162 ±  2%      +2.9%       9301        proc-vmstat.nr_page_table_pages
     73584            -1.8%      72274            -1.1%      72752        proc-vmstat.nr_shmem
     70712            -1.7%      69536            -0.9%      70096        proc-vmstat.nr_zone_active_anon
     35571 ±  8%      -9.5%      32176 ±  3%     -15.7%      29987 ±  4%  proc-vmstat.pgactivate
 1.219e+08            -0.2%  1.216e+08            -4.1%  1.168e+08        unixbench.throughput
    265024            -1.2%     261842            -0.8%     262871        unixbench.time.involuntary_context_switches
      7168            +0.9%       7234            +0.9%       7234        unixbench.time.maximum_resident_set_size
      6286            -1.4%       6199            -7.1%       5841        unixbench.time.user_time
 4.521e+10            -0.2%  4.513e+10            -4.1%  4.338e+10        unixbench.workload
 1.476e+11            -1.2%  1.458e+11            -3.9%  1.419e+11        perf-stat.i.branch-instructions
   7506784            -2.1%    7347431            -2.4%    7329399        perf-stat.i.branch-misses
   3830897 ±  5%      +2.2%    3915539 ±  8%    +523.5%   23884093 ±  9%  perf-stat.i.cache-misses
  30323968 ±  2%      +6.9%   32425619 ±  3%    +430.9%   1.61e+08 ±  4%  perf-stat.i.cache-references
      0.94            +1.6%       0.95            +1.6%       0.95        perf-stat.i.cpi
    157608 ± 12%      -4.1%     151202 ± 16%     -79.5%      32364 ± 56%  perf-stat.i.cycles-between-cache-misses
 7.003e+11            -0.6%  6.961e+11            -2.5%  6.828e+11        perf-stat.i.instructions
      1.23            -1.0%       1.22            -2.3%       1.20        perf-stat.i.ipc
      0.09 ± 14%     -56.1%       0.04 ± 20%     -64.9%       0.03 ± 22%  perf-stat.i.major-faults
      0.01 ±  5%      +3.3%       0.01 ±  9%    +540.2%       0.04 ± 10%  perf-stat.overall.MPKI
      0.01            -0.0        0.01            +0.0        0.01        perf-stat.overall.branch-miss-rate%
      0.75            +0.6%       0.75            +2.6%       0.77        perf-stat.overall.cpi
    136694 ±  5%      -2.1%     133775 ±  8%     -83.9%      22060 ±  9%  perf-stat.overall.cycles-between-cache-misses
      1.34            -0.6%       1.33            -2.5%       1.31        perf-stat.overall.ipc
      5752            -0.5%       5721            +1.5%       5836        perf-stat.overall.path-length
 1.469e+11            -1.2%  1.452e+11            -3.8%  1.413e+11        perf-stat.ps.branch-instructions
   3815245 ±  5%      +2.8%    3921138 ±  8%    +524.3%   23818053 ±  9%  perf-stat.ps.cache-misses
  30276290 ±  2%      +7.1%   32415461 ±  3%    +429.3%  1.603e+08 ±  4%  perf-stat.ps.cache-references
  6.97e+11            -0.5%  6.932e+11            -2.5%  6.797e+11        perf-stat.ps.instructions
      0.09 ± 14%     -56.0%       0.04 ± 21%     -64.6%       0.03 ± 23%  perf-stat.ps.major-faults
 2.601e+14            -0.7%  2.582e+14            -2.7%  2.532e+14        perf-stat.total.instructions
     58.72            -0.8       57.94            +0.5       59.20        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     38.04            -0.8       37.28            +0.1       38.12        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      5.91            -0.6        5.30            -0.5        5.38        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
     78.64            -0.5       78.13            +0.8       79.41        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.65            -0.5        2.18            -0.5        2.12        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
     83.29            -0.5       82.83            +0.6       83.86        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      6.45            -0.5        5.99            +0.8        7.25 ±  3%  perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
     84.71            -0.4       84.26            +0.5       85.21        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      6.59            -0.4        6.17            -0.3        6.27        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     74.65            -0.4       74.26            +0.9       75.59        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     13.74            -0.3       13.39            +0.6       14.36        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.63            -0.3       12.30            +0.7       13.30        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
      4.62            -0.3        4.32            +0.3        4.96        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.78 ±  2%      -0.3        2.50            -0.4        2.35        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
      3.62            -0.3        3.34            -0.3        3.28        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      2.06            -0.1        1.92            -0.1        1.95        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      2.92            -0.1        2.81            -0.2        2.72 ±  2%  perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.75            -0.0        0.70            -0.1        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
      0.93            -0.0        0.89            -0.0        0.88        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
      0.99            -0.0        0.96            -0.0        0.98        perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      0.74            -0.0        0.72            -0.0        0.71        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.63            -0.0        0.62            -0.0        0.61        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.87            -0.0        0.86            -0.0        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
      0.64            -0.0        0.63            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.75            -0.0        0.75            +0.0        0.77        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      0.68            +0.0        0.68            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.91            +0.0        0.92            -0.0        0.88        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.84            +0.0        0.86            -0.0        0.83        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      0.63            +0.0        0.65            -0.0        0.60 ±  2%  perf-profile.calltrace.cycles-pp.w_test
      5.29            +0.0        5.30            +0.1        5.36        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.89            +0.0        0.92            -0.0        0.87        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      1.59            +0.0        1.62            -0.0        1.55        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.61 ±  2%      +0.0        0.64            +0.0        0.63        perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
      1.62            +0.1        1.68            -0.0        1.59        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      1.05            +0.1        1.11            -0.1        0.91        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
     96.77            +0.1       96.84            +0.2       96.98        perf-profile.calltrace.cycles-pp.write
      6.92            +0.1        7.01            -0.1        6.80        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      2.87            +0.1        2.97            +0.7        3.57        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
      3.53            +0.1        3.63            +0.7        4.24        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      1.03            +0.1        1.13            +0.1        1.17        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      2.59            +0.1        2.71            +0.1        2.71        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
      3.37            +0.2        3.53            +0.1        3.50        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
      0.62 ±  3%      +0.2        0.78 ±  2%      +0.5        1.13 ±  5%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
     13.02            +0.2       13.19            -0.5       12.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      4.06            +0.2        4.23            +0.2        4.22        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      6.66            +0.2        6.88            +0.2        6.91        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      3.48            +0.2        3.73            +0.2        3.64        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.67            +0.3       12.01            +0.9       12.62        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.00            +0.5        0.52            +0.5        0.52        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.00            +0.5        0.53            +0.5        0.51        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
     59.32            -0.8       58.52            +0.5       59.78        perf-profile.children.cycles-pp.generic_file_write_iter
     38.93            -0.8       38.16            +0.1       39.01        perf-profile.children.cycles-pp.generic_perform_write
      3.10            -0.6        2.47            -0.7        2.41        perf-profile.children.cycles-pp.xas_load
     79.22            -0.5       78.74            +0.8       80.00        perf-profile.children.cycles-pp.ksys_write
      6.64            -0.5        6.18            +0.8        7.44 ±  3%  perf-profile.children.cycles-pp.filemap_get_entry
     85.12            -0.4       84.67            +0.5       85.63        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     83.94            -0.4       83.50            +0.6       84.51        perf-profile.children.cycles-pp.do_syscall_64
      6.86            -0.4        6.42            -0.3        6.53        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
     75.55            -0.4       75.13            +0.9       76.42        perf-profile.children.cycles-pp.vfs_write
      6.08            -0.4        5.69            -0.3        5.75        perf-profile.children.cycles-pp.fault_in_readable
     13.92            -0.3       13.58            +0.6       14.54        perf-profile.children.cycles-pp.simple_write_begin
     13.03            -0.3       12.68            +0.7       13.68        perf-profile.children.cycles-pp.__filemap_get_folio
      4.86            -0.3        4.56            +0.5        5.33        perf-profile.children.cycles-pp.rw_verify_area
      4.01            -0.3        3.71            -0.4        3.64        perf-profile.children.cycles-pp.security_file_permission
      3.02 ±  2%      -0.3        2.74            -0.4        2.58        perf-profile.children.cycles-pp.apparmor_file_permission
      2.42            -0.2        2.25            -0.1        2.27        perf-profile.children.cycles-pp.generic_write_checks
      3.11            -0.1        2.98            -0.2        2.90 ±  2%  perf-profile.children.cycles-pp.down_write
      4.28            -0.1        4.18            -0.3        4.00        perf-profile.children.cycles-pp.__cond_resched
      1.05            -0.0        1.00            -0.1        1.00        perf-profile.children.cycles-pp.generic_write_check_limits
     98.99            -0.0       98.96            +0.0       99.02        perf-profile.children.cycles-pp.write
      2.45            -0.0        2.42            -0.1        2.30        perf-profile.children.cycles-pp.rcu_all_qs
      1.10            -0.0        1.07            -0.0        1.09        perf-profile.children.cycles-pp.timestamp_truncate
      0.99            -0.0        0.98            -0.1        0.94        perf-profile.children.cycles-pp.aa_file_perm
      0.76            -0.0        0.75            -0.0        0.71        perf-profile.children.cycles-pp.x64_sys_call
      0.33            -0.0        0.32            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
      0.23            -0.0        0.23 ±  2%      -0.0        0.22        perf-profile.children.cycles-pp.file_remove_privs
      0.59            +0.0        0.59            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
      0.25            +0.0        0.25            -0.0        0.24        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
      0.93            +0.0        0.93            +0.0        0.95        perf-profile.children.cycles-pp.folio_mark_dirty
      0.36            +0.0        0.36            -0.0        0.34        perf-profile.children.cycles-pp.amd_clear_divider
      0.37            +0.0        0.38            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
      1.26            +0.0        1.26            -0.0        1.21        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      5.69            +0.0        5.70            +0.1        5.75        perf-profile.children.cycles-pp.simple_write_end
      0.35 ±  2%      +0.0        0.36            +0.0        0.37        perf-profile.children.cycles-pp.__cmd_record
      0.35 ±  2%      +0.0        0.36            +0.0        0.37        perf-profile.children.cycles-pp.cmd_record
      0.35 ±  2%      +0.0        0.36            +0.0        0.37        perf-profile.children.cycles-pp.record__mmap_read_evlist
      1.08            +0.0        1.10            -0.0        1.07        perf-profile.children.cycles-pp.xattr_resolve_name
      0.39 ±  2%      +0.0        0.40 ±  3%      +0.0        0.41 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.42 ±  2%      +0.0        0.43 ±  3%      +0.0        0.44 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.44            +0.0        0.46            -0.0        0.42        perf-profile.children.cycles-pp.write@plt
      0.41 ±  3%      +0.0        0.43 ±  3%      +0.0        0.43 ±  4%  perf-profile.children.cycles-pp.tick_nohz_handler
      1.03            +0.0        1.05            -0.0        1.03        perf-profile.children.cycles-pp.folio_mapping
      0.08            +0.0        0.10 ±  4%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.update_min_vruntime
      0.10            +0.0        0.13 ±  2%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.update_curr
      0.96            +0.0        0.99            -0.0        0.91        perf-profile.children.cycles-pp.w_test
      1.09            +0.0        1.12            -0.0        1.06        perf-profile.children.cycles-pp.folio_wait_stable
      1.95            +0.0        1.99            -0.1        1.90        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.0        0.04 ± 44%      +0.1        0.05        perf-profile.children.cycles-pp.ktime_get_update_offsets_now
      0.09            +0.1        0.15 ±  5%      +0.1        0.16 ± 14%  perf-profile.children.cycles-pp.ktime_get
      0.10 ±  4%      +0.1        0.15 ±  4%      +0.1        0.16 ± 13%  perf-profile.children.cycles-pp.clockevents_program_event
      4.36            +0.1        4.42            -0.2        4.18        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.50            +0.1        0.56            +0.0        0.53        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      1.68            +0.1        1.74            -0.0        1.65        perf-profile.children.cycles-pp.up_write
      1.14            +0.1        1.21            -0.1        1.00        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.55 ±  3%      +0.1        0.64 ±  3%      +0.1        0.66 ±  5%  perf-profile.children.cycles-pp.hrtimer_interrupt
      7.05            +0.1        7.14            -0.1        6.94        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      0.56 ±  3%      +0.1        0.65 ±  3%      +0.1        0.67 ±  5%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.61 ±  2%      +0.1        0.70 ±  3%      +0.1        0.72 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.58 ±  2%      +0.1        0.67 ±  3%      +0.1        0.69 ±  5%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      3.86            +0.1        3.96            +0.7        4.57        perf-profile.children.cycles-pp.file_update_time
      3.28            +0.1        3.39            +0.7        3.97        perf-profile.children.cycles-pp.inode_needs_update_time
      1.27            +0.1        1.38            +0.1        1.40        perf-profile.children.cycles-pp.strcmp
      3.25            +0.2        3.41            +0.1        3.38        perf-profile.children.cycles-pp.__vfs_getxattr
      0.73 ±  3%      +0.2        0.89            +0.5        1.24 ±  4%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
      3.60            +0.2        3.76            +0.1        3.73        perf-profile.children.cycles-pp.cap_inode_need_killpriv
      4.29            +0.2        4.47            +0.2        4.44        perf-profile.children.cycles-pp.security_inode_need_killpriv
      7.00            +0.2        7.23            +0.3        7.26        perf-profile.children.cycles-pp.file_remove_privs_flags
      7.20            +0.2        7.43            -0.1        7.06        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.3        0.27 ±  2%      +0.3        0.27 ±  2%  perf-profile.children.cycles-pp.sched_tick
      3.54            +0.3        3.82            +0.2        3.72        perf-profile.children.cycles-pp.__fsnotify_parent
     12.00            +0.3       12.35            +1.0       12.96        perf-profile.children.cycles-pp.__generic_file_write_iter
      5.93            -0.4        5.54            -0.3        5.59        perf-profile.self.cycles-pp.fault_in_readable
      1.86 ±  3%      -0.3        1.60            -0.4        1.50 ±  2%  perf-profile.self.cycles-pp.apparmor_file_permission
      1.42            -0.1        1.30            -0.1        1.31        perf-profile.self.cycles-pp.generic_write_checks
      2.43            -0.1        2.34            -0.2        2.22        perf-profile.self.cycles-pp.__cond_resched
      3.43            -0.1        3.36            -0.1        3.36        perf-profile.self.cycles-pp.generic_perform_write
      2.00            -0.1        1.93            -0.1        1.92 ±  2%  perf-profile.self.cycles-pp.down_write
      1.76            -0.1        1.69            -0.1        1.67        perf-profile.self.cycles-pp.generic_file_write_iter
      0.63 ±  2%      -0.1        0.56            -0.1        0.56        perf-profile.self.cycles-pp.xas_start
      6.51            -0.1        6.45            -0.4        6.08        perf-profile.self.cycles-pp.write
      0.77            -0.0        0.72            -0.0        0.77        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
      1.28            -0.0        1.25            -0.1        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.90            -0.0        0.87            +0.0        0.92        perf-profile.self.cycles-pp.timestamp_truncate
      1.53            -0.0        1.51            +0.2        1.69        perf-profile.self.cycles-pp.inode_needs_update_time
      0.90            -0.0        0.88            -0.1        0.84        perf-profile.self.cycles-pp.aa_file_perm
      0.81            -0.0        0.79            -0.0        0.78        perf-profile.self.cycles-pp.generic_write_check_limits
      0.86            -0.0        0.84            +1.0        1.82        perf-profile.self.cycles-pp.rw_verify_area
      1.11            -0.0        1.10            -0.1        1.04        perf-profile.self.cycles-pp.security_file_permission
      1.42            -0.0        1.41            -0.1        1.36        perf-profile.self.cycles-pp.rcu_all_qs
      0.63            -0.0        0.62            -0.0        0.59        perf-profile.self.cycles-pp.x64_sys_call
      0.23 ±  2%      -0.0        0.22            -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
      0.80            -0.0        0.80            -0.0        0.76        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.12            -0.0        0.12 ±  3%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.write@plt
      0.90            -0.0        0.90            -0.0        0.86        perf-profile.self.cycles-pp.simple_write_begin
      0.25            +0.0        0.25            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
      2.75            +0.0        2.75            +0.0        2.79        perf-profile.self.cycles-pp.simple_write_end
      1.89            +0.0        1.90            -0.1        1.78        perf-profile.self.cycles-pp.do_syscall_64
      0.66            +0.0        0.66            +0.0        0.69        perf-profile.self.cycles-pp.file_update_time
      0.52            +0.0        0.53            -0.0        0.51        perf-profile.self.cycles-pp.folio_wait_stable
      0.86            +0.0        0.87            -0.0        0.85        perf-profile.self.cycles-pp.folio_mapping
      0.69            +0.0        0.70            +0.0        0.71        perf-profile.self.cycles-pp.security_inode_need_killpriv
      1.08            +0.0        1.09            -0.1        1.02        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.07            +0.0        0.09 ±  5%      +0.0        0.08 ±  7%  perf-profile.self.cycles-pp.update_min_vruntime
      0.77            +0.0        0.79            -0.0        0.73        perf-profile.self.cycles-pp.w_test
      0.00            +0.0        0.02 ± 99%      +0.1        0.05        perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      1.44            +0.0        1.47            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
      1.99            +0.0        2.04            +0.1        2.09        perf-profile.self.cycles-pp.file_remove_privs_flags
      1.57            +0.1        1.62            -0.0        1.53        perf-profile.self.cycles-pp.up_write
      1.33            +0.1        1.39            +0.0        1.36        perf-profile.self.cycles-pp.__vfs_getxattr
      0.09 ±  5%      +0.1        0.14 ±  5%      +0.1        0.15 ± 13%  perf-profile.self.cycles-pp.ktime_get
      4.27            +0.1        4.32            -0.2        4.08        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.49            +0.1        0.56            +0.0        0.53        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.14            +0.1        1.21            -0.1        1.00        perf-profile.self.cycles-pp.syscall_return_via_sysret
      6.90            +0.1        6.98            -0.1        6.78        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      4.43            +0.1        4.52            -0.1        4.36        perf-profile.self.cycles-pp.__filemap_get_folio
      0.94            +0.1        1.03            +0.1        1.06        perf-profile.self.cycles-pp.strcmp
      0.62 ±  3%      +0.2        0.78 ±  2%      +0.5        1.11 ±  5%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      3.49            +0.2        3.66            +1.5        4.97 ±  4%  perf-profile.self.cycles-pp.filemap_get_entry
      3.42            +0.2        3.65            +0.1        3.56        perf-profile.self.cycles-pp.__fsnotify_parent
      1.35            +0.3        1.66            +0.3        1.60        perf-profile.self.cycles-pp.entry_SYSCALL_64
      6.92            +0.3        7.25            -0.3        6.64        perf-profile.self.cycles-pp.vfs_write
      1.29            +0.5        1.80            +0.4        1.74        perf-profile.self.cycles-pp.xas_load


> 
> Thanks,
> Amir.
> 


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-06-03  8:13       ` Oliver Sang
@ 2024-06-04 12:33         ` Amir Goldstein
  2024-07-01  7:42           ` Oliver Sang
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-06-04 12:33 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > <oliver.sang@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > >
> > > > >
> > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > >
> > > >
> > > > Jan,
> > > >
> > > > I speculate that the regression is due to the fact that we store and pass the
> > > > path information on struct file_range on the stack before the optimizations
> > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > >
> > > > Luckily, we already have the way to check
> > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > code entirely.
> > > >
> > > > Oliver,
> > > >
> > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > >
> > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > with pre-content events
> > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > >
> > > > The optimization was done in the first commit (fsnotify: introduce
> > > > pre-content permission event),
> > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > range in pre-content event).
> > > > no need to test all middle commits.
> > >
> > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.10-rc1
> > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > >
> > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > >
> > > full data is as below [1]
> > >
> > >
> > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > >
> > > it also has a small regression comparing to its parent, but better also.
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > >
> > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > >
> > > full data is as below [2]
> > >
> >
> > Ok, this looks sane, the small overhead in the write path makes sense.

On second look, while a small regression from 64108c0b47db9 could make
sense, because it changes the inline fsnotify hooks, the extra regression from
the tip of the branch a82fd282befc7 makes no sense at all, as it does not
touch any code that affects the executed functions, so I have to wonder how
reliable those results are.

Could you re-test the commits 94167e071109d..a82fd282befc7?

> > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > before the rest of the pre-content infrastructure, because together they
> > would still be a performance win.
> >
> > Can you please compare this branch to v6.9?
>
> there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
>

This is a bit surprising to me, because a5e57b4d370c should have been a pretty
big performance win for the common case.

Especially, considering that here [1] you reported in pre-merge testing that an
identical commit has improved the fstime-r/unixbench workload
(although with gcc-12):
[1] https://lore.kernel.org/oe-lkp/Zfj3wxDHolB1qCGO@xsang-OptiPlex-9020/
and here [2] that a similar commit had improved writeseek1/will-it-scale
[2] https://lore.kernel.org/all/Zc7KmlQ1cYVrPMQ+@xsang-OptiPlex-9020/

Judging by simple_write_begin() in this regression perf report, and
shmem_file_write_iter in the reports above, may I assume that this report
was with a kernel with non-default config !CONFIG_SHMEM?
Is that correct? Is this an intended config change?

Thanks,
Amir.


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-06-04 12:33         ` Amir Goldstein
@ 2024-07-01  7:42           ` Oliver Sang
  2024-07-03  5:58             ` Amir Goldstein
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-01  7:42 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

sorry for the late reply.

On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > >
> > > > hi, Amir,
> > > >
> > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > <oliver.sang@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > >
> > > > > >
> > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > >
> > > > >
> > > > > Jan,
> > > > >
> > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > path information on struct file_range on the stack before the optimizations
> > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > >
> > > > > Luckily, we already have the way to check
> > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > code entirely.
> > > > >
> > > > > Oliver,
> > > > >
> > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > >
> > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > with pre-content events
> > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > >
> > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > pre-content permission event),
> > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > range in pre-content event).
> > > > > no need to test all middle commits.
> > > >
> > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > >   v6.10-rc1
> > > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > >
> > > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > > ---------------- ---------------------------
> > > >          %stddev     %change         %stddev
> > > >              \          |                \
> > > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > > >
> > > > full data is as below [1]
> > > >
> > > >
> > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > >
> > > > it also has a small regression comparing to its parent, but better also.
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > >
> > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > ---------------- ---------------------------
> > > >          %stddev     %change         %stddev
> > > >              \          |                \
> > > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > > >
> > > > full data is as below [2]
> > > >
> > >
> > > Ok, this looks sane, the small overhead in the write path makes sense.
> 
> On second look, while a small regression from 64108c0b47db9 could make
> sense, because it changes the inline fsnotify hooks, the extra regression from
> the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> touch any code that affects the executed functions, so I have to wonder how
> reliable those results are.
> 
> Could you re-test the commits 94167e071109d..a82fd282befc7?

since the branch is:

* a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events  <---
* f301cd18006c3 fanotify: rename a misnamed constant
* 64108c0b47db9 fanotify: pass optional file access range in pre-content event
* 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event           <---
* 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event           <--- parent of 94167e071109d
* 83af0c89527ab fsnotify: generate pre-content permission event on exec
* aca4084213276 fsnotify: generate pre-content permission event on open
* 93656e196b006 fsnotify: introduce pre-content permission event
* 1613e604df0cd (tag: v6.10-rc1,


I made the comparison below, which shows little difference among the 3 commits:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
  94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
  a82fd282befc7 fanotify: report file range info with pre-content events

68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
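
As a rough cross-check of the table above (a Python sketch, not part of the
lkp tooling): the two %change columns appear to be relative to the first
column (68e04c2451ba0), so the step from 94167e071109d to the tip
a82fd282befc7 has to be derived separately. The throughput values are copied
from the table, which prints only four significant digits, so the percentages
are approximate; `pct_change` is a helper name made up for this illustration.

```python
# Throughput values copied from the three-commit table above
# (rounded to four significant digits in the report).
throughput = {
    "68e04c2451ba0": 1.174e8,
    "94167e071109d": 1.163e8,
    "a82fd282befc7": 1.168e8,
}

def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0

# Changes against the first column, as the report seems to print them.
vs_first_mid = pct_change(throughput["68e04c2451ba0"], throughput["94167e071109d"])
vs_first_tip = pct_change(throughput["68e04c2451ba0"], throughput["a82fd282befc7"])

# Change of the tip against 94167e071109d, which the table does not show.
mid_to_tip = pct_change(throughput["94167e071109d"], throughput["a82fd282befc7"])

print(f"94167e071109d vs 68e04c2451ba0: {vs_first_mid:+.1f}%")
print(f"a82fd282befc7 vs 68e04c2451ba0: {vs_first_tip:+.1f}%")
print(f"a82fd282befc7 vs 94167e071109d: {mid_to_tip:+.1f}%")
```

With these rounded inputs the tip comes out about 0.4% above 94167e071109d,
i.e. no extra regression between 94167e071109d and the tip in this run.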


> 
> > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > before the rest of the pre-content infrastructure, because together they
> > > would still be a performance win.
> > >
> > > Can you please compare this branch to v6.9?
> >
> > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> >
> 
> This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> big performance win for the common case.

in this unixbench test of ours, a5e57b4d370c introduces a small regression
compared to its parent (477cf917dd028).

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.9
  477cf917dd028 fsnotify: use an enum for group priority constants
  a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
  v6.10-rc1

            v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
 1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
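
The %change columns in this table also look relative to v6.9 rather than to
the neighbouring column, which is why a5e57b4d370c6 shows +0.4% even though it
is slower than its parent. A small Python sketch (an illustration under that
assumption, using the rounded throughput values printed above; `step_changes`
is a made-up helper name) derives the step-by-step changes instead:

```python
# Throughput copied from the table above, in the order the columns appear
# (rounded to four significant digits in the report).
runs = [
    ("v6.9", 1.219e8),
    ("477cf917dd028", 1.253e8),
    ("a5e57b4d370c6", 1.224e8),
    ("v6.10-rc1", 1.216e8),
]

def step_changes(runs):
    """Percent change of each entry relative to the entry just before it."""
    return [
        (cur_name, (cur - prev) / prev * 100.0)
        for (_, prev), (cur_name, cur) in zip(runs, runs[1:])
    ]

steps = step_changes(runs)
for name, change in steps:
    print(f"{name}: {change:+.1f}% vs previous entry")

# Net change from the first to the last entry, for comparison with the table.
net = (runs[-1][1] - runs[0][1]) / runs[0][1] * 100.0
print(f"{runs[-1][0]} vs {runs[0][0]}: {net:+.1f}%")
```

On these inputs the step from 477cf917dd028 to a5e57b4d370c6 is roughly -2.3%,
while the net change from v6.9 to v6.10-rc1 stays at about -0.2%.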


BTW, for a5e57b4d370c, there is another regression report in
https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
which also includes some unixbench improvement results, but for different
sub-tests on a different platform.


> 
> Especially, considering that here [1] you reported in pre-merge testing that an
> identical commit has improved the fstime-r/unixbench workload
> (although with gcc-12):
> [1] https://lore.kernel.org/oe-lkp/Zfj3wxDHolB1qCGO@xsang-OptiPlex-9020/
> and here [2] that a similar commit had improved writeseek1/will-it-scale
> [2] https://lore.kernel.org/all/Zc7KmlQ1cYVrPMQ+@xsang-OptiPlex-9020/
> 
> Judging by simple_write_begin() in this regression perf report, and
> shmem_file_write_iter in the reports above, may I assume that this report
> was with a kernel with non-default config !CONFIG_SHMEM?
> Is that correct? Is this an intended config change?

we always set CONFIG_SHMEM.

> 
> Thanks,
> Amir.


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-01  7:42           ` Oliver Sang
@ 2024-07-03  5:58             ` Amir Goldstein
  2024-07-03  7:21               ` Oliver Sang
  0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-07-03  5:58 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> sorry for quite late.
>
> On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > >
> > > > > hi, Amir,
> > > > >
> > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > <oliver.sang@intel.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > >
> > > > > > >
> > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > >
> > > > > >
> > > > > > Jan,
> > > > > >
> > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > >
> > > > > > Luckily, we already have the way to check
> > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > code entirely.
> > > > > >
> > > > > > Oliver,
> > > > > >
> > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > >
> > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > with pre-content events
> > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > >
> > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > pre-content permission event),
> > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > range in pre-content event).
> > > > > > no need to test all middle commits.
> > > > >
> > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > >   v6.10-rc1
> > > > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > >
> > > > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > ---------------- ---------------------------
> > > > >          %stddev     %change         %stddev
> > > > >              \          |                \
> > > > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > > > >
> > > > > full data is as below [1]
> > > > >
> > > > >
> > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > >
> > > > > it also has a small regression comparing to its parent, but better also.
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > >
> > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > ---------------- ---------------------------
> > > > >          %stddev     %change         %stddev
> > > > >              \          |                \
> > > > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > > > >
> > > > > full data is as below [2]
> > > > >
> > > >
> > > > Ok, this looks sane, the small overhead in the write path makes sense.
> >
> > On second look, while a small regression from 64108c0b47db9 could make
> > sense, because it changes the inline fsnotify hooks, the extra regression from
> > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > touch any code that affects the executed functions, so I have to wonder how
> > reliable are those results.
> >
> > Could you re-test the commits 94167e071109d..a82fd282befc7?
>
> since the branch is:
>
> * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events  <---
> * f301cd18006c3 fanotify: rename a misnamed constant
> * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event           <---
> * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event           <--- parent of 94167e071109d
> * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> * aca4084213276 fsnotify: generate pre-content permission event on open
> * 93656e196b006 fsnotify: introduce pre-content permission event
> * 1613e604df0cd (tag: v6.10-rc1,
>
>
> I made below comparison, which shows little difference among 3 commits:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
>   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
>   a82fd282befc7 fanotify: report file range info with pre-content events
>
> 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>  1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
>
>

Hi Oliver,

Perhaps I am not reading the report right, but how do these numbers reconcile
with the previous report of regression:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
  64108c0b47db9 ("fanotify: pass optional file access range in
pre-content event")

94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.163e+08            -2.4%  1.135e+08        unixbench.throughput

Is this a case of unstable results? something else?

> >
> > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > before the rest of the pre-content infrastructure, because together they
> > > > would still be a performance win.
> > > >
> > > > Can you please compare this branch to v6.9?
> > >
> > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > >
> >
> > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > big performance win for the common case.
>
> in our this unixbench tests, a5e57b4d370c introduce a small regression comparing
> to its parent (477cf917dd028).
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.9
>   477cf917dd028 fsnotify: use an enum for group priority constants
>   a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
>   v6.10-rc1
>
>             v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>  1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
>

Assuming this is a stable result,
that's a very small regression compared to the improvements before it,
and one that I dare call acceptable for this micro buffered-write benchmark
because of the big gain in other workloads.

>
> BTW, for a5e57b4d370c, there is another regression report in
> https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
> which also includes some unixbench improvement results, but different sub-tests
> on different platform.
>

Right. I forgot about this one.
Sorry for dropping the ball.
I do not know what is going on there.
I will try to take a look again.

Thanks,
Amir.


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-03  5:58             ` Amir Goldstein
@ 2024-07-03  7:21               ` Oliver Sang
  2024-07-03 16:20                 ` Amir Goldstein
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-03  7:21 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp

hi, Amir,

On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > sorry for quite late.
> >
> > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > >
> > > > hi, Amir,
> > > >
> > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > >
> > > > > > hi, Amir,
> > > > > >
> > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > > <oliver.sang@intel.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > > >
> > > > > > > >
> > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > > >
> > > > > > >
> > > > > > > Jan,
> > > > > > >
> > > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > > >
> > > > > > > Luckily, we already have the way to check
> > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > > code entirely.
> > > > > > >
> > > > > > > Oliver,
> > > > > > >
> > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > > >
> > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > > with pre-content events
> > > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > > >
> > > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > > pre-content permission event),
> > > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > > range in pre-content event).
> > > > > > > no need to test all middle commits.
> > > > > >
> > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > > >
> > > > > > =========================================================================================
> > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > >
> > > > > > commit:
> > > > > >   v6.10-rc1
> > > > > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > > >
> > > > > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > > ---------------- ---------------------------
> > > > > >          %stddev     %change         %stddev
> > > > > >              \          |                \
> > > > > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > > > > >
> > > > > > full data is as below [1]
> > > > > >
> > > > > >
> > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > > >
> > > > > > it also has a small regression comparing to its parent, but better also.
> > > > > >
> > > > > > =========================================================================================
> > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > >
> > > > > > commit:
> > > > > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > > >
> > > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > > ---------------- ---------------------------
> > > > > >          %stddev     %change         %stddev
> > > > > >              \          |                \
> > > > > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > > > > >
> > > > > > full data is as below [2]
> > > > > >
> > > > >
> > > > > Ok, this looks sane, the small overhead in the write path makes sense.
> > >
> > > On second look, while a small regression from 64108c0b47db9 could make
> > > sense, because it changes the inline fsnotify hooks, the extra regression from
> > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > > touch any code that affects the executed functions, so I have to wonder how
> > > reliable are those results.
> > >
> > > Could you re-test the commits 94167e071109d..a82fd282befc7?
> >
> > since the branch is:
> >
> > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events  <---
> > * f301cd18006c3 fanotify: rename a misnamed constant
> > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event           <---
> > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event           <--- parent of 94167e071109d
> > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > * aca4084213276 fsnotify: generate pre-content permission event on open
> > * 93656e196b006 fsnotify: introduce pre-content permission event
> > * 1613e604df0cd (tag: v6.10-rc1,
> >
> >
> > I made below comparison, which shows little difference among 3 commits:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> >   a82fd282befc7 fanotify: report file range info with pre-content events
> >
> > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \
> >  1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
> >
> >
> 
> Hi Oliver,
> 
> Perhaps I am not reading the report right, but how do these numbers reconcile
> with the previous report of regression:
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> 
> commit:
>   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
>   64108c0b47db9 ("fanotify: pass optional file access range in
> pre-content event")
> 
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> 
> Is this a case of unstable results? something else?

you can see that the data for 94167e071109d is 1.163e+08 in both tables.

the data in our tests seems quite stable for a given commit, for example for v6.10-rc1:
  "unixbench.throughput": [
    121545292.8,
    121629889.4,
    121598992.0,
    121492095.5,
    121645038.1,
    121556286.9
  ],

for the branch tip a82fd282befc7:
  "unixbench.throughput": [
    116675606.7,
    116840611.2,
    116738966.0,
    116956953.1,
    116704901.9,
    116997628.3,
    117141733.7,
    116660495.4
  ],
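
As a rough cross-check of the two sample lists above, the sketch below
recomputes the per-commit mean, the relative standard deviation, and the change
of the tip against v6.10-rc1. It assumes a sample standard deviation and a
plain arithmetic mean; how the lkp tooling actually derives its %stddev column
is not stated in this thread, and the names summarize(), v6_10_rc1 and tip are
only illustrative.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput samples copied from the lists above.
v6_10_rc1 = [
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
tip = [
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]


def summarize(samples):
    """Return (mean, relative sample stddev in percent) for one commit."""
    m = mean(samples)
    return m, 100.0 * stdev(samples) / m


base_mean, base_rsd = summarize(v6_10_rc1)
tip_mean, tip_rsd = summarize(tip)

# Change of the tip relative to the v6.10-rc1 baseline, in percent.
change = 100.0 * (tip_mean - base_mean) / base_mean

print(f"v6.10-rc1     mean={base_mean:.4g}  stddev={base_rsd:.2f}%")
print(f"a82fd282befc7 mean={tip_mean:.4g}  stddev={tip_rsd:.2f}%")
print(f"change vs v6.10-rc1: {change:+.1f}%")
```

With these samples the run-to-run spread within each commit is far smaller
than the gap between the two commits, which is the sense in which the
per-commit data looks stable.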


let me combine the results from this branch together:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
  94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
  64108c0b47db9 fanotify: pass optional file access range in pre-content event
  a82fd282befc7 fanotify: report file range info with pre-content events

       v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
---------------- --------------------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \          |                \
 1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput


one thing I want to mention is that "%change" is always relative to the first
column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression compared to
v6.10-rc1; 94167e071109d has a -4.3% regression, also compared to v6.10-rc1,
and so on.

then, comparing just 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about a
-2.4% regression compared to 94167e071109d.
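
The way the "%change" columns relate to each other can be sketched as follows.
This is only an illustration built from the rounded means printed in the
combined table, so the recomputed figures can differ from the table by about
0.1 percentage point; pct_change() and the means dictionary are hypothetical
names, not part of the lkp tooling.

```python
# Rounded unixbench.throughput means copied from the combined table.
means = {
    "v6.10-rc1": 1.216e8,
    "68e04c2451ba0": 1.174e8,
    "94167e071109d": 1.163e8,
    "64108c0b47db9": 1.135e8,
    "a82fd282befc7": 1.168e8,
}


def pct_change(new, old):
    """Percent change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


# Every %change column in one table is relative to the first column.
baseline = means["v6.10-rc1"]
vs_baseline = {
    commit: pct_change(value, baseline)
    for commit, value in means.items()
    if commit != "v6.10-rc1"
}

# A commit-vs-parent figure has to be computed from the two means directly.
vs_parent = pct_change(means["64108c0b47db9"], means["94167e071109d"])

for commit, change in vs_baseline.items():
    print(f"{commit}: {change:+.1f}% vs v6.10-rc1")
print(f"64108c0b47db9: {vs_parent:+.1f}% vs 94167e071109d")
```

Note that the two percentages against the baseline cannot simply be subtracted
to get the commit-vs-parent figure, because the denominators differ.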

from the above table, the performance fluctuates along the branch: it
drops the most at 64108c0b47db9, then recovers a little at the tip.

our bot will not bisect the improvement between 64108c0b47db9 and the tip, since
the whole branch shows a drop.

> 
> > >
> > > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > > before the rest of the pre-content infrastructure, because together they
> > > > > would still be a performance win.
> > > > >
> > > > > Can you please compare this branch to v6.9?
> > > >
> > > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > > >
> > >
> > > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > > big performance win for the common case.
> >
> > in our this unixbench tests, a5e57b4d370c introduce a small regression comparing
> > to its parent (477cf917dd028).
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.9
> >   477cf917dd028 fsnotify: use an enum for group priority constants
> >   a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> >   v6.10-rc1
> >
> >             v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
> > ---------------- --------------------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \          |                \
> >  1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
> >
> 
> Assuming this is a stable result,
> that's very small regression compared to the improvements before it
> and one that I dare to call acceptable for this micro buffered write benchmark
> because of the big gain in other workloads.

again, all data here is relative to v6.9, so there is a 2.8% improvement at
477cf917dd028 compared to v6.9, but it drops back at a5e57b4d370c6, whose
data is almost the same as v6.9 (+0.4% compared to v6.9).

anyway, we normally ignore <1% performance changes, so we would not say that
a5e57b4d370c6 or v6.10-rc1 has an obvious performance change compared to v6.9.

> 
> >
> > BTW, for a5e57b4d370c, there is another regression report in
> > https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
> > which also includes some unixbench improvement results, but different sub-tests
> > on different platform.
> >
> 
> Right. I forgot about this one.
> Sorry for dropping the ball.
> I do not know what is going on there.
> I will try to take a look again.
> 
> Thanks,
> Amir.


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-03  7:21               ` Oliver Sang
@ 2024-07-03 16:20                 ` Amir Goldstein
  2024-07-04 15:39                   ` Jan Kara
  2024-07-05  2:09                   ` Oliver Sang
  0 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-03 16:20 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Wed, Jul 3, 2024 at 10:21 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> > On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > sorry for quite late.
> > >
> > > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > >
> > > > > hi, Amir,
> > > > >
> > > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > > >
> > > > > > > hi, Amir,
> > > > > > >
> > > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > > > <oliver.sang@intel.com> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > > > >
> > > > > > > >
> > > > > > > > Jan,
> > > > > > > >
> > > > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > > > >
> > > > > > > > Luckily, we already have the way to check
> > > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > > >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > > > code entirely.
> > > > > > > >
> > > > > > > > Oliver,
> > > > > > > >
> > > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > > > >
> > > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > > > with pre-content events
> > > > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > > > >
> > > > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > > > pre-content permission event),
> > > > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > > > range in pre-content event).
> > > > > > > > no need to test all middle commits.
> > > > > > >
> > > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > > > >
> > > > > > > =========================================================================================
> > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > >
> > > > > > > commit:
> > > > > > >   v6.10-rc1
> > > > > > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > > > >
> > > > > > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > > > ---------------- ---------------------------
> > > > > > >          %stddev     %change         %stddev
> > > > > > >              \          |                \
> > > > > > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > > > > > >
> > > > > > > full data is as below [1]
> > > > > > >
> > > > > > >
> > > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > > > >
> > > > > > > it also has a small regression comparing to its parent, but better also.
> > > > > > >
> > > > > > > =========================================================================================
> > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > >
> > > > > > > commit:
> > > > > > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > > > >
> > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > > > ---------------- ---------------------------
> > > > > > >          %stddev     %change         %stddev
> > > > > > >              \          |                \
> > > > > > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > > > > > >
> > > > > > > full data is as below [2]
> > > > > > >
> > > > > >
> > > > > > Ok, this looks sane, the small overhead in the write path makes sense.
> > > >
> > > > On second look, while a small regression from 64108c0b47db9 could make
> > > > sense, because it changes the inline fsnotify hooks, the extra regression from
> > > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > > > touch any code that affects the executed functions, so I have to wonder how
> > > > reliable are those results.
> > > >
> > > > Could you re-test the commits 94167e071109d..a82fd282befc7?
> > >
> > > since the branch is:
> > >
> > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events  <---
> > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event           <---
> > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event           <--- parent of 94167e071109d
> > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > * 1613e604df0cd (tag: v6.10-rc1,
> > >
> > >
> > > I made below comparison, which shows little difference among 3 commits:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > >   a82fd282befc7 fanotify: report file range info with pre-content events
> > >
> > > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > > ---------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \
> > >  1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
> > >
> > >
> >
> > Hi Oliver,
> >
> > Perhaps I am not reading the report right, but how do these numbers reconcile
> > with the previous report of regression:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> >   64108c0b47db9 ("fanotify: pass optional file access range in
> > pre-content event")
> >
> > 94167e071109d573 64108c0b47db91b20d658a89969
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> >
> > Is this a case of unstable results? something else?
>
> you could see the data for 94167e071109d are 1.163e+08 in both table.
>
> the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
>   "unixbench.throughput": [
>     121545292.8,
>     121629889.4,
>     121598992.0,
>     121492095.5,
>     121645038.1,
>     121556286.9
>   ],
>

Are all those runs from the same boot?

> for the branch tip a82fd282befc7:
>   "unixbench.throughput": [
>     116675606.7,
>     116840611.2,
>     116738966.0,
>     116956953.1,
>     116704901.9,
>     116997628.3,
>     117141733.7,
>     116660495.4
>   ],
>

And these runs?

Otherwise, we might have a fluctuation that happens at boot time
or at mount time or something.

>
> let me combine the results from this branch together:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
>   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
>   64108c0b47db9 fanotify: pass optional file access range in pre-content event
>   a82fd282befc7 fanotify: report file range info with pre-content
>
>        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \          |                \
>  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
>
>
> one thing I want to mention is the "%change" is always comparing to the first
> column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> and so on.

Thanks for clarifying - I did not read it this way.
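
To make the two readings of the table concrete, here is a minimal sketch that
recomputes the change of each column against the first column and against the
column immediately before it. The throughput values are the rounded means
copied from the combined table above; the variable and function names are only
illustrative, and because the inputs are rounded the results can differ from
the reported "%change" by about 0.1.

```python
# Rounded mean throughput per tested commit, in branch order,
# copied from the combined table in this thread.
throughput = {
    "v6.10-rc1": 1.216e8,
    "68e04c2451ba0": 1.174e8,
    "94167e071109d": 1.163e8,
    "64108c0b47db9": 1.135e8,
    "a82fd282befc7": 1.168e8,
}


def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


commits = list(throughput)
baseline = throughput[commits[0]]

# How the report presents it: every column is compared to the first column.
vs_baseline = {c: pct_change(baseline, throughput[c]) for c in commits[1:]}

# How I first read it: every column compared to the column before it.
vs_previous = {
    cur: pct_change(throughput[prev], throughput[cur])
    for prev, cur in zip(commits, commits[1:])
}

for c in commits[1:]:
    print(f"{c}: {vs_baseline[c]:+.1f}% vs v6.10-rc1, "
          f"{vs_previous[c]:+.1f}% vs previous column")
```

Read this way, the step from 94167e071109d to 64108c0b47db9 is the -2.4% of
the earlier two-column report, and the tip comes out higher than
64108c0b47db9, which is the part that is hard to explain by the code changes.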

>
> then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> -2.4% regression comparing to 94167e071109d.
>
> from above table, along the branch, the performance is kind of fluctuating,
> dropped most on 64108c0b47db9, but then recovered a little on tip.
>

I can understand why 64108c0b47db91b would regress performance, but I
cannot think of any possible explanation why a82fd282befc should improve
performance, so I have to wonder if the regression to -6.6% is not a
fluke of some specific boot/mount?

I pushed a test branch to
https://github.com/amir73il/linux/commits/fsnotify_for_lkp
with an extra patch that un-inlines some helpers to help bisect the
perf report better.
Maybe produce the report with this commit; it might shed some light.

Jan, any other ideas?

> our bot will not bisect the improvment between 64108c0b47db9 and the tip, since
> the whole branch show a drop.
>
> >
> > > >
> > > > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > > > before the rest of the pre-content infrastructure, because together they
> > > > > > would still be a performance win.
> > > > > >
> > > > > > Can you please compare this branch to v6.9?
> > > > >
> > > > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > > > >
> > > >
> > > > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > > > big performance win for the common case.
> > >
> > > in our this unixbench tests, a5e57b4d370c introduce a small regression comparing
> > > to its parent (477cf917dd028).
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.9
> > >   477cf917dd028 fsnotify: use an enum for group priority constants
> > >   a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> > >   v6.10-rc1
> > >
> > >             v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
> > > ---------------- --------------------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \          |                \
> > >  1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
> > >
> >
> > Assuming this is a stable result,
> > that's very small regression compared to the improvements before it
> > and one that I dare to call acceptable for this micro buffered write benchmark
> > because of the big gain in other workloads.
>
> again, all data here is comparing to v6.9, so there is a 2.8% improvement on
> 477cf917dd028 comparing to v6.9, but it drops back on a5e57b4d370c6, whose
> data is almost same with v6.9 (so +0.4% comparing to v6.9).
>
> anyway, we normally ignore <1% performance changes, so we won't say
> a5e57b4d370c6 or v6.10-rc1 has obvious performance changes comparing to v6.9.
>

This fluctuation is also hard to explain.

Jan, any thoughts? things to try?

Thanks,
Amir.


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-03 16:20                 ` Amir Goldstein
@ 2024-07-04 15:39                   ` Jan Kara
  2024-07-05  2:09                   ` Oliver Sang
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Kara @ 2024-07-04 15:39 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Oliver Sang, Jan Kara, oe-lkp, lkp

On Wed 03-07-24 19:20:49, Amir Goldstein wrote:
> On Wed, Jul 3, 2024 at 10:21 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> > > On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > > > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > > > > <oliver.sang@intel.com> wrote:
> > > > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jan,
> > > > > > > > >
> > > > > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > > > > >
> > > > > > > > > Luckily, we already have the way to check
> > > > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > > > >                                                FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > > > > code entirely.
> > > > > > > > >
> > > > > > > > > Oliver,
> > > > > > > > >
> > > > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > > > > >
> > > > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > > > > with pre-content events
> > > > > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > > > > >
> > > > > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > > > > pre-content permission event),
> > > > > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > > > > range in pre-content event).
> > > > > > > > > no need to test all middle commits.
> > > > > > > >
> > > > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > > > > >
> > > > > > > > =========================================================================================
> > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > > >
> > > > > > > > commit:
> > > > > > > >   v6.10-rc1
> > > > > > > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > > > > >
> > > > > > > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > > > > ---------------- ---------------------------
> > > > > > > >          %stddev     %change         %stddev
> > > > > > > >              \          |                \
> > > > > > > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > > > > > > >
> > > > > > > > full data is as below [1]
> > > > > > > >
> > > > > > > >
> > > > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > > > > >
> > > > > > > > it also has a small regression comparing to its parent, but better also.
> > > > > > > >
> > > > > > > > =========================================================================================
> > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > > >
> > > > > > > > commit:
> > > > > > > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > > > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > >
> > > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > > > > ---------------- ---------------------------
> > > > > > > >          %stddev     %change         %stddev
> > > > > > > >              \          |                \
> > > > > > > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > > > > > > >
> > > > > > > > full data is as below [2]
> > > > > > > >
> > > > > > >
> > > > > > > Ok, this looks sane, the small overhead in the write path makes sense.
> > > > >
> > > > > On second look, while a small regression from 64108c0b47db9 could make
> > > > > sense, because it changes the inline fsnotify hooks, the extra regression from
> > > > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > > > > touch any code that affects the executed functions, so I have to wonder how
> > > > > reliable are those results.
> > > > >
> > > > > Could you re-test the commits 94167e071109d..a82fd282befc7?
> > > >
> > > > since the branch is:
> > > >
> > > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events  <---
> > > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event           <---
> > > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event           <--- parent of 94167e071109d
> > > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > > * 1613e604df0cd (tag: v6.10-rc1,
> > > >
> > > >
> > > > I made below comparison, which shows little difference among 3 commits:
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > >   a82fd282befc7 fanotify: report file range info with pre-content events
> > > >
> > > > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > > > ---------------- --------------------------- ---------------------------
> > > >          %stddev     %change         %stddev     %change         %stddev
> > > >              \          |                \          |                \
> > > >  1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
> > > >
> > > >
> > >
> > > Hi Oliver,
> > >
> > > Perhaps I am not reading the report right, but how do these numbers reconcile
> > > with the previous report of regression:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > >   64108c0b47db9 ("fanotify: pass optional file access range in
> > > pre-content event")
> > >
> > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > >
> > > Is this a case of unstable results? something else?
> >
> > you could see the data for 94167e071109d are 1.163e+08 in both table.
> >
> > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> >   "unixbench.throughput": [
> >     121545292.8,
> >     121629889.4,
> >     121598992.0,
> >     121492095.5,
> >     121645038.1,
> >     121556286.9
> >   ],
> >
> 
> Are all those runs from the same boot?
> 
> > for the branch tip a82fd282befc7:
> >   "unixbench.throughput": [
> >     116675606.7,
> >     116840611.2,
> >     116738966.0,
> >     116956953.1,
> >     116704901.9,
> >     116997628.3,
> >     117141733.7,
> >     116660495.4
> >   ],
> >
> 
> > And these runs?
> 
> Otherwise, we might have a fluctuation that happens at boot time
> or at mount time or something.

So what I suspect is that the fluctuation actually happens "per compile
time". Depending on how exactly some hot paths get aligned in the compiled
kernel binary wrt CPU cachelines or similar, you get differences in
performance. I've seen that happening quite a few times in the past and the
observed differences are well in that range.

> > let me combine the results from this branch together:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.10-rc1
> >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> >   a82fd282befc7 fanotify: report file range info with pre-content
> >
> >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \          |                \          |                \
> >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> >
> >
> > one thing I want to mention is the "%change" is always comparing to the first
> > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > and so on.
> 
> Thanks for clarifying - I did not read it this way.
> 
> >
> > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > -2.4% regression comparing to 94167e071109d.
> >
> > from above table, along the branch, the performance is kind of fluctuating,
> > dropped most on 64108c0b47db9, but then recovered a little on tip.
> >
> 
> I can understand why 64108c0b47db91b would regress performance, but I
> cannot think of any possible explanation why a82fd282befc should improve
> performance, so I have to wonder if the regression to -6.6% is not a
> fluke of some specific boot/mount?

I agree. In my opinion at least some of those changes are not related to
code changes but rather to random code alignment changes.

> I pushed a test branch to
> https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> with an extra patch that un-inlines some helpers to help bisect the
> perf report better.
> Maybe produce the report with this commit and it sheds some light.
> 
> Jan, any other ideas?

Not really. These alignment-induced fluctuations are annoying, but I don't
know of a good way to avoid them. Even narrowing them down is tedious as
the changes on this scale are not easy to see in the profiles. So I'd check
the perf profiles and if we don't see any obvious regression in the changed
places, I'd just ignore the regression...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-03 16:20                 ` Amir Goldstein
  2024-07-04 15:39                   ` Jan Kara
@ 2024-07-05  2:09                   ` Oliver Sang
  2024-07-05  5:48                     ` Amir Goldstein
  1 sibling, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-05  2:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:

[...]

> > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> >   "unixbench.throughput": [
> >     121545292.8,
> >     121629889.4,
> >     121598992.0,
> >     121492095.5,
> >     121645038.1,
> >     121556286.9
> >   ],
> >
> 
> Are all those runs from the same boot?

no. we reboot the machine before each run.

> 
> > for the branch tip a82fd282befc7:
> >   "unixbench.throughput": [
> >     116675606.7,
> >     116840611.2,
> >     116738966.0,
> >     116956953.1,
> >     116704901.9,
> >     116997628.3,
> >     117141733.7,
> >     116660495.4
> >   ],
> >
> 
> And these runs?

same.
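
as a rough cross-check of the two run lists quoted above, the sketch below
computes the per-commit mean, the relative spread between runs, and the change
of the tip against v6.10-rc1. it assumes the sample standard deviation is a
fair stand-in for the "%stddev" column (the report tooling may define it
differently), and the names are only illustrative.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput values as quoted in this thread
# (one reboot per run).
runs = {
    "v6.10-rc1": [
        121545292.8, 121629889.4, 121598992.0,
        121492095.5, 121645038.1, 121556286.9,
    ],
    "a82fd282befc7": [
        116675606.7, 116840611.2, 116738966.0, 116956953.1,
        116704901.9, 116997628.3, 117141733.7, 116660495.4,
    ],
}


def summarize(samples):
    """Return (mean, relative sample standard deviation in percent)."""
    m = mean(samples)
    return m, stdev(samples) / m * 100.0


base_mean, base_rsd = summarize(runs["v6.10-rc1"])
tip_mean, tip_rsd = summarize(runs["a82fd282befc7"])
delta_pct = (tip_mean - base_mean) / base_mean * 100.0

print(f"v6.10-rc1:     mean {base_mean:.4g}, spread {base_rsd:.2f}%")
print(f"a82fd282befc7: mean {tip_mean:.4g}, spread {tip_rsd:.2f}%")
print(f"tip vs v6.10-rc1: {delta_pct:+.1f}%")
```

the spread between reboots of the same kernel is well below 1% for both
commits, while the gap between the two commits is several percent, so the
difference does not look like per-boot noise. it says nothing about
differences that are fixed per kernel build.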

> 
> Otherwise, we might have a fluctuation that happens at boot time
> or at mount time or something.
> 
> >
> > let me combine the results from this branch together:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.10-rc1
> >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> >   a82fd282befc7 fanotify: report file range info with pre-content
> >
> >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \          |                \          |                \
> >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> >
> >
> > one thing I want to mention is the "%change" is always comparing to the first
> > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > and so on.
> 
> Thanks for clarifying - I did not read it this way.
> 
> >
> > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > -2.4% regression comparing to 94167e071109d.
> >
> > from above table, along the branch, the performance is kind of fluctuating,
> > dropped most on 64108c0b47db9, but then recovered a little on tip.
> >
> 
> I can understand why 64108c0b47db91b would regress performance, but I
> cannot think of any possible explanation why a82fd282befc should improve
> performance, so I have to wonder if the regression to -6.6% is not a
> fluke of some specific boot/mount?
> 
> I pushed a test branch to
> https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> with an extra patch that un-inlines some helpers to help bisect the
> perf report better.
> Maybe produce the report with this commit and it sheds some light.

since the branch is now:

* 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
* a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
* f301cd18006c3 fanotify: rename a misnamed constant
* 64108c0b47db9 fanotify: pass optional file access range in pre-content event
* 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
* 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
* 83af0c89527ab fsnotify: generate pre-content permission event on exec
* aca4084213276 fsnotify: generate pre-content permission event on open
* 93656e196b006 fsnotify: introduce pre-content permission event
* 1613e604df0cd (tag: v6.10-rc1,

we ran the tests on the new commit. the summary report is below:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
  388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers

       v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput


since Jan mentioned in a later mail that perf profiles are useful, I put the
details below:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
  388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers

       v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      9.50 ± 24%     +55.3%      14.75 ± 24%     +53.9%      14.62 ± 24%  perf-c2c.DRAM.local
    261842            +0.6%     263433            +1.5%     265837        time.involuntary_context_switches
      6199            -5.7%       5843            -2.3%       6054        time.user_time
 1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
    261842            +0.6%     263433            +1.5%     265837        unixbench.time.involuntary_context_switches
      6199            -5.7%       5843            -2.3%       6054        unixbench.time.user_time
 4.513e+10            -3.8%  4.339e+10            -4.0%  4.331e+10        unixbench.workload
    167317 ±  5%     -39.3%     101518 ± 47%     -30.5%     116276 ± 39%  numa-vmstat.node1.nr_anon_pages
    112.92 ± 28%     -53.9%      52.06 ± 99%     -39.8%      67.98 ± 64%  numa-vmstat.node1.nr_anon_transparent_hugepages
     78069          +506.3%     473340 ± 77%    +495.2%     464673 ± 77%  numa-vmstat.node1.nr_file_pages
    167460 ±  5%     -39.3%     101568 ± 47%     -30.5%     116461 ± 39%  numa-vmstat.node1.nr_inactive_anon
     10649 ±  6%   +3703.3%     405022 ± 90%   +3625.1%     396698 ± 90%  numa-vmstat.node1.nr_unevictable
    167460 ±  5%     -39.3%     101568 ± 47%     -30.5%     116461 ± 39%  numa-vmstat.node1.nr_zone_inactive_anon
     10649 ±  6%   +3703.4%     405022 ± 90%   +3625.2%     396698 ± 90%  numa-vmstat.node1.nr_zone_unevictable
     15473 ±125%    +567.1%     103220 ± 85%    +366.5%      72185 ±110%  numa-meminfo.node0.AnonHugePages
    220234 ± 13%    +116.8%     477532 ± 37%    +101.8%     444469 ± 42%  numa-meminfo.node0.AnonPages.max
    231368 ± 28%     -53.9%     106616 ± 98%     -39.8%     139307 ± 64%  numa-meminfo.node1.AnonHugePages
    668949 ±  5%     -39.3%     405873 ± 47%     -30.5%     464919 ± 39%  numa-meminfo.node1.AnonPages
    836146 ±  6%     -34.5%     547503 ± 38%     -28.1%     601279 ± 34%  numa-meminfo.node1.AnonPages.max
    312276          +506.3%    1893321 ± 77%    +495.2%    1858788 ± 77%  numa-meminfo.node1.FilePages
    669489 ±  5%     -39.3%     406110 ± 47%     -30.4%     465687 ± 39%  numa-meminfo.node1.Inactive
    669489 ±  5%     -39.4%     406010 ± 47%     -30.5%     465628 ± 39%  numa-meminfo.node1.Inactive(anon)
     42552 ±  6%   +3707.3%    1620116 ± 90%   +3628.9%    1586760 ± 90%  numa-meminfo.node1.Unevictable
 1.458e+11            -2.7%  1.419e+11            -3.8%  1.402e+11        perf-stat.i.branch-instructions
   7347431            -1.3%    7251270 ±  2%      -2.8%    7140090        perf-stat.i.branch-misses
     11.47 ±  6%      +2.8       14.29 ±  9%      +2.5       13.99 ±  6%  perf-stat.i.cache-miss-rate%
   3915539 ±  8%    +513.8%   24032895 ± 10%    +500.6%   23516538 ±  7%  perf-stat.i.cache-misses
  32425619 ±  3%    +391.9%  1.595e+08 ±  5%    +388.7%  1.585e+08 ±  4%  perf-stat.i.cache-references
      2196            +0.4%       2206            +2.4%       2249        perf-stat.i.context-switches
    151202 ± 16%     -77.0%      34851 ± 59%     -75.9%      36442 ± 38%  perf-stat.i.cycles-between-cache-misses
 6.961e+11            -1.9%  6.829e+11            -3.4%  6.724e+11        perf-stat.i.instructions
      1.22            -1.3%       1.20            -2.5%       1.19        perf-stat.i.ipc
      0.01 ±  9%    +523.5%       0.04 ± 11%    +518.2%       0.03 ±  7%  perf-stat.overall.MPKI
     12.09 ±  6%      +3.0       15.08 ±  8%      +2.7       14.83 ±  5%  perf-stat.overall.cache-miss-rate%
      0.75            +2.0%       0.77            +2.6%       0.77        perf-stat.overall.cpi
    133775 ±  8%     -83.6%      21976 ± 11%     -83.4%      22156 ±  7%  perf-stat.overall.cycles-between-cache-misses
      1.33            -1.9%       1.31            -2.5%       1.30        perf-stat.overall.ipc
      5721            +2.0%       5835            +1.8%       5821        perf-stat.overall.path-length
 1.452e+11            -2.7%  1.413e+11            -3.9%  1.395e+11        perf-stat.ps.branch-instructions
   7332734            -1.3%    7238853 ±  3%      -2.9%    7119552        perf-stat.ps.branch-misses
   3921138 ±  8%    +511.4%   23972111 ± 11%    +496.7%   23398300 ±  7%  perf-stat.ps.cache-misses
  32415461 ±  3%    +389.9%  1.588e+08 ±  5%    +386.6%  1.577e+08 ±  4%  perf-stat.ps.cache-references
      2192            +0.4%       2201            +2.3%       2243        perf-stat.ps.context-switches
 6.932e+11            -1.9%  6.798e+11            -3.5%  6.691e+11        perf-stat.ps.instructions
 2.582e+14            -1.9%  2.532e+14            -2.3%  2.521e+14        perf-stat.total.instructions
     13.19            -0.7       12.50            -0.4       12.75        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      7.01            -0.2        6.80            -0.1        6.88        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      1.11            -0.2        0.91            -0.0        1.11 ±  2%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
      2.50            -0.2        2.35            +0.1        2.58        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
      2.81            -0.1        2.71 ±  2%      +0.1        2.94        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      1.68            -0.1        1.59            -0.0        1.64        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      3.73            -0.1        3.64            +0.0        3.76        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.62            -0.1        1.55            -0.1        1.55        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.18            -0.1        2.12            -0.1        2.13        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
      3.34            -0.1        3.28            +0.2        3.54        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      0.65            -0.1        0.60 ±  2%      +0.0        0.68        perf-profile.calltrace.cycles-pp.w_test
      0.92            -0.0        0.87            -0.0        0.88        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.70            -0.0        0.66            -0.0        0.69        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
      0.86            -0.0        0.82            -0.0        0.84        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
      0.92            -0.0        0.88            -0.0        0.88        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.63            -0.0        0.59            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.86            -0.0        0.83            -0.0        0.81        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      3.53            -0.0        3.50            -0.2        3.37        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
      0.68            -0.0        0.66            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.53            -0.0        0.51            -0.0        0.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
      4.23            -0.0        4.22            -0.2        4.05        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      0.72            -0.0        0.71            -0.0        0.71        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.62            -0.0        0.61            -0.0        0.59        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      2.71            +0.0        2.71            -0.1        2.58        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
      0.89            +0.0        0.89 ±  2%      +0.1        1.00        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
      0.96            +0.0        0.98            -0.1        0.90        perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      0.75            +0.0        0.77            +0.0        0.75        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      6.88            +0.0        6.91            -0.2        6.67        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      1.13            +0.0        1.17            -0.1        1.08        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      1.92            +0.0        1.96            +0.3        2.18        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      5.30            +0.1        5.36            +0.0        5.32        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      2.12 ±  3%      +0.1        2.18            +0.1        2.25 ±  2%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      5.30            +0.1        5.39            -0.4        4.92        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
      6.17            +0.1        6.27            -0.5        5.66        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     96.84            +0.1       96.98            -0.0       96.81        perf-profile.calltrace.cycles-pp.write
      0.78 ±  2%      +0.3        1.12 ±  7%      +0.0        0.80 ±  3%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      2.97            +0.6        3.57            -0.2        2.81        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
      3.63            +0.6        4.24            -0.2        3.42        perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
     12.01            +0.6       12.63            -0.5       11.54        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      4.32            +0.6        4.95            +0.8        5.12        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.28            +0.8       38.12            +0.0       37.32        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
     84.26            +0.9       85.20            +0.4       84.65        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     13.39            +1.0       14.36            +1.0       14.35        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.30            +1.0       13.30            +1.0       13.33        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
     82.83            +1.0       83.84            +0.5       83.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     57.94            +1.3       59.20            -0.0       57.92        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.99            +1.3        7.25 ±  2%      +1.3        7.27        perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
     78.13            +1.3       79.39            +0.7       78.83        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     74.26            +1.3       75.58            +0.6       74.90        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.43            -0.4        7.06            -0.2        7.19        perf-profile.children.cycles-pp.entry_SYSCALL_64
      4.42            -0.2        4.18            -0.2        4.25        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.21            -0.2        1.01            -0.0        1.19        perf-profile.children.cycles-pp.syscall_return_via_sysret
      7.14            -0.2        6.95            -0.1        7.02        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      4.18            -0.2        4.00            -0.2        4.03        perf-profile.children.cycles-pp.__cond_resched
      2.74            -0.2        2.58            +0.1        2.82        perf-profile.children.cycles-pp.apparmor_file_permission
      2.42            -0.1        2.30            -0.1        2.31        perf-profile.children.cycles-pp.rcu_all_qs
      3.82            -0.1        3.72            +0.0        3.84        perf-profile.children.cycles-pp.__fsnotify_parent
      2.98            -0.1        2.88 ±  2%      +0.1        3.12        perf-profile.children.cycles-pp.down_write
      1.74            -0.1        1.65            -0.0        1.70        perf-profile.children.cycles-pp.up_write
      1.99            -0.1        1.89            -0.1        1.90        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      3.71            -0.1        3.63            +0.2        3.90        perf-profile.children.cycles-pp.security_file_permission
      0.99            -0.1        0.91            +0.0        1.02        perf-profile.children.cycles-pp.w_test
      2.47            -0.1        2.40            -0.1        2.42        perf-profile.children.cycles-pp.xas_load
      1.12            -0.1        1.06            -0.0        1.08        perf-profile.children.cycles-pp.folio_wait_stable
      1.26            -0.1        1.20            -0.1        1.21        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.75            -0.0        0.71            -0.0        0.71        perf-profile.children.cycles-pp.x64_sys_call
      0.98            -0.0        0.94            -0.0        0.95        perf-profile.children.cycles-pp.aa_file_perm
      0.46            -0.0        0.42            +0.0        0.46        perf-profile.children.cycles-pp.write@plt
      1.10            -0.0        1.06            -0.1        1.04        perf-profile.children.cycles-pp.xattr_resolve_name
      0.36            -0.0        0.33            -0.0        0.35        perf-profile.children.cycles-pp.amd_clear_divider
      3.76            -0.0        3.73            -0.2        3.59        perf-profile.children.cycles-pp.cap_inode_need_killpriv
      0.59            -0.0        0.56            -0.0        0.57        perf-profile.children.cycles-pp.inode_to_bdi
      0.38            -0.0        0.35            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
      1.05            -0.0        1.03            -0.0        1.03        perf-profile.children.cycles-pp.folio_mapping
      3.41            -0.0        3.38            -0.2        3.25        perf-profile.children.cycles-pp.__vfs_getxattr
      0.56            -0.0        0.53            -0.0        0.54        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      4.47            -0.0        4.45            -0.2        4.28        perf-profile.children.cycles-pp.security_inode_need_killpriv
      0.25            -0.0        0.24 ±  2%      -0.0        0.24        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
      0.36            -0.0        0.35            -0.0        0.35        perf-profile.children.cycles-pp.is_bad_inode
      0.64            -0.0        0.63            -0.0        0.61        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      1.00            +0.0        1.00 ±  2%      +0.1        1.12        perf-profile.children.cycles-pp.generic_write_check_limits
      1.38            +0.0        1.40            -0.1        1.31        perf-profile.children.cycles-pp.strcmp
      0.93            +0.0        0.95            -0.0        0.93        perf-profile.children.cycles-pp.folio_mark_dirty
      1.07            +0.0        1.09            -0.1        1.00        perf-profile.children.cycles-pp.timestamp_truncate
      7.23            +0.0        7.26            -0.2        7.01        perf-profile.children.cycles-pp.file_remove_privs_flags
      2.25            +0.0        2.29            +0.3        2.53        perf-profile.children.cycles-pp.generic_write_checks
      5.70            +0.0        5.74            +0.0        5.70        perf-profile.children.cycles-pp.simple_write_end
      2.24 ±  3%      +0.1        2.30            +0.1        2.37 ±  2%  perf-profile.children.cycles-pp.__fdget_pos
     98.96            +0.1       99.02            -0.0       98.92        perf-profile.children.cycles-pp.write
      5.69            +0.1        5.76            -0.5        5.22        perf-profile.children.cycles-pp.fault_in_readable
      6.42            +0.1        6.53            -0.5        5.89        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
      0.89            +0.3        1.23 ±  6%      +0.0        0.91 ±  2%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
      3.39            +0.6        3.97            -0.2        3.20        perf-profile.children.cycles-pp.inode_needs_update_time
      3.96            +0.6        4.57            -0.2        3.73        perf-profile.children.cycles-pp.file_update_time
     12.35            +0.6       12.96            -0.5       11.86        perf-profile.children.cycles-pp.__generic_file_write_iter
      4.56            +0.8        5.32            +0.9        5.48        perf-profile.children.cycles-pp.rw_verify_area
     38.16            +0.8       39.01            +0.0       38.18        perf-profile.children.cycles-pp.generic_perform_write
     84.67            +0.9       85.62            +0.4       85.06        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     13.58            +1.0       14.53            +0.9       14.52        perf-profile.children.cycles-pp.simple_write_begin
     83.50            +1.0       84.49            +0.4       83.94        perf-profile.children.cycles-pp.do_syscall_64
     12.68            +1.0       13.68            +1.0       13.71        perf-profile.children.cycles-pp.__filemap_get_folio
     78.74            +1.2       79.99            +0.7       79.42        perf-profile.children.cycles-pp.ksys_write
     58.52            +1.3       59.78            -0.0       58.52        perf-profile.children.cycles-pp.generic_file_write_iter
      6.18            +1.3        7.44 ±  2%      +1.3        7.46        perf-profile.children.cycles-pp.filemap_get_entry
     75.13            +1.3       76.41            +0.6       75.74        perf-profile.children.cycles-pp.vfs_write
      7.25            -0.6        6.64            -0.3        6.95        perf-profile.self.cycles-pp.vfs_write
      6.45            -0.4        6.09            -0.2        6.28        perf-profile.self.cycles-pp.write
      4.32            -0.2        4.09            -0.2        4.16        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.21            -0.2        1.01            -0.0        1.19        perf-profile.self.cycles-pp.syscall_return_via_sysret
      6.98            -0.2        6.79            -0.1        6.86        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      4.52            -0.2        4.35            -0.2        4.35        perf-profile.self.cycles-pp.__filemap_get_folio
      2.34            -0.1        2.22            -0.1        2.26        perf-profile.self.cycles-pp.__cond_resched
      1.60            -0.1        1.49 ±  2%      +0.1        1.71 ±  2%  perf-profile.self.cycles-pp.apparmor_file_permission
      1.90            -0.1        1.79            -0.1        1.80        perf-profile.self.cycles-pp.do_syscall_64
      3.65            -0.1        3.56            +0.0        3.68        perf-profile.self.cycles-pp.__fsnotify_parent
      1.47            -0.1        1.38            -0.1        1.41        perf-profile.self.cycles-pp.ksys_write
      1.62            -0.1        1.53            -0.0        1.59        perf-profile.self.cycles-pp.up_write
      1.09            -0.1        1.02            -0.1        1.04        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.80            -0.1        1.74            -0.1        1.74        perf-profile.self.cycles-pp.xas_load
      0.79            -0.1        0.73            +0.0        0.83        perf-profile.self.cycles-pp.w_test
      1.10            -0.1        1.04            -0.0        1.08        perf-profile.self.cycles-pp.security_file_permission
      1.25            -0.1        1.20            -0.1        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.66            -0.1        1.60            -0.1        1.61        perf-profile.self.cycles-pp.entry_SYSCALL_64
      1.41            -0.1        1.36            -0.0        1.36        perf-profile.self.cycles-pp.rcu_all_qs
      0.90            -0.0        0.86            -0.1        0.81        perf-profile.self.cycles-pp.simple_write_begin
      0.88            -0.0        0.84            -0.0        0.85        perf-profile.self.cycles-pp.aa_file_perm
      0.80            -0.0        0.76            -0.0        0.76        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.62            -0.0        0.59            -0.0        0.59        perf-profile.self.cycles-pp.x64_sys_call
      1.39            -0.0        1.36            -0.1        1.34        perf-profile.self.cycles-pp.__vfs_getxattr
      0.53            -0.0        0.50            -0.0        0.51        perf-profile.self.cycles-pp.folio_wait_stable
      1.69            -0.0        1.67            +0.1        1.81        perf-profile.self.cycles-pp.generic_file_write_iter
      0.87            -0.0        0.85            -0.0        0.85        perf-profile.self.cycles-pp.folio_mapping
      0.56            -0.0        0.53            -0.0        0.53        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.93            -0.0        1.91 ±  2%      +0.2        2.10        perf-profile.self.cycles-pp.down_write
      0.24            -0.0        0.22            -0.0        0.23        perf-profile.self.cycles-pp.amd_clear_divider
      0.25            -0.0        0.23            -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.__x64_sys_write
      0.35            -0.0        0.34            -0.0        0.34        perf-profile.self.cycles-pp.inode_to_bdi
      1.15            -0.0        1.13            -0.0        1.11        perf-profile.self.cycles-pp.__generic_file_write_iter
      0.61            -0.0        0.60            -0.0        0.59        perf-profile.self.cycles-pp.xattr_resolve_name
      0.22            -0.0        0.21            -0.0        0.21 ±  2%  perf-profile.self.cycles-pp.noop_dirty_folio
      0.35            -0.0        0.34            -0.0        0.34        perf-profile.self.cycles-pp.cap_inode_need_killpriv
      0.79            -0.0        0.79 ±  2%      +0.1        0.88        perf-profile.self.cycles-pp.generic_write_check_limits
      0.52            +0.0        0.52            -0.0        0.49        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      3.36            +0.0        3.36            -0.2        3.16        perf-profile.self.cycles-pp.generic_perform_write
      0.70            +0.0        0.71            -0.0        0.68        perf-profile.self.cycles-pp.security_inode_need_killpriv
      1.30            +0.0        1.32            +0.2        1.46        perf-profile.self.cycles-pp.generic_write_checks
      0.66            +0.0        0.69            -0.0        0.62        perf-profile.self.cycles-pp.file_update_time
      1.03            +0.0        1.06            -0.1        0.97        perf-profile.self.cycles-pp.strcmp
      2.75            +0.0        2.79            +0.0        2.76        perf-profile.self.cycles-pp.simple_write_end
      0.72            +0.0        0.77            -0.1        0.66        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
      0.87            +0.1        0.93            -0.1        0.82        perf-profile.self.cycles-pp.timestamp_truncate
      2.10 ±  3%      +0.1        2.16            +0.1        2.23 ±  2%  perf-profile.self.cycles-pp.__fdget_pos
      2.04            +0.1        2.09            -0.0        1.99        perf-profile.self.cycles-pp.file_remove_privs_flags
      5.54            +0.1        5.60            -0.5        5.08        perf-profile.self.cycles-pp.fault_in_readable
      1.51            +0.2        1.69            -0.1        1.37        perf-profile.self.cycles-pp.inode_needs_update_time
      0.78 ±  2%      +0.3        1.11 ±  7%      +0.0        0.80 ±  2%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      0.84            +1.0        1.81            +0.8        1.69        perf-profile.self.cycles-pp.rw_verify_area
      3.66            +1.3        4.98 ±  4%      +1.3        4.99        perf-profile.self.cycles-pp.filemap_get_entry





* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-05  2:09                   ` Oliver Sang
@ 2024-07-05  5:48                     ` Amir Goldstein
  2024-07-08  5:40                       ` Oliver Sang
  2024-07-25 13:41                       ` Jan Kara
  0 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-05  5:48 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
>
> [...]
>
> > > > the data in our tests seems quite stable for a given commit, for example for v6.10-rc1:
> > >   "unixbench.throughput": [
> > >     121545292.8,
> > >     121629889.4,
> > >     121598992.0,
> > >     121492095.5,
> > >     121645038.1,
> > >     121556286.9
> > >   ],
> > >
> >
> > Are all those runs from the same boot?
>
> no. we reboot machine before each run.
>
> >
> > > for the branch tip a82fd282befc7:
> > >   "unixbench.throughput": [
> > >     116675606.7,
> > >     116840611.2,
> > >     116738966.0,
> > >     116956953.1,
> > >     116704901.9,
> > >     116997628.3,
> > >     117141733.7,
> > >     116660495.4
> > >   ],
> > >
> >
> > > And these runs?
>
> same.
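
As a sanity check, the per-run samples quoted above can be reduced to a mean
and a relative standard deviation per commit. This is only a rough Python
sketch: the names (baseline, tip, summarize) are made up for illustration and
the numbers are copied from the two lists above.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput samples copied from the lists quoted above.
baseline = [  # v6.10-rc1
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
tip = [  # a82fd282befc7
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]


def summarize(samples):
    """Return (mean, relative sample standard deviation in percent)."""
    m = mean(samples)
    return m, 100.0 * stdev(samples) / m


base_mean, base_rsd = summarize(baseline)
tip_mean, tip_rsd = summarize(tip)

# Change of the tip relative to the baseline, in percent.
change_pct = 100.0 * (tip_mean - base_mean) / base_mean

print(f"v6.10-rc1:     mean={base_mean:.4g}  rsd={base_rsd:.2f}%")
print(f"a82fd282befc7: mean={tip_mean:.4g}  rsd={tip_rsd:.2f}%")
print(f"change: {change_pct:+.1f}%")
```

Both relative deviations come out well below 1%, and the change rounds to
-3.9%, so the drop is much larger than the spread between reboots.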
>
> >
> > Otherwise, we might have a fluctuation that happens at boot time
> > or at mount time or something.
> >
> > >
> > > let me combine the results from this branch together:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.10-rc1
> > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > >   a82fd282befc7 fanotify: report file range info with pre-content
> > >
> > >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \          |                \          |                \
> > >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> > >
> > >
> > > > one thing I want to mention is that the "%change" is always relative to the
> > > > first column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression compared
> > > > to v6.10-rc1; 94167e071109d has a -4.3% regression, also compared to v6.10-rc1,
> > > > and so on.
> >
> > Thanks for clarifying - I did not read it this way.
> >
> > >
> > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > a -2.4% regression compared to 94167e071109d.
> > >
> > > from above table, along the branch, the performance is kind of fluctuating,
> > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > >
> >
> > > I can understand why 64108c0b47db91b would regress performance, but I
> > > cannot think of any possible explanation why a82fd282befc should improve
> > > performance, so I have to wonder if the regression to -6.6% is not a fluke
> > > of some specific boot/mount?
> >
> > I pushed a test branch to
> > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > with an extra patch that un-inlines some helpers to help bisect the
> > perf report better.
> > > Maybe producing the report with this commit will shed some light.
>
> since
>
> * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> * f301cd18006c3 fanotify: rename a misnamed constant
> * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> * aca4084213276 fsnotify: generate pre-content permission event on open
> * 93656e196b006 fsnotify: introduce pre-content permission event
> * 1613e604df0cd (tag: v6.10-rc1,
>
> we ran tests on the new commit. the summary report is below:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
>   388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
>
>        v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>  1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
>
>
> since Jan mentioned in a later mail that perf profiles are useful, I put details
> as below

Thanks.
That clarifies that the cycles are spent in the "optimization code" itself.

I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
at the base of the branch.

Hopefully, with this fix, the compiler will be able to optimize smarter and
the generated fast path code will be less sensitive to code alignment ???

If it works, it may eliminate some of the regressions throughout this branch and
may also improve the stress-ng regression that you reported on v6.10-rc1 [1].

* e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info with pre-content events
* a28c32866bb3 - fanotify: rename a misnamed constant
* 61baabbdceaa - fanotify: pass optional file access range in pre-content event
* 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
* 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
* 38a903de931a - fsnotify: generate pre-content permission event on exec
* 70be29706389 - fsnotify: generate pre-content permission event on open
* 96768b7d6721 - fsnotify: introduce pre-content permission event
* 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks

Fingers crossed...

Thanks,
Amir.

[1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-05  5:48                     ` Amir Goldstein
@ 2024-07-08  5:40                       ` Oliver Sang
  2024-07-08 16:37                         ` Amir Goldstein
  2024-07-25 13:41                       ` Jan Kara
  1 sibling, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-08  5:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

On Fri, Jul 05, 2024 at 08:48:28AM +0300, Amir Goldstein wrote:

[...]

> 
> Thanks.
> That clarifies that the cycles are spent in the "optimization code" itself.
> 
> I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
> at the base of the branch.
> 
> Hopefully, with this fix, the compiler will be able to optimize smarter and
> the generated fast path code will be less sensitive to code alignment ???
> 
> If it works, it may eliminate some of the regressions throughout this branch and
> may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
> 
> * e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info with pre-content events
> * a28c32866bb3 - fanotify: rename a misnamed constant
> * 61baabbdceaa - fanotify: pass optional file access range in pre-content event
> * 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
> * 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
> * 38a903de931a - fsnotify: generate pre-content permission event on exec
> * 70be29706389 - fsnotify: generate pre-content permission event on open
> * 96768b7d6721 - fsnotify: introduce pre-content permission event
> * 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
> 
> Fingers crossed...

unfortunately, it seems there is no luck. I combined the results with
96768b7d6721 and its parent, since 96768b7d6721 introduces most of the regression.

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
  96768b7d67219 fsnotify: introduce pre-content permission event
  e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events

       v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
 1.218e+08            -0.3%  1.214e+08            -7.6%  1.125e+08            -6.4%   1.14e+08        unixbench.throughput
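
For readers checking the summary row: each %change column is relative to the
v6.10-rc1 baseline in the first column. A minimal sketch that recomputes the
percentages from the rounded values printed above (the helper name pct_change
is only illustrative and is not part of the lkp tooling):

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# unixbench.throughput values copied from the row above (already rounded).
baseline = 1.218e8  # v6.10-rc1
results = [
    ("28d5b4a88241d", 1.214e8),
    ("96768b7d67219", 1.125e8),
    ("e0aaae806edc3", 1.14e8),
]

for commit, throughput in results:
    print(f"{commit}: {pct_change(baseline, throughput):+.1f}%")
```

Because the inputs are the rounded table values, the last digit can differ
from the report for other rows.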

Details are below [2].


> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/

For this report, I also retested on the new branch. The regression seems to
have shrunk to around 10%, but we cannot get stable data on this new branch,
so we cannot say whether it has really improved.

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/full/stress-ng/60s

commit:
  v6.10-rc1
  e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events

       v6.10-rc1 e0aaae806edc3411d84dc0d66fe
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.161e+08 ±  5%      -9.5%   1.05e+08 ± 10%  stress-ng.full.ops
   1934587 ±  5%      -9.5%    1750464 ± 10%  stress-ng.full.ops_per_sec
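
One rough way to see why this data is hard to call: treat each "± N%" as a
relative standard deviation around the mean and check whether the two bands
overlap. This is only a heuristic sketch, not a statistical test, and the
helper names below are illustrative assumptions, not lkp functions.

```python
def stddev_band(mean: float, stddev_pct: float) -> tuple[float, float]:
    """Return (low, high) for mean +/- one relative standard deviation."""
    half_width = mean * stddev_pct / 100.0
    return (mean - half_width, mean + half_width)


def bands_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# stress-ng.full.ops values copied from the table above.
base_band = stddev_band(1.161e8, 5.0)   # v6.10-rc1
new_band = stddev_band(1.05e8, 10.0)    # e0aaae806edc3

print("v6.10-rc1 band:", base_band)
print("e0aaae806edc3 band:", new_band)
print("bands overlap:", bands_overlap(base_band, new_band))
```

With these numbers the bands overlap, which is consistent with not being able
to say whether the new branch is really better.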



[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
  96768b7d67219 fsnotify: introduce pre-content permission event
  e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events

       v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
      6215            +0.1%       6223            -8.9%       5661            -7.9%       5724        time.user_time
      0.58            -0.0        0.54 ± 18%      -0.1        0.52            -0.0        0.55 ± 11%  mpstat.cpu.all.irq%
      0.01 ±  4%      -0.0        0.01 ± 13%      -0.0        0.00 ±  2%      -0.0        0.01 ±  6%  mpstat.cpu.all.soft%
      7.59 ± 59%     -58.7%       3.14 ± 63%     -61.7%       2.90 ± 52%     -33.4%       5.06 ± 50%  sched_debug.cfs_rq:/.util_est.min
      0.00 ± 72%    -148.0%      -0.00           -50.9%       0.00 ±244%     +12.3%       0.00 ±102%  sched_debug.cpu.nr_uninterruptible.avg
 1.218e+08            -0.3%  1.214e+08            -7.6%  1.125e+08            -6.4%   1.14e+08        unixbench.throughput
      6215            +0.1%       6223            -8.9%       5661            -7.9%       5724        unixbench.time.user_time
 4.521e+10            -0.3%  4.506e+10            -7.7%  4.172e+10            -6.4%  4.231e+10        unixbench.workload
 1.458e+11            -7.5%   1.35e+11 ± 18%      -6.8%  1.359e+11            -9.5%   1.32e+11 ± 11%  perf-stat.i.branch-instructions
   3742171 ±  4%     -16.8%    3112873 ± 20%     +80.4%    6752235 ±  7%    +403.3%   18836125 ± 14%  perf-stat.i.cache-misses
  32402657 ±  3%     -19.5%   26094697 ± 16%     +77.8%   57627688 ±  4%    +356.5%  1.479e+08 ± 11%  perf-stat.i.cache-references
      0.95            +9.8%       1.04 ± 20%      +5.8%       1.00 ±  2%      +9.3%       1.03 ± 13%  perf-stat.i.cpi
    161794 ±  8%      -2.2%     158309 ± 20%     -49.9%      81139 ± 15%     -77.3%      36784 ± 54%  perf-stat.i.cycles-between-cache-misses
 6.963e+11            -7.4%  6.445e+11 ± 18%      -6.5%  6.513e+11            -8.8%  6.353e+11 ± 11%  perf-stat.i.instructions
      1.22            -4.7%       1.16 ± 10%      -6.3%       1.14            -6.5%       1.14 ±  6%  perf-stat.i.ipc
      0.01 ±  4%     -10.3%       0.00 ±  6%     +92.9%       0.01 ±  7%    +453.4%       0.03 ± 14%  perf-stat.overall.MPKI
      0.75            +0.0%       0.75            +6.8%       0.80            +4.3%       0.78        perf-stat.overall.cpi
    139258 ±  4%     +11.8%     155626 ±  6%     -44.5%      77325 ±  7%     -80.8%      26737 ± 14%  perf-stat.overall.cycles-between-cache-misses
      1.33            -0.0%       1.33            -6.3%       1.25            -4.1%       1.28        perf-stat.overall.ipc
      5722            +0.3%       5738            +1.6%       5811            +2.6%       5869        perf-stat.overall.path-length
 1.452e+11            -7.4%  1.343e+11 ± 18%      -6.8%  1.352e+11            -9.5%  1.314e+11 ± 11%  perf-stat.ps.branch-instructions
   3742620 ±  4%     -16.7%    3117570 ± 20%     +80.4%    6752430 ±  7%    +401.2%   18758290 ± 14%  perf-stat.ps.cache-misses
  32374621 ±  3%     -19.4%   26088859 ± 16%     +77.6%   57486380 ±  4%    +354.8%  1.473e+08 ± 11%  perf-stat.ps.cache-references
  6.93e+11            -7.4%  6.415e+11 ± 18%      -6.5%  6.481e+11            -8.8%  6.323e+11 ± 11%  perf-stat.ps.instructions
 2.587e+14            -0.0%  2.586e+14            -6.3%  2.425e+14            -4.0%  2.484e+14        perf-stat.total.instructions
      2.85            -0.2        2.62            -0.3        2.55 ±  3%      +0.0        2.86        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      5.99            -0.1        5.86            +0.1        6.09            +1.1        7.10        perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
     12.29            -0.1       12.16            -0.3       11.97            +0.8       13.04        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
     13.39            -0.1       13.28            -0.4       12.96            +0.7       14.12        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     13.18            -0.1       13.08            -0.9       12.24            -1.0       12.21        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      1.98 ±  5%      -0.1        1.88            -0.0        1.96 ±  3%      +0.2        2.20        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      3.27 ±  4%      -0.1        3.20            +4.7        7.99 ±  6%      -0.1        3.21        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      2.42 ±  6%      -0.1        2.37            +4.7        7.16 ±  7%      -0.1        2.28        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
      3.72            -0.1        3.67            -0.4        3.37            +0.1        3.81        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     58.06            -0.0       58.01            -2.5       55.53            +0.7       58.76        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     74.32            -0.0       74.28            +1.8       76.16            +1.4       75.67        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.90 ±  5%      -0.0        0.87            -0.0        0.87 ±  3%      +0.1        1.00        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
      0.70            -0.0        0.66            -0.0        0.68            -0.0        0.68        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
      5.30            -0.0        5.27            -0.2        5.06            -0.1        5.25        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      1.11            -0.0        1.08            -0.1        0.97            -0.2        0.94        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
      1.81            -0.0        1.80            -0.1        1.72            -0.1        1.76        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      1.62            -0.0        1.62            -0.1        1.49            -0.1        1.52        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.92            -0.0        0.91            -0.1        0.84            -0.1        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      2.18            -0.0        2.17            -0.1        2.09            -0.1        2.12        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
      0.62            -0.0        0.62            -0.0        0.58            -0.0        0.60        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.63            -0.0        0.63            -0.0        0.58            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.92            -0.0        0.92            -0.1        0.84            -0.0        0.87        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      1.68            +0.0        1.68            -0.1        1.56            -0.1        1.60        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.68            +0.0        0.69            -0.0        0.64            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.72 ±  2%      +0.0        0.72            -0.0        0.69            -0.0        0.70        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      0.85            +0.0        0.86            -0.0        0.81            -0.0        0.81        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      0.74            +0.0        0.75            -0.0        0.70            +0.0        0.77        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      0.86            +0.0        0.86            +0.0        0.88            -0.0        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
     84.29            +0.0       84.30            +1.2       85.47            +1.1       85.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      7.00            +0.0        7.02            -0.4        6.59            -0.1        6.89        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     82.86            +0.0       82.87            +1.3       84.15            +1.1       83.98        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.73            +0.0        0.74            -0.0        0.71            -0.0        0.70        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.64            +0.0        0.66 ±  2%      -0.1        0.57            -0.1        0.58        perf-profile.calltrace.cycles-pp.w_test
     78.16            +0.0       78.18            +1.7       79.81            +1.4       79.57        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     96.83            +0.0       96.86            +0.2       97.07            +0.3       97.12        perf-profile.calltrace.cycles-pp.write
      0.78 ±  3%      +0.0        0.82 ±  2%      +0.1        0.88 ±  3%      +0.3        1.05 ±  8%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
      1.12 ±  2%      +0.0        1.16            -0.1        1.02            -0.0        1.11        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      4.23            +0.0        4.27            -0.3        3.94            -0.1        4.13        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      2.70            +0.0        2.74            -0.2        2.50            -0.1        2.63        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
      3.52            +0.0        3.57            -0.3        3.26            -0.1        3.41        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
      6.89            +0.1        6.94            -0.4        6.47            -0.1        6.77        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
      2.10 ±  5%      +0.1        2.15 ±  3%      -0.1        2.03 ±  4%      +0.2        2.25 ±  3%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.96            +0.1        3.06            +0.1        3.08            +0.4        3.36 ±  2%  perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
      3.61            +0.1        3.72            +0.1        3.71            +0.4        4.00 ±  2%  perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
     37.30            +0.1       37.42            -1.6       35.66            +0.2       37.50        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      5.32            +0.2        5.48            -0.1        5.18            -0.1        5.21        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
      4.26 ±  3%      +0.2        4.42            +5.4        9.61 ±  5%      +0.7        4.98        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.20            +0.2        6.37            -0.2        5.98            -0.2        6.03        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.00            +0.2       12.19            -0.4       11.61            +0.2       12.25        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      3.02            -0.2        2.79            -0.3        2.72 ±  3%      +0.0        3.04        perf-profile.children.cycles-pp.down_write
      6.18            -0.1        6.05            +0.1        6.28            +1.1        7.28        perf-profile.children.cycles-pp.filemap_get_entry
     12.68            -0.1       12.56            -0.3       12.34            +0.7       13.42        perf-profile.children.cycles-pp.__filemap_get_folio
     13.58            -0.1       13.47            -0.5       13.13            +0.7       14.29        perf-profile.children.cycles-pp.simple_write_begin
     58.65            -0.1       58.58            -2.5       56.11            +0.7       59.36        perf-profile.children.cycles-pp.generic_file_write_iter
      3.64 ±  3%      -0.1        3.58            +4.7        8.36 ±  6%      -0.1        3.56        perf-profile.children.cycles-pp.security_file_permission
      4.19            -0.1        4.14            -0.2        3.96            -0.2        3.98        perf-profile.children.cycles-pp.__cond_resched
      2.67 ±  5%      -0.1        2.61            +4.7        7.40 ±  6%      -0.2        2.51        perf-profile.children.cycles-pp.apparmor_file_permission
      3.81            -0.1        3.75            -0.4        3.45            +0.1        3.90        perf-profile.children.cycles-pp.__fsnotify_parent
      2.42            -0.0        2.38            -0.2        2.23            -0.2        2.25        perf-profile.children.cycles-pp.rcu_all_qs
      7.43            -0.0        7.39            -0.5        6.91            -0.5        6.92        perf-profile.children.cycles-pp.entry_SYSCALL_64
     98.97            -0.0       98.94            +0.1       99.06            +0.1       99.04        perf-profile.children.cycles-pp.write
      5.69            -0.0        5.66            -0.3        5.44            -0.1        5.62        perf-profile.children.cycles-pp.simple_write_end
      1.21            -0.0        1.18            -0.1        1.06            -0.2        1.04        perf-profile.children.cycles-pp.syscall_return_via_sysret
     75.19            -0.0       75.17            +1.8       76.99            +1.3       76.53        perf-profile.children.cycles-pp.vfs_write
      1.12            -0.0        1.11            -0.1        1.03            -0.1        1.04        perf-profile.children.cycles-pp.folio_wait_stable
      1.90            -0.0        1.88            -0.1        1.80            -0.1        1.84        perf-profile.children.cycles-pp.folio_unlock
      1.99            -0.0        1.98            -0.2        1.82            -0.1        1.86        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.76            -0.0        0.75            -0.1        0.70            -0.1        0.70        perf-profile.children.cycles-pp.x64_sys_call
      0.23            -0.0        0.23 ±  2%      -0.0        0.22            -0.0        0.22        perf-profile.children.cycles-pp.file_remove_privs
      2.47            -0.0        2.46            -0.1        2.37            -0.1        2.41        perf-profile.children.cycles-pp.xas_load
      0.64            -0.0        0.64            -0.1        0.58            -0.0        0.60        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.37            -0.0        0.37            -0.0        0.35            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
      0.36            -0.0        0.36            -0.0        0.33            -0.0        0.34        perf-profile.children.cycles-pp.amd_clear_divider
      1.26            -0.0        1.26            -0.1        1.16            -0.1        1.19        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.59            -0.0        0.59            -0.1        0.54            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
      1.74            -0.0        1.74            -0.1        1.62            -0.1        1.66        perf-profile.children.cycles-pp.up_write
      1.05            -0.0        1.05            -0.1        0.97            -0.0        1.01        perf-profile.children.cycles-pp.folio_mapping
      0.36            +0.0        0.36            -0.0        0.34            -0.0        0.34        perf-profile.children.cycles-pp.is_bad_inode
      0.33            +0.0        0.33            -0.0        0.30            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
      0.25            +0.0        0.25            -0.0        0.23            -0.0        0.23        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
      1.10            +0.0        1.10            -0.1        1.04            -0.1        1.04        perf-profile.children.cycles-pp.xattr_resolve_name
      0.85            +0.0        0.85            -0.0        0.80            -0.0        0.82        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
      0.55            +0.0        0.55            -0.0        0.52            -0.0        0.52        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.92            +0.0        0.94            -0.1        0.87            +0.0        0.95        perf-profile.children.cycles-pp.folio_mark_dirty
      0.97            +0.0        0.98            +0.0        0.99            -0.0        0.93        perf-profile.children.cycles-pp.aa_file_perm
     83.54            +0.0       83.55            +1.3       84.79            +1.1       84.63        perf-profile.children.cycles-pp.do_syscall_64
      7.14            +0.0        7.15            -0.4        6.72            -0.1        7.03        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      0.45 ±  2%      +0.0        0.47            -0.0        0.41            -0.0        0.41        perf-profile.children.cycles-pp.write@plt
     84.71            +0.0       84.72            +1.2       85.88            +1.1       85.76        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      4.41            +0.0        4.43            -0.3        4.11            -0.3        4.16        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.98            +0.0        1.00            -0.1        0.87            -0.1        0.89        perf-profile.children.cycles-pp.w_test
     78.77            +0.0       78.80            +1.6       80.39            +1.4       80.17        perf-profile.children.cycles-pp.ksys_write
      0.89 ±  3%      +0.0        0.93            +0.1        0.97 ±  3%      +0.3        1.16 ±  7%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
      1.37            +0.0        1.42            -0.1        1.24            -0.0        1.34        perf-profile.children.cycles-pp.strcmp
      4.46            +0.0        4.51            -0.3        4.15            -0.1        4.35        perf-profile.children.cycles-pp.security_inode_need_killpriv
      3.75            +0.0        3.80            -0.3        3.47            -0.1        3.63        perf-profile.children.cycles-pp.cap_inode_need_killpriv
      7.24            +0.1        7.30            -0.4        6.81            -0.1        7.11        perf-profile.children.cycles-pp.file_remove_privs_flags
      3.39            +0.1        3.45            -0.2        3.15            -0.1        3.29        perf-profile.children.cycles-pp.__vfs_getxattr
      3.38            +0.1        3.48            +0.1        3.46            +0.4        3.74 ±  2%  perf-profile.children.cycles-pp.inode_needs_update_time
      3.94            +0.1        4.05            +0.1        4.02            +0.4        4.32        perf-profile.children.cycles-pp.file_update_time
     38.19            +0.1       38.33            -1.7       36.51            +0.2       38.39        perf-profile.children.cycles-pp.generic_perform_write
      5.71            +0.2        5.87            -0.2        5.49            -0.2        5.54        perf-profile.children.cycles-pp.fault_in_readable
      4.50 ±  3%      +0.2        4.67            +5.4        9.86 ±  4%      +0.9        5.35        perf-profile.children.cycles-pp.rw_verify_area
      6.45            +0.2        6.64            -0.2        6.22            -0.2        6.28        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
     12.34            +0.2       12.53            -0.4       11.93            +0.2       12.58        perf-profile.children.cycles-pp.__generic_file_write_iter
      1.98 ±  3%      -0.2        1.79            -0.2        1.74 ±  4%      +0.1        2.04        perf-profile.self.cycles-pp.down_write
      3.66            -0.1        3.54            +0.2        3.86            +1.2        4.82 ±  2%  perf-profile.self.cycles-pp.filemap_get_entry
      3.64            -0.1        3.57            -0.4        3.28            +0.1        3.73        perf-profile.self.cycles-pp.__fsnotify_parent
      1.54 ±  9%      -0.1        1.47 ±  2%      +4.7        6.22 ±  8%      -0.1        1.44        perf-profile.self.cycles-pp.apparmor_file_permission
      1.71 ±  2%      -0.1        1.65            -0.0        1.69 ±  2%      +0.1        1.82        perf-profile.self.cycles-pp.generic_file_write_iter
      6.45            -0.1        6.39            -0.5        5.95            -0.5        6.00        perf-profile.self.cycles-pp.write
      7.25            -0.1        7.20            -0.6        6.64            -0.3        6.98        perf-profile.self.cycles-pp.vfs_write
      2.35            -0.0        2.31            -0.1        2.25            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
      2.76            -0.0        2.72            -0.1        2.65            -0.0        2.72        perf-profile.self.cycles-pp.simple_write_end
      0.80 ±  4%      -0.0        0.78 ±  2%      -0.1        0.76 ±  2%      +0.1        0.88        perf-profile.self.cycles-pp.generic_write_check_limits
      1.20            -0.0        1.18            -0.1        1.06            -0.2        1.03        perf-profile.self.cycles-pp.syscall_return_via_sysret
      1.41            -0.0        1.39            -0.1        1.30            -0.1        1.34        perf-profile.self.cycles-pp.rcu_all_qs
      1.76            -0.0        1.75            -0.1        1.68            -0.0        1.72        perf-profile.self.cycles-pp.folio_unlock
      1.90            -0.0        1.90            -0.1        1.76            -0.1        1.78        perf-profile.self.cycles-pp.do_syscall_64
      0.54            -0.0        0.53            -0.0        0.50            -0.0        0.50        perf-profile.self.cycles-pp.folio_wait_stable
      1.80            -0.0        1.80            -0.1        1.72            -0.0        1.75        perf-profile.self.cycles-pp.xas_load
      1.10            -0.0        1.09            -0.0        1.06            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
      1.48            -0.0        1.47            -0.1        1.36            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
      0.62            -0.0        0.62            -0.0        0.58            -0.0        0.58        perf-profile.self.cycles-pp.x64_sys_call
      0.35            -0.0        0.35            -0.0        0.32            -0.0        0.33        perf-profile.self.cycles-pp.cap_inode_need_killpriv
      0.24            -0.0        0.24            -0.0        0.23 ±  2%      -0.0        0.23        perf-profile.self.cycles-pp.is_bad_inode
      0.87            -0.0        0.87            -0.1        0.80            -0.0        0.83        perf-profile.self.cycles-pp.folio_mapping
      0.24            -0.0        0.24            -0.0        0.22            -0.0        0.22        perf-profile.self.cycles-pp.amd_clear_divider
      0.70            -0.0        0.70            -0.0        0.67            +0.0        0.72        perf-profile.self.cycles-pp.security_inode_need_killpriv
      0.52            -0.0        0.52            -0.0        0.47            -0.0        0.49        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      1.62            +0.0        1.62            -0.1        1.51            -0.1        1.55        perf-profile.self.cycles-pp.up_write
      0.22            +0.0        0.22            -0.0        0.21 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
      1.09            +0.0        1.10            -0.1        1.01            -0.1        1.02        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.35            +0.0        0.36            -0.0        0.32            -0.0        0.34        perf-profile.self.cycles-pp.inode_to_bdi
      1.25            +0.0        1.25            -0.1        1.16            -0.0        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.25            +0.0        0.25            -0.0        0.23            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
      0.79            +0.0        0.80            -0.1        0.74            -0.0        0.75        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      2.04            +0.0        2.04            -0.1        1.95            +0.0        2.04        perf-profile.self.cycles-pp.file_remove_privs_flags
      0.55            +0.0        0.55            -0.0        0.52            -0.0        0.52        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.61            +0.0        0.62            -0.0        0.58            -0.0        0.59        perf-profile.self.cycles-pp.xattr_resolve_name
      0.73            +0.0        0.73            -0.0        0.69            -0.0        0.70        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
      4.51            +0.0        4.52            -0.3        4.23            -0.2        4.27        perf-profile.self.cycles-pp.__filemap_get_folio
      0.47            +0.0        0.48            -0.0        0.45            +0.0        0.49        perf-profile.self.cycles-pp.folio_mark_dirty
      0.87            +0.0        0.88            +0.0        0.89            -0.0        0.83        perf-profile.self.cycles-pp.aa_file_perm
      6.97            +0.0        6.98            -0.4        6.56            -0.1        6.86        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      1.39            +0.0        1.40            -0.1        1.30            -0.0        1.34        perf-profile.self.cycles-pp.__vfs_getxattr
      1.65            +0.0        1.67            -0.1        1.56            -0.1        1.60        perf-profile.self.cycles-pp.entry_SYSCALL_64
      4.32            +0.0        4.34            -0.3        4.02            -0.3        4.06        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.90            +0.0        0.92            -0.1        0.80            -0.0        0.87        perf-profile.self.cycles-pp.simple_write_begin
      0.66            +0.0        0.68            -0.0        0.64            +0.0        0.67        perf-profile.self.cycles-pp.file_update_time
      1.15            +0.0        1.18            -0.0        1.10            -0.0        1.15        perf-profile.self.cycles-pp.__generic_file_write_iter
      0.78            +0.0        0.81            -0.1        0.69            -0.1        0.71        perf-profile.self.cycles-pp.w_test
      1.02 ±  2%      +0.0        1.06            -0.1        0.91            -0.0        1.00        perf-profile.self.cycles-pp.strcmp
      1.50            +0.0        1.53            +0.0        1.52            +0.1        1.58        perf-profile.self.cycles-pp.inode_needs_update_time
      0.78 ±  4%      +0.0        0.82            +0.1        0.86 ±  3%      +0.3        1.05 ±  8%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      3.35            +0.1        3.44            -0.2        3.18            -0.0        3.31        perf-profile.self.cycles-pp.generic_perform_write
      5.56            +0.2        5.72            -0.2        5.33            -0.2        5.38        perf-profile.self.cycles-pp.fault_in_readable
      0.85            +0.2        1.09            +0.7        1.50            +1.1        1.92        perf-profile.self.cycles-pp.rw_verify_area




* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-08  5:40                       ` Oliver Sang
@ 2024-07-08 16:37                         ` Amir Goldstein
  0 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-08 16:37 UTC (permalink / raw)
  To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Mon, Jul 8, 2024 at 8:40 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Fri, Jul 05, 2024 at 08:48:28AM +0300, Amir Goldstein wrote:
>
> [...]
>
> >
> > Thanks.
> > That clarifies that the cycles are spent in the "optimization code" itself.
> >
> > I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
> > at the base of the branch.
> >
> > Hopefully, with this fix, the compiler will be able to optimize smarter and
> > the generated fast path code will be less sensitive to code alignment ???
> >
> > If it works, it may eliminate some of the regressions throughout this branch and
> > may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
> >
> > * e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info
> > with pre-content events
> > * a28c32866bb3 - fanotify: rename a misnamed constant
> > * 61baabbdceaa - fanotify: pass optional file access range in pre-content event
> > * 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
> > * 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
> > * 38a903de931a - fsnotify: generate pre-content permission event on exec
> > * 70be29706389 - fsnotify: generate pre-content permission event on open
> > * 96768b7d6721 - fsnotify: introduce pre-content permission event
> > * 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in
> > permission hooks
> >
> > Fingers crossed...
>
> Unfortunately, no luck, it seems. I combined the results with those for
> 96768b7d6721 and its parent, since 96768b7d6721 introduces most of the regression.
>

Too bad.

I will need to have a think.

Thank you for testing!

Amir.

> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
>   96768b7d67219 fsnotify: introduce pre-content permission event
>   e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
>        v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>  1.218e+08            -0.3%  1.214e+08            -7.6%  1.125e+08            -6.4%   1.14e+08        unixbench.throughput
>
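As a reading aid for the throughput row above: the %change columns appear to be the relative difference of each commit's mean against the v6.10-rc1 base column. A minimal Python sketch, using only the rounded values printed in the table (the helper name pct_change is made up for illustration and is not part of the robot's tooling), reproduces the reported figures:

```python
# Rounded unixbench.throughput means copied from the table above.
BASE = 1.218e8  # v6.10-rc1 column

commits = {
    "28d5b4a88241d": 1.214e8,
    "96768b7d67219": 1.125e8,
    "e0aaae806edc3": 1.14e8,
}


def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


for commit, value in commits.items():
    print(f"{commit}: {pct_change(BASE, value):+.1f}%")
# 28d5b4a88241d: -0.3%
# 96768b7d67219: -7.6%
# e0aaae806edc3: -6.4%
```

Because the inputs are already rounded to a few significant digits, the same calculation on other rows may differ from the robot's figure in the last digit.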
> Details are below [2].
>
>
> >
> > Thanks,
> > Amir.
> >
> > [1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
>
> For this report, I also retested on the new branch. The regression seems to be
> reduced to around 10%, but we cannot get stable data on this new branch, so we
> cannot say whether it has really improved.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/full/stress-ng/60s
>
> commit:
>   v6.10-rc1
>   e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
>        v6.10-rc1 e0aaae806edc3411d84dc0d66fe
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>  1.161e+08 ±  5%      -9.5%   1.05e+08 ± 10%  stress-ng.full.ops
>    1934587 ±  5%      -9.5%    1750464 ± 10%  stress-ng.full.ops_per_sec
>
>
>
> [2]
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
>   96768b7d67219 fsnotify: introduce pre-content permission event
>   e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
>        v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>       6215            +0.1%       6223            -8.9%       5661            -7.9%       5724        time.user_time
>       0.58            -0.0        0.54 ± 18%      -0.1        0.52            -0.0        0.55 ± 11%  mpstat.cpu.all.irq%
>       0.01 ±  4%      -0.0        0.01 ± 13%      -0.0        0.00 ±  2%      -0.0        0.01 ±  6%  mpstat.cpu.all.soft%
>       7.59 ± 59%     -58.7%       3.14 ± 63%     -61.7%       2.90 ± 52%     -33.4%       5.06 ± 50%  sched_debug.cfs_rq:/.util_est.min
>       0.00 ± 72%    -148.0%      -0.00           -50.9%       0.00 ±244%     +12.3%       0.00 ±102%  sched_debug.cpu.nr_uninterruptible.avg
>  1.218e+08            -0.3%  1.214e+08            -7.6%  1.125e+08            -6.4%   1.14e+08        unixbench.throughput
>       6215            +0.1%       6223            -8.9%       5661            -7.9%       5724        unixbench.time.user_time
>  4.521e+10            -0.3%  4.506e+10            -7.7%  4.172e+10            -6.4%  4.231e+10        unixbench.workload
>  1.458e+11            -7.5%   1.35e+11 ± 18%      -6.8%  1.359e+11            -9.5%   1.32e+11 ± 11%  perf-stat.i.branch-instructions
>    3742171 ±  4%     -16.8%    3112873 ± 20%     +80.4%    6752235 ±  7%    +403.3%   18836125 ± 14%  perf-stat.i.cache-misses
>   32402657 ±  3%     -19.5%   26094697 ± 16%     +77.8%   57627688 ±  4%    +356.5%  1.479e+08 ± 11%  perf-stat.i.cache-references
>       0.95            +9.8%       1.04 ± 20%      +5.8%       1.00 ±  2%      +9.3%       1.03 ± 13%  perf-stat.i.cpi
>     161794 ±  8%      -2.2%     158309 ± 20%     -49.9%      81139 ± 15%     -77.3%      36784 ± 54%  perf-stat.i.cycles-between-cache-misses
>  6.963e+11            -7.4%  6.445e+11 ± 18%      -6.5%  6.513e+11            -8.8%  6.353e+11 ± 11%  perf-stat.i.instructions
>       1.22            -4.7%       1.16 ± 10%      -6.3%       1.14            -6.5%       1.14 ±  6%  perf-stat.i.ipc
>       0.01 ±  4%     -10.3%       0.00 ±  6%     +92.9%       0.01 ±  7%    +453.4%       0.03 ± 14%  perf-stat.overall.MPKI
>       0.75            +0.0%       0.75            +6.8%       0.80            +4.3%       0.78        perf-stat.overall.cpi
>     139258 ±  4%     +11.8%     155626 ±  6%     -44.5%      77325 ±  7%     -80.8%      26737 ± 14%  perf-stat.overall.cycles-between-cache-misses
>       1.33            -0.0%       1.33            -6.3%       1.25            -4.1%       1.28        perf-stat.overall.ipc
>       5722            +0.3%       5738            +1.6%       5811            +2.6%       5869        perf-stat.overall.path-length
>  1.452e+11            -7.4%  1.343e+11 ± 18%      -6.8%  1.352e+11            -9.5%  1.314e+11 ± 11%  perf-stat.ps.branch-instructions
>    3742620 ±  4%     -16.7%    3117570 ± 20%     +80.4%    6752430 ±  7%    +401.2%   18758290 ± 14%  perf-stat.ps.cache-misses
>   32374621 ±  3%     -19.4%   26088859 ± 16%     +77.6%   57486380 ±  4%    +354.8%  1.473e+08 ± 11%  perf-stat.ps.cache-references
>   6.93e+11            -7.4%  6.415e+11 ± 18%      -6.5%  6.481e+11            -8.8%  6.323e+11 ± 11%  perf-stat.ps.instructions
>  2.587e+14            -0.0%  2.586e+14            -6.3%  2.425e+14            -4.0%  2.484e+14        perf-stat.total.instructions
>       2.85            -0.2        2.62            -0.3        2.55 ±  3%      +0.0        2.86        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       5.99            -0.1        5.86            +0.1        6.09            +1.1        7.10        perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>      12.29            -0.1       12.16            -0.3       11.97            +0.8       13.04        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
>      13.39            -0.1       13.28            -0.4       12.96            +0.7       14.12        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>      13.18            -0.1       13.08            -0.9       12.24            -1.0       12.21        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
>       1.98 ±  5%      -0.1        1.88            -0.0        1.96 ±  3%      +0.2        2.20        perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       3.27 ±  4%      -0.1        3.20            +4.7        7.99 ±  6%      -0.1        3.21        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
>       2.42 ±  6%      -0.1        2.37            +4.7        7.16 ±  7%      -0.1        2.28        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
>       3.72            -0.1        3.67            -0.4        3.37            +0.1        3.81        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      58.06            -0.0       58.01            -2.5       55.53            +0.7       58.76        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      74.32            -0.0       74.28            +1.8       76.16            +1.4       75.67        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.90 ±  5%      -0.0        0.87            -0.0        0.87 ±  3%      +0.1        1.00        perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
>       0.70            -0.0        0.66            -0.0        0.68            -0.0        0.68        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
>       5.30            -0.0        5.27            -0.2        5.06            -0.1        5.25        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       1.11            -0.0        1.08            -0.1        0.97            -0.2        0.94        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
>       1.81            -0.0        1.80            -0.1        1.72            -0.1        1.76        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
>       1.62            -0.0        1.62            -0.1        1.49            -0.1        1.52        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.92            -0.0        0.91            -0.1        0.84            -0.1        0.86        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       2.18            -0.0        2.17            -0.1        2.09            -0.1        2.12        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
>       0.62            -0.0        0.62            -0.0        0.58            -0.0        0.60        perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       0.63            -0.0        0.63            -0.0        0.58            -0.0        0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.92            -0.0        0.92            -0.1        0.84            -0.0        0.87        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       1.68            +0.0        1.68            -0.1        1.56            -0.1        1.60        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       0.68            +0.0        0.69            -0.0        0.64            -0.0        0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>       0.72 ±  2%      +0.0        0.72            -0.0        0.69            -0.0        0.70        perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       0.85            +0.0        0.86            -0.0        0.81            -0.0        0.81        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>       0.74            +0.0        0.75            -0.0        0.70            +0.0        0.77        perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
>       0.86            +0.0        0.86            +0.0        0.88            -0.0        0.82        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
>      84.29            +0.0       84.30            +1.2       85.47            +1.1       85.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
>       7.00            +0.0        7.02            -0.4        6.59            -0.1        6.89        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>      82.86            +0.0       82.87            +1.3       84.15            +1.1       83.98        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       0.73            +0.0        0.74            -0.0        0.71            -0.0        0.70        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.64            +0.0        0.66 ±  2%      -0.1        0.57            -0.1        0.58        perf-profile.calltrace.cycles-pp.w_test
>      78.16            +0.0       78.18            +1.7       79.81            +1.4       79.57        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>      96.83            +0.0       96.86            +0.2       97.07            +0.3       97.12        perf-profile.calltrace.cycles-pp.write
>       0.78 ±  3%      +0.0        0.82 ±  2%      +0.1        0.88 ±  3%      +0.3        1.05 ±  8%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
>       1.12 ±  2%      +0.0        1.16            -0.1        1.02            -0.0        1.11        perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>       4.23            +0.0        4.27            -0.3        3.94            -0.1        4.13        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       2.70            +0.0        2.74            -0.2        2.50            -0.1        2.63        perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
>       3.52            +0.0        3.57            -0.3        3.26            -0.1        3.41        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
>       6.89            +0.1        6.94            -0.4        6.47            -0.1        6.77        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>       2.10 ±  5%      +0.1        2.15 ±  3%      -0.1        2.03 ±  4%      +0.2        2.25 ±  3%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>       2.96            +0.1        3.06            +0.1        3.08            +0.4        3.36 ±  2%  perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
>       3.61            +0.1        3.72            +0.1        3.71            +0.4        4.00 ±  2%  perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>      37.30            +0.1       37.42            -1.6       35.66            +0.2       37.50        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       5.32            +0.2        5.48            -0.1        5.18            -0.1        5.21        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
>       4.26 ±  3%      +0.2        4.42            +5.4        9.61 ±  5%      +0.7        4.98        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.20            +0.2        6.37            -0.2        5.98            -0.2        6.03        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>      12.00            +0.2       12.19            -0.4       11.61            +0.2       12.25        perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>       3.02            -0.2        2.79            -0.3        2.72 ±  3%      +0.0        3.04        perf-profile.children.cycles-pp.down_write
>       6.18            -0.1        6.05            +0.1        6.28            +1.1        7.28        perf-profile.children.cycles-pp.filemap_get_entry
>      12.68            -0.1       12.56            -0.3       12.34            +0.7       13.42        perf-profile.children.cycles-pp.__filemap_get_folio
>      13.58            -0.1       13.47            -0.5       13.13            +0.7       14.29        perf-profile.children.cycles-pp.simple_write_begin
>      58.65            -0.1       58.58            -2.5       56.11            +0.7       59.36        perf-profile.children.cycles-pp.generic_file_write_iter
>       3.64 ±  3%      -0.1        3.58            +4.7        8.36 ±  6%      -0.1        3.56        perf-profile.children.cycles-pp.security_file_permission
>       4.19            -0.1        4.14            -0.2        3.96            -0.2        3.98        perf-profile.children.cycles-pp.__cond_resched
>       2.67 ±  5%      -0.1        2.61            +4.7        7.40 ±  6%      -0.2        2.51        perf-profile.children.cycles-pp.apparmor_file_permission
>       3.81            -0.1        3.75            -0.4        3.45            +0.1        3.90        perf-profile.children.cycles-pp.__fsnotify_parent
>       2.42            -0.0        2.38            -0.2        2.23            -0.2        2.25        perf-profile.children.cycles-pp.rcu_all_qs
>       7.43            -0.0        7.39            -0.5        6.91            -0.5        6.92        perf-profile.children.cycles-pp.entry_SYSCALL_64
>      98.97            -0.0       98.94            +0.1       99.06            +0.1       99.04        perf-profile.children.cycles-pp.write
>       5.69            -0.0        5.66            -0.3        5.44            -0.1        5.62        perf-profile.children.cycles-pp.simple_write_end
>       1.21            -0.0        1.18            -0.1        1.06            -0.2        1.04        perf-profile.children.cycles-pp.syscall_return_via_sysret
>      75.19            -0.0       75.17            +1.8       76.99            +1.3       76.53        perf-profile.children.cycles-pp.vfs_write
>       1.12            -0.0        1.11            -0.1        1.03            -0.1        1.04        perf-profile.children.cycles-pp.folio_wait_stable
>       1.90            -0.0        1.88            -0.1        1.80            -0.1        1.84        perf-profile.children.cycles-pp.folio_unlock
>       1.99            -0.0        1.98            -0.2        1.82            -0.1        1.86        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.76            -0.0        0.75            -0.1        0.70            -0.1        0.70        perf-profile.children.cycles-pp.x64_sys_call
>       0.23            -0.0        0.23 ±  2%      -0.0        0.22            -0.0        0.22        perf-profile.children.cycles-pp.file_remove_privs
>       2.47            -0.0        2.46            -0.1        2.37            -0.1        2.41        perf-profile.children.cycles-pp.xas_load
>       0.64            -0.0        0.64            -0.1        0.58            -0.0        0.60        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.37            -0.0        0.37            -0.0        0.35            -0.0        0.35        perf-profile.children.cycles-pp.__x64_sys_write
>       0.36            -0.0        0.36            -0.0        0.33            -0.0        0.34        perf-profile.children.cycles-pp.amd_clear_divider
>       1.26            -0.0        1.26            -0.1        1.16            -0.1        1.19        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.59            -0.0        0.59            -0.1        0.54            -0.0        0.56        perf-profile.children.cycles-pp.inode_to_bdi
>       1.74            -0.0        1.74            -0.1        1.62            -0.1        1.66        perf-profile.children.cycles-pp.up_write
>       1.05            -0.0        1.05            -0.1        0.97            -0.0        1.01        perf-profile.children.cycles-pp.folio_mapping
>       0.36            +0.0        0.36            -0.0        0.34            -0.0        0.34        perf-profile.children.cycles-pp.is_bad_inode
>       0.33            +0.0        0.33            -0.0        0.30            -0.0        0.31        perf-profile.children.cycles-pp.noop_dirty_folio
>       0.25            +0.0        0.25            -0.0        0.23            -0.0        0.23        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
>       1.10            +0.0        1.10            -0.1        1.04            -0.1        1.04        perf-profile.children.cycles-pp.xattr_resolve_name
>       0.85            +0.0        0.85            -0.0        0.80            -0.0        0.82        perf-profile.children.cycles-pp.setattr_should_drop_suidgid
>       0.55            +0.0        0.55            -0.0        0.52            -0.0        0.52        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.92            +0.0        0.94            -0.1        0.87            +0.0        0.95        perf-profile.children.cycles-pp.folio_mark_dirty
>       0.97            +0.0        0.98            +0.0        0.99            -0.0        0.93        perf-profile.children.cycles-pp.aa_file_perm
>      83.54            +0.0       83.55            +1.3       84.79            +1.1       84.63        perf-profile.children.cycles-pp.do_syscall_64
>       7.14            +0.0        7.15            -0.4        6.72            -0.1        7.03        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
>       0.45 ±  2%      +0.0        0.47            -0.0        0.41            -0.0        0.41        perf-profile.children.cycles-pp.write@plt
>      84.71            +0.0       84.72            +1.2       85.88            +1.1       85.76        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       4.41            +0.0        4.43            -0.3        4.11            -0.3        4.16        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.98            +0.0        1.00            -0.1        0.87            -0.1        0.89        perf-profile.children.cycles-pp.w_test
>      78.77            +0.0       78.80            +1.6       80.39            +1.4       80.17        perf-profile.children.cycles-pp.ksys_write
>       0.89 ±  3%      +0.0        0.93            +0.1        0.97 ±  3%      +0.3        1.16 ±  7%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>       1.37            +0.0        1.42            -0.1        1.24            -0.0        1.34        perf-profile.children.cycles-pp.strcmp
>       4.46            +0.0        4.51            -0.3        4.15            -0.1        4.35        perf-profile.children.cycles-pp.security_inode_need_killpriv
>       3.75            +0.0        3.80            -0.3        3.47            -0.1        3.63        perf-profile.children.cycles-pp.cap_inode_need_killpriv
>       7.24            +0.1        7.30            -0.4        6.81            -0.1        7.11        perf-profile.children.cycles-pp.file_remove_privs_flags
>       3.39            +0.1        3.45            -0.2        3.15            -0.1        3.29        perf-profile.children.cycles-pp.__vfs_getxattr
>       3.38            +0.1        3.48            +0.1        3.46            +0.4        3.74 ±  2%  perf-profile.children.cycles-pp.inode_needs_update_time
>       3.94            +0.1        4.05            +0.1        4.02            +0.4        4.32        perf-profile.children.cycles-pp.file_update_time
>      38.19            +0.1       38.33            -1.7       36.51            +0.2       38.39        perf-profile.children.cycles-pp.generic_perform_write
>       5.71            +0.2        5.87            -0.2        5.49            -0.2        5.54        perf-profile.children.cycles-pp.fault_in_readable
>       4.50 ±  3%      +0.2        4.67            +5.4        9.86 ±  4%      +0.9        5.35        perf-profile.children.cycles-pp.rw_verify_area
>       6.45            +0.2        6.64            -0.2        6.22            -0.2        6.28        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
>      12.34            +0.2       12.53            -0.4       11.93            +0.2       12.58        perf-profile.children.cycles-pp.__generic_file_write_iter
>       1.98 ±  3%      -0.2        1.79            -0.2        1.74 ±  4%      +0.1        2.04        perf-profile.self.cycles-pp.down_write
>       3.66            -0.1        3.54            +0.2        3.86            +1.2        4.82 ±  2%  perf-profile.self.cycles-pp.filemap_get_entry
>       3.64            -0.1        3.57            -0.4        3.28            +0.1        3.73        perf-profile.self.cycles-pp.__fsnotify_parent
>       1.54 ±  9%      -0.1        1.47 ±  2%      +4.7        6.22 ±  8%      -0.1        1.44        perf-profile.self.cycles-pp.apparmor_file_permission
>       1.71 ±  2%      -0.1        1.65            -0.0        1.69 ±  2%      +0.1        1.82        perf-profile.self.cycles-pp.generic_file_write_iter
>       6.45            -0.1        6.39            -0.5        5.95            -0.5        6.00        perf-profile.self.cycles-pp.write
>       7.25            -0.1        7.20            -0.6        6.64            -0.3        6.98        perf-profile.self.cycles-pp.vfs_write
>       2.35            -0.0        2.31            -0.1        2.25            -0.1        2.22        perf-profile.self.cycles-pp.__cond_resched
>       2.76            -0.0        2.72            -0.1        2.65            -0.0        2.72        perf-profile.self.cycles-pp.simple_write_end
>       0.80 ±  4%      -0.0        0.78 ±  2%      -0.1        0.76 ±  2%      +0.1        0.88        perf-profile.self.cycles-pp.generic_write_check_limits
>       1.20            -0.0        1.18            -0.1        1.06            -0.2        1.03        perf-profile.self.cycles-pp.syscall_return_via_sysret
>       1.41            -0.0        1.39            -0.1        1.30            -0.1        1.34        perf-profile.self.cycles-pp.rcu_all_qs
>       1.76            -0.0        1.75            -0.1        1.68            -0.0        1.72        perf-profile.self.cycles-pp.folio_unlock
>       1.90            -0.0        1.90            -0.1        1.76            -0.1        1.78        perf-profile.self.cycles-pp.do_syscall_64
>       0.54            -0.0        0.53            -0.0        0.50            -0.0        0.50        perf-profile.self.cycles-pp.folio_wait_stable
>       1.80            -0.0        1.80            -0.1        1.72            -0.0        1.75        perf-profile.self.cycles-pp.xas_load
>       1.10            -0.0        1.09            -0.0        1.06            -0.1        1.02        perf-profile.self.cycles-pp.security_file_permission
>       1.48            -0.0        1.47            -0.1        1.36            -0.1        1.38        perf-profile.self.cycles-pp.ksys_write
>       0.62            -0.0        0.62            -0.0        0.58            -0.0        0.58        perf-profile.self.cycles-pp.x64_sys_call
>       0.35            -0.0        0.35            -0.0        0.32            -0.0        0.33        perf-profile.self.cycles-pp.cap_inode_need_killpriv
>       0.24            -0.0        0.24            -0.0        0.23 ±  2%      -0.0        0.23        perf-profile.self.cycles-pp.is_bad_inode
>       0.87            -0.0        0.87            -0.1        0.80            -0.0        0.83        perf-profile.self.cycles-pp.folio_mapping
>       0.24            -0.0        0.24            -0.0        0.22            -0.0        0.22        perf-profile.self.cycles-pp.amd_clear_divider
>       0.70            -0.0        0.70            -0.0        0.67            +0.0        0.72        perf-profile.self.cycles-pp.security_inode_need_killpriv
>       0.52            -0.0        0.52            -0.0        0.47            -0.0        0.49        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
>       1.62            +0.0        1.62            -0.1        1.51            -0.1        1.55        perf-profile.self.cycles-pp.up_write
>       0.22            +0.0        0.22            -0.0        0.21 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.noop_dirty_folio
>       1.09            +0.0        1.10            -0.1        1.01            -0.1        1.02        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>       0.35            +0.0        0.36            -0.0        0.32            -0.0        0.34        perf-profile.self.cycles-pp.inode_to_bdi
>       1.25            +0.0        1.25            -0.1        1.16            -0.0        1.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.25            +0.0        0.25            -0.0        0.23            -0.0        0.23        perf-profile.self.cycles-pp.__x64_sys_write
>       0.79            +0.0        0.80            -0.1        0.74            -0.0        0.75        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
>       2.04            +0.0        2.04            -0.1        1.95            +0.0        2.04        perf-profile.self.cycles-pp.file_remove_privs_flags
>       0.55            +0.0        0.55            -0.0        0.52            -0.0        0.52        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.61            +0.0        0.62            -0.0        0.58            -0.0        0.59        perf-profile.self.cycles-pp.xattr_resolve_name
>       0.73            +0.0        0.73            -0.0        0.69            -0.0        0.70        perf-profile.self.cycles-pp.setattr_should_drop_suidgid
>       4.51            +0.0        4.52            -0.3        4.23            -0.2        4.27        perf-profile.self.cycles-pp.__filemap_get_folio
>       0.47            +0.0        0.48            -0.0        0.45            +0.0        0.49        perf-profile.self.cycles-pp.folio_mark_dirty
>       0.87            +0.0        0.88            +0.0        0.89            -0.0        0.83        perf-profile.self.cycles-pp.aa_file_perm
>       6.97            +0.0        6.98            -0.4        6.56            -0.1        6.86        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
>       1.39            +0.0        1.40            -0.1        1.30            -0.0        1.34        perf-profile.self.cycles-pp.__vfs_getxattr
>       1.65            +0.0        1.67            -0.1        1.56            -0.1        1.60        perf-profile.self.cycles-pp.entry_SYSCALL_64
>       4.32            +0.0        4.34            -0.3        4.02            -0.3        4.06        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.90            +0.0        0.92            -0.1        0.80            -0.0        0.87        perf-profile.self.cycles-pp.simple_write_begin
>       0.66            +0.0        0.68            -0.0        0.64            +0.0        0.67        perf-profile.self.cycles-pp.file_update_time
>       1.15            +0.0        1.18            -0.0        1.10            -0.0        1.15        perf-profile.self.cycles-pp.__generic_file_write_iter
>       0.78            +0.0        0.81            -0.1        0.69            -0.1        0.71        perf-profile.self.cycles-pp.w_test
>       1.02 ±  2%      +0.0        1.06            -0.1        0.91            -0.0        1.00        perf-profile.self.cycles-pp.strcmp
>       1.50            +0.0        1.53            +0.0        1.52            +0.1        1.58        perf-profile.self.cycles-pp.inode_needs_update_time
>       0.78 ±  4%      +0.0        0.82            +0.1        0.86 ±  3%      +0.3        1.05 ±  8%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>       3.35            +0.1        3.44            -0.2        3.18            -0.0        3.31        perf-profile.self.cycles-pp.generic_perform_write
>       5.56            +0.2        5.72            -0.2        5.33            -0.2        5.38        perf-profile.self.cycles-pp.fault_in_readable
>       0.85            +0.2        1.09            +0.7        1.50            +1.1        1.92        perf-profile.self.cycles-pp.rw_verify_area
>
>


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-05  5:48                     ` Amir Goldstein
  2024-07-08  5:40                       ` Oliver Sang
@ 2024-07-25 13:41                       ` Jan Kara
  2024-07-25 14:04                         ` Amir Goldstein
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Kara @ 2024-07-25 13:41 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Oliver Sang, Jan Kara, oe-lkp, lkp

On Fri 05-07-24 08:48:28, Amir Goldstein wrote:
> On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> >
> > [...]
> >
> > > > > the data in our tests seem quite stable for a given commit, for example for v6.10-rc1:
> > > >   "unixbench.throughput": [
> > > >     121545292.8,
> > > >     121629889.4,
> > > >     121598992.0,
> > > >     121492095.5,
> > > >     121645038.1,
> > > >     121556286.9
> > > >   ],
> > > >
> > >
> > > Are all those runs from the same boot?
> >
> > no. we reboot machine before each run.
> >
> > >
> > > > for the branch tip a82fd282befc7:
> > > >   "unixbench.throughput": [
> > > >     116675606.7,
> > > >     116840611.2,
> > > >     116738966.0,
> > > >     116956953.1,
> > > >     116704901.9,
> > > >     116997628.3,
> > > >     117141733.7,
> > > >     116660495.4
> > > >   ],
> > > >
> > >
> > > > And these runs?
> >
> > same.
> >
> > >
> > > Otherwise, we might have a fluctuation that happens at boot time
> > > or at mount time or something.
> > >
> > > >
> > > > let me combine the results from this branch together:
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > >   v6.10-rc1
> > > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > >   a82fd282befc7 fanotify: report file range info with pre-content
> > > >
> > > >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > > >              \          |                \          |                \          |                \          |                \
> > > >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> > > >
> > > >
> > > > > one thing I want to mention is that "%change" is always relative to the first
> > > > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression compared to
> > > > > v6.10-rc1; 94167e071109d has a -4.3% regression, also compared to v6.10-rc1,
> > > > > and so on.
> > >
> > > Thanks for clarifying - I did not read it this way.
> > >
> > > >
> > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > > -2.4% regression compared to 94167e071109d.
> > > >
> > > > from above table, along the branch, the performance is kind of fluctuating,
> > > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > > >
> > >
> > > I can understand why 64108c0b47db91b would regress performance, but I
> > > cannot think
> > > of any possible explanation why a82fd282befc should improve performance,
> > > so I have to wonder if the regression to -6.6% is not a fluke of some
> > > specific boot/mount?
> > >
> > > I pushed a test branch to
> > > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > > with an extra patch that un-inlines some helpers to help bisect the
> > > perf report better.
> > > Maybe produce the report with this commit and it sheds some light.
> >
> > since
> >
> > * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > * f301cd18006c3 fanotify: rename a misnamed constant
> > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > * aca4084213276 fsnotify: generate pre-content permission event on open
> > * 93656e196b006 fsnotify: introduce pre-content permission event
> > * 1613e604df0cd (tag: v6.10-rc1,
> >
> > we run tests upon new commit. summary report is as below:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.10-rc1
> >   a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> >   388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> >
> >        v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> > ---------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \
> >  1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
> >:
> >
> > since Jan mentioned in a later mail that perf profiles are useful, I put details
> > as below
> 
> Thanks.
> That clarifies that the cycles are spent in the "optimization code" itself.

BTW, Amir how did you decide that the time is spent in the "optimization
code"? I've seen in the perf output there are more cache misses, smaller
IPC, but didn't see a particular place where this would be happening...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
  2024-07-25 13:41                       ` Jan Kara
@ 2024-07-25 14:04                         ` Amir Goldstein
  0 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-25 14:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: Oliver Sang, oe-lkp, lkp

On Thu, Jul 25, 2024 at 4:41 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 05-07-24 08:48:28, Amir Goldstein wrote:
> > On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> > >
> > > [...]
> > >
> > > > > > the data in our tests seem quite stable for a given commit, for example for v6.10-rc1:
> > > > >   "unixbench.throughput": [
> > > > >     121545292.8,
> > > > >     121629889.4,
> > > > >     121598992.0,
> > > > >     121492095.5,
> > > > >     121645038.1,
> > > > >     121556286.9
> > > > >   ],
> > > > >
> > > >
> > > > Are all those runs from the same boot?
> > >
> > > no. we reboot machine before each run.
> > >
> > > >
> > > > > for the branch tip a82fd282befc7:
> > > > >   "unixbench.throughput": [
> > > > >     116675606.7,
> > > > >     116840611.2,
> > > > >     116738966.0,
> > > > >     116956953.1,
> > > > >     116704901.9,
> > > > >     116997628.3,
> > > > >     117141733.7,
> > > > >     116660495.4
> > > > >   ],
> > > > >
> > > >
> > > > > And these runs?
> > >
> > > same.
> > >
> > > >
> > > > Otherwise, we might have a fluctuation that happens at boot time
> > > > or at mount time or something.
> > > >
> > > > >
> > > > > let me combine the results from this branch together:
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > >   v6.10-rc1
> > > > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > > >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > > >   a82fd282befc7 fanotify: report file range info with pre-content
> > > > >
> > > > >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > > > >              \          |                \          |                \          |                \          |                \
> > > > >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> > > > >
> > > > >
> > > > > > one thing I want to mention is that "%change" is always relative to the first
> > > > > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression compared to
> > > > > > v6.10-rc1; 94167e071109d has a -4.3% regression, also compared to v6.10-rc1,
> > > > > > and so on.
> > > >
> > > > Thanks for clarifying - I did not read it this way.
> > > >
> > > > >
> > > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > > > -2.4% regression compared to 94167e071109d.
> > > > >
> > > > > from above table, along the branch, the performance is kind of fluctuating,
> > > > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > > > >
> > > >
> > > > I can understand why 64108c0b47db91b would regress performance, but I
> > > > cannot think
> > > > of any possible explanation why a82fd282befc should improve performance,
> > > > so I have to wonder if the regression to -6.6% is not a fluke of some
> > > > specific boot/mount?
> > > >
> > > > I pushed a test branch to
> > > > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > > > with an extra patch that un-inlines some helpers to help bisect the
> > > > perf report better.
> > > > Maybe produce the report with this commit and it sheds some light.
> > >
> > > since
> > >
> > > * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > * 1613e604df0cd (tag: v6.10-rc1,
> > >
> > > we run tests upon new commit. summary report is as below:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.10-rc1
> > >   a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > >   388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > >
> > >        v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> > > ---------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \
> > >  1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
> > >:
> > >
> > > since Jan mentioned in a later mail that perf profiles are useful, I put details
> > > as below
> >
> > Thanks.
> > That clarifies that the cycles are spent in the "optimization code" itself.
>
> BTW, Amir how did you decide that the time is spent in the "optimization
> code"? I've seen in the perf output there are more cache misses, smaller
> IPC, but didn't see a particular place where this would be happening...
>

Oh no, I just meant that because there is so much inlined code
in the hooks, I couldn't say for sure whether the cycles are spent in the
optimization code that tries to avoid fsnotify_parent() or also
in the fsnotify_parent() inline wrapper, so I used the extern fsnotify_path()
jump point to break inlining.

Maybe this was an unneeded test with obvious outcome, but
I began to suspect that the fsnotify_sb_has_priority_watchers()
optimization may have a bug.

Thanks,
Amir.
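
[Editor's sketch, not part of the original mail.] The "%change" arithmetic
discussed in the quoted text can be checked by hand. The snippet below is a
minimal sketch in Python: it averages the per-run unixbench.throughput values
quoted in this thread for v6.10-rc1 and for the branch tip a82fd282befc7, then
computes the relative change. The helper name pct_change and the variable
names are hypothetical, chosen for illustration only; the adjacent-commit
comparison uses the rounded values from the summary table, so its result is
only approximate.

```python
from statistics import mean

# Per-run unixbench.throughput values as quoted in this thread
# (one run per reboot, according to the report).
BASE_RUNS = [  # v6.10-rc1
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
TIP_RUNS = [  # a82fd282befc7 (branch tip)
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]


def pct_change(base: float, new: float) -> float:
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / base * 100.0


# "%change" in the report is always relative to the first column.
tip_vs_base = pct_change(mean(BASE_RUNS), mean(TIP_RUNS))

# Comparing two adjacent commits instead needs their own values as the
# base; here the rounded table entries for 94167e071109d and
# 64108c0b47db9 are used, so this is an approximation.
adjacent = pct_change(1.163e8, 1.135e8)

print(f"tip vs v6.10-rc1:               {tip_vs_base:+.1f}%")
print(f"64108c0b47db9 vs 94167e071109d: {adjacent:+.1f}%")
```

Run as written, this should reproduce the -3.9% shown for the tip in the
table and the roughly -2.4% commit-to-commit drop mentioned in the quoted text.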


end of thread, other threads:[~2024-07-25 14:04 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-29  8:25 [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression kernel test robot
2024-05-29 11:17 ` Amir Goldstein
2024-05-31  3:15   ` Oliver Sang
2024-05-31  5:18     ` Amir Goldstein
2024-06-03  8:13       ` Oliver Sang
2024-06-04 12:33         ` Amir Goldstein
2024-07-01  7:42           ` Oliver Sang
2024-07-03  5:58             ` Amir Goldstein
2024-07-03  7:21               ` Oliver Sang
2024-07-03 16:20                 ` Amir Goldstein
2024-07-04 15:39                   ` Jan Kara
2024-07-05  2:09                   ` Oliver Sang
2024-07-05  5:48                     ` Amir Goldstein
2024-07-08  5:40                       ` Oliver Sang
2024-07-08 16:37                         ` Amir Goldstein
2024-07-25 13:41                       ` Jan Kara
2024-07-25 14:04                         ` Amir Goldstein
