* [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
@ 2024-05-29 8:25 kernel test robot
2024-05-29 11:17 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2024-05-29 8:25 UTC (permalink / raw)
To: Amir Goldstein; +Cc: oe-lkp, lkp, oliver.sang
Hello,
kernel test robot noticed a -7.9% regression of unixbench.throughput on:
commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
https://github.com/amir73il/linux sb_write_barrier
testcase: unixbench
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
runtime: 300s
nr_task: 100%
test: fsbuffer-w
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.23e+08 -7.9% 1.133e+08 unixbench.throughput
6169 -7.7% 5694 unixbench.time.user_time
4.566e+10 -7.9% 4.206e+10 unixbench.workload
1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions
6891152 +4.8% 7221484 perf-stat.i.branch-misses
29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references
0.91 +2.0% 0.93 perf-stat.i.cpi
7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions
1.26 -2.6% 1.23 perf-stat.i.ipc
0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate%
0.73 +2.7% 0.75 perf-stat.overall.cpi
1.37 -2.6% 1.34 perf-stat.overall.ipc
5828 +5.7% 6162 perf-stat.overall.path-length
1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions
6873687 +4.8% 7203107 perf-stat.ps.branch-misses
29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references
7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions
2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions
57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
2.65 -0.2 2.49 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test
0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
0.74 -0.0 0.69 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
5.17 +0.2 5.42 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write
2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter
38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write
13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin
13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio
7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64
7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry
5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end
4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission
2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission
4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
4.20 -0.3 3.92 perf-profile.children.cycles-pp.__cond_resched
6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags
2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs
3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load
2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos
1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock
3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write
4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv
1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write
3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv
1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test
3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr
1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm
1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend
0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name
1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping
0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start
0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty
0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt
0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider
0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs
1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp
1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate
99.01 +0.1 99.09 perf-profile.children.cycles-pp.write
0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits
0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter
2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks
5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable
6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time
3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time
85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64
79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write
75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write
3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent
5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area
6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
6.52 -0.5 6.01 perf-profile.self.cycles-pp.write
6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write
3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry
4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio
4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end
1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission
2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos
1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock
2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write
2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write
1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs
1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm
1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load
0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test
1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags
1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr
1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend
0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call
0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start
0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name
0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty
0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv
0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi
0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable
0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv
0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.amd_clear_divider
0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt
0.24 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.is_bad_inode
0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression 2024-05-29 8:25 [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression kernel test robot @ 2024-05-29 11:17 ` Amir Goldstein 2024-05-31 3:15 ` Oliver Sang 0 siblings, 1 reply; 17+ messages in thread From: Amir Goldstein @ 2024-05-29 11:17 UTC (permalink / raw) To: Jan Kara, oe-lkp; +Cc: lkp, kernel test robot On Wed, May 29, 2024 at 11:26 AM kernel test robot <oliver.sang@intel.com> wrote: > > > > Hello, > > kernel test robot noticed a -7.9% regression of unixbench.throughput on: > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event") > https://github.com/amir73il/linux sb_write_barrier > Jan, I speculate that the regression is due to the fact that we store and pass the path information on struct file_range on the stack before the optimizations in fsnotify_parent(), so rw_verify_area() pays some price for the stores and __fsnotify_parent() pays a bigger price for fetches? Luckily, we already have the way to check fsnotify_sb_has_priority_watchers(inode->i_sb, FSNOTIFY_PRIO_PRE_CONTENT)) so now I used it to optimize out the fsnotify_file_range() inline code entirely. 
Oliver, Can you please re-test with fixed branch (also rebased on v6.10-rc1): * a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events * f301cd18006c - fanotify: rename a misnamed constant * 64108c0b47db - fanotify: pass optional file access range in pre-content event * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event * 83af0c89527a - fsnotify: generate pre-content permission event on exec * aca408421327 - fsnotify: generate pre-content permission event on open * 93656e196b00 - fsnotify: introduce pre-content permission event The optimization was done in the first commit (fsnotify: introduce pre-content permission event), but impacts the regressing commit (fanotify: pass optional file access range in pre-content event). no need to test all middle commits. Thanks, Amir. > testcase: unixbench > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory > parameters: > > runtime: 300s > nr_task: 100% > test: fsbuffer-w > cpufreq_governor: performance > > > > > If you fix the issue in a separate patch/commit (i.e. 
not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <oliver.sang@intel.com> > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com > > > Details are as below: > --------------------------------------------------------------------------------------------------> > > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com > > ========================================================================================= > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > commit: > 00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event") > 9d1fd61f1d ("fanotify: pass optional file access range in pre-content event") > > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 1.23e+08 -7.9% 1.133e+08 unixbench.throughput > 6169 -7.7% 5694 unixbench.time.user_time > 4.566e+10 -7.9% 4.206e+10 unixbench.workload > 1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions > 6891152 +4.8% 7221484 perf-stat.i.branch-misses > 29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references > 0.91 +2.0% 0.93 perf-stat.i.cpi > 7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions > 1.26 -2.6% 1.23 perf-stat.i.ipc > 0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate% > 0.73 +2.7% 0.75 perf-stat.overall.cpi > 1.37 -2.6% 1.34 perf-stat.overall.ipc > 5828 +5.7% 6162 perf-stat.overall.path-length > 1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions > 6873687 +4.8% 7203107 perf-stat.ps.branch-misses > 29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references > 7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions > 2.662e+14 -2.6% 2.592e+14 
perf-stat.total.instructions > 57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write > 13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write > 7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > 2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write > 6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > 2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > 2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 2.65 -0.2 2.49 
perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write > 3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > 1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter > 3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin > 0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test > 0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write > 2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter > 0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin > 0.74 -0.0 0.69 
perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > 0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > 0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > 0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > 0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write > 0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > 11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 5.17 +0.2 5.42 
perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write > 96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write > 2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write > 3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > 84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > 83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > 5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter > 38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write > 13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin > 13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio > 7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64 > 7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic > 6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry > 5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end > 4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission > 2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission > 4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > 4.20 -0.3 3.92 
perf-profile.children.cycles-pp.__cond_resched > 6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags > 2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs > 3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load > 2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos > 1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock > 3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write > 4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv > 1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write > 3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv > 1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode > 1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags > 0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test > 3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr > 1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm > 1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend > 0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call > 0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid > 1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name > 1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable > 1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping > 0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start > 0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty > 0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack > 0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi > 0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt > 0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider > 0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write > 0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio > 0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode > 0.24 -0.0 0.23 ± 2% 
perf-profile.children.cycles-pp.file_remove_privs > 1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp > 1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate > 99.01 +0.1 99.09 perf-profile.children.cycles-pp.write > 0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits > 0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 > 11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter > 2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks > 5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable > 6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable > 2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time > 3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time > 85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > 84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64 > 79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write > 75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write > 3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent > 5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area > 6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic > 6.52 -0.5 6.01 perf-profile.self.cycles-pp.write > 6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write > 3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry > 4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio > 4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > 2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end > 1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission > 2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos > 1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock > 2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write > 2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched > 1.89 -0.1 
1.77 perf-profile.self.cycles-pp.do_syscall_64 > 1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64 > 1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write > 1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe > 1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs > 1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission > 1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write > 0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm > 1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load > 0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test > 1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode > 1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags > 1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr > 1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend > 0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags > 0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call > 0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid > 0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start > 0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping > 0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack > 0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name > 0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty > 0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv > 0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi > 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable > 0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv > 0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin > 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write > 0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.amd_clear_divider > 0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio > 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt > 0.24 -0.0 0.23 ± 2% 
perf-profile.self.cycles-pp.is_bad_inode
> 0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
> 0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
> 0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> 0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
> 1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
> 0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
> 0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
> 1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
> 1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
> 3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-05-29 11:17 ` Amir Goldstein
@ 2024-05-31 3:15 ` Oliver Sang
2024-05-31 5:18 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-05-31 3:15 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> On Wed, May 29, 2024 at 11:26 AM kernel test robot
> <oliver.sang@intel.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> >
> >
> > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > https://github.com/amir73il/linux sb_write_barrier
> >
>
> Jan,
>
> I speculate that the regression is due to the fact that we store and pass the
> path information on struct file_range on the stack before the optimizations
> in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> and __fsnotify_parent() pays a bigger price for fetches?
>
> Luckily, we already have the way to check
> fsnotify_sb_has_priority_watchers(inode->i_sb,
> FSNOTIFY_PRIO_PRE_CONTENT))
> so now I used it to optimize out the fsnotify_file_range() inline
> code entirely.
>
> Oliver,
>
> Can you please re-test with fixed branch (also rebased on v6.10-rc1):
>
> * a82fd282befc - (fan_pre_content) fanotify: report file range info
> with pre-content events
> * f301cd18006c - fanotify: rename a misnamed constant
> * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> * aca408421327 - fsnotify: generate pre-content permission event on open
> * 93656e196b00 - fsnotify: introduce pre-content permission event
>
> The optimization was done in the first commit (fsnotify: introduce
> pre-content permission event),
> but impacts the regressing commit (fanotify: pass optional file access
> range in pre-content event).
>

no need to test all middle commits. I directly compare the tip with v6.10-rc1,
still a regression but better now

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
a82fd282befc7 ("fanotify: report file range info with pre-content events")
v6.10-rc1 a82fd282befc71d99106bf31066
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.216e+08 -3.9% 1.168e+08 unixbench.throughput

full data is as below [1]

then I checked new
"64108c0b47db - fanotify: pass optional file access range in pre-content event"
it also has a small regression comparing to its parent, but better also.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.163e+08 -2.4% 1.135e+08 unixbench.throughput

full data is as below [2]


[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
a82fd282befc7 ("fanotify: report file range info with pre-content events")
v6.10-rc1 a82fd282befc71d99106bf31066
---------------- ---------------------------
%stddev %change %stddev
\ | \
1614 ± 6% +252.4% 5688 ± 67% numa-vmstat.node1.nr_mapped
6199 -5.8% 5841 time.user_time
220234 ± 13% +121.4% 487546 ± 41% numa-meminfo.node0.AnonPages.max
836146 ± 6% -36.0% 535267 ± 45% numa-meminfo.node1.AnonPages.max
6233 ± 7% +251.3% 21898 ± 69% numa-meminfo.node1.Mapped
1.216e+08 -3.9% 1.168e+08 unixbench.throughput
6199 -5.8% 5841 unixbench.time.user_time
4.513e+10 -3.9% 4.338e+10 unixbench.workload
1.458e+11 -2.7% 1.419e+11 perf-stat.i.branch-instructions
11.47 ± 6% +2.6 14.10 ± 9% perf-stat.i.cache-miss-rate%
3915539 ± 8% +510.0% 23884093 ± 9% perf-stat.i.cache-misses
32425619 ± 3% +396.4% 1.61e+08 ± 4% perf-stat.i.cache-references
151202 ± 16% -78.6% 32364 ± 56% perf-stat.i.cycles-between-cache-misses
6.961e+11 -1.9% 6.828e+11 perf-stat.i.instructions
1.22 -1.3% 1.20 perf-stat.i.ipc
0.01 ± 9% +519.5% 0.04 ± 10% perf-stat.overall.MPKI
0.01 +0.0 0.01
perf-stat.overall.branch-miss-rate% 12.09 ± 6% +2.8 14.86 ± 8% perf-stat.overall.cache-miss-rate% 0.75 +2.0% 0.77 perf-stat.overall.cpi 133775 ± 8% -83.5% 22060 ± 9% perf-stat.overall.cycles-between-cache-misses 1.33 -1.9% 1.31 perf-stat.overall.ipc 5721 +2.0% 5836 perf-stat.overall.path-length 1.452e+11 -2.7% 1.413e+11 perf-stat.ps.branch-instructions 3921138 ± 8% +507.4% 23818053 ± 9% perf-stat.ps.cache-misses 32415461 ± 3% +394.4% 1.603e+08 ± 4% perf-stat.ps.cache-references 6.932e+11 -1.9% 6.797e+11 perf-stat.ps.instructions 2.582e+14 -1.9% 2.532e+14 perf-stat.total.instructions 13.19 -0.7 12.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write 7.01 -0.2 6.80 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 1.11 -0.2 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write 2.50 -0.1 2.35 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write 1.68 -0.1 1.59 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 3.73 -0.1 3.64 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.62 -0.1 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 2.18 -0.1 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write 0.65 -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.w_test 0.92 -0.0 0.87 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 0.70 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write 0.86 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write 0.92 
-0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 0.63 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.86 -0.0 0.83 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags 3.53 -0.0 3.50 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 0.53 -0.0 0.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write 0.72 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.75 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write 1.13 +0.0 1.17 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags 5.30 +0.1 5.36 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 5.30 +0.1 5.38 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write 6.17 +0.1 6.27 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 96.84 +0.1 96.98 perf-profile.calltrace.cycles-pp.write 0.78 ± 2% +0.3 1.13 ± 5% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter 2.97 +0.6 3.57 
perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write 12.01 +0.6 12.62 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 3.63 +0.6 4.24 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 4.32 +0.6 4.96 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 37.28 +0.8 38.12 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 84.26 +1.0 85.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 13.39 +1.0 14.36 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 12.30 +1.0 13.30 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write 82.83 +1.0 83.86 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 57.94 +1.3 59.20 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.99 +1.3 7.25 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 78.13 +1.3 79.41 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 74.26 +1.3 75.59 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 7.43 -0.4 7.06 perf-profile.children.cycles-pp.entry_SYSCALL_64 4.42 -0.2 4.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 1.21 -0.2 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret 7.14 -0.2 6.94 perf-profile.children.cycles-pp.copy_page_from_iter_atomic 4.18 -0.2 4.00 
perf-profile.children.cycles-pp.__cond_resched 2.74 -0.2 2.58 perf-profile.children.cycles-pp.apparmor_file_permission 2.42 -0.1 2.30 perf-profile.children.cycles-pp.rcu_all_qs 3.82 -0.1 3.72 perf-profile.children.cycles-pp.__fsnotify_parent 1.74 -0.1 1.65 perf-profile.children.cycles-pp.up_write 1.99 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode 0.99 -0.1 0.91 perf-profile.children.cycles-pp.w_test 3.71 -0.1 3.64 perf-profile.children.cycles-pp.security_file_permission 2.47 -0.1 2.41 perf-profile.children.cycles-pp.xas_load 1.12 -0.1 1.06 perf-profile.children.cycles-pp.folio_wait_stable 1.26 -0.1 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags 0.75 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call 0.98 -0.0 0.94 perf-profile.children.cycles-pp.aa_file_perm 0.46 -0.0 0.42 perf-profile.children.cycles-pp.write@plt 1.10 -0.0 1.07 perf-profile.children.cycles-pp.xattr_resolve_name 0.36 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider 3.76 -0.0 3.73 perf-profile.children.cycles-pp.cap_inode_need_killpriv 0.59 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi 3.41 -0.0 3.38 perf-profile.children.cycles-pp.__vfs_getxattr 0.56 -0.0 0.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 1.05 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping 0.38 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write 0.25 -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited 0.36 -0.0 0.35 perf-profile.children.cycles-pp.is_bad_inode 1.38 +0.0 1.40 perf-profile.children.cycles-pp.strcmp 0.93 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty 1.07 +0.0 1.09 perf-profile.children.cycles-pp.timestamp_truncate 5.70 +0.0 5.75 perf-profile.children.cycles-pp.simple_write_end 98.96 +0.1 99.02 perf-profile.children.cycles-pp.write 5.69 +0.1 5.75 perf-profile.children.cycles-pp.fault_in_readable 6.42 +0.1 6.53 perf-profile.children.cycles-pp.fault_in_iov_iter_readable 0.89 +0.3 1.24 
± 4% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 3.39 +0.6 3.97 perf-profile.children.cycles-pp.inode_needs_update_time 12.35 +0.6 12.96 perf-profile.children.cycles-pp.__generic_file_write_iter 3.96 +0.6 4.57 perf-profile.children.cycles-pp.file_update_time 4.56 +0.8 5.33 perf-profile.children.cycles-pp.rw_verify_area 38.16 +0.8 39.01 perf-profile.children.cycles-pp.generic_perform_write 13.58 +1.0 14.54 perf-profile.children.cycles-pp.simple_write_begin 84.67 +1.0 85.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 12.68 +1.0 13.68 perf-profile.children.cycles-pp.__filemap_get_folio 83.50 +1.0 84.51 perf-profile.children.cycles-pp.do_syscall_64 58.52 +1.3 59.78 perf-profile.children.cycles-pp.generic_file_write_iter 6.18 +1.3 7.44 ± 3% perf-profile.children.cycles-pp.filemap_get_entry 78.74 +1.3 80.00 perf-profile.children.cycles-pp.ksys_write 75.13 +1.3 76.42 perf-profile.children.cycles-pp.vfs_write 7.25 -0.6 6.64 perf-profile.self.cycles-pp.vfs_write 6.45 -0.4 6.08 perf-profile.self.cycles-pp.write 4.32 -0.2 4.08 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 1.21 -0.2 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret 6.98 -0.2 6.78 perf-profile.self.cycles-pp.copy_page_from_iter_atomic 4.52 -0.2 4.36 perf-profile.self.cycles-pp.__filemap_get_folio 2.34 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched 1.90 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64 1.60 -0.1 1.50 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission 1.47 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write 1.62 -0.1 1.53 perf-profile.self.cycles-pp.up_write 3.65 -0.1 3.56 perf-profile.self.cycles-pp.__fsnotify_parent 1.09 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode 1.80 -0.1 1.74 perf-profile.self.cycles-pp.xas_load 0.79 -0.1 0.73 perf-profile.self.cycles-pp.w_test 1.10 -0.1 1.04 perf-profile.self.cycles-pp.security_file_permission 1.25 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 1.66 
-0.1 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.41 -0.0 1.36 perf-profile.self.cycles-pp.rcu_all_qs
0.90 -0.0 0.86 perf-profile.self.cycles-pp.simple_write_begin
0.88 -0.0 0.84 perf-profile.self.cycles-pp.aa_file_perm
0.80 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.62 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call
1.39 -0.0 1.36 perf-profile.self.cycles-pp.__vfs_getxattr
0.53 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable
0.87 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping
0.56 -0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.24 -0.0 0.22 perf-profile.self.cycles-pp.amd_clear_divider
0.12 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.write@plt
0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
0.35 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi
0.22 -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
0.66 +0.0 0.69 perf-profile.self.cycles-pp.file_update_time
1.03 +0.0 1.06 perf-profile.self.cycles-pp.strcmp
2.75 +0.0 2.79 perf-profile.self.cycles-pp.simple_write_end
0.72 +0.0 0.77 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
0.87 +0.1 0.92 perf-profile.self.cycles-pp.timestamp_truncate
5.54 +0.1 5.59 perf-profile.self.cycles-pp.fault_in_readable
2.04 +0.1 2.09 perf-profile.self.cycles-pp.file_remove_privs_flags
1.51 +0.2 1.69 perf-profile.self.cycles-pp.inode_needs_update_time
0.78 ± 2% +0.3 1.11 ± 5% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.84 +1.0 1.82 perf-profile.self.cycles-pp.rw_verify_area
3.66 +1.3 4.97 ± 4% perf-profile.self.cycles-pp.filemap_get_entry


[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
%stddev %change %stddev
\ | \
38903 ±113% +313.8% 160973 ± 66% numa-meminfo.node1.AnonHugePages
1666466 ± 4% -12.2% 1462703 ± 9% numa-numastat.node1.local_node
18.97 ±113% +314.3% 78.59 ± 66% numa-vmstat.node1.nr_anon_transparent_hugepages
6003 -5.6% 5668 time.user_time
1.163e+08 -2.4% 1.135e+08 unixbench.throughput
6003 -5.6% 5668 unixbench.time.user_time
4.314e+10 -2.3% 4.215e+10 unixbench.workload
-12.17 +33.7% -16.26 sched_debug.cpu.nr_uninterruptible.min
0.00 ± 95% +600.3% 0.00 ± 88% sched_debug.rt_rq:.rt_time.avg
0.02 ± 95% +600.3% 0.14 ± 88% sched_debug.rt_rq:.rt_time.max
0.00 ± 95% +600.3% 0.01 ± 88% sched_debug.rt_rq:.rt_time.stddev
1.407e+11 -2.0% 1.379e+11 perf-stat.i.branch-instructions
0.55 -0.0 0.51 ± 4% perf-stat.i.branch-miss-rate%
55780077 -85.5% 8078438 perf-stat.i.branch-misses
5029827 ± 6% +315.5% 20897838 ± 10% perf-stat.i.cache-misses
35311245 ± 2% +328.2% 1.512e+08 ± 6% perf-stat.i.cache-references
118639 ± 18% -61.7% 45421 ± 41% perf-stat.i.cycles-between-cache-misses
6.736e+11 -1.5% 6.634e+11 perf-stat.i.instructions
0.01 ± 6% +321.2% 0.03 ± 10% perf-stat.overall.MPKI
0.04 -0.0 0.01 perf-stat.overall.branch-miss-rate%
0.78 +1.5% 0.79 perf-stat.overall.cpi
103942 ± 6% -75.7% 25208 ± 10% perf-stat.overall.cycles-between-cache-misses
1.29 -1.5% 1.27 perf-stat.overall.ipc
1.4e+11 -1.9% 1.373e+11 perf-stat.ps.branch-instructions
55517704 -85.5% 8057745 perf-stat.ps.branch-misses
5026889 ± 6% +315.3% 20876882 ± 10% perf-stat.ps.cache-misses
35229110 ± 2% +327.6% 1.506e+08 ± 6% perf-stat.ps.cache-references
6.701e+11 -1.4% 6.608e+11 perf-stat.ps.instructions
2.496e+14 -1.4% 2.46e+14 perf-stat.total.instructions
3.61 -0.5 3.09 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
2.66 -0.5 2.18
perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write 12.62 -0.5 12.16 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write 7.29 -0.3 7.03 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 4.98 -0.2 4.74 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 6.96 -0.2 6.74 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 4.50 -0.2 4.33 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write 3.74 -0.2 3.58 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter 12.82 -0.2 12.66 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 1.04 -0.1 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write 2.87 -0.1 2.78 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter 5.27 -0.1 5.18 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 0.81 -0.1 0.75 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write 2.17 -0.0 2.13 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write 1.66 -0.0 1.62 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 0.74 -0.0 0.70 
perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write 1.27 -0.0 1.23 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags 0.84 -0.0 0.81 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write 0.61 -0.0 0.57 perf-profile.calltrace.cycles-pp.w_test 0.88 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 0.90 -0.0 0.87 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 0.61 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 0.71 -0.0 0.69 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.00 ± 4% +0.1 1.10 ± 3% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter 2.68 +0.1 2.79 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 5.18 +0.1 5.31 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write 6.00 +0.1 6.14 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 3.35 +0.1 3.48 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write 4.00 +0.1 4.14 
perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 96.95 +0.2 97.12 perf-profile.calltrace.cycles-pp.write 3.68 +0.2 3.86 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 84.87 +0.6 85.45 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 83.48 +0.6 84.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 78.96 +0.7 79.70 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 75.00 +0.8 75.81 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 58.04 +1.1 59.14 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 36.40 +1.2 37.55 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 12.88 +1.3 14.20 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 11.79 +1.3 13.13 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write 5.74 +1.4 7.16 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 3.98 -0.5 3.44 perf-profile.children.cycles-pp.security_file_permission 2.90 -0.5 2.40 perf-profile.children.cycles-pp.apparmor_file_permission 7.66 -0.3 7.38 perf-profile.children.cycles-pp.file_remove_privs_flags 7.14 -0.3 6.89 perf-profile.children.cycles-pp.entry_SYSCALL_64 5.34 -0.2 5.10 perf-profile.children.cycles-pp.rw_verify_area 7.10 -0.2 6.88 perf-profile.children.cycles-pp.copy_page_from_iter_atomic 3.98 -0.2 3.80 perf-profile.children.cycles-pp.cap_inode_need_killpriv 4.73 -0.2 4.56 
perf-profile.children.cycles-pp.security_inode_need_killpriv 13.16 -0.2 13.00 perf-profile.children.cycles-pp.__generic_file_write_iter 1.14 -0.1 1.01 perf-profile.children.cycles-pp.syscall_return_via_sysret 5.67 -0.1 5.56 perf-profile.children.cycles-pp.simple_write_end 3.56 -0.1 3.46 perf-profile.children.cycles-pp.__vfs_getxattr 4.04 -0.1 3.95 perf-profile.children.cycles-pp.__cond_resched 4.21 -0.1 4.14 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 2.31 -0.1 2.24 perf-profile.children.cycles-pp.rcu_all_qs 0.99 -0.1 0.93 perf-profile.children.cycles-pp.folio_mark_dirty 2.46 -0.0 2.42 perf-profile.children.cycles-pp.xas_load 0.93 -0.0 0.88 ± 2% perf-profile.children.cycles-pp.w_test 1.50 -0.0 1.46 perf-profile.children.cycles-pp.strcmp 1.72 -0.0 1.68 perf-profile.children.cycles-pp.up_write 0.87 -0.0 0.82 perf-profile.children.cycles-pp.setattr_should_drop_suidgid 1.04 -0.0 1.00 perf-profile.children.cycles-pp.folio_mapping 0.96 -0.0 0.92 perf-profile.children.cycles-pp.aa_file_perm 1.23 -0.0 1.20 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags 0.73 -0.0 0.70 perf-profile.children.cycles-pp.x64_sys_call 1.07 -0.0 1.05 perf-profile.children.cycles-pp.folio_wait_stable 0.43 -0.0 0.41 ± 2% perf-profile.children.cycles-pp.write@plt 1.08 -0.0 1.06 perf-profile.children.cycles-pp.xattr_resolve_name 0.35 -0.0 0.34 perf-profile.children.cycles-pp.__x64_sys_write 99.01 +0.0 99.04 perf-profile.children.cycles-pp.write 1.12 ± 4% +0.1 1.20 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 2.86 +0.1 2.97 perf-profile.children.cycles-pp.down_write 5.50 +0.1 5.63 perf-profile.children.cycles-pp.fault_in_readable 3.75 +0.1 3.88 perf-profile.children.cycles-pp.inode_needs_update_time 4.34 +0.1 4.47 perf-profile.children.cycles-pp.file_update_time 6.25 +0.1 6.39 perf-profile.children.cycles-pp.fault_in_iov_iter_readable 3.77 +0.2 3.96 perf-profile.children.cycles-pp.__fsnotify_parent 85.29 +0.6 85.86 
perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 84.14 +0.6 84.74 perf-profile.children.cycles-pp.do_syscall_64 79.57 +0.7 80.28 perf-profile.children.cycles-pp.ksys_write 75.84 +0.8 76.64 perf-profile.children.cycles-pp.vfs_write 58.64 +1.1 59.72 perf-profile.children.cycles-pp.generic_file_write_iter 37.30 +1.1 38.43 perf-profile.children.cycles-pp.generic_perform_write 13.05 +1.3 14.38 perf-profile.children.cycles-pp.simple_write_begin 12.18 +1.3 13.52 perf-profile.children.cycles-pp.__filemap_get_folio 5.94 +1.4 7.35 ± 2% perf-profile.children.cycles-pp.filemap_get_entry 1.77 -0.4 1.35 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission 6.23 -0.3 5.94 perf-profile.self.cycles-pp.write 6.94 -0.2 6.71 perf-profile.self.cycles-pp.copy_page_from_iter_atomic 7.14 -0.2 6.93 perf-profile.self.cycles-pp.vfs_write 1.13 -0.1 1.01 perf-profile.self.cycles-pp.syscall_return_via_sysret 1.49 ± 3% -0.1 1.38 perf-profile.self.cycles-pp.ksys_write 4.12 -0.1 4.03 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 0.42 -0.1 0.34 perf-profile.self.cycles-pp.cap_inode_need_killpriv 2.17 -0.1 2.10 perf-profile.self.cycles-pp.file_remove_privs_flags 1.08 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission 1.83 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64 2.74 -0.1 2.68 perf-profile.self.cycles-pp.simple_write_end 0.86 -0.0 0.81 perf-profile.self.cycles-pp.aa_file_perm 1.60 -0.0 1.56 perf-profile.self.cycles-pp.up_write 1.42 -0.0 1.38 perf-profile.self.cycles-pp.__vfs_getxattr 0.86 -0.0 0.82 perf-profile.self.cycles-pp.folio_mapping 1.36 -0.0 1.32 perf-profile.self.cycles-pp.rcu_all_qs 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_mark_dirty 1.78 -0.0 1.75 perf-profile.self.cycles-pp.xas_load 1.23 -0.0 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 0.74 -0.0 0.71 perf-profile.self.cycles-pp.setattr_should_drop_suidgid 0.74 -0.0 0.71 ± 2% perf-profile.self.cycles-pp.w_test 1.14 -0.0 1.11 perf-profile.self.cycles-pp.strcmp 
2.25 -0.0 2.22 perf-profile.self.cycles-pp.__cond_resched
0.60 -0.0 0.58 perf-profile.self.cycles-pp.x64_sys_call
0.77 -0.0 0.75 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.61 -0.0 0.60 perf-profile.self.cycles-pp.xattr_resolve_name
0.74 +0.0 0.76 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
1.40 +0.1 1.45 perf-profile.self.cycles-pp.generic_write_checks
1.60 +0.1 1.65 perf-profile.self.cycles-pp.inode_needs_update_time
1.00 ± 4% +0.1 1.08 ± 3% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
1.86 ± 2% +0.1 1.98 perf-profile.self.cycles-pp.down_write
5.34 +0.1 5.47 perf-profile.self.cycles-pp.fault_in_readable
3.61 +0.2 3.80 perf-profile.self.cycles-pp.__fsnotify_parent
1.46 +0.3 1.77 perf-profile.self.cycles-pp.rw_verify_area
3.43 +1.4 4.88 ± 3% perf-profile.self.cycles-pp.filemap_get_entry

>
> Thanks,
> Amir.
> >
> >
> > testcase: unixbench
> > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > parameters:
> >
> > runtime: 300s
> > nr_task: 100%
> > test: fsbuffer-w
> > cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > 00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > 9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
> >
> > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 1.23e+08 -7.9% 1.133e+08 unixbench.throughput
> > 6169 -7.7% 5694 unixbench.time.user_time
> > 4.566e+10 -7.9% 4.206e+10 unixbench.workload
> > 1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions
> > 6891152 +4.8% 7221484 perf-stat.i.branch-misses
> > 29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references
> > 0.91 +2.0% 0.93 perf-stat.i.cpi
> > 7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions
> > 1.26 -2.6% 1.23 perf-stat.i.ipc
> > 0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate%
> > 0.73 +2.7% 0.75 perf-stat.overall.cpi
> > 1.37 -2.6% 1.34 perf-stat.overall.ipc
> > 5828 +5.7% 6162 perf-stat.overall.path-length
> > 1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions
> > 6873687 +4.8% 7203107 perf-stat.ps.branch-misses
> > 29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references
> > 7.148e+11 -2.6% 6.96e+11
perf-stat.ps.instructions > > 2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions > > 57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > 37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > 13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write > > 13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > 12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write > > 7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > 6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > > 5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > 3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > > 2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write > > 6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > > 2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > 1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > > 2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > 2.65 -0.2 2.49 
perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write > > 3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > > 1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > 0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > 3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter > > 3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > 1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > 0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > 1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin > > 0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test > > 0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > 0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write > > 2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter > > 0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin > > 0.74 -0.0 0.69 
perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > > 0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > > 0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > > 0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > > 0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > > 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > 0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > > 0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > > 0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write > > 0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > > 11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > 2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > 6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > 5.17 +0.2 5.42 
perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write > > 96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write > > 2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write > > 3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > > 84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > > 83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > 78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > 74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > 0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > > 5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > 58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter > > 38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write > > 13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin > > 13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio > > 7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64 > > 7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic > > 6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry > > 5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end > > 4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission > > 2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission > > 4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > > 4.20 
-0.3 3.92 perf-profile.children.cycles-pp.__cond_resched > > 6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags > > 2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs > > 3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load > > 2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos > > 1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock > > 3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write > > 4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv > > 1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write > > 3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv > > 1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode > > 1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags > > 0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test > > 3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr > > 1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm > > 1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend > > 0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call > > 0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid > > 1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name > > 1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable > > 1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping > > 0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start > > 0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty > > 0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack > > 0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi > > 0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt > > 0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider > > 0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write > > 0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio > > 0.36 -0.0 0.34 
perf-profile.children.cycles-pp.is_bad_inode > > 0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs > > 1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp > > 1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate > > 99.01 +0.1 99.09 perf-profile.children.cycles-pp.write > > 0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits > > 0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 > > 11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter > > 2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks > > 5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable > > 6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable > > 2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time > > 3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time > > 85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > > 84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64 > > 79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write > > 75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write > > 3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent > > 5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area > > 6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic > > 6.52 -0.5 6.01 perf-profile.self.cycles-pp.write > > 6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write > > 3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry > > 4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio > > 4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > > 2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end > > 1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission > > 2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos > > 1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock > > 
2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write > > 2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched > > 1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64 > > 1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64 > > 1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write > > 1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe > > 1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs > > 1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission > > 1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write > > 0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm > > 1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load > > 0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test > > 1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode > > 1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags > > 1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr > > 1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend > > 0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags > > 0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call > > 0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid > > 0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start > > 0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping > > 0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack > > 0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name > > 0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty > > 0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv > > 0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi > > 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable > > 0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv > > 0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin > > 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write > > 0.23 ± 2% -0.0 0.22 ± 2% 
perf-profile.self.cycles-pp.amd_clear_divider
> > 0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
> > 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt
> > 0.24 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.is_bad_inode
> > 0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
> > 0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
> > 0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> > 0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
> > 1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
> > 0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
> > 0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> > 5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
> > 1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
> > 1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
> > 3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-05-31 3:15 ` Oliver Sang
@ 2024-05-31 5:18 ` Amir Goldstein
2024-06-03 8:13 ` Oliver Sang
0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-05-31 5:18 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > <oliver.sang@intel.com> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > >
> > >
> > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > https://github.com/amir73il/linux sb_write_barrier
> > >
> >
> > Jan,
> >
> > I speculate that the regression is due to the fact that we store and pass the
> > path information on struct file_range on the stack before the optimizations
> > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > and __fsnotify_parent() pays a bigger price for fetches?
> >
> > Luckily, we already have the way to check
> > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > FSNOTIFY_PRIO_PRE_CONTENT))
> > so now I used it to optimize out the fsnotify_file_range() inline
> > code entirely.
> >
> > Oliver,
> >
> > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> >
> > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > with pre-content events
> > * f301cd18006c - fanotify: rename a misnamed constant
> > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > * aca408421327 - fsnotify: generate pre-content permission event on open
> > * 93656e196b00 - fsnotify: introduce pre-content permission event
> >
> > The optimization was done in the first commit (fsnotify: introduce
> > pre-content permission event),
> > but impacts the regressing commit (fanotify: pass optional file access
> > range in pre-content event).
>
> no need to test all middle commits.
>
> I directly compare the tip with v6.10-rc1, still a regression but better now
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> a82fd282befc7 ("fanotify: report file range info with pre-content events")
>
> v6.10-rc1 a82fd282befc71d99106bf31066
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
>
> full data is as below [1]
>
>
> then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
>
> it also has a small regression comparing to its parent, but better also.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
>
> full data is as below [2]
>

Ok, this looks sane; the small overhead in the write path makes sense.

It may have been a "tactical mistake" merging this optimization into v6.10-rc1:

a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")

before the rest of the pre-content infrastructure, because together they would
still be a performance win.

Can you please compare this branch to v6.9?

Thanks,
Amir.

>
> [1]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> a82fd282befc7 ("fanotify: report file range info with pre-content events")
>
> v6.10-rc1 a82fd282befc71d99106bf31066
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1614 ± 6% +252.4% 5688 ± 67% numa-vmstat.node1.nr_mapped
> 6199 -5.8% 5841 time.user_time
> 220234 ± 13% +121.4% 487546 ± 41% numa-meminfo.node0.AnonPages.max
> 836146 ± 6% -36.0% 535267 ± 45% numa-meminfo.node1.AnonPages.max
> 6233 ± 7% +251.3% 21898 ± 69% numa-meminfo.node1.Mapped
> 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> 6199 -5.8% 5841 unixbench.time.user_time
> 4.513e+10 -3.9% 4.338e+10 unixbench.workload
> 1.458e+11 -2.7% 1.419e+11 perf-stat.i.branch-instructions
> 11.47 ± 6% +2.6 14.10 ± 9% perf-stat.i.cache-miss-rate%
> 3915539 ± 8% +510.0% 23884093 ± 9% perf-stat.i.cache-misses
> 32425619 ± 3% +396.4% 1.61e+08 ± 4% perf-stat.i.cache-references
> 151202 ± 16% -78.6% 32364 ± 56% perf-stat.i.cycles-between-cache-misses
> 6.961e+11 -1.9% 6.828e+11 perf-stat.i.instructions
> 1.22 -1.3% 1.20 perf-stat.i.ipc
> 0.01 ± 9% +519.5% 0.04 ± 10% perf-stat.overall.MPKI
> 0.01 +0.0 0.01 perf-stat.overall.branch-miss-rate%
> 12.09 ± 6% +2.8 14.86 ± 8% perf-stat.overall.cache-miss-rate%
> 0.75 +2.0% 0.77 perf-stat.overall.cpi
> 133775 ± 8% -83.5% 22060 ± 9% perf-stat.overall.cycles-between-cache-misses
> 1.33 -1.9% 1.31 perf-stat.overall.ipc
> 5721 +2.0% 5836 perf-stat.overall.path-length
> 1.452e+11 -2.7% 1.413e+11 perf-stat.ps.branch-instructions
> 3921138 ± 8% +507.4% 23818053 ± 9% perf-stat.ps.cache-misses
> 32415461 ± 3% +394.4% 1.603e+08 ± 4% perf-stat.ps.cache-references
> 6.932e+11 -1.9% 6.797e+11 perf-stat.ps.instructions
> 2.582e+14 -1.9%
2.532e+14 perf-stat.total.instructions > 13.19 -0.7 12.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write > 7.01 -0.2 6.80 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 1.11 -0.2 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write > 2.50 -0.1 2.35 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write > 1.68 -0.1 1.59 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 3.73 -0.1 3.64 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.62 -0.1 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 2.18 -0.1 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write > 0.65 -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.w_test > 0.92 -0.0 0.87 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 0.70 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write > 0.86 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write > 0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 0.63 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.86 -0.0 0.83 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > 3.53 -0.0 3.50 
perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter > 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 0.53 -0.0 0.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write > 0.72 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.75 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > 1.13 +0.0 1.17 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > 5.30 +0.1 5.36 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 5.30 +0.1 5.38 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write > 6.17 +0.1 6.27 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 96.84 +0.1 96.98 perf-profile.calltrace.cycles-pp.write > 0.78 ± 2% +0.3 1.13 ± 5% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > 2.97 +0.6 3.57 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write > 12.01 +0.6 12.62 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 3.63 +0.6 4.24 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > 4.32 +0.6 4.96 
perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 37.28 +0.8 38.12 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 84.26 +1.0 85.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > 13.39 +1.0 14.36 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 12.30 +1.0 13.30 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write > 82.83 +1.0 83.86 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 57.94 +1.3 59.20 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 5.99 +1.3 7.25 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 78.13 +1.3 79.41 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 74.26 +1.3 75.59 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 7.43 -0.4 7.06 perf-profile.children.cycles-pp.entry_SYSCALL_64 > 4.42 -0.2 4.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > 1.21 -0.2 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret > 7.14 -0.2 6.94 perf-profile.children.cycles-pp.copy_page_from_iter_atomic > 4.18 -0.2 4.00 perf-profile.children.cycles-pp.__cond_resched > 2.74 -0.2 2.58 perf-profile.children.cycles-pp.apparmor_file_permission > 2.42 -0.1 2.30 perf-profile.children.cycles-pp.rcu_all_qs > 3.82 -0.1 3.72 perf-profile.children.cycles-pp.__fsnotify_parent > 1.74 -0.1 1.65 perf-profile.children.cycles-pp.up_write > 1.99 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode > 0.99 -0.1 0.91 
perf-profile.children.cycles-pp.w_test > 3.71 -0.1 3.64 perf-profile.children.cycles-pp.security_file_permission > 2.47 -0.1 2.41 perf-profile.children.cycles-pp.xas_load > 1.12 -0.1 1.06 perf-profile.children.cycles-pp.folio_wait_stable > 1.26 -0.1 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags > 0.75 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call > 0.98 -0.0 0.94 perf-profile.children.cycles-pp.aa_file_perm > 0.46 -0.0 0.42 perf-profile.children.cycles-pp.write@plt > 1.10 -0.0 1.07 perf-profile.children.cycles-pp.xattr_resolve_name > 0.36 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider > 3.76 -0.0 3.73 perf-profile.children.cycles-pp.cap_inode_need_killpriv > 0.59 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi > 3.41 -0.0 3.38 perf-profile.children.cycles-pp.__vfs_getxattr > 0.56 -0.0 0.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack > 1.05 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping > 0.38 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write > 0.25 -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited > 0.36 -0.0 0.35 perf-profile.children.cycles-pp.is_bad_inode > 1.38 +0.0 1.40 perf-profile.children.cycles-pp.strcmp > 0.93 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty > 1.07 +0.0 1.09 perf-profile.children.cycles-pp.timestamp_truncate > 5.70 +0.0 5.75 perf-profile.children.cycles-pp.simple_write_end > 98.96 +0.1 99.02 perf-profile.children.cycles-pp.write > 5.69 +0.1 5.75 perf-profile.children.cycles-pp.fault_in_readable > 6.42 +0.1 6.53 perf-profile.children.cycles-pp.fault_in_iov_iter_readable > 0.89 +0.3 1.24 ± 4% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 > 3.39 +0.6 3.97 perf-profile.children.cycles-pp.inode_needs_update_time > 12.35 +0.6 12.96 perf-profile.children.cycles-pp.__generic_file_write_iter > 3.96 +0.6 4.57 perf-profile.children.cycles-pp.file_update_time > 4.56 +0.8 5.33 
perf-profile.children.cycles-pp.rw_verify_area > 38.16 +0.8 39.01 perf-profile.children.cycles-pp.generic_perform_write > 13.58 +1.0 14.54 perf-profile.children.cycles-pp.simple_write_begin > 84.67 +1.0 85.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > 12.68 +1.0 13.68 perf-profile.children.cycles-pp.__filemap_get_folio > 83.50 +1.0 84.51 perf-profile.children.cycles-pp.do_syscall_64 > 58.52 +1.3 59.78 perf-profile.children.cycles-pp.generic_file_write_iter > 6.18 +1.3 7.44 ± 3% perf-profile.children.cycles-pp.filemap_get_entry > 78.74 +1.3 80.00 perf-profile.children.cycles-pp.ksys_write > 75.13 +1.3 76.42 perf-profile.children.cycles-pp.vfs_write > 7.25 -0.6 6.64 perf-profile.self.cycles-pp.vfs_write > 6.45 -0.4 6.08 perf-profile.self.cycles-pp.write > 4.32 -0.2 4.08 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > 1.21 -0.2 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret > 6.98 -0.2 6.78 perf-profile.self.cycles-pp.copy_page_from_iter_atomic > 4.52 -0.2 4.36 perf-profile.self.cycles-pp.__filemap_get_folio > 2.34 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched > 1.90 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64 > 1.60 -0.1 1.50 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission > 1.47 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write > 1.62 -0.1 1.53 perf-profile.self.cycles-pp.up_write > 3.65 -0.1 3.56 perf-profile.self.cycles-pp.__fsnotify_parent > 1.09 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode > 1.80 -0.1 1.74 perf-profile.self.cycles-pp.xas_load > 0.79 -0.1 0.73 perf-profile.self.cycles-pp.w_test > 1.10 -0.1 1.04 perf-profile.self.cycles-pp.security_file_permission > 1.25 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe > 1.66 -0.1 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64 > 1.41 -0.0 1.36 perf-profile.self.cycles-pp.rcu_all_qs > 0.90 -0.0 0.86 perf-profile.self.cycles-pp.simple_write_begin > 0.88 -0.0 0.84 perf-profile.self.cycles-pp.aa_file_perm > 
0.80 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags > 0.62 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call > 1.39 -0.0 1.36 perf-profile.self.cycles-pp.__vfs_getxattr > 0.53 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable > 0.87 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping > 0.56 -0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack > 0.24 -0.0 0.22 perf-profile.self.cycles-pp.amd_clear_divider > 0.12 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.write@plt > 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write > 0.35 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi > 0.22 -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio > 0.66 +0.0 0.69 perf-profile.self.cycles-pp.file_update_time > 1.03 +0.0 1.06 perf-profile.self.cycles-pp.strcmp > 2.75 +0.0 2.79 perf-profile.self.cycles-pp.simple_write_end > 0.72 +0.0 0.77 perf-profile.self.cycles-pp.fault_in_iov_iter_readable > 0.87 +0.1 0.92 perf-profile.self.cycles-pp.timestamp_truncate > 5.54 +0.1 5.59 perf-profile.self.cycles-pp.fault_in_readable > 2.04 +0.1 2.09 perf-profile.self.cycles-pp.file_remove_privs_flags > 1.51 +0.2 1.69 perf-profile.self.cycles-pp.inode_needs_update_time > 0.78 ± 2% +0.3 1.11 ± 5% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 > 0.84 +1.0 1.82 perf-profile.self.cycles-pp.rw_verify_area > 3.66 +1.3 4.97 ± 4% perf-profile.self.cycles-pp.filemap_get_entry > > > [2] > > ========================================================================================= > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > commit: > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event") > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event") > > 94167e071109d573 64108c0b47db91b20d658a89969 > ---------------- --------------------------- > %stddev 
%change %stddev > \ | \ > 38903 ±113% +313.8% 160973 ± 66% numa-meminfo.node1.AnonHugePages > 1666466 ± 4% -12.2% 1462703 ± 9% numa-numastat.node1.local_node > 18.97 ±113% +314.3% 78.59 ± 66% numa-vmstat.node1.nr_anon_transparent_hugepages > 6003 -5.6% 5668 time.user_time > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput > 6003 -5.6% 5668 unixbench.time.user_time > 4.314e+10 -2.3% 4.215e+10 unixbench.workload > -12.17 +33.7% -16.26 sched_debug.cpu.nr_uninterruptible.min > 0.00 ± 95% +600.3% 0.00 ± 88% sched_debug.rt_rq:.rt_time.avg > 0.02 ± 95% +600.3% 0.14 ± 88% sched_debug.rt_rq:.rt_time.max > 0.00 ± 95% +600.3% 0.01 ± 88% sched_debug.rt_rq:.rt_time.stddev > 1.407e+11 -2.0% 1.379e+11 perf-stat.i.branch-instructions > 0.55 -0.0 0.51 ± 4% perf-stat.i.branch-miss-rate% > 55780077 -85.5% 8078438 perf-stat.i.branch-misses > 5029827 ± 6% +315.5% 20897838 ± 10% perf-stat.i.cache-misses > 35311245 ± 2% +328.2% 1.512e+08 ± 6% perf-stat.i.cache-references > 118639 ± 18% -61.7% 45421 ± 41% perf-stat.i.cycles-between-cache-misses > 6.736e+11 -1.5% 6.634e+11 perf-stat.i.instructions > 0.01 ± 6% +321.2% 0.03 ± 10% perf-stat.overall.MPKI > 0.04 -0.0 0.01 perf-stat.overall.branch-miss-rate% > 0.78 +1.5% 0.79 perf-stat.overall.cpi > 103942 ± 6% -75.7% 25208 ± 10% perf-stat.overall.cycles-between-cache-misses > 1.29 -1.5% 1.27 perf-stat.overall.ipc > 1.4e+11 -1.9% 1.373e+11 perf-stat.ps.branch-instructions > 55517704 -85.5% 8057745 perf-stat.ps.branch-misses > 5026889 ± 6% +315.3% 20876882 ± 10% perf-stat.ps.cache-misses > 35229110 ± 2% +327.6% 1.506e+08 ± 6% perf-stat.ps.cache-references > 6.701e+11 -1.4% 6.608e+11 perf-stat.ps.instructions > 2.496e+14 -1.4% 2.46e+14 perf-stat.total.instructions > 3.61 -0.5 3.09 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > 2.66 -0.5 2.18 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write > 12.62 -0.5 12.16 
perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write > 7.29 -0.3 7.03 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > 4.98 -0.2 4.74 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 6.96 -0.2 6.74 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 4.50 -0.2 4.33 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > 3.74 -0.2 3.58 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter > 12.82 -0.2 12.66 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 1.04 -0.1 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write > 2.87 -0.1 2.78 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter > 5.27 -0.1 5.18 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 0.81 -0.1 0.75 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > 2.17 -0.0 2.13 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write > 1.66 -0.0 1.62 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 0.74 -0.0 0.70 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > 1.27 -0.0 1.23 
perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > 0.84 -0.0 0.81 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write > 0.61 -0.0 0.57 perf-profile.calltrace.cycles-pp.w_test > 0.88 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 0.90 -0.0 0.87 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 0.61 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 0.71 -0.0 0.69 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.00 ± 4% +0.1 1.10 ± 3% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > 2.68 +0.1 2.79 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 5.18 +0.1 5.31 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write > 6.00 +0.1 6.14 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 3.35 +0.1 3.48 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write > 4.00 +0.1 4.14 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > 96.95 +0.2 97.12 perf-profile.calltrace.cycles-pp.write > 3.68 +0.2 3.86 
perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 84.87 +0.6 85.45 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > 83.48 +0.6 84.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 78.96 +0.7 79.70 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 75.00 +0.8 75.81 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > 58.04 +1.1 59.14 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > 36.40 +1.2 37.55 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > 12.88 +1.3 14.20 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > 11.79 +1.3 13.13 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write > 5.74 +1.4 7.16 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > 3.98 -0.5 3.44 perf-profile.children.cycles-pp.security_file_permission > 2.90 -0.5 2.40 perf-profile.children.cycles-pp.apparmor_file_permission > 7.66 -0.3 7.38 perf-profile.children.cycles-pp.file_remove_privs_flags > 7.14 -0.3 6.89 perf-profile.children.cycles-pp.entry_SYSCALL_64 > 5.34 -0.2 5.10 perf-profile.children.cycles-pp.rw_verify_area > 7.10 -0.2 6.88 perf-profile.children.cycles-pp.copy_page_from_iter_atomic > 3.98 -0.2 3.80 perf-profile.children.cycles-pp.cap_inode_need_killpriv > 4.73 -0.2 4.56 perf-profile.children.cycles-pp.security_inode_need_killpriv > 13.16 -0.2 13.00 perf-profile.children.cycles-pp.__generic_file_write_iter > 1.14 -0.1 1.01 perf-profile.children.cycles-pp.syscall_return_via_sysret > 
5.67 -0.1 5.56 perf-profile.children.cycles-pp.simple_write_end > 3.56 -0.1 3.46 perf-profile.children.cycles-pp.__vfs_getxattr > 4.04 -0.1 3.95 perf-profile.children.cycles-pp.__cond_resched > 4.21 -0.1 4.14 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > 2.31 -0.1 2.24 perf-profile.children.cycles-pp.rcu_all_qs > 0.99 -0.1 0.93 perf-profile.children.cycles-pp.folio_mark_dirty > 2.46 -0.0 2.42 perf-profile.children.cycles-pp.xas_load > 0.93 -0.0 0.88 ± 2% perf-profile.children.cycles-pp.w_test > 1.50 -0.0 1.46 perf-profile.children.cycles-pp.strcmp > 1.72 -0.0 1.68 perf-profile.children.cycles-pp.up_write > 0.87 -0.0 0.82 perf-profile.children.cycles-pp.setattr_should_drop_suidgid > 1.04 -0.0 1.00 perf-profile.children.cycles-pp.folio_mapping > 0.96 -0.0 0.92 perf-profile.children.cycles-pp.aa_file_perm > 1.23 -0.0 1.20 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags > 0.73 -0.0 0.70 perf-profile.children.cycles-pp.x64_sys_call > 1.07 -0.0 1.05 perf-profile.children.cycles-pp.folio_wait_stable > 0.43 -0.0 0.41 ± 2% perf-profile.children.cycles-pp.write@plt > 1.08 -0.0 1.06 perf-profile.children.cycles-pp.xattr_resolve_name > 0.35 -0.0 0.34 perf-profile.children.cycles-pp.__x64_sys_write > 99.01 +0.0 99.04 perf-profile.children.cycles-pp.write > 1.12 ± 4% +0.1 1.20 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 > 2.86 +0.1 2.97 perf-profile.children.cycles-pp.down_write > 5.50 +0.1 5.63 perf-profile.children.cycles-pp.fault_in_readable > 3.75 +0.1 3.88 perf-profile.children.cycles-pp.inode_needs_update_time > 4.34 +0.1 4.47 perf-profile.children.cycles-pp.file_update_time > 6.25 +0.1 6.39 perf-profile.children.cycles-pp.fault_in_iov_iter_readable > 3.77 +0.2 3.96 perf-profile.children.cycles-pp.__fsnotify_parent > 85.29 +0.6 85.86 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > 84.14 +0.6 84.74 perf-profile.children.cycles-pp.do_syscall_64 > 79.57 +0.7 80.28 
perf-profile.children.cycles-pp.ksys_write > 75.84 +0.8 76.64 perf-profile.children.cycles-pp.vfs_write > 58.64 +1.1 59.72 perf-profile.children.cycles-pp.generic_file_write_iter > 37.30 +1.1 38.43 perf-profile.children.cycles-pp.generic_perform_write > 13.05 +1.3 14.38 perf-profile.children.cycles-pp.simple_write_begin > 12.18 +1.3 13.52 perf-profile.children.cycles-pp.__filemap_get_folio > 5.94 +1.4 7.35 ± 2% perf-profile.children.cycles-pp.filemap_get_entry > 1.77 -0.4 1.35 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission > 6.23 -0.3 5.94 perf-profile.self.cycles-pp.write > 6.94 -0.2 6.71 perf-profile.self.cycles-pp.copy_page_from_iter_atomic > 7.14 -0.2 6.93 perf-profile.self.cycles-pp.vfs_write > 1.13 -0.1 1.01 perf-profile.self.cycles-pp.syscall_return_via_sysret > 1.49 ± 3% -0.1 1.38 perf-profile.self.cycles-pp.ksys_write > 4.12 -0.1 4.03 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > 0.42 -0.1 0.34 perf-profile.self.cycles-pp.cap_inode_need_killpriv > 2.17 -0.1 2.10 perf-profile.self.cycles-pp.file_remove_privs_flags > 1.08 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission > 1.83 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64 > 2.74 -0.1 2.68 perf-profile.self.cycles-pp.simple_write_end > 0.86 -0.0 0.81 perf-profile.self.cycles-pp.aa_file_perm > 1.60 -0.0 1.56 perf-profile.self.cycles-pp.up_write > 1.42 -0.0 1.38 perf-profile.self.cycles-pp.__vfs_getxattr > 0.86 -0.0 0.82 perf-profile.self.cycles-pp.folio_mapping > 1.36 -0.0 1.32 perf-profile.self.cycles-pp.rcu_all_qs > 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_mark_dirty > 1.78 -0.0 1.75 perf-profile.self.cycles-pp.xas_load > 1.23 -0.0 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe > 0.74 -0.0 0.71 perf-profile.self.cycles-pp.setattr_should_drop_suidgid > 0.74 -0.0 0.71 ± 2% perf-profile.self.cycles-pp.w_test > 1.14 -0.0 1.11 perf-profile.self.cycles-pp.strcmp > 2.25 -0.0 2.22 perf-profile.self.cycles-pp.__cond_resched > 0.60 -0.0 0.58 
perf-profile.self.cycles-pp.x64_sys_call > 0.77 -0.0 0.75 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags > 0.61 -0.0 0.60 perf-profile.self.cycles-pp.xattr_resolve_name > 0.74 +0.0 0.76 perf-profile.self.cycles-pp.fault_in_iov_iter_readable > 1.40 +0.1 1.45 perf-profile.self.cycles-pp.generic_write_checks > 1.60 +0.1 1.65 perf-profile.self.cycles-pp.inode_needs_update_time > 1.00 ± 4% +0.1 1.08 ± 3% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 > 1.86 ± 2% +0.1 1.98 perf-profile.self.cycles-pp.down_write > 5.34 +0.1 5.47 perf-profile.self.cycles-pp.fault_in_readable > 3.61 +0.2 3.80 perf-profile.self.cycles-pp.__fsnotify_parent > 1.46 +0.3 1.77 perf-profile.self.cycles-pp.rw_verify_area > 3.43 +1.4 4.88 ± 3% perf-profile.self.cycles-pp.filemap_get_entry > > > > > > Thanks, > > Amir. > > > > > > > > > > > testcase: unixbench > > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory > > > parameters: > > > > > > runtime: 300s > > > nr_task: 100% > > > test: fsbuffer-w > > > cpufreq_governor: performance > > > > > > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. 
not just a new version of > > > the same patch/commit), kindly add following tags > > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com > > > > > > > > > Details are as below: > > > --------------------------------------------------------------------------------------------------> > > > > > > > > > The kernel config and materials to reproduce are available at: > > > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com > > > > > > ========================================================================================= > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > commit: > > > 00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event") > > > 9d1fd61f1d ("fanotify: pass optional file access range in pre-content event") > > > > > > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b > > > ---------------- --------------------------- > > > %stddev %change %stddev > > > \ | \ > > > 1.23e+08 -7.9% 1.133e+08 unixbench.throughput > > > 6169 -7.7% 5694 unixbench.time.user_time > > > 4.566e+10 -7.9% 4.206e+10 unixbench.workload > > > 1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions > > > 6891152 +4.8% 7221484 perf-stat.i.branch-misses > > > 29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references > > > 0.91 +2.0% 0.93 perf-stat.i.cpi > > > 7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions > > > 1.26 -2.6% 1.23 perf-stat.i.ipc > > > 0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate% > > > 0.73 +2.7% 0.75 perf-stat.overall.cpi > > > 1.37 -2.6% 1.34 perf-stat.overall.ipc > > > 5828 +5.7% 6162 perf-stat.overall.path-length > > > 1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions > > > 6873687 +4.8% 7203107 perf-stat.ps.branch-misses > > > 29721957 ± 
2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references > > > 7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions > > > 2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions > > > 57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > > 37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > > 13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write > > > 13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > > 12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write > > > 7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > > 6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > > > 5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > > 3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > > > 2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write > > > 6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > > > 2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > > 1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > > > 2.92 -0.2 2.76 
perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > > 2.65 -0.2 2.49 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write > > > 3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > > > 1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > > 0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > > 3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter > > > 3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > > 1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > > 0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > > 1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin > > > 0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test > > > 0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > > 0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write > > > 2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter > > > 0.62 -0.1 0.56 
perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin > > > 0.74 -0.0 0.69 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write > > > 0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > > > 0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > > > 0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter > > > 0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write > > > 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > > 0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags > > > 0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > > > 0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write > > > 0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter > > > 11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > > 2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 > > > 6.04 +0.2 6.24 
perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write > > > 5.17 +0.2 5.42 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write > > > 96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write > > > 2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write > > > 3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write > > > 84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write > > > 83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > > 78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > > 74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write > > > 0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64 > > > 5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe > > > 58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter > > > 38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write > > > 13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin > > > 13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio > > > 7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64 > > > 7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic > > > 6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry > > > 5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end > > > 4.05 -0.4 3.64 
perf-profile.children.cycles-pp.security_file_permission > > > 2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission > > > 4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > > > 4.20 -0.3 3.92 perf-profile.children.cycles-pp.__cond_resched > > > 6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags > > > 2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs > > > 3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load > > > 2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos > > > 1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock > > > 3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write > > > 4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv > > > 1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write > > > 3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv > > > 1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode > > > 1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags > > > 0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test > > > 3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr > > > 1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm > > > 1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend > > > 0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call > > > 0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid > > > 1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name > > > 1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable > > > 1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping > > > 0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start > > > 0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty > > > 0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack > > > 0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi > > > 0.43 -0.0 0.39 
perf-profile.children.cycles-pp.write@plt > > > 0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider > > > 0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write > > > 0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio > > > 0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode > > > 0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs > > > 1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp > > > 1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate > > > 99.01 +0.1 99.09 perf-profile.children.cycles-pp.write > > > 0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits > > > 0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 > > > 11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter > > > 2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks > > > 5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable > > > 6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable > > > 2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time > > > 3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time > > > 85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > > > 84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64 > > > 79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write > > > 75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write > > > 3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent > > > 5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area > > > 6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic > > > 6.52 -0.5 6.01 perf-profile.self.cycles-pp.write > > > 6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write > > > 3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry > > > 4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio > > > 4.23 -0.3 3.95 
perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > > > 2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end > > > 1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission > > > 2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos > > > 1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock > > > 2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write > > > 2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched > > > 1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64 > > > 1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64 > > > 1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write > > > 1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe > > > 1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs > > > 1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission > > > 1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write > > > 0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm > > > 1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load > > > 0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test > > > 1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode > > > 1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags > > > 1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr > > > 1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend > > > 0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags > > > 0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call > > > 0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid > > > 0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start > > > 0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping > > > 0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack > > > 0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name > > > 0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty > > > 0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv 
> > >     0.36         -0.0      0.33 ± 2%  perf-profile.self.cycles-pp.inode_to_bdi
> > >     0.52         -0.0      0.49       perf-profile.self.cycles-pp.folio_wait_stable
> > >     0.34         -0.0      0.32       perf-profile.self.cycles-pp.cap_inode_need_killpriv
> > >     0.89         -0.0      0.87       perf-profile.self.cycles-pp.simple_write_begin
> > >     0.25         -0.0      0.23       perf-profile.self.cycles-pp.__x64_sys_write
> > >     0.23 ± 2%    -0.0      0.22 ± 2%  perf-profile.self.cycles-pp.amd_clear_divider
> > >     0.23 ± 2%    -0.0      0.21       perf-profile.self.cycles-pp.noop_dirty_folio
> > >     0.12 ± 4%    -0.0      0.10 ± 3%  perf-profile.self.cycles-pp.write@plt
> > >     0.24         -0.0      0.23 ± 2%  perf-profile.self.cycles-pp.is_bad_inode
> > >     0.62         +0.0      0.65       perf-profile.self.cycles-pp.file_update_time
> > >     0.86         +0.0      0.90       perf-profile.self.cycles-pp.strcmp
> > >     0.69         +0.0      0.74       perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> > >     0.75 ± 3%    +0.1      0.81       perf-profile.self.cycles-pp.generic_write_check_limits
> > >     1.42 ± 2%    +0.1      1.48       perf-profile.self.cycles-pp.generic_write_checks
> > >     0.82         +0.1      0.89       perf-profile.self.cycles-pp.timestamp_truncate
> > >     0.58 ± 3%    +0.1      0.66 ± 6%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> > >     5.44         +0.2      5.60       perf-profile.self.cycles-pp.fault_in_readable
> > >     1.36         +0.2      1.55       perf-profile.self.cycles-pp.inode_needs_update_time
> > >     1.76 ± 3%    +0.9      2.64       perf-profile.self.cycles-pp.rw_verify_area
> > >     3.46         +3.8      7.25       perf-profile.self.cycles-pp.__fsnotify_parent
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-05-31 5:18 ` Amir Goldstein
@ 2024-06-03 8:13 ` Oliver Sang
2024-06-04 12:33 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-06-03 8:13 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > <oliver.sang@intel.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > >
> > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > https://github.com/amir73il/linux sb_write_barrier
> > > >
> > >
> > > Jan,
> > >
> > > I speculate that the regression is due to the fact that we store and pass the
> > > path information on struct file_range on the stack before the optimizations
> > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > and __fsnotify_parent() pays a bigger price for fetches?
> > >
> > > Luckily, we already have the way to check
> > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > so now I used it to optimize out the fsnotify_file_range() inline
> > > code entirely.
> > >
> > > Oliver,
> > >
> > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > >
> > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > with pre-content events
> > > * f301cd18006c - fanotify: rename a misnamed constant
> > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > >
> > > The optimization was done in the first commit (fsnotify: introduce
> > > pre-content permission event),
> > > but impacts the regressing commit (fanotify: pass optional file access
> > > range in pre-content event).
> > > no need to test all middle commits.
> >
> > I directly compare the tip with v6.10-rc1, still a regression but better now
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.10-rc1
> >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> >
> >        v6.10-rc1 a82fd282befc71d99106bf31066
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> >
> > full data is as below [1]
> >
> >
> > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> >
> > it also has a small regression comparing to its parent, but better also.
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> >
> > 94167e071109d573 64108c0b47db91b20d658a89969
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> >
> > full data is as below [2]
> >
>
> Ok, this looks sane, the small overhead in the write path makes sense.
> It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> before the rest of the pre-content infrastructure, because together they
> would still be a performance win.
>
> Can you please compare this branch to v6.9?

there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
========================================================================================= compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench commit: v6.9 v6.10-rc1 a82fd282befc7 ("fanotify: report file range info with pre-content events") v6.9 v6.10-rc1 a82fd282befc71d99106bf31066 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 9218048 ± 19% +33.1% 12267178 ± 6% +14.2% 10523306 ± 7% meminfo.DirectMap2M 151289 +63.8% 247886 ± 6% +61.4% 244132 ± 4% meminfo.DirectMap4k 0.52 +0.1 0.58 +0.1 0.59 mpstat.cpu.all.irq% 0.01 -0.0 0.01 ± 4% -0.0 0.01 mpstat.cpu.all.soft% 10241 -9.1% 9314 ± 11% -15.6% 8648 ± 15% sched_debug.cpu.curr->pid.min -35.33 -55.4% -15.76 -62.3% -13.31 sched_debug.cpu.nr_uninterruptible.min 109116 ± 96% -85.8% 15473 ±125% +8.6% 118471 ± 81% numa-meminfo.node0.AnonHugePages 4803556 ± 2% -3.5% 4636196 ± 2% -31.7% 3278497 ± 41% numa-meminfo.node0.MemUsed 574474 ± 29% +45.5% 836146 ± 6% -6.8% 535267 ± 45% numa-meminfo.node1.AnonPages.max 1773634 ± 6% +8.9% 1931750 ± 5% +85.6% 3291386 ± 41% numa-meminfo.node1.MemUsed 35.33 ± 15% -73.1% 9.50 ± 24% -58.5% 14.67 ± 27% perf-c2c.DRAM.local 181.67 ± 11% -74.7% 46.00 ± 12% -69.6% 55.17 ± 18% perf-c2c.DRAM.remote 298.67 ± 7% -82.2% 53.17 ± 9% -79.1% 62.33 ± 21% perf-c2c.HITM.local 125.67 ± 15% -77.1% 28.83 ± 15% -72.9% 34.00 ± 22% perf-c2c.HITM.remote 265024 -1.2% 261842 -0.8% 262871 time.involuntary_context_switches 25.33 ± 16% -61.2% 9.83 ± 23% -59.2% 10.33 ± 22% time.major_page_faults 7168 +0.9% 7234 +0.9% 7234 time.maximum_resident_set_size 6286 -1.4% 6199 -7.1% 5841 time.user_time 70712 -1.7% 69536 -0.9% 70096 proc-vmstat.nr_active_anon 9037 +1.4% 9162 ± 2% +2.9% 9301 proc-vmstat.nr_page_table_pages 73584 -1.8% 72274 -1.1% 72752 proc-vmstat.nr_shmem 70712 -1.7% 69536 -0.9% 70096 
proc-vmstat.nr_zone_active_anon 35571 ± 8% -9.5% 32176 ± 3% -15.7% 29987 ± 4% proc-vmstat.pgactivate 1.219e+08 -0.2% 1.216e+08 -4.1% 1.168e+08 unixbench.throughput 265024 -1.2% 261842 -0.8% 262871 unixbench.time.involuntary_context_switches 7168 +0.9% 7234 +0.9% 7234 unixbench.time.maximum_resident_set_size 6286 -1.4% 6199 -7.1% 5841 unixbench.time.user_time 4.521e+10 -0.2% 4.513e+10 -4.1% 4.338e+10 unixbench.workload 1.476e+11 -1.2% 1.458e+11 -3.9% 1.419e+11 perf-stat.i.branch-instructions 7506784 -2.1% 7347431 -2.4% 7329399 perf-stat.i.branch-misses 3830897 ± 5% +2.2% 3915539 ± 8% +523.5% 23884093 ± 9% perf-stat.i.cache-misses 30323968 ± 2% +6.9% 32425619 ± 3% +430.9% 1.61e+08 ± 4% perf-stat.i.cache-references 0.94 +1.6% 0.95 +1.6% 0.95 perf-stat.i.cpi 157608 ± 12% -4.1% 151202 ± 16% -79.5% 32364 ± 56% perf-stat.i.cycles-between-cache-misses 7.003e+11 -0.6% 6.961e+11 -2.5% 6.828e+11 perf-stat.i.instructions 1.23 -1.0% 1.22 -2.3% 1.20 perf-stat.i.ipc 0.09 ± 14% -56.1% 0.04 ± 20% -64.9% 0.03 ± 22% perf-stat.i.major-faults 0.01 ± 5% +3.3% 0.01 ± 9% +540.2% 0.04 ± 10% perf-stat.overall.MPKI 0.01 -0.0 0.01 +0.0 0.01 perf-stat.overall.branch-miss-rate% 0.75 +0.6% 0.75 +2.6% 0.77 perf-stat.overall.cpi 136694 ± 5% -2.1% 133775 ± 8% -83.9% 22060 ± 9% perf-stat.overall.cycles-between-cache-misses 1.34 -0.6% 1.33 -2.5% 1.31 perf-stat.overall.ipc 5752 -0.5% 5721 +1.5% 5836 perf-stat.overall.path-length 1.469e+11 -1.2% 1.452e+11 -3.8% 1.413e+11 perf-stat.ps.branch-instructions 3815245 ± 5% +2.8% 3921138 ± 8% +524.3% 23818053 ± 9% perf-stat.ps.cache-misses 30276290 ± 2% +7.1% 32415461 ± 3% +429.3% 1.603e+08 ± 4% perf-stat.ps.cache-references 6.97e+11 -0.5% 6.932e+11 -2.5% 6.797e+11 perf-stat.ps.instructions 0.09 ± 14% -56.0% 0.04 ± 21% -64.6% 0.03 ± 23% perf-stat.ps.major-faults 2.601e+14 -0.7% 2.582e+14 -2.7% 2.532e+14 perf-stat.total.instructions 58.72 -0.8 57.94 +0.5 59.20 
perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 38.04 -0.8 37.28 +0.1 38.12 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 5.91 -0.6 5.30 -0.5 5.38 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write 78.64 -0.5 78.13 +0.8 79.41 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 2.65 -0.5 2.18 -0.5 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write 83.29 -0.5 82.83 +0.6 83.86 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 6.45 -0.5 5.99 +0.8 7.25 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 84.71 -0.4 84.26 +0.5 85.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 6.59 -0.4 6.17 -0.3 6.27 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 74.65 -0.4 74.26 +0.9 75.59 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 13.74 -0.3 13.39 +0.6 14.36 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 12.63 -0.3 12.30 +0.7 13.30 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write 4.62 -0.3 4.32 +0.3 4.96 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 2.78 ± 2% -0.3 2.50 -0.4 2.35 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write 3.62 -0.3 3.34 -0.3 3.28 
perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64 2.06 -0.1 1.92 -0.1 1.95 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 2.92 -0.1 2.81 -0.2 2.72 ± 2% perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 0.75 -0.0 0.70 -0.1 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write 0.93 -0.0 0.89 -0.0 0.88 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write 0.99 -0.0 0.96 -0.0 0.98 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter 0.74 -0.0 0.72 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.63 -0.0 0.62 -0.0 0.61 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 0.87 -0.0 0.86 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write 0.64 -0.0 0.63 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.75 -0.0 0.75 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write 0.68 +0.0 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 0.91 +0.0 0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 0.84 +0.0 0.86 -0.0 0.83 
perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags 0.63 +0.0 0.65 -0.0 0.60 ± 2% perf-profile.calltrace.cycles-pp.w_test 5.29 +0.0 5.30 +0.1 5.36 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 0.89 +0.0 0.92 -0.0 0.87 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 1.59 +0.0 1.62 -0.0 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.61 ± 2% +0.0 0.64 +0.0 0.63 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin 1.62 +0.1 1.68 -0.0 1.59 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 1.05 +0.1 1.11 -0.1 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write 96.77 +0.1 96.84 +0.2 96.98 perf-profile.calltrace.cycles-pp.write 6.92 +0.1 7.01 -0.1 6.80 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 2.87 +0.1 2.97 +0.7 3.57 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write 3.53 +0.1 3.63 +0.7 4.24 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 1.03 +0.1 1.13 +0.1 1.17 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags 2.59 +0.1 2.71 +0.1 2.71 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter 3.37 +0.2 3.53 +0.1 3.50 
perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter 0.62 ± 3% +0.2 0.78 ± 2% +0.5 1.13 ± 5% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter 13.02 +0.2 13.19 -0.5 12.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write 4.06 +0.2 4.23 +0.2 4.22 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write 6.66 +0.2 6.88 +0.2 6.91 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 3.48 +0.2 3.73 +0.2 3.64 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 11.67 +0.3 12.01 +0.9 12.62 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 0.00 +0.5 0.52 +0.5 0.52 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 0.00 +0.5 0.53 +0.5 0.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write 59.32 -0.8 58.52 +0.5 59.78 perf-profile.children.cycles-pp.generic_file_write_iter 38.93 -0.8 38.16 +0.1 39.01 perf-profile.children.cycles-pp.generic_perform_write 3.10 -0.6 2.47 -0.7 2.41 perf-profile.children.cycles-pp.xas_load 79.22 -0.5 78.74 +0.8 80.00 perf-profile.children.cycles-pp.ksys_write 6.64 -0.5 6.18 +0.8 7.44 ± 3% perf-profile.children.cycles-pp.filemap_get_entry 85.12 -0.4 84.67 +0.5 85.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 83.94 -0.4 83.50 +0.6 84.51 perf-profile.children.cycles-pp.do_syscall_64 6.86 -0.4 6.42 -0.3 6.53 perf-profile.children.cycles-pp.fault_in_iov_iter_readable 75.55 -0.4 75.13 +0.9 76.42 
perf-profile.children.cycles-pp.vfs_write 6.08 -0.4 5.69 -0.3 5.75 perf-profile.children.cycles-pp.fault_in_readable 13.92 -0.3 13.58 +0.6 14.54 perf-profile.children.cycles-pp.simple_write_begin 13.03 -0.3 12.68 +0.7 13.68 perf-profile.children.cycles-pp.__filemap_get_folio 4.86 -0.3 4.56 +0.5 5.33 perf-profile.children.cycles-pp.rw_verify_area 4.01 -0.3 3.71 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission 3.02 ± 2% -0.3 2.74 -0.4 2.58 perf-profile.children.cycles-pp.apparmor_file_permission 2.42 -0.2 2.25 -0.1 2.27 perf-profile.children.cycles-pp.generic_write_checks 3.11 -0.1 2.98 -0.2 2.90 ± 2% perf-profile.children.cycles-pp.down_write 4.28 -0.1 4.18 -0.3 4.00 perf-profile.children.cycles-pp.__cond_resched 1.05 -0.0 1.00 -0.1 1.00 perf-profile.children.cycles-pp.generic_write_check_limits 98.99 -0.0 98.96 +0.0 99.02 perf-profile.children.cycles-pp.write 2.45 -0.0 2.42 -0.1 2.30 perf-profile.children.cycles-pp.rcu_all_qs 1.10 -0.0 1.07 -0.0 1.09 perf-profile.children.cycles-pp.timestamp_truncate 0.99 -0.0 0.98 -0.1 0.94 perf-profile.children.cycles-pp.aa_file_perm 0.76 -0.0 0.75 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call 0.33 -0.0 0.32 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio 0.23 -0.0 0.23 ± 2% -0.0 0.22 perf-profile.children.cycles-pp.file_remove_privs 0.59 +0.0 0.59 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi 0.25 +0.0 0.25 -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited 0.93 +0.0 0.93 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty 0.36 +0.0 0.36 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider 0.37 +0.0 0.38 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write 1.26 +0.0 1.26 -0.0 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags 5.69 +0.0 5.70 +0.1 5.75 perf-profile.children.cycles-pp.simple_write_end 0.35 ± 2% +0.0 0.36 +0.0 0.37 perf-profile.children.cycles-pp.__cmd_record 0.35 ± 2% +0.0 0.36 +0.0 0.37 
perf-profile.children.cycles-pp.cmd_record 0.35 ± 2% +0.0 0.36 +0.0 0.37 perf-profile.children.cycles-pp.record__mmap_read_evlist 1.08 +0.0 1.10 -0.0 1.07 perf-profile.children.cycles-pp.xattr_resolve_name 0.39 ± 2% +0.0 0.40 ± 3% +0.0 0.41 ± 4% perf-profile.children.cycles-pp.update_process_times 0.42 ± 2% +0.0 0.43 ± 3% +0.0 0.44 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.44 +0.0 0.46 -0.0 0.42 perf-profile.children.cycles-pp.write@plt 0.41 ± 3% +0.0 0.43 ± 3% +0.0 0.43 ± 4% perf-profile.children.cycles-pp.tick_nohz_handler 1.03 +0.0 1.05 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping 0.08 +0.0 0.10 ± 4% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.update_min_vruntime 0.10 +0.0 0.13 ± 2% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.update_curr 0.96 +0.0 0.99 -0.0 0.91 perf-profile.children.cycles-pp.w_test 1.09 +0.0 1.12 -0.0 1.06 perf-profile.children.cycles-pp.folio_wait_stable 1.95 +0.0 1.99 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode 0.00 +0.0 0.04 ± 44% +0.1 0.05 perf-profile.children.cycles-pp.ktime_get_update_offsets_now 0.09 +0.1 0.15 ± 5% +0.1 0.16 ± 14% perf-profile.children.cycles-pp.ktime_get 0.10 ± 4% +0.1 0.15 ± 4% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.clockevents_program_event 4.36 +0.1 4.42 -0.2 4.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 0.50 +0.1 0.56 +0.0 0.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 1.68 +0.1 1.74 -0.0 1.65 perf-profile.children.cycles-pp.up_write 1.14 +0.1 1.21 -0.1 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret 0.55 ± 3% +0.1 0.64 ± 3% +0.1 0.66 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt 7.05 +0.1 7.14 -0.1 6.94 perf-profile.children.cycles-pp.copy_page_from_iter_atomic 0.56 ± 3% +0.1 0.65 ± 3% +0.1 0.67 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.61 ± 2% +0.1 0.70 ± 3% +0.1 0.72 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.58 ± 2% +0.1 0.67 ± 3% 
+0.1 0.69 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 3.86 +0.1 3.96 +0.7 4.57 perf-profile.children.cycles-pp.file_update_time 3.28 +0.1 3.39 +0.7 3.97 perf-profile.children.cycles-pp.inode_needs_update_time 1.27 +0.1 1.38 +0.1 1.40 perf-profile.children.cycles-pp.strcmp 3.25 +0.2 3.41 +0.1 3.38 perf-profile.children.cycles-pp.__vfs_getxattr 0.73 ± 3% +0.2 0.89 +0.5 1.24 ± 4% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 3.60 +0.2 3.76 +0.1 3.73 perf-profile.children.cycles-pp.cap_inode_need_killpriv 4.29 +0.2 4.47 +0.2 4.44 perf-profile.children.cycles-pp.security_inode_need_killpriv 7.00 +0.2 7.23 +0.3 7.26 perf-profile.children.cycles-pp.file_remove_privs_flags 7.20 +0.2 7.43 -0.1 7.06 perf-profile.children.cycles-pp.entry_SYSCALL_64 0.00 +0.3 0.27 ± 2% +0.3 0.27 ± 2% perf-profile.children.cycles-pp.sched_tick 3.54 +0.3 3.82 +0.2 3.72 perf-profile.children.cycles-pp.__fsnotify_parent 12.00 +0.3 12.35 +1.0 12.96 perf-profile.children.cycles-pp.__generic_file_write_iter 5.93 -0.4 5.54 -0.3 5.59 perf-profile.self.cycles-pp.fault_in_readable 1.86 ± 3% -0.3 1.60 -0.4 1.50 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission 1.42 -0.1 1.30 -0.1 1.31 perf-profile.self.cycles-pp.generic_write_checks 2.43 -0.1 2.34 -0.2 2.22 perf-profile.self.cycles-pp.__cond_resched 3.43 -0.1 3.36 -0.1 3.36 perf-profile.self.cycles-pp.generic_perform_write 2.00 -0.1 1.93 -0.1 1.92 ± 2% perf-profile.self.cycles-pp.down_write 1.76 -0.1 1.69 -0.1 1.67 perf-profile.self.cycles-pp.generic_file_write_iter 0.63 ± 2% -0.1 0.56 -0.1 0.56 perf-profile.self.cycles-pp.xas_start 6.51 -0.1 6.45 -0.4 6.08 perf-profile.self.cycles-pp.write 0.77 -0.0 0.72 -0.0 0.77 perf-profile.self.cycles-pp.fault_in_iov_iter_readable 1.28 -0.0 1.25 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 0.90 -0.0 0.87 +0.0 0.92 perf-profile.self.cycles-pp.timestamp_truncate 1.53 -0.0 1.51 +0.2 1.69 perf-profile.self.cycles-pp.inode_needs_update_time 0.90 -0.0 0.88 
-0.1 0.84 perf-profile.self.cycles-pp.aa_file_perm 0.81 -0.0 0.79 -0.0 0.78 perf-profile.self.cycles-pp.generic_write_check_limits 0.86 -0.0 0.84 +1.0 1.82 perf-profile.self.cycles-pp.rw_verify_area 1.11 -0.0 1.10 -0.1 1.04 perf-profile.self.cycles-pp.security_file_permission 1.42 -0.0 1.41 -0.1 1.36 perf-profile.self.cycles-pp.rcu_all_qs 0.63 -0.0 0.62 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call 0.23 ± 2% -0.0 0.22 -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio 0.80 -0.0 0.80 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags 0.12 -0.0 0.12 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.write@plt 0.90 -0.0 0.90 -0.0 0.86 perf-profile.self.cycles-pp.simple_write_begin 0.25 +0.0 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write 2.75 +0.0 2.75 +0.0 2.79 perf-profile.self.cycles-pp.simple_write_end 1.89 +0.0 1.90 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64 0.66 +0.0 0.66 +0.0 0.69 perf-profile.self.cycles-pp.file_update_time 0.52 +0.0 0.53 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable 0.86 +0.0 0.87 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping 0.69 +0.0 0.70 +0.0 0.71 perf-profile.self.cycles-pp.security_inode_need_killpriv 1.08 +0.0 1.09 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode 0.07 +0.0 0.09 ± 5% +0.0 0.08 ± 7% perf-profile.self.cycles-pp.update_min_vruntime 0.77 +0.0 0.79 -0.0 0.73 perf-profile.self.cycles-pp.w_test 0.00 +0.0 0.02 ± 99% +0.1 0.05 perf-profile.self.cycles-pp.ktime_get_update_offsets_now 1.44 +0.0 1.47 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write 1.99 +0.0 2.04 +0.1 2.09 perf-profile.self.cycles-pp.file_remove_privs_flags 1.57 +0.1 1.62 -0.0 1.53 perf-profile.self.cycles-pp.up_write 1.33 +0.1 1.39 +0.0 1.36 perf-profile.self.cycles-pp.__vfs_getxattr 0.09 ± 5% +0.1 0.14 ± 5% +0.1 0.15 ± 13% perf-profile.self.cycles-pp.ktime_get 4.27 +0.1 4.32 -0.2 4.08 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 0.49 +0.1 0.56 +0.0 0.53 
perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
  1.14        +0.1    1.21        -0.1    1.00        perf-profile.self.cycles-pp.syscall_return_via_sysret
  6.90        +0.1    6.98        -0.1    6.78        perf-profile.self.cycles-pp.copy_page_from_iter_atomic
  4.43        +0.1    4.52        -0.1    4.36        perf-profile.self.cycles-pp.__filemap_get_folio
  0.94        +0.1    1.03        +0.1    1.06        perf-profile.self.cycles-pp.strcmp
  0.62 ± 3%   +0.2    0.78 ± 2%   +0.5    1.11 ± 5%   perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
  3.49        +0.2    3.66        +1.5    4.97 ± 4%   perf-profile.self.cycles-pp.filemap_get_entry
  3.42        +0.2    3.65        +0.1    3.56        perf-profile.self.cycles-pp.__fsnotify_parent
  1.35        +0.3    1.66        +0.3    1.60        perf-profile.self.cycles-pp.entry_SYSCALL_64
  6.92        +0.3    7.25        -0.3    6.64        perf-profile.self.cycles-pp.vfs_write
  1.29        +0.5    1.80        +0.4    1.74        perf-profile.self.cycles-pp.xas_load

>
> Thanks,
> Amir.
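The %change figures for unixbench.throughput in the comparisons above can be cross-checked from the printed values. What follows is only a minimal Python sketch, assuming the usual relative-change formula (new - base) / base; the helper name pct_change is hypothetical and is not part of lkp-tests. The reports print values rounded to about four significant digits, so a recomputed figure may differ from the printed one in the last digit.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# unixbench.throughput values as printed in the reports in this thread
v6_9 = 1.219e8
v6_10_rc1 = 1.216e8
tip = 1.168e8  # a82fd282befc7

print(f"v6.10-rc1 vs v6.9: {pct_change(v6_9, v6_10_rc1):+.1f}%")
print(f"tip vs v6.9:       {pct_change(v6_9, tip):+.1f}%")
print(f"tip vs v6.10-rc1:  {pct_change(v6_10_rc1, tip):+.1f}%")
```

Recomputed this way, tip vs v6.9 comes out near -4.2% rather than the printed -4.1%, which is consistent with the report deriving %change from unrounded measurements.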
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-06-03 8:13 ` Oliver Sang
@ 2024-06-04 12:33 ` Amir Goldstein
2024-07-01 7:42 ` Oliver Sang
0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-06-04 12:33 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > <oliver.sang@intel.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > >
> > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > >
> > > >
> > > > Jan,
> > > >
> > > > I speculate that the regression is due to the fact that we store and pass the
> > > > path information on struct file_range on the stack before the optimizations
> > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > >
> > > > Luckily, we already have the way to check
> > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > code entirely.
> > > >
> > > > Oliver,
> > > >
> > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > >
> > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > with pre-content events
> > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > >
> > > > The optimization was done in the first commit (fsnotify: introduce
> > > > pre-content permission event),
> > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > range in pre-content event).
> > > > no need to test all middle commits.
> > >
> > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.10-rc1
> > >   a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > >
> > >        v6.10-rc1 a82fd282befc71d99106bf31066
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >  1.216e+08            -3.9%  1.168e+08        unixbench.throughput
> > >
> > > full data is as below [1]
> > >
> > >
> > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > >
> > > it also has a small regression comparing to its parent, but better also.
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > >   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > >
> > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
> > >
> > > full data is as below [2]
> > >
> >
> > Ok, this looks sane, the small overhead in the write path makes sense.

On second look, while a small regression from 64108c0b47db9 could make
sense, because it changes the inline fsnotify hooks, the extra regression
from the tip of the branch a82fd282befc7 makes no sense at all, as it does
not touch any code that affects the executed functions, so I have to
wonder how reliable those results are.

Could you re-test the commits 94167e071109d..a82fd282befc7?

> > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > before the rest of the pre-content infrastructure, because together they
> > would still be a performance win.
> >
> > Can you please compare this branch to v6.9?
>
> there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
>

This is a bit surprising to me, because a5e57b4d370c should have been
a pretty big performance win for the common case.

Especially considering that here [1] you reported in pre-merge testing
that an identical commit has improved the fstime-r/unixbench workload
(although with gcc-12):

[1] https://lore.kernel.org/oe-lkp/Zfj3wxDHolB1qCGO@xsang-OptiPlex-9020/

and here [2] that a similar commit had improved writeseek1/will-it-scale

[2] https://lore.kernel.org/all/Zc7KmlQ1cYVrPMQ+@xsang-OptiPlex-9020/

Judging by simple_write_begin() in this regression perf report,
and shmem_file_write_iter in the reports above, may I assume that
this report was with a kernel with non-default config !CONFIG_SHMEM?

Is that correct? Is this an intended config change?

Thanks,
Amir.
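The CONFIG_SHMEM question above can be settled from the kernel config that the report links to. A minimal Python sketch follows, assuming the standard Kconfig .config text format ("CONFIG_FOO=y" for an enabled option, "# CONFIG_FOO is not set" for a disabled one). The helper name config_state and the sample text are hypothetical and are not taken from the actual lkp config.

```python
import re


def config_state(config_text: str, option: str) -> str:
    """Return the option's value ('y', 'm', ...), 'not set', or 'absent'."""
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset_re = re.compile(rf"^# {re.escape(option)} is not set$")
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if unset_re.match(line):
            return "not set"
    return "absent"


# Made-up fragment for illustration only, not the config from the report.
sample = """\
CONFIG_EXPERT=y
# CONFIG_SHMEM is not set
CONFIG_AIO=y
"""

print(config_state(sample, "CONFIG_SHMEM"))
```

The same function could be pointed at the config file from the reproduction materials, or at /boot/config-$(uname -r) on a test machine where that file exists.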
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-06-04 12:33 ` Amir Goldstein
@ 2024-07-01 7:42 ` Oliver Sang
2024-07-03 5:58 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-01 7:42 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang

hi, Amir,

sorry for quite late.

On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> [...]
>
> On second look, while a small regression from 64108c0b47db9 could make
> sense, because it changes the inline fsnotify hooks, the extra regression from
> the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> touch any code that affects the executed functions, so I have to wonder how
> reliable are those results.
>
> Could you re-test the commits 94167e071109d..a82fd282befc7?

since the branch is:

* a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
* f301cd18006c3 fanotify: rename a misnamed constant
* 64108c0b47db9 fanotify: pass optional file access range in pre-content event
* 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
* 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
* 83af0c89527ab fsnotify: generate pre-content permission event on exec
* aca4084213276 fsnotify: generate pre-content permission event on open
* 93656e196b006 fsnotify: introduce pre-content permission event
* 1613e604df0cd (tag: v6.10-rc1,

I made the comparison below, which shows little difference among the 3 commits:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
  94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
  a82fd282befc7 fanotify: report file range info with pre-content events

68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput

>
> > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > before the rest of the pre-content infrastructure, because together they
> > > would still be a performance win.
> > >
> > > Can you please compare this branch to v6.9?
> >
> > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> >
>
> This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> big performance win for the common case.

in our unixbench tests, a5e57b4d370c introduces a small regression compared
to its parent (477cf917dd028).

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.9
  477cf917dd028 fsnotify: use an enum for group priority constants
  a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
  v6.10-rc1

            v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
 1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput

BTW, for a5e57b4d370c, there is another regression report in
https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
which also includes some unixbench improvement results, but for different
sub-tests on a different platform.
>
> Especially, considering that here [1] you reported in pre-merge testing that an
> identical commit has improved the fstime-r/unixbench workload
> (although with gcc-12):
> [1] https://lore.kernel.org/oe-lkp/Zfj3wxDHolB1qCGO@xsang-OptiPlex-9020/
> and here [2] that a similar commit had improved writeseek1/will-it-scale
> [2] https://lore.kernel.org/all/Zc7KmlQ1cYVrPMQ+@xsang-OptiPlex-9020/
>
> Judging by simple_write_begin() in this regression perf report, and
> shmem_file_write_iter in the reports above, may I assume that this report
> was with a kernel with non-default config !CONFIG_SHMEM?
> Is that correct? Is this an intended config change?

we always set CONFIG_SHMEM.

>
> Thanks,
> Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-01 7:42 ` Oliver Sang
@ 2024-07-03 5:58 ` Amir Goldstein
2024-07-03 7:21 ` Oliver Sang
0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-07-03 5:58 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> sorry for quite late.
>
> On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> [...]
> >
> > On second look, while a small regression from 64108c0b47db9 could make
> > sense, because it changes the inline fsnotify hooks, the extra regression from
> > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > touch any code that affects the executed functions, so I have to wonder how
> > reliable are those results.
> >
> > Could you re-test the commits 94167e071109d..a82fd282befc7?
>
> since the branch is:
>
> * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
> * f301cd18006c3 fanotify: rename a misnamed constant
> * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
> * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
> * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> * aca4084213276 fsnotify: generate pre-content permission event on open
> * 93656e196b006 fsnotify: introduce pre-content permission event
> * 1613e604df0cd (tag: v6.10-rc1,
>
>
> I made below comparison, which shows little difference among 3 commits:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
>   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
>   a82fd282befc7 fanotify: report file range info with pre-content events
>
> 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>  1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
>
>

Hi Oliver,

Perhaps I am not reading the report right, but how do these numbers reconcile
with the previous report of regression:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
  64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")

94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.163e+08            -2.4%  1.135e+08        unixbench.throughput

Is this a case of unstable results? something else?

> >
> > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > before the rest of the pre-content infrastructure, because together they
> > > > would still be a performance win.
> > > >
> > > > Can you please compare this branch to v6.9?
> > >
> > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > >
> >
> > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > big performance win for the common case.
>
> in our this unixbench tests, a5e57b4d370c introduce a small regression comparing
> to its parent (477cf917dd028).
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.9
>   477cf917dd028 fsnotify: use an enum for group priority constants
>   a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
>   v6.10-rc1
>
>             v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>  1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
>

Assuming this is a stable result,
that's a very small regression compared to the improvements before it
and one that I dare to call acceptable for this micro buffered write benchmark
because of the big gain in other workloads.

>
> BTW, for a5e57b4d370c, there is another regression report in
> https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
> which also includes some unixbench improvement results, but different sub-tests
> on different platform.
>

Right. I forgot about this one.
Sorry for dropping the ball.
I do not know what is going on there.
I will try to take a look again.

Thanks,
Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-03 5:58 ` Amir Goldstein
@ 2024-07-03 7:21 ` Oliver Sang
2024-07-03 16:20 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-03 7:21 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp

hi, Amir,

On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> [...]
> >
> > since the branch is:
> >
> > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
> > * f301cd18006c3 fanotify: rename a misnamed constant
> > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
> > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
> > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > * aca4084213276 fsnotify: generate pre-content permission event on open
> > * 93656e196b006 fsnotify: introduce pre-content permission event
> > * 1613e604df0cd (tag: v6.10-rc1,
> >
> >
> > I made below comparison, which shows little difference among 3 commits:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> >   a82fd282befc7 fanotify: report file range info with pre-content events
> >
> > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \
> >  1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
> >
> >
>
> Hi Oliver,
>
> Perhaps I am not reading the report right, but how do these numbers reconcile
> with the previous report of regression:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
>   64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>  1.163e+08            -2.4%  1.135e+08        unixbench.throughput
>
> Is this a case of unstable results? something else?

you can see that the data for 94167e071109d is 1.163e+08 in both tables.

the data in our tests seems quite stable for a given commit, for example
for v6.10-rc1:

"unixbench.throughput": [
  121545292.8,
  121629889.4,
  121598992.0,
  121492095.5,
  121645038.1,
  121556286.9
],

for the branch tip a82fd282befc7:

"unixbench.throughput": [
  116675606.7,
  116840611.2,
  116738966.0,
  116956953.1,
  116704901.9,
  116997628.3,
  117141733.7,
  116660495.4
],

let me combine the results from this branch together:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
  94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
  64108c0b47db9 fanotify: pass optional file access range in pre-content event
  a82fd282befc7 fanotify: report file range info with pre-content

       v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
---------------- --------------------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \          |                \
 1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput

one thing I want to mention is that the "%change" is always relative to the
first column (v6.10-rc1 here).

so 68e04c2451ba0 has a -3.5% regression compared to v6.10-rc1;
94167e071109d has a -4.3% regression, also compared to v6.10-rc1, and so on.

then if we compare just 94167e071109d and 64108c0b47db9, 64108c0b47db9 has
about a -2.4% regression compared to 94167e071109d.

from the above table, along the branch, the performance is kind of fluctuating:
it dropped most on 64108c0b47db9, but then recovered a little on the tip.

our bot will not bisect the improvement between 64108c0b47db9 and the tip,
since the whole branch shows a drop.

>
> > >
> > > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > > before the rest of the pre-content infrastructure, because together they
> > > > > would still be a performance win.
> > > > >
> > > > > Can you please compare this branch to v6.9?
> > > >
> > > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > > >
> > >
> > > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > > big performance win for the common case.
> >
> > in our this unixbench tests, a5e57b4d370c introduce a small regression comparing
> > to its parent (477cf917dd028).
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.9
> >   477cf917dd028 fsnotify: use an enum for group priority constants
> >   a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> >   v6.10-rc1
> >
> >             v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
> > ---------------- --------------------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \          |                \
> >  1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
> >
>
> Assuming this is a stable result,
> that's very small regression compared to the improvements before it
> and one that I dare to call acceptable for this micro buffered write benchmark
> because of the big gain in other workloads.

again, all data here is relative to v6.9, so there is a 2.8% improvement on
477cf917dd028 compared to v6.9, but it drops back on a5e57b4d370c6, whose
data is almost the same as v6.9 (so +0.4% compared to v6.9).

anyway, we normally ignore <1% performance changes, so we won't say that
a5e57b4d370c6 or v6.10-rc1 has an obvious performance change compared to v6.9.

>
> >
> > BTW, for a5e57b4d370c, there is another regression report in
> > https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
> > which also includes some unixbench improvement results, but different sub-tests
> > on different platform.
> >
>
> Right. I forgot about this one.
> Sorry for dropping the ball.
> I do not know what is going on there.
> I will try to take a look again.
>
> Thanks,
> Amir.
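The two ways of reading the comparison tables in the message above (every
column against the first column, versus one commit against the commit listed
just before it) can be sketched as below. This is a minimal illustration and
not part of the lkp tooling: the function names are hypothetical, the per-run
samples are the ones quoted above for v6.10-rc1 and a82fd282befc7, and the
per-commit means are the rounded values from the combined table, so
recomputed percentages may differ from the report in the last digit.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput samples quoted in the thread.
SAMPLES = {
    "v6.10-rc1": [121545292.8, 121629889.4, 121598992.0,
                  121492095.5, 121645038.1, 121556286.9],
    "a82fd282befc7": [116675606.7, 116840611.2, 116738966.0, 116956953.1,
                      116704901.9, 116997628.3, 117141733.7, 116660495.4],
}

# Rounded per-commit means from the combined table, in the table's column order.
TABLE_MEANS = [
    ("v6.10-rc1", 1.216e8),
    ("68e04c2451ba0", 1.174e8),
    ("94167e071109d", 1.163e8),
    ("64108c0b47db9", 1.135e8),
    ("a82fd282befc7", 1.168e8),
]


def pct_change(base: float, value: float) -> float:
    """Relative change of value against base, in percent."""
    return (value - base) / base * 100.0


def rel_stddev(samples: list[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return stdev(samples) / mean(samples) * 100.0


def versus_first(means):
    """%change of each column against the first column (the report's convention)."""
    base = means[0][1]
    return {name: pct_change(base, value) for name, value in means[1:]}


def versus_previous(means):
    """%change of each column against the column listed just before it."""
    return {cur[0]: pct_change(prev[1], cur[1])
            for prev, cur in zip(means, means[1:])}


if __name__ == "__main__":
    for name, runs in SAMPLES.items():
        print(f"{name}: mean={mean(runs):.4g} stddev={rel_stddev(runs):.2f}%")
    for name, change in versus_first(TABLE_MEANS).items():
        print(f"{name} vs first column: {change:+.1f}%")
    for name, change in versus_previous(TABLE_MEANS).items():
        print(f"{name} vs previous column: {change:+.1f}%")
```

With the rounded means, 64108c0b47db9 comes out at about -2.4% against
94167e071109d, and the mean of the tip's per-run samples comes out at about
-3.9% against v6.10-rc1, in line with the figures quoted in the thread. The
run-to-run spread of both sample sets is well below the 1% threshold mentioned
above.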
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-03 7:21 ` Oliver Sang
@ 2024-07-03 16:20 ` Amir Goldstein
2024-07-04 15:39 ` Jan Kara
2024-07-05 2:09 ` Oliver Sang
0 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-03 16:20 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp

On Wed, Jul 3, 2024 at 10:21 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> [...]
> > > > > > > > > > > > > > ========================================================================================= > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > > > > > > > > > commit: > > > > > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event") > > > > > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event") > > > > > > > > > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969 > > > > > > > ---------------- --------------------------- > > > > > > > %stddev %change %stddev > > > > > > > \ | \ > > > > > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput > > > > > > > > > > > > > > full data is as below [2] > > > > > > > > > > > > > > > > > > > Ok, this looks sane, the small overhead in the write path makes sense. > > > > > > > > On second look, while a small regression from 64108c0b47db9 could make > > > > sense, because it changes the inline fsnotify hooks, the extra regression from > > > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not > > > > touch any code that affects the executed functions, so I have to wonder how > > > > reliable are those results. > > > > > > > > Could you re-test the commits 94167e071109d..a82fd282befc7? 
> > > > > > since the branch is: > > > > > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <--- > > > * f301cd18006c3 fanotify: rename a misnamed constant > > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event > > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <--- > > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d > > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec > > > * aca4084213276 fsnotify: generate pre-content permission event on open > > > * 93656e196b006 fsnotify: introduce pre-content permission event > > > * 1613e604df0cd (tag: v6.10-rc1, > > > > > > > > > I made below comparison, which shows little difference among 3 commits: > > > > > > ========================================================================================= > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > commit: > > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event > > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event > > > a82fd282befc7 fanotify: report file range info with pre-content events > > > > > > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066 > > > ---------------- --------------------------- --------------------------- > > > %stddev %change %stddev %change %stddev > > > \ | \ | \ > > > 1.174e+08 -0.9% 1.163e+08 -0.5% 1.168e+08 unixbench.throughput > > > > > > > > > > Hi Oliver, > > > > Perhaps I am not reading the report right, but how do these numbers reconcile > > with the previous report of regression: > > > > ========================================================================================= > > 
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > commit: > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event") > > 64108c0b47db9 ("fanotify: pass optional file access range in > > pre-content event") > > > > 94167e071109d573 64108c0b47db91b20d658a89969 > > ---------------- --------------------------- > > %stddev %change %stddev > > \ | \ > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput > > > > Is this a case of unstable results? something else? > > you could see the data for 94167e071109d are 1.163e+08 in both table. > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1: > "unixbench.throughput": [ > 121545292.8, > 121629889.4, > 121598992.0, > 121492095.5, > 121645038.1, > 121556286.9 > ], > Are all those runs from the same boot? > for the branch tip a82fd282befc7: > "unixbench.throughput": [ > 116675606.7, > 116840611.2, > 116738966.0, > 116956953.1, > 116704901.9, > 116997628.3, > 117141733.7, > 116660495.4 > ], > And these run? Otherwise, we might have a fluctuation that happens at boot time or at mount time or something. 
> > let me combine the results from this branch together: > > ========================================================================================= > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > commit: > v6.10-rc1 > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event > 64108c0b47db9 fanotify: pass optional file access range in pre-content event > a82fd282befc7 fanotify: report file range info with pre-content > > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066 > ---------------- --------------------------- --------------------------- --------------------------- --------------------------- > %stddev %change %stddev %change %stddev %change %stddev %change %stddev > \ | \ | \ | \ | \ > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput > > > one thing I want to mention is the "%change" is always comparing to the first > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1, > and so on. Thanks for clarifying - I did not read it this way. > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about > -2.4% regression compareing to 94167e071109d. > > from above table, along the branch, the performance is kind of fluctuating, > dropped most on 64108c0b47db9, but then recovered a little on tip. > I can understand why 64108c0b47db91b would regress performance, but I cannot think of any possible explanation why a82fd282befc should improve performance, so I have to wonder if the regression to -6.6% is not a fluke of some specific boot/mount? 
I pushed a test branch to https://github.com/amir73il/linux/commits/fsnotify_for_lkp with an extra patch that un-inlines some helpers to help bisect the perf report better. Maybe produce the report with this commit and it sheds some light. Jan, any other ideas? > our bot will not bisect the improvment between 64108c0b47db9 and the tip, since > the whole branch show a drop. > > > > > > > > > > > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1 > > > > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers") > > > > > > before the rest of the pre-content infrastructure, because together they > > > > > > would still be a performance win. > > > > > > > > > > > > Can you please compare this branch to v6.9? > > > > > > > > > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests. > > > > > > > > > > > > > This is a bit surprising to me, because a5e57b4d370c should have been a pretty > > > > big performance win for the common case. > > > > > > in our this unixbench tests, a5e57b4d370c introduce a small regression comparing > > > to its parent (477cf917dd028). 
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.9
> > >   477cf917dd028 fsnotify: use an enum for group priority constants
> > >   a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> > >   v6.10-rc1
> > >
> > >             v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
> > > ---------------- --------------------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \          |                \
> > >  1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
> > >
> >
> > Assuming this is a stable result,
> > that's very small regression compared to the improvements before it
> > and one that I dare to call acceptable for this micro buffered write benchmark
> > because of the big gain in other workloads.
>
> again, all data here is comparing to v6.9, so there is a 2.8% improvement on
> 477cf917dd028 comparing to v6.9, but it drops back on a5e57b4d370c6, whose
> data is almost same with v6.9 (so +0.4% comparing to v6.9).
>
> anyway, we normally ignore <1% performance changes, so we won't say
> a5e57b4d370c6 or v6.10-rc1 has obvious performance changes comparing to v6.9.
>

This fluctuation is also hard to explain.

Jan, any thoughts? things to try?

Thanks,
Amir.
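The reading of the "%change" columns discussed in this message (every column is relative to the first column, not to the column on its left) can be checked with a short sketch. The Python below is only an illustration: the function names are made up for this note, and the inputs are the rounded unixbench.throughput values printed in the combined table, so recomputed percentages may differ from the report by about 0.1 point, since the report presumably works from unrounded per-run data.

```python
"""Recompute the %change columns of the combined LKP table two ways."""

# unixbench.throughput per commit, copied from the combined table in this
# thread. These are the rounded values as printed, in table column order.
THROUGHPUT = {
    "v6.10-rc1": 1.216e8,
    "68e04c2451ba0": 1.174e8,
    "94167e071109d": 1.163e8,
    "64108c0b47db9": 1.135e8,
    "a82fd282befc7": 1.168e8,
}


def pct_change(new, old):
    """Relative change of `new` against `old`, in percent."""
    return (new - old) / old * 100.0


def change_vs_baseline(table):
    """How the report is laid out: every column against the first column."""
    names = list(table)
    base = table[names[0]]
    return {name: pct_change(table[name], base) for name in names[1:]}


def change_vs_previous_column(table):
    """Each column against the column on its left.

    Adjacent columns are not always direct git parents; the table skips
    some commits of the branch.
    """
    names = list(table)
    return {
        cur: pct_change(table[cur], table[prev])
        for prev, cur in zip(names, names[1:])
    }


if __name__ == "__main__":
    baseline = change_vs_baseline(THROUGHPUT)
    stepwise = change_vs_previous_column(THROUGHPUT)
    for name in baseline:
        print(f"{name}: {baseline[name]:+.1f}% vs v6.10-rc1, "
              f"{stepwise[name]:+.1f}% vs previous column")
```

Read this way, the step from 94167e071109d to 64108c0b47db9 is the roughly -2.4% reported in the earlier two-column comparison, while the same commit shows as -6.6% in the combined table because that figure also includes everything lost since v6.10-rc1.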
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression 2024-07-03 16:20 ` Amir Goldstein @ 2024-07-04 15:39 ` Jan Kara 2024-07-05 2:09 ` Oliver Sang 1 sibling, 0 replies; 17+ messages in thread From: Jan Kara @ 2024-07-04 15:39 UTC (permalink / raw) To: Amir Goldstein; +Cc: Oliver Sang, Jan Kara, oe-lkp, lkp On Wed 03-07-24 19:20:49, Amir Goldstein wrote: > On Wed, Jul 3, 2024 at 10:21 AM Oliver Sang <oliver.sang@intel.com> wrote: > > On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote: > > > On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote: > > > > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote: > > > > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote: > > > > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote: > > > > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote: > > > > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote: > > > > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot > > > > > > > > > <oliver.sang@intel.com> wrote: > > > > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event") > > > > > > > > > > https://github.com/amir73il/linux sb_write_barrier > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jan, > > > > > > > > > > > > > > > > > > I speculate that the regression is due to the fact that we store and pass the > > > > > > > > > path information on struct file_range on the stack before the optimizations > > > > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores > > > > > > > > > and __fsnotify_parent() pays a bigger price for fetches? 
> > > > > > > > > > > > > > > > > > Luckily, we already have the way to check > > > > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb, > > > > > > > > > FSNOTIFY_PRIO_PRE_CONTENT)) > > > > > > > > > so now I used it to optimize out the fsnotify_file_range() inline > > > > > > > > > code entirely. > > > > > > > > > > > > > > > > > > Oliver, > > > > > > > > > > > > > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1): > > > > > > > > > > > > > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info > > > > > > > > > with pre-content events > > > > > > > > > * f301cd18006c - fanotify: rename a misnamed constant > > > > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event > > > > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event > > > > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event > > > > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec > > > > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open > > > > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event > > > > > > > > > > > > > > > > > > The optimization was done in the first commit (fsnotify: introduce > > > > > > > > > pre-content permission event), > > > > > > > > > but impacts the regressing commit (fanotify: pass optional file access > > > > > > > > > range in pre-content event). > > > > > > > > > no need to test all middle commits. 
> > > > > > > > > > > > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now > > > > > > > > > > > > > > > > ========================================================================================= > > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > > > > > > > > > > > commit: > > > > > > > > v6.10-rc1 > > > > > > > > a82fd282befc7 ("fanotify: report file range info with pre-content events") > > > > > > > > > > > > > > > > v6.10-rc1 a82fd282befc71d99106bf31066 > > > > > > > > ---------------- --------------------------- > > > > > > > > %stddev %change %stddev > > > > > > > > \ | \ > > > > > > > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput > > > > > > > > > > > > > > > > full data is as below [1] > > > > > > > > > > > > > > > > > > > > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event" > > > > > > > > > > > > > > > > it also has a small regression comparing to its parent, but better also. 
> > > > > > > > > > > > > > > > ========================================================================================= > > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > > > > > > > > > > > commit: > > > > > > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event") > > > > > > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event") > > > > > > > > > > > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969 > > > > > > > > ---------------- --------------------------- > > > > > > > > %stddev %change %stddev > > > > > > > > \ | \ > > > > > > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput > > > > > > > > > > > > > > > > full data is as below [2] > > > > > > > > > > > > > > > > > > > > > > Ok, this looks sane, the small overhead in the write path makes sense. > > > > > > > > > > On second look, while a small regression from 64108c0b47db9 could make > > > > > sense, because it changes the inline fsnotify hooks, the extra regression from > > > > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not > > > > > touch any code that affects the executed functions, so I have to wonder how > > > > > reliable are those results. > > > > > > > > > > Could you re-test the commits 94167e071109d..a82fd282befc7? 
> > > > > > > > since the branch is: > > > > > > > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <--- > > > > * f301cd18006c3 fanotify: rename a misnamed constant > > > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event > > > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <--- > > > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d > > > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec > > > > * aca4084213276 fsnotify: generate pre-content permission event on open > > > > * 93656e196b006 fsnotify: introduce pre-content permission event > > > > * 1613e604df0cd (tag: v6.10-rc1, > > > > > > > > > > > > I made below comparison, which shows little difference among 3 commits: > > > > > > > > ========================================================================================= > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > > > commit: > > > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event > > > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event > > > > a82fd282befc7 fanotify: report file range info with pre-content events > > > > > > > > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066 > > > > ---------------- --------------------------- --------------------------- > > > > %stddev %change %stddev %change %stddev > > > > \ | \ | \ > > > > 1.174e+08 -0.9% 1.163e+08 -0.5% 1.168e+08 unixbench.throughput > > > > > > > > > > > > > > Hi Oliver, > > > > > > Perhaps I am not reading the report right, but how do these numbers reconcile > > > with the previous report of regression: > > > > > > 
========================================================================================= > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > > > commit: > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event") > > > 64108c0b47db9 ("fanotify: pass optional file access range in > > > pre-content event") > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969 > > > ---------------- --------------------------- > > > %stddev %change %stddev > > > \ | \ > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput > > > > > > Is this a case of unstable results? something else? > > > > you could see the data for 94167e071109d are 1.163e+08 in both table. > > > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1: > > "unixbench.throughput": [ > > 121545292.8, > > 121629889.4, > > 121598992.0, > > 121492095.5, > > 121645038.1, > > 121556286.9 > > ], > > > > Are all those runs from the same boot? > > > for the branch tip a82fd282befc7: > > "unixbench.throughput": [ > > 116675606.7, > > 116840611.2, > > 116738966.0, > > 116956953.1, > > 116704901.9, > > 116997628.3, > > 117141733.7, > > 116660495.4 > > ], > > > > And these run? > > Otherwise, we might have a fluctuation that happens at boot time > or at mount time or something. So what I suspect is that the fluctuation actually happens "per compile time". Depending on how exactly some hot paths get aligned in the compiled kernel binary wrt CPU cachelines or similar, you get differences in performance. I've seen that happening quite a few times in the past and the observed differences are well in that range. 
> > let me combine the results from this branch together: > > > > ========================================================================================= > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > commit: > > v6.10-rc1 > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event > > 64108c0b47db9 fanotify: pass optional file access range in pre-content event > > a82fd282befc7 fanotify: report file range info with pre-content > > > > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066 > > ---------------- --------------------------- --------------------------- --------------------------- --------------------------- > > %stddev %change %stddev %change %stddev %change %stddev %change %stddev > > \ | \ | \ | \ | \ > > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput > > > > > > one thing I want to mention is the "%change" is always comparing to the first > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1, > > and so on. > > Thanks for clarifying - I did not read it this way. > > > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about > > -2.4% regression compareing to 94167e071109d. > > > > from above table, along the branch, the performance is kind of fluctuating, > > dropped most on 64108c0b47db9, but then recovered a little on tip. 
> >
>
> I can understand why 64108c0b47db91b would regress performance, but I
> cannot think of any possible explanation why a82fd282befc should improve
> performance, so I have to wonder if the regression to -6.6% is not a
> fluke of some specific boot/mount?

I agree. In my opinion at least some of those changes are not related to
code changes but rather to random code alignment changes.

> I pushed a test branch to
> https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> with an extra patch that un-inlines some helpers to help bisect the
> perf report better.
> Maybe produce the report with this commit and it sheds some light.
>
> Jan, any other ideas?

Not really. These alignment induced fluctuations are annoying but I don't
know of a good way to avoid them. Even narrowing them down is tedious as the
changes on this scale are not easy to see in the profiles. So I'd check the
perf profiles and if we don't see any obvious regression in the changed
places, I'd just ignore the regression...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
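One way to put numbers on the "per compile" versus "per run" distinction is to compare the spread of the per-run samples quoted in this thread with the difference between the two kernels. The sketch below is an assumption-laden illustration, not part of the LKP tooling: the helper names are invented here, and the only inputs are the two per-run unixbench.throughput lists quoted above. A run-to-run spread much smaller than the gap between builds is consistent with the variation being tied to the build rather than to individual runs, although it cannot separate a real code effect from an alignment effect.

```python
"""Compare run-to-run spread with the build-to-build throughput delta."""
from statistics import mean, stdev

# Per-run unixbench.throughput samples as quoted in this thread.
V6_10_RC1 = [
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
BRANCH_TIP = [  # a82fd282befc7
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]


def summarize(samples):
    """Return (mean, coefficient of variation in percent) of the samples."""
    avg = mean(samples)
    return avg, stdev(samples) / avg * 100.0


def delta_pct(new_samples, old_samples):
    """Change of the mean of `new_samples` against `old_samples`, in percent."""
    old_avg = mean(old_samples)
    return (mean(new_samples) - old_avg) / old_avg * 100.0


if __name__ == "__main__":
    for label, samples in (("v6.10-rc1", V6_10_RC1), ("a82fd282befc7", BRANCH_TIP)):
        avg, cv = summarize(samples)
        print(f"{label}: mean {avg:.4g}, run-to-run spread {cv:.2f}%")
    print(f"delta between builds: {delta_pct(BRANCH_TIP, V6_10_RC1):+.1f}%")
```

The means reproduce the 1.216e+08 and 1.168e+08 figures from the tables, and the spread within each kernel stays well under 1%, so the differences between commits are not explained by noise between individual runs of the same binary.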
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression 2024-07-03 16:20 ` Amir Goldstein 2024-07-04 15:39 ` Jan Kara @ 2024-07-05 2:09 ` Oliver Sang 2024-07-05 5:48 ` Amir Goldstein 1 sibling, 1 reply; 17+ messages in thread From: Oliver Sang @ 2024-07-05 2:09 UTC (permalink / raw) To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang hi, Amir, On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote: [...] > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1: > > "unixbench.throughput": [ > > 121545292.8, > > 121629889.4, > > 121598992.0, > > 121492095.5, > > 121645038.1, > > 121556286.9 > > ], > > > > Are all those runs from the same boot? no. we reboot machine before each run. > > > for the branch tip a82fd282befc7: > > "unixbench.throughput": [ > > 116675606.7, > > 116840611.2, > > 116738966.0, > > 116956953.1, > > 116704901.9, > > 116997628.3, > > 117141733.7, > > 116660495.4 > > ], > > > > And these run? same. > > Otherwise, we might have a fluctuation that happens at boot time > or at mount time or something. 
> > > > > let me combine the results from this branch together: > > > > ========================================================================================= > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase: > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench > > > > commit: > > v6.10-rc1 > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event > > 64108c0b47db9 fanotify: pass optional file access range in pre-content event > > a82fd282befc7 fanotify: report file range info with pre-content > > > > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066 > > ---------------- --------------------------- --------------------------- --------------------------- --------------------------- > > %stddev %change %stddev %change %stddev %change %stddev %change %stddev > > \ | \ | \ | \ | \ > > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput > > > > > > one thing I want to mention is the "%change" is always comparing to the first > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1, > > and so on. > > Thanks for clarifying - I did not read it this way. > > > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about > > -2.4% regression compareing to 94167e071109d. > > > > from above table, along the branch, the performance is kind of fluctuating, > > dropped most on 64108c0b47db9, but then recovered a little on tip. 
> > > > I can understand why 64108c0b47db91b would regress performance, but I > cannot think > of any possible explanation why a82fd282befc should improve performance, > so I have to wonder if the regression to -6.6% is not a fluke of some > specific boot/mount? > > I pushed a test branch to > https://github.com/amir73il/linux/commits/fsnotify_for_lkp > with an extra patch that un-inlines some helpers to help bisect the > perf report better. > Maybe produce the report with this commit and it sheds some light. since * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events * f301cd18006c3 fanotify: rename a misnamed constant * 64108c0b47db9 fanotify: pass optional file access range in pre-content event * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event * 83af0c89527ab fsnotify: generate pre-content permission event on exec * aca4084213276 fsnotify: generate pre-content permission event on open * 93656e196b006 fsnotify: introduce pre-content permission event * 1613e604df0cd (tag: v6.10-rc1, we run tests upon new commit. 
summary report is as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
  388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers

       v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput

since Jan mentioned in a later mail that perf profiles are useful, I put details
as below

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
  388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers

       v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      9.50 ± 24%     +55.3%      14.75 ± 24%     +53.9%      14.62 ± 24%  perf-c2c.DRAM.local
    261842            +0.6%     263433            +1.5%     265837        time.involuntary_context_switches
      6199            -5.7%       5843            -2.3%       6054        time.user_time
 1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
    261842            +0.6%     263433            +1.5%     265837        unixbench.time.involuntary_context_switches
      6199            -5.7%       5843            -2.3%       6054        unixbench.time.user_time
 4.513e+10            -3.8%  4.339e+10            -4.0%  4.331e+10        unixbench.workload
    167317 ±  5%     -39.3%     101518 ± 47%     -30.5%     116276 ± 39%  numa-vmstat.node1.nr_anon_pages
    112.92 ± 28%     -53.9%      52.06 ± 99%     -39.8%      67.98 ± 64%  numa-vmstat.node1.nr_anon_transparent_hugepages
     78069          +506.3%     473340 ± 77%    +495.2%     464673 ± 77%  numa-vmstat.node1.nr_file_pages
    167460 ±  5%     -39.3%     101568 ± 47%     -30.5%     116461 ± 39%  numa-vmstat.node1.nr_inactive_anon
     10649 ±  6%   +3703.3%     405022 ± 90%   +3625.1%     396698 ± 90%  numa-vmstat.node1.nr_unevictable
    167460 ±  5%     -39.3%     101568 ± 47%     -30.5%     116461 ± 39%  numa-vmstat.node1.nr_zone_inactive_anon
     10649 ±  6%   +3703.4%     405022 ± 90%   +3625.2%     396698 ± 90%  numa-vmstat.node1.nr_zone_unevictable
     15473 ±125%    +567.1%     103220 ± 85%    +366.5%      72185 ±110%  numa-meminfo.node0.AnonHugePages
    220234 ± 13%    +116.8%     477532 ± 37%    +101.8%     444469 ± 42%  numa-meminfo.node0.AnonPages.max
    231368 ± 28%     -53.9%     106616 ± 98%     -39.8%     139307 ± 64%  numa-meminfo.node1.AnonHugePages
    668949 ±  5%     -39.3%     405873 ± 47%     -30.5%     464919 ± 39%  numa-meminfo.node1.AnonPages
    836146 ±  6%     -34.5%     547503 ± 38%     -28.1%     601279 ± 34%  numa-meminfo.node1.AnonPages.max
    312276          +506.3%    1893321 ± 77%    +495.2%    1858788 ± 77%  numa-meminfo.node1.FilePages
    669489 ±  5%     -39.3%     406110 ± 47%     -30.4%     465687 ± 39%  numa-meminfo.node1.Inactive
    669489 ±  5%     -39.4%     406010 ± 47%     -30.5%     465628 ± 39%  numa-meminfo.node1.Inactive(anon)
     42552 ±  6%   +3707.3%    1620116 ± 90%   +3628.9%    1586760 ± 90%  numa-meminfo.node1.Unevictable
 1.458e+11            -2.7%  1.419e+11            -3.8%  1.402e+11        perf-stat.i.branch-instructions
   7347431            -1.3%    7251270 ±  2%      -2.8%    7140090        perf-stat.i.branch-misses
     11.47 ±  6%       +2.8      14.29 ±  9%       +2.5      13.99 ±  6%  perf-stat.i.cache-miss-rate%
   3915539 ±  8%    +513.8%   24032895 ± 10%    +500.6%   23516538 ±  7%  perf-stat.i.cache-misses
  32425619 ±  3%    +391.9%  1.595e+08 ±  5%    +388.7%  1.585e+08 ±  4%  perf-stat.i.cache-references
      2196            +0.4%       2206            +2.4%       2249        perf-stat.i.context-switches
    151202 ± 16%     -77.0%      34851 ± 59%     -75.9%      36442 ± 38%  perf-stat.i.cycles-between-cache-misses
 6.961e+11            -1.9%  6.829e+11            -3.4%  6.724e+11        perf-stat.i.instructions
      1.22            -1.3%       1.20            -2.5%       1.19        perf-stat.i.ipc
      0.01 ±  9%    +523.5%       0.04 ± 11%    +518.2%       0.03 ±  7%  perf-stat.overall.MPKI
     12.09 ±  6%       +3.0      15.08 ±  8%       +2.7      14.83 ±  5%  perf-stat.overall.cache-miss-rate%
      0.75            +2.0%       0.77            +2.6%       0.77        perf-stat.overall.cpi
    133775 ±  8%     -83.6%      21976 ± 11%     -83.4%      22156 ±  7%  perf-stat.overall.cycles-between-cache-misses
      1.33            -1.9%       1.31            -2.5%       1.30        perf-stat.overall.ipc
      5721            +2.0%       5835            +1.8%       5821        perf-stat.overall.path-length
 1.452e+11            -2.7%  1.413e+11            -3.9%  1.395e+11        perf-stat.ps.branch-instructions
   7332734            -1.3%    7238853 ±  3%      -2.9%    7119552        perf-stat.ps.branch-misses
   3921138 ±  8%    +511.4%   23972111 ± 11%    +496.7%   23398300 ±  7%  perf-stat.ps.cache-misses
  32415461 ±  3%    +389.9%  1.588e+08 ±  5%    +386.6%  1.577e+08 ±  4%  perf-stat.ps.cache-references
      2192            +0.4%       2201            +2.3%       2243        perf-stat.ps.context-switches
 6.932e+11            -1.9%  6.798e+11            -3.5%  6.691e+11        perf-stat.ps.instructions
 2.582e+14            -1.9%  2.532e+14            -2.3%  2.521e+14        perf-stat.total.instructions
     13.19             -0.7      12.50             -0.4      12.75        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      7.01             -0.2       6.80             -0.1       6.88        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      1.11             -0.2       0.91             -0.0       1.11 ±  2%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
      2.50             -0.2       2.35             +0.1       2.58        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
      2.81             -0.1       2.71 ±  2%       +0.1       2.94        perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      1.68             -0.1       1.59             -0.0       1.64        perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
      3.73             -0.1       3.64             +0.0       3.76        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.62             -0.1       1.55             -0.1       1.55        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.18             -0.1       2.12             -0.1       2.13        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
      3.34             -0.1       3.28             +0.2       3.54        perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
      0.65             -0.1       0.60 ±  2%       +0.0       0.68        perf-profile.calltrace.cycles-pp.w_test
      0.92             -0.0       0.87             -0.0       0.88        perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.70             -0.0       0.66             -0.0       0.69        perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
      0.86             -0.0       0.82             -0.0       0.84        perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
      0.92             -0.0       0.88             -0.0       0.88        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.63             -0.0       0.59             -0.0       0.59        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.86             -0.0       0.83             -0.0       0.81        perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
      3.53             -0.0       3.50             -0.2       3.37        perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
      0.68             -0.0       0.66             -0.0       0.66        perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
      0.53             -0.0       0.51             -0.0       0.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
      4.23             -0.0       4.22             -0.2       4.05        perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
      0.72             -0.0       0.71             -0.0       0.71        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.62             -0.0       0.61             -0.0       0.59
perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 2.71 +0.0 2.71 -0.1 2.58 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter 0.89 +0.0 0.89 ± 2% +0.1 1.00 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write 0.96 +0.0 0.98 -0.1 0.90 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter 0.75 +0.0 0.77 +0.0 0.75 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write 6.88 +0.0 6.91 -0.2 6.67 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 1.13 +0.0 1.17 -0.1 1.08 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags 1.92 +0.0 1.96 +0.3 2.18 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 5.30 +0.1 5.36 +0.0 5.32 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 2.12 ± 3% +0.1 2.18 +0.1 2.25 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 5.30 +0.1 5.39 -0.4 4.92 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write 6.17 +0.1 6.27 -0.5 5.66 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 96.84 +0.1 96.98 -0.0 96.81 perf-profile.calltrace.cycles-pp.write 0.78 ± 2% +0.3 1.12 ± 7% +0.0 0.80 ± 3% 
perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter 2.97 +0.6 3.57 -0.2 2.81 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write 3.63 +0.6 4.24 -0.2 3.42 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write 12.01 +0.6 12.63 -0.5 11.54 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 4.32 +0.6 4.95 +0.8 5.12 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 37.28 +0.8 38.12 +0.0 37.32 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 84.26 +0.9 85.20 +0.4 84.65 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 13.39 +1.0 14.36 +1.0 14.35 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write 12.30 +1.0 13.30 +1.0 13.33 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write 82.83 +1.0 83.84 +0.5 83.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 57.94 +1.3 59.20 -0.0 57.92 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.99 +1.3 7.25 ± 2% +1.3 7.27 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 78.13 +1.3 79.39 +0.7 78.83 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 74.26 +1.3 75.58 +0.6 74.90 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 7.43 -0.4 7.06 -0.2 7.19 
perf-profile.children.cycles-pp.entry_SYSCALL_64 4.42 -0.2 4.18 -0.2 4.25 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack 1.21 -0.2 1.01 -0.0 1.19 perf-profile.children.cycles-pp.syscall_return_via_sysret 7.14 -0.2 6.95 -0.1 7.02 perf-profile.children.cycles-pp.copy_page_from_iter_atomic 4.18 -0.2 4.00 -0.2 4.03 perf-profile.children.cycles-pp.__cond_resched 2.74 -0.2 2.58 +0.1 2.82 perf-profile.children.cycles-pp.apparmor_file_permission 2.42 -0.1 2.30 -0.1 2.31 perf-profile.children.cycles-pp.rcu_all_qs 3.82 -0.1 3.72 +0.0 3.84 perf-profile.children.cycles-pp.__fsnotify_parent 2.98 -0.1 2.88 ± 2% +0.1 3.12 perf-profile.children.cycles-pp.down_write 1.74 -0.1 1.65 -0.0 1.70 perf-profile.children.cycles-pp.up_write 1.99 -0.1 1.89 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode 3.71 -0.1 3.63 +0.2 3.90 perf-profile.children.cycles-pp.security_file_permission 0.99 -0.1 0.91 +0.0 1.02 perf-profile.children.cycles-pp.w_test 2.47 -0.1 2.40 -0.1 2.42 perf-profile.children.cycles-pp.xas_load 1.12 -0.1 1.06 -0.0 1.08 perf-profile.children.cycles-pp.folio_wait_stable 1.26 -0.1 1.20 -0.1 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags 0.75 -0.0 0.71 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call 0.98 -0.0 0.94 -0.0 0.95 perf-profile.children.cycles-pp.aa_file_perm 0.46 -0.0 0.42 +0.0 0.46 perf-profile.children.cycles-pp.write@plt 1.10 -0.0 1.06 -0.1 1.04 perf-profile.children.cycles-pp.xattr_resolve_name 0.36 -0.0 0.33 -0.0 0.35 perf-profile.children.cycles-pp.amd_clear_divider 3.76 -0.0 3.73 -0.2 3.59 perf-profile.children.cycles-pp.cap_inode_need_killpriv 0.59 -0.0 0.56 -0.0 0.57 perf-profile.children.cycles-pp.inode_to_bdi 0.38 -0.0 0.35 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write 1.05 -0.0 1.03 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping 3.41 -0.0 3.38 -0.2 3.25 perf-profile.children.cycles-pp.__vfs_getxattr 0.56 -0.0 0.53 -0.0 0.54 
perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 4.47 -0.0 4.45 -0.2 4.28 perf-profile.children.cycles-pp.security_inode_need_killpriv 0.25 -0.0 0.24 ± 2% -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited 0.36 -0.0 0.35 -0.0 0.35 perf-profile.children.cycles-pp.is_bad_inode 0.64 -0.0 0.63 -0.0 0.61 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare 1.00 +0.0 1.00 ± 2% +0.1 1.12 perf-profile.children.cycles-pp.generic_write_check_limits 1.38 +0.0 1.40 -0.1 1.31 perf-profile.children.cycles-pp.strcmp 0.93 +0.0 0.95 -0.0 0.93 perf-profile.children.cycles-pp.folio_mark_dirty 1.07 +0.0 1.09 -0.1 1.00 perf-profile.children.cycles-pp.timestamp_truncate 7.23 +0.0 7.26 -0.2 7.01 perf-profile.children.cycles-pp.file_remove_privs_flags 2.25 +0.0 2.29 +0.3 2.53 perf-profile.children.cycles-pp.generic_write_checks 5.70 +0.0 5.74 +0.0 5.70 perf-profile.children.cycles-pp.simple_write_end 2.24 ± 3% +0.1 2.30 +0.1 2.37 ± 2% perf-profile.children.cycles-pp.__fdget_pos 98.96 +0.1 99.02 -0.0 98.92 perf-profile.children.cycles-pp.write 5.69 +0.1 5.76 -0.5 5.22 perf-profile.children.cycles-pp.fault_in_readable 6.42 +0.1 6.53 -0.5 5.89 perf-profile.children.cycles-pp.fault_in_iov_iter_readable 0.89 +0.3 1.23 ± 6% +0.0 0.91 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 3.39 +0.6 3.97 -0.2 3.20 perf-profile.children.cycles-pp.inode_needs_update_time 3.96 +0.6 4.57 -0.2 3.73 perf-profile.children.cycles-pp.file_update_time 12.35 +0.6 12.96 -0.5 11.86 perf-profile.children.cycles-pp.__generic_file_write_iter 4.56 +0.8 5.32 +0.9 5.48 perf-profile.children.cycles-pp.rw_verify_area 38.16 +0.8 39.01 +0.0 38.18 perf-profile.children.cycles-pp.generic_perform_write 84.67 +0.9 85.62 +0.4 85.06 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 13.58 +1.0 14.53 +0.9 14.52 perf-profile.children.cycles-pp.simple_write_begin 83.50 +1.0 84.49 +0.4 83.94 perf-profile.children.cycles-pp.do_syscall_64 12.68 +1.0 13.68 +1.0 
13.71 perf-profile.children.cycles-pp.__filemap_get_folio 78.74 +1.2 79.99 +0.7 79.42 perf-profile.children.cycles-pp.ksys_write 58.52 +1.3 59.78 -0.0 58.52 perf-profile.children.cycles-pp.generic_file_write_iter 6.18 +1.3 7.44 ± 2% +1.3 7.46 perf-profile.children.cycles-pp.filemap_get_entry 75.13 +1.3 76.41 +0.6 75.74 perf-profile.children.cycles-pp.vfs_write 7.25 -0.6 6.64 -0.3 6.95 perf-profile.self.cycles-pp.vfs_write 6.45 -0.4 6.09 -0.2 6.28 perf-profile.self.cycles-pp.write 4.32 -0.2 4.09 -0.2 4.16 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack 1.21 -0.2 1.01 -0.0 1.19 perf-profile.self.cycles-pp.syscall_return_via_sysret 6.98 -0.2 6.79 -0.1 6.86 perf-profile.self.cycles-pp.copy_page_from_iter_atomic 4.52 -0.2 4.35 -0.2 4.35 perf-profile.self.cycles-pp.__filemap_get_folio 2.34 -0.1 2.22 -0.1 2.26 perf-profile.self.cycles-pp.__cond_resched 1.60 -0.1 1.49 ± 2% +0.1 1.71 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission 1.90 -0.1 1.79 -0.1 1.80 perf-profile.self.cycles-pp.do_syscall_64 3.65 -0.1 3.56 +0.0 3.68 perf-profile.self.cycles-pp.__fsnotify_parent 1.47 -0.1 1.38 -0.1 1.41 perf-profile.self.cycles-pp.ksys_write 1.62 -0.1 1.53 -0.0 1.59 perf-profile.self.cycles-pp.up_write 1.09 -0.1 1.02 -0.1 1.04 perf-profile.self.cycles-pp.syscall_exit_to_user_mode 1.80 -0.1 1.74 -0.1 1.74 perf-profile.self.cycles-pp.xas_load 0.79 -0.1 0.73 +0.0 0.83 perf-profile.self.cycles-pp.w_test 1.10 -0.1 1.04 -0.0 1.08 perf-profile.self.cycles-pp.security_file_permission 1.25 -0.1 1.20 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 1.66 -0.1 1.60 -0.1 1.61 perf-profile.self.cycles-pp.entry_SYSCALL_64 1.41 -0.1 1.36 -0.0 1.36 perf-profile.self.cycles-pp.rcu_all_qs 0.90 -0.0 0.86 -0.1 0.81 perf-profile.self.cycles-pp.simple_write_begin 0.88 -0.0 0.84 -0.0 0.85 perf-profile.self.cycles-pp.aa_file_perm 0.80 -0.0 0.76 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags 0.62 -0.0 0.59 -0.0 0.59 
perf-profile.self.cycles-pp.x64_sys_call 1.39 -0.0 1.36 -0.1 1.34 perf-profile.self.cycles-pp.__vfs_getxattr 0.53 -0.0 0.50 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable 1.69 -0.0 1.67 +0.1 1.81 perf-profile.self.cycles-pp.generic_file_write_iter 0.87 -0.0 0.85 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping 0.56 -0.0 0.53 -0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 1.93 -0.0 1.91 ± 2% +0.2 2.10 perf-profile.self.cycles-pp.down_write 0.24 -0.0 0.22 -0.0 0.23 perf-profile.self.cycles-pp.amd_clear_divider 0.25 -0.0 0.23 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.__x64_sys_write 0.35 -0.0 0.34 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi 1.15 -0.0 1.13 -0.0 1.11 perf-profile.self.cycles-pp.__generic_file_write_iter 0.61 -0.0 0.60 -0.0 0.59 perf-profile.self.cycles-pp.xattr_resolve_name 0.22 -0.0 0.21 -0.0 0.21 ± 2% perf-profile.self.cycles-pp.noop_dirty_folio 0.35 -0.0 0.34 -0.0 0.34 perf-profile.self.cycles-pp.cap_inode_need_killpriv 0.79 -0.0 0.79 ± 2% +0.1 0.88 perf-profile.self.cycles-pp.generic_write_check_limits 0.52 +0.0 0.52 -0.0 0.49 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare 3.36 +0.0 3.36 -0.2 3.16 perf-profile.self.cycles-pp.generic_perform_write 0.70 +0.0 0.71 -0.0 0.68 perf-profile.self.cycles-pp.security_inode_need_killpriv 1.30 +0.0 1.32 +0.2 1.46 perf-profile.self.cycles-pp.generic_write_checks 0.66 +0.0 0.69 -0.0 0.62 perf-profile.self.cycles-pp.file_update_time 1.03 +0.0 1.06 -0.1 0.97 perf-profile.self.cycles-pp.strcmp 2.75 +0.0 2.79 +0.0 2.76 perf-profile.self.cycles-pp.simple_write_end 0.72 +0.0 0.77 -0.1 0.66 perf-profile.self.cycles-pp.fault_in_iov_iter_readable 0.87 +0.1 0.93 -0.1 0.82 perf-profile.self.cycles-pp.timestamp_truncate 2.10 ± 3% +0.1 2.16 +0.1 2.23 ± 2% perf-profile.self.cycles-pp.__fdget_pos 2.04 +0.1 2.09 -0.0 1.99 perf-profile.self.cycles-pp.file_remove_privs_flags 5.54 +0.1 5.60 -0.5 5.08 perf-profile.self.cycles-pp.fault_in_readable 1.51 +0.2 1.69 -0.1 1.37 
perf-profile.self.cycles-pp.inode_needs_update_time 0.78 ± 2% +0.3 1.11 ± 7% +0.0 0.80 ± 2% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 0.84 +1.0 1.81 +0.8 1.69 perf-profile.self.cycles-pp.rw_verify_area 3.66 +1.3 4.98 ± 4% +1.3 4.99 perf-profile.self.cycles-pp.filemap_get_entry ^ permalink raw reply [flat|nested] 17+ messages in thread
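The perf-stat rows above report perf-stat.overall.MPKI rising from 0.01 to 0.04. A minimal sketch of how that ratio relates to the other rows, assuming MPKI here means cache misses per thousand instructions (the report itself does not define it) and taking the perf-stat.ps.cache-misses and perf-stat.ps.instructions values as inputs. The function name `mpki` is only illustrative, and the inputs are the rounded table values, so the result only approximates the reported +523.5%.

```python
def mpki(cache_misses, instructions):
    """Cache misses per thousand instructions (assumed meaning of MPKI)."""
    return cache_misses / instructions * 1000.0

# perf-stat.ps.cache-misses and perf-stat.ps.instructions from the table above
base = mpki(3921138, 6.932e+11)    # v6.10-rc1
tip = mpki(23972111, 6.798e+11)    # a82fd282befc7

# Relative growth of MPKI between the two kernels, in percent
growth = (tip - base) / base * 100.0

print(f"v6.10-rc1     MPKI ~ {base:.4f}")
print(f"a82fd282befc7 MPKI ~ {tip:.4f}")
print(f"growth        ~ {growth:+.1f}%")
```

Under this reading, instruction counts moved by only a few percent while cache misses grew roughly sixfold, which is what drives the MPKI jump.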
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-05 2:09 ` Oliver Sang
@ 2024-07-05 5:48 ` Amir Goldstein
2024-07-08 5:40 ` Oliver Sang
2024-07-25 13:41 ` Jan Kara
0 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-05 5:48 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> > [...]
>
> > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > >   "unixbench.throughput": [
> > >     121545292.8,
> > >     121629889.4,
> > >     121598992.0,
> > >     121492095.5,
> > >     121645038.1,
> > >     121556286.9
> > >   ],
> > >
> >
> > Are all those runs from the same boot?
>
> no. we reboot machine before each run.
>
> >
> > > for the branch tip a82fd282befc7:
> > >   "unixbench.throughput": [
> > >     116675606.7,
> > >     116840611.2,
> > >     116738966.0,
> > >     116956953.1,
> > >     116704901.9,
> > >     116997628.3,
> > >     117141733.7,
> > >     116660495.4
> > >   ],
> > >
> >
> > And these run?
>
> same.
>
> >
> > Otherwise, we might have a fluctuation that happens at boot time
> > or at mount time or something.
> >
> > >
> > > let me combine the results from this branch together:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.10-rc1
> > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > >   a82fd282befc7 fanotify: report file range info with pre-content
> > >
> > >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \          |                \          |                \
> > >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> > >
> > >
> > > one thing I want to mention is the "%change" is always comparing to the first
> > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > > and so on.
> >
> > Thanks for clarifying - I did not read it this way.
> >
> > >
> > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > -2.4% regression comparing to 94167e071109d.
> > >
> > > from above table, along the branch, the performance is kind of fluctuating,
> > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > >
> >
> > I can understand why 64108c0b47db91b would regress performance, but I
> > cannot think of any possible explanation why a82fd282befc should improve
> > performance, so I have to wonder if the regression to -6.6% is not a fluke
> > of some specific boot/mount?
> >
> > I pushed a test branch to
> > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > with an extra patch that un-inlines some helpers to help bisect the
> > perf report better.
> > Maybe produce the report with this commit and it sheds some light.
>
> since
>
> * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> * f301cd18006c3 fanotify: rename a misnamed constant
> * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> * aca4084213276 fsnotify: generate pre-content permission event on open
> * 93656e196b006 fsnotify: introduce pre-content permission event
> * 1613e604df0cd (tag: v6.10-rc1,
>
> we run tests upon new commit.
> summary report is as below:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
>   388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
>
>        v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>  1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
>
>
> since Jan mentioned in a later mail that perf profiles are useful, I put details
> as below

Thanks.
That clarifies that the cycles are spent in the "optimization code" itself.

I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
at the base of the branch.

Hopefully, with this fix, the compiler will be able to optimize smarter and
the generated fast path code will be less sensitive to code alignment ???

If it works, it may eliminate some of the regressions throughout this branch and
may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
* e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info with pre-content events
* a28c32866bb3 - fanotify: rename a misnamed constant
* 61baabbdceaa - fanotify: pass optional file access range in pre-content event
* 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
* 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
* 38a903de931a - fsnotify: generate pre-content permission event on exec
* 70be29706389 - fsnotify: generate pre-content permission event on open
* 96768b7d6721 - fsnotify: introduce pre-content permission event
* 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks

Fingers crossed...

Thanks,
Amir.

[1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/

^ permalink raw reply [flat|nested] 17+ messages in thread
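The "%change" convention discussed in the message above (every column is compared against the first column, v6.10-rc1, not against its neighbour) can be sketched as follows. This is only an illustration, assuming the table values are plain means of the per-run samples quoted in the thread; the helper name `pct_change` is hypothetical and not part of the LKP tooling.

```python
from statistics import mean

# Per-run unixbench.throughput samples quoted in the thread
# (one reboot per run, according to the report).
RUNS = {
    "v6.10-rc1": [121545292.8, 121629889.4, 121598992.0,
                  121492095.5, 121645038.1, 121556286.9],
    "a82fd282befc7": [116675606.7, 116840611.2, 116738966.0, 116956953.1,
                      116704901.9, 116997628.3, 117141733.7, 116660495.4],
}

def pct_change(base, value):
    """Relative change of value against base, in percent."""
    return (value - base) / base * 100.0

# Table convention: compare each commit against the first column.
base_mean = mean(RUNS["v6.10-rc1"])
tip_mean = mean(RUNS["a82fd282befc7"])
tip_vs_base = pct_change(base_mean, tip_mean)

# Adjacent-commit comparison, using the rounded means from the table:
# 94167e071109d (1.163e+08) versus 64108c0b47db9 (1.135e+08).
step = pct_change(1.163e+08, 1.135e+08)

print(f"a82fd282befc7 vs v6.10-rc1:     {tip_vs_base:+.1f}%")
print(f"64108c0b47db9 vs 94167e071109d: {step:+.1f}%")
```

With these inputs the first figure agrees with the -3.9% shown in the table and the second with the "about -2.4%" mentioned for the two adjacent commits.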
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-05 5:48 ` Amir Goldstein
@ 2024-07-08 5:40 ` Oliver Sang
2024-07-08 16:37 ` Amir Goldstein
2024-07-25 13:41 ` Jan Kara
1 sibling, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-08 5:40 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
hi, Amir,

On Fri, Jul 05, 2024 at 08:48:28AM +0300, Amir Goldstein wrote:

[...]

>
> Thanks.
> That clarifies that the cycles are spent in the "optimization code" itself.
>
> I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
> at the base of the branch.
>
> Hopefully, with this fix, the compiler will be able to optimize smarter and
> the generated fast path code will be less sensitive to code alignment ???
>
> If it works, it may eliminate some of the regressions throughout this branch and
> may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
>
> * e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info
> with pre-content events
> * a28c32866bb3 - fanotify: rename a misnamed constant
> * 61baabbdceaa - fanotify: pass optional file access range in pre-content event
> * 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
> * 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
> * 38a903de931a - fsnotify: generate pre-content permission event on exec
> * 70be29706389 - fsnotify: generate pre-content permission event on open
> * 96768b7d6721 - fsnotify: introduce pre-content permission event
> * 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in
> permission hooks
>
> Fingers crossed...

unfortunately, seems no luck.

I combine the results with 96768b7d6721 and its parent since 96768b7d6721
introduces most regression.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
  96768b7d67219 fsnotify: introduce pre-content permission event
  e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events

       v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
 1.218e+08            -0.3%  1.214e+08            -7.6%  1.125e+08            -6.4%   1.14e+08        unixbench.throughput

detail is as below [2]

>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/

for this report, I also retest on new branch. seems the regression reduced to
around 10%, but we cannot get stable data on this new branch, so we cannot say
if it really becomes better now.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/full/stress-ng/60s

commit:
  v6.10-rc1
  e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events

       v6.10-rc1 e0aaae806edc3411d84dc0d66fe
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.161e+08 ±  5%      -9.5%   1.05e+08 ± 10%  stress-ng.full.ops
   1934587 ±  5%      -9.5%    1750464 ± 10%  stress-ng.full.ops_per_sec

[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.10-rc1
  28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
  96768b7d67219 fsnotify: introduce pre-content permission event
  e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events

       v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
      6215            +0.1%       6223            -8.9%       5661            -7.9%       5724        time.user_time
      0.58             -0.0       0.54 ± 18%       -0.1       0.52             -0.0       0.55 ± 11%  mpstat.cpu.all.irq%
      0.01 ±  4%       -0.0       0.01 ± 13%       -0.0       0.00 ±  2%       -0.0       0.01 ±  6%  mpstat.cpu.all.soft%
      7.59 ± 59%     -58.7%       3.14 ± 63%     -61.7%       2.90 ± 52%     -33.4%       5.06 ± 50%  sched_debug.cfs_rq:/.util_est.min
      0.00 ± 72%    -148.0%      -0.00           -50.9%       0.00 ±244%     +12.3%       0.00 ±102%  sched_debug.cpu.nr_uninterruptible.avg
 1.218e+08            -0.3%  1.214e+08            -7.6%  1.125e+08            -6.4%   1.14e+08        unixbench.throughput
      6215            +0.1%       6223            -8.9%       5661            -7.9%       5724        unixbench.time.user_time
 4.521e+10            -0.3%
4.506e+10 -7.7% 4.172e+10 -6.4% 4.231e+10 unixbench.workload 1.458e+11 -7.5% 1.35e+11 ± 18% -6.8% 1.359e+11 -9.5% 1.32e+11 ± 11% perf-stat.i.branch-instructions 3742171 ± 4% -16.8% 3112873 ± 20% +80.4% 6752235 ± 7% +403.3% 18836125 ± 14% perf-stat.i.cache-misses 32402657 ± 3% -19.5% 26094697 ± 16% +77.8% 57627688 ± 4% +356.5% 1.479e+08 ± 11% perf-stat.i.cache-references 0.95 +9.8% 1.04 ± 20% +5.8% 1.00 ± 2% +9.3% 1.03 ± 13% perf-stat.i.cpi 161794 ± 8% -2.2% 158309 ± 20% -49.9% 81139 ± 15% -77.3% 36784 ± 54% perf-stat.i.cycles-between-cache-misses 6.963e+11 -7.4% 6.445e+11 ± 18% -6.5% 6.513e+11 -8.8% 6.353e+11 ± 11% perf-stat.i.instructions 1.22 -4.7% 1.16 ± 10% -6.3% 1.14 -6.5% 1.14 ± 6% perf-stat.i.ipc 0.01 ± 4% -10.3% 0.00 ± 6% +92.9% 0.01 ± 7% +453.4% 0.03 ± 14% perf-stat.overall.MPKI 0.75 +0.0% 0.75 +6.8% 0.80 +4.3% 0.78 perf-stat.overall.cpi 139258 ± 4% +11.8% 155626 ± 6% -44.5% 77325 ± 7% -80.8% 26737 ± 14% perf-stat.overall.cycles-between-cache-misses 1.33 -0.0% 1.33 -6.3% 1.25 -4.1% 1.28 perf-stat.overall.ipc 5722 +0.3% 5738 +1.6% 5811 +2.6% 5869 perf-stat.overall.path-length 1.452e+11 -7.4% 1.343e+11 ± 18% -6.8% 1.352e+11 -9.5% 1.314e+11 ± 11% perf-stat.ps.branch-instructions 3742620 ± 4% -16.7% 3117570 ± 20% +80.4% 6752430 ± 7% +401.2% 18758290 ± 14% perf-stat.ps.cache-misses 32374621 ± 3% -19.4% 26088859 ± 16% +77.6% 57486380 ± 4% +354.8% 1.473e+08 ± 11% perf-stat.ps.cache-references 6.93e+11 -7.4% 6.415e+11 ± 18% -6.5% 6.481e+11 -8.8% 6.323e+11 ± 11% perf-stat.ps.instructions 2.587e+14 -0.0% 2.586e+14 -6.3% 2.425e+14 -4.0% 2.484e+14 perf-stat.total.instructions 2.85 -0.2 2.62 -0.3 2.55 ± 3% +0.0 2.86 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64 5.99 -0.1 5.86 +0.1 6.09 +1.1 7.10 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter 12.29 -0.1 12.16 -0.3 11.97 +0.8 13.04 
perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
  13.39  -0.1  13.28  -0.4  12.96  +0.7  14.12  perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
  13.18  -0.1  13.08  -0.9  12.24  -1.0  12.21  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
  1.98 ± 5%  -0.1  1.88  -0.0  1.96 ± 3%  +0.2  2.20  perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
  3.27 ± 4%  -0.1  3.20  +4.7  7.99 ± 6%  -0.1  3.21  perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
  2.42 ± 6%  -0.1  2.37  +4.7  7.16 ± 7%  -0.1  2.28  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
  3.72  -0.1  3.67  -0.4  3.37  +0.1  3.81  perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  58.06  -0.0  58.01  -2.5  55.53  +0.7  58.76  perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  74.32  -0.0  74.28  +1.8  76.16  +1.4  75.67  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
  0.90 ± 5%  -0.0  0.87  -0.0  0.87 ± 3%  +0.1  1.00  perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
  0.70  -0.0  0.66  -0.0  0.68  -0.0  0.68  perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
  5.30  -0.0  5.27  -0.2  5.06  -0.1  5.25  perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
  1.11  -0.0  1.08  -0.1  0.97  -0.2  0.94  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
  1.81  -0.0  1.80  -0.1  1.72  -0.1  1.76  perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
  1.62  -0.0  1.62  -0.1  1.49  -0.1  1.52  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
  0.92  -0.0  0.91  -0.1  0.84  -0.1  0.86  perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
  2.18  -0.0  2.17  -0.1  2.09  -0.1  2.12  perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
  0.62  -0.0  0.62  -0.0  0.58  -0.0  0.60  perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
  0.63  -0.0  0.63  -0.0  0.58  -0.0  0.59  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
  0.92  -0.0  0.92  -0.1  0.84  -0.0  0.87  perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
  1.68  +0.0  1.68  -0.1  1.56  -0.1  1.60  perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
  0.68  +0.0  0.69  -0.0  0.64  -0.0  0.66  perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
  0.72 ± 2%  +0.0  0.72  -0.0  0.69  -0.0  0.70  perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
  0.85  +0.0  0.86  -0.0  0.81  -0.0  0.81  perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
  0.74  +0.0  0.75  -0.0  0.70  +0.0  0.77  perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
  0.86  +0.0  0.86  +0.0  0.88  -0.0  0.82  perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
  84.29  +0.0  84.30  +1.2  85.47  +1.1  85.34  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
  7.00  +0.0  7.02  -0.4  6.59  -0.1  6.89  perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
  82.86  +0.0  82.87  +1.3  84.15  +1.1  83.98  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
  0.73  +0.0  0.74  -0.0  0.71  -0.0  0.70  perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  0.64  +0.0  0.66 ± 2%  -0.1  0.57  -0.1  0.58  perf-profile.calltrace.cycles-pp.w_test
  78.16  +0.0  78.18  +1.7  79.81  +1.4  79.57  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
  96.83  +0.0  96.86  +0.2  97.07  +0.3  97.12  perf-profile.calltrace.cycles-pp.write
  0.78 ± 3%  +0.0  0.82 ± 2%  +0.1  0.88 ± 3%  +0.3  1.05 ± 8%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
  1.12 ± 2%  +0.0  1.16  -0.1  1.02  -0.0  1.11  perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
  4.23  +0.0  4.27  -0.3  3.94  -0.1  4.13  perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
  2.70  +0.0  2.74  -0.2  2.50  -0.1  2.63  perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
  3.52  +0.0  3.57  -0.3  3.26  -0.1  3.41  perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
  6.89  +0.1  6.94  -0.4  6.47  -0.1  6.77  perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
  2.10 ± 5%  +0.1  2.15 ± 3%  -0.1  2.03 ± 4%  +0.2  2.25 ± 3%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
  2.96  +0.1  3.06  +0.1  3.08  +0.4  3.36 ± 2%  perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
  3.61  +0.1  3.72  +0.1  3.71  +0.4  4.00 ± 2%  perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
  37.30  +0.1  37.42  -1.6  35.66  +0.2  37.50  perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
  5.32  +0.2  5.48  -0.1  5.18  -0.1  5.21  perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
  4.26 ± 3%  +0.2  4.42  +5.4  9.61 ± 5%  +0.7  4.98  perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  6.20  +0.2  6.37  -0.2  5.98  -0.2  6.03  perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
  12.00  +0.2  12.19  -0.4  11.61  +0.2  12.25  perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
  3.02  -0.2  2.79  -0.3  2.72 ± 3%  +0.0  3.04  perf-profile.children.cycles-pp.down_write
  6.18  -0.1  6.05  +0.1  6.28  +1.1  7.28  perf-profile.children.cycles-pp.filemap_get_entry
  12.68  -0.1  12.56  -0.3  12.34  +0.7  13.42  perf-profile.children.cycles-pp.__filemap_get_folio
  13.58  -0.1  13.47  -0.5  13.13  +0.7  14.29  perf-profile.children.cycles-pp.simple_write_begin
  58.65  -0.1  58.58  -2.5  56.11  +0.7  59.36  perf-profile.children.cycles-pp.generic_file_write_iter
  3.64 ± 3%  -0.1  3.58  +4.7  8.36 ± 6%  -0.1  3.56  perf-profile.children.cycles-pp.security_file_permission
  4.19  -0.1  4.14  -0.2  3.96  -0.2  3.98  perf-profile.children.cycles-pp.__cond_resched
  2.67 ± 5%  -0.1  2.61  +4.7  7.40 ± 6%  -0.2  2.51  perf-profile.children.cycles-pp.apparmor_file_permission
  3.81  -0.1  3.75  -0.4  3.45  +0.1  3.90  perf-profile.children.cycles-pp.__fsnotify_parent
  2.42  -0.0  2.38  -0.2  2.23  -0.2  2.25  perf-profile.children.cycles-pp.rcu_all_qs
  7.43  -0.0  7.39  -0.5  6.91  -0.5  6.92  perf-profile.children.cycles-pp.entry_SYSCALL_64
  98.97  -0.0  98.94  +0.1  99.06  +0.1  99.04  perf-profile.children.cycles-pp.write
  5.69  -0.0  5.66  -0.3  5.44  -0.1  5.62  perf-profile.children.cycles-pp.simple_write_end
  1.21  -0.0  1.18  -0.1  1.06  -0.2  1.04  perf-profile.children.cycles-pp.syscall_return_via_sysret
  75.19  -0.0  75.17  +1.8  76.99  +1.3  76.53  perf-profile.children.cycles-pp.vfs_write
  1.12  -0.0  1.11  -0.1  1.03  -0.1  1.04  perf-profile.children.cycles-pp.folio_wait_stable
  1.90  -0.0  1.88  -0.1  1.80  -0.1  1.84  perf-profile.children.cycles-pp.folio_unlock
  1.99  -0.0  1.98  -0.2  1.82  -0.1  1.86  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
  0.76  -0.0  0.75  -0.1  0.70  -0.1  0.70  perf-profile.children.cycles-pp.x64_sys_call
  0.23  -0.0  0.23 ± 2%  -0.0  0.22  -0.0  0.22  perf-profile.children.cycles-pp.file_remove_privs
  2.47  -0.0  2.46  -0.1  2.37  -0.1  2.41  perf-profile.children.cycles-pp.xas_load
  0.64  -0.0  0.64  -0.1  0.58  -0.0  0.60  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
  0.37  -0.0  0.37  -0.0  0.35  -0.0  0.35  perf-profile.children.cycles-pp.__x64_sys_write
  0.36  -0.0  0.36  -0.0  0.33  -0.0  0.34  perf-profile.children.cycles-pp.amd_clear_divider
  1.26  -0.0  1.26  -0.1  1.16  -0.1  1.19  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
  0.59  -0.0  0.59  -0.1  0.54  -0.0  0.56  perf-profile.children.cycles-pp.inode_to_bdi
  1.74  -0.0  1.74  -0.1  1.62  -0.1  1.66  perf-profile.children.cycles-pp.up_write
  1.05  -0.0  1.05  -0.1  0.97  -0.0  1.01  perf-profile.children.cycles-pp.folio_mapping
  0.36  +0.0  0.36  -0.0  0.34  -0.0  0.34  perf-profile.children.cycles-pp.is_bad_inode
  0.33  +0.0  0.33  -0.0  0.30  -0.0  0.31  perf-profile.children.cycles-pp.noop_dirty_folio
  0.25  +0.0  0.25  -0.0  0.23  -0.0  0.23  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
  1.10  +0.0  1.10  -0.1  1.04  -0.1  1.04  perf-profile.children.cycles-pp.xattr_resolve_name
  0.85  +0.0  0.85  -0.0  0.80  -0.0  0.82  perf-profile.children.cycles-pp.setattr_should_drop_suidgid
  0.55  +0.0  0.55  -0.0  0.52  -0.0  0.52  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
  0.92  +0.0  0.94  -0.1  0.87  +0.0  0.95  perf-profile.children.cycles-pp.folio_mark_dirty
  0.97  +0.0  0.98  +0.0  0.99  -0.0  0.93  perf-profile.children.cycles-pp.aa_file_perm
  83.54  +0.0  83.55  +1.3  84.79  +1.1  84.63  perf-profile.children.cycles-pp.do_syscall_64
  7.14  +0.0  7.15  -0.4  6.72  -0.1  7.03  perf-profile.children.cycles-pp.copy_page_from_iter_atomic
  0.45 ± 2%  +0.0  0.47  -0.0  0.41  -0.0  0.41  perf-profile.children.cycles-pp.write@plt
  84.71  +0.0  84.72  +1.2  85.88  +1.1  85.76  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
  4.41  +0.0  4.43  -0.3  4.11  -0.3  4.16  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
  0.98  +0.0  1.00  -0.1  0.87  -0.1  0.89  perf-profile.children.cycles-pp.w_test
  78.77  +0.0  78.80  +1.6  80.39  +1.4  80.17  perf-profile.children.cycles-pp.ksys_write
  0.89 ± 3%  +0.0  0.93  +0.1  0.97 ± 3%  +0.3  1.16 ± 7%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
  1.37  +0.0  1.42  -0.1  1.24  -0.0  1.34  perf-profile.children.cycles-pp.strcmp
  4.46  +0.0  4.51  -0.3  4.15  -0.1  4.35  perf-profile.children.cycles-pp.security_inode_need_killpriv
  3.75  +0.0  3.80  -0.3  3.47  -0.1  3.63  perf-profile.children.cycles-pp.cap_inode_need_killpriv
  7.24  +0.1  7.30  -0.4  6.81  -0.1  7.11  perf-profile.children.cycles-pp.file_remove_privs_flags
  3.39  +0.1  3.45  -0.2  3.15  -0.1  3.29  perf-profile.children.cycles-pp.__vfs_getxattr
  3.38  +0.1  3.48  +0.1  3.46  +0.4  3.74 ± 2%  perf-profile.children.cycles-pp.inode_needs_update_time
  3.94  +0.1  4.05  +0.1  4.02  +0.4  4.32  perf-profile.children.cycles-pp.file_update_time
  38.19  +0.1  38.33  -1.7  36.51  +0.2  38.39  perf-profile.children.cycles-pp.generic_perform_write
  5.71  +0.2  5.87  -0.2  5.49  -0.2  5.54  perf-profile.children.cycles-pp.fault_in_readable
  4.50 ± 3%  +0.2  4.67  +5.4  9.86 ± 4%  +0.9  5.35  perf-profile.children.cycles-pp.rw_verify_area
  6.45  +0.2  6.64  -0.2  6.22  -0.2  6.28  perf-profile.children.cycles-pp.fault_in_iov_iter_readable
  12.34  +0.2  12.53  -0.4  11.93  +0.2  12.58  perf-profile.children.cycles-pp.__generic_file_write_iter
  1.98 ± 3%  -0.2  1.79  -0.2  1.74 ± 4%  +0.1  2.04  perf-profile.self.cycles-pp.down_write
  3.66  -0.1  3.54  +0.2  3.86  +1.2  4.82 ± 2%  perf-profile.self.cycles-pp.filemap_get_entry
  3.64  -0.1  3.57  -0.4  3.28  +0.1  3.73  perf-profile.self.cycles-pp.__fsnotify_parent
  1.54 ± 9%  -0.1  1.47 ± 2%  +4.7  6.22 ± 8%  -0.1  1.44  perf-profile.self.cycles-pp.apparmor_file_permission
  1.71 ± 2%  -0.1  1.65  -0.0  1.69 ± 2%  +0.1  1.82  perf-profile.self.cycles-pp.generic_file_write_iter
  6.45  -0.1  6.39  -0.5  5.95  -0.5  6.00  perf-profile.self.cycles-pp.write
  7.25  -0.1  7.20  -0.6  6.64  -0.3  6.98  perf-profile.self.cycles-pp.vfs_write
  2.35  -0.0  2.31  -0.1  2.25  -0.1  2.22  perf-profile.self.cycles-pp.__cond_resched
  2.76  -0.0  2.72  -0.1  2.65  -0.0  2.72  perf-profile.self.cycles-pp.simple_write_end
  0.80 ± 4%  -0.0  0.78 ± 2%  -0.1  0.76 ± 2%  +0.1  0.88  perf-profile.self.cycles-pp.generic_write_check_limits
  1.20  -0.0  1.18  -0.1  1.06  -0.2  1.03  perf-profile.self.cycles-pp.syscall_return_via_sysret
  1.41  -0.0  1.39  -0.1  1.30  -0.1  1.34  perf-profile.self.cycles-pp.rcu_all_qs
  1.76  -0.0  1.75  -0.1  1.68  -0.0  1.72  perf-profile.self.cycles-pp.folio_unlock
  1.90  -0.0  1.90  -0.1  1.76  -0.1  1.78  perf-profile.self.cycles-pp.do_syscall_64
  0.54  -0.0  0.53  -0.0  0.50  -0.0  0.50  perf-profile.self.cycles-pp.folio_wait_stable
  1.80  -0.0  1.80  -0.1  1.72  -0.0  1.75  perf-profile.self.cycles-pp.xas_load
  1.10  -0.0  1.09  -0.0  1.06  -0.1  1.02  perf-profile.self.cycles-pp.security_file_permission
  1.48  -0.0  1.47  -0.1  1.36  -0.1  1.38  perf-profile.self.cycles-pp.ksys_write
  0.62  -0.0  0.62  -0.0  0.58  -0.0  0.58  perf-profile.self.cycles-pp.x64_sys_call
  0.35  -0.0  0.35  -0.0  0.32  -0.0  0.33  perf-profile.self.cycles-pp.cap_inode_need_killpriv
  0.24  -0.0  0.24  -0.0  0.23 ± 2%  -0.0  0.23  perf-profile.self.cycles-pp.is_bad_inode
  0.87  -0.0  0.87  -0.1  0.80  -0.0  0.83  perf-profile.self.cycles-pp.folio_mapping
  0.24  -0.0  0.24  -0.0  0.22  -0.0  0.22  perf-profile.self.cycles-pp.amd_clear_divider
  0.70  -0.0  0.70  -0.0  0.67  +0.0  0.72  perf-profile.self.cycles-pp.security_inode_need_killpriv
  0.52  -0.0  0.52  -0.0  0.47  -0.0  0.49  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
  1.62  +0.0  1.62  -0.1  1.51  -0.1  1.55  perf-profile.self.cycles-pp.up_write
  0.22  +0.0  0.22  -0.0  0.21 ± 2%  -0.0  0.21  perf-profile.self.cycles-pp.noop_dirty_folio
  1.09  +0.0  1.10  -0.1  1.01  -0.1  1.02  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
  0.35  +0.0  0.36  -0.0  0.32  -0.0  0.34  perf-profile.self.cycles-pp.inode_to_bdi
  1.25  +0.0  1.25  -0.1  1.16  -0.0  1.20  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
  0.25  +0.0  0.25  -0.0  0.23  -0.0  0.23  perf-profile.self.cycles-pp.__x64_sys_write
  0.79  +0.0  0.80  -0.1  0.74  -0.0  0.75  perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
  2.04  +0.0  2.04  -0.1  1.95  +0.0  2.04  perf-profile.self.cycles-pp.file_remove_privs_flags
  0.55  +0.0  0.55  -0.0  0.52  -0.0  0.52  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
  0.61  +0.0  0.62  -0.0  0.58  -0.0  0.59  perf-profile.self.cycles-pp.xattr_resolve_name
  0.73  +0.0  0.73  -0.0  0.69  -0.0  0.70  perf-profile.self.cycles-pp.setattr_should_drop_suidgid
  4.51  +0.0  4.52  -0.3  4.23  -0.2  4.27  perf-profile.self.cycles-pp.__filemap_get_folio
  0.47  +0.0  0.48  -0.0  0.45  +0.0  0.49  perf-profile.self.cycles-pp.folio_mark_dirty
  0.87  +0.0  0.88  +0.0  0.89  -0.0  0.83  perf-profile.self.cycles-pp.aa_file_perm
  6.97  +0.0  6.98  -0.4  6.56  -0.1  6.86  perf-profile.self.cycles-pp.copy_page_from_iter_atomic
  1.39  +0.0  1.40  -0.1  1.30  -0.0  1.34  perf-profile.self.cycles-pp.__vfs_getxattr
  1.65  +0.0  1.67  -0.1  1.56  -0.1  1.60  perf-profile.self.cycles-pp.entry_SYSCALL_64
  4.32  +0.0  4.34  -0.3  4.02  -0.3  4.06  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
  0.90  +0.0  0.92  -0.1  0.80  -0.0  0.87  perf-profile.self.cycles-pp.simple_write_begin
  0.66  +0.0  0.68  -0.0  0.64  +0.0  0.67  perf-profile.self.cycles-pp.file_update_time
  1.15  +0.0  1.18  -0.0  1.10  -0.0  1.15  perf-profile.self.cycles-pp.__generic_file_write_iter
  0.78  +0.0  0.81  -0.1  0.69  -0.1  0.71  perf-profile.self.cycles-pp.w_test
  1.02 ± 2%  +0.0  1.06  -0.1  0.91  -0.0  1.00  perf-profile.self.cycles-pp.strcmp
  1.50  +0.0  1.53  +0.0  1.52  +0.1  1.58  perf-profile.self.cycles-pp.inode_needs_update_time
  0.78 ± 4%  +0.0  0.82  +0.1  0.86 ± 3%  +0.3  1.05 ± 8%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
  3.35  +0.1  3.44  -0.2  3.18  -0.0  3.31  perf-profile.self.cycles-pp.generic_perform_write
  5.56  +0.2  5.72  -0.2  5.33  -0.2  5.38  perf-profile.self.cycles-pp.fault_in_readable
  0.85  +0.2  1.09  +0.7  1.50  +1.1  1.92  perf-profile.self.cycles-pp.rw_verify_area
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-08 5:40 ` Oliver Sang
@ 2024-07-08 16:37 ` Amir Goldstein
0 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-08 16:37 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Mon, Jul 8, 2024 at 8:40 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Fri, Jul 05, 2024 at 08:48:28AM +0300, Amir Goldstein wrote:
>
> [...]
>
> >
> > Thanks.
> > That clarifies that the cycles are spent in the "optimization code" itself.
> >
> > I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
> > at the base of the branch.
> >
> > Hopefully, with this fix, the compiler will be able to optimize smarter and
> > the generated fast path code will be less sensitive to code alignment ???
> >
> > If it works, it may eliminate some of the regressions throughout this branch and
> > may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
> >
> > * e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info with pre-content events
> > * a28c32866bb3 - fanotify: rename a misnamed constant
> > * 61baabbdceaa - fanotify: pass optional file access range in pre-content event
> > * 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
> > * 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
> > * 38a903de931a - fsnotify: generate pre-content permission event on exec
> > * 70be29706389 - fsnotify: generate pre-content permission event on open
> > * 96768b7d6721 - fsnotify: introduce pre-content permission event
> > * 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
> >
> > Fingers crossed...
>
> unfortunately, seems no luck. I combine the results with 96768b7d6721 and its
> parent since 96768b7d6721 introduces most regression.
>

Too bad.
I will need to have a think.

Thank you for testing!
Amir.
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
>   96768b7d67219 fsnotify: introduce pre-content permission event
>   e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
>        v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   1.218e+08  -0.3%  1.214e+08  -7.6%  1.125e+08  -6.4%  1.14e+08  unixbench.throughput
>
> detail is as below [2]
>
>
> >
> > Thanks,
> > Amir.
> >
> > [1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
>
> for this report, I also retest on new branch. seems the regression reduced to
> around 10%, but we cannot get stable data on this new branch, so we cannot say
> if it really becomes better now.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/full/stress-ng/60s
>
> commit:
>   v6.10-rc1
>   e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
>        v6.10-rc1 e0aaae806edc3411d84dc0d66fe
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>   1.161e+08 ± 5%  -9.5%  1.05e+08 ± 10%  stress-ng.full.ops
>   1934587 ± 5%  -9.5%  1750464 ± 10%  stress-ng.full.ops_per_sec
>
>
>
> [2]
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
>   v6.10-rc1
>   28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
>   96768b7d67219 fsnotify: introduce pre-content permission event
>   e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
>        v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   6215  +0.1%  6223  -8.9%  5661  -7.9%  5724  time.user_time
>   0.58  -0.0  0.54 ± 18%  -0.1  0.52  -0.0  0.55 ± 11%  mpstat.cpu.all.irq%
>   0.01 ± 4%  -0.0  0.01 ± 13%  -0.0  0.00 ± 2%  -0.0  0.01 ± 6%  mpstat.cpu.all.soft%
>   7.59 ± 59%  -58.7%  3.14 ± 63%  -61.7%  2.90 ± 52%  -33.4%  5.06 ± 50%  sched_debug.cfs_rq:/.util_est.min
>   0.00 ± 72%  -148.0%  -0.00  -50.9%  0.00 ±244%  +12.3%  0.00 ±102%  sched_debug.cpu.nr_uninterruptible.avg
>   1.218e+08  -0.3%  1.214e+08  -7.6%  1.125e+08  -6.4%  1.14e+08  unixbench.throughput
>   6215  +0.1%  6223  -8.9%  5661  -7.9%  5724  unixbench.time.user_time
>   4.521e+10  -0.3%  4.506e+10  -7.7%  4.172e+10  -6.4%  4.231e+10  unixbench.workload
>   1.458e+11  -7.5%  1.35e+11 ± 18%  -6.8%  1.359e+11  -9.5%  1.32e+11 ± 11%  perf-stat.i.branch-instructions
>   3742171 ± 4%  -16.8%  3112873 ± 20%  +80.4%  6752235 ± 7%  +403.3%  18836125 ± 14%  perf-stat.i.cache-misses
>   32402657 ± 3%  -19.5%  26094697 ± 16%  +77.8%  57627688 ± 4%  +356.5%  1.479e+08 ± 11%  perf-stat.i.cache-references
>   0.95  +9.8%  1.04 ± 20%  +5.8%  1.00 ± 2%  +9.3%  1.03 ± 13%  perf-stat.i.cpi
>   161794 ± 8%  -2.2%  158309 ± 20%  -49.9%  81139 ± 15%  -77.3%  36784 ± 54%  perf-stat.i.cycles-between-cache-misses
>   6.963e+11  -7.4%  6.445e+11 ± 18%  -6.5%  6.513e+11  -8.8%  6.353e+11 ± 11%  perf-stat.i.instructions
>   1.22  -4.7%  1.16 ± 10%  -6.3%  1.14  -6.5%  1.14 ± 6%  perf-stat.i.ipc
>   0.01 ± 4%  -10.3%  0.00 ± 6%  +92.9%  0.01 ± 7%  +453.4%  0.03 ± 14%  perf-stat.overall.MPKI
>   0.75  +0.0%  0.75  +6.8%  0.80  +4.3%  0.78  perf-stat.overall.cpi
>   139258 ± 4%  +11.8%  155626 ± 6%  -44.5%  77325 ± 7%  -80.8%  26737 ± 14%  perf-stat.overall.cycles-between-cache-misses
>   1.33  -0.0%  1.33  -6.3%  1.25  -4.1%  1.28  perf-stat.overall.ipc
>   5722  +0.3%  5738  +1.6%  5811  +2.6%  5869  perf-stat.overall.path-length
>   1.452e+11  -7.4%  1.343e+11 ± 18%  -6.8%  1.352e+11  -9.5%  1.314e+11 ± 11%  perf-stat.ps.branch-instructions
>   3742620 ± 4%  -16.7%  3117570 ± 20%  +80.4%  6752430 ± 7%  +401.2%  18758290 ± 14%  perf-stat.ps.cache-misses
>   32374621 ± 3%  -19.4%  26088859 ± 16%  +77.6%  57486380 ± 4%  +354.8%  1.473e+08 ± 11%  perf-stat.ps.cache-references
>   6.93e+11  -7.4%  6.415e+11 ± 18%  -6.5%  6.481e+11  -8.8%  6.323e+11 ± 11%  perf-stat.ps.instructions
>   2.587e+14  -0.0%  2.586e+14  -6.3%  2.425e+14  -4.0%  2.484e+14  perf-stat.total.instructions
>   2.85  -0.2  2.62  -0.3  2.55 ± 3%  +0.0  2.86  perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>   5.99  -0.1  5.86  +0.1  6.09  +1.1  7.10  perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>   12.29  -0.1  12.16  -0.3  11.97  +0.8  13.04  perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
>   13.39  -0.1  13.28  -0.4  12.96  +0.7  14.12  perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>   13.18  -0.1  13.08  -0.9  12.24  -1.0  12.21  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
>   1.98 ± 5%  -0.1  1.88  -0.0  1.96 ± 3%  +0.2  2.20  perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>   3.27 ± 4%  -0.1  3.20  +4.7  7.99 ± 6%  -0.1  3.21  perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
>   2.42 ± 6%  -0.1  2.37  +4.7  7.16 ± 7%  -0.1  2.28  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
>   3.72  -0.1  3.67  -0.4  3.37  +0.1  3.81  perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>   58.06  -0.0  58.01  -2.5  55.53  +0.7  58.76  perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>   74.32  -0.0  74.28  +1.8  76.16  +1.4  75.67  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>   0.90 ± 5%  -0.0  0.87  -0.0  0.87 ± 3%  +0.1  1.00  perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
>   0.70  -0.0  0.66  -0.0  0.68  -0.0  0.68  perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
>   5.30  -0.0  5.27  -0.2  5.06  -0.1  5.25  perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>   1.11  -0.0  1.08  -0.1  0.97  -0.2  0.94  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
>   1.81  -0.0  1.80  -0.1  1.72  -0.1  1.76  perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
>   1.62  -0.0  1.62  -0.1  1.49  -0.1  1.52  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>   0.92  -0.0  0.91  -0.1  0.84  -0.1  0.86  perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>   2.18  -0.0  2.17  -0.1  2.09  -0.1  2.12  perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
>   0.62  -0.0  0.62  -0.0  0.58  -0.0  0.60  perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>   0.63  -0.0  0.63  -0.0  0.58  -0.0  0.59  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>   0.92  -0.0  0.92  -0.1  0.84  -0.0  0.87  perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>   1.68  +0.0  1.68  -0.1  1.56  -0.1  1.60  perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>   0.68  +0.0  0.69  -0.0  0.64  -0.0  0.66  perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
>   0.72 ± 2%  +0.0  0.72  -0.0  0.69  -0.0  0.70  perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
>   0.85  +0.0  0.86  -0.0  0.81  -0.0  0.81  perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>   0.74  +0.0  0.75  -0.0  0.70  +0.0  0.77  perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
>   0.86  +0.0  0.86  +0.0  0.88  -0.0  0.82  perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
>   84.29  +0.0  84.30  +1.2  85.47  +1.1  85.34  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
>   7.00  +0.0  7.02  -0.4  6.59  -0.1  6.89  perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>   82.86  +0.0  82.87  +1.3  84.15  +1.1  83.98  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>   0.73  +0.0  0.74  -0.0  0.71  -0.0  0.70  perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>   0.64  +0.0  0.66 ± 2%  -0.1  0.57  -0.1  0.58  perf-profile.calltrace.cycles-pp.w_test
>   78.16  +0.0  78.18  +1.7  79.81  +1.4  79.57  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>   96.83  +0.0  96.86  +0.2  97.07  +0.3  97.12  perf-profile.calltrace.cycles-pp.write
>   0.78 ± 3%  +0.0  0.82 ± 2%  +0.1  0.88 ± 3%  +0.3  1.05 ± 8%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
>   1.12 ± 2%  +0.0  1.16  -0.1  1.02  -0.0  1.11  perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
>   4.23  +0.0  4.27  -0.3  3.94  -0.1  4.13  perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
>   2.70  +0.0  2.74  -0.2  2.50  -0.1  2.63  perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
>   3.52  +0.0  3.57  -0.3  3.26  -0.1  3.41  perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
>   6.89  +0.1  6.94  -0.4  6.47  -0.1  6.77  perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>   2.10 ± 5%  +0.1  2.15 ± 3%  -0.1  2.03 ± 4%  +0.2  2.25 ± 3%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>   2.96  +0.1  3.06  +0.1  3.08  +0.4  3.36 ± 2%  perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
>   3.61  +0.1  3.72  +0.1  3.71  +0.4  4.00 ± 2%  perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
>   37.30  +0.1  37.42  -1.6  35.66  +0.2  37.50  perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>   5.32  +0.2  5.48  -0.1  5.18  -0.1  5.21  perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
>   4.26 ± 3%  +0.2  4.42  +5.4  9.61 ± 5%  +0.7  4.98  perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>   6.20  +0.2  6.37  -0.2  5.98  -0.2  6.03  perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>   12.00  +0.2  12.19  -0.4  11.61  +0.2  12.25  perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
>   3.02  -0.2  2.79  -0.3  2.72 ± 3%  +0.0  3.04  perf-profile.children.cycles-pp.down_write
>   6.18  -0.1  6.05  +0.1  6.28  +1.1  7.28  perf-profile.children.cycles-pp.filemap_get_entry
>   12.68  -0.1  12.56  -0.3  12.34  +0.7  13.42  perf-profile.children.cycles-pp.__filemap_get_folio
>   13.58  -0.1  13.47  -0.5  13.13  +0.7  14.29  perf-profile.children.cycles-pp.simple_write_begin
>   58.65  -0.1  58.58  -2.5  56.11  +0.7  59.36  perf-profile.children.cycles-pp.generic_file_write_iter
>   3.64 ± 3%  -0.1  3.58  +4.7  8.36 ± 6%  -0.1  3.56  perf-profile.children.cycles-pp.security_file_permission
>   4.19  -0.1  4.14  -0.2  3.96  -0.2  3.98  perf-profile.children.cycles-pp.__cond_resched
>   2.67 ± 5%  -0.1  2.61  +4.7  7.40 ± 6%  -0.2  2.51  perf-profile.children.cycles-pp.apparmor_file_permission
>   3.81  -0.1  3.75  -0.4  3.45  +0.1  3.90  perf-profile.children.cycles-pp.__fsnotify_parent
>   2.42  -0.0  2.38  -0.2  2.23  -0.2  2.25  perf-profile.children.cycles-pp.rcu_all_qs
>   7.43  -0.0  7.39  -0.5  6.91  -0.5  6.92  perf-profile.children.cycles-pp.entry_SYSCALL_64
>   98.97  -0.0  98.94  +0.1  99.06  +0.1  99.04  perf-profile.children.cycles-pp.write
>   5.69  -0.0  5.66  -0.3  5.44  -0.1  5.62  perf-profile.children.cycles-pp.simple_write_end
>   1.21  -0.0  1.18  -0.1  1.06  -0.2  1.04  perf-profile.children.cycles-pp.syscall_return_via_sysret
>   75.19  -0.0  75.17  +1.8  76.99  +1.3  76.53  perf-profile.children.cycles-pp.vfs_write
>   1.12  -0.0  1.11  -0.1  1.03  -0.1  1.04  perf-profile.children.cycles-pp.folio_wait_stable
>   1.90  -0.0  1.88  -0.1  1.80  -0.1  1.84  perf-profile.children.cycles-pp.folio_unlock
>   1.99  -0.0  1.98  -0.2  1.82  -0.1  1.86  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>   0.76  -0.0  0.75  -0.1  0.70  -0.1  0.70  perf-profile.children.cycles-pp.x64_sys_call
>   0.23  -0.0  0.23 ± 2%  -0.0  0.22  -0.0  0.22  perf-profile.children.cycles-pp.file_remove_privs
>   2.47  -0.0  2.46  -0.1  2.37  -0.1  2.41  perf-profile.children.cycles-pp.xas_load
>   0.64  -0.0  0.64  -0.1  0.58  -0.0  0.60  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>   0.37  -0.0  0.37  -0.0  0.35  -0.0  0.35  perf-profile.children.cycles-pp.__x64_sys_write
>   0.36  -0.0  0.36  -0.0  0.33  -0.0  0.34  perf-profile.children.cycles-pp.amd_clear_divider
>   1.26  -0.0  1.26  -0.1  1.16  -0.1  1.19  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
>   0.59  -0.0  0.59  -0.1  0.54  -0.0  0.56  perf-profile.children.cycles-pp.inode_to_bdi
>   1.74  -0.0  1.74  -0.1  1.62  -0.1  1.66  perf-profile.children.cycles-pp.up_write
>   1.05  -0.0  1.05  -0.1  0.97  -0.0  1.01  perf-profile.children.cycles-pp.folio_mapping
>   0.36  +0.0  0.36  -0.0  0.34  -0.0  0.34  perf-profile.children.cycles-pp.is_bad_inode
>   0.33  +0.0  0.33  -0.0  0.30  -0.0  0.31  perf-profile.children.cycles-pp.noop_dirty_folio
>   0.25  +0.0  0.25  -0.0  0.23  -0.0  0.23  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
>   1.10  +0.0  1.10  -0.1  1.04  -0.1  1.04  perf-profile.children.cycles-pp.xattr_resolve_name
>   0.85  +0.0  0.85  -0.0  0.80  -0.0  0.82  perf-profile.children.cycles-pp.setattr_should_drop_suidgid
>   0.55  +0.0  0.55  -0.0  0.52  -0.0  0.52  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>   0.92  +0.0  0.94  -0.1  0.87  +0.0  0.95  perf-profile.children.cycles-pp.folio_mark_dirty
>   0.97  +0.0  0.98  +0.0  0.99  -0.0  0.93  perf-profile.children.cycles-pp.aa_file_perm
>   83.54  +0.0  83.55  +1.3  84.79  +1.1  84.63  perf-profile.children.cycles-pp.do_syscall_64
>   7.14  +0.0  7.15  -0.4  6.72  -0.1  7.03  perf-profile.children.cycles-pp.copy_page_from_iter_atomic
>   0.45 ± 2%  +0.0  0.47  -0.0  0.41  -0.0  0.41  perf-profile.children.cycles-pp.write@plt
>   84.71  +0.0  84.72  +1.2  85.88  +1.1  85.76  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>   4.41  +0.0  4.43  -0.3  4.11  -0.3  4.16  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>   0.98  +0.0  1.00  -0.1  0.87  -0.1  0.89  perf-profile.children.cycles-pp.w_test
>   78.77  +0.0  78.80  +1.6  80.39  +1.4  80.17  perf-profile.children.cycles-pp.ksys_write
>   0.89 ± 3%  +0.0  0.93  +0.1  0.97 ± 3%  +0.3  1.16 ± 7%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>   1.37  +0.0  1.42  -0.1  1.24  -0.0  1.34  perf-profile.children.cycles-pp.strcmp
>   4.46  +0.0  4.51  -0.3  4.15  -0.1  4.35  perf-profile.children.cycles-pp.security_inode_need_killpriv
>   3.75  +0.0  3.80  -0.3  3.47  -0.1  3.63  perf-profile.children.cycles-pp.cap_inode_need_killpriv
>   7.24  +0.1  7.30  -0.4  6.81  -0.1  7.11  perf-profile.children.cycles-pp.file_remove_privs_flags
>   3.39  +0.1  3.45  -0.2  3.15  -0.1  3.29  perf-profile.children.cycles-pp.__vfs_getxattr
>   3.38  +0.1  3.48  +0.1  3.46  +0.4  3.74 ± 2%  perf-profile.children.cycles-pp.inode_needs_update_time
>   3.94  +0.1  4.05  +0.1  4.02  +0.4  4.32  perf-profile.children.cycles-pp.file_update_time
>   38.19  +0.1  38.33  -1.7  36.51  +0.2  38.39  perf-profile.children.cycles-pp.generic_perform_write
>   5.71  +0.2  5.87  -0.2  5.49  -0.2  5.54  perf-profile.children.cycles-pp.fault_in_readable
>   4.50 ± 3%  +0.2  4.67  +5.4  9.86 ± 4%  +0.9  5.35  perf-profile.children.cycles-pp.rw_verify_area
>   6.45  +0.2  6.64  -0.2  6.22  -0.2  6.28  perf-profile.children.cycles-pp.fault_in_iov_iter_readable
>   12.34  +0.2  12.53  -0.4  11.93  +0.2  12.58  perf-profile.children.cycles-pp.__generic_file_write_iter
>   1.98 ± 3%  -0.2  1.79  -0.2  1.74 ± 4%  +0.1  2.04  perf-profile.self.cycles-pp.down_write
>   3.66  -0.1  3.54  +0.2  3.86  +1.2  4.82 ± 2%  perf-profile.self.cycles-pp.filemap_get_entry
>   3.64  -0.1  3.57  -0.4  3.28  +0.1  3.73  perf-profile.self.cycles-pp.__fsnotify_parent
>   1.54 ± 9%  -0.1  1.47 ± 2%  +4.7  6.22 ± 8%  -0.1  1.44  perf-profile.self.cycles-pp.apparmor_file_permission
>   1.71 ± 2%  -0.1  1.65  -0.0  1.69 ± 2%  +0.1  1.82  perf-profile.self.cycles-pp.generic_file_write_iter
>   6.45  -0.1  6.39  -0.5  5.95  -0.5  6.00  perf-profile.self.cycles-pp.write
>   7.25  -0.1  7.20  -0.6  6.64  -0.3  6.98  perf-profile.self.cycles-pp.vfs_write
>   2.35  -0.0  2.31  -0.1  2.25  -0.1  2.22  perf-profile.self.cycles-pp.__cond_resched
>   2.76  -0.0  2.72  -0.1  2.65  -0.0  2.72  perf-profile.self.cycles-pp.simple_write_end
>   0.80 ± 4%  -0.0  0.78 ± 2%  -0.1  0.76 ± 2%  +0.1  0.88  perf-profile.self.cycles-pp.generic_write_check_limits
>   1.20  -0.0  1.18  -0.1  1.06  -0.2  1.03  perf-profile.self.cycles-pp.syscall_return_via_sysret
>   1.41  -0.0  1.39  -0.1  1.30  -0.1  1.34  perf-profile.self.cycles-pp.rcu_all_qs
>   1.76  -0.0  1.75  -0.1  1.68  -0.0  1.72  perf-profile.self.cycles-pp.folio_unlock
>   1.90  -0.0  1.90  -0.1  1.76  -0.1  1.78  perf-profile.self.cycles-pp.do_syscall_64
>   0.54  -0.0  0.53  -0.0  0.50  -0.0  0.50  perf-profile.self.cycles-pp.folio_wait_stable
>   1.80  -0.0  1.80  -0.1  1.72  -0.0  1.75  perf-profile.self.cycles-pp.xas_load
>   1.10  -0.0  1.09  -0.0  1.06  -0.1  1.02  perf-profile.self.cycles-pp.security_file_permission
>   1.48  -0.0  1.47  -0.1  1.36  -0.1  1.38  perf-profile.self.cycles-pp.ksys_write
>   0.62  -0.0  0.62  -0.0  0.58  -0.0  0.58  perf-profile.self.cycles-pp.x64_sys_call
>   0.35  -0.0  0.35  -0.0  0.32  -0.0  0.33  perf-profile.self.cycles-pp.cap_inode_need_killpriv
>   0.24  -0.0  0.24  -0.0  0.23 ± 2%  -0.0  0.23  perf-profile.self.cycles-pp.is_bad_inode
>   0.87  -0.0  0.87  -0.1  0.80  -0.0  0.83  perf-profile.self.cycles-pp.folio_mapping
>   0.24  -0.0  0.24  -0.0  0.22  -0.0  0.22  perf-profile.self.cycles-pp.amd_clear_divider
>   0.70  -0.0  0.70  -0.0  0.67  +0.0  0.72  perf-profile.self.cycles-pp.security_inode_need_killpriv
>   0.52  -0.0  0.52  -0.0  0.47  -0.0  0.49  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
>   1.62  +0.0  1.62  -0.1  1.51  -0.1  1.55  perf-profile.self.cycles-pp.up_write
>   0.22  +0.0  0.22  -0.0  0.21 ± 2%  -0.0  0.21  perf-profile.self.cycles-pp.noop_dirty_folio
>   1.09  +0.0  1.10  -0.1  1.01  -0.1  1.02  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>   0.35  +0.0  0.36  -0.0  0.32  -0.0  0.34  perf-profile.self.cycles-pp.inode_to_bdi
>   1.25  +0.0  1.25  -0.1  1.16  -0.0  1.20  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>   0.25  +0.0  0.25  -0.0  0.23  -0.0  0.23  perf-profile.self.cycles-pp.__x64_sys_write
>   0.79  +0.0  0.80  -0.1  0.74  -0.0  0.75  perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
>   2.04  +0.0  2.04  -0.1  1.95  +0.0  2.04  perf-profile.self.cycles-pp.file_remove_privs_flags
>   0.55  +0.0  0.55  -0.0  0.52  -0.0  0.52  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>   0.61  +0.0  0.62  -0.0  0.58  -0.0  0.59  perf-profile.self.cycles-pp.xattr_resolve_name
>   0.73  +0.0  0.73  -0.0  0.69  -0.0  0.70  perf-profile.self.cycles-pp.setattr_should_drop_suidgid
>   4.51  +0.0  4.52  -0.3  4.23  -0.2  4.27  perf-profile.self.cycles-pp.__filemap_get_folio
>   0.47  +0.0  0.48  -0.0  0.45  +0.0  0.49  perf-profile.self.cycles-pp.folio_mark_dirty
>   0.87  +0.0  0.88  +0.0  0.89  -0.0  0.83  perf-profile.self.cycles-pp.aa_file_perm
>   6.97  +0.0  6.98  -0.4  6.56  -0.1  6.86  perf-profile.self.cycles-pp.copy_page_from_iter_atomic
>   1.39  +0.0  1.40  -0.1  1.30  -0.0  1.34  perf-profile.self.cycles-pp.__vfs_getxattr
>   1.65  +0.0  1.67  -0.1  1.56  -0.1  1.60  perf-profile.self.cycles-pp.entry_SYSCALL_64
>   4.32  +0.0  4.34  -0.3  4.02  -0.3  4.06  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>   0.90  +0.0  0.92  -0.1  0.80  -0.0  0.87  perf-profile.self.cycles-pp.simple_write_begin
>   0.66  +0.0  0.68  -0.0  0.64  +0.0  0.67  perf-profile.self.cycles-pp.file_update_time
>   1.15  +0.0  1.18  -0.0  1.10  -0.0  1.15  perf-profile.self.cycles-pp.__generic_file_write_iter
>   0.78  +0.0  0.81  -0.1  0.69  -0.1  0.71  perf-profile.self.cycles-pp.w_test
>   1.02 ± 2%  +0.0  1.06  -0.1  0.91  -0.0  1.00  perf-profile.self.cycles-pp.strcmp
>   1.50  +0.0  1.53  +0.0  1.52  +0.1  1.58  perf-profile.self.cycles-pp.inode_needs_update_time
>   0.78 ± 4%  +0.0  0.82  +0.1  0.86 ± 3%  +0.3  1.05 ± 8%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>   3.35  +0.1  3.44  -0.2  3.18  -0.0  3.31  perf-profile.self.cycles-pp.generic_perform_write
>   5.56  +0.2  5.72  -0.2  5.33  -0.2  5.38  perf-profile.self.cycles-pp.fault_in_readable
>   0.85  +0.2  1.09  +0.7  1.50  +1.1  1.92  perf-profile.self.cycles-pp.rw_verify_area
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-05 5:48 ` Amir Goldstein
2024-07-08 5:40 ` Oliver Sang
@ 2024-07-25 13:41 ` Jan Kara
2024-07-25 14:04 ` Amir Goldstein
1 sibling, 1 reply; 17+ messages in thread
From: Jan Kara @ 2024-07-25 13:41 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Oliver Sang, Jan Kara, oe-lkp, lkp

On Fri 05-07-24 08:48:28, Amir Goldstein wrote:
> On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> >
> > [...]
> >
> > > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > > >   "unixbench.throughput": [
> > > >     121545292.8,
> > > >     121629889.4,
> > > >     121598992.0,
> > > >     121492095.5,
> > > >     121645038.1,
> > > >     121556286.9
> > > >   ],
> > > >
> > >
> > > Are all those runs from the same boot?
> >
> > no. we reboot machine before each run.
> >
> > >
> > > > for the branch tip a82fd282befc7:
> > > >   "unixbench.throughput": [
> > > >     116675606.7,
> > > >     116840611.2,
> > > >     116738966.0,
> > > >     116956953.1,
> > > >     116704901.9,
> > > >     116997628.3,
> > > >     117141733.7,
> > > >     116660495.4
> > > >   ],
> > > >
> > >
> > > And these run?
> >
> > same.
> >
> > >
> > > Otherwise, we might have a fluctuation that happens at boot time
> > > or at mount time or something.
> > >
> > > >
> > > > let me combine the results from this branch together:
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > >   v6.10-rc1
> > > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > >   a82fd282befc7 fanotify: report file range info with pre-content
> > > >
> > > >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > > >              \          |                \          |                \          |                \          |                \
> > > >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> > > >
> > > >
> > > > one thing I want to mention is the "%change" is always comparing to the first
> > > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > > > and so on.
> > >
> > > Thanks for clarifying - I did not read it this way.
> > >
> > > >
> > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > -2.4% regression compareing to 94167e071109d.
> > > >
> > > > from above table, along the branch, the performance is kind of fluctuating,
> > > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > > >
> > >
> > > I can understand why 64108c0b47db91b would regress performance, but I
> > > cannot think
> > > of any possible explanation why a82fd282befc should improve performance,
> > > so I have to wonder if the regression to -6.6% is not a fluke of some
> > > specific boot/mount?
> > >
> > > I pushed a test branch to
> > > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > > with an extra patch that un-inlines some helpers to help bisect the
> > > perf report better.
> > > Maybe produce the report with this commit and it sheds some light.
> >
> > since
> >
> > * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > * f301cd18006c3 fanotify: rename a misnamed constant
> > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > * aca4084213276 fsnotify: generate pre-content permission event on open
> > * 93656e196b006 fsnotify: introduce pre-content permission event
> > * 1613e604df0cd (tag: v6.10-rc1,
> >
> > we run tests upon new commit. summary report is as below:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> >   v6.10-rc1
> >   a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> >   388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> >
> >        v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> > ---------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \
> >  1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
> >:
> >
> > since Jan mentioned in a later mail that perf profiles are useful, I put details
> > as below
>
> Thanks.
> That clarifies that the cycles are spent in the "optimization code" itself.

BTW, Amir how did you decide that the time is spent in the "optimization
code"? I've seen in the perf output there are more cache misses, smaller
IPC, but didn't see a particular place where this would be happening...

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
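
Editor's aside on the quoted "%change" convention: because every column in the lkp table is compared against the first one (v6.10-rc1), the commit-to-commit deltas that Oliver mentions (such as the roughly -2.4% between 94167e071109d and 64108c0b47db9) have to be derived by hand. The following is a minimal sketch of that arithmetic, assuming the rounded throughput values from the quoted table. The variable and function names are illustrative only, and "previous" means the previous measured commit, not necessarily the direct git parent (f301cd18006c3 was not measured). Because the inputs are rounded to four digits, the recomputed baseline percentages can differ from the report by about 0.1.

```python
# Rounded unixbench.throughput values copied from the quoted comparison table,
# in branch order. All names below are illustrative, not part of any lkp tool.
throughput = {
    "v6.10-rc1":     1.216e08,
    "68e04c2451ba0": 1.174e08,
    "94167e071109d": 1.163e08,
    "64108c0b47db9": 1.135e08,
    "a82fd282befc7": 1.168e08,
}


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new / old - 1.0) * 100.0


commits = list(throughput)  # dicts preserve insertion order
baseline = commits[0]

# What the report prints: every commit against the first column.
vs_baseline = {
    c: pct_change(throughput[c], throughput[baseline]) for c in commits[1:]
}

# Per-step view: every commit against the previous measured commit.
vs_previous = {
    cur: pct_change(throughput[cur], throughput[prev])
    for prev, cur in zip(commits, commits[1:])
}

for c in commits[1:]:
    print(f"{c}: {vs_baseline[c]:+.1f}% vs {baseline}, "
          f"{vs_previous[c]:+.1f}% vs previous measured commit")
```

Read this way, the step into 64108c0b47db9 is the largest single drop, and the step into a82fd282befc7 is a gain, which is the fluctuation Amir asks about.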
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-25 13:41 ` Jan Kara
@ 2024-07-25 14:04 ` Amir Goldstein
0 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-25 14:04 UTC (permalink / raw)
To: Jan Kara; +Cc: Oliver Sang, oe-lkp, lkp

On Thu, Jul 25, 2024 at 4:41 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 05-07-24 08:48:28, Amir Goldstein wrote:
> > On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> > >
> > > [...]
> > >
> > > > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > > > >   "unixbench.throughput": [
> > > > >     121545292.8,
> > > > >     121629889.4,
> > > > >     121598992.0,
> > > > >     121492095.5,
> > > > >     121645038.1,
> > > > >     121556286.9
> > > > >   ],
> > > > >
> > > >
> > > > Are all those runs from the same boot?
> > >
> > > no. we reboot machine before each run.
> > >
> > > >
> > > > > for the branch tip a82fd282befc7:
> > > > >   "unixbench.throughput": [
> > > > >     116675606.7,
> > > > >     116840611.2,
> > > > >     116738966.0,
> > > > >     116956953.1,
> > > > >     116704901.9,
> > > > >     116997628.3,
> > > > >     117141733.7,
> > > > >     116660495.4
> > > > >   ],
> > > > >
> > > >
> > > > And these run?
> > >
> > > same.
> > >
> > > >
> > > > Otherwise, we might have a fluctuation that happens at boot time
> > > > or at mount time or something.
> > > >
> > > > >
> > > > > let me combine the results from this branch together:
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > >   v6.10-rc1
> > > > >   68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > > >   94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > > >   64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > > >   a82fd282befc7 fanotify: report file range info with pre-content
> > > > >
> > > > >        v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > > >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> > > > >              \          |                \          |                \          |                \          |                \
> > > > >  1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
> > > > >
> > > > >
> > > > > one thing I want to mention is the "%change" is always comparing to the first
> > > > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > > > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > > > > and so on.
> > > >
> > > > Thanks for clarifying - I did not read it this way.
> > > >
> > > > >
> > > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > > -2.4% regression compareing to 94167e071109d.
> > > > >
> > > > > from above table, along the branch, the performance is kind of fluctuating,
> > > > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > > > >
> > > >
> > > > I can understand why 64108c0b47db91b would regress performance, but I
> > > > cannot think
> > > > of any possible explanation why a82fd282befc should improve performance,
> > > > so I have to wonder if the regression to -6.6% is not a fluke of some
> > > > specific boot/mount?
> > > >
> > > > I pushed a test branch to
> > > > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > > > with an extra patch that un-inlines some helpers to help bisect the
> > > > perf report better.
> > > > Maybe produce the report with this commit and it sheds some light.
> > >
> > > since
> > >
> > > * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > * 1613e604df0cd (tag: v6.10-rc1,
> > >
> > > we run tests upon new commit. summary report is as below:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > >   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > >   v6.10-rc1
> > >   a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > >   388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > >
> > >        v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> > > ---------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \
> > >  1.216e+08            -3.9%  1.168e+08            -4.1%  1.166e+08        unixbench.throughput
> > >:
> > >
> > > since Jan mentioned in a later mail that perf profiles are useful, I put details
> > > as below
> >
> > Thanks.
> > That clarifies that the cycles are spent in the "optimization code" itself.
>
> BTW, Amir how did you decide that the time is spent in the "optimization
> code"? I've seen in the perf output there are more cache misses, smaller
> IPC, but didn't see a particular place where this would be happening...
>

Oh no I just meant that because there is so much inlined code in the hooks,
I couldn't say for sure if the cycles are spent in the optimization code
that tries to avoid fsnotify_parent() or also in fsnotify_parent() inline
wrapper, so I used the extern fsnotify_path() jump point to break inlining.

Maybe this was an unneeded test with obvious outcome, but I began to
suspect that the fsnotify_sb_has_priority_watchers() optimization
may have a bug.

Thanks,
Amir.
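
Editor's aside on run-to-run noise: the per-run samples quoted in this thread make it possible to check whether the drop between v6.10-rc1 and the branch tip is distinguishable from boot-to-boot fluctuation, which is the concern raised about a possible fluke. The sketch below is a rough check under simple assumptions: it only uses the two sample lists quoted above, treats each reboot as an independent run, and compares the mean shift with the coefficient of variation. The helper name `summarize` is made up for illustration, and this is not how lkp itself computes %stddev.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput samples as quoted in the thread
# (one reboot per run, according to Oliver).
base_runs = [  # v6.10-rc1
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
tip_runs = [  # branch tip a82fd282befc7
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]


def summarize(runs):
    """Return (mean, coefficient of variation in percent) for a list of runs."""
    m = mean(runs)
    return m, stdev(runs) / m * 100.0


base_mean, base_cv = summarize(base_runs)
tip_mean, tip_cv = summarize(tip_runs)

# Shift of the tip mean relative to the baseline mean, in percent.
delta_pct = (tip_mean / base_mean - 1.0) * 100.0

print(f"v6.10-rc1:     mean={base_mean:.4g}  cv={base_cv:.2f}%")
print(f"a82fd282befc7: mean={tip_mean:.4g}  cv={tip_cv:.2f}%")
print(f"mean shift:    {delta_pct:+.1f}%")
```

With these samples the mean shift is many times larger than the spread within either set, which is consistent with Oliver's remark that the data are quite stable per commit. It says nothing about where the cycles go; that still needs the perf profiles.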