* [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
From: kernel test robot @ 2024-05-29 8:25 UTC
To: Amir Goldstein; +Cc: oe-lkp, lkp, oliver.sang
Hello,
kernel test robot noticed a -7.9% regression of unixbench.throughput on:
commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
https://github.com/amir73il/linux sb_write_barrier
testcase: unixbench
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
runtime: 300s
nr_task: 100%
test: fsbuffer-w
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.23e+08 -7.9% 1.133e+08 unixbench.throughput
6169 -7.7% 5694 unixbench.time.user_time
4.566e+10 -7.9% 4.206e+10 unixbench.workload
1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions
6891152 +4.8% 7221484 perf-stat.i.branch-misses
29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references
0.91 +2.0% 0.93 perf-stat.i.cpi
7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions
1.26 -2.6% 1.23 perf-stat.i.ipc
0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate%
0.73 +2.7% 0.75 perf-stat.overall.cpi
1.37 -2.6% 1.34 perf-stat.overall.ipc
5828 +5.7% 6162 perf-stat.overall.path-length
1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions
6873687 +4.8% 7203107 perf-stat.ps.branch-misses
29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references
7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions
2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions
57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
2.65 -0.2 2.49 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test
0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
0.74 -0.0 0.69 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
5.17 +0.2 5.42 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write
2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter
38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write
13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin
13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio
7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64
7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry
5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end
4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission
2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission
4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
4.20 -0.3 3.92 perf-profile.children.cycles-pp.__cond_resched
6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags
2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs
3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load
2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos
1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock
3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write
4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv
1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write
3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv
1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test
3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr
1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm
1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend
0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name
1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping
0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start
0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty
0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt
0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider
0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs
1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp
1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate
99.01 +0.1 99.09 perf-profile.children.cycles-pp.write
0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits
0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter
2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks
5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable
6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time
3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time
85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64
79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write
75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write
3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent
5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area
6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
6.52 -0.5 6.01 perf-profile.self.cycles-pp.write
6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write
3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry
4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio
4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end
1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission
2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos
1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock
2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write
2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write
1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs
1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm
1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load
0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test
1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags
1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr
1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend
0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call
0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start
0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name
0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty
0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv
0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi
0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable
0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv
0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.amd_clear_divider
0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt
0.24 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.is_bad_inode
0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
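For readers recomputing the middle column: going by the numbers in the table itself, rows marked with "%" (unixbench.*, perf-stat.*) appear to be relative changes against the base commit, while the perf-profile rows are absolute differences in percentage points. The helpers below are a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of lkp-tests. Because the report prints rounded values, a recomputed delta can differ from the printed one in the last digit.

```c
/* Relative change in percent, as in the unixbench and perf-stat rows. */
double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Absolute change in percentage points, as in the perf-profile rows. */
double point_change(double base, double patched)
{
	return patched - base;
}
```

For example, percent_change(1.23e+08, 1.133e+08) gives roughly -7.9 for unixbench.throughput, and point_change(3.64, 7.64) gives 4.0 for the __fsnotify_parent children row.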
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
From: Amir Goldstein @ 2024-05-29 11:17 UTC
To: Jan Kara, oe-lkp; +Cc: lkp, kernel test robot
On Wed, May 29, 2024 at 11:26 AM kernel test robot
<oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -7.9% regression of unixbench.throughput on:
>
>
> commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> https://github.com/amir73il/linux sb_write_barrier
>
Jan,
I speculate that the regression comes from storing the path information in
struct file_range on the stack and passing it down before the optimizations
in fsnotify_parent() are reached, so rw_verify_area() pays some price for the
stores and __fsnotify_parent() pays a bigger price for the fetches.
Luckily, we already have a way to check for this:
fsnotify_sb_has_priority_watchers(inode->i_sb,
FSNOTIFY_PRIO_PRE_CONTENT)
so I now use it to optimize out the fsnotify_file_range() inline
code entirely when the sb has no pre-content watchers.
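The fast-path idea described above can be modeled in userspace roughly as follows. This is only a sketch and not the code on the branch: the struct layouts, sb_has_pre_content_watchers(), notify_slow_path(), file_range_hook() and count_slow_path_calls() are all hypothetical stand-ins, chosen to show a cheap per-superblock check that runs before any range descriptor is built on the stack.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; NOT the real definitions. */
struct super_block { int pre_content_watchers; };
struct inode { struct super_block *i_sb; };
struct path { const void *mnt; const void *dentry; };
struct file { struct inode *f_inode; struct path f_path; };

/* Assumed shape of the range descriptor; the real struct may differ. */
struct file_range {
	const struct path *path;
	long long pos;
	size_t count;
};

static int slow_path_calls;

/* Models the cheap per-superblock watcher check named in the mail. */
static bool sb_has_pre_content_watchers(const struct super_block *sb)
{
	return sb->pre_content_watchers > 0;
}

/* Models the out-of-line notification call that reads the descriptor. */
static int notify_slow_path(const struct file_range *range)
{
	(void)range;
	slow_path_calls++;
	return 0;
}

static inline int file_range_hook(struct file *file, long long pos, size_t count)
{
	struct file_range range;

	/* Fast path: no watchers, so no stores to the stack and no call. */
	if (!sb_has_pre_content_watchers(file->f_inode->i_sb))
		return 0;

	range.path = &file->f_path;
	range.pos = pos;
	range.count = count;
	return notify_slow_path(&range);
}

/* Simulates a run of 4 KiB writes and reports how many hit the slow path. */
int count_slow_path_calls(int watchers, int writes)
{
	struct super_block sb = { watchers };
	struct inode inode = { &sb };
	struct file file = { &inode, { NULL, NULL } };
	int i;

	slow_path_calls = 0;
	for (i = 0; i < writes; i++)
		file_range_hook(&file, (long long)i * 4096, 4096);
	return slow_path_calls;
}
```

With no watchers, count_slow_path_calls(0, 1000) returns 0, which corresponds to the fsbuffer-w case in the report where no fanotify listener is present; with a watcher registered, every write takes the slow path.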
Oliver,
Can you please re-test with the fixed branch (also rebased on v6.10-rc1):
* a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events
* f301cd18006c - fanotify: rename a misnamed constant
* 64108c0b47db - fanotify: pass optional file access range in pre-content event
* 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
* 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
* 83af0c89527a - fsnotify: generate pre-content permission event on exec
* aca408421327 - fsnotify: generate pre-content permission event on open
* 93656e196b00 - fsnotify: introduce pre-content permission event
The optimization was done in the first commit (fsnotify: introduce
pre-content permission event),
but it affects the regressing commit (fanotify: pass optional file access
range in pre-content event).
There is no need to test all the intermediate commits.
Thanks,
Amir.
> testcase: unixbench
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> parameters:
>
> runtime: 300s
> nr_task: 100%
> test: fsbuffer-w
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> 00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
> 9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
>
> 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.23e+08 -7.9% 1.133e+08 unixbench.throughput
> 6169 -7.7% 5694 unixbench.time.user_time
> 4.566e+10 -7.9% 4.206e+10 unixbench.workload
> 1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions
> 6891152 +4.8% 7221484 perf-stat.i.branch-misses
> 29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references
> 0.91 +2.0% 0.93 perf-stat.i.cpi
> 7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions
> 1.26 -2.6% 1.23 perf-stat.i.ipc
> 0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate%
> 0.73 +2.7% 0.75 perf-stat.overall.cpi
> 1.37 -2.6% 1.34 perf-stat.overall.ipc
> 5828 +5.7% 6162 perf-stat.overall.path-length
> 1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions
> 6873687 +4.8% 7203107 perf-stat.ps.branch-misses
> 29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references
> 7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions
> 2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions
> 57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> 13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> 7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> 2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> 6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> 2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 2.65 -0.2 2.49 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> 3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> 3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> 0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test
> 0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> 2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> 0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> 0.74 -0.0 0.69 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> 0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
> 0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> 11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 5.17 +0.2 5.42 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> 96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write
> 2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> 83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> 5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter
> 38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write
> 13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin
> 13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio
> 7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> 6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry
> 5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end
> 4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission
> 2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission
> 4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 4.20 -0.3 3.92 perf-profile.children.cycles-pp.__cond_resched
> 6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags
> 2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs
> 3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load
> 2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos
> 1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock
> 3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write
> 4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv
> 1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write
> 3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv
> 1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test
> 3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr
> 1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm
> 1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend
> 0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
> 0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> 1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name
> 1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
> 1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping
> 0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start
> 0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty
> 0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
> 0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt
> 0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider
> 0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
> 0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
> 0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
> 0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs
> 1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp
> 1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate
> 99.01 +0.1 99.09 perf-profile.children.cycles-pp.write
> 0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits
> 0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> 11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter
> 2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks
> 5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable
> 6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> 2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time
> 3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time
> 85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64
> 79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write
> 75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write
> 3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent
> 5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area
> 6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> 6.52 -0.5 6.01 perf-profile.self.cycles-pp.write
> 6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write
> 3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry
> 4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio
> 4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end
> 1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission
> 2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos
> 1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock
> 2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write
> 2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
> 1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
> 1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64
> 1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write
> 1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs
> 1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
> 1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
> 0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm
> 1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load
> 0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test
> 1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags
> 1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr
> 1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend
> 0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call
> 0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> 0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start
> 0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
> 0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name
> 0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty
> 0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv
> 0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi
> 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable
> 0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv
> 0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
> 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
> 0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.amd_clear_divider
> 0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
> 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt
> 0.24 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.is_bad_inode
> 0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
> 0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
> 0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> 0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
> 1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
> 0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
> 0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
> 1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
> 1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
> 3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-05-29 11:17 ` Amir Goldstein
@ 2024-05-31 3:15 ` Oliver Sang
2024-05-31 5:18 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-05-31 3:15 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
Hi Amir,
On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> On Wed, May 29, 2024 at 11:26 AM kernel test robot
> <oliver.sang@intel.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> >
> >
> > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > https://github.com/amir73il/linux sb_write_barrier
> >
>
> Jan,
>
> I speculate that the regression is due to the fact that we store and pass the
> path information on struct file_range on the stack before the optimizations
> in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> and __fsnotify_parent() pays a bigger price for fetches?
>
> Luckily, we already have the way to check
> fsnotify_sb_has_priority_watchers(inode->i_sb,
> FSNOTIFY_PRIO_PRE_CONTENT))
> so now I used it to optimize out the fsnotify_file_range() inline
> code entirely.
>
> Oliver,
>
> Can you please re-test with fixed branch (also rebased on v6.10-rc1):
>
> * a82fd282befc - (fan_pre_content) fanotify: report file range info
> with pre-content events
> * f301cd18006c - fanotify: rename a misnamed constant
> * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> * aca408421327 - fsnotify: generate pre-content permission event on open
> * 93656e196b00 - fsnotify: introduce pre-content permission event
>
> The optimization was done in the first commit (fsnotify: introduce
> pre-content permission event),
> but impacts the regressing commit (fanotify: pass optional file access
> range in pre-content event).
> no need to test all middle commits.
I compared the tip directly with v6.10-rc1. There is still a regression, but it is smaller now:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
a82fd282befc7 ("fanotify: report file range info with pre-content events")
v6.10-rc1 a82fd282befc71d99106bf31066
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.216e+08 -3.9% 1.168e+08 unixbench.throughput
Full data is below [1].
Then I checked the new "64108c0b47db - fanotify: pass optional file access range in pre-content event".
It also shows a small regression compared to its parent, but a smaller one than before:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.163e+08 -2.4% 1.135e+08 unixbench.throughput
Full data is below [2].
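As a quick sanity check on the %change column, the two throughput deltas above can be recomputed from the reported values. This is only a minimal sketch: the helper name pct_change is made up for illustration, and the inputs are the rounded figures printed in the tables, so the last digit could differ from what the robot computes on raw data.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# unixbench.throughput values copied from the two comparisons above
comparisons = {
    "v6.10-rc1 -> a82fd282befc7": (1.216e8, 1.168e8),
    "94167e071109d -> 64108c0b47db9": (1.163e8, 1.135e8),
}

for name, (base, new) in comparisons.items():
    # A negative result means throughput dropped relative to the base commit.
    print(f"{name}: {pct_change(base, new):+.1f}%")
```

With the rounded inputs this reproduces the -3.9% and -2.4% figures shown in the tables.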
[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
a82fd282befc7 ("fanotify: report file range info with pre-content events")
v6.10-rc1 a82fd282befc71d99106bf31066
---------------- ---------------------------
%stddev %change %stddev
\ | \
1614 ± 6% +252.4% 5688 ± 67% numa-vmstat.node1.nr_mapped
6199 -5.8% 5841 time.user_time
220234 ± 13% +121.4% 487546 ± 41% numa-meminfo.node0.AnonPages.max
836146 ± 6% -36.0% 535267 ± 45% numa-meminfo.node1.AnonPages.max
6233 ± 7% +251.3% 21898 ± 69% numa-meminfo.node1.Mapped
1.216e+08 -3.9% 1.168e+08 unixbench.throughput
6199 -5.8% 5841 unixbench.time.user_time
4.513e+10 -3.9% 4.338e+10 unixbench.workload
1.458e+11 -2.7% 1.419e+11 perf-stat.i.branch-instructions
11.47 ± 6% +2.6 14.10 ± 9% perf-stat.i.cache-miss-rate%
3915539 ± 8% +510.0% 23884093 ± 9% perf-stat.i.cache-misses
32425619 ± 3% +396.4% 1.61e+08 ± 4% perf-stat.i.cache-references
151202 ± 16% -78.6% 32364 ± 56% perf-stat.i.cycles-between-cache-misses
6.961e+11 -1.9% 6.828e+11 perf-stat.i.instructions
1.22 -1.3% 1.20 perf-stat.i.ipc
0.01 ± 9% +519.5% 0.04 ± 10% perf-stat.overall.MPKI
0.01 +0.0 0.01 perf-stat.overall.branch-miss-rate%
12.09 ± 6% +2.8 14.86 ± 8% perf-stat.overall.cache-miss-rate%
0.75 +2.0% 0.77 perf-stat.overall.cpi
133775 ± 8% -83.5% 22060 ± 9% perf-stat.overall.cycles-between-cache-misses
1.33 -1.9% 1.31 perf-stat.overall.ipc
5721 +2.0% 5836 perf-stat.overall.path-length
1.452e+11 -2.7% 1.413e+11 perf-stat.ps.branch-instructions
3921138 ± 8% +507.4% 23818053 ± 9% perf-stat.ps.cache-misses
32415461 ± 3% +394.4% 1.603e+08 ± 4% perf-stat.ps.cache-references
6.932e+11 -1.9% 6.797e+11 perf-stat.ps.instructions
2.582e+14 -1.9% 2.532e+14 perf-stat.total.instructions
13.19 -0.7 12.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
7.01 -0.2 6.80 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.11 -0.2 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
2.50 -0.1 2.35 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
1.68 -0.1 1.59 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
3.73 -0.1 3.64 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.62 -0.1 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.18 -0.1 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
0.65 -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.w_test
0.92 -0.0 0.87 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.70 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
0.86 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.63 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.86 -0.0 0.83 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
3.53 -0.0 3.50 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.53 -0.0 0.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
0.72 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.75 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
1.13 +0.0 1.17 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
5.30 +0.1 5.36 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
5.30 +0.1 5.38 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
6.17 +0.1 6.27 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
96.84 +0.1 96.98 perf-profile.calltrace.cycles-pp.write
0.78 ± 2% +0.3 1.13 ± 5% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
2.97 +0.6 3.57 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
12.01 +0.6 12.62 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
3.63 +0.6 4.24 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
4.32 +0.6 4.96 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.28 +0.8 38.12 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
84.26 +1.0 85.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
13.39 +1.0 14.36 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.30 +1.0 13.30 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
82.83 +1.0 83.86 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
57.94 +1.3 59.20 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.99 +1.3 7.25 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
78.13 +1.3 79.41 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
74.26 +1.3 75.59 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
7.43 -0.4 7.06 perf-profile.children.cycles-pp.entry_SYSCALL_64
4.42 -0.2 4.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.21 -0.2 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret
7.14 -0.2 6.94 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
4.18 -0.2 4.00 perf-profile.children.cycles-pp.__cond_resched
2.74 -0.2 2.58 perf-profile.children.cycles-pp.apparmor_file_permission
2.42 -0.1 2.30 perf-profile.children.cycles-pp.rcu_all_qs
3.82 -0.1 3.72 perf-profile.children.cycles-pp.__fsnotify_parent
1.74 -0.1 1.65 perf-profile.children.cycles-pp.up_write
1.99 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.99 -0.1 0.91 perf-profile.children.cycles-pp.w_test
3.71 -0.1 3.64 perf-profile.children.cycles-pp.security_file_permission
2.47 -0.1 2.41 perf-profile.children.cycles-pp.xas_load
1.12 -0.1 1.06 perf-profile.children.cycles-pp.folio_wait_stable
1.26 -0.1 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.75 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call
0.98 -0.0 0.94 perf-profile.children.cycles-pp.aa_file_perm
0.46 -0.0 0.42 perf-profile.children.cycles-pp.write@plt
1.10 -0.0 1.07 perf-profile.children.cycles-pp.xattr_resolve_name
0.36 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider
3.76 -0.0 3.73 perf-profile.children.cycles-pp.cap_inode_need_killpriv
0.59 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
3.41 -0.0 3.38 perf-profile.children.cycles-pp.__vfs_getxattr
0.56 -0.0 0.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.05 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping
0.38 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
0.25 -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
0.36 -0.0 0.35 perf-profile.children.cycles-pp.is_bad_inode
1.38 +0.0 1.40 perf-profile.children.cycles-pp.strcmp
0.93 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty
1.07 +0.0 1.09 perf-profile.children.cycles-pp.timestamp_truncate
5.70 +0.0 5.75 perf-profile.children.cycles-pp.simple_write_end
98.96 +0.1 99.02 perf-profile.children.cycles-pp.write
5.69 +0.1 5.75 perf-profile.children.cycles-pp.fault_in_readable
6.42 +0.1 6.53 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
0.89 +0.3 1.24 ± 4% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
3.39 +0.6 3.97 perf-profile.children.cycles-pp.inode_needs_update_time
12.35 +0.6 12.96 perf-profile.children.cycles-pp.__generic_file_write_iter
3.96 +0.6 4.57 perf-profile.children.cycles-pp.file_update_time
4.56 +0.8 5.33 perf-profile.children.cycles-pp.rw_verify_area
38.16 +0.8 39.01 perf-profile.children.cycles-pp.generic_perform_write
13.58 +1.0 14.54 perf-profile.children.cycles-pp.simple_write_begin
84.67 +1.0 85.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
12.68 +1.0 13.68 perf-profile.children.cycles-pp.__filemap_get_folio
83.50 +1.0 84.51 perf-profile.children.cycles-pp.do_syscall_64
58.52 +1.3 59.78 perf-profile.children.cycles-pp.generic_file_write_iter
6.18 +1.3 7.44 ± 3% perf-profile.children.cycles-pp.filemap_get_entry
78.74 +1.3 80.00 perf-profile.children.cycles-pp.ksys_write
75.13 +1.3 76.42 perf-profile.children.cycles-pp.vfs_write
7.25 -0.6 6.64 perf-profile.self.cycles-pp.vfs_write
6.45 -0.4 6.08 perf-profile.self.cycles-pp.write
4.32 -0.2 4.08 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.21 -0.2 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
6.98 -0.2 6.78 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
4.52 -0.2 4.36 perf-profile.self.cycles-pp.__filemap_get_folio
2.34 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
1.90 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64
1.60 -0.1 1.50 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
1.47 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
1.62 -0.1 1.53 perf-profile.self.cycles-pp.up_write
3.65 -0.1 3.56 perf-profile.self.cycles-pp.__fsnotify_parent
1.09 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.80 -0.1 1.74 perf-profile.self.cycles-pp.xas_load
0.79 -0.1 0.73 perf-profile.self.cycles-pp.w_test
1.10 -0.1 1.04 perf-profile.self.cycles-pp.security_file_permission
1.25 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.66 -0.1 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.41 -0.0 1.36 perf-profile.self.cycles-pp.rcu_all_qs
0.90 -0.0 0.86 perf-profile.self.cycles-pp.simple_write_begin
0.88 -0.0 0.84 perf-profile.self.cycles-pp.aa_file_perm
0.80 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.62 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call
1.39 -0.0 1.36 perf-profile.self.cycles-pp.__vfs_getxattr
0.53 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable
0.87 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping
0.56 -0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.24 -0.0 0.22 perf-profile.self.cycles-pp.amd_clear_divider
0.12 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.write@plt
0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
0.35 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi
0.22 -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
0.66 +0.0 0.69 perf-profile.self.cycles-pp.file_update_time
1.03 +0.0 1.06 perf-profile.self.cycles-pp.strcmp
2.75 +0.0 2.79 perf-profile.self.cycles-pp.simple_write_end
0.72 +0.0 0.77 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
0.87 +0.1 0.92 perf-profile.self.cycles-pp.timestamp_truncate
5.54 +0.1 5.59 perf-profile.self.cycles-pp.fault_in_readable
2.04 +0.1 2.09 perf-profile.self.cycles-pp.file_remove_privs_flags
1.51 +0.2 1.69 perf-profile.self.cycles-pp.inode_needs_update_time
0.78 ± 2% +0.3 1.11 ± 5% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.84 +1.0 1.82 perf-profile.self.cycles-pp.rw_verify_area
3.66 +1.3 4.97 ± 4% perf-profile.self.cycles-pp.filemap_get_entry
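For readers cross-checking the perf-stat rows in [1]: the report does not define perf-stat.overall.MPKI. The sketch below assumes it means cache misses per thousand instructions, taken from the perf-stat.ps.* rows. The function name mpki is hypothetical.

```python
def mpki(cache_misses: float, instructions: float) -> float:
    """Cache misses per thousand instructions (assumed definition of MPKI)."""
    return cache_misses / instructions * 1000.0


# perf-stat.ps.cache-misses and perf-stat.ps.instructions from table [1]
base = mpki(3921138, 6.932e11)   # v6.10-rc1
tip = mpki(23818053, 6.797e11)   # a82fd282befc7

print(f"MPKI base={base:.2f} tip={tip:.2f}")
```

Under that assumption the results round to the 0.01 and 0.04 printed in the perf-stat.overall.MPKI row, which suggests the definition is at least consistent with the table.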
[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
%stddev %change %stddev
\ | \
38903 ±113% +313.8% 160973 ± 66% numa-meminfo.node1.AnonHugePages
1666466 ± 4% -12.2% 1462703 ± 9% numa-numastat.node1.local_node
18.97 ±113% +314.3% 78.59 ± 66% numa-vmstat.node1.nr_anon_transparent_hugepages
6003 -5.6% 5668 time.user_time
1.163e+08 -2.4% 1.135e+08 unixbench.throughput
6003 -5.6% 5668 unixbench.time.user_time
4.314e+10 -2.3% 4.215e+10 unixbench.workload
-12.17 +33.7% -16.26 sched_debug.cpu.nr_uninterruptible.min
0.00 ± 95% +600.3% 0.00 ± 88% sched_debug.rt_rq:.rt_time.avg
0.02 ± 95% +600.3% 0.14 ± 88% sched_debug.rt_rq:.rt_time.max
0.00 ± 95% +600.3% 0.01 ± 88% sched_debug.rt_rq:.rt_time.stddev
1.407e+11 -2.0% 1.379e+11 perf-stat.i.branch-instructions
0.55 -0.0 0.51 ± 4% perf-stat.i.branch-miss-rate%
55780077 -85.5% 8078438 perf-stat.i.branch-misses
5029827 ± 6% +315.5% 20897838 ± 10% perf-stat.i.cache-misses
35311245 ± 2% +328.2% 1.512e+08 ± 6% perf-stat.i.cache-references
118639 ± 18% -61.7% 45421 ± 41% perf-stat.i.cycles-between-cache-misses
6.736e+11 -1.5% 6.634e+11 perf-stat.i.instructions
0.01 ± 6% +321.2% 0.03 ± 10% perf-stat.overall.MPKI
0.04 -0.0 0.01 perf-stat.overall.branch-miss-rate%
0.78 +1.5% 0.79 perf-stat.overall.cpi
103942 ± 6% -75.7% 25208 ± 10% perf-stat.overall.cycles-between-cache-misses
1.29 -1.5% 1.27 perf-stat.overall.ipc
1.4e+11 -1.9% 1.373e+11 perf-stat.ps.branch-instructions
55517704 -85.5% 8057745 perf-stat.ps.branch-misses
5026889 ± 6% +315.3% 20876882 ± 10% perf-stat.ps.cache-misses
35229110 ± 2% +327.6% 1.506e+08 ± 6% perf-stat.ps.cache-references
6.701e+11 -1.4% 6.608e+11 perf-stat.ps.instructions
2.496e+14 -1.4% 2.46e+14 perf-stat.total.instructions
3.61 -0.5 3.09 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
2.66 -0.5 2.18 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
12.62 -0.5 12.16 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
7.29 -0.3 7.03 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
4.98 -0.2 4.74 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.96 -0.2 6.74 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
4.50 -0.2 4.33 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.74 -0.2 3.58 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
12.82 -0.2 12.66 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
1.04 -0.1 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
2.87 -0.1 2.78 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
5.27 -0.1 5.18 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.81 -0.1 0.75 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
2.17 -0.0 2.13 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
1.66 -0.0 1.62 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.74 -0.0 0.70 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
1.27 -0.0 1.23 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.84 -0.0 0.81 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
0.61 -0.0 0.57 perf-profile.calltrace.cycles-pp.w_test
0.88 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.90 -0.0 0.87 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.61 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.71 -0.0 0.69 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.00 ± 4% +0.1 1.10 ± 3% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
2.68 +0.1 2.79 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
5.18 +0.1 5.31 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
6.00 +0.1 6.14 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
3.35 +0.1 3.48 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
4.00 +0.1 4.14 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
96.95 +0.2 97.12 perf-profile.calltrace.cycles-pp.write
3.68 +0.2 3.86 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
84.87 +0.6 85.45 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
83.48 +0.6 84.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
78.96 +0.7 79.70 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
75.00 +0.8 75.81 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
58.04 +1.1 59.14 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.40 +1.2 37.55 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
12.88 +1.3 14.20 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
11.79 +1.3 13.13 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
5.74 +1.4 7.16 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
3.98 -0.5 3.44 perf-profile.children.cycles-pp.security_file_permission
2.90 -0.5 2.40 perf-profile.children.cycles-pp.apparmor_file_permission
7.66 -0.3 7.38 perf-profile.children.cycles-pp.file_remove_privs_flags
7.14 -0.3 6.89 perf-profile.children.cycles-pp.entry_SYSCALL_64
5.34 -0.2 5.10 perf-profile.children.cycles-pp.rw_verify_area
7.10 -0.2 6.88 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
3.98 -0.2 3.80 perf-profile.children.cycles-pp.cap_inode_need_killpriv
4.73 -0.2 4.56 perf-profile.children.cycles-pp.security_inode_need_killpriv
13.16 -0.2 13.00 perf-profile.children.cycles-pp.__generic_file_write_iter
1.14 -0.1 1.01 perf-profile.children.cycles-pp.syscall_return_via_sysret
5.67 -0.1 5.56 perf-profile.children.cycles-pp.simple_write_end
3.56 -0.1 3.46 perf-profile.children.cycles-pp.__vfs_getxattr
4.04 -0.1 3.95 perf-profile.children.cycles-pp.__cond_resched
4.21 -0.1 4.14 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
2.31 -0.1 2.24 perf-profile.children.cycles-pp.rcu_all_qs
0.99 -0.1 0.93 perf-profile.children.cycles-pp.folio_mark_dirty
2.46 -0.0 2.42 perf-profile.children.cycles-pp.xas_load
0.93 -0.0 0.88 ± 2% perf-profile.children.cycles-pp.w_test
1.50 -0.0 1.46 perf-profile.children.cycles-pp.strcmp
1.72 -0.0 1.68 perf-profile.children.cycles-pp.up_write
0.87 -0.0 0.82 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
1.04 -0.0 1.00 perf-profile.children.cycles-pp.folio_mapping
0.96 -0.0 0.92 perf-profile.children.cycles-pp.aa_file_perm
1.23 -0.0 1.20 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.73 -0.0 0.70 perf-profile.children.cycles-pp.x64_sys_call
1.07 -0.0 1.05 perf-profile.children.cycles-pp.folio_wait_stable
0.43 -0.0 0.41 ± 2% perf-profile.children.cycles-pp.write@plt
1.08 -0.0 1.06 perf-profile.children.cycles-pp.xattr_resolve_name
0.35 -0.0 0.34 perf-profile.children.cycles-pp.__x64_sys_write
99.01 +0.0 99.04 perf-profile.children.cycles-pp.write
1.12 ± 4% +0.1 1.20 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
2.86 +0.1 2.97 perf-profile.children.cycles-pp.down_write
5.50 +0.1 5.63 perf-profile.children.cycles-pp.fault_in_readable
3.75 +0.1 3.88 perf-profile.children.cycles-pp.inode_needs_update_time
4.34 +0.1 4.47 perf-profile.children.cycles-pp.file_update_time
6.25 +0.1 6.39 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
3.77 +0.2 3.96 perf-profile.children.cycles-pp.__fsnotify_parent
85.29 +0.6 85.86 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
84.14 +0.6 84.74 perf-profile.children.cycles-pp.do_syscall_64
79.57 +0.7 80.28 perf-profile.children.cycles-pp.ksys_write
75.84 +0.8 76.64 perf-profile.children.cycles-pp.vfs_write
58.64 +1.1 59.72 perf-profile.children.cycles-pp.generic_file_write_iter
37.30 +1.1 38.43 perf-profile.children.cycles-pp.generic_perform_write
13.05 +1.3 14.38 perf-profile.children.cycles-pp.simple_write_begin
12.18 +1.3 13.52 perf-profile.children.cycles-pp.__filemap_get_folio
5.94 +1.4 7.35 ± 2% perf-profile.children.cycles-pp.filemap_get_entry
1.77 -0.4 1.35 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
6.23 -0.3 5.94 perf-profile.self.cycles-pp.write
6.94 -0.2 6.71 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
7.14 -0.2 6.93 perf-profile.self.cycles-pp.vfs_write
1.13 -0.1 1.01 perf-profile.self.cycles-pp.syscall_return_via_sysret
1.49 ± 3% -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
4.12 -0.1 4.03 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.42 -0.1 0.34 perf-profile.self.cycles-pp.cap_inode_need_killpriv
2.17 -0.1 2.10 perf-profile.self.cycles-pp.file_remove_privs_flags
1.08 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
1.83 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
2.74 -0.1 2.68 perf-profile.self.cycles-pp.simple_write_end
0.86 -0.0 0.81 perf-profile.self.cycles-pp.aa_file_perm
1.60 -0.0 1.56 perf-profile.self.cycles-pp.up_write
1.42 -0.0 1.38 perf-profile.self.cycles-pp.__vfs_getxattr
0.86 -0.0 0.82 perf-profile.self.cycles-pp.folio_mapping
1.36 -0.0 1.32 perf-profile.self.cycles-pp.rcu_all_qs
0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_mark_dirty
1.78 -0.0 1.75 perf-profile.self.cycles-pp.xas_load
1.23 -0.0 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.74 -0.0 0.71 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
0.74 -0.0 0.71 ± 2% perf-profile.self.cycles-pp.w_test
1.14 -0.0 1.11 perf-profile.self.cycles-pp.strcmp
2.25 -0.0 2.22 perf-profile.self.cycles-pp.__cond_resched
0.60 -0.0 0.58 perf-profile.self.cycles-pp.x64_sys_call
0.77 -0.0 0.75 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.61 -0.0 0.60 perf-profile.self.cycles-pp.xattr_resolve_name
0.74 +0.0 0.76 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
1.40 +0.1 1.45 perf-profile.self.cycles-pp.generic_write_checks
1.60 +0.1 1.65 perf-profile.self.cycles-pp.inode_needs_update_time
1.00 ± 4% +0.1 1.08 ± 3% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
1.86 ± 2% +0.1 1.98 perf-profile.self.cycles-pp.down_write
5.34 +0.1 5.47 perf-profile.self.cycles-pp.fault_in_readable
3.61 +0.2 3.80 perf-profile.self.cycles-pp.__fsnotify_parent
1.46 +0.3 1.77 perf-profile.self.cycles-pp.rw_verify_area
3.43 +1.4 4.88 ± 3% perf-profile.self.cycles-pp.filemap_get_entry
>
> Thanks,
> Amir.
>
>
>
>
> > testcase: unixbench
> > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > parameters:
> >
> > runtime: 300s
> > nr_task: 100%
> > test: fsbuffer-w
> > cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > 00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > 9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
> >
> > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 1.23e+08 -7.9% 1.133e+08 unixbench.throughput
> > 6169 -7.7% 5694 unixbench.time.user_time
> > 4.566e+10 -7.9% 4.206e+10 unixbench.workload
> > 1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions
> > 6891152 +4.8% 7221484 perf-stat.i.branch-misses
> > 29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references
> > 0.91 +2.0% 0.93 perf-stat.i.cpi
> > 7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions
> > 1.26 -2.6% 1.23 perf-stat.i.ipc
> > 0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate%
> > 0.73 +2.7% 0.75 perf-stat.overall.cpi
> > 1.37 -2.6% 1.34 perf-stat.overall.ipc
> > 5828 +5.7% 6162 perf-stat.overall.path-length
> > 1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions
> > 6873687 +4.8% 7203107 perf-stat.ps.branch-misses
> > 29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references
> > 7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions
> > 2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions
> > 57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > 13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> > 13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > 12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> > 7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > 6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > 5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > 3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> > 2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> > 6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> > 2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> > 2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > 2.65 -0.2 2.49 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> > 3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > 1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > 0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> > 3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > 1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> > 0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test
> > 0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> > 2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> > 0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> > 0.74 -0.0 0.69 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > 0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > 0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> > 0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > 0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> > 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > 0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> > 0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> > 0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
> > 0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> > 11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > 2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > 6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > 5.17 +0.2 5.42 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> > 96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write
> > 2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > 3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> > 84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> > 83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> > 5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter
> > 38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write
> > 13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin
> > 13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio
> > 7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64
> > 7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> > 6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry
> > 5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end
> > 4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission
> > 2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission
> > 4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 4.20 -0.3 3.92 perf-profile.children.cycles-pp.__cond_resched
> > 6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags
> > 2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs
> > 3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load
> > 2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos
> > 1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock
> > 3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write
> > 4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv
> > 1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write
> > 3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv
> > 1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > 1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> > 0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test
> > 3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr
> > 1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm
> > 1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend
> > 0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
> > 0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> > 1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name
> > 1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
> > 1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping
> > 0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start
> > 0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty
> > 0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> > 0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
> > 0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt
> > 0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider
> > 0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
> > 0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
> > 0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
> > 0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs
> > 1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp
> > 1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate
> > 99.01 +0.1 99.09 perf-profile.children.cycles-pp.write
> > 0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits
> > 0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> > 11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter
> > 2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks
> > 5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable
> > 6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> > 2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time
> > 3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time
> > 85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64
> > 79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write
> > 75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write
> > 3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent
> > 5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area
> > 6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> > 6.52 -0.5 6.01 perf-profile.self.cycles-pp.write
> > 6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write
> > 3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry
> > 4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio
> > 4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end
> > 1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission
> > 2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos
> > 1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock
> > 2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write
> > 2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
> > 1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
> > 1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64
> > 1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write
> > 1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs
> > 1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
> > 1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
> > 0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm
> > 1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load
> > 0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test
> > 1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> > 1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags
> > 1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr
> > 1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend
> > 0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> > 0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call
> > 0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> > 0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start
> > 0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
> > 0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> > 0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name
> > 0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty
> > 0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv
> > 0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi
> > 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable
> > 0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv
> > 0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
> > 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
> > 0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.amd_clear_divider
> > 0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
> > 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt
> > 0.24 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.is_bad_inode
> > 0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
> > 0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
> > 0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> > 0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
> > 1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
> > 0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
> > 0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> > 5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
> > 1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
> > 1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
> > 3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-05-31 3:15 ` Oliver Sang
@ 2024-05-31 5:18 ` Amir Goldstein
2024-06-03 8:13 ` Oliver Sang
0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-05-31 5:18 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > <oliver.sang@intel.com> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > >
> > >
> > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > https://github.com/amir73il/linux sb_write_barrier
> > >
> >
> > Jan,
> >
> > I speculate that the regression is due to the fact that we store and pass the
> > path information on struct file_range on the stack before the optimizations
> > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > and __fsnotify_parent() pays a bigger price for fetches?
> >
> > Luckily, we already have a way to check
> > fsnotify_sb_has_priority_watchers(inode->i_sb, FSNOTIFY_PRIO_PRE_CONTENT),
> > so now I used it to optimize out the fsnotify_file_range() inline
> > code entirely.
> >
> > Oliver,
> >
> > Can you please re-test with the fixed branch (also rebased on v6.10-rc1):
> >
> > * a82fd282befc - (fan_pre_content) fanotify: report file range info with pre-content events
> > * f301cd18006c - fanotify: rename a misnamed constant
> > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > * aca408421327 - fsnotify: generate pre-content permission event on open
> > * 93656e196b00 - fsnotify: introduce pre-content permission event
> >
> > The optimization was done in the first commit (fsnotify: introduce
> > pre-content permission event), but it impacts the regressing commit
> > (fanotify: pass optional file access range in pre-content event).
> > There is no need to test all the middle commits.
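The early-out described in the quoted message can be modelled in a few lines. This is only a userspace sketch of the idea, assuming a per-superblock counter of watchers per priority class; `SuperBlock`, `has_priority_watchers()` and `pre_content_hook()` are hypothetical names chosen for illustration and are not the kernel's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical priority classes; the numeric values are illustrative only.
PRIO_NORMAL = 0
PRIO_CONTENT = 1
PRIO_PRE_CONTENT = 2


@dataclass
class SuperBlock:
    # Number of watched objects per priority class (all zero by default).
    watchers: list = field(default_factory=lambda: [0, 0, 0])


def has_priority_watchers(sb: SuperBlock, prio: int) -> bool:
    """Cheap check: is any object on this superblock watched at `prio`?"""
    return sb.watchers[prio] > 0


def pre_content_hook(sb: SuperBlock, path: str, pos: int, count: int, log: list) -> int:
    """Return 0 (allow). The range record is built only if a watcher exists."""
    if not has_priority_watchers(sb, PRIO_PRE_CONTENT):
        return 0  # fast path: nothing is stored or passed down
    # Slow path: assemble the range record and hand it to the notifier (here, a list).
    log.append({"path": path, "pos": pos, "count": count})
    return 0


sb = SuperBlock()
events = []
pre_content_hook(sb, "/tmp/f", 0, 4096, events)   # no watcher: events stays empty
sb.watchers[PRIO_PRE_CONTENT] += 1
pre_content_hook(sb, "/tmp/f", 0, 4096, events)   # watcher present: one record appended
```

The point of the pattern is that the common case (no pre-content watcher) pays for one counter read instead of filling in and passing the range structure.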
>
> I compared the tip directly with v6.10-rc1; it is still a regression, but a smaller one now.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> a82fd282befc7 ("fanotify: report file range info with pre-content events")
>
> v6.10-rc1 a82fd282befc71d99106bf31066
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
>
> The full data is below [1].
>
>
> Then I checked the new "64108c0b47db - fanotify: pass optional file access range in pre-content event".
>
> It also shows a small regression compared to its parent, but that one is smaller as well.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
>
> The full data is below [2].
>
OK, this looks sane; a small overhead in the write path makes sense.
It may have been a tactical mistake to merge this optimization into v6.10-rc1:
a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
before the rest of the pre-content infrastructure, because together they
would still be a performance win.
Can you please compare this branch to v6.9?
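As a rough sketch of what that comparison would need to show, the arithmetic below uses only the throughput values quoted above. The gain that a5e57b4d370c brought relative to v6.9 is not measured anywhere in this thread, so it is treated as a hypothetical input, and `net_vs_v69()` is just an illustrative helper.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from `base` to `new`, in percent."""
    return (new - base) / base * 100.0


# Throughput values copied from the two comparisons quoted above.
tip_vs_rc1 = pct_change(1.216e08, 1.168e08)        # v6.10-rc1 -> a82fd282befc7
commit_vs_parent = pct_change(1.163e08, 1.135e08)  # 94167e071109d -> 64108c0b47db9


def net_vs_v69(gain_vs_v69_pct: float, loss_vs_rc1_pct: float) -> float:
    """Compose a hypothetical v6.9 -> v6.10-rc1 gain with the measured loss."""
    return ((1 + gain_vs_v69_pct / 100.0) * (1 + loss_vs_rc1_pct / 100.0) - 1) * 100.0


# Smallest v6.9 -> v6.10-rc1 gain for which the tip would not be slower than v6.9.
break_even_gain = (1 / (1 + tip_vs_rc1 / 100.0) - 1) * 100.0

print(f"tip vs v6.10-rc1:  {tip_vs_rc1:+.1f}%")
print(f"commit vs parent:  {commit_vs_parent:+.1f}%")
print(f"break-even gain:   {break_even_gain:+.2f}%")
```

Under these numbers the branch is a net win over v6.9 only if v6.10-rc1 was itself roughly 4.1% faster than v6.9 on this test, which is what the requested run would confirm or refute.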
Thanks,
Amir.
>
> [1]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> a82fd282befc7 ("fanotify: report file range info with pre-content events")
>
> v6.10-rc1 a82fd282befc71d99106bf31066
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1614 ± 6% +252.4% 5688 ± 67% numa-vmstat.node1.nr_mapped
> 6199 -5.8% 5841 time.user_time
> 220234 ± 13% +121.4% 487546 ± 41% numa-meminfo.node0.AnonPages.max
> 836146 ± 6% -36.0% 535267 ± 45% numa-meminfo.node1.AnonPages.max
> 6233 ± 7% +251.3% 21898 ± 69% numa-meminfo.node1.Mapped
> 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> 6199 -5.8% 5841 unixbench.time.user_time
> 4.513e+10 -3.9% 4.338e+10 unixbench.workload
> 1.458e+11 -2.7% 1.419e+11 perf-stat.i.branch-instructions
> 11.47 ± 6% +2.6 14.10 ± 9% perf-stat.i.cache-miss-rate%
> 3915539 ± 8% +510.0% 23884093 ± 9% perf-stat.i.cache-misses
> 32425619 ± 3% +396.4% 1.61e+08 ± 4% perf-stat.i.cache-references
> 151202 ± 16% -78.6% 32364 ± 56% perf-stat.i.cycles-between-cache-misses
> 6.961e+11 -1.9% 6.828e+11 perf-stat.i.instructions
> 1.22 -1.3% 1.20 perf-stat.i.ipc
> 0.01 ± 9% +519.5% 0.04 ± 10% perf-stat.overall.MPKI
> 0.01 +0.0 0.01 perf-stat.overall.branch-miss-rate%
> 12.09 ± 6% +2.8 14.86 ± 8% perf-stat.overall.cache-miss-rate%
> 0.75 +2.0% 0.77 perf-stat.overall.cpi
> 133775 ± 8% -83.5% 22060 ± 9% perf-stat.overall.cycles-between-cache-misses
> 1.33 -1.9% 1.31 perf-stat.overall.ipc
> 5721 +2.0% 5836 perf-stat.overall.path-length
> 1.452e+11 -2.7% 1.413e+11 perf-stat.ps.branch-instructions
> 3921138 ± 8% +507.4% 23818053 ± 9% perf-stat.ps.cache-misses
> 32415461 ± 3% +394.4% 1.603e+08 ± 4% perf-stat.ps.cache-references
> 6.932e+11 -1.9% 6.797e+11 perf-stat.ps.instructions
> 2.582e+14 -1.9% 2.532e+14 perf-stat.total.instructions
> 13.19 -0.7 12.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> 7.01 -0.2 6.80 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 1.11 -0.2 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
> 2.50 -0.1 2.35 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> 1.68 -0.1 1.59 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 3.73 -0.1 3.64 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.62 -0.1 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 2.18 -0.1 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> 0.65 -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.w_test
> 0.92 -0.0 0.87 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.70 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
> 0.86 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> 0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 0.63 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.86 -0.0 0.83 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 3.53 -0.0 3.50 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.53 -0.0 0.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
> 0.72 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.75 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> 1.13 +0.0 1.17 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 5.30 +0.1 5.36 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 5.30 +0.1 5.38 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> 6.17 +0.1 6.27 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 96.84 +0.1 96.98 perf-profile.calltrace.cycles-pp.write
> 0.78 ± 2% +0.3 1.13 ± 5% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> 2.97 +0.6 3.57 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 12.01 +0.6 12.62 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 3.63 +0.6 4.24 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 4.32 +0.6 4.96 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 37.28 +0.8 38.12 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 84.26 +1.0 85.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> 13.39 +1.0 14.36 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 12.30 +1.0 13.30 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> 82.83 +1.0 83.86 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 57.94 +1.3 59.20 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 5.99 +1.3 7.25 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 78.13 +1.3 79.41 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 74.26 +1.3 75.59 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 7.43 -0.4 7.06 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 4.42 -0.2 4.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.21 -0.2 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret
> 7.14 -0.2 6.94 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> 4.18 -0.2 4.00 perf-profile.children.cycles-pp.__cond_resched
> 2.74 -0.2 2.58 perf-profile.children.cycles-pp.apparmor_file_permission
> 2.42 -0.1 2.30 perf-profile.children.cycles-pp.rcu_all_qs
> 3.82 -0.1 3.72 perf-profile.children.cycles-pp.__fsnotify_parent
> 1.74 -0.1 1.65 perf-profile.children.cycles-pp.up_write
> 1.99 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 0.99 -0.1 0.91 perf-profile.children.cycles-pp.w_test
> 3.71 -0.1 3.64 perf-profile.children.cycles-pp.security_file_permission
> 2.47 -0.1 2.41 perf-profile.children.cycles-pp.xas_load
> 1.12 -0.1 1.06 perf-profile.children.cycles-pp.folio_wait_stable
> 1.26 -0.1 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.75 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call
> 0.98 -0.0 0.94 perf-profile.children.cycles-pp.aa_file_perm
> 0.46 -0.0 0.42 perf-profile.children.cycles-pp.write@plt
> 1.10 -0.0 1.07 perf-profile.children.cycles-pp.xattr_resolve_name
> 0.36 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider
> 3.76 -0.0 3.73 perf-profile.children.cycles-pp.cap_inode_need_killpriv
> 0.59 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
> 3.41 -0.0 3.38 perf-profile.children.cycles-pp.__vfs_getxattr
> 0.56 -0.0 0.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 1.05 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping
> 0.38 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
> 0.25 -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
> 0.36 -0.0 0.35 perf-profile.children.cycles-pp.is_bad_inode
> 1.38 +0.0 1.40 perf-profile.children.cycles-pp.strcmp
> 0.93 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty
> 1.07 +0.0 1.09 perf-profile.children.cycles-pp.timestamp_truncate
> 5.70 +0.0 5.75 perf-profile.children.cycles-pp.simple_write_end
> 98.96 +0.1 99.02 perf-profile.children.cycles-pp.write
> 5.69 +0.1 5.75 perf-profile.children.cycles-pp.fault_in_readable
> 6.42 +0.1 6.53 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> 0.89 +0.3 1.24 ± 4% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> 3.39 +0.6 3.97 perf-profile.children.cycles-pp.inode_needs_update_time
> 12.35 +0.6 12.96 perf-profile.children.cycles-pp.__generic_file_write_iter
> 3.96 +0.6 4.57 perf-profile.children.cycles-pp.file_update_time
> 4.56 +0.8 5.33 perf-profile.children.cycles-pp.rw_verify_area
> 38.16 +0.8 39.01 perf-profile.children.cycles-pp.generic_perform_write
> 13.58 +1.0 14.54 perf-profile.children.cycles-pp.simple_write_begin
> 84.67 +1.0 85.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 12.68 +1.0 13.68 perf-profile.children.cycles-pp.__filemap_get_folio
> 83.50 +1.0 84.51 perf-profile.children.cycles-pp.do_syscall_64
> 58.52 +1.3 59.78 perf-profile.children.cycles-pp.generic_file_write_iter
> 6.18 +1.3 7.44 ± 3% perf-profile.children.cycles-pp.filemap_get_entry
> 78.74 +1.3 80.00 perf-profile.children.cycles-pp.ksys_write
> 75.13 +1.3 76.42 perf-profile.children.cycles-pp.vfs_write
> 7.25 -0.6 6.64 perf-profile.self.cycles-pp.vfs_write
> 6.45 -0.4 6.08 perf-profile.self.cycles-pp.write
> 4.32 -0.2 4.08 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.21 -0.2 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
> 6.98 -0.2 6.78 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> 4.52 -0.2 4.36 perf-profile.self.cycles-pp.__filemap_get_folio
> 2.34 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
> 1.90 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64
> 1.60 -0.1 1.50 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
> 1.47 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
> 1.62 -0.1 1.53 perf-profile.self.cycles-pp.up_write
> 3.65 -0.1 3.56 perf-profile.self.cycles-pp.__fsnotify_parent
> 1.09 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 1.80 -0.1 1.74 perf-profile.self.cycles-pp.xas_load
> 0.79 -0.1 0.73 perf-profile.self.cycles-pp.w_test
> 1.10 -0.1 1.04 perf-profile.self.cycles-pp.security_file_permission
> 1.25 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 1.66 -0.1 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64
> 1.41 -0.0 1.36 perf-profile.self.cycles-pp.rcu_all_qs
> 0.90 -0.0 0.86 perf-profile.self.cycles-pp.simple_write_begin
> 0.88 -0.0 0.84 perf-profile.self.cycles-pp.aa_file_perm
> 0.80 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.62 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call
> 1.39 -0.0 1.36 perf-profile.self.cycles-pp.__vfs_getxattr
> 0.53 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable
> 0.87 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping
> 0.56 -0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.24 -0.0 0.22 perf-profile.self.cycles-pp.amd_clear_divider
> 0.12 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.write@plt
> 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
> 0.35 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi
> 0.22 -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
> 0.66 +0.0 0.69 perf-profile.self.cycles-pp.file_update_time
> 1.03 +0.0 1.06 perf-profile.self.cycles-pp.strcmp
> 2.75 +0.0 2.79 perf-profile.self.cycles-pp.simple_write_end
> 0.72 +0.0 0.77 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> 0.87 +0.1 0.92 perf-profile.self.cycles-pp.timestamp_truncate
> 5.54 +0.1 5.59 perf-profile.self.cycles-pp.fault_in_readable
> 2.04 +0.1 2.09 perf-profile.self.cycles-pp.file_remove_privs_flags
> 1.51 +0.2 1.69 perf-profile.self.cycles-pp.inode_needs_update_time
> 0.78 ± 2% +0.3 1.11 ± 5% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 0.84 +1.0 1.82 perf-profile.self.cycles-pp.rw_verify_area
> 3.66 +1.3 4.97 ± 4% perf-profile.self.cycles-pp.filemap_get_entry
>
>
> [2]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 38903 ±113% +313.8% 160973 ± 66% numa-meminfo.node1.AnonHugePages
> 1666466 ± 4% -12.2% 1462703 ± 9% numa-numastat.node1.local_node
> 18.97 ±113% +314.3% 78.59 ± 66% numa-vmstat.node1.nr_anon_transparent_hugepages
> 6003 -5.6% 5668 time.user_time
> 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> 6003 -5.6% 5668 unixbench.time.user_time
> 4.314e+10 -2.3% 4.215e+10 unixbench.workload
> -12.17 +33.7% -16.26 sched_debug.cpu.nr_uninterruptible.min
> 0.00 ± 95% +600.3% 0.00 ± 88% sched_debug.rt_rq:.rt_time.avg
> 0.02 ± 95% +600.3% 0.14 ± 88% sched_debug.rt_rq:.rt_time.max
> 0.00 ± 95% +600.3% 0.01 ± 88% sched_debug.rt_rq:.rt_time.stddev
> 1.407e+11 -2.0% 1.379e+11 perf-stat.i.branch-instructions
> 0.55 -0.0 0.51 ± 4% perf-stat.i.branch-miss-rate%
> 55780077 -85.5% 8078438 perf-stat.i.branch-misses
> 5029827 ± 6% +315.5% 20897838 ± 10% perf-stat.i.cache-misses
> 35311245 ± 2% +328.2% 1.512e+08 ± 6% perf-stat.i.cache-references
> 118639 ± 18% -61.7% 45421 ± 41% perf-stat.i.cycles-between-cache-misses
> 6.736e+11 -1.5% 6.634e+11 perf-stat.i.instructions
> 0.01 ± 6% +321.2% 0.03 ± 10% perf-stat.overall.MPKI
> 0.04 -0.0 0.01 perf-stat.overall.branch-miss-rate%
> 0.78 +1.5% 0.79 perf-stat.overall.cpi
> 103942 ± 6% -75.7% 25208 ± 10% perf-stat.overall.cycles-between-cache-misses
> 1.29 -1.5% 1.27 perf-stat.overall.ipc
> 1.4e+11 -1.9% 1.373e+11 perf-stat.ps.branch-instructions
> 55517704 -85.5% 8057745 perf-stat.ps.branch-misses
> 5026889 ± 6% +315.3% 20876882 ± 10% perf-stat.ps.cache-misses
> 35229110 ± 2% +327.6% 1.506e+08 ± 6% perf-stat.ps.cache-references
> 6.701e+11 -1.4% 6.608e+11 perf-stat.ps.instructions
> 2.496e+14 -1.4% 2.46e+14 perf-stat.total.instructions
> 3.61 -0.5 3.09 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> 2.66 -0.5 2.18 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> 12.62 -0.5 12.16 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> 7.29 -0.3 7.03 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 4.98 -0.2 4.74 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.96 -0.2 6.74 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 4.50 -0.2 4.33 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 3.74 -0.2 3.58 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> 12.82 -0.2 12.66 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 1.04 -0.1 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
> 2.87 -0.1 2.78 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> 5.27 -0.1 5.18 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 0.81 -0.1 0.75 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> 2.17 -0.0 2.13 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> 1.66 -0.0 1.62 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 0.74 -0.0 0.70 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 1.27 -0.0 1.23 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 0.84 -0.0 0.81 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> 0.61 -0.0 0.57 perf-profile.calltrace.cycles-pp.w_test
> 0.88 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.90 -0.0 0.87 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 0.61 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.71 -0.0 0.69 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.00 ± 4% +0.1 1.10 ± 3% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> 2.68 +0.1 2.79 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 5.18 +0.1 5.31 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> 6.00 +0.1 6.14 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 3.35 +0.1 3.48 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 4.00 +0.1 4.14 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 96.95 +0.2 97.12 perf-profile.calltrace.cycles-pp.write
> 3.68 +0.2 3.86 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 84.87 +0.6 85.45 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> 83.48 +0.6 84.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 78.96 +0.7 79.70 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 75.00 +0.8 75.81 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 58.04 +1.1 59.14 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 36.40 +1.2 37.55 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 12.88 +1.3 14.20 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 11.79 +1.3 13.13 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> 5.74 +1.4 7.16 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 3.98 -0.5 3.44 perf-profile.children.cycles-pp.security_file_permission
> 2.90 -0.5 2.40 perf-profile.children.cycles-pp.apparmor_file_permission
> 7.66 -0.3 7.38 perf-profile.children.cycles-pp.file_remove_privs_flags
> 7.14 -0.3 6.89 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 5.34 -0.2 5.10 perf-profile.children.cycles-pp.rw_verify_area
> 7.10 -0.2 6.88 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> 3.98 -0.2 3.80 perf-profile.children.cycles-pp.cap_inode_need_killpriv
> 4.73 -0.2 4.56 perf-profile.children.cycles-pp.security_inode_need_killpriv
> 13.16 -0.2 13.00 perf-profile.children.cycles-pp.__generic_file_write_iter
> 1.14 -0.1 1.01 perf-profile.children.cycles-pp.syscall_return_via_sysret
> 5.67 -0.1 5.56 perf-profile.children.cycles-pp.simple_write_end
> 3.56 -0.1 3.46 perf-profile.children.cycles-pp.__vfs_getxattr
> 4.04 -0.1 3.95 perf-profile.children.cycles-pp.__cond_resched
> 4.21 -0.1 4.14 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 2.31 -0.1 2.24 perf-profile.children.cycles-pp.rcu_all_qs
> 0.99 -0.1 0.93 perf-profile.children.cycles-pp.folio_mark_dirty
> 2.46 -0.0 2.42 perf-profile.children.cycles-pp.xas_load
> 0.93 -0.0 0.88 ± 2% perf-profile.children.cycles-pp.w_test
> 1.50 -0.0 1.46 perf-profile.children.cycles-pp.strcmp
> 1.72 -0.0 1.68 perf-profile.children.cycles-pp.up_write
> 0.87 -0.0 0.82 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> 1.04 -0.0 1.00 perf-profile.children.cycles-pp.folio_mapping
> 0.96 -0.0 0.92 perf-profile.children.cycles-pp.aa_file_perm
> 1.23 -0.0 1.20 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.73 -0.0 0.70 perf-profile.children.cycles-pp.x64_sys_call
> 1.07 -0.0 1.05 perf-profile.children.cycles-pp.folio_wait_stable
> 0.43 -0.0 0.41 ± 2% perf-profile.children.cycles-pp.write@plt
> 1.08 -0.0 1.06 perf-profile.children.cycles-pp.xattr_resolve_name
> 0.35 -0.0 0.34 perf-profile.children.cycles-pp.__x64_sys_write
> 99.01 +0.0 99.04 perf-profile.children.cycles-pp.write
> 1.12 ± 4% +0.1 1.20 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> 2.86 +0.1 2.97 perf-profile.children.cycles-pp.down_write
> 5.50 +0.1 5.63 perf-profile.children.cycles-pp.fault_in_readable
> 3.75 +0.1 3.88 perf-profile.children.cycles-pp.inode_needs_update_time
> 4.34 +0.1 4.47 perf-profile.children.cycles-pp.file_update_time
> 6.25 +0.1 6.39 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> 3.77 +0.2 3.96 perf-profile.children.cycles-pp.__fsnotify_parent
> 85.29 +0.6 85.86 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 84.14 +0.6 84.74 perf-profile.children.cycles-pp.do_syscall_64
> 79.57 +0.7 80.28 perf-profile.children.cycles-pp.ksys_write
> 75.84 +0.8 76.64 perf-profile.children.cycles-pp.vfs_write
> 58.64 +1.1 59.72 perf-profile.children.cycles-pp.generic_file_write_iter
> 37.30 +1.1 38.43 perf-profile.children.cycles-pp.generic_perform_write
> 13.05 +1.3 14.38 perf-profile.children.cycles-pp.simple_write_begin
> 12.18 +1.3 13.52 perf-profile.children.cycles-pp.__filemap_get_folio
> 5.94 +1.4 7.35 ± 2% perf-profile.children.cycles-pp.filemap_get_entry
> 1.77 -0.4 1.35 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
> 6.23 -0.3 5.94 perf-profile.self.cycles-pp.write
> 6.94 -0.2 6.71 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> 7.14 -0.2 6.93 perf-profile.self.cycles-pp.vfs_write
> 1.13 -0.1 1.01 perf-profile.self.cycles-pp.syscall_return_via_sysret
> 1.49 ± 3% -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
> 4.12 -0.1 4.03 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.42 -0.1 0.34 perf-profile.self.cycles-pp.cap_inode_need_killpriv
> 2.17 -0.1 2.10 perf-profile.self.cycles-pp.file_remove_privs_flags
> 1.08 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
> 1.83 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
> 2.74 -0.1 2.68 perf-profile.self.cycles-pp.simple_write_end
> 0.86 -0.0 0.81 perf-profile.self.cycles-pp.aa_file_perm
> 1.60 -0.0 1.56 perf-profile.self.cycles-pp.up_write
> 1.42 -0.0 1.38 perf-profile.self.cycles-pp.__vfs_getxattr
> 0.86 -0.0 0.82 perf-profile.self.cycles-pp.folio_mapping
> 1.36 -0.0 1.32 perf-profile.self.cycles-pp.rcu_all_qs
> 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_mark_dirty
> 1.78 -0.0 1.75 perf-profile.self.cycles-pp.xas_load
> 1.23 -0.0 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.74 -0.0 0.71 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> 0.74 -0.0 0.71 ± 2% perf-profile.self.cycles-pp.w_test
> 1.14 -0.0 1.11 perf-profile.self.cycles-pp.strcmp
> 2.25 -0.0 2.22 perf-profile.self.cycles-pp.__cond_resched
> 0.60 -0.0 0.58 perf-profile.self.cycles-pp.x64_sys_call
> 0.77 -0.0 0.75 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.61 -0.0 0.60 perf-profile.self.cycles-pp.xattr_resolve_name
> 0.74 +0.0 0.76 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> 1.40 +0.1 1.45 perf-profile.self.cycles-pp.generic_write_checks
> 1.60 +0.1 1.65 perf-profile.self.cycles-pp.inode_needs_update_time
> 1.00 ± 4% +0.1 1.08 ± 3% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 1.86 ± 2% +0.1 1.98 perf-profile.self.cycles-pp.down_write
> 5.34 +0.1 5.47 perf-profile.self.cycles-pp.fault_in_readable
> 3.61 +0.2 3.80 perf-profile.self.cycles-pp.__fsnotify_parent
> 1.46 +0.3 1.77 perf-profile.self.cycles-pp.rw_verify_area
> 3.43 +1.4 4.88 ± 3% perf-profile.self.cycles-pp.filemap_get_entry
>
>
> >
> > Thanks,
> > Amir.
> >
> > > testcase: unixbench
> > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> > > parameters:
> > >
> > > runtime: 300s
> > > nr_task: 100%
> > > test: fsbuffer-w
> > > cpufreq_governor: performance
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add the following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202405291640.2016ebfe-oliver.sang@intel.com
> > >
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > >
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20240529/202405291640.2016ebfe-oliver.sang@intel.com
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > 00c423c0d8 ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > 9d1fd61f1d ("fanotify: pass optional file access range in pre-content event")
> > >
> > > 00c423c0d82eabad 9d1fd61f1d9bb74e44bdcc8767b
> > > ---------------- ---------------------------
> > > %stddev %change %stddev
> > > \ | \
> > > 1.23e+08 -7.9% 1.133e+08 unixbench.throughput
> > > 6169 -7.7% 5694 unixbench.time.user_time
> > > 4.566e+10 -7.9% 4.206e+10 unixbench.workload
> > > 1.513e+11 -4.5% 1.445e+11 perf-stat.i.branch-instructions
> > > 6891152 +4.8% 7221484 perf-stat.i.branch-misses
> > > 29764445 ± 2% -7.4% 27565609 ± 3% perf-stat.i.cache-references
> > > 0.91 +2.0% 0.93 perf-stat.i.cpi
> > > 7.187e+11 -2.7% 6.996e+11 perf-stat.i.instructions
> > > 1.26 -2.6% 1.23 perf-stat.i.ipc
> > > 0.00 +0.0 0.01 perf-stat.overall.branch-miss-rate%
> > > 0.73 +2.7% 0.75 perf-stat.overall.cpi
> > > 1.37 -2.6% 1.34 perf-stat.overall.ipc
> > > 5828 +5.7% 6162 perf-stat.overall.path-length
> > > 1.505e+11 -4.5% 1.437e+11 perf-stat.ps.branch-instructions
> > > 6873687 +4.8% 7203107 perf-stat.ps.branch-misses
> > > 29721957 ± 2% -7.3% 27538369 ± 3% perf-stat.ps.cache-references
> > > 7.148e+11 -2.6% 6.96e+11 perf-stat.ps.instructions
> > > 2.662e+14 -2.6% 2.592e+14 perf-stat.total.instructions
> > > 57.79 -2.0 55.78 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > 37.58 -2.0 35.63 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > > 13.06 -1.0 12.04 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> > > 13.81 -1.0 12.83 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > > 12.72 -0.9 11.78 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> > > 7.00 -0.5 6.47 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > > 6.53 -0.5 6.02 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > > 5.36 -0.5 4.89 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > > 3.66 -0.4 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> > > 2.68 -0.3 2.36 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> > > 6.57 -0.2 6.34 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> > > 2.36 ± 2% -0.2 2.18 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > > 1.83 -0.2 1.66 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> > > 2.92 -0.2 2.76 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > > 2.65 -0.2 2.49 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> > > 3.95 -0.1 3.83 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > > 1.62 -0.1 1.50 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > > 0.74 -0.1 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > 3.26 -0.1 3.17 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> > > 3.57 -0.1 3.49 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > 1.61 -0.1 1.53 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > > 0.93 -0.1 0.85 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > > 1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> > > 0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp.w_test
> > > 0.64 -0.1 0.58 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > > 0.87 -0.1 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> > > 2.50 -0.1 2.44 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> > > 0.62 -0.1 0.56 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
> > > 0.74 -0.0 0.69 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > > 0.91 -0.0 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > > 0.84 -0.0 0.79 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> > > 0.68 -0.0 0.64 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> > > 0.74 -0.0 0.71 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> > > 0.62 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > > 0.97 +0.0 1.00 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> > > 0.91 +0.1 0.97 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> > > 0.86 ± 3% +0.1 0.94 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
> > > 0.58 ± 2% +0.1 0.66 ± 7% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> > > 11.24 +0.1 11.36 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > > 2.01 ± 2% +0.1 2.14 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> > > 6.04 +0.2 6.24 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> > > 5.17 +0.2 5.42 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> > > 96.75 +0.3 97.03 perf-profile.calltrace.cycles-pp.write
> > > 2.57 +0.4 2.92 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> > > 3.20 +0.4 3.57 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> > > 84.82 +1.1 85.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> > > 83.38 +1.2 84.56 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > > 78.73 +1.5 80.20 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > > 74.54 +1.8 76.32 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > > 0.00 +4.0 3.99 perf-profile.calltrace.cycles-pp.__fsnotify_parent.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> > > 5.32 +4.2 9.48 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > 58.42 -2.0 56.38 perf-profile.children.cycles-pp.generic_file_write_iter
> > > 38.46 -2.0 36.50 perf-profile.children.cycles-pp.generic_perform_write
> > > 13.99 -1.0 13.01 perf-profile.children.cycles-pp.simple_write_begin
> > > 13.11 -1.0 12.15 perf-profile.children.cycles-pp.__filemap_get_folio
> > > 7.23 -0.6 6.66 perf-profile.children.cycles-pp.entry_SYSCALL_64
> > > 7.12 -0.5 6.59 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> > > 6.73 -0.5 6.21 perf-profile.children.cycles-pp.filemap_get_entry
> > > 5.76 -0.5 5.26 perf-profile.children.cycles-pp.simple_write_end
> > > 4.05 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission
> > > 2.93 -0.3 2.59 perf-profile.children.cycles-pp.apparmor_file_permission
> > > 4.32 -0.3 4.04 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > > 4.20 -0.3 3.92 perf-profile.children.cycles-pp.__cond_resched
> > > 6.91 -0.2 6.67 perf-profile.children.cycles-pp.file_remove_privs_flags
> > > 2.43 -0.2 2.24 perf-profile.children.cycles-pp.rcu_all_qs
> > > 3.10 -0.2 2.92 perf-profile.children.cycles-pp.xas_load
> > > 2.47 ± 2% -0.2 2.29 ± 2% perf-profile.children.cycles-pp.__fdget_pos
> > > 1.92 -0.2 1.74 perf-profile.children.cycles-pp.folio_unlock
> > > 3.11 -0.2 2.94 perf-profile.children.cycles-pp.down_write
> > > 4.18 -0.1 4.04 perf-profile.children.cycles-pp.security_inode_need_killpriv
> > > 1.68 -0.1 1.56 perf-profile.children.cycles-pp.up_write
> > > 3.48 -0.1 3.38 perf-profile.children.cycles-pp.cap_inode_need_killpriv
> > > 1.96 -0.1 1.87 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > > 1.28 -0.1 1.18 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> > > 0.92 -0.1 0.84 perf-profile.children.cycles-pp.w_test
> > > 3.14 -0.1 3.06 perf-profile.children.cycles-pp.__vfs_getxattr
> > > 1.00 -0.1 0.92 perf-profile.children.cycles-pp.aa_file_perm
> > > 1.29 -0.1 1.22 perf-profile.children.cycles-pp.xas_descend
> > > 0.76 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
> > > 0.87 -0.1 0.80 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> > > 1.07 -0.1 1.01 perf-profile.children.cycles-pp.xattr_resolve_name
> > > 1.10 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
> > > 1.05 -0.1 1.00 perf-profile.children.cycles-pp.folio_mapping
> > > 0.73 -0.1 0.67 perf-profile.children.cycles-pp.xas_start
> > > 0.93 -0.1 0.88 perf-profile.children.cycles-pp.folio_mark_dirty
> > > 0.50 -0.0 0.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> > > 0.60 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
> > > 0.43 -0.0 0.39 perf-profile.children.cycles-pp.write@plt
> > > 0.36 -0.0 0.33 perf-profile.children.cycles-pp.amd_clear_divider
> > > 0.37 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
> > > 0.33 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
> > > 0.36 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
> > > 0.24 -0.0 0.23 ± 2% perf-profile.children.cycles-pp.file_remove_privs
> > > 1.18 +0.0 1.21 perf-profile.children.cycles-pp.strcmp
> > > 1.02 +0.1 1.08 perf-profile.children.cycles-pp.timestamp_truncate
> > > 99.01 +0.1 99.09 perf-profile.children.cycles-pp.write
> > > 0.98 ± 3% +0.1 1.06 perf-profile.children.cycles-pp.generic_write_check_limits
> > > 0.68 ± 2% +0.1 0.77 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> > > 11.58 +0.1 11.69 perf-profile.children.cycles-pp.__generic_file_write_iter
> > > 2.36 ± 2% +0.1 2.50 perf-profile.children.cycles-pp.generic_write_checks
> > > 5.57 +0.2 5.75 perf-profile.children.cycles-pp.fault_in_readable
> > > 6.28 +0.2 6.49 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> > > 2.98 +0.4 3.33 perf-profile.children.cycles-pp.inode_needs_update_time
> > > 3.51 +0.4 3.89 perf-profile.children.cycles-pp.file_update_time
> > > 85.24 +1.1 86.31 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > > 84.05 +1.2 85.21 perf-profile.children.cycles-pp.do_syscall_64
> > > 79.32 +1.5 80.78 perf-profile.children.cycles-pp.ksys_write
> > > 75.49 +1.7 77.21 perf-profile.children.cycles-pp.vfs_write
> > > 3.64 +4.0 7.64 perf-profile.children.cycles-pp.__fsnotify_parent
> > > 5.68 +4.3 10.03 perf-profile.children.cycles-pp.rw_verify_area
> > > 6.96 -0.5 6.44 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> > > 6.52 -0.5 6.01 perf-profile.self.cycles-pp.write
> > > 6.92 -0.4 6.48 perf-profile.self.cycles-pp.vfs_write
> > > 3.59 -0.3 3.24 perf-profile.self.cycles-pp.filemap_get_entry
> > > 4.41 -0.3 4.09 perf-profile.self.cycles-pp.__filemap_get_folio
> > > 4.23 -0.3 3.95 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > > 2.79 -0.3 2.52 perf-profile.self.cycles-pp.simple_write_end
> > > 1.76 -0.2 1.52 perf-profile.self.cycles-pp.apparmor_file_permission
> > > 2.32 ± 2% -0.2 2.16 ± 2% perf-profile.self.cycles-pp.__fdget_pos
> > > 1.79 -0.2 1.62 perf-profile.self.cycles-pp.folio_unlock
> > > 2.05 -0.2 1.89 perf-profile.self.cycles-pp.down_write
> > > 2.35 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
> > > 1.89 -0.1 1.77 perf-profile.self.cycles-pp.do_syscall_64
> > > 1.38 -0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64
> > > 1.56 -0.1 1.45 perf-profile.self.cycles-pp.up_write
> > > 1.30 -0.1 1.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> > > 1.42 -0.1 1.31 perf-profile.self.cycles-pp.rcu_all_qs
> > > 1.12 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
> > > 1.46 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
> > > 0.90 -0.1 0.83 perf-profile.self.cycles-pp.aa_file_perm
> > > 1.29 -0.1 1.22 perf-profile.self.cycles-pp.xas_load
> > > 0.74 -0.1 0.67 perf-profile.self.cycles-pp.w_test
> > > 1.08 -0.1 1.01 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> > > 1.98 -0.1 1.92 perf-profile.self.cycles-pp.file_remove_privs_flags
> > > 1.30 -0.1 1.24 perf-profile.self.cycles-pp.__vfs_getxattr
> > > 1.06 -0.1 1.00 perf-profile.self.cycles-pp.xas_descend
> > > 0.80 -0.1 0.74 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> > > 0.63 -0.1 0.58 perf-profile.self.cycles-pp.x64_sys_call
> > > 0.74 -0.1 0.69 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> > > 0.63 -0.0 0.58 perf-profile.self.cycles-pp.xas_start
> > > 0.87 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
> > > 0.50 -0.0 0.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> > > 0.60 -0.0 0.57 perf-profile.self.cycles-pp.xattr_resolve_name
> > > 0.48 -0.0 0.44 perf-profile.self.cycles-pp.folio_mark_dirty
> > > 0.68 -0.0 0.65 perf-profile.self.cycles-pp.security_inode_need_killpriv
> > > 0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.inode_to_bdi
> > > 0.52 -0.0 0.49 perf-profile.self.cycles-pp.folio_wait_stable
> > > 0.34 -0.0 0.32 perf-profile.self.cycles-pp.cap_inode_need_killpriv
> > > 0.89 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
> > > 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
> > > 0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.amd_clear_divider
> > > 0.23 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
> > > 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.write@plt
> > > 0.24 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.is_bad_inode
> > > 0.62 +0.0 0.65 perf-profile.self.cycles-pp.file_update_time
> > > 0.86 +0.0 0.90 perf-profile.self.cycles-pp.strcmp
> > > 0.69 +0.0 0.74 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> > > 0.75 ± 3% +0.1 0.81 perf-profile.self.cycles-pp.generic_write_check_limits
> > > 1.42 ± 2% +0.1 1.48 perf-profile.self.cycles-pp.generic_write_checks
> > > 0.82 +0.1 0.89 perf-profile.self.cycles-pp.timestamp_truncate
> > > 0.58 ± 3% +0.1 0.66 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> > > 5.44 +0.2 5.60 perf-profile.self.cycles-pp.fault_in_readable
> > > 1.36 +0.2 1.55 perf-profile.self.cycles-pp.inode_needs_update_time
> > > 1.76 ± 3% +0.9 2.64 perf-profile.self.cycles-pp.rw_verify_area
> > > 3.46 +3.8 7.25 perf-profile.self.cycles-pp.__fsnotify_parent
> > >
> > >
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-05-31 5:18 ` Amir Goldstein
@ 2024-06-03 8:13 ` Oliver Sang
2024-06-04 12:33 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-06-03 8:13 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
hi, Amir,
On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > <oliver.sang@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > >
> > > >
> > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > https://github.com/amir73il/linux sb_write_barrier
> > > >
> > >
> > > Jan,
> > >
> > > I speculate that the regression is due to the fact that we store and pass the
> > > path information on struct file_range on the stack before the optimizations
> > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > and __fsnotify_parent() pays a bigger price for fetches?
> > >
> > > Luckily, we already have the way to check
> > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > so now I used it to optimize out the fsnotify_file_range() inline
> > > code entirely.
> > >
> > > Oliver,
> > >
> > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > >
> > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > with pre-content events
> > > * f301cd18006c - fanotify: rename a misnamed constant
> > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > >
> > > The optimization was done in the first commit (fsnotify: introduce
> > > pre-content permission event),
> > > but impacts the regressing commit (fanotify: pass optional file access
> > > range in pre-content event).
> > > no need to test all middle commits.
> >
> > I directly compare the tip with v6.10-rc1, still a regression but better now
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > v6.10-rc1
> > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> >
> > v6.10-rc1 a82fd282befc71d99106bf31066
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> >
> > full data is as below [1]
> >
> >
> > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> >
> > it also has a small regression comparing to its parent, but better also.
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> >
> > 94167e071109d573 64108c0b47db91b20d658a89969
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> >
> > full data is as below [2]
> >
>
> Ok, this looks sane, the small overhead in the write path makes sense.
> It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> before the rest of the pre-content infrastructure, because together they
> would still be a performance win.
>
> Can you please compare this branch to v6.9?
There is no obvious difference between v6.9 and v6.10-rc1 for this test in our runs.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench

commit:
  v6.9
  v6.10-rc1
  a82fd282befc7 ("fanotify: report file range info with pre-content events")

            v6.9                   v6.10-rc1 a82fd282befc71d99106bf31066
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
9218048 ± 19% +33.1% 12267178 ± 6% +14.2% 10523306 ± 7% meminfo.DirectMap2M
151289 +63.8% 247886 ± 6% +61.4% 244132 ± 4% meminfo.DirectMap4k
0.52 +0.1 0.58 +0.1 0.59 mpstat.cpu.all.irq%
0.01 -0.0 0.01 ± 4% -0.0 0.01 mpstat.cpu.all.soft%
10241 -9.1% 9314 ± 11% -15.6% 8648 ± 15% sched_debug.cpu.curr->pid.min
-35.33 -55.4% -15.76 -62.3% -13.31 sched_debug.cpu.nr_uninterruptible.min
109116 ± 96% -85.8% 15473 ±125% +8.6% 118471 ± 81% numa-meminfo.node0.AnonHugePages
4803556 ± 2% -3.5% 4636196 ± 2% -31.7% 3278497 ± 41% numa-meminfo.node0.MemUsed
574474 ± 29% +45.5% 836146 ± 6% -6.8% 535267 ± 45% numa-meminfo.node1.AnonPages.max
1773634 ± 6% +8.9% 1931750 ± 5% +85.6% 3291386 ± 41% numa-meminfo.node1.MemUsed
35.33 ± 15% -73.1% 9.50 ± 24% -58.5% 14.67 ± 27% perf-c2c.DRAM.local
181.67 ± 11% -74.7% 46.00 ± 12% -69.6% 55.17 ± 18% perf-c2c.DRAM.remote
298.67 ± 7% -82.2% 53.17 ± 9% -79.1% 62.33 ± 21% perf-c2c.HITM.local
125.67 ± 15% -77.1% 28.83 ± 15% -72.9% 34.00 ± 22% perf-c2c.HITM.remote
265024 -1.2% 261842 -0.8% 262871 time.involuntary_context_switches
25.33 ± 16% -61.2% 9.83 ± 23% -59.2% 10.33 ± 22% time.major_page_faults
7168 +0.9% 7234 +0.9% 7234 time.maximum_resident_set_size
6286 -1.4% 6199 -7.1% 5841 time.user_time
70712 -1.7% 69536 -0.9% 70096 proc-vmstat.nr_active_anon
9037 +1.4% 9162 ± 2% +2.9% 9301 proc-vmstat.nr_page_table_pages
73584 -1.8% 72274 -1.1% 72752 proc-vmstat.nr_shmem
70712 -1.7% 69536 -0.9% 70096 proc-vmstat.nr_zone_active_anon
35571 ± 8% -9.5% 32176 ± 3% -15.7% 29987 ± 4% proc-vmstat.pgactivate
1.219e+08 -0.2% 1.216e+08 -4.1% 1.168e+08 unixbench.throughput
265024 -1.2% 261842 -0.8% 262871 unixbench.time.involuntary_context_switches
7168 +0.9% 7234 +0.9% 7234 unixbench.time.maximum_resident_set_size
6286 -1.4% 6199 -7.1% 5841 unixbench.time.user_time
4.521e+10 -0.2% 4.513e+10 -4.1% 4.338e+10 unixbench.workload
1.476e+11 -1.2% 1.458e+11 -3.9% 1.419e+11 perf-stat.i.branch-instructions
7506784 -2.1% 7347431 -2.4% 7329399 perf-stat.i.branch-misses
3830897 ± 5% +2.2% 3915539 ± 8% +523.5% 23884093 ± 9% perf-stat.i.cache-misses
30323968 ± 2% +6.9% 32425619 ± 3% +430.9% 1.61e+08 ± 4% perf-stat.i.cache-references
0.94 +1.6% 0.95 +1.6% 0.95 perf-stat.i.cpi
157608 ± 12% -4.1% 151202 ± 16% -79.5% 32364 ± 56% perf-stat.i.cycles-between-cache-misses
7.003e+11 -0.6% 6.961e+11 -2.5% 6.828e+11 perf-stat.i.instructions
1.23 -1.0% 1.22 -2.3% 1.20 perf-stat.i.ipc
0.09 ± 14% -56.1% 0.04 ± 20% -64.9% 0.03 ± 22% perf-stat.i.major-faults
0.01 ± 5% +3.3% 0.01 ± 9% +540.2% 0.04 ± 10% perf-stat.overall.MPKI
0.01 -0.0 0.01 +0.0 0.01 perf-stat.overall.branch-miss-rate%
0.75 +0.6% 0.75 +2.6% 0.77 perf-stat.overall.cpi
136694 ± 5% -2.1% 133775 ± 8% -83.9% 22060 ± 9% perf-stat.overall.cycles-between-cache-misses
1.34 -0.6% 1.33 -2.5% 1.31 perf-stat.overall.ipc
5752 -0.5% 5721 +1.5% 5836 perf-stat.overall.path-length
1.469e+11 -1.2% 1.452e+11 -3.8% 1.413e+11 perf-stat.ps.branch-instructions
3815245 ± 5% +2.8% 3921138 ± 8% +524.3% 23818053 ± 9% perf-stat.ps.cache-misses
30276290 ± 2% +7.1% 32415461 ± 3% +429.3% 1.603e+08 ± 4% perf-stat.ps.cache-references
6.97e+11 -0.5% 6.932e+11 -2.5% 6.797e+11 perf-stat.ps.instructions
0.09 ± 14% -56.0% 0.04 ± 21% -64.6% 0.03 ± 23% perf-stat.ps.major-faults
2.601e+14 -0.7% 2.582e+14 -2.7% 2.532e+14 perf-stat.total.instructions
58.72 -0.8 57.94 +0.5 59.20 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
38.04 -0.8 37.28 +0.1 38.12 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
5.91 -0.6 5.30 -0.5 5.38 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
78.64 -0.5 78.13 +0.8 79.41 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.65 -0.5 2.18 -0.5 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
83.29 -0.5 82.83 +0.6 83.86 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
6.45 -0.5 5.99 +0.8 7.25 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
84.71 -0.4 84.26 +0.5 85.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
6.59 -0.4 6.17 -0.3 6.27 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
74.65 -0.4 74.26 +0.9 75.59 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
13.74 -0.3 13.39 +0.6 14.36 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.63 -0.3 12.30 +0.7 13.30 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
4.62 -0.3 4.32 +0.3 4.96 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.78 ± 2% -0.3 2.50 -0.4 2.35 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
3.62 -0.3 3.34 -0.3 3.28 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
2.06 -0.1 1.92 -0.1 1.95 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
2.92 -0.1 2.81 -0.2 2.72 ± 2% perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.75 -0.0 0.70 -0.1 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
0.93 -0.0 0.89 -0.0 0.88 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
0.99 -0.0 0.96 -0.0 0.98 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
0.74 -0.0 0.72 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.63 -0.0 0.62 -0.0 0.61 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.87 -0.0 0.86 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
0.64 -0.0 0.63 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.75 -0.0 0.75 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
0.68 +0.0 0.68 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.91 +0.0 0.92 -0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.84 +0.0 0.86 -0.0 0.83 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.63 +0.0 0.65 -0.0 0.60 ± 2% perf-profile.calltrace.cycles-pp.w_test
5.29 +0.0 5.30 +0.1 5.36 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.89 +0.0 0.92 -0.0 0.87 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
1.59 +0.0 1.62 -0.0 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.61 ± 2% +0.0 0.64 +0.0 0.63 perf-profile.calltrace.cycles-pp.xas_start.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin
1.62 +0.1 1.68 -0.0 1.59 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
1.05 +0.1 1.11 -0.1 0.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
96.77 +0.1 96.84 +0.2 96.98 perf-profile.calltrace.cycles-pp.write
6.92 +0.1 7.01 -0.1 6.80 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
2.87 +0.1 2.97 +0.7 3.57 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.53 +0.1 3.63 +0.7 4.24 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
1.03 +0.1 1.13 +0.1 1.17 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
2.59 +0.1 2.71 +0.1 2.71 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
3.37 +0.2 3.53 +0.1 3.50 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
0.62 ± 3% +0.2 0.78 ± 2% +0.5 1.13 ± 5% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
13.02 +0.2 13.19 -0.5 12.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
4.06 +0.2 4.23 +0.2 4.22 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
6.66 +0.2 6.88 +0.2 6.91 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
3.48 +0.2 3.73 +0.2 3.64 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
11.67 +0.3 12.01 +0.9 12.62 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.00 +0.5 0.52 +0.5 0.52 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.00 +0.5 0.53 +0.5 0.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
59.32 -0.8 58.52 +0.5 59.78 perf-profile.children.cycles-pp.generic_file_write_iter
38.93 -0.8 38.16 +0.1 39.01 perf-profile.children.cycles-pp.generic_perform_write
3.10 -0.6 2.47 -0.7 2.41 perf-profile.children.cycles-pp.xas_load
79.22 -0.5 78.74 +0.8 80.00 perf-profile.children.cycles-pp.ksys_write
6.64 -0.5 6.18 +0.8 7.44 ± 3% perf-profile.children.cycles-pp.filemap_get_entry
85.12 -0.4 84.67 +0.5 85.63 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
83.94 -0.4 83.50 +0.6 84.51 perf-profile.children.cycles-pp.do_syscall_64
6.86 -0.4 6.42 -0.3 6.53 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
75.55 -0.4 75.13 +0.9 76.42 perf-profile.children.cycles-pp.vfs_write
6.08 -0.4 5.69 -0.3 5.75 perf-profile.children.cycles-pp.fault_in_readable
13.92 -0.3 13.58 +0.6 14.54 perf-profile.children.cycles-pp.simple_write_begin
13.03 -0.3 12.68 +0.7 13.68 perf-profile.children.cycles-pp.__filemap_get_folio
4.86 -0.3 4.56 +0.5 5.33 perf-profile.children.cycles-pp.rw_verify_area
4.01 -0.3 3.71 -0.4 3.64 perf-profile.children.cycles-pp.security_file_permission
3.02 ± 2% -0.3 2.74 -0.4 2.58 perf-profile.children.cycles-pp.apparmor_file_permission
2.42 -0.2 2.25 -0.1 2.27 perf-profile.children.cycles-pp.generic_write_checks
3.11 -0.1 2.98 -0.2 2.90 ± 2% perf-profile.children.cycles-pp.down_write
4.28 -0.1 4.18 -0.3 4.00 perf-profile.children.cycles-pp.__cond_resched
1.05 -0.0 1.00 -0.1 1.00 perf-profile.children.cycles-pp.generic_write_check_limits
98.99 -0.0 98.96 +0.0 99.02 perf-profile.children.cycles-pp.write
2.45 -0.0 2.42 -0.1 2.30 perf-profile.children.cycles-pp.rcu_all_qs
1.10 -0.0 1.07 -0.0 1.09 perf-profile.children.cycles-pp.timestamp_truncate
0.99 -0.0 0.98 -0.1 0.94 perf-profile.children.cycles-pp.aa_file_perm
0.76 -0.0 0.75 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call
0.33 -0.0 0.32 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
0.23 -0.0 0.23 ± 2% -0.0 0.22 perf-profile.children.cycles-pp.file_remove_privs
0.59 +0.0 0.59 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
0.25 +0.0 0.25 -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
0.93 +0.0 0.93 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty
0.36 +0.0 0.36 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider
0.37 +0.0 0.38 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
1.26 +0.0 1.26 -0.0 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
5.69 +0.0 5.70 +0.1 5.75 perf-profile.children.cycles-pp.simple_write_end
0.35 ± 2% +0.0 0.36 +0.0 0.37 perf-profile.children.cycles-pp.__cmd_record
0.35 ± 2% +0.0 0.36 +0.0 0.37 perf-profile.children.cycles-pp.cmd_record
0.35 ± 2% +0.0 0.36 +0.0 0.37 perf-profile.children.cycles-pp.record__mmap_read_evlist
1.08 +0.0 1.10 -0.0 1.07 perf-profile.children.cycles-pp.xattr_resolve_name
0.39 ± 2% +0.0 0.40 ± 3% +0.0 0.41 ± 4% perf-profile.children.cycles-pp.update_process_times
0.42 ± 2% +0.0 0.43 ± 3% +0.0 0.44 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.44 +0.0 0.46 -0.0 0.42 perf-profile.children.cycles-pp.write@plt
0.41 ± 3% +0.0 0.43 ± 3% +0.0 0.43 ± 4% perf-profile.children.cycles-pp.tick_nohz_handler
1.03 +0.0 1.05 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping
0.08 +0.0 0.10 ± 4% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.update_min_vruntime
0.10 +0.0 0.13 ± 2% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.update_curr
0.96 +0.0 0.99 -0.0 0.91 perf-profile.children.cycles-pp.w_test
1.09 +0.0 1.12 -0.0 1.06 perf-profile.children.cycles-pp.folio_wait_stable
1.95 +0.0 1.99 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.00 +0.0 0.04 ± 44% +0.1 0.05 perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.09 +0.1 0.15 ± 5% +0.1 0.16 ± 14% perf-profile.children.cycles-pp.ktime_get
0.10 ± 4% +0.1 0.15 ± 4% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.clockevents_program_event
4.36 +0.1 4.42 -0.2 4.18 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.50 +0.1 0.56 +0.0 0.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.68 +0.1 1.74 -0.0 1.65 perf-profile.children.cycles-pp.up_write
1.14 +0.1 1.21 -0.1 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.55 ± 3% +0.1 0.64 ± 3% +0.1 0.66 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt
7.05 +0.1 7.14 -0.1 6.94 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
0.56 ± 3% +0.1 0.65 ± 3% +0.1 0.67 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.61 ± 2% +0.1 0.70 ± 3% +0.1 0.72 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.58 ± 2% +0.1 0.67 ± 3% +0.1 0.69 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
3.86 +0.1 3.96 +0.7 4.57 perf-profile.children.cycles-pp.file_update_time
3.28 +0.1 3.39 +0.7 3.97 perf-profile.children.cycles-pp.inode_needs_update_time
1.27 +0.1 1.38 +0.1 1.40 perf-profile.children.cycles-pp.strcmp
3.25 +0.2 3.41 +0.1 3.38 perf-profile.children.cycles-pp.__vfs_getxattr
0.73 ± 3% +0.2 0.89 +0.5 1.24 ± 4% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
3.60 +0.2 3.76 +0.1 3.73 perf-profile.children.cycles-pp.cap_inode_need_killpriv
4.29 +0.2 4.47 +0.2 4.44 perf-profile.children.cycles-pp.security_inode_need_killpriv
7.00 +0.2 7.23 +0.3 7.26 perf-profile.children.cycles-pp.file_remove_privs_flags
7.20 +0.2 7.43 -0.1 7.06 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.00 +0.3 0.27 ± 2% +0.3 0.27 ± 2% perf-profile.children.cycles-pp.sched_tick
3.54 +0.3 3.82 +0.2 3.72 perf-profile.children.cycles-pp.__fsnotify_parent
12.00 +0.3 12.35 +1.0 12.96 perf-profile.children.cycles-pp.__generic_file_write_iter
5.93 -0.4 5.54 -0.3 5.59 perf-profile.self.cycles-pp.fault_in_readable
1.86 ± 3% -0.3 1.60 -0.4 1.50 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
1.42 -0.1 1.30 -0.1 1.31 perf-profile.self.cycles-pp.generic_write_checks
2.43 -0.1 2.34 -0.2 2.22 perf-profile.self.cycles-pp.__cond_resched
3.43 -0.1 3.36 -0.1 3.36 perf-profile.self.cycles-pp.generic_perform_write
2.00 -0.1 1.93 -0.1 1.92 ± 2% perf-profile.self.cycles-pp.down_write
1.76 -0.1 1.69 -0.1 1.67 perf-profile.self.cycles-pp.generic_file_write_iter
0.63 ± 2% -0.1 0.56 -0.1 0.56 perf-profile.self.cycles-pp.xas_start
6.51 -0.1 6.45 -0.4 6.08 perf-profile.self.cycles-pp.write
0.77 -0.0 0.72 -0.0 0.77 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
1.28 -0.0 1.25 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.90 -0.0 0.87 +0.0 0.92 perf-profile.self.cycles-pp.timestamp_truncate
1.53 -0.0 1.51 +0.2 1.69 perf-profile.self.cycles-pp.inode_needs_update_time
0.90 -0.0 0.88 -0.1 0.84 perf-profile.self.cycles-pp.aa_file_perm
0.81 -0.0 0.79 -0.0 0.78 perf-profile.self.cycles-pp.generic_write_check_limits
0.86 -0.0 0.84 +1.0 1.82 perf-profile.self.cycles-pp.rw_verify_area
1.11 -0.0 1.10 -0.1 1.04 perf-profile.self.cycles-pp.security_file_permission
1.42 -0.0 1.41 -0.1 1.36 perf-profile.self.cycles-pp.rcu_all_qs
0.63 -0.0 0.62 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call
0.23 ± 2% -0.0 0.22 -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
0.80 -0.0 0.80 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.12 -0.0 0.12 ± 3% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.write@plt
0.90 -0.0 0.90 -0.0 0.86 perf-profile.self.cycles-pp.simple_write_begin
0.25 +0.0 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
2.75 +0.0 2.75 +0.0 2.79 perf-profile.self.cycles-pp.simple_write_end
1.89 +0.0 1.90 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64
0.66 +0.0 0.66 +0.0 0.69 perf-profile.self.cycles-pp.file_update_time
0.52 +0.0 0.53 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable
0.86 +0.0 0.87 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping
0.69 +0.0 0.70 +0.0 0.71 perf-profile.self.cycles-pp.security_inode_need_killpriv
1.08 +0.0 1.09 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.07 +0.0 0.09 ± 5% +0.0 0.08 ± 7% perf-profile.self.cycles-pp.update_min_vruntime
0.77 +0.0 0.79 -0.0 0.73 perf-profile.self.cycles-pp.w_test
0.00 +0.0 0.02 ± 99% +0.1 0.05 perf-profile.self.cycles-pp.ktime_get_update_offsets_now
1.44 +0.0 1.47 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
1.99 +0.0 2.04 +0.1 2.09 perf-profile.self.cycles-pp.file_remove_privs_flags
1.57 +0.1 1.62 -0.0 1.53 perf-profile.self.cycles-pp.up_write
1.33 +0.1 1.39 +0.0 1.36 perf-profile.self.cycles-pp.__vfs_getxattr
0.09 ± 5% +0.1 0.14 ± 5% +0.1 0.15 ± 13% perf-profile.self.cycles-pp.ktime_get
4.27 +0.1 4.32 -0.2 4.08 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.49 +0.1 0.56 +0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
1.14 +0.1 1.21 -0.1 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
6.90 +0.1 6.98 -0.1 6.78 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
4.43 +0.1 4.52 -0.1 4.36 perf-profile.self.cycles-pp.__filemap_get_folio
0.94 +0.1 1.03 +0.1 1.06 perf-profile.self.cycles-pp.strcmp
0.62 ± 3% +0.2 0.78 ± 2% +0.5 1.11 ± 5% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
3.49 +0.2 3.66 +1.5 4.97 ± 4% perf-profile.self.cycles-pp.filemap_get_entry
3.42 +0.2 3.65 +0.1 3.56 perf-profile.self.cycles-pp.__fsnotify_parent
1.35 +0.3 1.66 +0.3 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64
6.92 +0.3 7.25 -0.3 6.64 perf-profile.self.cycles-pp.vfs_write
1.29 +0.5 1.80 +0.4 1.74 perf-profile.self.cycles-pp.xas_load
>
> Thanks,
> Amir.
>
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-06-03 8:13 ` Oliver Sang
@ 2024-06-04 12:33 ` Amir Goldstein
2024-07-01 7:42 ` Oliver Sang
0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-06-04 12:33 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > <oliver.sang@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > >
> > > > >
> > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > >
> > > >
> > > > Jan,
> > > >
> > > > I speculate that the regression is due to the fact that we store and pass the
> > > > path information on struct file_range on the stack before the optimizations
> > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > >
> > > > Luckily, we already have the way to check
> > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > code entirely.
> > > >
> > > > Oliver,
> > > >
> > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > >
> > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > with pre-content events
> > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > >
> > > > The optimization was done in the first commit (fsnotify: introduce
> > > > pre-content permission event),
> > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > range in pre-content event).
> > > > no need to test all middle commits.
> > >
> > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > v6.10-rc1
> > > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > >
> > > v6.10-rc1 a82fd282befc71d99106bf31066
> > > ---------------- ---------------------------
> > > %stddev %change %stddev
> > > \ | \
> > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> > >
> > > full data is as below [1]
> > >
> > >
> > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > >
> > > it also has a small regression comparing to its parent, but better also.
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > >
> > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > ---------------- ---------------------------
> > > %stddev %change %stddev
> > > \ | \
> > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > >
> > > full data is as below [2]
> > >
> >
> > Ok, this looks sane, the small overhead in the write path makes sense.
On second look, while a small regression from 64108c0b47db9 could make
sense, because it changes the inline fsnotify hooks, the extra regression from
the tip of the branch a82fd282befc7 makes no sense at all, as it does not
touch any code that affects the executed functions, so I have to wonder how
reliable those results are.
Could you re-test the commits 94167e071109d..a82fd282befc7?
> > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > before the rest of the pre-content infrastructure, because together they
> > would still be a performance win.
> >
> > Can you please compare this branch to v6.9?
>
> there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
>
This is a bit surprising to me, because a5e57b4d370c should have been a pretty
big performance win for the common case.
Especially, considering that here [1] you reported in pre-merge testing that an
identical commit has improved the fstime-r/unixbench workload
(although with gcc-12):
[1] https://lore.kernel.org/oe-lkp/Zfj3wxDHolB1qCGO@xsang-OptiPlex-9020/
and here [2] that a similar commit had improved writeseek1/will-it-scale
[2] https://lore.kernel.org/all/Zc7KmlQ1cYVrPMQ+@xsang-OptiPlex-9020/
Judging by simple_write_begin() in this regression perf report, and
shmem_file_write_iter in the reports above, may I assume that this report
was with a kernel with non-default config !CONFIG_SHMEM?
Is that correct? Is this an intended config change?
Thanks,
Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-06-04 12:33 ` Amir Goldstein
@ 2024-07-01 7:42 ` Oliver Sang
2024-07-03 5:58 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-01 7:42 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
hi, Amir,
sorry for the late reply.
On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > >
> > > > hi, Amir,
> > > >
> > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > <oliver.sang@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > >
> > > > > >
> > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > >
> > > > >
> > > > > Jan,
> > > > >
> > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > path information on struct file_range on the stack before the optimizations
> > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > >
> > > > > Luckily, we already have the way to check
> > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > code entirely.
> > > > >
> > > > > Oliver,
> > > > >
> > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > >
> > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > with pre-content events
> > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > >
> > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > pre-content permission event),
> > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > range in pre-content event).
> > > > > no need to test all middle commits.
> > > >
> > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > > v6.10-rc1
> > > > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > >
> > > > v6.10-rc1 a82fd282befc71d99106bf31066
> > > > ---------------- ---------------------------
> > > > %stddev %change %stddev
> > > > \ | \
> > > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> > > >
> > > > full data is as below [1]
> > > >
> > > >
> > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > >
> > > > it also has a small regression comparing to its parent, but better also.
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > >
> > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > ---------------- ---------------------------
> > > > %stddev %change %stddev
> > > > \ | \
> > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > > >
> > > > full data is as below [2]
> > > >
> > >
> > > Ok, this looks sane, the small overhead in the write path makes sense.
>
> On second look, while a small regression from 64108c0b47db9 could make
> sense, because it changes the inline fsnotify hooks, the extra regression from
> the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> touch any code that affects the executed functions, so I have to wonder how
> reliable are those results.
>
> Could you re-test the commits 94167e071109d..a82fd282befc7?
since the branch is:
* a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
* f301cd18006c3 fanotify: rename a misnamed constant
* 64108c0b47db9 fanotify: pass optional file access range in pre-content event
* 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
* 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
* 83af0c89527ab fsnotify: generate pre-content permission event on exec
* aca4084213276 fsnotify: generate pre-content permission event on open
* 93656e196b006 fsnotify: introduce pre-content permission event
* 1613e604df0cd (tag: v6.10-rc1,
I made the comparison below, which shows little difference among the 3 commits:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
a82fd282befc7 fanotify: report file range info with pre-content events
68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 1.174e+08            -0.9%  1.163e+08            -0.5%  1.168e+08        unixbench.throughput
>
> > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > before the rest of the pre-content infrastructure, because together they
> > > would still be a performance win.
> > >
> > > Can you please compare this branch to v6.9?
> >
> > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> >
>
> This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> big performance win for the common case.
in this unixbench test of ours, a5e57b4d370c introduces a small regression
compared to its parent (477cf917dd028).
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.9
477cf917dd028 fsnotify: use an enum for group priority constants
a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
v6.10-rc1
            v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919                   v6.10-rc1
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
 1.219e+08            +2.8%  1.253e+08            +0.4%  1.224e+08            -0.2%  1.216e+08        unixbench.throughput
BTW, for a5e57b4d370c, there is another regression report in
https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
which also includes some unixbench improvement results, but for different
sub-tests on a different platform.
>
> Especially, considering that here [1] you reported in pre-merge testing that an
> identical commit has improved the fstime-r/unixbench workload
> (although with gcc-12):
> [1] https://lore.kernel.org/oe-lkp/Zfj3wxDHolB1qCGO@xsang-OptiPlex-9020/
> and here [2] that a similar commit had improved writeseek1/will-it-scale
> [2] https://lore.kernel.org/all/Zc7KmlQ1cYVrPMQ+@xsang-OptiPlex-9020/
>
> Judging by simple_write_begin() in this regression perf report, and
> shmem_file_write_iter in the reports above, may I assume that this report
> was with a kernel with non-default config !CONFIG_SHMEM?
> Is that correct? Is this an intended config change?
we always set CONFIG_SHMEM.
>
> Thanks,
> Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-01 7:42 ` Oliver Sang
@ 2024-07-03 5:58 ` Amir Goldstein
2024-07-03 7:21 ` Oliver Sang
0 siblings, 1 reply; 17+ messages in thread
From: Amir Goldstein @ 2024-07-03 5:58 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> sorry for quite late.
>
> On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > >
> > > > > hi, Amir,
> > > > >
> > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > <oliver.sang@intel.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > >
> > > > > > >
> > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > >
> > > > > >
> > > > > > Jan,
> > > > > >
> > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > >
> > > > > > Luckily, we already have the way to check
> > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > code entirely.
> > > > > >
> > > > > > Oliver,
> > > > > >
> > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > >
> > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > with pre-content events
> > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > >
> > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > pre-content permission event),
> > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > range in pre-content event).
> > > > > > no need to test all middle commits.
> > > > >
> > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > > v6.10-rc1
> > > > > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > >
> > > > > v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > ---------------- ---------------------------
> > > > > %stddev %change %stddev
> > > > > \ | \
> > > > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> > > > >
> > > > > full data is as below [1]
> > > > >
> > > > >
> > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > >
> > > > > it also has a small regression comparing to its parent, but better also.
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > >
> > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > ---------------- ---------------------------
> > > > > %stddev %change %stddev
> > > > > \ | \
> > > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > > > >
> > > > > full data is as below [2]
> > > > >
> > > >
> > > > Ok, this looks sane, the small overhead in the write path makes sense.
> >
> > On second look, while a small regression from 64108c0b47db9 could make
> > sense, because it changes the inline fsnotify hooks, the extra regression from
> > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > touch any code that affects the executed functions, so I have to wonder how
> > reliable are those results.
> >
> > Could you re-test the commits 94167e071109d..a82fd282befc7?
>
> since the branch is:
>
> * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
> * f301cd18006c3 fanotify: rename a misnamed constant
> * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
> * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
> * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> * aca4084213276 fsnotify: generate pre-content permission event on open
> * 93656e196b006 fsnotify: introduce pre-content permission event
> * 1613e604df0cd (tag: v6.10-rc1,
>
>
> I made below comparison, which shows little difference among 3 commits:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> a82fd282befc7 fanotify: report file range info with pre-content events
>
> 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 1.174e+08 -0.9% 1.163e+08 -0.5% 1.168e+08 unixbench.throughput
>
>
Hi Oliver,
Perhaps I am not reading the report right, but how do these numbers reconcile
with the previous report of regression:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
94167e071109d573 64108c0b47db91b20d658a89969
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.163e+08            -2.4%  1.135e+08        unixbench.throughput
Is this a case of unstable results, or something else?
> >
> > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > before the rest of the pre-content infrastructure, because together they
> > > > would still be a performance win.
> > > >
> > > > Can you please compare this branch to v6.9?
> > >
> > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > >
> >
> > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > big performance win for the common case.
>
> in our this unixbench tests, a5e57b4d370c introduce a small regression comparing
> to its parent (477cf917dd028).
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.9
> 477cf917dd028 fsnotify: use an enum for group priority constants
> a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> v6.10-rc1
>
> v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919 v6.10-rc1
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 1.219e+08 +2.8% 1.253e+08 +0.4% 1.224e+08 -0.2% 1.216e+08 unixbench.throughput
>
Assuming this is a stable result,
that's a very small regression compared to the improvements before it
and one that I dare to call acceptable for this micro buffered write benchmark
because of the big gain in other workloads.
>
> BTW, for a5e57b4d370c, there is another regression report in
> https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
> which also includes some unixbench improvement results, but different sub-tests
> on different platform.
>
Right. I forgot about this one.
Sorry for dropping the ball.
I do not know what is going on there.
I will try to take a look again.
Thanks,
Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-03 5:58 ` Amir Goldstein
@ 2024-07-03 7:21 ` Oliver Sang
2024-07-03 16:20 ` Amir Goldstein
0 siblings, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-03 7:21 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp
hi, Amir,
On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > sorry for quite late.
> >
> > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > >
> > > > hi, Amir,
> > > >
> > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > >
> > > > > > hi, Amir,
> > > > > >
> > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > > <oliver.sang@intel.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > > >
> > > > > > > >
> > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > > >
> > > > > > >
> > > > > > > Jan,
> > > > > > >
> > > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > > >
> > > > > > > Luckily, we already have the way to check
> > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > > code entirely.
> > > > > > >
> > > > > > > Oliver,
> > > > > > >
> > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > > >
> > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > > with pre-content events
> > > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > > >
> > > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > > pre-content permission event),
> > > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > > range in pre-content event).
> > > > > > > no need to test all middle commits.
> > > > > >
> > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > > >
> > > > > > =========================================================================================
> > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > >
> > > > > > commit:
> > > > > > v6.10-rc1
> > > > > > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > > >
> > > > > > v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > > ---------------- ---------------------------
> > > > > > %stddev %change %stddev
> > > > > > \ | \
> > > > > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> > > > > >
> > > > > > full data is as below [1]
> > > > > >
> > > > > >
> > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > > >
> > > > > > it also has a small regression comparing to its parent, but better also.
> > > > > >
> > > > > > =========================================================================================
> > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > >
> > > > > > commit:
> > > > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > > >
> > > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > > ---------------- ---------------------------
> > > > > > %stddev %change %stddev
> > > > > > \ | \
> > > > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > > > > >
> > > > > > full data is as below [2]
> > > > > >
> > > > >
> > > > > Ok, this looks sane, the small overhead in the write path makes sense.
> > >
> > > On second look, while a small regression from 64108c0b47db9 could make
> > > sense, because it changes the inline fsnotify hooks, the extra regression from
> > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > > touch any code that affects the executed functions, so I have to wonder how
> > > reliable are those results.
> > >
> > > Could you re-test the commits 94167e071109d..a82fd282befc7?
> >
> > since the branch is:
> >
> > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
> > * f301cd18006c3 fanotify: rename a misnamed constant
> > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
> > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
> > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > * aca4084213276 fsnotify: generate pre-content permission event on open
> > * 93656e196b006 fsnotify: introduce pre-content permission event
> > * 1613e604df0cd (tag: v6.10-rc1,
> >
> >
> > I made below comparison, which shows little difference among 3 commits:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > a82fd282befc7 fanotify: report file range info with pre-content events
> >
> > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- ---------------------------
> > %stddev %change %stddev %change %stddev
> > \ | \ | \
> > 1.174e+08 -0.9% 1.163e+08 -0.5% 1.168e+08 unixbench.throughput
> >
> >
>
> Hi Oliver,
>
> Perhaps I am not reading the report right, but how do these numbers reconcile
> with the previous report of regression:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> 64108c0b47db9 ("fanotify: pass optional file access range in
> pre-content event")
>
> 94167e071109d573 64108c0b47db91b20d658a89969
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
>
> Is this a case of unstable results? something else?
you can see that the data for 94167e071109d is 1.163e+08 in both tables.
the data in our tests seems quite stable for a given commit, for example for v6.10-rc1:
"unixbench.throughput": [
121545292.8,
121629889.4,
121598992.0,
121492095.5,
121645038.1,
121556286.9
],
for the branch tip a82fd282befc7:
"unixbench.throughput": [
116675606.7,
116840611.2,
116738966.0,
116956953.1,
116704901.9,
116997628.3,
117141733.7,
116660495.4
],
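The stability claim can be checked directly from the two sample lists above. The following is a minimal Python sketch, assuming the values in the comparison tables are plain means of the per-run samples; the names `samples` and `summarize` are illustrative and not part of the lkp tooling.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput samples, copied from the two lists above.
samples = {
    "v6.10-rc1": [
        121545292.8, 121629889.4, 121598992.0,
        121492095.5, 121645038.1, 121556286.9,
    ],
    "a82fd282befc7": [
        116675606.7, 116840611.2, 116738966.0, 116956953.1,
        116704901.9, 116997628.3, 117141733.7, 116660495.4,
    ],
}


def summarize(values):
    """Return (mean, sample standard deviation as a percentage of the mean)."""
    m = mean(values)
    return m, 100.0 * stdev(values) / m


stats = {name: summarize(values) for name, values in samples.items()}
for name, (m, rel_sd) in stats.items():
    print(f"{name:>14}  mean={m:.4g}  stddev={rel_sd:.2f}%")

# Relative change of the branch tip against the v6.10-rc1 baseline.
base = stats["v6.10-rc1"][0]
tip = stats["a82fd282befc7"][0]
change = 100.0 * (tip - base) / base
print(f"tip vs v6.10-rc1: {change:+.1f}%")
```

Under that assumption, the run-to-run spread of each commit is far smaller than the gap between the two means, which is why the difference is treated as a real change rather than noise.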
let me combine the results from this branch together:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
64108c0b47db9 fanotify: pass optional file access range in pre-content event
a82fd282befc7 fanotify: report file range info with pre-content
       v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
---------------- --------------------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \          |                \
 1.216e+08            -3.5%  1.174e+08            -4.3%  1.163e+08            -6.6%  1.135e+08            -3.9%  1.168e+08        unixbench.throughput
one thing I want to mention is that the "%change" is always relative to the first
column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression compared to
v6.10-rc1; 94167e071109d has a -4.3% regression, also compared to v6.10-rc1,
and so on.
then, comparing just 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about a
-2.4% regression compared to 94167e071109d.
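To make the two ways of reading the table concrete, here is a minimal sketch that recomputes both views from the rounded means shown in the table. The function name `pct_change` is made up for illustration, and since the inputs are already rounded to four digits, the last digit can differ slightly from what the robot printed from the raw data.

```python
# Mean unixbench.throughput per commit, as rounded in the table above
throughput = {
    "v6.10-rc1": 1.216e8,
    "68e04c2451ba0": 1.174e8,
    "94167e071109d": 1.163e8,
    "64108c0b47db9": 1.135e8,
    "a82fd282befc7": 1.168e8,
}


def pct_change(new, old):
    """Relative change of `new` against `old`, in percent."""
    return 100.0 * (new - old) / old


commits = list(throughput)
baseline = commits[0]  # the first column, which the report compares against

for prev, cur in zip(commits, commits[1:]):
    vs_base = pct_change(throughput[cur], throughput[baseline])
    # "prev" is the previous column in the table, not necessarily the git parent
    vs_prev = pct_change(throughput[cur], throughput[prev])
    print(f"{cur}: {vs_base:+.1f}% vs {baseline}, {vs_prev:+.1f}% vs {prev}")
```

The first figure on each line corresponds to the "%change" column in the report; the second is the step from one column to the next.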
from the above table, the performance fluctuates along the branch: it
dropped most on 64108c0b47db9, but then recovered a little on the tip.
our bot will not bisect the improvement between 64108c0b47db9 and the tip, since
the whole branch shows a drop.
>
> > >
> > > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > > before the rest of the pre-content infrastructure, because together they
> > > > > would still be a performance win.
> > > > >
> > > > > Can you please compare this branch to v6.9?
> > > >
> > > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > > >
> > >
> > > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > > big performance win for the common case.
> >
> > in these unixbench tests of ours, a5e57b4d370c introduces a small regression compared
> > to its parent (477cf917dd028).
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > v6.9
> > 477cf917dd028 fsnotify: use an enum for group priority constants
> > a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> > v6.10-rc1
> >
> > v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919 v6.10-rc1
> > ---------------- --------------------------- --------------------------- ---------------------------
> > %stddev %change %stddev %change %stddev %change %stddev
> > \ | \ | \ | \
> > 1.219e+08 +2.8% 1.253e+08 +0.4% 1.224e+08 -0.2% 1.216e+08 unixbench.throughput
> >
>
> Assuming this is a stable result,
> that's very small regression compared to the improvements before it
> and one that I dare to call acceptable for this micro buffered write benchmark
> because of the big gain in other workloads.
again, all data here is relative to v6.9, so there is a 2.8% improvement on
477cf917dd028 compared to v6.9, but it drops back on a5e57b4d370c6, whose
data is almost the same as v6.9 (so +0.4% compared to v6.9).
anyway, we normally ignore <1% performance changes, so we won't say that
a5e57b4d370c6 or v6.10-rc1 has an obvious performance change compared to v6.9.
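The "<1%" rule of thumb can be sketched as below, using the rounded means from the v6.9 table. This is only an illustration: `NOISE_THRESHOLD_PCT` and `classify` are made-up names, and the bot's real reporting logic is not shown in this thread.

```python
NOISE_THRESHOLD_PCT = 1.0  # assumed cut-off: smaller changes are treated as noise

# Mean unixbench.throughput per commit, as rounded in the v6.9 table
throughput = {
    "v6.9": 1.219e8,
    "477cf917dd028": 1.253e8,
    "a5e57b4d370c6": 1.224e8,
    "v6.10-rc1": 1.216e8,
}


def classify(new, base, threshold=NOISE_THRESHOLD_PCT):
    """Return (percent change vs base, verdict) for one commit."""
    change = 100.0 * (new - base) / base
    if abs(change) < threshold:
        return change, "no obvious change"
    return change, "improvement" if change > 0 else "regression"


base = throughput["v6.9"]
for commit, value in throughput.items():
    if commit == "v6.9":
        continue  # the baseline is not compared against itself
    change, verdict = classify(value, base)
    print(f"{commit}: {change:+.1f}% vs v6.9 -> {verdict}")
```

With these inputs only 477cf917dd028 crosses the threshold; a5e57b4d370c6 and v6.10-rc1 both fall inside the band that is treated as noise.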
>
> >
> > BTW, for a5e57b4d370c, there is another regression report in
> > https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
> > which also includes some unixbench improvement results, but different sub-tests
> > on different platform.
> >
>
> Right. I forgot about this one.
> Sorry for dropping the ball.
> I do not know what is going on there.
> I will try to take a look again.
>
> Thanks,
> Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-03 7:21 ` Oliver Sang
@ 2024-07-03 16:20 ` Amir Goldstein
2024-07-04 15:39 ` Jan Kara
2024-07-05 2:09 ` Oliver Sang
0 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-03 16:20 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Wed, Jul 3, 2024 at 10:21 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> > On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > sorry for quite late.
> > >
> > > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > >
> > > > > hi, Amir,
> > > > >
> > > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > > >
> > > > > > > hi, Amir,
> > > > > > >
> > > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > > > <oliver.sang@intel.com> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > > > >
> > > > > > > >
> > > > > > > > Jan,
> > > > > > > >
> > > > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > > > >
> > > > > > > > Luckily, we already have the way to check
> > > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > > > code entirely.
> > > > > > > >
> > > > > > > > Oliver,
> > > > > > > >
> > > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > > > >
> > > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > > > with pre-content events
> > > > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > > > >
> > > > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > > > pre-content permission event),
> > > > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > > > range in pre-content event).
> > > > > > > > no need to test all middle commits.
> > > > > > >
> > > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > > > >
> > > > > > > =========================================================================================
> > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > >
> > > > > > > commit:
> > > > > > > v6.10-rc1
> > > > > > > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > > > >
> > > > > > > v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > > > ---------------- ---------------------------
> > > > > > > %stddev %change %stddev
> > > > > > > \ | \
> > > > > > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> > > > > > >
> > > > > > > full data is as below [1]
> > > > > > >
> > > > > > >
> > > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > > > >
> > > > > > > it also has a small regression comparing to its parent, but better also.
> > > > > > >
> > > > > > > =========================================================================================
> > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > >
> > > > > > > commit:
> > > > > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > > > >
> > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > > > ---------------- ---------------------------
> > > > > > > %stddev %change %stddev
> > > > > > > \ | \
> > > > > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > > > > > >
> > > > > > > full data is as below [2]
> > > > > > >
> > > > > >
> > > > > > Ok, this looks sane, the small overhead in the write path makes sense.
> > > >
> > > > On second look, while a small regression from 64108c0b47db9 could make
> > > > sense, because it changes the inline fsnotify hooks, the extra regression from
> > > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > > > touch any code that affects the executed functions, so I have to wonder how
> > > > reliable are those results.
> > > >
> > > > Could you re-test the commits 94167e071109d..a82fd282befc7?
> > >
> > > since the branch is:
> > >
> > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
> > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
> > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
> > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > * 1613e604df0cd (tag: v6.10-rc1,
> > >
> > >
> > > I made below comparison, which shows little difference among 3 commits:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > a82fd282befc7 fanotify: report file range info with pre-content events
> > >
> > > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > > ---------------- --------------------------- ---------------------------
> > > %stddev %change %stddev %change %stddev
> > > \ | \ | \
> > > 1.174e+08 -0.9% 1.163e+08 -0.5% 1.168e+08 unixbench.throughput
> > >
> > >
> >
> > Hi Oliver,
> >
> > Perhaps I am not reading the report right, but how do these numbers reconcile
> > with the previous report of regression:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > 64108c0b47db9 ("fanotify: pass optional file access range in
> > pre-content event")
> >
> > 94167e071109d573 64108c0b47db91b20d658a89969
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> >
> > Is this a case of unstable results? something else?
>
> you could see the data for 94167e071109d are 1.163e+08 in both table.
>
> the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> "unixbench.throughput": [
> 121545292.8,
> 121629889.4,
> 121598992.0,
> 121492095.5,
> 121645038.1,
> 121556286.9
> ],
>
Are all those runs from the same boot?
> for the branch tip a82fd282befc7:
> "unixbench.throughput": [
> 116675606.7,
> 116840611.2,
> 116738966.0,
> 116956953.1,
> 116704901.9,
> 116997628.3,
> 117141733.7,
> 116660495.4
> ],
>
And these runs?
Otherwise, we might have a fluctuation that happens at boot time
or at mount time or something.
>
> let me combine the results from this branch together:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> a82fd282befc7 fanotify: report file range info with pre-content
>
> v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \ | \
> 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput
>
>
> one thing I want to mention is the "%change" is always comparing to the first
> column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> and so on.
Thanks for clarifying - I did not read it this way.
>
> then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> -2.4% regression compared to 94167e071109d.
>
> from above table, along the branch, the performance is kind of fluctuating,
> dropped most on 64108c0b47db9, but then recovered a little on tip.
>
I can understand why 64108c0b47db91b would regress performance, but I
cannot think of any possible explanation why a82fd282befc should improve
performance, so I have to wonder if the regression to -6.6% is not a
fluke of some specific boot/mount?
I pushed a test branch to
https://github.com/amir73il/linux/commits/fsnotify_for_lkp
with an extra patch that un-inlines some helpers to help bisect the
perf report better.
Maybe producing the report with this commit will shed some light.
Jan, any other ideas?
> our bot will not bisect the improvement between 64108c0b47db9 and the tip, since
> the whole branch show a drop.
>
> >
> > > >
> > > > > > It may have been a "tactic mistake" merging this optimization to v6.10-rc1
> > > > > > a5e57b4d370c ("fsnotify: optimize the case of no permission event watchers")
> > > > > > before the rest of the pre-content infrastructure, because together they
> > > > > > would still be a performance win.
> > > > > >
> > > > > > Can you please compare this branch to v6.9?
> > > > >
> > > > > there is no obvious diff between v6.9 and v6.10-rc1 for this test in our tests.
> > > > >
> > > >
> > > > This is a bit surprising to me, because a5e57b4d370c should have been a pretty
> > > > big performance win for the common case.
> > >
> > > in these unixbench tests of ours, a5e57b4d370c introduces a small regression compared
> > > to its parent (477cf917dd028).
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > v6.9
> > > 477cf917dd028 fsnotify: use an enum for group priority constants
> > > a5e57b4d370c6 fsnotify: optimize the case of no permission event watchers
> > > v6.10-rc1
> > >
> > > v6.9 477cf917dd02853ba78a73cdeb6 a5e57b4d370c6d320e5bfb0c919 v6.10-rc1
> > > ---------------- --------------------------- --------------------------- ---------------------------
> > > %stddev %change %stddev %change %stddev %change %stddev
> > > \ | \ | \ | \
> > > 1.219e+08 +2.8% 1.253e+08 +0.4% 1.224e+08 -0.2% 1.216e+08 unixbench.throughput
> > >
> >
> > Assuming this is a stable result,
> > that's very small regression compared to the improvements before it
> > and one that I dare to call acceptable for this micro buffered write benchmark
> > because of the big gain in other workloads.
>
> again, all data here is comparing to v6.9, so there is a 2.8% improvement on
> 477cf917dd028 comparing to v6.9, but it drops back on a5e57b4d370c6, whose
> data is almost same with v6.9 (so +0.4% comparing to v6.9).
>
> anyway, we normally ignore <1% performance changes, so we won't say
> a5e57b4d370c6 or v6.10-rc1 has obvious performance changes comparing to v6.9.
>
This fluctuation is also hard to explain.
Jan, any thoughts? things to try?
Thanks,
Amir.
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-03 16:20 ` Amir Goldstein
@ 2024-07-04 15:39 ` Jan Kara
2024-07-05 2:09 ` Oliver Sang
1 sibling, 0 replies; 17+ messages in thread
From: Jan Kara @ 2024-07-04 15:39 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Oliver Sang, Jan Kara, oe-lkp, lkp
On Wed 03-07-24 19:20:49, Amir Goldstein wrote:
> On Wed, Jul 3, 2024 at 10:21 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > On Wed, Jul 03, 2024 at 08:58:13AM +0300, Amir Goldstein wrote:
> > > On Mon, Jul 1, 2024 at 10:42 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > On Tue, Jun 04, 2024 at 03:33:39PM +0300, Amir Goldstein wrote:
> > > > > On Mon, Jun 3, 2024 at 11:13 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > > On Fri, May 31, 2024 at 08:18:09AM +0300, Amir Goldstein wrote:
> > > > > > > On Fri, May 31, 2024 at 6:15 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > > > > > > > On Wed, May 29, 2024 at 02:17:55PM +0300, Amir Goldstein wrote:
> > > > > > > > > On Wed, May 29, 2024 at 11:26 AM kernel test robot
> > > > > > > > > <oliver.sang@intel.com> wrote:
> > > > > > > > > > kernel test robot noticed a -7.9% regression of unixbench.throughput on:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > commit: 9d1fd61f1d9bb74e44bdcc8767ba7008a08c6075 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > > > > https://github.com/amir73il/linux sb_write_barrier
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jan,
> > > > > > > > >
> > > > > > > > > I speculate that the regression is due to the fact that we store and pass the
> > > > > > > > > path information on struct file_range on the stack before the optimizations
> > > > > > > > > in fsnotify_parent(), so rw_verify_area() pays some price for the stores
> > > > > > > > > and __fsnotify_parent() pays a bigger price for fetches?
> > > > > > > > >
> > > > > > > > > Luckily, we already have the way to check
> > > > > > > > > fsnotify_sb_has_priority_watchers(inode->i_sb,
> > > > > > > > > FSNOTIFY_PRIO_PRE_CONTENT))
> > > > > > > > > so now I used it to optimize out the fsnotify_file_range() inline
> > > > > > > > > code entirely.
> > > > > > > > >
> > > > > > > > > Oliver,
> > > > > > > > >
> > > > > > > > > Can you please re-test with fixed branch (also rebased on v6.10-rc1):
> > > > > > > > >
> > > > > > > > > * a82fd282befc - (fan_pre_content) fanotify: report file range info
> > > > > > > > > with pre-content events
> > > > > > > > > * f301cd18006c - fanotify: rename a misnamed constant
> > > > > > > > > * 64108c0b47db - fanotify: pass optional file access range in pre-content event
> > > > > > > > > * 94167e071109 - fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > > > > > * 68e04c2451ba - fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > > > > > * 83af0c89527a - fsnotify: generate pre-content permission event on exec
> > > > > > > > > * aca408421327 - fsnotify: generate pre-content permission event on open
> > > > > > > > > * 93656e196b00 - fsnotify: introduce pre-content permission event
> > > > > > > > >
> > > > > > > > > The optimization was done in the first commit (fsnotify: introduce
> > > > > > > > > pre-content permission event),
> > > > > > > > > but impacts the regressing commit (fanotify: pass optional file access
> > > > > > > > > range in pre-content event).
> > > > > > > > > no need to test all middle commits.
> > > > > > > >
> > > > > > > > I directly compare the tip with v6.10-rc1, still a regression but better now
> > > > > > > >
> > > > > > > > =========================================================================================
> > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > > >
> > > > > > > > commit:
> > > > > > > > v6.10-rc1
> > > > > > > > a82fd282befc7 ("fanotify: report file range info with pre-content events")
> > > > > > > >
> > > > > > > > v6.10-rc1 a82fd282befc71d99106bf31066
> > > > > > > > ---------------- ---------------------------
> > > > > > > > %stddev %change %stddev
> > > > > > > > \ | \
> > > > > > > > 1.216e+08 -3.9% 1.168e+08 unixbench.throughput
> > > > > > > >
> > > > > > > > full data is as below [1]
> > > > > > > >
> > > > > > > >
> > > > > > > > then I checked new "64108c0b47db - fanotify: pass optional file access range in pre-content event"
> > > > > > > >
> > > > > > > > it also has a small regression comparing to its parent, but better also.
> > > > > > > >
> > > > > > > > =========================================================================================
> > > > > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > > > > >
> > > > > > > > commit:
> > > > > > > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > > > > > > 64108c0b47db9 ("fanotify: pass optional file access range in pre-content event")
> > > > > > > >
> > > > > > > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > > > > > > ---------------- ---------------------------
> > > > > > > > %stddev %change %stddev
> > > > > > > > \ | \
> > > > > > > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > > > > > > >
> > > > > > > > full data is as below [2]
> > > > > > > >
> > > > > > >
> > > > > > > Ok, this looks sane, the small overhead in the write path makes sense.
> > > > >
> > > > > On second look, while a small regression from 64108c0b47db9 could make
> > > > > sense, because it changes the inline fsnotify hooks, the extra regression from
> > > > > the tip of the branch a82fd282befc7 makes no sense at all, as it does not
> > > > > touch any code that affects the executed functions, so I have to wonder how
> > > > > reliable are those results.
> > > > >
> > > > > Could you re-test the commits 94167e071109d..a82fd282befc7?
> > > >
> > > > since the branch is:
> > > >
> > > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events <---
> > > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event <---
> > > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event <--- parent of 94167e071109d
> > > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > > * 1613e604df0cd (tag: v6.10-rc1,
> > > >
> > > >
> > > > I made below comparison, which shows little difference among 3 commits:
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > > a82fd282befc7 fanotify: report file range info with pre-content events
> > > >
> > > > 68e04c2451ba03a1 94167e071109d573a5fc1ff3061 a82fd282befc71d99106bf31066
> > > > ---------------- --------------------------- ---------------------------
> > > > %stddev %change %stddev %change %stddev
> > > > \ | \ | \
> > > > 1.174e+08 -0.9% 1.163e+08 -0.5% 1.168e+08 unixbench.throughput
> > > >
> > > >
> > >
> > > Hi Oliver,
> > >
> > > Perhaps I am not reading the report right, but how do these numbers reconcile
> > > with the previous report of regression:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > 94167e071109d ("fanotify: introduce FAN_PRE_MODIFY permission event")
> > > 64108c0b47db9 ("fanotify: pass optional file access range in
> > > pre-content event")
> > >
> > > 94167e071109d573 64108c0b47db91b20d658a89969
> > > ---------------- ---------------------------
> > > %stddev %change %stddev
> > > \ | \
> > > 1.163e+08 -2.4% 1.135e+08 unixbench.throughput
> > >
> > > Is this a case of unstable results? something else?
> >
> > you could see the data for 94167e071109d are 1.163e+08 in both table.
> >
> > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > "unixbench.throughput": [
> > 121545292.8,
> > 121629889.4,
> > 121598992.0,
> > 121492095.5,
> > 121645038.1,
> > 121556286.9
> > ],
> >
>
> Are all those runs from the same boot?
>
> > for the branch tip a82fd282befc7:
> > "unixbench.throughput": [
> > 116675606.7,
> > 116840611.2,
> > 116738966.0,
> > 116956953.1,
> > 116704901.9,
> > 116997628.3,
> > 117141733.7,
> > 116660495.4
> > ],
> >
>
> And these run?
>
> Otherwise, we might have a fluctuation that happens at boot time
> or at mount time or something.
So what I suspect is that the fluctuation actually happens "per compile
time". Depending on how exactly some hot paths get aligned in the compiled
kernel binary wrt CPU cachelines or similar, you get differences in
performance. I've seen that happening quite a few times in the past and the
observed differences are well in that range.
> > let me combine the results from this branch together:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > v6.10-rc1
> > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > a82fd282befc7 fanotify: report file range info with pre-content
> >
> > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> > \ | \ | \ | \ | \
> > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput
> >
> >
> > one thing I want to mention is the "%change" is always comparing to the first
> > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > and so on.
>
> Thanks for clarifying - I did not read it this way.
>
> >
> > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > -2.4% regression compared to 94167e071109d.
> >
> > from above table, along the branch, the performance is kind of fluctuating,
> > dropped most on 64108c0b47db9, but then recovered a little on tip.
> >
>
> I can understand why 64108c0b47db91b would regress performance, but I
> cannot think of any possible explanation why a82fd282befc should improve
> performance, so I have to wonder if the regression to -6.6% is not a
> fluke of some specific boot/mount?
I agree. In my opinion at least some of those changes are not related to
code changes but rather to random code alignment changes.
> I pushed a test branch to
> https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> with an extra patch that un-inlines some helpers to help bisect the
> perf report better.
> Maybe produce the report with this commit and it sheds some light.
>
> Jan, any other ideas?
Not really. These alignment induced fluctuations are annoying but I don't
know of a good way to avoid them. Even narrowing them down is tedious as
the changes on this scale are not easy to see in the profiles. So I'd check
the perf profiles and if we don't see any obvious regression in the changed
places, I'd just ignore the regression...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-03 16:20 ` Amir Goldstein
2024-07-04 15:39 ` Jan Kara
@ 2024-07-05 2:09 ` Oliver Sang
2024-07-05 5:48 ` Amir Goldstein
1 sibling, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-05 2:09 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
hi, Amir,
On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
[...]
> > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > "unixbench.throughput": [
> > 121545292.8,
> > 121629889.4,
> > 121598992.0,
> > 121492095.5,
> > 121645038.1,
> > 121556286.9
> > ],
> >
>
> Are all those runs from the same boot?
no. we reboot the machine before each run.
>
> > for the branch tip a82fd282befc7:
> > "unixbench.throughput": [
> > 116675606.7,
> > 116840611.2,
> > 116738966.0,
> > 116956953.1,
> > 116704901.9,
> > 116997628.3,
> > 117141733.7,
> > 116660495.4
> > ],
> >
>
> And these run?
same.
>
> Otherwise, we might have a fluctuation that happens at boot time
> or at mount time or something.
>
> >
> > let me combine the results from this branch together:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > v6.10-rc1
> > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > a82fd282befc7 fanotify: report file range info with pre-content
> >
> > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> > \ | \ | \ | \ | \
> > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput
> >
> >
> > one thing I want to mention is the "%change" is always comparing to the first
> > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > and so on.
>
> Thanks for clarifying - I did not read it this way.
>
> >
> > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > -2.4% regression comparing to 94167e071109d.
> >
> > from above table, along the branch, the performance is kind of fluctuating,
> > dropped most on 64108c0b47db9, but then recovered a little on tip.
> >
>
> I can understand why 64108c0b47db91b would regress performance, but I
> cannot think
> of any possible explanation why a82fd282befc should improve performance,
> so I have to wonder if the regression to -6.6% is not a fluke of some
> specific boot/mount?
>
> I pushed a test branch to
> https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> with an extra patch that un-inlines some helpers to help bisect the
> perf report better.
> Maybe produce the report with this commit and it sheds some light.
since
* 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
* a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
* f301cd18006c3 fanotify: rename a misnamed constant
* 64108c0b47db9 fanotify: pass optional file access range in pre-content event
* 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
* 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
* 83af0c89527ab fsnotify: generate pre-content permission event on exec
* aca4084213276 fsnotify: generate pre-content permission event on open
* 93656e196b006 fsnotify: introduce pre-content permission event
* 1613e604df0cd (tag: v6.10-rc1,
we ran the tests on the new commit. the summary report is below:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
1.216e+08 -3.9% 1.168e+08 -4.1% 1.166e+08 unixbench.throughput
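For readers who want to check the summary numbers, here is a minimal sketch (not part of the lkp tooling) that recomputes the mean, the relative spread, and the %change against v6.10-rc1 from the per-run samples quoted earlier in this mail. The function names `pct_change` and `stddev_pct` are made up for illustration, and using the sample standard deviation for the spread is an assumption about how %stddev is derived.

```python
from statistics import mean, stdev

# Per-run unixbench.throughput samples as quoted in this thread.
BASELINE_RUNS = [  # v6.10-rc1
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
TIP_RUNS = [  # branch tip a82fd282befc7
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]


def pct_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def stddev_pct(samples):
    """Sample standard deviation as a percentage of the mean (assumed metric)."""
    return stdev(samples) / mean(samples) * 100.0


base_mean = mean(BASELINE_RUNS)
tip_mean = mean(TIP_RUNS)

print(f"v6.10-rc1     mean={base_mean:.4g} spread={stddev_pct(BASELINE_RUNS):.2f}%")
print(f"a82fd282befc7 mean={tip_mean:.4g} spread={stddev_pct(TIP_RUNS):.2f}%")
print(f"change vs v6.10-rc1: {pct_change(base_mean, tip_mean):+.1f}%")
```

The run-to-run spread within each kernel comes out far smaller than the gap between the two means, which is why the difference is treated as a real change rather than boot-to-boot noise.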
since Jan mentioned in a later mail that perf profiles are useful, I include the
details below:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
9.50 ± 24% +55.3% 14.75 ± 24% +53.9% 14.62 ± 24% perf-c2c.DRAM.local
261842 +0.6% 263433 +1.5% 265837 time.involuntary_context_switches
6199 -5.7% 5843 -2.3% 6054 time.user_time
1.216e+08 -3.9% 1.168e+08 -4.1% 1.166e+08 unixbench.throughput
261842 +0.6% 263433 +1.5% 265837 unixbench.time.involuntary_context_switches
6199 -5.7% 5843 -2.3% 6054 unixbench.time.user_time
4.513e+10 -3.8% 4.339e+10 -4.0% 4.331e+10 unixbench.workload
167317 ± 5% -39.3% 101518 ± 47% -30.5% 116276 ± 39% numa-vmstat.node1.nr_anon_pages
112.92 ± 28% -53.9% 52.06 ± 99% -39.8% 67.98 ± 64% numa-vmstat.node1.nr_anon_transparent_hugepages
78069 +506.3% 473340 ± 77% +495.2% 464673 ± 77% numa-vmstat.node1.nr_file_pages
167460 ± 5% -39.3% 101568 ± 47% -30.5% 116461 ± 39% numa-vmstat.node1.nr_inactive_anon
10649 ± 6% +3703.3% 405022 ± 90% +3625.1% 396698 ± 90% numa-vmstat.node1.nr_unevictable
167460 ± 5% -39.3% 101568 ± 47% -30.5% 116461 ± 39% numa-vmstat.node1.nr_zone_inactive_anon
10649 ± 6% +3703.4% 405022 ± 90% +3625.2% 396698 ± 90% numa-vmstat.node1.nr_zone_unevictable
15473 ±125% +567.1% 103220 ± 85% +366.5% 72185 ±110% numa-meminfo.node0.AnonHugePages
220234 ± 13% +116.8% 477532 ± 37% +101.8% 444469 ± 42% numa-meminfo.node0.AnonPages.max
231368 ± 28% -53.9% 106616 ± 98% -39.8% 139307 ± 64% numa-meminfo.node1.AnonHugePages
668949 ± 5% -39.3% 405873 ± 47% -30.5% 464919 ± 39% numa-meminfo.node1.AnonPages
836146 ± 6% -34.5% 547503 ± 38% -28.1% 601279 ± 34% numa-meminfo.node1.AnonPages.max
312276 +506.3% 1893321 ± 77% +495.2% 1858788 ± 77% numa-meminfo.node1.FilePages
669489 ± 5% -39.3% 406110 ± 47% -30.4% 465687 ± 39% numa-meminfo.node1.Inactive
669489 ± 5% -39.4% 406010 ± 47% -30.5% 465628 ± 39% numa-meminfo.node1.Inactive(anon)
42552 ± 6% +3707.3% 1620116 ± 90% +3628.9% 1586760 ± 90% numa-meminfo.node1.Unevictable
1.458e+11 -2.7% 1.419e+11 -3.8% 1.402e+11 perf-stat.i.branch-instructions
7347431 -1.3% 7251270 ± 2% -2.8% 7140090 perf-stat.i.branch-misses
11.47 ± 6% +2.8 14.29 ± 9% +2.5 13.99 ± 6% perf-stat.i.cache-miss-rate%
3915539 ± 8% +513.8% 24032895 ± 10% +500.6% 23516538 ± 7% perf-stat.i.cache-misses
32425619 ± 3% +391.9% 1.595e+08 ± 5% +388.7% 1.585e+08 ± 4% perf-stat.i.cache-references
2196 +0.4% 2206 +2.4% 2249 perf-stat.i.context-switches
151202 ± 16% -77.0% 34851 ± 59% -75.9% 36442 ± 38% perf-stat.i.cycles-between-cache-misses
6.961e+11 -1.9% 6.829e+11 -3.4% 6.724e+11 perf-stat.i.instructions
1.22 -1.3% 1.20 -2.5% 1.19 perf-stat.i.ipc
0.01 ± 9% +523.5% 0.04 ± 11% +518.2% 0.03 ± 7% perf-stat.overall.MPKI
12.09 ± 6% +3.0 15.08 ± 8% +2.7 14.83 ± 5% perf-stat.overall.cache-miss-rate%
0.75 +2.0% 0.77 +2.6% 0.77 perf-stat.overall.cpi
133775 ± 8% -83.6% 21976 ± 11% -83.4% 22156 ± 7% perf-stat.overall.cycles-between-cache-misses
1.33 -1.9% 1.31 -2.5% 1.30 perf-stat.overall.ipc
5721 +2.0% 5835 +1.8% 5821 perf-stat.overall.path-length
1.452e+11 -2.7% 1.413e+11 -3.9% 1.395e+11 perf-stat.ps.branch-instructions
7332734 -1.3% 7238853 ± 3% -2.9% 7119552 perf-stat.ps.branch-misses
3921138 ± 8% +511.4% 23972111 ± 11% +496.7% 23398300 ± 7% perf-stat.ps.cache-misses
32415461 ± 3% +389.9% 1.588e+08 ± 5% +386.6% 1.577e+08 ± 4% perf-stat.ps.cache-references
2192 +0.4% 2201 +2.3% 2243 perf-stat.ps.context-switches
6.932e+11 -1.9% 6.798e+11 -3.5% 6.691e+11 perf-stat.ps.instructions
2.582e+14 -1.9% 2.532e+14 -2.3% 2.521e+14 perf-stat.total.instructions
13.19 -0.7 12.50 -0.4 12.75 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
7.01 -0.2 6.80 -0.1 6.88 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.11 -0.2 0.91 -0.0 1.11 ± 2% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
2.50 -0.2 2.35 +0.1 2.58 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
2.81 -0.1 2.71 ± 2% +0.1 2.94 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
1.68 -0.1 1.59 -0.0 1.64 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
3.73 -0.1 3.64 +0.0 3.76 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.62 -0.1 1.55 -0.1 1.55 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.18 -0.1 2.12 -0.1 2.13 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
3.34 -0.1 3.28 +0.2 3.54 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
0.65 -0.1 0.60 ± 2% +0.0 0.68 perf-profile.calltrace.cycles-pp.w_test
0.92 -0.0 0.87 -0.0 0.88 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.70 -0.0 0.66 -0.0 0.69 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
0.86 -0.0 0.82 -0.0 0.84 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
0.92 -0.0 0.88 -0.0 0.88 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.63 -0.0 0.59 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.86 -0.0 0.83 -0.0 0.81 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
3.53 -0.0 3.50 -0.2 3.37 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
0.68 -0.0 0.66 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.53 -0.0 0.51 -0.0 0.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
4.23 -0.0 4.22 -0.2 4.05 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
0.72 -0.0 0.71 -0.0 0.71 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.62 -0.0 0.61 -0.0 0.59 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
2.71 +0.0 2.71 -0.1 2.58 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
0.89 +0.0 0.89 ± 2% +0.1 1.00 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
0.96 +0.0 0.98 -0.1 0.90 perf-profile.calltrace.cycles-pp.timestamp_truncate.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
0.75 +0.0 0.77 +0.0 0.75 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
6.88 +0.0 6.91 -0.2 6.67 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
1.13 +0.0 1.17 -0.1 1.08 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
1.92 +0.0 1.96 +0.3 2.18 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
5.30 +0.1 5.36 +0.0 5.32 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
2.12 ± 3% +0.1 2.18 +0.1 2.25 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
5.30 +0.1 5.39 -0.4 4.92 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
6.17 +0.1 6.27 -0.5 5.66 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
96.84 +0.1 96.98 -0.0 96.81 perf-profile.calltrace.cycles-pp.write
0.78 ± 2% +0.3 1.12 ± 7% +0.0 0.80 ± 3% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
2.97 +0.6 3.57 -0.2 2.81 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.63 +0.6 4.24 -0.2 3.42 perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
12.01 +0.6 12.63 -0.5 11.54 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
4.32 +0.6 4.95 +0.8 5.12 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.28 +0.8 38.12 +0.0 37.32 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
84.26 +0.9 85.20 +0.4 84.65 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
13.39 +1.0 14.36 +1.0 14.35 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.30 +1.0 13.30 +1.0 13.33 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
82.83 +1.0 83.84 +0.5 83.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
57.94 +1.3 59.20 -0.0 57.92 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.99 +1.3 7.25 ± 2% +1.3 7.27 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
78.13 +1.3 79.39 +0.7 78.83 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
74.26 +1.3 75.58 +0.6 74.90 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
7.43 -0.4 7.06 -0.2 7.19 perf-profile.children.cycles-pp.entry_SYSCALL_64
4.42 -0.2 4.18 -0.2 4.25 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.21 -0.2 1.01 -0.0 1.19 perf-profile.children.cycles-pp.syscall_return_via_sysret
7.14 -0.2 6.95 -0.1 7.02 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
4.18 -0.2 4.00 -0.2 4.03 perf-profile.children.cycles-pp.__cond_resched
2.74 -0.2 2.58 +0.1 2.82 perf-profile.children.cycles-pp.apparmor_file_permission
2.42 -0.1 2.30 -0.1 2.31 perf-profile.children.cycles-pp.rcu_all_qs
3.82 -0.1 3.72 +0.0 3.84 perf-profile.children.cycles-pp.__fsnotify_parent
2.98 -0.1 2.88 ± 2% +0.1 3.12 perf-profile.children.cycles-pp.down_write
1.74 -0.1 1.65 -0.0 1.70 perf-profile.children.cycles-pp.up_write
1.99 -0.1 1.89 -0.1 1.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
3.71 -0.1 3.63 +0.2 3.90 perf-profile.children.cycles-pp.security_file_permission
0.99 -0.1 0.91 +0.0 1.02 perf-profile.children.cycles-pp.w_test
2.47 -0.1 2.40 -0.1 2.42 perf-profile.children.cycles-pp.xas_load
1.12 -0.1 1.06 -0.0 1.08 perf-profile.children.cycles-pp.folio_wait_stable
1.26 -0.1 1.20 -0.1 1.21 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.75 -0.0 0.71 -0.0 0.71 perf-profile.children.cycles-pp.x64_sys_call
0.98 -0.0 0.94 -0.0 0.95 perf-profile.children.cycles-pp.aa_file_perm
0.46 -0.0 0.42 +0.0 0.46 perf-profile.children.cycles-pp.write@plt
1.10 -0.0 1.06 -0.1 1.04 perf-profile.children.cycles-pp.xattr_resolve_name
0.36 -0.0 0.33 -0.0 0.35 perf-profile.children.cycles-pp.amd_clear_divider
3.76 -0.0 3.73 -0.2 3.59 perf-profile.children.cycles-pp.cap_inode_need_killpriv
0.59 -0.0 0.56 -0.0 0.57 perf-profile.children.cycles-pp.inode_to_bdi
0.38 -0.0 0.35 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
1.05 -0.0 1.03 -0.0 1.03 perf-profile.children.cycles-pp.folio_mapping
3.41 -0.0 3.38 -0.2 3.25 perf-profile.children.cycles-pp.__vfs_getxattr
0.56 -0.0 0.53 -0.0 0.54 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
4.47 -0.0 4.45 -0.2 4.28 perf-profile.children.cycles-pp.security_inode_need_killpriv
0.25 -0.0 0.24 ± 2% -0.0 0.24 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
0.36 -0.0 0.35 -0.0 0.35 perf-profile.children.cycles-pp.is_bad_inode
0.64 -0.0 0.63 -0.0 0.61 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
1.00 +0.0 1.00 ± 2% +0.1 1.12 perf-profile.children.cycles-pp.generic_write_check_limits
1.38 +0.0 1.40 -0.1 1.31 perf-profile.children.cycles-pp.strcmp
0.93 +0.0 0.95 -0.0 0.93 perf-profile.children.cycles-pp.folio_mark_dirty
1.07 +0.0 1.09 -0.1 1.00 perf-profile.children.cycles-pp.timestamp_truncate
7.23 +0.0 7.26 -0.2 7.01 perf-profile.children.cycles-pp.file_remove_privs_flags
2.25 +0.0 2.29 +0.3 2.53 perf-profile.children.cycles-pp.generic_write_checks
5.70 +0.0 5.74 +0.0 5.70 perf-profile.children.cycles-pp.simple_write_end
2.24 ± 3% +0.1 2.30 +0.1 2.37 ± 2% perf-profile.children.cycles-pp.__fdget_pos
98.96 +0.1 99.02 -0.0 98.92 perf-profile.children.cycles-pp.write
5.69 +0.1 5.76 -0.5 5.22 perf-profile.children.cycles-pp.fault_in_readable
6.42 +0.1 6.53 -0.5 5.89 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
0.89 +0.3 1.23 ± 6% +0.0 0.91 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
3.39 +0.6 3.97 -0.2 3.20 perf-profile.children.cycles-pp.inode_needs_update_time
3.96 +0.6 4.57 -0.2 3.73 perf-profile.children.cycles-pp.file_update_time
12.35 +0.6 12.96 -0.5 11.86 perf-profile.children.cycles-pp.__generic_file_write_iter
4.56 +0.8 5.32 +0.9 5.48 perf-profile.children.cycles-pp.rw_verify_area
38.16 +0.8 39.01 +0.0 38.18 perf-profile.children.cycles-pp.generic_perform_write
84.67 +0.9 85.62 +0.4 85.06 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
13.58 +1.0 14.53 +0.9 14.52 perf-profile.children.cycles-pp.simple_write_begin
83.50 +1.0 84.49 +0.4 83.94 perf-profile.children.cycles-pp.do_syscall_64
12.68 +1.0 13.68 +1.0 13.71 perf-profile.children.cycles-pp.__filemap_get_folio
78.74 +1.2 79.99 +0.7 79.42 perf-profile.children.cycles-pp.ksys_write
58.52 +1.3 59.78 -0.0 58.52 perf-profile.children.cycles-pp.generic_file_write_iter
6.18 +1.3 7.44 ± 2% +1.3 7.46 perf-profile.children.cycles-pp.filemap_get_entry
75.13 +1.3 76.41 +0.6 75.74 perf-profile.children.cycles-pp.vfs_write
7.25 -0.6 6.64 -0.3 6.95 perf-profile.self.cycles-pp.vfs_write
6.45 -0.4 6.09 -0.2 6.28 perf-profile.self.cycles-pp.write
4.32 -0.2 4.09 -0.2 4.16 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.21 -0.2 1.01 -0.0 1.19 perf-profile.self.cycles-pp.syscall_return_via_sysret
6.98 -0.2 6.79 -0.1 6.86 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
4.52 -0.2 4.35 -0.2 4.35 perf-profile.self.cycles-pp.__filemap_get_folio
2.34 -0.1 2.22 -0.1 2.26 perf-profile.self.cycles-pp.__cond_resched
1.60 -0.1 1.49 ± 2% +0.1 1.71 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
1.90 -0.1 1.79 -0.1 1.80 perf-profile.self.cycles-pp.do_syscall_64
3.65 -0.1 3.56 +0.0 3.68 perf-profile.self.cycles-pp.__fsnotify_parent
1.47 -0.1 1.38 -0.1 1.41 perf-profile.self.cycles-pp.ksys_write
1.62 -0.1 1.53 -0.0 1.59 perf-profile.self.cycles-pp.up_write
1.09 -0.1 1.02 -0.1 1.04 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.80 -0.1 1.74 -0.1 1.74 perf-profile.self.cycles-pp.xas_load
0.79 -0.1 0.73 +0.0 0.83 perf-profile.self.cycles-pp.w_test
1.10 -0.1 1.04 -0.0 1.08 perf-profile.self.cycles-pp.security_file_permission
1.25 -0.1 1.20 -0.1 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.66 -0.1 1.60 -0.1 1.61 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.41 -0.1 1.36 -0.0 1.36 perf-profile.self.cycles-pp.rcu_all_qs
0.90 -0.0 0.86 -0.1 0.81 perf-profile.self.cycles-pp.simple_write_begin
0.88 -0.0 0.84 -0.0 0.85 perf-profile.self.cycles-pp.aa_file_perm
0.80 -0.0 0.76 -0.0 0.76 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.62 -0.0 0.59 -0.0 0.59 perf-profile.self.cycles-pp.x64_sys_call
1.39 -0.0 1.36 -0.1 1.34 perf-profile.self.cycles-pp.__vfs_getxattr
0.53 -0.0 0.50 -0.0 0.51 perf-profile.self.cycles-pp.folio_wait_stable
1.69 -0.0 1.67 +0.1 1.81 perf-profile.self.cycles-pp.generic_file_write_iter
0.87 -0.0 0.85 -0.0 0.85 perf-profile.self.cycles-pp.folio_mapping
0.56 -0.0 0.53 -0.0 0.53 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
1.93 -0.0 1.91 ± 2% +0.2 2.10 perf-profile.self.cycles-pp.down_write
0.24 -0.0 0.22 -0.0 0.23 perf-profile.self.cycles-pp.amd_clear_divider
0.25 -0.0 0.23 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.__x64_sys_write
0.35 -0.0 0.34 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi
1.15 -0.0 1.13 -0.0 1.11 perf-profile.self.cycles-pp.__generic_file_write_iter
0.61 -0.0 0.60 -0.0 0.59 perf-profile.self.cycles-pp.xattr_resolve_name
0.22 -0.0 0.21 -0.0 0.21 ± 2% perf-profile.self.cycles-pp.noop_dirty_folio
0.35 -0.0 0.34 -0.0 0.34 perf-profile.self.cycles-pp.cap_inode_need_killpriv
0.79 -0.0 0.79 ± 2% +0.1 0.88 perf-profile.self.cycles-pp.generic_write_check_limits
0.52 +0.0 0.52 -0.0 0.49 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
3.36 +0.0 3.36 -0.2 3.16 perf-profile.self.cycles-pp.generic_perform_write
0.70 +0.0 0.71 -0.0 0.68 perf-profile.self.cycles-pp.security_inode_need_killpriv
1.30 +0.0 1.32 +0.2 1.46 perf-profile.self.cycles-pp.generic_write_checks
0.66 +0.0 0.69 -0.0 0.62 perf-profile.self.cycles-pp.file_update_time
1.03 +0.0 1.06 -0.1 0.97 perf-profile.self.cycles-pp.strcmp
2.75 +0.0 2.79 +0.0 2.76 perf-profile.self.cycles-pp.simple_write_end
0.72 +0.0 0.77 -0.1 0.66 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
0.87 +0.1 0.93 -0.1 0.82 perf-profile.self.cycles-pp.timestamp_truncate
2.10 ± 3% +0.1 2.16 +0.1 2.23 ± 2% perf-profile.self.cycles-pp.__fdget_pos
2.04 +0.1 2.09 -0.0 1.99 perf-profile.self.cycles-pp.file_remove_privs_flags
5.54 +0.1 5.60 -0.5 5.08 perf-profile.self.cycles-pp.fault_in_readable
1.51 +0.2 1.69 -0.1 1.37 perf-profile.self.cycles-pp.inode_needs_update_time
0.78 ± 2% +0.3 1.11 ± 7% +0.0 0.80 ± 2% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.84 +1.0 1.81 +0.8 1.69 perf-profile.self.cycles-pp.rw_verify_area
3.66 +1.3 4.98 ± 4% +1.3 4.99 perf-profile.self.cycles-pp.filemap_get_entry
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-05 2:09 ` Oliver Sang
@ 2024-07-05 5:48 ` Amir Goldstein
2024-07-08 5:40 ` Oliver Sang
2024-07-25 13:41 ` Jan Kara
0 siblings, 2 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-05 5:48 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
>
> [...]
>
> > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > > "unixbench.throughput": [
> > > 121545292.8,
> > > 121629889.4,
> > > 121598992.0,
> > > 121492095.5,
> > > 121645038.1,
> > > 121556286.9
> > > ],
> > >
> >
> > Are all those runs from the same boot?
>
> no. we reboot machine before each run.
>
> >
> > > for the branch tip a82fd282befc7:
> > > "unixbench.throughput": [
> > > 116675606.7,
> > > 116840611.2,
> > > 116738966.0,
> > > 116956953.1,
> > > 116704901.9,
> > > 116997628.3,
> > > 117141733.7,
> > > 116660495.4
> > > ],
> > >
> >
> > And these run?
>
> same.
>
> >
> > Otherwise, we might have a fluctuation that happens at boot time
> > or at mount time or something.
> >
> > >
> > > let me combine the results from this branch together:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > v6.10-rc1
> > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > a82fd282befc7 fanotify: report file range info with pre-content
> > >
> > > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> > > \ | \ | \ | \ | \
> > > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput
> > >
> > >
> > > one thing I want to mention is the "%change" is always comparing to the first
> > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > > and so on.
> >
> > Thanks for clarifying - I did not read it this way.
> >
> > >
> > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > -2.4% regression comparing to 94167e071109d.
> > >
> > > from above table, along the branch, the performance is kind of fluctuating,
> > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > >
> >
> > I can understand why 64108c0b47db91b would regress performance, but I
> > cannot think
> > of any possible explanation why a82fd282befc should improve performance,
> > so I have to wonder if the regression to -6.6% is not a fluke of some
> > specific boot/mount?
> >
> > I pushed a test branch to
> > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > with an extra patch that un-inlines some helpers to help bisect the
> > perf report better.
> > Maybe produce the report with this commit and it sheds some light.
>
> since
>
> * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> * f301cd18006c3 fanotify: rename a misnamed constant
> * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> * aca4084213276 fsnotify: generate pre-content permission event on open
> * 93656e196b006 fsnotify: introduce pre-content permission event
> * 1613e604df0cd (tag: v6.10-rc1,
>
> we run tests upon new commit. summary report is as below:
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
>
> v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 1.216e+08 -3.9% 1.168e+08 -4.1% 1.166e+08 unixbench.throughput
>
> since Jan mentioned in a later mail that perf profiles are useful, I put details
> as below
Thanks.
That clarifies that the cycles are spent in the "optimization code" itself.
I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
at the base of the branch.
Hopefully, with this fix, the compiler will be able to optimize smarter and
the generated fast path code will be less sensitive to code alignment ???
If it works, it may eliminate some of the regressions throughout this branch and
may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
* e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info
with pre-content events
* a28c32866bb3 - fanotify: rename a misnamed constant
* 61baabbdceaa - fanotify: pass optional file access range in pre-content event
* 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
* 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
* 38a903de931a - fsnotify: generate pre-content permission event on exec
* 70be29706389 - fsnotify: generate pre-content permission event on open
* 96768b7d6721 - fsnotify: introduce pre-content permission event
* 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in
permission hooks
Fingers crossed...
Thanks,
Amir.
[1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-05 5:48 ` Amir Goldstein
@ 2024-07-08 5:40 ` Oliver Sang
2024-07-08 16:37 ` Amir Goldstein
2024-07-25 13:41 ` Jan Kara
1 sibling, 1 reply; 17+ messages in thread
From: Oliver Sang @ 2024-07-08 5:40 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jan Kara, oe-lkp, lkp, oliver.sang
hi, Amir,
On Fri, Jul 05, 2024 at 08:48:28AM +0300, Amir Goldstein wrote:
[...]
>
> Thanks.
> That clarifies that the cycles are spent in the "optimization code" itself.
>
> I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
> at the base of the branch.
>
> Hopefully, with this fix, the compiler will be able to optimize smarter and
> the generated fast path code will be less sensitive to code alignment ???
>
> If it works, it may eliminate some of the regressions throughout this branch and
> may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
>
> * e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info
> with pre-content events
> * a28c32866bb3 - fanotify: rename a misnamed constant
> * 61baabbdceaa - fanotify: pass optional file access range in pre-content event
> * 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
> * 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
> * 38a903de931a - fsnotify: generate pre-content permission event on exec
> * 70be29706389 - fsnotify: generate pre-content permission event on open
> * 96768b7d6721 - fsnotify: introduce pre-content permission event
> * 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in
> permission hooks
>
> Fingers crossed...
unfortunately, it seems there is no luck. I combined the results for 96768b7d6721
and its parent, since 96768b7d6721 introduces most of the regression.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
96768b7d67219 fsnotify: introduce pre-content permission event
e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
1.218e+08 -0.3% 1.214e+08 -7.6% 1.125e+08 -6.4% 1.14e+08 unixbench.throughput
detail is as below [2]
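Because every %change column is relative to v6.10-rc1, the step from one column to the next has to be derived. The sketch below does that from the rounded throughput values in the table above; it is only an illustration, the helper name `stepwise_changes` is hypothetical, and the results are approximate because the inputs are already rounded. Note that e0aaae806edc3 is the branch tip, so the last step spans several commits.

```python
# Throughput values copied from the summary table above (rounded as printed).
THROUGHPUT = [
    ("v6.10-rc1", 1.218e8),
    ("28d5b4a88241d", 1.214e8),
    ("96768b7d67219", 1.125e8),
    ("e0aaae806edc3", 1.14e8),
]


def stepwise_changes(results):
    """Return (commit, percent change versus the previous table entry) pairs."""
    steps = []
    for (_, prev), (commit, cur) in zip(results, results[1:]):
        steps.append((commit, (cur - prev) / prev * 100.0))
    return steps


for commit, pct in stepwise_changes(THROUGHPUT):
    print(f"{commit}: {pct:+.1f}% vs previous entry")
```

Read this way, almost all of the drop appears at 96768b7d67219, while the fix commit at the base of the branch is close to neutral.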
>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
for this report, I also retested on the new branch. the regression seems to have
shrunk to around 10%, but we cannot get stable data on this new branch, so we
cannot say whether it has really improved.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/full/stress-ng/60s
commit:
v6.10-rc1
e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
v6.10-rc1 e0aaae806edc3411d84dc0d66fe
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.161e+08 ± 5% -9.5% 1.05e+08 ± 10% stress-ng.full.ops
1934587 ± 5% -9.5% 1750464 ± 10% stress-ng.full.ops_per_sec
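To illustrate why the data above is not stable enough to draw a conclusion, here
is a minimal sketch that compares the measured change in stress-ng.full.ops with
the reported %stddev values. Adding the two relative deviations in quadrature is
an assumption made here for illustration only; it is not the criterion used by
the test robot.

```python
import math

# Values copied from the stress-ng.full.ops row above.
base_ops, base_stddev_pct = 1.161e8, 5.0   # v6.10-rc1
new_ops, new_stddev_pct = 1.05e8, 10.0     # e0aaae806edc3

# Relative change of the new kernel against the baseline, in percent.
change_pct = (new_ops - base_ops) / base_ops * 100.0

# Rough combined noise: relative standard deviations added in quadrature
# (an illustrative assumption, not the robot's method).
noise_pct = math.hypot(base_stddev_pct, new_stddev_pct)

# Treat the result as conclusive only if the change exceeds the noise.
conclusive = abs(change_pct) > noise_pct

print(f"change {change_pct:+.1f}%, noise ~{noise_pct:.1f}%, conclusive: {conclusive}")
```

Under this heuristic the change of about -9.6% is smaller than the combined noise
of about 11%, so the measurement alone does not show whether the regression
really got smaller.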
[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
commit:
v6.10-rc1
28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
96768b7d67219 fsnotify: introduce pre-content permission event
e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
6215 +0.1% 6223 -8.9% 5661 -7.9% 5724 time.user_time
0.58 -0.0 0.54 ± 18% -0.1 0.52 -0.0 0.55 ± 11% mpstat.cpu.all.irq%
0.01 ± 4% -0.0 0.01 ± 13% -0.0 0.00 ± 2% -0.0 0.01 ± 6% mpstat.cpu.all.soft%
7.59 ± 59% -58.7% 3.14 ± 63% -61.7% 2.90 ± 52% -33.4% 5.06 ± 50% sched_debug.cfs_rq:/.util_est.min
0.00 ± 72% -148.0% -0.00 -50.9% 0.00 ±244% +12.3% 0.00 ±102% sched_debug.cpu.nr_uninterruptible.avg
1.218e+08 -0.3% 1.214e+08 -7.6% 1.125e+08 -6.4% 1.14e+08 unixbench.throughput
6215 +0.1% 6223 -8.9% 5661 -7.9% 5724 unixbench.time.user_time
4.521e+10 -0.3% 4.506e+10 -7.7% 4.172e+10 -6.4% 4.231e+10 unixbench.workload
1.458e+11 -7.5% 1.35e+11 ± 18% -6.8% 1.359e+11 -9.5% 1.32e+11 ± 11% perf-stat.i.branch-instructions
3742171 ± 4% -16.8% 3112873 ± 20% +80.4% 6752235 ± 7% +403.3% 18836125 ± 14% perf-stat.i.cache-misses
32402657 ± 3% -19.5% 26094697 ± 16% +77.8% 57627688 ± 4% +356.5% 1.479e+08 ± 11% perf-stat.i.cache-references
0.95 +9.8% 1.04 ± 20% +5.8% 1.00 ± 2% +9.3% 1.03 ± 13% perf-stat.i.cpi
161794 ± 8% -2.2% 158309 ± 20% -49.9% 81139 ± 15% -77.3% 36784 ± 54% perf-stat.i.cycles-between-cache-misses
6.963e+11 -7.4% 6.445e+11 ± 18% -6.5% 6.513e+11 -8.8% 6.353e+11 ± 11% perf-stat.i.instructions
1.22 -4.7% 1.16 ± 10% -6.3% 1.14 -6.5% 1.14 ± 6% perf-stat.i.ipc
0.01 ± 4% -10.3% 0.00 ± 6% +92.9% 0.01 ± 7% +453.4% 0.03 ± 14% perf-stat.overall.MPKI
0.75 +0.0% 0.75 +6.8% 0.80 +4.3% 0.78 perf-stat.overall.cpi
139258 ± 4% +11.8% 155626 ± 6% -44.5% 77325 ± 7% -80.8% 26737 ± 14% perf-stat.overall.cycles-between-cache-misses
1.33 -0.0% 1.33 -6.3% 1.25 -4.1% 1.28 perf-stat.overall.ipc
5722 +0.3% 5738 +1.6% 5811 +2.6% 5869 perf-stat.overall.path-length
1.452e+11 -7.4% 1.343e+11 ± 18% -6.8% 1.352e+11 -9.5% 1.314e+11 ± 11% perf-stat.ps.branch-instructions
3742620 ± 4% -16.7% 3117570 ± 20% +80.4% 6752430 ± 7% +401.2% 18758290 ± 14% perf-stat.ps.cache-misses
32374621 ± 3% -19.4% 26088859 ± 16% +77.6% 57486380 ± 4% +354.8% 1.473e+08 ± 11% perf-stat.ps.cache-references
6.93e+11 -7.4% 6.415e+11 ± 18% -6.5% 6.481e+11 -8.8% 6.323e+11 ± 11% perf-stat.ps.instructions
2.587e+14 -0.0% 2.586e+14 -6.3% 2.425e+14 -4.0% 2.484e+14 perf-stat.total.instructions
2.85 -0.2 2.62 -0.3 2.55 ± 3% +0.0 2.86 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
5.99 -0.1 5.86 +0.1 6.09 +1.1 7.10 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
12.29 -0.1 12.16 -0.3 11.97 +0.8 13.04 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
13.39 -0.1 13.28 -0.4 12.96 +0.7 14.12 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
13.18 -0.1 13.08 -0.9 12.24 -1.0 12.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
1.98 ± 5% -0.1 1.88 -0.0 1.96 ± 3% +0.2 2.20 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
3.27 ± 4% -0.1 3.20 +4.7 7.99 ± 6% -0.1 3.21 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
2.42 ± 6% -0.1 2.37 +4.7 7.16 ± 7% -0.1 2.28 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
3.72 -0.1 3.67 -0.4 3.37 +0.1 3.81 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
58.06 -0.0 58.01 -2.5 55.53 +0.7 58.76 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
74.32 -0.0 74.28 +1.8 76.16 +1.4 75.67 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.90 ± 5% -0.0 0.87 -0.0 0.87 ± 3% +0.1 1.00 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
0.70 -0.0 0.66 -0.0 0.68 -0.0 0.68 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
5.30 -0.0 5.27 -0.2 5.06 -0.1 5.25 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.11 -0.0 1.08 -0.1 0.97 -0.2 0.94 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
1.81 -0.0 1.80 -0.1 1.72 -0.1 1.76 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
1.62 -0.0 1.62 -0.1 1.49 -0.1 1.52 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.92 -0.0 0.91 -0.1 0.84 -0.1 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
2.18 -0.0 2.17 -0.1 2.09 -0.1 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
0.62 -0.0 0.62 -0.0 0.58 -0.0 0.60 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
0.63 -0.0 0.63 -0.0 0.58 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.92 -0.0 0.92 -0.1 0.84 -0.0 0.87 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
1.68 +0.0 1.68 -0.1 1.56 -0.1 1.60 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.68 +0.0 0.69 -0.0 0.64 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
0.72 ± 2% +0.0 0.72 -0.0 0.69 -0.0 0.70 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
0.85 +0.0 0.86 -0.0 0.81 -0.0 0.81 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
0.74 +0.0 0.75 -0.0 0.70 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
0.86 +0.0 0.86 +0.0 0.88 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
84.29 +0.0 84.30 +1.2 85.47 +1.1 85.34 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
7.00 +0.0 7.02 -0.4 6.59 -0.1 6.89 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
82.86 +0.0 82.87 +1.3 84.15 +1.1 83.98 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.73 +0.0 0.74 -0.0 0.71 -0.0 0.70 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.64 +0.0 0.66 ± 2% -0.1 0.57 -0.1 0.58 perf-profile.calltrace.cycles-pp.w_test
78.16 +0.0 78.18 +1.7 79.81 +1.4 79.57 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
96.83 +0.0 96.86 +0.2 97.07 +0.3 97.12 perf-profile.calltrace.cycles-pp.write
0.78 ± 3% +0.0 0.82 ± 2% +0.1 0.88 ± 3% +0.3 1.05 ± 8% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
1.12 ± 2% +0.0 1.16 -0.1 1.02 -0.0 1.11 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
4.23 +0.0 4.27 -0.3 3.94 -0.1 4.13 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
2.70 +0.0 2.74 -0.2 2.50 -0.1 2.63 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
3.52 +0.0 3.57 -0.3 3.26 -0.1 3.41 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
6.89 +0.1 6.94 -0.4 6.47 -0.1 6.77 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
2.10 ± 5% +0.1 2.15 ± 3% -0.1 2.03 ± 4% +0.2 2.25 ± 3% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.96 +0.1 3.06 +0.1 3.08 +0.4 3.36 ± 2% perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.61 +0.1 3.72 +0.1 3.71 +0.4 4.00 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
37.30 +0.1 37.42 -1.6 35.66 +0.2 37.50 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
5.32 +0.2 5.48 -0.1 5.18 -0.1 5.21 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
4.26 ± 3% +0.2 4.42 +5.4 9.61 ± 5% +0.7 4.98 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.20 +0.2 6.37 -0.2 5.98 -0.2 6.03 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.00 +0.2 12.19 -0.4 11.61 +0.2 12.25 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
3.02 -0.2 2.79 -0.3 2.72 ± 3% +0.0 3.04 perf-profile.children.cycles-pp.down_write
6.18 -0.1 6.05 +0.1 6.28 +1.1 7.28 perf-profile.children.cycles-pp.filemap_get_entry
12.68 -0.1 12.56 -0.3 12.34 +0.7 13.42 perf-profile.children.cycles-pp.__filemap_get_folio
13.58 -0.1 13.47 -0.5 13.13 +0.7 14.29 perf-profile.children.cycles-pp.simple_write_begin
58.65 -0.1 58.58 -2.5 56.11 +0.7 59.36 perf-profile.children.cycles-pp.generic_file_write_iter
3.64 ± 3% -0.1 3.58 +4.7 8.36 ± 6% -0.1 3.56 perf-profile.children.cycles-pp.security_file_permission
4.19 -0.1 4.14 -0.2 3.96 -0.2 3.98 perf-profile.children.cycles-pp.__cond_resched
2.67 ± 5% -0.1 2.61 +4.7 7.40 ± 6% -0.2 2.51 perf-profile.children.cycles-pp.apparmor_file_permission
3.81 -0.1 3.75 -0.4 3.45 +0.1 3.90 perf-profile.children.cycles-pp.__fsnotify_parent
2.42 -0.0 2.38 -0.2 2.23 -0.2 2.25 perf-profile.children.cycles-pp.rcu_all_qs
7.43 -0.0 7.39 -0.5 6.91 -0.5 6.92 perf-profile.children.cycles-pp.entry_SYSCALL_64
98.97 -0.0 98.94 +0.1 99.06 +0.1 99.04 perf-profile.children.cycles-pp.write
5.69 -0.0 5.66 -0.3 5.44 -0.1 5.62 perf-profile.children.cycles-pp.simple_write_end
1.21 -0.0 1.18 -0.1 1.06 -0.2 1.04 perf-profile.children.cycles-pp.syscall_return_via_sysret
75.19 -0.0 75.17 +1.8 76.99 +1.3 76.53 perf-profile.children.cycles-pp.vfs_write
1.12 -0.0 1.11 -0.1 1.03 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
1.90 -0.0 1.88 -0.1 1.80 -0.1 1.84 perf-profile.children.cycles-pp.folio_unlock
1.99 -0.0 1.98 -0.2 1.82 -0.1 1.86 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.76 -0.0 0.75 -0.1 0.70 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
0.23 -0.0 0.23 ± 2% -0.0 0.22 -0.0 0.22 perf-profile.children.cycles-pp.file_remove_privs
2.47 -0.0 2.46 -0.1 2.37 -0.1 2.41 perf-profile.children.cycles-pp.xas_load
0.64 -0.0 0.64 -0.1 0.58 -0.0 0.60 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.37 -0.0 0.37 -0.0 0.35 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
0.36 -0.0 0.36 -0.0 0.33 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider
1.26 -0.0 1.26 -0.1 1.16 -0.1 1.19 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
0.59 -0.0 0.59 -0.1 0.54 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
1.74 -0.0 1.74 -0.1 1.62 -0.1 1.66 perf-profile.children.cycles-pp.up_write
1.05 -0.0 1.05 -0.1 0.97 -0.0 1.01 perf-profile.children.cycles-pp.folio_mapping
0.36 +0.0 0.36 -0.0 0.34 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
0.33 +0.0 0.33 -0.0 0.30 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
0.25 +0.0 0.25 -0.0 0.23 -0.0 0.23 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
1.10 +0.0 1.10 -0.1 1.04 -0.1 1.04 perf-profile.children.cycles-pp.xattr_resolve_name
0.85 +0.0 0.85 -0.0 0.80 -0.0 0.82 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
0.55 +0.0 0.55 -0.0 0.52 -0.0 0.52 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.92 +0.0 0.94 -0.1 0.87 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty
0.97 +0.0 0.98 +0.0 0.99 -0.0 0.93 perf-profile.children.cycles-pp.aa_file_perm
83.54 +0.0 83.55 +1.3 84.79 +1.1 84.63 perf-profile.children.cycles-pp.do_syscall_64
7.14 +0.0 7.15 -0.4 6.72 -0.1 7.03 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
0.45 ± 2% +0.0 0.47 -0.0 0.41 -0.0 0.41 perf-profile.children.cycles-pp.write@plt
84.71 +0.0 84.72 +1.2 85.88 +1.1 85.76 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
4.41 +0.0 4.43 -0.3 4.11 -0.3 4.16 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.98 +0.0 1.00 -0.1 0.87 -0.1 0.89 perf-profile.children.cycles-pp.w_test
78.77 +0.0 78.80 +1.6 80.39 +1.4 80.17 perf-profile.children.cycles-pp.ksys_write
0.89 ± 3% +0.0 0.93 +0.1 0.97 ± 3% +0.3 1.16 ± 7% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
1.37 +0.0 1.42 -0.1 1.24 -0.0 1.34 perf-profile.children.cycles-pp.strcmp
4.46 +0.0 4.51 -0.3 4.15 -0.1 4.35 perf-profile.children.cycles-pp.security_inode_need_killpriv
3.75 +0.0 3.80 -0.3 3.47 -0.1 3.63 perf-profile.children.cycles-pp.cap_inode_need_killpriv
7.24 +0.1 7.30 -0.4 6.81 -0.1 7.11 perf-profile.children.cycles-pp.file_remove_privs_flags
3.39 +0.1 3.45 -0.2 3.15 -0.1 3.29 perf-profile.children.cycles-pp.__vfs_getxattr
3.38 +0.1 3.48 +0.1 3.46 +0.4 3.74 ± 2% perf-profile.children.cycles-pp.inode_needs_update_time
3.94 +0.1 4.05 +0.1 4.02 +0.4 4.32 perf-profile.children.cycles-pp.file_update_time
38.19 +0.1 38.33 -1.7 36.51 +0.2 38.39 perf-profile.children.cycles-pp.generic_perform_write
5.71 +0.2 5.87 -0.2 5.49 -0.2 5.54 perf-profile.children.cycles-pp.fault_in_readable
4.50 ± 3% +0.2 4.67 +5.4 9.86 ± 4% +0.9 5.35 perf-profile.children.cycles-pp.rw_verify_area
6.45 +0.2 6.64 -0.2 6.22 -0.2 6.28 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
12.34 +0.2 12.53 -0.4 11.93 +0.2 12.58 perf-profile.children.cycles-pp.__generic_file_write_iter
1.98 ± 3% -0.2 1.79 -0.2 1.74 ± 4% +0.1 2.04 perf-profile.self.cycles-pp.down_write
3.66 -0.1 3.54 +0.2 3.86 +1.2 4.82 ± 2% perf-profile.self.cycles-pp.filemap_get_entry
3.64 -0.1 3.57 -0.4 3.28 +0.1 3.73 perf-profile.self.cycles-pp.__fsnotify_parent
1.54 ± 9% -0.1 1.47 ± 2% +4.7 6.22 ± 8% -0.1 1.44 perf-profile.self.cycles-pp.apparmor_file_permission
1.71 ± 2% -0.1 1.65 -0.0 1.69 ± 2% +0.1 1.82 perf-profile.self.cycles-pp.generic_file_write_iter
6.45 -0.1 6.39 -0.5 5.95 -0.5 6.00 perf-profile.self.cycles-pp.write
7.25 -0.1 7.20 -0.6 6.64 -0.3 6.98 perf-profile.self.cycles-pp.vfs_write
2.35 -0.0 2.31 -0.1 2.25 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
2.76 -0.0 2.72 -0.1 2.65 -0.0 2.72 perf-profile.self.cycles-pp.simple_write_end
0.80 ± 4% -0.0 0.78 ± 2% -0.1 0.76 ± 2% +0.1 0.88 perf-profile.self.cycles-pp.generic_write_check_limits
1.20 -0.0 1.18 -0.1 1.06 -0.2 1.03 perf-profile.self.cycles-pp.syscall_return_via_sysret
1.41 -0.0 1.39 -0.1 1.30 -0.1 1.34 perf-profile.self.cycles-pp.rcu_all_qs
1.76 -0.0 1.75 -0.1 1.68 -0.0 1.72 perf-profile.self.cycles-pp.folio_unlock
1.90 -0.0 1.90 -0.1 1.76 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64
0.54 -0.0 0.53 -0.0 0.50 -0.0 0.50 perf-profile.self.cycles-pp.folio_wait_stable
1.80 -0.0 1.80 -0.1 1.72 -0.0 1.75 perf-profile.self.cycles-pp.xas_load
1.10 -0.0 1.09 -0.0 1.06 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
1.48 -0.0 1.47 -0.1 1.36 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
0.62 -0.0 0.62 -0.0 0.58 -0.0 0.58 perf-profile.self.cycles-pp.x64_sys_call
0.35 -0.0 0.35 -0.0 0.32 -0.0 0.33 perf-profile.self.cycles-pp.cap_inode_need_killpriv
0.24 -0.0 0.24 -0.0 0.23 ± 2% -0.0 0.23 perf-profile.self.cycles-pp.is_bad_inode
0.87 -0.0 0.87 -0.1 0.80 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
0.24 -0.0 0.24 -0.0 0.22 -0.0 0.22 perf-profile.self.cycles-pp.amd_clear_divider
0.70 -0.0 0.70 -0.0 0.67 +0.0 0.72 perf-profile.self.cycles-pp.security_inode_need_killpriv
0.52 -0.0 0.52 -0.0 0.47 -0.0 0.49 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
1.62 +0.0 1.62 -0.1 1.51 -0.1 1.55 perf-profile.self.cycles-pp.up_write
0.22 +0.0 0.22 -0.0 0.21 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
1.09 +0.0 1.10 -0.1 1.01 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.35 +0.0 0.36 -0.0 0.32 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi
1.25 +0.0 1.25 -0.1 1.16 -0.0 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.25 +0.0 0.25 -0.0 0.23 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
0.79 +0.0 0.80 -0.1 0.74 -0.0 0.75 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
2.04 +0.0 2.04 -0.1 1.95 +0.0 2.04 perf-profile.self.cycles-pp.file_remove_privs_flags
0.55 +0.0 0.55 -0.0 0.52 -0.0 0.52 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.61 +0.0 0.62 -0.0 0.58 -0.0 0.59 perf-profile.self.cycles-pp.xattr_resolve_name
0.73 +0.0 0.73 -0.0 0.69 -0.0 0.70 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
4.51 +0.0 4.52 -0.3 4.23 -0.2 4.27 perf-profile.self.cycles-pp.__filemap_get_folio
0.47 +0.0 0.48 -0.0 0.45 +0.0 0.49 perf-profile.self.cycles-pp.folio_mark_dirty
0.87 +0.0 0.88 +0.0 0.89 -0.0 0.83 perf-profile.self.cycles-pp.aa_file_perm
6.97 +0.0 6.98 -0.4 6.56 -0.1 6.86 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
1.39 +0.0 1.40 -0.1 1.30 -0.0 1.34 perf-profile.self.cycles-pp.__vfs_getxattr
1.65 +0.0 1.67 -0.1 1.56 -0.1 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64
4.32 +0.0 4.34 -0.3 4.02 -0.3 4.06 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.90 +0.0 0.92 -0.1 0.80 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
0.66 +0.0 0.68 -0.0 0.64 +0.0 0.67 perf-profile.self.cycles-pp.file_update_time
1.15 +0.0 1.18 -0.0 1.10 -0.0 1.15 perf-profile.self.cycles-pp.__generic_file_write_iter
0.78 +0.0 0.81 -0.1 0.69 -0.1 0.71 perf-profile.self.cycles-pp.w_test
1.02 ± 2% +0.0 1.06 -0.1 0.91 -0.0 1.00 perf-profile.self.cycles-pp.strcmp
1.50 +0.0 1.53 +0.0 1.52 +0.1 1.58 perf-profile.self.cycles-pp.inode_needs_update_time
0.78 ± 4% +0.0 0.82 +0.1 0.86 ± 3% +0.3 1.05 ± 8% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
3.35 +0.1 3.44 -0.2 3.18 -0.0 3.31 perf-profile.self.cycles-pp.generic_perform_write
5.56 +0.2 5.72 -0.2 5.33 -0.2 5.38 perf-profile.self.cycles-pp.fault_in_readable
0.85 +0.2 1.09 +0.7 1.50 +1.1 1.92 perf-profile.self.cycles-pp.rw_verify_area
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-08 5:40 ` Oliver Sang
@ 2024-07-08 16:37 ` Amir Goldstein
0 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-08 16:37 UTC (permalink / raw)
To: Oliver Sang; +Cc: Jan Kara, oe-lkp, lkp
On Mon, Jul 8, 2024 at 8:40 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Amir,
>
> On Fri, Jul 05, 2024 at 08:48:28AM +0300, Amir Goldstein wrote:
>
> [...]
>
> >
> > Thanks.
> > That clarifies that the cycles are spent in the "optimization code" itself.
> >
> > I pushed a new version to the fsnotify_for_lkp branch with a possible fix commit
> > at the base of the branch.
> >
> > Hopefully, with this fix, the compiler will be able to optimize smarter and
> > the generated fast path code will be less sensitive to code alignment ???
> >
> > If it works, it may eliminate some of the regressions throughout this branch and
> > may also improve the stress-ng regression that you reported on v6.10-rc1 [1].
> >
> > * e0aaae806edc - (fsnotify_for_lkp) fanotify: report file range info
> > with pre-content events
> > * a28c32866bb3 - fanotify: rename a misnamed constant
> > * 61baabbdceaa - fanotify: pass optional file access range in pre-content event
> > * 72e76d909afd - fanotify: introduce FAN_PRE_MODIFY permission event
> > * 1c71a12ff3ce - fanotify: introduce FAN_PRE_ACCESS permission event
> > * 38a903de931a - fsnotify: generate pre-content permission event on exec
> > * 70be29706389 - fsnotify: generate pre-content permission event on open
> > * 96768b7d6721 - fsnotify: introduce pre-content permission event
> > * 28d5b4a88241 - fsnotify: avoid multiple fsnotify_sb_info() access in
> > permission hooks
> >
> > Fingers crossed...
>
> Unfortunately, no luck. I combined the results with those for 96768b7d6721 and
> its parent, since 96768b7d6721 introduces most of the regression.
>
Too bad.
I will need to have a think.
Thank you for testing!
Amir.
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> 28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
> 96768b7d67219 fsnotify: introduce pre-content permission event
> e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
> v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 1.218e+08 -0.3% 1.214e+08 -7.6% 1.125e+08 -6.4% 1.14e+08 unixbench.throughput
>
> Details are below [2].
>
>
> >
> > Thanks,
> > Amir.
> >
> > [1] https://lore.kernel.org/all/202404101624.85684be8-oliver.sang@intel.com/
>
> For this report, I also retested on the new branch. The regression seems to have
> shrunk to around 10%, but we cannot get stable data on this new branch, so we
> cannot say whether it has really improved.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/full/stress-ng/60s
>
> commit:
> v6.10-rc1
> e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
> v6.10-rc1 e0aaae806edc3411d84dc0d66fe
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 1.161e+08 ± 5% -9.5% 1.05e+08 ± 10% stress-ng.full.ops
> 1934587 ± 5% -9.5% 1750464 ± 10% stress-ng.full.ops_per_sec
>
>
>
> [2]
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
>
> commit:
> v6.10-rc1
> 28d5b4a88241d fsnotify: avoid multiple fsnotify_sb_info() access in permission hooks
> 96768b7d67219 fsnotify: introduce pre-content permission event
> e0aaae806edc3 (amir73il/fsnotify_for_lkp) fanotify: report file range info with pre-content events
>
> v6.10-rc1 28d5b4a88241d36788173a41211 96768b7d672192594d54b474077 e0aaae806edc3411d84dc0d66fe
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 6215 +0.1% 6223 -8.9% 5661 -7.9% 5724 time.user_time
> 0.58 -0.0 0.54 ± 18% -0.1 0.52 -0.0 0.55 ± 11% mpstat.cpu.all.irq%
> 0.01 ± 4% -0.0 0.01 ± 13% -0.0 0.00 ± 2% -0.0 0.01 ± 6% mpstat.cpu.all.soft%
> 7.59 ± 59% -58.7% 3.14 ± 63% -61.7% 2.90 ± 52% -33.4% 5.06 ± 50% sched_debug.cfs_rq:/.util_est.min
> 0.00 ± 72% -148.0% -0.00 -50.9% 0.00 ±244% +12.3% 0.00 ±102% sched_debug.cpu.nr_uninterruptible.avg
> 1.218e+08 -0.3% 1.214e+08 -7.6% 1.125e+08 -6.4% 1.14e+08 unixbench.throughput
> 6215 +0.1% 6223 -8.9% 5661 -7.9% 5724 unixbench.time.user_time
> 4.521e+10 -0.3% 4.506e+10 -7.7% 4.172e+10 -6.4% 4.231e+10 unixbench.workload
> 1.458e+11 -7.5% 1.35e+11 ± 18% -6.8% 1.359e+11 -9.5% 1.32e+11 ± 11% perf-stat.i.branch-instructions
> 3742171 ± 4% -16.8% 3112873 ± 20% +80.4% 6752235 ± 7% +403.3% 18836125 ± 14% perf-stat.i.cache-misses
> 32402657 ± 3% -19.5% 26094697 ± 16% +77.8% 57627688 ± 4% +356.5% 1.479e+08 ± 11% perf-stat.i.cache-references
> 0.95 +9.8% 1.04 ± 20% +5.8% 1.00 ± 2% +9.3% 1.03 ± 13% perf-stat.i.cpi
> 161794 ± 8% -2.2% 158309 ± 20% -49.9% 81139 ± 15% -77.3% 36784 ± 54% perf-stat.i.cycles-between-cache-misses
> 6.963e+11 -7.4% 6.445e+11 ± 18% -6.5% 6.513e+11 -8.8% 6.353e+11 ± 11% perf-stat.i.instructions
> 1.22 -4.7% 1.16 ± 10% -6.3% 1.14 -6.5% 1.14 ± 6% perf-stat.i.ipc
> 0.01 ± 4% -10.3% 0.00 ± 6% +92.9% 0.01 ± 7% +453.4% 0.03 ± 14% perf-stat.overall.MPKI
> 0.75 +0.0% 0.75 +6.8% 0.80 +4.3% 0.78 perf-stat.overall.cpi
> 139258 ± 4% +11.8% 155626 ± 6% -44.5% 77325 ± 7% -80.8% 26737 ± 14% perf-stat.overall.cycles-between-cache-misses
> 1.33 -0.0% 1.33 -6.3% 1.25 -4.1% 1.28 perf-stat.overall.ipc
> 5722 +0.3% 5738 +1.6% 5811 +2.6% 5869 perf-stat.overall.path-length
> 1.452e+11 -7.4% 1.343e+11 ± 18% -6.8% 1.352e+11 -9.5% 1.314e+11 ± 11% perf-stat.ps.branch-instructions
> 3742620 ± 4% -16.7% 3117570 ± 20% +80.4% 6752430 ± 7% +401.2% 18758290 ± 14% perf-stat.ps.cache-misses
> 32374621 ± 3% -19.4% 26088859 ± 16% +77.6% 57486380 ± 4% +354.8% 1.473e+08 ± 11% perf-stat.ps.cache-references
> 6.93e+11 -7.4% 6.415e+11 ± 18% -6.5% 6.481e+11 -8.8% 6.323e+11 ± 11% perf-stat.ps.instructions
> 2.587e+14 -0.0% 2.586e+14 -6.3% 2.425e+14 -4.0% 2.484e+14 perf-stat.total.instructions
> 2.85 -0.2 2.62 -0.3 2.55 ± 3% +0.0 2.86 perf-profile.calltrace.cycles-pp.down_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 5.99 -0.1 5.86 +0.1 6.09 +1.1 7.10 perf-profile.calltrace.cycles-pp.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 12.29 -0.1 12.16 -0.3 11.97 +0.8 13.04 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
> 13.39 -0.1 13.28 -0.4 12.96 +0.7 14.12 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 13.18 -0.1 13.08 -0.9 12.24 -1.0 12.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> 1.98 ± 5% -0.1 1.88 -0.0 1.96 ± 3% +0.2 2.20 perf-profile.calltrace.cycles-pp.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 3.27 ± 4% -0.1 3.20 +4.7 7.99 ± 6% -0.1 3.21 perf-profile.calltrace.cycles-pp.security_file_permission.rw_verify_area.vfs_write.ksys_write.do_syscall_64
> 2.42 ± 6% -0.1 2.37 +4.7 7.16 ± 7% -0.1 2.28 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write.ksys_write
> 3.72 -0.1 3.67 -0.4 3.37 +0.1 3.81 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 58.06 -0.0 58.01 -2.5 55.53 +0.7 58.76 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 74.32 -0.0 74.28 +1.8 76.16 +1.4 75.67 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.90 ± 5% -0.0 0.87 -0.0 0.87 ± 3% +0.1 1.00 perf-profile.calltrace.cycles-pp.generic_write_check_limits.generic_write_checks.generic_file_write_iter.vfs_write.ksys_write
> 0.70 -0.0 0.66 -0.0 0.68 -0.0 0.68 perf-profile.calltrace.cycles-pp.__cond_resched.down_write.generic_file_write_iter.vfs_write.ksys_write
> 5.30 -0.0 5.27 -0.2 5.06 -0.1 5.25 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 1.11 -0.0 1.08 -0.1 0.97 -0.2 0.94 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
> 1.81 -0.0 1.80 -0.1 1.72 -0.1 1.76 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> 1.62 -0.0 1.62 -0.1 1.49 -0.1 1.52 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.92 -0.0 0.91 -0.1 0.84 -0.1 0.86 perf-profile.calltrace.cycles-pp.folio_wait_stable.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 2.18 -0.0 2.17 -0.1 2.09 -0.1 2.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.__filemap_get_folio.simple_write_begin.generic_perform_write
> 0.62 -0.0 0.62 -0.0 0.58 -0.0 0.60 perf-profile.calltrace.cycles-pp.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 0.63 -0.0 0.63 -0.0 0.58 -0.0 0.59 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.92 -0.0 0.92 -0.1 0.84 -0.0 0.87 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 1.68 +0.0 1.68 -0.1 1.56 -0.1 1.60 perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 0.68 +0.0 0.69 -0.0 0.64 -0.0 0.66 perf-profile.calltrace.cycles-pp.__cond_resched.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter
> 0.72 ± 2% +0.0 0.72 -0.0 0.69 -0.0 0.70 perf-profile.calltrace.cycles-pp.setattr_should_drop_suidgid.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 0.85 +0.0 0.86 -0.0 0.81 -0.0 0.81 perf-profile.calltrace.cycles-pp.xattr_resolve_name.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 0.74 +0.0 0.75 -0.0 0.70 +0.0 0.77 perf-profile.calltrace.cycles-pp.folio_mark_dirty.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
> 0.86 +0.0 0.86 +0.0 0.88 -0.0 0.82 perf-profile.calltrace.cycles-pp.aa_file_perm.apparmor_file_permission.security_file_permission.rw_verify_area.vfs_write
> 84.29 +0.0 84.30 +1.2 85.47 +1.1 85.34 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
> 7.00 +0.0 7.02 -0.4 6.59 -0.1 6.89 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 82.86 +0.0 82.87 +1.3 84.15 +1.1 83.98 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.73 +0.0 0.74 -0.0 0.71 -0.0 0.70 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.64 +0.0 0.66 ± 2% -0.1 0.57 -0.1 0.58 perf-profile.calltrace.cycles-pp.w_test
> 78.16 +0.0 78.18 +1.7 79.81 +1.4 79.57 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 96.83 +0.0 96.86 +0.2 97.07 +0.3 97.12 perf-profile.calltrace.cycles-pp.write
> 0.78 ± 3% +0.0 0.82 ± 2% +0.1 0.88 ± 3% +0.3 1.05 ± 8% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter
> 1.12 ± 2% +0.0 1.16 -0.1 1.02 -0.0 1.11 perf-profile.calltrace.cycles-pp.strcmp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags
> 4.23 +0.0 4.27 -0.3 3.94 -0.1 4.13 perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 2.70 +0.0 2.74 -0.2 2.50 -0.1 2.63 perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter
> 3.52 +0.0 3.57 -0.3 3.26 -0.1 3.41 perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter
> 6.89 +0.1 6.94 -0.4 6.47 -0.1 6.77 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 2.10 ± 5% +0.1 2.15 ± 3% -0.1 2.03 ± 4% +0.2 2.25 ± 3% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 2.96 +0.1 3.06 +0.1 3.08 +0.4 3.36 ± 2% perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write
> 3.61 +0.1 3.72 +0.1 3.71 +0.4 4.00 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
> 37.30 +0.1 37.42 -1.6 35.66 +0.2 37.50 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 5.32 +0.2 5.48 -0.1 5.18 -0.1 5.21 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write
> 4.26 ± 3% +0.2 4.42 +5.4 9.61 ± 5% +0.7 4.98 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.20 +0.2 6.37 -0.2 5.98 -0.2 6.03 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
> 12.00 +0.2 12.19 -0.4 11.61 +0.2 12.25 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
> 3.02 -0.2 2.79 -0.3 2.72 ± 3% +0.0 3.04 perf-profile.children.cycles-pp.down_write
> 6.18 -0.1 6.05 +0.1 6.28 +1.1 7.28 perf-profile.children.cycles-pp.filemap_get_entry
> 12.68 -0.1 12.56 -0.3 12.34 +0.7 13.42 perf-profile.children.cycles-pp.__filemap_get_folio
> 13.58 -0.1 13.47 -0.5 13.13 +0.7 14.29 perf-profile.children.cycles-pp.simple_write_begin
> 58.65 -0.1 58.58 -2.5 56.11 +0.7 59.36 perf-profile.children.cycles-pp.generic_file_write_iter
> 3.64 ± 3% -0.1 3.58 +4.7 8.36 ± 6% -0.1 3.56 perf-profile.children.cycles-pp.security_file_permission
> 4.19 -0.1 4.14 -0.2 3.96 -0.2 3.98 perf-profile.children.cycles-pp.__cond_resched
> 2.67 ± 5% -0.1 2.61 +4.7 7.40 ± 6% -0.2 2.51 perf-profile.children.cycles-pp.apparmor_file_permission
> 3.81 -0.1 3.75 -0.4 3.45 +0.1 3.90 perf-profile.children.cycles-pp.__fsnotify_parent
> 2.42 -0.0 2.38 -0.2 2.23 -0.2 2.25 perf-profile.children.cycles-pp.rcu_all_qs
> 7.43 -0.0 7.39 -0.5 6.91 -0.5 6.92 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 98.97 -0.0 98.94 +0.1 99.06 +0.1 99.04 perf-profile.children.cycles-pp.write
> 5.69 -0.0 5.66 -0.3 5.44 -0.1 5.62 perf-profile.children.cycles-pp.simple_write_end
> 1.21 -0.0 1.18 -0.1 1.06 -0.2 1.04 perf-profile.children.cycles-pp.syscall_return_via_sysret
> 75.19 -0.0 75.17 +1.8 76.99 +1.3 76.53 perf-profile.children.cycles-pp.vfs_write
> 1.12 -0.0 1.11 -0.1 1.03 -0.1 1.04 perf-profile.children.cycles-pp.folio_wait_stable
> 1.90 -0.0 1.88 -0.1 1.80 -0.1 1.84 perf-profile.children.cycles-pp.folio_unlock
> 1.99 -0.0 1.98 -0.2 1.82 -0.1 1.86 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 0.76 -0.0 0.75 -0.1 0.70 -0.1 0.70 perf-profile.children.cycles-pp.x64_sys_call
> 0.23 -0.0 0.23 ± 2% -0.0 0.22 -0.0 0.22 perf-profile.children.cycles-pp.file_remove_privs
> 2.47 -0.0 2.46 -0.1 2.37 -0.1 2.41 perf-profile.children.cycles-pp.xas_load
> 0.64 -0.0 0.64 -0.1 0.58 -0.0 0.60 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.37 -0.0 0.37 -0.0 0.35 -0.0 0.35 perf-profile.children.cycles-pp.__x64_sys_write
> 0.36 -0.0 0.36 -0.0 0.33 -0.0 0.34 perf-profile.children.cycles-pp.amd_clear_divider
> 1.26 -0.0 1.26 -0.1 1.16 -0.1 1.19 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.59 -0.0 0.59 -0.1 0.54 -0.0 0.56 perf-profile.children.cycles-pp.inode_to_bdi
> 1.74 -0.0 1.74 -0.1 1.62 -0.1 1.66 perf-profile.children.cycles-pp.up_write
> 1.05 -0.0 1.05 -0.1 0.97 -0.0 1.01 perf-profile.children.cycles-pp.folio_mapping
> 0.36 +0.0 0.36 -0.0 0.34 -0.0 0.34 perf-profile.children.cycles-pp.is_bad_inode
> 0.33 +0.0 0.33 -0.0 0.30 -0.0 0.31 perf-profile.children.cycles-pp.noop_dirty_folio
> 0.25 +0.0 0.25 -0.0 0.23 -0.0 0.23 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
> 1.10 +0.0 1.10 -0.1 1.04 -0.1 1.04 perf-profile.children.cycles-pp.xattr_resolve_name
> 0.85 +0.0 0.85 -0.0 0.80 -0.0 0.82 perf-profile.children.cycles-pp.setattr_should_drop_suidgid
> 0.55 +0.0 0.55 -0.0 0.52 -0.0 0.52 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.92 +0.0 0.94 -0.1 0.87 +0.0 0.95 perf-profile.children.cycles-pp.folio_mark_dirty
> 0.97 +0.0 0.98 +0.0 0.99 -0.0 0.93 perf-profile.children.cycles-pp.aa_file_perm
> 83.54 +0.0 83.55 +1.3 84.79 +1.1 84.63 perf-profile.children.cycles-pp.do_syscall_64
> 7.14 +0.0 7.15 -0.4 6.72 -0.1 7.03 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> 0.45 ± 2% +0.0 0.47 -0.0 0.41 -0.0 0.41 perf-profile.children.cycles-pp.write@plt
> 84.71 +0.0 84.72 +1.2 85.88 +1.1 85.76 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 4.41 +0.0 4.43 -0.3 4.11 -0.3 4.16 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.98 +0.0 1.00 -0.1 0.87 -0.1 0.89 perf-profile.children.cycles-pp.w_test
> 78.77 +0.0 78.80 +1.6 80.39 +1.4 80.17 perf-profile.children.cycles-pp.ksys_write
> 0.89 ± 3% +0.0 0.93 +0.1 0.97 ± 3% +0.3 1.16 ± 7% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> 1.37 +0.0 1.42 -0.1 1.24 -0.0 1.34 perf-profile.children.cycles-pp.strcmp
> 4.46 +0.0 4.51 -0.3 4.15 -0.1 4.35 perf-profile.children.cycles-pp.security_inode_need_killpriv
> 3.75 +0.0 3.80 -0.3 3.47 -0.1 3.63 perf-profile.children.cycles-pp.cap_inode_need_killpriv
> 7.24 +0.1 7.30 -0.4 6.81 -0.1 7.11 perf-profile.children.cycles-pp.file_remove_privs_flags
> 3.39 +0.1 3.45 -0.2 3.15 -0.1 3.29 perf-profile.children.cycles-pp.__vfs_getxattr
> 3.38 +0.1 3.48 +0.1 3.46 +0.4 3.74 ± 2% perf-profile.children.cycles-pp.inode_needs_update_time
> 3.94 +0.1 4.05 +0.1 4.02 +0.4 4.32 perf-profile.children.cycles-pp.file_update_time
> 38.19 +0.1 38.33 -1.7 36.51 +0.2 38.39 perf-profile.children.cycles-pp.generic_perform_write
> 5.71 +0.2 5.87 -0.2 5.49 -0.2 5.54 perf-profile.children.cycles-pp.fault_in_readable
> 4.50 ± 3% +0.2 4.67 +5.4 9.86 ± 4% +0.9 5.35 perf-profile.children.cycles-pp.rw_verify_area
> 6.45 +0.2 6.64 -0.2 6.22 -0.2 6.28 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> 12.34 +0.2 12.53 -0.4 11.93 +0.2 12.58 perf-profile.children.cycles-pp.__generic_file_write_iter
> 1.98 ± 3% -0.2 1.79 -0.2 1.74 ± 4% +0.1 2.04 perf-profile.self.cycles-pp.down_write
> 3.66 -0.1 3.54 +0.2 3.86 +1.2 4.82 ± 2% perf-profile.self.cycles-pp.filemap_get_entry
> 3.64 -0.1 3.57 -0.4 3.28 +0.1 3.73 perf-profile.self.cycles-pp.__fsnotify_parent
> 1.54 ± 9% -0.1 1.47 ± 2% +4.7 6.22 ± 8% -0.1 1.44 perf-profile.self.cycles-pp.apparmor_file_permission
> 1.71 ± 2% -0.1 1.65 -0.0 1.69 ± 2% +0.1 1.82 perf-profile.self.cycles-pp.generic_file_write_iter
> 6.45 -0.1 6.39 -0.5 5.95 -0.5 6.00 perf-profile.self.cycles-pp.write
> 7.25 -0.1 7.20 -0.6 6.64 -0.3 6.98 perf-profile.self.cycles-pp.vfs_write
> 2.35 -0.0 2.31 -0.1 2.25 -0.1 2.22 perf-profile.self.cycles-pp.__cond_resched
> 2.76 -0.0 2.72 -0.1 2.65 -0.0 2.72 perf-profile.self.cycles-pp.simple_write_end
> 0.80 ± 4% -0.0 0.78 ± 2% -0.1 0.76 ± 2% +0.1 0.88 perf-profile.self.cycles-pp.generic_write_check_limits
> 1.20 -0.0 1.18 -0.1 1.06 -0.2 1.03 perf-profile.self.cycles-pp.syscall_return_via_sysret
> 1.41 -0.0 1.39 -0.1 1.30 -0.1 1.34 perf-profile.self.cycles-pp.rcu_all_qs
> 1.76 -0.0 1.75 -0.1 1.68 -0.0 1.72 perf-profile.self.cycles-pp.folio_unlock
> 1.90 -0.0 1.90 -0.1 1.76 -0.1 1.78 perf-profile.self.cycles-pp.do_syscall_64
> 0.54 -0.0 0.53 -0.0 0.50 -0.0 0.50 perf-profile.self.cycles-pp.folio_wait_stable
> 1.80 -0.0 1.80 -0.1 1.72 -0.0 1.75 perf-profile.self.cycles-pp.xas_load
> 1.10 -0.0 1.09 -0.0 1.06 -0.1 1.02 perf-profile.self.cycles-pp.security_file_permission
> 1.48 -0.0 1.47 -0.1 1.36 -0.1 1.38 perf-profile.self.cycles-pp.ksys_write
> 0.62 -0.0 0.62 -0.0 0.58 -0.0 0.58 perf-profile.self.cycles-pp.x64_sys_call
> 0.35 -0.0 0.35 -0.0 0.32 -0.0 0.33 perf-profile.self.cycles-pp.cap_inode_need_killpriv
> 0.24 -0.0 0.24 -0.0 0.23 ± 2% -0.0 0.23 perf-profile.self.cycles-pp.is_bad_inode
> 0.87 -0.0 0.87 -0.1 0.80 -0.0 0.83 perf-profile.self.cycles-pp.folio_mapping
> 0.24 -0.0 0.24 -0.0 0.22 -0.0 0.22 perf-profile.self.cycles-pp.amd_clear_divider
> 0.70 -0.0 0.70 -0.0 0.67 +0.0 0.72 perf-profile.self.cycles-pp.security_inode_need_killpriv
> 0.52 -0.0 0.52 -0.0 0.47 -0.0 0.49 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> 1.62 +0.0 1.62 -0.1 1.51 -0.1 1.55 perf-profile.self.cycles-pp.up_write
> 0.22 +0.0 0.22 -0.0 0.21 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.noop_dirty_folio
> 1.09 +0.0 1.10 -0.1 1.01 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 0.35 +0.0 0.36 -0.0 0.32 -0.0 0.34 perf-profile.self.cycles-pp.inode_to_bdi
> 1.25 +0.0 1.25 -0.1 1.16 -0.0 1.20 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.25 +0.0 0.25 -0.0 0.23 -0.0 0.23 perf-profile.self.cycles-pp.__x64_sys_write
> 0.79 +0.0 0.80 -0.1 0.74 -0.0 0.75 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> 2.04 +0.0 2.04 -0.1 1.95 +0.0 2.04 perf-profile.self.cycles-pp.file_remove_privs_flags
> 0.55 +0.0 0.55 -0.0 0.52 -0.0 0.52 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.61 +0.0 0.62 -0.0 0.58 -0.0 0.59 perf-profile.self.cycles-pp.xattr_resolve_name
> 0.73 +0.0 0.73 -0.0 0.69 -0.0 0.70 perf-profile.self.cycles-pp.setattr_should_drop_suidgid
> 4.51 +0.0 4.52 -0.3 4.23 -0.2 4.27 perf-profile.self.cycles-pp.__filemap_get_folio
> 0.47 +0.0 0.48 -0.0 0.45 +0.0 0.49 perf-profile.self.cycles-pp.folio_mark_dirty
> 0.87 +0.0 0.88 +0.0 0.89 -0.0 0.83 perf-profile.self.cycles-pp.aa_file_perm
> 6.97 +0.0 6.98 -0.4 6.56 -0.1 6.86 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> 1.39 +0.0 1.40 -0.1 1.30 -0.0 1.34 perf-profile.self.cycles-pp.__vfs_getxattr
> 1.65 +0.0 1.67 -0.1 1.56 -0.1 1.60 perf-profile.self.cycles-pp.entry_SYSCALL_64
> 4.32 +0.0 4.34 -0.3 4.02 -0.3 4.06 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.90 +0.0 0.92 -0.1 0.80 -0.0 0.87 perf-profile.self.cycles-pp.simple_write_begin
> 0.66 +0.0 0.68 -0.0 0.64 +0.0 0.67 perf-profile.self.cycles-pp.file_update_time
> 1.15 +0.0 1.18 -0.0 1.10 -0.0 1.15 perf-profile.self.cycles-pp.__generic_file_write_iter
> 0.78 +0.0 0.81 -0.1 0.69 -0.1 0.71 perf-profile.self.cycles-pp.w_test
> 1.02 ± 2% +0.0 1.06 -0.1 0.91 -0.0 1.00 perf-profile.self.cycles-pp.strcmp
> 1.50 +0.0 1.53 +0.0 1.52 +0.1 1.58 perf-profile.self.cycles-pp.inode_needs_update_time
> 0.78 ± 4% +0.0 0.82 +0.1 0.86 ± 3% +0.3 1.05 ± 8% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 3.35 +0.1 3.44 -0.2 3.18 -0.0 3.31 perf-profile.self.cycles-pp.generic_perform_write
> 5.56 +0.2 5.72 -0.2 5.33 -0.2 5.38 perf-profile.self.cycles-pp.fault_in_readable
> 0.85 +0.2 1.09 +0.7 1.50 +1.1 1.92 perf-profile.self.cycles-pp.rw_verify_area
>
>
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-05 5:48 ` Amir Goldstein
2024-07-08 5:40 ` Oliver Sang
@ 2024-07-25 13:41 ` Jan Kara
2024-07-25 14:04 ` Amir Goldstein
1 sibling, 1 reply; 17+ messages in thread
From: Jan Kara @ 2024-07-25 13:41 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Oliver Sang, Jan Kara, oe-lkp, lkp
On Fri 05-07-24 08:48:28, Amir Goldstein wrote:
> On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Amir,
> >
> > On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> >
> > [...]
> >
> > > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > > > "unixbench.throughput": [
> > > > 121545292.8,
> > > > 121629889.4,
> > > > 121598992.0,
> > > > 121492095.5,
> > > > 121645038.1,
> > > > 121556286.9
> > > > ],
> > > >
> > >
> > > Are all those runs from the same boot?
> >
> > no. we reboot machine before each run.
> >
> > >
> > > > for the branch tip a82fd282befc7:
> > > > "unixbench.throughput": [
> > > > 116675606.7,
> > > > 116840611.2,
> > > > 116738966.0,
> > > > 116956953.1,
> > > > 116704901.9,
> > > > 116997628.3,
> > > > 117141733.7,
> > > > 116660495.4
> > > > ],
> > > >
> > >
> > > And these run?
> >
> > same.
> >
> > >
> > > Otherwise, we might have a fluctuation that happens at boot time
> > > or at mount time or something.
> > >
> > > >
> > > > let me combine the results from this branch together:
> > > >
> > > > =========================================================================================
> > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > >
> > > > commit:
> > > > v6.10-rc1
> > > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > > 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > > a82fd282befc7 fanotify: report file range info with pre-content
> > > >
> > > > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > > %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> > > > \ | \ | \ | \ | \
> > > > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput
> > > >
> > > >
> > > > one thing I want to mention is the "%change" is always comparing to the first
> > > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > > > and so on.
> > >
> > > Thanks for clarifying - I did not read it this way.
> > >
> > > >
> > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > -2.4% regression comparing to 94167e071109d.
> > > >
> > > > from above table, along the branch, the performance is kind of fluctuating,
> > > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > > >
> > >
> > > I can understand why 64108c0b47db91b would regress performance, but I
> > > cannot think
> > > of any possible explanation why a82fd282befc should improve performance,
> > > so I have to wonder if the regression to -6.6% is not a fluke of some
> > > specific boot/mount?
> > >
> > > I pushed a test branch to
> > > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > > with an extra patch that un-inlines some helpers to help bisect the
> > > perf report better.
> > > Maybe produce the report with this commit and it sheds some light.
> >
> > since
> >
> > * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > * f301cd18006c3 fanotify: rename a misnamed constant
> > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > * aca4084213276 fsnotify: generate pre-content permission event on open
> > * 93656e196b006 fsnotify: introduce pre-content permission event
> > * 1613e604df0cd (tag: v6.10-rc1,
> >
> > we run tests upon new commit. summary report is as below:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> >
> > commit:
> > v6.10-rc1
> > a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> >
> > v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> > ---------------- --------------------------- ---------------------------
> > %stddev %change %stddev %change %stddev
> > \ | \ | \
> > 1.216e+08 -3.9% 1.168e+08 -4.1% 1.166e+08 unixbench.throughput
> >:
> >
> > since Jan mentioned in a later mail that perf profiles are useful, I put details
> > as below
>
> Thanks.
> That clarifies that the cycles are spent in the "optimization code" itself.
BTW, Amir, how did you decide that the time is spent in the "optimization
code"? I've seen in the perf output that there are more cache misses and a
smaller IPC, but I didn't see a particular place where this would be happening...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression
2024-07-25 13:41 ` Jan Kara
@ 2024-07-25 14:04 ` Amir Goldstein
0 siblings, 0 replies; 17+ messages in thread
From: Amir Goldstein @ 2024-07-25 14:04 UTC (permalink / raw)
To: Jan Kara; +Cc: Oliver Sang, oe-lkp, lkp
On Thu, Jul 25, 2024 at 4:41 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 05-07-24 08:48:28, Amir Goldstein wrote:
> > On Fri, Jul 5, 2024 at 5:09 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Amir,
> > >
> > > On Wed, Jul 03, 2024 at 07:20:49PM +0300, Amir Goldstein wrote:
> > >
> > > [...]
> > >
> > > > > the data in our tests seem quite stable for a commit, such like for v6.10-rc1:
> > > > > "unixbench.throughput": [
> > > > > 121545292.8,
> > > > > 121629889.4,
> > > > > 121598992.0,
> > > > > 121492095.5,
> > > > > 121645038.1,
> > > > > 121556286.9
> > > > > ],
> > > > >
> > > >
> > > > Are all those runs from the same boot?
> > >
> > > no. we reboot machine before each run.
> > >
> > > >
> > > > > for the branch tip a82fd282befc7:
> > > > > "unixbench.throughput": [
> > > > > 116675606.7,
> > > > > 116840611.2,
> > > > > 116738966.0,
> > > > > 116956953.1,
> > > > > 116704901.9,
> > > > > 116997628.3,
> > > > > 117141733.7,
> > > > > 116660495.4
> > > > > ],
> > > > >
> > > >
> > > > And these run?
> > >
> > > same.
> > >
> > > >
> > > > Otherwise, we might have a fluctuation that happens at boot time
> > > > or at mount time or something.
> > > >
> > > > >
> > > > > let me combine the results from this branch together:
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > > > >
> > > > > commit:
> > > > > v6.10-rc1
> > > > > 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > > > 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > > > 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > > > a82fd282befc7 fanotify: report file range info with pre-content
> > > > >
> > > > > v6.10-rc1 68e04c2451ba03a18ccb1192890 94167e071109d573a5fc1ff3061 64108c0b47db91b20d658a89969 a82fd282befc71d99106bf31066
> > > > > ---------------- --------------------------- --------------------------- --------------------------- ---------------------------
> > > > > %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> > > > > \ | \ | \ | \ | \
> > > > > 1.216e+08 -3.5% 1.174e+08 -4.3% 1.163e+08 -6.6% 1.135e+08 -3.9% 1.168e+08 unixbench.throughput
> > > > >
> > > > >
> > > > > one thing I want to mention is the "%change" is always comparing to the first
> > > > > column (v6.10-rc1 here). so 68e04c2451ba0 has a -3.5% regression comparing to
> > > > > v6.10-rc1; 94167e071109d has a -4.3% regression also comparing to v6.10-rc1,
> > > > > and so on.
> > > >
> > > > Thanks for clarifying - I did not read it this way.
> > > >
> > > > >
> > > > > then if just comparing 94167e071109d and 64108c0b47db9, 64108c0b47db9 has about
> > > > > -2.4% regression comparing to 94167e071109d.
> > > > >
> > > > > from above table, along the branch, the performance is kind of fluctuating,
> > > > > dropped most on 64108c0b47db9, but then recovered a little on tip.
> > > > >
> > > >
> > > > I can understand why 64108c0b47db91b would regress performance, but I
> > > > cannot think
> > > > of any possible explanation why a82fd282befc should improve performance,
> > > > so I have to wonder if the regression to -6.6% is not a fluke of some
> > > > specific boot/mount?
> > > >
> > > > I pushed a test branch to
> > > > https://github.com/amir73il/linux/commits/fsnotify_for_lkp
> > > > with an extra patch that un-inlines some helpers to help bisect the
> > > > perf report better.
> > > > Maybe produce the report with this commit and it sheds some light.
> > >
> > > since
> > >
> > > * 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > > * a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > > * f301cd18006c3 fanotify: rename a misnamed constant
> > > * 64108c0b47db9 fanotify: pass optional file access range in pre-content event
> > > * 94167e071109d fanotify: introduce FAN_PRE_MODIFY permission event
> > > * 68e04c2451ba0 fanotify: introduce FAN_PRE_ACCESS permission event
> > > * 83af0c89527ab fsnotify: generate pre-content permission event on exec
> > > * aca4084213276 fsnotify: generate pre-content permission event on open
> > > * 93656e196b006 fsnotify: introduce pre-content permission event
> > > * 1613e604df0cd (tag: v6.10-rc1,
> > >
> > > we run tests upon new commit. summary report is as below:
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
> > > gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/300s/lkp-spr-r02/fsbuffer-w/unixbench
> > >
> > > commit:
> > > v6.10-rc1
> > > a82fd282befc7 (amir73il/fan_pre_content) fanotify: report file range info with pre-content events
> > > 388baed2ddef7 (amir73il/fsnotify_for_lkp) fsnotify: un-inline fsnotify helpers
> > >
> > > v6.10-rc1 a82fd282befc71d99106bf31066 388baed2ddef701fe2f07ea0360
> > > ---------------- --------------------------- ---------------------------
> > > %stddev %change %stddev %change %stddev
> > > \ | \ | \
> > > 1.216e+08 -3.9% 1.168e+08 -4.1% 1.166e+08 unixbench.throughput
> > >:
> > >
> > > since Jan mentioned in a later mail that perf profiles are useful, I put details
> > > as below
> >
> > Thanks.
> > That clarifies that the cycles are spent in the "optimization code" itself.
>
> BTW, Amir, how did you decide that the time is spent in the "optimization
> code"? I've seen in the perf output that there are more cache misses and a
> smaller IPC, but I didn't see a particular place where this would be happening...
>
Oh no, I just meant that because there is so much inlined code
in the hooks, I couldn't say for sure whether the cycles are spent in the
optimization code that tries to avoid fsnotify_parent() or also
in the fsnotify_parent() inline wrapper, so I used the extern fsnotify_path()
jump point to break the inlining.
Maybe this was an unneeded test with an obvious outcome, but
I began to suspect that the fsnotify_sb_has_priority_watchers()
optimization may have a bug.
Thanks,
Amir.
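Reader's note: the percentages discussed in this thread can be re-derived from the numbers quoted above. The sketch below is not part of the LKP tooling; the helper name `pct_change` and the variable names are made up for illustration. It assumes that each table cell is the mean of the per-run throughput values and that "%change" is taken relative to the first column, as Oliver describes.

```python
from statistics import mean


def pct_change(base, new):
    """Relative change of `new` versus `base`, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# Per-run unixbench.throughput values quoted in the thread.
baseline_runs = [  # v6.10-rc1
    121545292.8, 121629889.4, 121598992.0,
    121492095.5, 121645038.1, 121556286.9,
]
tip_runs = [  # branch tip a82fd282befc7
    116675606.7, 116840611.2, 116738966.0, 116956953.1,
    116704901.9, 116997628.3, 117141733.7, 116660495.4,
]

# Tip versus baseline: corresponds to the "-3.9%" cell in the table.
tip_vs_baseline = pct_change(mean(baseline_runs), mean(tip_runs))

# Step between two adjacent commits, using the rounded table values:
# 94167e071109d (1.163e+08) versus 64108c0b47db9 (1.135e+08).
step_change = pct_change(1.163e8, 1.135e8)

print(f"tip vs v6.10-rc1:               {tip_vs_baseline:.1f}%")
print(f"64108c0b47db9 vs 94167e071109d: {step_change:.1f}%")
```

Because every column is compared against v6.10-rc1, the per-commit cost is not the difference of two "%change" cells; it has to be recomputed from the two absolute values, as in the second calculation.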
end of thread, other threads:[~2024-07-25 14:04 UTC | newest]
Thread overview: 17+ messages
2024-05-29 8:25 [amir73il:sb_write_barrier] [fanotify] 9d1fd61f1d: unixbench.throughput -7.9% regression kernel test robot
2024-05-29 11:17 ` Amir Goldstein
2024-05-31 3:15 ` Oliver Sang
2024-05-31 5:18 ` Amir Goldstein
2024-06-03 8:13 ` Oliver Sang
2024-06-04 12:33 ` Amir Goldstein
2024-07-01 7:42 ` Oliver Sang
2024-07-03 5:58 ` Amir Goldstein
2024-07-03 7:21 ` Oliver Sang
2024-07-03 16:20 ` Amir Goldstein
2024-07-04 15:39 ` Jan Kara
2024-07-05 2:09 ` Oliver Sang
2024-07-05 5:48 ` Amir Goldstein
2024-07-08 5:40 ` Oliver Sang
2024-07-08 16:37 ` Amir Goldstein
2024-07-25 13:41 ` Jan Kara
2024-07-25 14:04 ` Amir Goldstein