* [jlayton:delstid] [nfsd] 48ff66155e: fsmark.app_overhead 56.0% regression
@ 2024-12-24 14:35 kernel test robot
0 siblings, 0 replies; only message in thread
From: kernel test robot @ 2024-12-24 14:35 UTC (permalink / raw)
To: Jeff Layton; +Cc: oe-lkp, lkp, linux-nfs, oliver.sang
Hello,
kernel test robot noticed a 56.0% regression of fsmark.app_overhead on:
(but no obvious diff for fsmark.files_per_sec)
commit: 48ff66155e25a3db70cb44b340e1e990349825b8 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git delstid
testcase: fsmark
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:
iterations: 1x
nr_threads: 1t
disk: 1HDD
fs: btrfs
fs2: nfsv4
filesize: 4K
test_size: 40M
sync_method: fsyncBeforeClose
nr_files_per_directory: 1fpd
cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | fsmark: fsmark.app_overhead 35.9% regression |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters | cpufreq_governor=performance |
| | disk=1SSD |
| | filesize=16MB |
| | fs2=nfsv4 |
| | fs=ext4 |
| | iterations=1x |
| | nr_directories=16d |
| | nr_files_per_directory=256fpd |
| | nr_threads=32t |
| | sync_method=fsyncBeforeClose |
| | test_size=20G |
+------------------+------------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202412242245.dbf6ea4b-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241224/202412242245.dbf6ea4b-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-9.4/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
commit:
1436c81cbe ("nfsd: handle delegated timestamps in SETATTR")
48ff66155e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
1436c81cbe9bef84 48ff66155e25a3db70cb44b340e
---------------- ---------------------------
%stddev %change %stddev
\ | \
2837 ± 10% +21.1% 3436 ± 9% numa-meminfo.node1.PageTables
709.56 ± 10% +21.1% 859.52 ± 9% numa-vmstat.node1.nr_page_table_pages
1551575 +17.7% 1826031 ± 28% proc-vmstat.pgfault
33.29 +2.9% 34.26 boot-time.boot
24.88 +2.0% 25.38 boot-time.dhcp
3081774 +56.0% 4807945 fsmark.app_overhead
18.53 +0.2% 18.57 fsmark.files_per_sec
53434 -17.3% 44176 fsmark.time.voluntary_context_switches
2827 -11.7% 2496 ± 10% perf-stat.i.context-switches
2822 -11.7% 2492 ± 10% perf-stat.ps.context-switches
2882 -12.0% 2537 ± 11% vmstat.system.cs
4080 -10.7% 3643 ± 13% vmstat.system.in
5.82 ± 4% -0.5 5.30 ± 5% perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
6.83 ± 4% -0.5 6.33 ± 4% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
1.67 ± 10% -0.4 1.28 ± 11% perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread
1.67 ± 10% -0.4 1.28 ± 11% perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork
0.64 ± 4% -0.4 0.28 ±100% perf-profile.calltrace.cycles-pp.sched_balance_find_src_group.sched_balance_rq.sched_balance_domains.handle_softirqs.__irq_exit_rcu
1.16 ± 8% -0.3 0.88 ± 18% perf-profile.calltrace.cycles-pp.sched_balance_domains.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.95 ± 9% -0.2 0.70 ± 20% perf-profile.calltrace.cycles-pp.sched_balance_rq.sched_balance_domains.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
0.96 ± 9% +0.1 1.09 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.94 ± 9% +0.2 1.09 ± 9% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.79 ± 8% +0.2 0.96 ± 13% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__vfork
0.79 ± 8% +0.2 0.96 ± 13% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__vfork
0.78 ± 6% +0.2 0.95 ± 13% perf-profile.calltrace.cycles-pp.__x64_sys_vfork.do_syscall_64.entry_SYSCALL_64_after_hwframe.__vfork
1.92 ± 24% -0.7 1.20 ± 32% perf-profile.children.cycles-pp.do_softirq
7.32 ± 5% -0.5 6.82 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.85 ± 14% -0.5 1.35 ± 18% perf-profile.children.cycles-pp.ip6_xmit
1.92 ± 12% -0.5 1.44 ± 14% perf-profile.children.cycles-pp.inet6_csk_xmit
1.75 ± 14% -0.5 1.28 ± 20% perf-profile.children.cycles-pp.ip6_finish_output2
2.64 ± 4% -0.5 2.18 perf-profile.children.cycles-pp.__rpc_execute
1.45 ± 12% -0.4 1.00 ± 19% perf-profile.children.cycles-pp.__local_bh_enable_ip
1.58 ± 14% -0.4 1.16 ± 4% perf-profile.children.cycles-pp.xprt_request_transmit
1.59 ± 14% -0.4 1.16 ± 4% perf-profile.children.cycles-pp.xprt_transmit
1.30 ± 15% -0.4 0.88 ± 16% perf-profile.children.cycles-pp.__netif_receive_skb_one_core
1.33 ± 14% -0.4 0.92 ± 13% perf-profile.children.cycles-pp.__napi_poll
1.50 ± 11% -0.4 1.09 ± 14% perf-profile.children.cycles-pp.net_rx_action
1.32 ± 14% -0.4 0.92 ± 13% perf-profile.children.cycles-pp.process_backlog
1.19 ± 14% -0.4 0.79 ± 13% perf-profile.children.cycles-pp.ip6_input_finish
1.18 ± 14% -0.4 0.78 ± 13% perf-profile.children.cycles-pp.ip6_protocol_deliver_rcu
1.59 ± 13% -0.4 1.20 ± 5% perf-profile.children.cycles-pp.call_transmit
1.54 ± 14% -0.4 1.15 ± 4% perf-profile.children.cycles-pp.xs_tcp_send_request
1.67 ± 10% -0.4 1.28 ± 11% perf-profile.children.cycles-pp.rpc_async_schedule
1.10 ± 15% -0.4 0.74 ± 14% perf-profile.children.cycles-pp.tcp_v6_rcv
0.86 ± 22% -0.3 0.54 ± 12% perf-profile.children.cycles-pp.tcp_v6_do_rcv
0.81 ± 26% -0.3 0.52 ± 10% perf-profile.children.cycles-pp.tcp_rcv_established
0.42 ± 19% -0.2 0.24 ± 17% perf-profile.children.cycles-pp.___perf_sw_event
0.20 ± 35% -0.2 0.03 ±100% perf-profile.children.cycles-pp.__put_nfs_open_context
0.34 ± 12% -0.1 0.23 ± 22% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.35 ± 10% -0.1 0.24 ± 18% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
0.05 ± 76% +0.1 0.13 perf-profile.children.cycles-pp.ct_nmi_enter
0.12 ± 36% +0.1 0.21 ± 28% perf-profile.children.cycles-pp.autoremove_wake_function
0.78 ± 6% +0.2 0.95 ± 13% perf-profile.children.cycles-pp.__x64_sys_vfork
1.08 ± 16% +0.2 1.32 ± 10% perf-profile.children.cycles-pp.tick_nohz_stop_tick
1.09 ± 15% +0.2 1.34 ± 10% perf-profile.children.cycles-pp.tick_nohz_idle_stop_tick
1.26 ± 12% +0.3 1.53 ± 7% perf-profile.children.cycles-pp.kernel_clone
2.33 ± 8% +0.3 2.66 ± 7% perf-profile.children.cycles-pp.do_filp_open
2.30 ± 7% +0.3 2.64 ± 7% perf-profile.children.cycles-pp.path_openat
0.30 ± 28% -0.1 0.16 ± 46% perf-profile.self.cycles-pp.tick_nohz_handler
0.12 ± 32% +0.1 0.20 ± 26% perf-profile.self.cycles-pp.__memcg_slab_free_hook
0.05 ± 76% +0.1 0.13 perf-profile.self.cycles-pp.ct_nmi_enter
0.36 ± 33% +0.2 0.54 ± 27% perf-profile.self.cycles-pp.next_uptodate_folio
***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1SSD/16MB/nfsv4/ext4/1x/x86_64-rhel-9.4/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/20G/fsmark
commit:
1436c81cbe ("nfsd: handle delegated timestamps in SETATTR")
48ff66155e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
1436c81cbe9bef84 48ff66155e25a3db70cb44b340e
---------------- ---------------------------
%stddev %change %stddev
\ | \
2.472e+08 ± 4% +35.9% 3.361e+08 ± 4% fsmark.app_overhead
16.00 +0.0% 16.00 fsmark.files_per_sec
65306 ±121% +168.8% 175532 ± 4% numa-meminfo.node1.AnonHugePages
449.72 ± 8% +24.4% 559.64 ± 16% sched_debug.cpu.curr->pid.avg
0.02 ± 37% -75.3% 0.00 ±101% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.61 ± 47% -69.2% 0.19 ± 52% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
3.57 ± 35% -1.6 1.98 ± 24% perf-profile.calltrace.cycles-pp.read
3.35 ± 38% -1.4 1.98 ± 24% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
3.35 ± 38% -1.4 1.98 ± 24% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
1.10 ± 72% +2.0 3.11 ± 14% perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
1.37 ± 83% +2.3 3.66 ± 38% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.57 ± 35% -1.6 1.98 ± 24% perf-profile.children.cycles-pp.read
1.90 ± 60% +2.4 4.28 ± 15% perf-profile.children.cycles-pp.__mmap_region
1.37 ± 83% +2.5 3.92 ± 35% perf-profile.children.cycles-pp.ksys_mmap_pgoff
2.17 ± 57% +2.7 4.84 ± 34% perf-profile.children.cycles-pp.do_mmap
2.17 ± 57% +2.9 5.08 ± 33% perf-profile.children.cycles-pp.vm_mmap_pgoff
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2024-12-24 14:36 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-24 14:35 [jlayton:delstid] [nfsd] 48ff66155e: fsmark.app_overhead 56.0% regression kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox