From: kernel test robot <oliver.sang@intel.com>
To: Charlie Jenkins <charlie@rivosinc.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
<linux-kernel@vger.kernel.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Huacai Chen <chenhuacai@kernel.org>,
WANG Xuerui <kernel@xen0n.name>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
Alexandre Ghiti <alexghiti@rivosinc.com>,
<linux-riscv@lists.infradead.org>, <loongarch@lists.linux.dev>,
Charlie Jenkins <charlie@rivosinc.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()
Date: Wed, 5 Feb 2025 16:13:14 +0800
Message-ID: <202502051555.85ae6844-lkp@intel.com>
In-Reply-To: <20250127-riscv_optimize_entry-v4-4-868cf7702dc9@rivosinc.com>
Hello,
kernel test robot noticed a 1.9% improvement in stress-ng.seek.ops_per_sec on:
commit: c1bc35dd5bf6c7fa86a936a4fbe3b8d92fbf8641 ("[PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()")
url: https://github.com/intel-lab-lkp/linux/commits/Charlie-Jenkins/riscv-entry-Convert-ret_from_fork-to-C/20250128-133636
patch link: https://lore.kernel.org/all/20250127-riscv_optimize_entry-v4-4-868cf7702dc9@rivosinc.com/
patch subject: [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: seek
cpufreq_governor: performance
In addition, the commit also has a significant impact on the following tests:
+------------------+--------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.context.swapcontext_calls_per_sec 1.9% improvement        |
| test machine     | 384 threads 2 sockets Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                   |
|                  | nr_threads=100%                                                                |
|                  | test=context                                                                   |
|                  | testtime=60s                                                                   |
+------------------+--------------------------------------------------------------------------------+
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250205/202502051555.85ae6844-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/seek/stress-ng/60s
commit:
37c1871b51 ("LoongArch: entry: Migrate ret_from_fork() to C")
c1bc35dd5b ("entry: Inline syscall_exit_to_user_mode()")
37c1871b51766a66 c1bc35dd5bf6c7fa86a936a4fbe
---------------- ---------------------------
%stddev %change %stddev
\ | \
104886 ± 19% +19.3% 125157 ± 17% numa-meminfo.node1.Slab
2583 ± 39% +75.4% 4531 ± 40% proc-vmstat.numa_hint_faults_local
179842 +0.6% 180945 vmstat.system.in
177.18 -2.6% 172.49 stress-ng.seek.nanosecs_per_seek
1.223e+09 +1.9% 1.246e+09 stress-ng.seek.ops
20376380 +1.9% 20771261 stress-ng.seek.ops_per_sec
1.05 ± 20% -100.0% 0.00 perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
13.11 ± 28% -100.0% 0.00 perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
3.12 ± 21% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
2785 ± 14% -100.0% 0.00 perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
836.20 ± 43% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
2.07 ± 27% -100.0% 0.00 perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
834.79 ± 44% -100.0% 0.00 perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
2.04 +3.4% 2.11 perf-stat.i.MPKI
3.682e+08 +2.0% 3.754e+08 perf-stat.i.cache-misses
4.637e+08 +1.8% 4.721e+08 perf-stat.i.cache-references
1.23 +1.5% 1.25 perf-stat.i.cpi
603.02 -1.9% 591.60 perf-stat.i.cycles-between-cache-misses
1.798e+11 -1.4% 1.772e+11 perf-stat.i.instructions
0.82 -1.4% 0.80 perf-stat.i.ipc
3902 +1.8% 3972 ± 2% perf-stat.i.minor-faults
3902 +1.8% 3972 ± 2% perf-stat.i.page-faults
2.05 +3.4% 2.12 perf-stat.overall.MPKI
1.23 +1.5% 1.25 perf-stat.overall.cpi
602.25 -1.9% 590.74 perf-stat.overall.cycles-between-cache-misses
0.81 -1.4% 0.80 perf-stat.overall.ipc
3.623e+08 +1.9% 3.693e+08 perf-stat.ps.cache-misses
4.562e+08 +1.8% 4.645e+08 perf-stat.ps.cache-references
1.769e+11 -1.4% 1.743e+11 perf-stat.ps.instructions
3826 +1.8% 3893 ± 2% perf-stat.ps.minor-faults
3826 +1.8% 3893 ± 2% perf-stat.ps.page-faults
1.085e+13 -2.0% 1.063e+13 perf-stat.total.instructions
10.62 ± 2% -0.6 10.02 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.llseek.stress_run
9.46 ± 2% -0.5 8.94 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek.stress_run
0.63 +0.0 0.66 ± 3% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
1.61 +0.0 1.64 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
2.78 +0.1 2.85 perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
2.94 +0.1 3.02 perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
8.58 +0.2 8.77 perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.vfs_read.ksys_read.do_syscall_64
8.37 +0.2 8.56 perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.vfs_read.ksys_read
8.96 +0.2 9.17 perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
9.53 +0.2 9.75 perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
12.86 +0.3 13.15 perf-profile.calltrace.cycles-pp.filemap_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
14.08 +0.3 14.42 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
15.98 +0.3 16.32 perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
19.18 +0.4 19.55 perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
20.30 +0.4 20.67 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
7.39 -7.4 0.00 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
54.31 -0.7 53.60 perf-profile.children.cycles-pp.llseek
56.77 -0.3 56.42 perf-profile.children.cycles-pp.do_syscall_64
59.25 -0.3 58.95 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.12 ± 3% +0.0 0.15 ± 13% perf-profile.children.cycles-pp.generic_file_read_iter
1.73 +0.0 1.77 perf-profile.children.cycles-pp.x64_sys_call
1.97 +0.1 2.02 perf-profile.children.cycles-pp.filemap_get_entry
2.84 +0.1 2.92 perf-profile.children.cycles-pp.__filemap_get_folio
2.97 +0.1 3.05 perf-profile.children.cycles-pp.simple_write_begin
6.98 +0.1 7.09 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.96 +0.1 2.08 ± 5% perf-profile.children.cycles-pp.stress_shim_lseek
8.92 +0.1 9.06 perf-profile.children.cycles-pp.entry_SYSCALL_64
8.40 +0.2 8.58 perf-profile.children.cycles-pp._copy_to_iter
8.61 +0.2 8.80 perf-profile.children.cycles-pp.copy_page_to_iter
8.97 +0.2 9.19 perf-profile.children.cycles-pp.folio_unlock
9.57 +0.2 9.80 perf-profile.children.cycles-pp.simple_write_end
19.10 +0.3 19.38 perf-profile.children.cycles-pp.read
12.94 +0.3 13.24 perf-profile.children.cycles-pp.filemap_read
25.30 +0.3 25.62 perf-profile.children.cycles-pp.write
14.14 +0.3 14.48 perf-profile.children.cycles-pp.vfs_read
16.12 +0.3 16.47 perf-profile.children.cycles-pp.generic_perform_write
14.72 +0.4 15.08 perf-profile.children.cycles-pp.ksys_read
19.25 +0.4 19.62 perf-profile.children.cycles-pp.generic_file_write_iter
20.95 +0.4 21.33 perf-profile.children.cycles-pp.ksys_write
20.40 +0.4 20.78 perf-profile.children.cycles-pp.vfs_write
6.38 -6.4 0.00 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.63 +0.0 0.65 perf-profile.self.cycles-pp.__filemap_get_folio
2.20 +0.0 2.23 perf-profile.self.cycles-pp.entry_SYSCALL_64
2.45 +0.0 2.48 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.97 +0.0 1.00 perf-profile.self.cycles-pp.filemap_read
1.51 +0.0 1.56 perf-profile.self.cycles-pp.x64_sys_call
1.54 +0.0 1.59 perf-profile.self.cycles-pp.filemap_get_read_batch
6.54 +0.1 6.64 perf-profile.self.cycles-pp.llseek
6.74 +0.1 6.85 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
8.35 +0.2 8.54 perf-profile.self.cycles-pp._copy_to_iter
8.93 +0.2 9.14 perf-profile.self.cycles-pp.folio_unlock
3.91 +6.1 9.96 perf-profile.self.cycles-pp.do_syscall_64
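A note on reading the table above: the %change column appears to be a relative change for most metrics, while the perf-profile rows report an absolute difference in percentage points. The sketch below recomputes three rows under that assumption. The helper names relative_change and point_change are illustrative and are not part of lkp-tests.

```python
def relative_change(base: float, patched: float) -> float:
    """Percent change of the patched value relative to the base value."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, used here for rows that are already percentages."""
    return patched - base


# Values copied from the seek table (base commit 37c1871b51, patched c1bc35dd5b).
seek_ops_per_sec = relative_change(20376380, 20771261)
nanosecs_per_seek = relative_change(177.18, 172.49)
exit_self_cycles = point_change(6.38, 0.00)  # syscall_exit_to_user_mode, self

print(f"stress-ng.seek.ops_per_sec        {seek_ops_per_sec:+.1f}%")   # +1.9%
print(f"stress-ng.seek.nanosecs_per_seek  {nanosecs_per_seek:+.1f}%")  # -2.6%
print(f"syscall_exit_to_user_mode (self)  {exit_self_cycles:+.1f}")    # -6.4
```

The self-cycles entry for syscall_exit_to_user_mode falls from 6.38 to 0.00 while do_syscall_64 rises from 3.91 to 9.96. That pattern is what one would expect if the function is now inlined into its caller, although the report itself does not say so.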
***************************************************************************************************
lkp-gnr-2ap2: 384 threads 2 sockets Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2ap2/context/stress-ng/60s
commit:
37c1871b51 ("LoongArch: entry: Migrate ret_from_fork() to C")
c1bc35dd5b ("entry: Inline syscall_exit_to_user_mode()")
37c1871b51766a66 c1bc35dd5bf6c7fa86a936a4fbe
---------------- ---------------------------
%stddev %change %stddev
\ | \
933000 ± 10% +30.5% 1217543 ± 18% proc-vmstat.pgfree
40.25 ± 37% +70.8% 68.75 ± 37% sched_debug.cpu.nr_uninterruptible.max
1.063e+08 +1.9% 1.083e+08 stress-ng.context.ops
1771139 +1.9% 1805148 stress-ng.context.ops_per_sec
4608060 +1.9% 4696809 stress-ng.context.swapcontext_calls_per_sec
0.06 ± 24% -100.0% 0.00 perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
4.53 ± 59% -100.0% 0.00 perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
217.64 ± 10% -17.8% 178.86 ± 17% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.67 ± 83% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
3262 ± 3% -100.0% 0.00 perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
505.60 ± 97% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
217.59 ± 10% -18.1% 178.22 ± 17% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.61 ± 91% -100.0% 0.00 perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
502.72 ± 98% -100.0% 0.00 perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1.197e+11 -4.4% 1.145e+11 perf-stat.i.branch-instructions
1.48 +0.1 1.57 perf-stat.i.branch-miss-rate%
1.761e+09 +1.5% 1.788e+09 perf-stat.i.branch-misses
2.06 +4.1% 2.15 perf-stat.i.cpi
6.404e+11 -4.3% 6.129e+11 perf-stat.i.instructions
0.49 -3.9% 0.47 perf-stat.i.ipc
1.47 +0.1 1.56 perf-stat.overall.branch-miss-rate%
2.06 +4.1% 2.15 perf-stat.overall.cpi
0.48 -3.9% 0.47 perf-stat.overall.ipc
1.178e+11 -4.4% 1.126e+11 perf-stat.ps.branch-instructions
1.732e+09 +1.5% 1.758e+09 perf-stat.ps.branch-misses
6.3e+11 -4.3% 6.029e+11 perf-stat.ps.instructions
3.849e+13 -3.5% 3.716e+13 perf-stat.total.instructions
6.12 -6.1 0.00 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
33.80 -0.7 33.14 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.swapcontext
31.62 -0.5 31.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
90.78 -0.3 90.49 perf-profile.calltrace.cycles-pp.swapcontext
1.40 -0.1 1.30 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.swapcontext
1.44 -0.0 1.40 perf-profile.calltrace.cycles-pp.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
0.57 +0.0 0.61 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.swapcontext
0.72 +0.0 0.77 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.swapcontext
2.21 +0.1 2.28 perf-profile.calltrace.cycles-pp.stress_thread2
2.20 +0.1 2.28 perf-profile.calltrace.cycles-pp.stress_thread3
2.15 +0.1 2.24 perf-profile.calltrace.cycles-pp.stress_thread1
7.38 +0.1 7.48 perf-profile.calltrace.cycles-pp._copy_to_user.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
8.90 +0.1 9.00 perf-profile.calltrace.cycles-pp._copy_from_user.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
1.26 +0.1 1.37 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
21.14 +0.3 21.49 perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
22.96 +0.5 23.48 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.swapcontext
6.45 -6.4 0.00 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
32.36 -0.7 31.64 perf-profile.children.cycles-pp.do_syscall_64
34.18 -0.7 33.52 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
96.11 -0.1 96.00 perf-profile.children.cycles-pp.swapcontext
1.59 -0.1 1.50 perf-profile.children.cycles-pp.syscall_return_via_sysret
1.54 -0.0 1.51 perf-profile.children.cycles-pp.sigprocmask
0.74 +0.1 0.79 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.72 +0.1 1.78 perf-profile.children.cycles-pp.stress_thread3
1.70 +0.1 1.75 perf-profile.children.cycles-pp.stress_thread1
1.72 +0.1 1.78 perf-profile.children.cycles-pp.stress_thread2
7.64 +0.1 7.76 perf-profile.children.cycles-pp._copy_to_user
1.44 +0.1 1.58 perf-profile.children.cycles-pp.x64_sys_call
9.59 +0.2 9.74 perf-profile.children.cycles-pp._copy_from_user
7.18 +0.2 7.35 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
12.65 +0.3 12.92 perf-profile.children.cycles-pp.entry_SYSCALL_64
21.19 +0.3 21.50 perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
5.45 -5.5 0.00 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.59 -0.1 1.50 perf-profile.self.cycles-pp.syscall_return_via_sysret
1.39 -0.0 1.36 perf-profile.self.cycles-pp.sigprocmask
2.32 +0.0 2.35 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.17 +0.0 1.20 perf-profile.self.cycles-pp.stress_thread3
1.18 +0.0 1.21 perf-profile.self.cycles-pp.stress_thread2
1.17 +0.0 1.20 perf-profile.self.cycles-pp.stress_thread1
2.83 +0.0 2.87 perf-profile.self.cycles-pp.__x64_sys_rt_sigprocmask
2.00 +0.1 2.05 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.73 +0.1 0.79 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
7.50 +0.1 7.62 perf-profile.self.cycles-pp._copy_to_user
9.20 +0.1 9.34 perf-profile.self.cycles-pp._copy_from_user
1.22 +0.1 1.37 perf-profile.self.cycles-pp.x64_sys_call
6.99 +0.2 7.15 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
49.94 +0.4 50.34 perf-profile.self.cycles-pp.swapcontext
3.36 +5.2 8.51 perf-profile.self.cycles-pp.do_syscall_64
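One rough way to relate the throughput gain to the drop in instruction count is to divide perf-stat.total.instructions by stress-ng.context.ops for each commit. This is only a back-of-the-envelope sketch: the perf-stat counters are assumed to be system-wide and so include activity unrelated to the benchmark, and the helper name instructions_per_op is illustrative.

```python
def instructions_per_op(total_instructions: float, ops: float) -> float:
    """Rough instructions retired per benchmark operation."""
    return total_instructions / ops


# Values copied from the context table (base commit 37c1871b51, patched c1bc35dd5b).
base = instructions_per_op(3.849e13, 1.063e8)
patched = instructions_per_op(3.716e13, 1.083e8)
change_pct = (patched - base) / base * 100.0

print(f"base:    {base:,.0f} instructions/op")
print(f"patched: {patched:,.0f} instructions/op")
print(f"change:  {change_pct:+.1f}%")
```

Under these assumptions the patched kernel retires roughly 5% fewer instructions per operation, which is in line with the reported 3.5% drop in total instructions combined with the 1.9% rise in operations.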
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 12+ messages
2025-01-28 5:33 [PATCH v4 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-28 5:33 ` Charlie Jenkins
2025-01-28 5:33 ` [PATCH v4 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
2025-01-28 5:33 ` Charlie Jenkins
2025-01-28 5:33 ` [PATCH v4 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
2025-01-28 5:33 ` Charlie Jenkins
2025-01-28 5:33 ` [PATCH v4 3/4] LoongArch: entry: Migrate ret_from_fork() to C Charlie Jenkins
2025-01-28 5:33 ` Charlie Jenkins
2025-01-28 5:33 ` [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-28 5:33 ` Charlie Jenkins
2025-02-05 8:13 ` kernel test robot [this message]
2025-02-05 8:13 ` kernel test robot