From: kernel test robot <oliver.sang@intel.com>
To: Charlie Jenkins <charlie@rivosinc.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	Alexandre Ghiti <alexghiti@rivosinc.com>,
	<linux-riscv@lists.infradead.org>, <loongarch@lists.linux.dev>,
	Charlie Jenkins <charlie@rivosinc.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()
Date: Wed, 5 Feb 2025 16:13:14 +0800
Message-ID: <202502051555.85ae6844-lkp@intel.com>
In-Reply-To: <20250127-riscv_optimize_entry-v4-4-868cf7702dc9@rivosinc.com>



Hello,

kernel test robot noticed a 1.9% improvement in stress-ng.seek.ops_per_sec with the following commit:


commit: c1bc35dd5bf6c7fa86a936a4fbe3b8d92fbf8641 ("[PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()")
url: https://github.com/intel-lab-lkp/linux/commits/Charlie-Jenkins/riscv-entry-Convert-ret_from_fork-to-C/20250128-133636
patch link: https://lore.kernel.org/all/20250127-riscv_optimize_entry-v4-4-868cf7702dc9@rivosinc.com/
patch subject: [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: seek
	cpufreq_governor: performance


In addition, the commit has a significant impact on the following test:

+------------------+--------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.context.swapcontext_calls_per_sec 1.9% improvement        |
| test machine     | 384 threads 2 sockets Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                   |
|                  | nr_threads=100%                                                                |
|                  | test=context                                                                   |
|                  | testtime=60s                                                                   |
+------------------+--------------------------------------------------------------------------------+




Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250205/202502051555.85ae6844-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/seek/stress-ng/60s

commit: 
  37c1871b51 ("LoongArch: entry: Migrate ret_from_fork() to C")
  c1bc35dd5b ("entry: Inline syscall_exit_to_user_mode()")

37c1871b51766a66 c1bc35dd5bf6c7fa86a936a4fbe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    104886 ± 19%     +19.3%     125157 ± 17%  numa-meminfo.node1.Slab
      2583 ± 39%     +75.4%       4531 ± 40%  proc-vmstat.numa_hint_faults_local
    179842            +0.6%     180945        vmstat.system.in
    177.18            -2.6%     172.49        stress-ng.seek.nanosecs_per_seek
 1.223e+09            +1.9%  1.246e+09        stress-ng.seek.ops
  20376380            +1.9%   20771261        stress-ng.seek.ops_per_sec
      1.05 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     13.11 ± 28%    -100.0%       0.00        perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3.12 ± 21%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2785 ± 14%    -100.0%       0.00        perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    836.20 ± 43%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.07 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    834.79 ± 44%    -100.0%       0.00        perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.04            +3.4%       2.11        perf-stat.i.MPKI
 3.682e+08            +2.0%  3.754e+08        perf-stat.i.cache-misses
 4.637e+08            +1.8%  4.721e+08        perf-stat.i.cache-references
      1.23            +1.5%       1.25        perf-stat.i.cpi
    603.02            -1.9%     591.60        perf-stat.i.cycles-between-cache-misses
 1.798e+11            -1.4%  1.772e+11        perf-stat.i.instructions
      0.82            -1.4%       0.80        perf-stat.i.ipc
      3902            +1.8%       3972 ±  2%  perf-stat.i.minor-faults
      3902            +1.8%       3972 ±  2%  perf-stat.i.page-faults
      2.05            +3.4%       2.12        perf-stat.overall.MPKI
      1.23            +1.5%       1.25        perf-stat.overall.cpi
    602.25            -1.9%     590.74        perf-stat.overall.cycles-between-cache-misses
      0.81            -1.4%       0.80        perf-stat.overall.ipc
 3.623e+08            +1.9%  3.693e+08        perf-stat.ps.cache-misses
 4.562e+08            +1.8%  4.645e+08        perf-stat.ps.cache-references
 1.769e+11            -1.4%  1.743e+11        perf-stat.ps.instructions
      3826            +1.8%       3893 ±  2%  perf-stat.ps.minor-faults
      3826            +1.8%       3893 ±  2%  perf-stat.ps.page-faults
 1.085e+13            -2.0%  1.063e+13        perf-stat.total.instructions
     10.62 ±  2%      -0.6       10.02 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.llseek.stress_run
      9.46 ±  2%      -0.5        8.94 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek.stress_run
      0.63            +0.0        0.66 ±  3%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
      1.61            +0.0        1.64        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      2.78            +0.1        2.85        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
      2.94            +0.1        3.02        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      8.58            +0.2        8.77        perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.vfs_read.ksys_read.do_syscall_64
      8.37            +0.2        8.56        perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.vfs_read.ksys_read
      8.96            +0.2        9.17        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      9.53            +0.2        9.75        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.86            +0.3       13.15        perf-profile.calltrace.cycles-pp.filemap_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     14.08            +0.3       14.42        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     15.98            +0.3       16.32        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
     19.18            +0.4       19.55        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     20.30            +0.4       20.67        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.39            -7.4        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     54.31            -0.7       53.60        perf-profile.children.cycles-pp.llseek
     56.77            -0.3       56.42        perf-profile.children.cycles-pp.do_syscall_64
     59.25            -0.3       58.95        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12 ±  3%      +0.0        0.15 ± 13%  perf-profile.children.cycles-pp.generic_file_read_iter
      1.73            +0.0        1.77        perf-profile.children.cycles-pp.x64_sys_call
      1.97            +0.1        2.02        perf-profile.children.cycles-pp.filemap_get_entry
      2.84            +0.1        2.92        perf-profile.children.cycles-pp.__filemap_get_folio
      2.97            +0.1        3.05        perf-profile.children.cycles-pp.simple_write_begin
      6.98            +0.1        7.09        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.96            +0.1        2.08 ±  5%  perf-profile.children.cycles-pp.stress_shim_lseek
      8.92            +0.1        9.06        perf-profile.children.cycles-pp.entry_SYSCALL_64
      8.40            +0.2        8.58        perf-profile.children.cycles-pp._copy_to_iter
      8.61            +0.2        8.80        perf-profile.children.cycles-pp.copy_page_to_iter
      8.97            +0.2        9.19        perf-profile.children.cycles-pp.folio_unlock
      9.57            +0.2        9.80        perf-profile.children.cycles-pp.simple_write_end
     19.10            +0.3       19.38        perf-profile.children.cycles-pp.read
     12.94            +0.3       13.24        perf-profile.children.cycles-pp.filemap_read
     25.30            +0.3       25.62        perf-profile.children.cycles-pp.write
     14.14            +0.3       14.48        perf-profile.children.cycles-pp.vfs_read
     16.12            +0.3       16.47        perf-profile.children.cycles-pp.generic_perform_write
     14.72            +0.4       15.08        perf-profile.children.cycles-pp.ksys_read
     19.25            +0.4       19.62        perf-profile.children.cycles-pp.generic_file_write_iter
     20.95            +0.4       21.33        perf-profile.children.cycles-pp.ksys_write
     20.40            +0.4       20.78        perf-profile.children.cycles-pp.vfs_write
      6.38            -6.4        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.63            +0.0        0.65        perf-profile.self.cycles-pp.__filemap_get_folio
      2.20            +0.0        2.23        perf-profile.self.cycles-pp.entry_SYSCALL_64
      2.45            +0.0        2.48        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.97            +0.0        1.00        perf-profile.self.cycles-pp.filemap_read
      1.51            +0.0        1.56        perf-profile.self.cycles-pp.x64_sys_call
      1.54            +0.0        1.59        perf-profile.self.cycles-pp.filemap_get_read_batch
      6.54            +0.1        6.64        perf-profile.self.cycles-pp.llseek
      6.74            +0.1        6.85        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      8.35            +0.2        8.54        perf-profile.self.cycles-pp._copy_to_iter
      8.93            +0.2        9.14        perf-profile.self.cycles-pp.folio_unlock
      3.91            +6.1        9.96        perf-profile.self.cycles-pp.do_syscall_64
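The table mixes two kinds of "change" column. Most rows (stress-ng.*, perf-stat.*) show a relative %change between the two commits. The perf-profile rows appear to show an absolute difference in percentage points (pp) of sampled cycles instead; this is inferred from the numbers, e.g. do_syscall_64 self cycles going from 3.91 to 9.96 is listed as +6.1, not as a relative +155%. The syscall_exit_to_user_mode rows presumably drop to 0.00 because, once the patch inlines that function, it is no longer a separate symbol, so its samples are attributed to the caller, do_syscall_64. The minimal Python sketch below checks both readings using only values copied from the seek table; the helper names pct_change and pp_change are illustrative, not part of any LKP tooling.

```python
# Values copied from the seek table (base = 37c1871b51, patched = c1bc35dd5b).


def pct_change(base: float, new: float) -> float:
    """Relative change in percent, as used by the stress-ng.* and perf-stat.* rows."""
    return (new - base) / base * 100.0


def pp_change(base: float, new: float) -> float:
    """Absolute change in percentage points, as the perf-profile rows appear to use."""
    return new - base


# Relative-change rows.
ops_per_sec = pct_change(20376380, 20771261)        # stress-ng.seek.ops_per_sec
ns_per_seek = pct_change(177.18, 172.49)            # stress-ng.seek.nanosecs_per_seek
total_instructions = pct_change(1.085e13, 1.063e13)  # perf-stat.total.instructions

# perf-profile.self rows (percent of all sampled cycles).
exit_delta = pp_change(6.38, 0.00)    # syscall_exit_to_user_mode
caller_delta = pp_change(3.91, 9.96)  # do_syscall_64

print(f"ops_per_sec:        {ops_per_sec:+.1f}%")
print(f"nanosecs_per_seek:  {ns_per_seek:+.1f}%")
print(f"total instructions: {total_instructions:+.1f}%")
print(f"syscall_exit_to_user_mode self: {exit_delta:+.2f} pp")
print(f"do_syscall_64 self:             {caller_delta:+.2f} pp")
```

The roughly 6.05 pp gained by do_syscall_64 is close to the 6.38 pp lost by syscall_exit_to_user_mode, which is consistent with (though not proof of) the exit path's cycles simply moving into its caller after inlining.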


***************************************************************************************************
lkp-gnr-2ap2: 384 threads 2 sockets Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2ap2/context/stress-ng/60s

commit: 
  37c1871b51 ("LoongArch: entry: Migrate ret_from_fork() to C")
  c1bc35dd5b ("entry: Inline syscall_exit_to_user_mode()")

37c1871b51766a66 c1bc35dd5bf6c7fa86a936a4fbe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    933000 ± 10%     +30.5%    1217543 ± 18%  proc-vmstat.pgfree
     40.25 ± 37%     +70.8%      68.75 ± 37%  sched_debug.cpu.nr_uninterruptible.max
 1.063e+08            +1.9%  1.083e+08        stress-ng.context.ops
   1771139            +1.9%    1805148        stress-ng.context.ops_per_sec
   4608060            +1.9%    4696809        stress-ng.context.swapcontext_calls_per_sec
      0.06 ± 24%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.53 ± 59%    -100.0%       0.00        perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    217.64 ± 10%     -17.8%     178.86 ± 17%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.67 ± 83%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3262 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    505.60 ± 97%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    217.59 ± 10%     -18.1%     178.22 ± 17%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.61 ± 91%    -100.0%       0.00        perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    502.72 ± 98%    -100.0%       0.00        perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
 1.197e+11            -4.4%  1.145e+11        perf-stat.i.branch-instructions
      1.48            +0.1        1.57        perf-stat.i.branch-miss-rate%
 1.761e+09            +1.5%  1.788e+09        perf-stat.i.branch-misses
      2.06            +4.1%       2.15        perf-stat.i.cpi
 6.404e+11            -4.3%  6.129e+11        perf-stat.i.instructions
      0.49            -3.9%       0.47        perf-stat.i.ipc
      1.47            +0.1        1.56        perf-stat.overall.branch-miss-rate%
      2.06            +4.1%       2.15        perf-stat.overall.cpi
      0.48            -3.9%       0.47        perf-stat.overall.ipc
 1.178e+11            -4.4%  1.126e+11        perf-stat.ps.branch-instructions
 1.732e+09            +1.5%  1.758e+09        perf-stat.ps.branch-misses
   6.3e+11            -4.3%  6.029e+11        perf-stat.ps.instructions
 3.849e+13            -3.5%  3.716e+13        perf-stat.total.instructions
      6.12            -6.1        0.00        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     33.80            -0.7       33.14        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.swapcontext
     31.62            -0.5       31.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     90.78            -0.3       90.49        perf-profile.calltrace.cycles-pp.swapcontext
      1.40            -0.1        1.30        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.swapcontext
      1.44            -0.0        1.40        perf-profile.calltrace.cycles-pp.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
      0.57            +0.0        0.61        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.swapcontext
      0.72            +0.0        0.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.swapcontext
      2.21            +0.1        2.28        perf-profile.calltrace.cycles-pp.stress_thread2
      2.20            +0.1        2.28        perf-profile.calltrace.cycles-pp.stress_thread3
      2.15            +0.1        2.24        perf-profile.calltrace.cycles-pp.stress_thread1
      7.38            +0.1        7.48        perf-profile.calltrace.cycles-pp._copy_to_user.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
      8.90            +0.1        9.00        perf-profile.calltrace.cycles-pp._copy_from_user.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
      1.26            +0.1        1.37        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     21.14            +0.3       21.49        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     22.96            +0.5       23.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.swapcontext
      6.45            -6.4        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     32.36            -0.7       31.64        perf-profile.children.cycles-pp.do_syscall_64
     34.18            -0.7       33.52        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     96.11            -0.1       96.00        perf-profile.children.cycles-pp.swapcontext
      1.59            -0.1        1.50        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.54            -0.0        1.51        perf-profile.children.cycles-pp.sigprocmask
      0.74            +0.1        0.79        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      1.72            +0.1        1.78        perf-profile.children.cycles-pp.stress_thread3
      1.70            +0.1        1.75        perf-profile.children.cycles-pp.stress_thread1
      1.72            +0.1        1.78        perf-profile.children.cycles-pp.stress_thread2
      7.64            +0.1        7.76        perf-profile.children.cycles-pp._copy_to_user
      1.44            +0.1        1.58        perf-profile.children.cycles-pp.x64_sys_call
      9.59            +0.2        9.74        perf-profile.children.cycles-pp._copy_from_user
      7.18            +0.2        7.35        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
     12.65            +0.3       12.92        perf-profile.children.cycles-pp.entry_SYSCALL_64
     21.19            +0.3       21.50        perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
      5.45            -5.5        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.59            -0.1        1.50        perf-profile.self.cycles-pp.syscall_return_via_sysret
      1.39            -0.0        1.36        perf-profile.self.cycles-pp.sigprocmask
      2.32            +0.0        2.35        perf-profile.self.cycles-pp.entry_SYSCALL_64
      1.17            +0.0        1.20        perf-profile.self.cycles-pp.stress_thread3
      1.18            +0.0        1.21        perf-profile.self.cycles-pp.stress_thread2
      1.17            +0.0        1.20        perf-profile.self.cycles-pp.stress_thread1
      2.83            +0.0        2.87        perf-profile.self.cycles-pp.__x64_sys_rt_sigprocmask
      2.00            +0.1        2.05        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.73            +0.1        0.79        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      7.50            +0.1        7.62        perf-profile.self.cycles-pp._copy_to_user
      9.20            +0.1        9.34        perf-profile.self.cycles-pp._copy_from_user
      1.22            +0.1        1.37        perf-profile.self.cycles-pp.x64_sys_call
      6.99            +0.2        7.15        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
     49.94            +0.4       50.34        perf-profile.self.cycles-pp.swapcontext
      3.36            +5.2        8.51        perf-profile.self.cycles-pp.do_syscall_64
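In the context run, the per-second instruction and branch-instruction rates fall by about 4.3-4.4% while throughput rises by 1.9%, so the work done per operation falls by more than either figure suggests alone. The rough Python sketch below divides the reported perf-stat.ps.* rates by stress-ng.context.ops_per_sec to estimate instructions and branch instructions per context operation. It assumes both rates cover the same system-wide measurement interval, which the report does not state explicitly, so treat the output as a back-of-the-envelope estimate; per_op and rel_change are hypothetical helper names.

```python
# Per-second rates copied from the context table
# (base = 37c1871b51, patched = c1bc35dd5b).
base = {
    "ops_per_sec": 1771139,              # stress-ng.context.ops_per_sec
    "instructions_ps": 6.3e11,           # perf-stat.ps.instructions
    "branch_instructions_ps": 1.178e11,  # perf-stat.ps.branch-instructions
}
patched = {
    "ops_per_sec": 1805148,
    "instructions_ps": 6.029e11,
    "branch_instructions_ps": 1.126e11,
}


def per_op(rates: dict, key: str) -> float:
    """Divide a per-second event rate by the per-second operation rate."""
    return rates[key] / rates["ops_per_sec"]


def rel_change(old: float, new: float) -> float:
    """Relative change in percent."""
    return (new - old) / old * 100.0


changes = {}
for key in ("instructions_ps", "branch_instructions_ps"):
    old, new = per_op(base, key), per_op(patched, key)
    changes[key] = rel_change(old, new)
    label = key[:-3]  # drop the "_ps" suffix for display
    print(f"{label} per op: {old:,.0f} -> {new:,.0f} ({changes[key]:+.1f}%)")
```

Under that assumption both ratios come out near -6%, i.e. the patched kernel appears to retire noticeably fewer instructions and branches per swapcontext operation, which outweighs the lower IPC reported in the perf-stat rows.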





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



WARNING: multiple messages have this Message-ID (diff)
From: kernel test robot <oliver.sang@intel.com>
To: Charlie Jenkins <charlie@rivosinc.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	Alexandre Ghiti <alexghiti@rivosinc.com>,
	<linux-riscv@lists.infradead.org>, <loongarch@lists.linux.dev>,
	Charlie Jenkins <charlie@rivosinc.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()
Date: Wed, 5 Feb 2025 16:13:14 +0800	[thread overview]
Message-ID: <202502051555.85ae6844-lkp@intel.com> (raw)
In-Reply-To: <20250127-riscv_optimize_entry-v4-4-868cf7702dc9@rivosinc.com>



Hello,

kernel test robot noticed a 1.9% improvement of stress-ng.seek.ops_per_sec on:


commit: c1bc35dd5bf6c7fa86a936a4fbe3b8d92fbf8641 ("[PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()")
url: https://github.com/intel-lab-lkp/linux/commits/Charlie-Jenkins/riscv-entry-Convert-ret_from_fork-to-C/20250128-133636
patch link: https://lore.kernel.org/all/20250127-riscv_optimize_entry-v4-4-868cf7702dc9@rivosinc.com/
patch subject: [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode()

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: seek
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+--------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.context.swapcontext_calls_per_sec 1.9% improvement        |
| test machine     | 384 threads 2 sockets Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory |
| test parameters  | cpufreq_governor=performance                                                   |
|                  | nr_threads=100%                                                                |
|                  | test=context                                                                   |
|                  | testtime=60s                                                                   |
+------------------+--------------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250205/202502051555.85ae6844-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/seek/stress-ng/60s

commit: 
  37c1871b51 ("LoongArch: entry: Migrate ret_from_fork() to C")
  c1bc35dd5b ("entry: Inline syscall_exit_to_user_mode()")

37c1871b51766a66 c1bc35dd5bf6c7fa86a936a4fbe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    104886 ± 19%     +19.3%     125157 ± 17%  numa-meminfo.node1.Slab
      2583 ± 39%     +75.4%       4531 ± 40%  proc-vmstat.numa_hint_faults_local
    179842            +0.6%     180945        vmstat.system.in
    177.18            -2.6%     172.49        stress-ng.seek.nanosecs_per_seek
 1.223e+09            +1.9%  1.246e+09        stress-ng.seek.ops
  20376380            +1.9%   20771261        stress-ng.seek.ops_per_sec
      1.05 ± 20%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     13.11 ± 28%    -100.0%       0.00        perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3.12 ± 21%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2785 ± 14%    -100.0%       0.00        perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    836.20 ± 43%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.07 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    834.79 ± 44%    -100.0%       0.00        perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.04            +3.4%       2.11        perf-stat.i.MPKI
 3.682e+08            +2.0%  3.754e+08        perf-stat.i.cache-misses
 4.637e+08            +1.8%  4.721e+08        perf-stat.i.cache-references
      1.23            +1.5%       1.25        perf-stat.i.cpi
    603.02            -1.9%     591.60        perf-stat.i.cycles-between-cache-misses
 1.798e+11            -1.4%  1.772e+11        perf-stat.i.instructions
      0.82            -1.4%       0.80        perf-stat.i.ipc
      3902            +1.8%       3972 ±  2%  perf-stat.i.minor-faults
      3902            +1.8%       3972 ±  2%  perf-stat.i.page-faults
      2.05            +3.4%       2.12        perf-stat.overall.MPKI
      1.23            +1.5%       1.25        perf-stat.overall.cpi
    602.25            -1.9%     590.74        perf-stat.overall.cycles-between-cache-misses
      0.81            -1.4%       0.80        perf-stat.overall.ipc
 3.623e+08            +1.9%  3.693e+08        perf-stat.ps.cache-misses
 4.562e+08            +1.8%  4.645e+08        perf-stat.ps.cache-references
 1.769e+11            -1.4%  1.743e+11        perf-stat.ps.instructions
      3826            +1.8%       3893 ±  2%  perf-stat.ps.minor-faults
      3826            +1.8%       3893 ±  2%  perf-stat.ps.page-faults
 1.085e+13            -2.0%  1.063e+13        perf-stat.total.instructions
     10.62 ±  2%      -0.6       10.02 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.llseek.stress_run
      9.46 ±  2%      -0.5        8.94 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek.stress_run
      0.63            +0.0        0.66 ±  3%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
      1.61            +0.0        1.64        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      2.78            +0.1        2.85        perf-profile.calltrace.cycles-pp.__filemap_get_folio.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write
      2.94            +0.1        3.02        perf-profile.calltrace.cycles-pp.simple_write_begin.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      8.58            +0.2        8.77        perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.vfs_read.ksys_read.do_syscall_64
      8.37            +0.2        8.56        perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.vfs_read.ksys_read
      8.96            +0.2        9.17        perf-profile.calltrace.cycles-pp.folio_unlock.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write
      9.53            +0.2        9.75        perf-profile.calltrace.cycles-pp.simple_write_end.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
     12.86            +0.3       13.15        perf-profile.calltrace.cycles-pp.filemap_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     14.08            +0.3       14.42        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     15.98            +0.3       16.32        perf-profile.calltrace.cycles-pp.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
     19.18            +0.4       19.55        perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     20.30            +0.4       20.67        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.39            -7.4        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     54.31            -0.7       53.60        perf-profile.children.cycles-pp.llseek
     56.77            -0.3       56.42        perf-profile.children.cycles-pp.do_syscall_64
     59.25            -0.3       58.95        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12 ±  3%      +0.0        0.15 ± 13%  perf-profile.children.cycles-pp.generic_file_read_iter
      1.73            +0.0        1.77        perf-profile.children.cycles-pp.x64_sys_call
      1.97            +0.1        2.02        perf-profile.children.cycles-pp.filemap_get_entry
      2.84            +0.1        2.92        perf-profile.children.cycles-pp.__filemap_get_folio
      2.97            +0.1        3.05        perf-profile.children.cycles-pp.simple_write_begin
      6.98            +0.1        7.09        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.96            +0.1        2.08 ±  5%  perf-profile.children.cycles-pp.stress_shim_lseek
      8.92            +0.1        9.06        perf-profile.children.cycles-pp.entry_SYSCALL_64
      8.40            +0.2        8.58        perf-profile.children.cycles-pp._copy_to_iter
      8.61            +0.2        8.80        perf-profile.children.cycles-pp.copy_page_to_iter
      8.97            +0.2        9.19        perf-profile.children.cycles-pp.folio_unlock
      9.57            +0.2        9.80        perf-profile.children.cycles-pp.simple_write_end
     19.10            +0.3       19.38        perf-profile.children.cycles-pp.read
     12.94            +0.3       13.24        perf-profile.children.cycles-pp.filemap_read
     25.30            +0.3       25.62        perf-profile.children.cycles-pp.write
     14.14            +0.3       14.48        perf-profile.children.cycles-pp.vfs_read
     16.12            +0.3       16.47        perf-profile.children.cycles-pp.generic_perform_write
     14.72            +0.4       15.08        perf-profile.children.cycles-pp.ksys_read
     19.25            +0.4       19.62        perf-profile.children.cycles-pp.generic_file_write_iter
     20.95            +0.4       21.33        perf-profile.children.cycles-pp.ksys_write
     20.40            +0.4       20.78        perf-profile.children.cycles-pp.vfs_write
      6.38            -6.4        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.63            +0.0        0.65        perf-profile.self.cycles-pp.__filemap_get_folio
      2.20            +0.0        2.23        perf-profile.self.cycles-pp.entry_SYSCALL_64
      2.45            +0.0        2.48        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.97            +0.0        1.00        perf-profile.self.cycles-pp.filemap_read
      1.51            +0.0        1.56        perf-profile.self.cycles-pp.x64_sys_call
      1.54            +0.0        1.59        perf-profile.self.cycles-pp.filemap_get_read_batch
      6.54            +0.1        6.64        perf-profile.self.cycles-pp.llseek
      6.74            +0.1        6.85        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      8.35            +0.2        8.54        perf-profile.self.cycles-pp._copy_to_iter
      8.93            +0.2        9.14        perf-profile.self.cycles-pp.folio_unlock
      3.91            +6.1        9.96        perf-profile.self.cycles-pp.do_syscall_64
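The seek profile looks the way an inlined function usually does: the self cycles attributed to syscall_exit_to_user_mode fall from 6.38% to 0.00%, while the self cycles of its caller do_syscall_64 rise by a similar amount. A minimal sketch of that arithmetic, using only the self-cycle numbers from the table above (the helper `pp_delta` is hypothetical, for illustration only):

```python
# Self-cycle shares (percent of total cycles) copied from the seek profile above.
before = {"syscall_exit_to_user_mode": 6.38, "do_syscall_64": 3.91}
after = {"syscall_exit_to_user_mode": 0.00, "do_syscall_64": 9.96}


def pp_delta(name):
    """Change in percentage points; perf-profile rows report absolute deltas."""
    return round(after[name] - before[name], 2)


lost = -pp_delta("syscall_exit_to_user_mode")  # no longer attributed to the callee
gained = pp_delta("do_syscall_64")             # now attributed to the caller
residual = lost - gained                       # not broken down further in the report
print(f"lost {lost:.2f} pp, gained {gained:.2f} pp, residual {residual:.2f} pp")
```

Almost all of the cycles that disappear from syscall_exit_to_user_mode reappear as self cycles of do_syscall_64; the report does not break down the small residual.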


***************************************************************************************************
lkp-gnr-2ap2: 384 threads 2 sockets Intel(R) Xeon(R) 6972P (Granite Rapids) with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2ap2/context/stress-ng/60s

commit: 
  37c1871b51 ("LoongArch: entry: Migrate ret_from_fork() to C")
  c1bc35dd5b ("entry: Inline syscall_exit_to_user_mode()")

37c1871b51766a66 c1bc35dd5bf6c7fa86a936a4fbe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    933000 ± 10%     +30.5%    1217543 ± 18%  proc-vmstat.pgfree
     40.25 ± 37%     +70.8%      68.75 ± 37%  sched_debug.cpu.nr_uninterruptible.max
 1.063e+08            +1.9%  1.083e+08        stress-ng.context.ops
   1771139            +1.9%    1805148        stress-ng.context.ops_per_sec
   4608060            +1.9%    4696809        stress-ng.context.swapcontext_calls_per_sec
      0.06 ± 24%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.53 ± 59%    -100.0%       0.00        perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    217.64 ± 10%     -17.8%     178.86 ± 17%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.67 ± 83%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3262 ±  3%    -100.0%       0.00        perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    505.60 ± 97%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    217.59 ± 10%     -18.1%     178.22 ± 17%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.61 ± 91%    -100.0%       0.00        perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    502.72 ± 98%    -100.0%       0.00        perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
 1.197e+11            -4.4%  1.145e+11        perf-stat.i.branch-instructions
      1.48            +0.1        1.57        perf-stat.i.branch-miss-rate%
 1.761e+09            +1.5%  1.788e+09        perf-stat.i.branch-misses
      2.06            +4.1%       2.15        perf-stat.i.cpi
 6.404e+11            -4.3%  6.129e+11        perf-stat.i.instructions
      0.49            -3.9%       0.47        perf-stat.i.ipc
      1.47            +0.1        1.56        perf-stat.overall.branch-miss-rate%
      2.06            +4.1%       2.15        perf-stat.overall.cpi
      0.48            -3.9%       0.47        perf-stat.overall.ipc
 1.178e+11            -4.4%  1.126e+11        perf-stat.ps.branch-instructions
 1.732e+09            +1.5%  1.758e+09        perf-stat.ps.branch-misses
   6.3e+11            -4.3%  6.029e+11        perf-stat.ps.instructions
 3.849e+13            -3.5%  3.716e+13        perf-stat.total.instructions
      6.12            -6.1        0.00        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     33.80            -0.7       33.14        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.swapcontext
     31.62            -0.5       31.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     90.78            -0.3       90.49        perf-profile.calltrace.cycles-pp.swapcontext
      1.40            -0.1        1.30        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.swapcontext
      1.44            -0.0        1.40        perf-profile.calltrace.cycles-pp.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
      0.57            +0.0        0.61        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.swapcontext
      0.72            +0.0        0.77        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.swapcontext
      2.21            +0.1        2.28        perf-profile.calltrace.cycles-pp.stress_thread2
      2.20            +0.1        2.28        perf-profile.calltrace.cycles-pp.stress_thread3
      2.15            +0.1        2.24        perf-profile.calltrace.cycles-pp.stress_thread1
      7.38            +0.1        7.48        perf-profile.calltrace.cycles-pp._copy_to_user.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
      8.90            +0.1        9.00        perf-profile.calltrace.cycles-pp._copy_from_user.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
      1.26            +0.1        1.37        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     21.14            +0.3       21.49        perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.swapcontext
     22.96            +0.5       23.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.swapcontext
      6.45            -6.4        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     32.36            -0.7       31.64        perf-profile.children.cycles-pp.do_syscall_64
     34.18            -0.7       33.52        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     96.11            -0.1       96.00        perf-profile.children.cycles-pp.swapcontext
      1.59            -0.1        1.50        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.54            -0.0        1.51        perf-profile.children.cycles-pp.sigprocmask
      0.74            +0.1        0.79        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      1.72            +0.1        1.78        perf-profile.children.cycles-pp.stress_thread3
      1.70            +0.1        1.75        perf-profile.children.cycles-pp.stress_thread1
      1.72            +0.1        1.78        perf-profile.children.cycles-pp.stress_thread2
      7.64            +0.1        7.76        perf-profile.children.cycles-pp._copy_to_user
      1.44            +0.1        1.58        perf-profile.children.cycles-pp.x64_sys_call
      9.59            +0.2        9.74        perf-profile.children.cycles-pp._copy_from_user
      7.18            +0.2        7.35        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
     12.65            +0.3       12.92        perf-profile.children.cycles-pp.entry_SYSCALL_64
     21.19            +0.3       21.50        perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
      5.45            -5.5        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.59            -0.1        1.50        perf-profile.self.cycles-pp.syscall_return_via_sysret
      1.39            -0.0        1.36        perf-profile.self.cycles-pp.sigprocmask
      2.32            +0.0        2.35        perf-profile.self.cycles-pp.entry_SYSCALL_64
      1.17            +0.0        1.20        perf-profile.self.cycles-pp.stress_thread3
      1.18            +0.0        1.21        perf-profile.self.cycles-pp.stress_thread2
      1.17            +0.0        1.20        perf-profile.self.cycles-pp.stress_thread1
      2.83            +0.0        2.87        perf-profile.self.cycles-pp.__x64_sys_rt_sigprocmask
      2.00            +0.1        2.05        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.73            +0.1        0.79        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      7.50            +0.1        7.62        perf-profile.self.cycles-pp._copy_to_user
      9.20            +0.1        9.34        perf-profile.self.cycles-pp._copy_from_user
      1.22            +0.1        1.37        perf-profile.self.cycles-pp.x64_sys_call
      6.99            +0.2        7.15        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
     49.94            +0.4       50.34        perf-profile.self.cycles-pp.swapcontext
      3.36            +5.2        8.51        perf-profile.self.cycles-pp.do_syscall_64
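Two things help when reading the context table. First, the middle column is a relative change for counts and rates (for example stress-ng.context.ops_per_sec), but an absolute change in percentage points for rows that are already percentages (perf-profile rows, branch-miss-rate%). Second, multiplying instructions by CPI gives a rough cycle count. A minimal Python sketch using only numbers from the table above; the helper names are hypothetical, and because CPI is printed with two decimals the cycle estimate is approximate:

```python
def pct_change(old, new):
    """Relative change in percent, as shown for counts and rates."""
    return (new - old) / old * 100.0


def pp_change(old, new):
    """Absolute change in percentage points, as shown for percent-valued rows."""
    return new - old


# Throughput rows: relative %change.
ops_per_sec = pct_change(1771139, 1805148)
swapcontext = pct_change(4608060, 4696809)

# A percent-valued row: relatively this is about +6%, but the table shows +0.1.
branch_miss_pp = pp_change(1.48, 1.57)

# perf-stat.i instructions and CPI; cycles are estimated as instructions * CPI.
instr_change = pct_change(6.404e11, 6.129e11)
cycle_change = pct_change(6.404e11 * 2.06, 6.129e11 * 2.15)

print(f"ops_per_sec {ops_per_sec:+.1f}%, swapcontext calls {swapcontext:+.1f}%")
print(f"branch-miss-rate {branch_miss_pp:+.1f} pp")
print(f"instructions {instr_change:+.1f}%, estimated cycles {cycle_change:+.1f}%")
```

Read together, roughly the same cycle budget retires about 4.3% fewer instructions while completing about 1.9% more operations, so the higher CPI does not offset the gain.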





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Thread overview: 12+ messages
2025-01-28  5:33 [PATCH v4 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-28  5:33 ` [PATCH v4 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
2025-01-28  5:33 ` [PATCH v4 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
2025-01-28  5:33 ` [PATCH v4 3/4] LoongArch: entry: Migrate ret_from_fork() to C Charlie Jenkins
2025-01-28  5:33 ` [PATCH v4 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
2025-02-05  8:13   ` kernel test robot [this message]
