* [linus:master] [tcp]  86c2bc293b:  stress-ng.sockmany.ops_per_sec 6.8% improvement
@ 2025-06-10 13:57 kernel test robot
From: kernel test robot @ 2025-06-10 13:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: oe-lkp, lkp, linux-kernel, Jakub Kicinski, Jason Xing,
	Kuniyuki Iwashima, netdev, oliver.sang



Hello,

kernel test robot noticed a 6.8% improvement of stress-ng.sockmany.ops_per_sec on:


commit: 86c2bc293b8130aec9fa504e953531a84a6eb9a6 ("tcp: use RCU lookup in __inet_hash_connect()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockmany
	cpufreq_governor: performance




Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250610/202506102156.1d2bde14-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s

commit: 
  d186f405fd ("tcp: add RCU management to inet_bind_bucket")
  86c2bc293b ("tcp: use RCU lookup in __inet_hash_connect()")

d186f405fdf4229d 86c2bc293b8130aec9fa504e953 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.62 ±  3%      +0.1        0.69 ±  2%  mpstat.cpu.all.irq%
    521879            -1.5%     514052        vmstat.system.in
   4059292            +6.8%    4335271        stress-ng.sockmany.ops
     67315            +6.8%      71863        stress-ng.sockmany.ops_per_sec
    903062            +4.0%     939576        proc-vmstat.nr_slab_reclaimable
   5715333            +5.7%    6043532        proc-vmstat.pgfree
     30955 ±  4%      -5.6%      29223 ±  3%  proc-vmstat.pgreuse
    617802           +12.5%     694736 ±  2%  perf-c2c.DRAM.local
     43535 ±  2%     -55.2%      19524 ±  2%  perf-c2c.HITM.local
     13760 ±  4%     -94.7%     726.83 ±  9%  perf-c2c.HITM.remote
     57296 ±  3%     -64.7%      20251 ±  2%  perf-c2c.HITM.total
   4862651 ± 23%     +26.2%    6137833 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.min
      0.24 ±  6%     +23.8%       0.30 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.stddev
   4862651 ± 23%     +26.2%    6137833 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
      0.24 ±  6%     +23.3%       0.30 ±  6%  sched_debug.cpu.nr_running.stddev
     40590 ±  3%     +18.8%      48233 ± 17%  sched_debug.cpu.nr_switches.max
      0.63 ± 12%     +20.6%       0.76 ±  7%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.32 ± 10%     -41.2%       0.19 ± 18%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.19 ±195%    +772.8%       1.62 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.30 ± 31%     +51.8%       3.49 ± 12%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     20.10           -23.3%      15.41        perf-sched.total_wait_and_delay.average.ms
    177307           +32.5%     234941        perf-sched.total_wait_and_delay.count.ms
     20.04           -23.4%      15.36        perf-sched.total_wait_time.average.ms
    125.96 ±110%     -73.3%      33.69 ± 17%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     13.68           -25.7%      10.16        perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.65 ± 10%     -41.0%       0.38 ± 18%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     79042           +32.2%     104463        perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
     81037           +34.4%     108937        perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      1965 ±  9%    +125.3%       4427 ±  3%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      2427 ±  3%     +12.5%       2729 ±  2%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     13.36 ±  2%     -25.0%      10.02        perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
     13.66           -25.7%      10.15        perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.33 ± 10%     -40.8%       0.19 ± 18%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     35.56           +15.4%      41.03        perf-stat.i.MPKI
 1.386e+10            +3.1%  1.428e+10        perf-stat.i.branch-instructions
      2.15            +0.1        2.26        perf-stat.i.branch-miss-rate%
 2.923e+08            +8.8%  3.182e+08        perf-stat.i.branch-misses
     71.48            +5.8       77.26        perf-stat.i.cache-miss-rate%
 2.391e+09           +24.9%  2.985e+09        perf-stat.i.cache-misses
 3.296e+09           +15.3%  3.802e+09        perf-stat.i.cache-references
      9.36            -7.4%       8.66        perf-stat.i.cpi
    291.67           -17.3%     241.22        perf-stat.i.cycles-between-cache-misses
 7.053e+10            +8.2%  7.631e+10        perf-stat.i.instructions
      0.12            +7.1%       0.13        perf-stat.i.ipc
     34.03           +14.9%      39.11        perf-stat.overall.MPKI
      2.11            +0.1        2.23        perf-stat.overall.branch-miss-rate%
     72.58            +5.9       78.51        perf-stat.overall.cache-miss-rate%
      9.04            -7.8%       8.34        perf-stat.overall.cpi
    265.78           -19.8%     213.18        perf-stat.overall.cycles-between-cache-misses
      0.11            +8.5%       0.12        perf-stat.overall.ipc
 1.359e+10            +3.4%  1.405e+10        perf-stat.ps.branch-instructions
 2.863e+08            +9.3%  3.129e+08        perf-stat.ps.branch-misses
 2.353e+09           +24.7%  2.935e+09        perf-stat.ps.cache-misses
 3.242e+09           +15.3%  3.739e+09        perf-stat.ps.cache-references
 6.915e+10            +8.5%  7.506e+10        perf-stat.ps.instructions
 4.246e+12            +8.2%  4.596e+12        perf-stat.total.instructions
     66.41 ± 70%     -49.8       16.57 ±223%  perf-profile.calltrace.cycles-pp.stress_sockmany
     66.32 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
     66.32 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
     66.32 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.connect.stress_sockmany
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.__sys_connect.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.__inet_stream_connect.inet_stream_connect.__sys_connect.__x64_sys_connect.do_syscall_64
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.inet_stream_connect.__sys_connect.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe
     66.25 ± 70%     -49.7       16.52 ±223%  perf-profile.calltrace.cycles-pp.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.__sys_connect.__x64_sys_connect
     66.09 ± 70%     -49.6       16.48 ±223%  perf-profile.calltrace.cycles-pp.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.__sys_connect
     54.17 ± 70%     -38.3       15.86 ±223%  perf-profile.calltrace.cycles-pp.__inet_check_established.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
     10.32 ± 70%     -10.3        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_bh.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      4.67 ± 70%      -4.7        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_bh.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect
     66.53 ± 70%     -49.9       16.60 ±223%  perf-profile.children.cycles-pp.do_syscall_64
     66.53 ± 70%     -49.9       16.60 ±223%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     66.41 ± 70%     -49.8       16.57 ±223%  perf-profile.children.cycles-pp.stress_sockmany
     66.33 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.__inet_stream_connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.__sys_connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.__x64_sys_connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.inet_stream_connect
     66.25 ± 70%     -49.7       16.52 ±223%  perf-profile.children.cycles-pp.tcp_v4_connect
     66.21 ± 70%     -49.7       16.50 ±223%  perf-profile.children.cycles-pp.__inet_hash_connect
     54.25 ± 70%     -38.4       15.89 ±223%  perf-profile.children.cycles-pp.__inet_check_established
     10.37 ± 70%     -10.4        0.00        perf-profile.children.cycles-pp._raw_spin_lock_bh
      4.67 ± 70%      -4.7        0.00        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     53.42 ± 70%     -37.8       15.58 ±223%  perf-profile.self.cycles-pp.__inet_check_established
      5.65 ± 70%      -5.6        0.00        perf-profile.self.cycles-pp._raw_spin_lock_bh
      4.62 ± 70%      -4.6        0.00        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
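
For readers unfamiliar with the comparison table above, the following is a minimal sketch of how the middle column appears to relate to the two value columns. It assumes that rows shown with a "%" sign are relative changes against the parent commit (d186f405fd), and that rows without one (rate metrics such as mpstat.cpu.all.irq% and the perf-profile cycles-pp entries) are absolute differences in percentage points. The helper names pct_change and point_change are illustrative and are not part of the lkp-tests tooling. Because the table prints rounded per-commit averages, recomputed values match only to the displayed precision.

```python
# Sketch only: recompute a few "%change" entries from the table above.
# Assumption: left column is the parent commit, right column is the
# tested commit, and the middle column is derived from the two.

def pct_change(base: float, new: float) -> float:
    """Relative change in percent, used for rows printed with a '%' sign."""
    return (new - base) / base * 100.0

def point_change(base: float, new: float) -> float:
    """Absolute difference, used for rows that are already percentages."""
    return new - base

# stress-ng.sockmany.ops_per_sec: 67315 -> 71863
print(round(pct_change(67315, 71863), 1))     # 6.8

# perf-c2c.HITM.remote: 13760 -> 726.83
print(round(pct_change(13760, 726.83), 1))    # -94.7

# perf-profile.children.cycles-pp.__inet_hash_connect: 66.21 -> 16.50
print(round(point_change(66.21, 16.50), 1))   # -49.7
```

Note that the perf-profile rows carry very large standard deviations (± 70% and ±223%), so their point differences are best read as indicative rather than precise.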




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

