From: kernel test robot <oliver.sang@intel.com>
To: Eric Dumazet <edumazet@google.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<netdev@vger.kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Jason Xing <kernelxing@tencent.com>,
	Simon Horman <horms@kernel.org>, <eric.dumazet@gmail.com>,
	Eric Dumazet <edumazet@google.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
Date: Mon, 17 Mar 2025 21:44:54 +0800	[thread overview]
Message-ID: <202503171623.f2e16b60-lkp@intel.com> (raw)
In-Reply-To: <20250305034550.879255-2-edumazet@google.com>



Hello,

kernel test robot noticed a 26.0% improvement in stress-ng.sockmany.ops_per_sec on:


commit: 265acc444f8a96246e9d42b54b6931d078034218 ("[PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/inet-change-lport-contribution-to-inet_ehashfn-and-inet6_ehashfn/20250305-114734
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git f252f23ab657cd224cb8334ba69966396f3f629b
patch link: https://lore.kernel.org/all/20250305034550.879255-2-edumazet@google.com/
patch subject: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
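
The patch subject says only that the local port's (lport) contribution to inet_ehashfn() and inet6_ehashfn() changes. The Python sketch below is a hypothetical illustration that assumes the new scheme hashes the 4-tuple with lport set to 0 and then adds lport to the result, so that hash(lport) == hash(0) + lport. Under that assumption, a connect() port search (compare the cover letter "tcp: even faster connect() under stress" and patch 2/2 in the thread overview) can hash each destination once and derive every candidate port's hash with a single addition. Consecutive ports then also fall into consecutive hash buckets. keyed_hash(), SECRET and the bucket mask are invented for the example. BLAKE2b stands in for the kernel's own secret-keyed hash only because it ships with Python.

```python
import hashlib
import ipaddress

# Hypothetical secret; the kernel uses its own boot-time hash secret.
SECRET = b"example-boot-secret"


def keyed_hash(laddr: str, lport: int, faddr: str, fport: int) -> int:
    """Mix the whole 4-tuple into a 32-bit value (illustrative stand-in)."""
    data = (
        ipaddress.ip_address(laddr).packed
        + lport.to_bytes(2, "big")
        + ipaddress.ip_address(faddr).packed
        + fport.to_bytes(2, "big")
    )
    digest = hashlib.blake2b(data, key=SECRET, digest_size=4).digest()
    return int.from_bytes(digest, "big")


def ehash_mixed(laddr, lport, faddr, fport):
    # lport is mixed into the hash: neighbouring ports land in unrelated buckets.
    return keyed_hash(laddr, lport, faddr, fport)


def ehash_additive(laddr, lport, faddr, fport):
    # Assumed new style: hash with lport = 0, then add lport (mod 2**32).
    return (keyed_hash(laddr, 0, faddr, fport) + lport) & 0xFFFFFFFF


def buckets(hashfn, ports, mask=0xFFFF):
    """Bucket index for each candidate local port (mask = table size - 1)."""
    return [hashfn("10.0.0.1", p, "10.0.0.2", 80) & mask for p in ports]


if __name__ == "__main__":
    ports = range(32768, 32772)
    print("mixed:   ", buckets(ehash_mixed, ports))
    print("additive:", buckets(ehash_additive, ports))
```

With the mixed variant the four bucket numbers are unrelated. With the additive variant they are consecutive, modulo the table size.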

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockmany
	cpufreq_governor: performance


In addition, the commit also has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.sockmany.ops_per_sec 4.4% improvement                                  |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | nr_threads=100%                                                                             |
|                  | test=sockmany                                                                               |
|                  | testtime=60s                                                                                |
+------------------+---------------------------------------------------------------------------------------------+




Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250317/202503171623.f2e16b60-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sockmany/stress-ng/60s

commit: 
  f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
  265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")

f252f23ab657cd22 265acc444f8a96246e9d42b54b6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.60 ±  6%      +0.2        0.75 ±  6%  mpstat.cpu.all.soft%
    376850 ±  9%     +15.7%     436068 ±  9%  numa-numastat.node0.local_node
    376612 ±  9%     +15.8%     435968 ±  9%  numa-vmstat.node0.numa_local
     54708           +22.0%      66753 ±  2%  vmstat.system.cs
      2308         +1167.7%      29267 ± 26%  perf-c2c.HITM.local
      2499         +1078.3%      29447 ± 26%  perf-c2c.HITM.total
      1413 ±  8%     -13.8%       1218 ±  4%  sched_debug.cfs_rq:/.runnable_avg.max
     28302           +21.2%      34303 ±  2%  sched_debug.cpu.nr_switches.avg
     39625 ±  6%     +63.4%      64761 ±  6%  sched_debug.cpu.nr_switches.max
      4170 ±  9%    +126.1%       9429 ±  8%  sched_debug.cpu.nr_switches.stddev
   1606932           +25.9%    2023746 ±  3%  stress-ng.sockmany.ops
     26687           +26.0%      33624 ±  3%  stress-ng.sockmany.ops_per_sec
   1561801           +28.1%    2000939 ±  3%  stress-ng.time.involuntary_context_switches
   1731525           +22.3%    2118259 ±  2%  stress-ng.time.voluntary_context_switches
     84783            +2.6%      86953        proc-vmstat.nr_shmem
      5339 ±  6%     -26.4%       3931 ± 16%  proc-vmstat.numa_hint_faults_local
    878479            +6.8%     937819        proc-vmstat.numa_hit
    812262            +7.3%     871615        proc-vmstat.numa_local
   2550690           +12.5%    2870404        proc-vmstat.pgalloc_normal
   2407108           +13.2%    2724922        proc-vmstat.pgfree
     21.96           -17.2%      18.18 ±  2%  perf-stat.i.MPKI
 7.517e+09           +18.8%  8.933e+09        perf-stat.i.branch-instructions
      2.70            -0.7        1.96        perf-stat.i.branch-miss-rate%
  2.03e+08           -13.1%  1.765e+08        perf-stat.i.branch-misses
     60.22            -2.3       57.89 ±  2%  perf-stat.i.cache-miss-rate%
 1.472e+09            +4.7%  1.542e+09        perf-stat.i.cache-references
     56669           +22.3%      69301 ±  2%  perf-stat.i.context-switches
      5.56           -18.4%       4.53 ±  2%  perf-stat.i.cpi
  4.24e+10           +19.2%  5.054e+10        perf-stat.i.instructions
      0.20           +20.1%       0.24 ±  4%  perf-stat.i.ipc
      0.49           +21.0%       0.60 ±  8%  perf-stat.i.metric.K/sec
     21.03           -15.1%      17.85        perf-stat.overall.MPKI
      2.70            -0.7        1.98        perf-stat.overall.branch-miss-rate%
     60.56            -2.1       58.49        perf-stat.overall.cache-miss-rate%
      5.34           -16.6%       4.45        perf-stat.overall.cpi
    253.77            -1.7%     249.50        perf-stat.overall.cycles-between-cache-misses
      0.19           +19.9%       0.22        perf-stat.overall.ipc
 7.395e+09           +18.9%  8.789e+09        perf-stat.ps.branch-instructions
 1.997e+08           -13.0%  1.737e+08        perf-stat.ps.branch-misses
 1.448e+09            +4.7%  1.517e+09        perf-stat.ps.cache-references
     55820           +22.2%      68204 ±  2%  perf-stat.ps.context-switches
 4.172e+10           +19.2%  4.972e+10        perf-stat.ps.instructions
 2.556e+12           +20.2%  3.072e+12 ±  2%  perf-stat.total.instructions
      0.35 ±  9%     -14.9%       0.29 ±  6%  perf-sched.sch_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      0.06 ±  7%     -20.5%       0.04 ±  4%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
      0.16 ±218%    +798.3%       1.44 ± 40%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.25 ±152%    +291.3%       0.99 ± 45%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
      0.11 ±166%    +568.2%       0.75 ± 45%  perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
      0.84 ± 14%     +39.2%       1.17 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.11 ± 22%    +108.5%       0.23 ± 12%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.08 ± 59%     -60.0%       0.03 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      0.16 ±218%   +1286.4%       2.22 ± 25%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.13 ±153%    +910.1%       1.27 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
      9.23           -12.5%       8.08        perf-sched.total_wait_and_delay.average.ms
    139892           +15.3%     161338        perf-sched.total_wait_and_delay.count.ms
      9.18           -12.5%       8.03        perf-sched.total_wait_time.average.ms
      0.70 ±  8%     -14.5%       0.60 ±  6%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      0.11 ±  8%     -20.1%       0.09 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
    429.48 ± 44%     +63.6%     702.60 ± 11%  perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      4.97           -14.0%       4.28        perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.23 ± 21%    +104.2%       0.46 ± 12%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     48576 ±  5%     +36.3%      66215 ±  2%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
     81.83            +9.8%      89.83 ±  2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     64098           +16.3%      74560        perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
     15531 ± 17%     -46.2%       8355 ±  6%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.36 ±  8%     -14.2%       0.31 ±  6%  perf-sched.wait_time.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      0.06 ±  7%     -20.2%       0.04 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
      0.04 ±178%     -94.4%       0.00 ±130%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.16 ±218%    +798.5%       1.44 ± 40%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.11 ±166%    +568.6%       0.75 ± 45%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
    427.69 ± 45%     +63.1%     697.48 ± 10%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      4.95           -14.0%       4.26        perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.12 ± 20%     +99.9%       0.23 ± 12%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.16 ±218%   +1286.4%       2.22 ± 25%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.13 ±153%    +911.4%       1.27 ± 34%  perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
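
The %change column can be recomputed from the two per-commit means. For ordinary counters it is the relative change. For metrics that are already percentages, it is evidently the absolute difference in percentage points: perf-stat.i.cache-miss-rate% going from 60.22 to 57.89 is shown as -2.3, not as a relative -3.9%. The short Python check below recomputes a few entries from the lkp-icl-2sp8 table. It uses the rounded values as printed, so on other rows the last digit can differ slightly from the report, which presumably works from unrounded data.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change in percent, as shown for ordinary counters."""
    return (new - old) / old * 100.0


def pp_change(old_pct: float, new_pct: float) -> float:
    """Absolute change in percentage points, for metrics that are already %."""
    return new_pct - old_pct


# Rounded values copied from the lkp-icl-2sp8 table (f252f23ab6 -> 265acc444f).
rows = {
    "stress-ng.sockmany.ops_per_sec": (26687, 33624, pct_change),
    "stress-ng.sockmany.ops": (1606932, 2023746, pct_change),
    "vmstat.system.cs": (54708, 66753, pct_change),
    "perf-stat.i.branch-miss-rate%": (2.70, 1.96, pp_change),
    "perf-stat.i.cache-miss-rate%": (60.22, 57.89, pp_change),
}

for name, (old, new, fn) in rows.items():
    print(f"{name:32s} {fn(old, new):+.1f}")
```

For these rows the printed values match the %change column: +26.0, +25.9, +22.0, -0.7 and -2.3.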


***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s

commit: 
  f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
  265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")

f252f23ab657cd22 265acc444f8a96246e9d42b54b6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    205766            +3.2%     212279        vmstat.system.cs
    309724 ±  5%     +63.6%     506684 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    309724 ±  5%     +63.6%     506684 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
   1307371 ±  8%     -14.5%    1117523 ±  7%  sched_debug.cpu.avg_idle.max
   4333131            +4.4%    4525951        stress-ng.sockmany.ops
     71816            +4.4%      74988        stress-ng.sockmany.ops_per_sec
   7639150            +3.6%    7910527        stress-ng.time.voluntary_context_switches
    693603           -18.6%     564616 ±  3%  perf-c2c.DRAM.local
    611374           -16.8%     508688 ±  2%  perf-c2c.DRAM.remote
     19509          +994.2%     213470 ±  7%  perf-c2c.HITM.local
     20252          +957.6%     214187 ±  7%  perf-c2c.HITM.total
    204521            +3.1%     210765        proc-vmstat.nr_shmem
    938137            +2.9%     965493        proc-vmstat.nr_slab_reclaimable
   3102658            +3.0%    3196837        proc-vmstat.nr_slab_unreclaimable
   2113801            +1.8%    2151131        proc-vmstat.numa_hit
   1881174            +2.0%    1919223        proc-vmstat.numa_local
   6186586            +3.6%    6406837        proc-vmstat.pgalloc_normal
      0.76 ± 46%     -83.0%       0.13 ±144%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.02 ±  2%      -6.3%       0.02 ±  2%  perf-sched.sch_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
     15.43           -12.6%      13.48        perf-sched.total_wait_and_delay.average.ms
    234971           +15.6%     271684        perf-sched.total_wait_and_delay.count.ms
     15.37           -12.6%      13.43        perf-sched.total_wait_time.average.ms
    140.18 ±  5%     -37.2%      88.02 ± 11%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     10.17           -14.1%       8.74        perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      4.02          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    104089           +16.4%     121193        perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
     88.17 ±  6%     +68.1%     148.17 ± 13%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    108724           +16.8%     127034        perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      1232          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      4592 ± 12%     +26.1%       5792 ± 14%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     11.29 ± 68%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      9.99           -13.3%       8.66        perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
    139.53 ±  6%     -37.2%      87.60 ± 11%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     10.15           -14.1%       8.72        perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
     41.10           -17.2%      34.03        perf-stat.i.MPKI
 1.424e+10           +14.6%  1.631e+10        perf-stat.i.branch-instructions
      2.28            -0.1        2.17        perf-stat.i.branch-miss-rate%
 3.193e+08            +9.4%  3.492e+08        perf-stat.i.branch-misses
     77.01            -9.5       67.48        perf-stat.i.cache-miss-rate%
 2.981e+09            -5.1%   2.83e+09        perf-stat.i.cache-misses
 3.806e+09            +8.4%  4.127e+09        perf-stat.i.cache-references
    217129            +3.2%     224056        perf-stat.i.context-switches
      8.68           -12.7%       7.58        perf-stat.i.cpi
    242.24            +4.0%     251.97        perf-stat.i.cycles-between-cache-misses
 7.608e+10           +14.1%  8.679e+10        perf-stat.i.instructions
      0.13           +13.3%       0.15        perf-stat.i.ipc
     39.15           -16.8%      32.58        perf-stat.overall.MPKI
      2.24            -0.1        2.14        perf-stat.overall.branch-miss-rate%
     78.30            -9.7       68.56        perf-stat.overall.cache-miss-rate%
      8.35           -12.4%       7.31        perf-stat.overall.cpi
    213.17            +5.3%     224.53        perf-stat.overall.cycles-between-cache-misses
      0.12           +14.1%       0.14        perf-stat.overall.ipc
 1.401e+10           +14.6%  1.604e+10        perf-stat.ps.branch-instructions
 3.139e+08            +9.4%  3.434e+08        perf-stat.ps.branch-misses
 2.931e+09            -5.1%  2.782e+09        perf-stat.ps.cache-misses
 3.743e+09            +8.4%  4.058e+09        perf-stat.ps.cache-references
    213541            +3.3%     220574        perf-stat.ps.context-switches
 7.485e+10           +14.1%  8.539e+10        perf-stat.ps.instructions
 4.597e+12           +13.9%  5.235e+12        perf-stat.total.instructions
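
The derived perf-stat.overall rows above are consistent with simple ratios of the perf-stat.ps counters:

* MPKI is cache misses per thousand instructions.
* cache-miss-rate% is cache-misses over cache-references.
* branch-miss-rate% is branch-misses over branch-instructions.
* IPC is the reciprocal of CPI.
* cycles-between-cache-misses is cycles (CPI times instructions) over cache-misses.

These formulas are inferred from the numbers, not taken from the lkp-tests source. The Python sketch below recomputes them from the rounded lkp-spr-r02 values. Mixing the overall CPI with per-second counters is an approximation.

```python
from dataclasses import dataclass


@dataclass
class PerfStat:
    """perf-stat.ps.* counters plus perf-stat.overall.cpi for one kernel."""
    instructions: float
    cache_misses: float
    cache_references: float
    branch_instructions: float
    branch_misses: float
    cpi: float  # cycles per instruction, so cycles ~= cpi * instructions

    def mpki(self) -> float:
        return self.cache_misses / self.instructions * 1000.0

    def cache_miss_rate(self) -> float:
        return self.cache_misses / self.cache_references * 100.0

    def branch_miss_rate(self) -> float:
        return self.branch_misses / self.branch_instructions * 100.0

    def ipc(self) -> float:
        return 1.0 / self.cpi

    def cycles_between_cache_misses(self) -> float:
        return self.cpi * self.instructions / self.cache_misses


# Rounded values from the lkp-spr-r02 table.
base = PerfStat(7.485e10, 2.931e9, 3.743e9, 1.401e10, 3.139e8, 8.35)     # f252f23ab6
patched = PerfStat(8.539e10, 2.782e9, 4.058e9, 1.604e10, 3.434e8, 7.31)  # 265acc444f

for label, s in (("f252f23ab6", base), ("265acc444f", patched)):
    print(f"{label}: MPKI={s.mpki():.2f} cache-miss%={s.cache_miss_rate():.2f} "
          f"branch-miss%={s.branch_miss_rate():.2f} IPC={s.ipc():.2f} "
          f"cycles/miss={s.cycles_between_cache_misses():.1f}")
```

The recomputed values agree with the printed perf-stat.overall rows to within rounding.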





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Thread overview: 12+ messages
2025-03-05  3:45 [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-05  3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
2025-03-06  4:24   ` Kuniyuki Iwashima
2025-03-06  7:54   ` Jason Xing
2025-03-06  8:14     ` Eric Dumazet
2025-03-06  8:19       ` Jason Xing
2025-03-17 13:44   ` kernel test robot [this message]
2025-03-05  3:45 ` [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect() Eric Dumazet
2025-03-06  4:26   ` Kuniyuki Iwashima
2025-03-06  8:22   ` Jason Xing
2025-03-05  4:01 ` [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-06 23:40 ` patchwork-bot+netdevbpf
