From: kernel test robot <oliver.sang@intel.com>
To: Eric Dumazet <edumazet@google.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
<netdev@vger.kernel.org>,
"David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Neal Cardwell <ncardwell@google.com>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
Jason Xing <kernelxing@tencent.com>,
Simon Horman <horms@kernel.org>, <eric.dumazet@gmail.com>,
Eric Dumazet <edumazet@google.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
Date: Mon, 17 Mar 2025 21:44:54 +0800
Message-ID: <202503171623.f2e16b60-lkp@intel.com>
In-Reply-To: <20250305034550.879255-2-edumazet@google.com>
Hello,
kernel test robot noticed a 26.0% improvement of stress-ng.sockmany.ops_per_sec on:
commit: 265acc444f8a96246e9d42b54b6931d078034218 ("[PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/inet-change-lport-contribution-to-inet_ehashfn-and-inet6_ehashfn/20250305-114734
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git f252f23ab657cd224cb8334ba69966396f3f629b
patch link: https://lore.kernel.org/all/20250305034550.879255-2-edumazet@google.com/
patch subject: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: sockmany
cpufreq_governor: performance
In addition, the commit also has a significant impact on the following test:
+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.sockmany.ops_per_sec 4.4% improvement                                  |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | nr_threads=100%                                                                             |
|                  | test=sockmany                                                                               |
|                  | testtime=60s                                                                                |
+------------------+---------------------------------------------------------------------------------------------+
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250317/202503171623.f2e16b60-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sockmany/stress-ng/60s
commit:
f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
f252f23ab657cd22 265acc444f8a96246e9d42b54b6
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.60 ± 6% +0.2 0.75 ± 6% mpstat.cpu.all.soft%
376850 ± 9% +15.7% 436068 ± 9% numa-numastat.node0.local_node
376612 ± 9% +15.8% 435968 ± 9% numa-vmstat.node0.numa_local
54708 +22.0% 66753 ± 2% vmstat.system.cs
2308 +1167.7% 29267 ± 26% perf-c2c.HITM.local
2499 +1078.3% 29447 ± 26% perf-c2c.HITM.total
1413 ± 8% -13.8% 1218 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
28302 +21.2% 34303 ± 2% sched_debug.cpu.nr_switches.avg
39625 ± 6% +63.4% 64761 ± 6% sched_debug.cpu.nr_switches.max
4170 ± 9% +126.1% 9429 ± 8% sched_debug.cpu.nr_switches.stddev
1606932 +25.9% 2023746 ± 3% stress-ng.sockmany.ops
26687 +26.0% 33624 ± 3% stress-ng.sockmany.ops_per_sec
1561801 +28.1% 2000939 ± 3% stress-ng.time.involuntary_context_switches
1731525 +22.3% 2118259 ± 2% stress-ng.time.voluntary_context_switches
84783 +2.6% 86953 proc-vmstat.nr_shmem
5339 ± 6% -26.4% 3931 ± 16% proc-vmstat.numa_hint_faults_local
878479 +6.8% 937819 proc-vmstat.numa_hit
812262 +7.3% 871615 proc-vmstat.numa_local
2550690 +12.5% 2870404 proc-vmstat.pgalloc_normal
2407108 +13.2% 2724922 proc-vmstat.pgfree
21.96 -17.2% 18.18 ± 2% perf-stat.i.MPKI
7.517e+09 +18.8% 8.933e+09 perf-stat.i.branch-instructions
2.70 -0.7 1.96 perf-stat.i.branch-miss-rate%
2.03e+08 -13.1% 1.765e+08 perf-stat.i.branch-misses
60.22 -2.3 57.89 ± 2% perf-stat.i.cache-miss-rate%
1.472e+09 +4.7% 1.542e+09 perf-stat.i.cache-references
56669 +22.3% 69301 ± 2% perf-stat.i.context-switches
5.56 -18.4% 4.53 ± 2% perf-stat.i.cpi
4.24e+10 +19.2% 5.054e+10 perf-stat.i.instructions
0.20 +20.1% 0.24 ± 4% perf-stat.i.ipc
0.49 +21.0% 0.60 ± 8% perf-stat.i.metric.K/sec
21.03 -15.1% 17.85 perf-stat.overall.MPKI
2.70 -0.7 1.98 perf-stat.overall.branch-miss-rate%
60.56 -2.1 58.49 perf-stat.overall.cache-miss-rate%
5.34 -16.6% 4.45 perf-stat.overall.cpi
253.77 -1.7% 249.50 perf-stat.overall.cycles-between-cache-misses
0.19 +19.9% 0.22 perf-stat.overall.ipc
7.395e+09 +18.9% 8.789e+09 perf-stat.ps.branch-instructions
1.997e+08 -13.0% 1.737e+08 perf-stat.ps.branch-misses
1.448e+09 +4.7% 1.517e+09 perf-stat.ps.cache-references
55820 +22.2% 68204 ± 2% perf-stat.ps.context-switches
4.172e+10 +19.2% 4.972e+10 perf-stat.ps.instructions
2.556e+12 +20.2% 3.072e+12 ± 2% perf-stat.total.instructions
0.35 ± 9% -14.9% 0.29 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
0.06 ± 7% -20.5% 0.04 ± 4% perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
0.16 ±218% +798.3% 1.44 ± 40% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.25 ±152% +291.3% 0.99 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
0.11 ±166% +568.2% 0.75 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
0.84 ± 14% +39.2% 1.17 ± 9% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.11 ± 22% +108.5% 0.23 ± 12% perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.08 ± 59% -60.0% 0.03 ± 4% perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
0.16 ±218% +1286.4% 2.22 ± 25% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.13 ±153% +910.1% 1.27 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
9.23 -12.5% 8.08 perf-sched.total_wait_and_delay.average.ms
139892 +15.3% 161338 perf-sched.total_wait_and_delay.count.ms
9.18 -12.5% 8.03 perf-sched.total_wait_time.average.ms
0.70 ± 8% -14.5% 0.60 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
0.11 ± 8% -20.1% 0.09 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
429.48 ± 44% +63.6% 702.60 ± 11% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4.97 -14.0% 4.28 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.23 ± 21% +104.2% 0.46 ± 12% perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
48576 ± 5% +36.3% 66215 ± 2% perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
81.83 +9.8% 89.83 ± 2% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
64098 +16.3% 74560 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
15531 ± 17% -46.2% 8355 ± 6% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.36 ± 8% -14.2% 0.31 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
0.06 ± 7% -20.2% 0.04 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
0.04 ±178% -94.4% 0.00 ±130% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
0.16 ±218% +798.5% 1.44 ± 40% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.11 ±166% +568.6% 0.75 ± 45% perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
427.69 ± 45% +63.1% 697.48 ± 10% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4.95 -14.0% 4.26 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.12 ± 20% +99.9% 0.23 ± 12% perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.16 ±218% +1286.4% 2.22 ± 25% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.13 ±153% +911.4% 1.27 ± 34% perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
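A note on reading the %change column: for plain counters it is the relative change, e.g. stress-ng.sockmany.ops_per_sec going from 26687 to 33624 is (33624 - 26687) / 26687, about +26.0%. For metrics that are already percentages (mpstat.cpu.all.soft%, perf-stat.i.branch-miss-rate%, perf-stat.i.cache-miss-rate%) the column appears to hold the absolute difference in percentage points instead (2.70 -> 1.96 is shown as -0.7). The C sketch below encodes that reading; it is inferred from the rows above rather than taken from the lkp-tests source, and change_column() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper reproducing the %change column, assuming the
 * convention inferred from the table rows:
 *  - ordinary counters: relative change, (patched - base) / base * 100
 *  - metrics whose name ends in '%': absolute delta in percentage points
 */
double change_column(double base, double patched, bool metric_is_percent)
{
    if (metric_is_percent)
        return patched - base;              /* percentage points */

    assert(base != 0.0);                    /* relative change needs a nonzero base */
    return (patched - base) / base * 100.0; /* relative change in percent */
}
```

With the values above, change_column(26687, 33624, false) returns about 25.99 (printed as +26.0%), and change_column(2.70, 1.96, true) returns -0.74 (printed as -0.7).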
***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s
commit:
f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
f252f23ab657cd22 265acc444f8a96246e9d42b54b6
---------------- ---------------------------
%stddev %change %stddev
\ | \
205766 +3.2% 212279 vmstat.system.cs
309724 ± 5% +63.6% 506684 ± 9% sched_debug.cfs_rq:/.avg_vruntime.stddev
309724 ± 5% +63.6% 506684 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
1307371 ± 8% -14.5% 1117523 ± 7% sched_debug.cpu.avg_idle.max
4333131 +4.4% 4525951 stress-ng.sockmany.ops
71816 +4.4% 74988 stress-ng.sockmany.ops_per_sec
7639150 +3.6% 7910527 stress-ng.time.voluntary_context_switches
693603 -18.6% 564616 ± 3% perf-c2c.DRAM.local
611374 -16.8% 508688 ± 2% perf-c2c.DRAM.remote
19509 +994.2% 213470 ± 7% perf-c2c.HITM.local
20252 +957.6% 214187 ± 7% perf-c2c.HITM.total
204521 +3.1% 210765 proc-vmstat.nr_shmem
938137 +2.9% 965493 proc-vmstat.nr_slab_reclaimable
3102658 +3.0% 3196837 proc-vmstat.nr_slab_unreclaimable
2113801 +1.8% 2151131 proc-vmstat.numa_hit
1881174 +2.0% 1919223 proc-vmstat.numa_local
6186586 +3.6% 6406837 proc-vmstat.pgalloc_normal
0.76 ± 46% -83.0% 0.13 ±144% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
0.02 ± 2% -6.3% 0.02 ± 2% perf-sched.sch_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
15.43 -12.6% 13.48 perf-sched.total_wait_and_delay.average.ms
234971 +15.6% 271684 perf-sched.total_wait_and_delay.count.ms
15.37 -12.6% 13.43 perf-sched.total_wait_time.average.ms
140.18 ± 5% -37.2% 88.02 ± 11% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
10.17 -14.1% 8.74 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
4.02 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
104089 +16.4% 121193 perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
88.17 ± 6% +68.1% 148.17 ± 13% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
108724 +16.8% 127034 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
1232 -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
4592 ± 12% +26.1% 5792 ± 14% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
11.29 ± 68% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
9.99 -13.3% 8.66 perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
139.53 ± 6% -37.2% 87.60 ± 11% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
10.15 -14.1% 8.72 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
41.10 -17.2% 34.03 perf-stat.i.MPKI
1.424e+10 +14.6% 1.631e+10 perf-stat.i.branch-instructions
2.28 -0.1 2.17 perf-stat.i.branch-miss-rate%
3.193e+08 +9.4% 3.492e+08 perf-stat.i.branch-misses
77.01 -9.5 67.48 perf-stat.i.cache-miss-rate%
2.981e+09 -5.1% 2.83e+09 perf-stat.i.cache-misses
3.806e+09 +8.4% 4.127e+09 perf-stat.i.cache-references
217129 +3.2% 224056 perf-stat.i.context-switches
8.68 -12.7% 7.58 perf-stat.i.cpi
242.24 +4.0% 251.97 perf-stat.i.cycles-between-cache-misses
7.608e+10 +14.1% 8.679e+10 perf-stat.i.instructions
0.13 +13.3% 0.15 perf-stat.i.ipc
39.15 -16.8% 32.58 perf-stat.overall.MPKI
2.24 -0.1 2.14 perf-stat.overall.branch-miss-rate%
78.30 -9.7 68.56 perf-stat.overall.cache-miss-rate%
8.35 -12.4% 7.31 perf-stat.overall.cpi
213.17 +5.3% 224.53 perf-stat.overall.cycles-between-cache-misses
0.12 +14.1% 0.14 perf-stat.overall.ipc
1.401e+10 +14.6% 1.604e+10 perf-stat.ps.branch-instructions
3.139e+08 +9.4% 3.434e+08 perf-stat.ps.branch-misses
2.931e+09 -5.1% 2.782e+09 perf-stat.ps.cache-misses
3.743e+09 +8.4% 4.058e+09 perf-stat.ps.cache-references
213541 +3.3% 220574 perf-stat.ps.context-switches
7.485e+10 +14.1% 8.539e+10 perf-stat.ps.instructions
4.597e+12 +13.9% 5.235e+12 perf-stat.total.instructions
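The perf-stat.overall rows are ratios of the raw per-second counters. Assuming the usual definitions (MPKI = cache-misses per 1000 instructions, cache-miss-rate% = cache-misses / cache-references * 100, ipc = 1 / cpi, cycles-between-cache-misses = cycles / cache-misses), the sketch below recomputes them from the perf-stat.ps rows of the Sapphire Rapids run. The struct and function names are illustrative only, and the cycle count is reconstructed as cpi * instructions because the report does not list cycles directly.

```c
#include <assert.h>

/* Raw per-second counters as printed in the perf-stat.ps rows, plus cpi. */
struct perf_counts {
    double instructions;
    double cache_misses;
    double cache_references;
    double cpi;                    /* cycles per instruction (perf-stat.overall.cpi) */
};

/* Derived ratios corresponding to the perf-stat.overall rows. */
struct perf_derived {
    double mpki;                   /* cache misses per 1000 instructions */
    double cache_miss_rate_pct;    /* cache misses / cache references * 100 */
    double ipc;                    /* instructions per cycle, 1 / cpi */
    double cycles_between_misses;  /* cycles / cache misses */
};

struct perf_derived derive(const struct perf_counts *c)
{
    struct perf_derived d;
    double cycles;

    assert(c->instructions > 0 && c->cache_misses > 0 &&
           c->cache_references > 0 && c->cpi > 0);

    cycles = c->cpi * c->instructions;  /* reconstruct the cycle count */

    d.mpki = c->cache_misses * 1000.0 / c->instructions;
    d.cache_miss_rate_pct = c->cache_misses / c->cache_references * 100.0;
    d.ipc = 1.0 / c->cpi;
    d.cycles_between_misses = cycles / c->cache_misses;
    return d;
}
```

Feeding in the base counters (instructions 7.485e+10, cache-misses 2.931e+09, cache-references 3.743e+09, cpi 8.35) gives roughly MPKI 39.16, cache-miss-rate 78.31%, ipc 0.120 and 213.2 cycles between misses; the patched counters (8.539e+10, 2.782e+09, 4.058e+09, 7.31) give roughly 32.58, 68.56%, 0.137 and 224.4. These agree with the perf-stat.overall rows to within the rounding of the printed inputs.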
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki