* [PATCH net-next 0/2] tcp: even faster connect() under stress
@ 2025-03-05 3:45 Eric Dumazet
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Eric Dumazet @ 2025-03-05 3:45 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Neal Cardwell, Kuniyuki Iwashima, Jason Xing, Simon Horman,
netdev, eric.dumazet, Eric Dumazet
This is a follow-up to the prior series, "tcp: scale connect() under pressure".
Now that spinlocks are no longer in the picture, we see a very high cost
from the inet6_ehashfn() function.
In this series (of 2), I change how lport contributes to inet6_ehashfn()
to ensure better cache locality, and call inet6_ehashfn()
only once per connect() system call.
This brings an additional 229 % increase in performance
for the "neper/tcp_crr -6 -T 200 -F 30000" stress test,
while greatly improving latency metrics.
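Note: the property this relies on can be shown with a small standalone C
program (not kernel code; mix32() below is a made-up stand-in for the
secret-keyed mixing done by __inet_ehashfn()):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in mix: everything except the local port goes through it. */
static uint32_t mix32(uint32_t laddr, uint32_t faddr, uint32_t fport,
                      uint32_t secret)
{
        uint32_t h = laddr ^ faddr ^ fport ^ secret;

        h *= 0x9e3779b1u;       /* arbitrary odd multiplier */
        h ^= h >> 16;
        return h;
}

/* Shape of the patched hash: lport is simply added after the mix. */
static uint32_t ehashfn(uint32_t laddr, uint16_t lport,
                        uint32_t faddr, uint16_t fport, uint32_t secret)
{
        return lport + mix32(laddr, faddr, fport, secret);
}

int main(void)
{
        uint32_t hash0 = ehashfn(0x0a000001, 0, 0xc0a80001, 443, 0x12345678);

        /* hash(sport) == hash(0) + sport, so connect() can hash only once. */
        for (uint16_t sport = 1024; sport < 1034; sport++)
                assert(ehashfn(0x0a000001, sport, 0xc0a80001, 443, 0x12345678)
                       == hash0 + sport);
        printf("hash(sport) == hash(0) + sport holds\n");
        return 0;
}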
Before:
latency_min=0.014131929
latency_max=17.895073144
latency_mean=0.505675853
latency_stddev=2.125164772
num_samples=307884
throughput=139866.80
After:
latency_min=0.003041375
latency_max=7.056589232
latency_mean=0.141075048
latency_stddev=0.526900516
num_samples=312996
throughput=320677.21
Eric Dumazet (2):
inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
inet: call inet6_ehashfn() once from inet6_hash_connect()
include/net/inet_hashtables.h | 4 +++-
include/net/ip.h | 2 +-
net/ipv4/inet_hashtables.c | 30 ++++++++++++++++++++----------
net/ipv6/inet6_hashtables.c | 19 +++++++++++++------
4 files changed, 37 insertions(+), 18 deletions(-)
--
2.48.1.711.g2feabab25a-goog
* [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
2025-03-05 3:45 [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
@ 2025-03-05 3:45 ` Eric Dumazet
2025-03-06 4:24 ` Kuniyuki Iwashima
` (2 more replies)
2025-03-05 3:45 ` [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect() Eric Dumazet
` (2 subsequent siblings)
3 siblings, 3 replies; 12+ messages in thread
From: Eric Dumazet @ 2025-03-05 3:45 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Neal Cardwell, Kuniyuki Iwashima, Jason Xing, Simon Horman,
netdev, eric.dumazet, Eric Dumazet
In order to speed up __inet_hash_connect(), we want to ensure the hash values
for <source address, port X, destination address, destination port>
are not randomly spread, but monotonically increasing.
The goal is to allow __inet_hash_connect() to derive the hash value
of a candidate 4-tuple with a single addition in the following
patch in the series.
Given:
hash_0 = inet_ehashfn(saddr, 0, daddr, dport)
hash_sport = inet_ehashfn(saddr, sport, daddr, dport)
Then (hash_sport == hash_0 + sport) for all sport values.
As far as I know, there is no security implication with this change.
After this patch, when __inet_hash_connect() has to try XXXX candidates,
the hash table buckets are contiguous and packed, allowing
better use of cpu caches and hardware prefetchers.
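To illustrate the packing with a standalone sketch (not kernel code;
hash_port0 and ehash_mask below are made-up values): the established
hash table bucket is selected with (hash & ehash_mask), so consecutive
candidate ports now land in consecutive buckets, whereas before they
landed in unrelated ones.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical values, for illustration only. */
        uint32_t hash_port0 = 0xdeadbeef; /* inet_ehashfn(saddr, 0, daddr, dport) */
        uint32_t ehash_mask = 0xffff;     /* table with 65536 buckets */

        for (uint16_t port = 32768; port < 32773; port++)
                printf("port %u -> bucket %u\n", (unsigned int)port,
                       (unsigned int)((hash_port0 + port) & ehash_mask));
        return 0;
}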
Tested:
Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
Before this patch:
utime_start=0.271607
utime_end=3.847111
stime_start=18.407684
stime_end=1997.485557
num_transactions=1350742
latency_min=0.014131929
latency_max=17.895073144
latency_mean=0.505675853
latency_stddev=2.125164772
num_samples=307884
throughput=139866.80
perf top on client:
56.86% [kernel] [k] __inet6_check_established
17.96% [kernel] [k] __inet_hash_connect
13.88% [kernel] [k] inet6_ehashfn
2.52% [kernel] [k] rcu_all_qs
2.01% [kernel] [k] __cond_resched
0.41% [kernel] [k] _raw_spin_lock
After this patch:
utime_start=0.286131
utime_end=4.378886
stime_start=11.952556
stime_end=1991.655533
num_transactions=1446830
latency_min=0.001061085
latency_max=12.075275028
latency_mean=0.376375302
latency_stddev=1.361969596
num_samples=306383
throughput=151866.56
perf top:
50.01% [kernel] [k] __inet6_check_established
20.65% [kernel] [k] __inet_hash_connect
15.81% [kernel] [k] inet6_ehashfn
2.92% [kernel] [k] rcu_all_qs
2.34% [kernel] [k] __cond_resched
0.50% [kernel] [k] _raw_spin_lock
0.34% [kernel] [k] sched_balance_trigger
0.24% [kernel] [k] queued_spin_lock_slowpath
There is indeed an increase of throughput and reduction of latency.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/inet_hashtables.c | 4 ++--
net/ipv6/inet6_hashtables.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index d1b5f45ee718410fdf3e78c113c7ebd4a1ddba40..3025d2b708852acd9744709a897fca17564523d5 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -35,8 +35,8 @@ u32 inet_ehashfn(const struct net *net, const __be32 laddr,
{
net_get_random_once(&inet_ehash_secret, sizeof(inet_ehash_secret));
- return __inet_ehashfn(laddr, lport, faddr, fport,
- inet_ehash_secret + net_hash_mix(net));
+ return lport + __inet_ehashfn(laddr, 0, faddr, fport,
+ inet_ehash_secret + net_hash_mix(net));
}
EXPORT_SYMBOL_GPL(inet_ehashfn);
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 9be315496459fcb391123a07ac887e2f59d27360..3d95f1e75a118ff8027d4ec0f33910d23b6af832 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -35,8 +35,8 @@ u32 inet6_ehashfn(const struct net *net,
lhash = (__force u32)laddr->s6_addr32[3];
fhash = __ipv6_addr_jhash(faddr, tcp_ipv6_hash_secret);
- return __inet6_ehashfn(lhash, lport, fhash, fport,
- inet6_ehash_secret + net_hash_mix(net));
+ return lport + __inet6_ehashfn(lhash, 0, fhash, fport,
+ inet6_ehash_secret + net_hash_mix(net));
}
EXPORT_SYMBOL_GPL(inet6_ehashfn);
--
2.48.1.711.g2feabab25a-goog
* [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect()
2025-03-05 3:45 [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
@ 2025-03-05 3:45 ` Eric Dumazet
2025-03-06 4:26 ` Kuniyuki Iwashima
2025-03-06 8:22 ` Jason Xing
2025-03-05 4:01 ` [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-06 23:40 ` patchwork-bot+netdevbpf
3 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2025-03-05 3:45 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Neal Cardwell, Kuniyuki Iwashima, Jason Xing, Simon Horman,
netdev, eric.dumazet, Eric Dumazet
inet6_ehashfn() being called from __inet6_check_established()
has a big impact on performance, as shown in the Tested section.
After the prior patch, we can compute the hash for port 0
from inet6_hash_connect(), and derive each hash in
__inet_hash_connect() from this initial hash:
hash(saddr, lport, daddr, dport) == hash(saddr, 0, daddr, dport) + lport
Apply the same principle to __inet_check_established(),
although inet_ehashfn() has a smaller cost there.
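For reference, a simplified userspace sketch of the resulting control flow
(not the kernel code; function names, return conventions and the port range
below are illustrative only):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for __inet6_check_established(): the caller passes the
 * precomputed bucket hash, so inet6_ehashfn() is not re-run per candidate.
 * Pretend every port below 32770 is already in use.
 */
static int check_established(uint16_t port, uint32_t hash)
{
        printf("probe port %u, bucket hash %#x\n", (unsigned int)port,
               (unsigned int)hash);
        return port < 32770 ? -1 : 0;   /* 0 means the 4-tuple is unique */
}

/* Stand-in for the __inet_hash_connect() loop: one addition per candidate. */
static int hash_connect(uint32_t hash_port0)
{
        for (uint16_t port = 32768; port < 32768 + 8; port++)
                if (!check_established(port, hash_port0 + port))
                        return port;
        return -1;
}

int main(void)
{
        /* hash_port0 would come from inet6_ehashfn(net, saddr, 0, daddr, dport). */
        int port = hash_connect(0xdeadbeefu);

        printf("selected port %d\n", port);
        return 0;
}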
Tested:
Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
Before this patch:
utime_start=0.286131
utime_end=4.378886
stime_start=11.952556
stime_end=1991.655533
num_transactions=1446830
latency_min=0.001061085
latency_max=12.075275028
latency_mean=0.376375302
latency_stddev=1.361969596
num_samples=306383
throughput=151866.56
perf top:
50.01% [kernel] [k] __inet6_check_established
20.65% [kernel] [k] __inet_hash_connect
15.81% [kernel] [k] inet6_ehashfn
2.92% [kernel] [k] rcu_all_qs
2.34% [kernel] [k] __cond_resched
0.50% [kernel] [k] _raw_spin_lock
0.34% [kernel] [k] sched_balance_trigger
0.24% [kernel] [k] queued_spin_lock_slowpath
After this patch:
utime_start=0.315047
utime_end=9.257617
stime_start=7.041489
stime_end=1923.688387
num_transactions=3057968
latency_min=0.003041375
latency_max=7.056589232
latency_mean=0.141075048 # Better latency metrics
latency_stddev=0.526900516
num_samples=312996
throughput=320677.21 # 111 % increase, and 229 % for the series
perf top: inet6_ehashfn is no longer seen.
39.67% [kernel] [k] __inet_hash_connect
37.06% [kernel] [k] __inet6_check_established
4.79% [kernel] [k] rcu_all_qs
3.82% [kernel] [k] __cond_resched
1.76% [kernel] [k] sched_balance_domains
0.82% [kernel] [k] _raw_spin_lock
0.81% [kernel] [k] sched_balance_rq
0.81% [kernel] [k] sched_balance_trigger
0.76% [kernel] [k] queued_spin_lock_slowpath
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/inet_hashtables.h | 4 +++-
include/net/ip.h | 2 +-
net/ipv4/inet_hashtables.c | 26 ++++++++++++++++++--------
net/ipv6/inet6_hashtables.c | 15 +++++++++++----
4 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index f447d61d95982090aac492b31e4199534970c4fb..949641e925398f741f2d4dda5898efc683b305dc 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -527,10 +527,12 @@ static inline void sk_rcv_saddr_set(struct sock *sk, __be32 addr)
int __inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk, u64 port_offset,
+ u32 hash_port0,
int (*check_established)(struct inet_timewait_death_row *,
struct sock *, __u16,
struct inet_timewait_sock **,
- bool rcu_lookup));
+ bool rcu_lookup,
+ u32 hash));
int inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk);
diff --git a/include/net/ip.h b/include/net/ip.h
index ce5e59957dd553697536ddf111bb1406d9d99408..8a48ade24620b4c8e2ebb4726f27a69aac7138b0 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -357,7 +357,7 @@ static inline void inet_get_local_port_range(const struct net *net, int *low, in
bool inet_sk_get_local_port_range(const struct sock *sk, int *low, int *high);
#ifdef CONFIG_SYSCTL
-static inline bool inet_is_local_reserved_port(struct net *net, unsigned short port)
+static inline bool inet_is_local_reserved_port(const struct net *net, unsigned short port)
{
if (!net->ipv4.sysctl_local_reserved_ports)
return false;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 3025d2b708852acd9744709a897fca17564523d5..1e3a9573c19834cc96d0b4cbf816f86433134450 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -538,7 +538,8 @@ EXPORT_SYMBOL_GPL(__inet_lookup_established);
static int __inet_check_established(struct inet_timewait_death_row *death_row,
struct sock *sk, __u16 lport,
struct inet_timewait_sock **twp,
- bool rcu_lookup)
+ bool rcu_lookup,
+ u32 hash)
{
struct inet_hashinfo *hinfo = death_row->hashinfo;
struct inet_sock *inet = inet_sk(sk);
@@ -549,8 +550,6 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
int sdif = l3mdev_master_ifindex_by_index(net, dif);
INET_ADDR_COOKIE(acookie, saddr, daddr);
const __portpair ports = INET_COMBINED_PORTS(inet->inet_dport, lport);
- unsigned int hash = inet_ehashfn(net, daddr, lport,
- saddr, inet->inet_dport);
struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
struct inet_timewait_sock *tw = NULL;
const struct hlist_nulls_node *node;
@@ -1007,9 +1006,10 @@ static u32 *table_perturb;
int __inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk, u64 port_offset,
+ u32 hash_port0,
int (*check_established)(struct inet_timewait_death_row *,
struct sock *, __u16, struct inet_timewait_sock **,
- bool rcu_lookup))
+ bool rcu_lookup, u32 hash))
{
struct inet_hashinfo *hinfo = death_row->hashinfo;
struct inet_bind_hashbucket *head, *head2;
@@ -1027,7 +1027,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
if (port) {
local_bh_disable();
- ret = check_established(death_row, sk, port, NULL, false);
+ ret = check_established(death_row, sk, port, NULL, false,
+ hash_port0 + port);
local_bh_enable();
return ret;
}
@@ -1071,7 +1072,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
rcu_read_unlock();
goto next_port;
}
- if (!check_established(death_row, sk, port, &tw, true))
+ if (!check_established(death_row, sk, port, &tw, true,
+ hash_port0 + port))
break;
rcu_read_unlock();
goto next_port;
@@ -1090,7 +1092,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
goto next_port_unlock;
WARN_ON(hlist_empty(&tb->bhash2));
if (!check_established(death_row, sk,
- port, &tw, false))
+ port, &tw, false,
+ hash_port0 + port))
goto ok;
goto next_port_unlock;
}
@@ -1197,11 +1200,18 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
int inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk)
{
+ const struct inet_sock *inet = inet_sk(sk);
+ const struct net *net = sock_net(sk);
u64 port_offset = 0;
+ u32 hash_port0;
if (!inet_sk(sk)->inet_num)
port_offset = inet_sk_port_offset(sk);
- return __inet_hash_connect(death_row, sk, port_offset,
+
+ hash_port0 = inet_ehashfn(net, inet->inet_rcv_saddr, 0,
+ inet->inet_daddr, inet->inet_dport);
+
+ return __inet_hash_connect(death_row, sk, port_offset, hash_port0,
__inet_check_established);
}
EXPORT_SYMBOL_GPL(inet_hash_connect);
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 3d95f1e75a118ff8027d4ec0f33910d23b6af832..76ee521189eb77c48845eeeac9d50b3a93a250a6 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -264,7 +264,8 @@ EXPORT_SYMBOL_GPL(inet6_lookup);
static int __inet6_check_established(struct inet_timewait_death_row *death_row,
struct sock *sk, const __u16 lport,
struct inet_timewait_sock **twp,
- bool rcu_lookup)
+ bool rcu_lookup,
+ u32 hash)
{
struct inet_hashinfo *hinfo = death_row->hashinfo;
struct inet_sock *inet = inet_sk(sk);
@@ -274,8 +275,6 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
struct net *net = sock_net(sk);
const int sdif = l3mdev_master_ifindex_by_index(net, dif);
const __portpair ports = INET_COMBINED_PORTS(inet->inet_dport, lport);
- const unsigned int hash = inet6_ehashfn(net, daddr, lport, saddr,
- inet->inet_dport);
struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
struct inet_timewait_sock *tw = NULL;
const struct hlist_nulls_node *node;
@@ -354,11 +353,19 @@ static u64 inet6_sk_port_offset(const struct sock *sk)
int inet6_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk)
{
+ const struct in6_addr *daddr = &sk->sk_v6_rcv_saddr;
+ const struct in6_addr *saddr = &sk->sk_v6_daddr;
+ const struct inet_sock *inet = inet_sk(sk);
+ const struct net *net = sock_net(sk);
u64 port_offset = 0;
+ u32 hash_port0;
if (!inet_sk(sk)->inet_num)
port_offset = inet6_sk_port_offset(sk);
- return __inet_hash_connect(death_row, sk, port_offset,
+
+ hash_port0 = inet6_ehashfn(net, daddr, 0, saddr, inet->inet_dport);
+
+ return __inet_hash_connect(death_row, sk, port_offset, hash_port0,
__inet6_check_established);
}
EXPORT_SYMBOL_GPL(inet6_hash_connect);
--
2.48.1.711.g2feabab25a-goog
* Re: [PATCH net-next 0/2] tcp: even faster connect() under stress
2025-03-05 3:45 [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
2025-03-05 3:45 ` [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect() Eric Dumazet
@ 2025-03-05 4:01 ` Eric Dumazet
2025-03-06 23:40 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2025-03-05 4:01 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Neal Cardwell, Kuniyuki Iwashima, Jason Xing, Simon Horman,
netdev, eric.dumazet
On Wed, Mar 5, 2025 at 4:45 AM Eric Dumazet <edumazet@google.com> wrote:
>
> This is a followup on the prior series, "tcp: scale connect() under pressure"
>
> Now spinlocks are no longer in the picture, we see a very high cost
> of the inet6_ehashfn() function.
>
> In this series (of 2), I change how lport contributes to inet6_ehashfn()
> to ensure better cache locality and call inet6_ehashfn()
> only once per connect() system call.
>
> This brings an additional 229 % increase of performance
This is 129 % additional QPS (going from 139866.80 to 320677.21)
Sorry for the confusion :)
* [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
@ 2025-03-06 4:24 ` Kuniyuki Iwashima
2025-03-06 7:54 ` Jason Xing
2025-03-17 13:44 ` kernel test robot
2 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-06 4:24 UTC (permalink / raw)
To: edumazet
Cc: davem, eric.dumazet, horms, kernelxing, kuba, kuniyu, ncardwell,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 5 Mar 2025 03:45:49 +0000
> In order to speedup __inet_hash_connect(), we want to ensure hash values
> for <source address, port X, destination address, destination port>
> are not randomly spread, but monotonically increasing.
>
> Goal is to allow __inet_hash_connect() to derive the hash value
> of a candidate 4-tuple with a single addition in the following
> patch in the series.
>
> Given :
> hash_0 = inet_ehashfn(saddr, 0, daddr, dport)
> hash_sport = inet_ehashfn(saddr, sport, daddr, dport)
>
> Then (hash_sport == hash_0 + sport) for all sport values.
>
> As far as I know, there is no security implication with this change.
>
> After this patch, when __inet_hash_connect() has to try XXXX candidates,
> the hash table buckets are contiguous and packed, allowing
> a better use of cpu caches and hardware prefetchers.
>
> Tested:
>
> Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before this patch:
>
> utime_start=0.271607
> utime_end=3.847111
> stime_start=18.407684
> stime_end=1997.485557
> num_transactions=1350742
> latency_min=0.014131929
> latency_max=17.895073144
> latency_mean=0.505675853
> latency_stddev=2.125164772
> num_samples=307884
> throughput=139866.80
>
> perf top on client:
>
> 56.86% [kernel] [k] __inet6_check_established
> 17.96% [kernel] [k] __inet_hash_connect
> 13.88% [kernel] [k] inet6_ehashfn
> 2.52% [kernel] [k] rcu_all_qs
> 2.01% [kernel] [k] __cond_resched
> 0.41% [kernel] [k] _raw_spin_lock
>
> After this patch:
>
> utime_start=0.286131
> utime_end=4.378886
> stime_start=11.952556
> stime_end=1991.655533
> num_transactions=1446830
> latency_min=0.001061085
> latency_max=12.075275028
> latency_mean=0.376375302
> latency_stddev=1.361969596
> num_samples=306383
> throughput=151866.56
>
> perf top:
>
> 50.01% [kernel] [k] __inet6_check_established
> 20.65% [kernel] [k] __inet_hash_connect
> 15.81% [kernel] [k] inet6_ehashfn
> 2.92% [kernel] [k] rcu_all_qs
> 2.34% [kernel] [k] __cond_resched
> 0.50% [kernel] [k] _raw_spin_lock
> 0.34% [kernel] [k] sched_balance_trigger
> 0.24% [kernel] [k] queued_spin_lock_slowpath
>
> There is indeed an increase of throughput and reduction of latency.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
* Re: [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect()
2025-03-05 3:45 ` [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect() Eric Dumazet
@ 2025-03-06 4:26 ` Kuniyuki Iwashima
2025-03-06 8:22 ` Jason Xing
1 sibling, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-06 4:26 UTC (permalink / raw)
To: edumazet
Cc: davem, eric.dumazet, horms, kernelxing, kuba, kuniyu, ncardwell,
netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 5 Mar 2025 03:45:50 +0000
> inet6_ehashfn() being called from __inet6_check_established()
> has a big impact on performance, as shown in the Tested section.
>
> After prior patch, we can compute the hash for port 0
> from inet6_hash_connect(), and derive each hash in
> __inet_hash_connect() from this initial hash:
>
> hash(saddr, lport, daddr, dport) == hash(saddr, 0, daddr, dport) + lport
>
> Apply the same principle for __inet_check_established(),
> although inet_ehashfn() has a smaller cost.
>
> Tested:
>
> Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before this patch:
>
> utime_start=0.286131
> utime_end=4.378886
> stime_start=11.952556
> stime_end=1991.655533
> num_transactions=1446830
> latency_min=0.001061085
> latency_max=12.075275028
> latency_mean=0.376375302
> latency_stddev=1.361969596
> num_samples=306383
> throughput=151866.56
>
> perf top:
>
> 50.01% [kernel] [k] __inet6_check_established
> 20.65% [kernel] [k] __inet_hash_connect
> 15.81% [kernel] [k] inet6_ehashfn
> 2.92% [kernel] [k] rcu_all_qs
> 2.34% [kernel] [k] __cond_resched
> 0.50% [kernel] [k] _raw_spin_lock
> 0.34% [kernel] [k] sched_balance_trigger
> 0.24% [kernel] [k] queued_spin_lock_slowpath
>
> After this patch:
>
> utime_start=0.315047
> utime_end=9.257617
> stime_start=7.041489
> stime_end=1923.688387
> num_transactions=3057968
> latency_min=0.003041375
> latency_max=7.056589232
> latency_mean=0.141075048 # Better latency metrics
> latency_stddev=0.526900516
> num_samples=312996
> throughput=320677.21 # 111 % increase, and 229 % for the series
>
> perf top: inet6_ehashfn is no longer seen.
>
> 39.67% [kernel] [k] __inet_hash_connect
> 37.06% [kernel] [k] __inet6_check_established
> 4.79% [kernel] [k] rcu_all_qs
> 3.82% [kernel] [k] __cond_resched
> 1.76% [kernel] [k] sched_balance_domains
> 0.82% [kernel] [k] _raw_spin_lock
> 0.81% [kernel] [k] sched_balance_rq
> 0.81% [kernel] [k] sched_balance_trigger
> 0.76% [kernel] [k] queued_spin_lock_slowpath
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Interesting optimisation, thanks!
* Re: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
2025-03-06 4:24 ` Kuniyuki Iwashima
@ 2025-03-06 7:54 ` Jason Xing
2025-03-06 8:14 ` Eric Dumazet
2025-03-17 13:44 ` kernel test robot
2 siblings, 1 reply; 12+ messages in thread
From: Jason Xing @ 2025-03-06 7:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet
On Wed, Mar 5, 2025 at 11:46 AM Eric Dumazet <edumazet@google.com> wrote:
>
> In order to speedup __inet_hash_connect(), we want to ensure hash values
> for <source address, port X, destination address, destination port>
> are not randomly spread, but monotonically increasing.
>
> Goal is to allow __inet_hash_connect() to derive the hash value
> of a candidate 4-tuple with a single addition in the following
> patch in the series.
>
> Given :
> hash_0 = inet_ehashfn(saddr, 0, daddr, dport)
> hash_sport = inet_ehashfn(saddr, sport, daddr, dport)
>
> Then (hash_sport == hash_0 + sport) for all sport values.
>
> As far as I know, there is no security implication with this change.
Good to know. The moment I read the first paragraph, I was wondering
whether it might bring a potential risk.
I hesitate to bring up one question, but could this new algorithm
result in sockets concentrating into a few buckets instead of being
sufficiently dispersed as before? The good news is that I tested other
cases like TCP_CRR and saw no degradation in performance, but those
tests didn't cover the case of one client connecting to many different
servers.
>
> After this patch, when __inet_hash_connect() has to try XXXX candidates,
> the hash table buckets are contiguous and packed, allowing
> a better use of cpu caches and hardware prefetchers.
>
> Tested:
>
> Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before this patch:
>
> utime_start=0.271607
> utime_end=3.847111
> stime_start=18.407684
> stime_end=1997.485557
> num_transactions=1350742
> latency_min=0.014131929
> latency_max=17.895073144
> latency_mean=0.505675853
> latency_stddev=2.125164772
> num_samples=307884
> throughput=139866.80
>
> perf top on client:
>
> 56.86% [kernel] [k] __inet6_check_established
> 17.96% [kernel] [k] __inet_hash_connect
> 13.88% [kernel] [k] inet6_ehashfn
> 2.52% [kernel] [k] rcu_all_qs
> 2.01% [kernel] [k] __cond_resched
> 0.41% [kernel] [k] _raw_spin_lock
>
> After this patch:
>
> utime_start=0.286131
> utime_end=4.378886
> stime_start=11.952556
> stime_end=1991.655533
> num_transactions=1446830
> latency_min=0.001061085
> latency_max=12.075275028
> latency_mean=0.376375302
> latency_stddev=1.361969596
> num_samples=306383
> throughput=151866.56
>
> perf top:
>
> 50.01% [kernel] [k] __inet6_check_established
> 20.65% [kernel] [k] __inet_hash_connect
> 15.81% [kernel] [k] inet6_ehashfn
> 2.92% [kernel] [k] rcu_all_qs
> 2.34% [kernel] [k] __cond_resched
> 0.50% [kernel] [k] _raw_spin_lock
> 0.34% [kernel] [k] sched_balance_trigger
> 0.24% [kernel] [k] queued_spin_lock_slowpath
>
> There is indeed an increase of throughput and reduction of latency.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Throughput goes from 12829 to 26072. The percentage increase - 103% -
is impressive!
Thanks,
Jason
* Re: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
2025-03-06 7:54 ` Jason Xing
@ 2025-03-06 8:14 ` Eric Dumazet
2025-03-06 8:19 ` Jason Xing
0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2025-03-06 8:14 UTC (permalink / raw)
To: Jason Xing
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet
On Thu, Mar 6, 2025 at 8:54 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Mar 5, 2025 at 11:46 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > In order to speedup __inet_hash_connect(), we want to ensure hash values
> > for <source address, port X, destination address, destination port>
> > are not randomly spread, but monotonically increasing.
> >
> > Goal is to allow __inet_hash_connect() to derive the hash value
> > of a candidate 4-tuple with a single addition in the following
> > patch in the series.
> >
> > Given :
> > hash_0 = inet_ehashfn(saddr, 0, daddr, dport)
> > hash_sport = inet_ehashfn(saddr, sport, daddr, dport)
> >
> > Then (hash_sport == hash_0 + sport) for all sport values.
> >
> > As far as I know, there is no security implication with this change.
>
> Good to know this. The moment I read the first paragraph, I was
> thinking if it might bring potential risk.
>
> Sorry that I hesitate to bring up one question: could this new
> algorithm result in sockets concentrating into several buckets instead
> of being sufficiently dispersed like before.
As I said, I see no difference for servers, since their sport is a fixed value.
What matters for them is the hash contribution of the remote address and port,
because the server port is usually well known.
This change does not alter the hash distribution; an attacker will not be able
to target a particular bucket.
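Here is a toy userspace demonstration of that point (not kernel code;
mix32() is a made-up stand-in for the mixing of the remote address and
port): with a fixed local port, the new scheme only adds a constant to
the mixed value, and a constant offset modulo the table size merely
rotates the bucket indices, leaving the per-bucket load profile intact.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 16

/* Made-up stand-in for the secret-keyed mix of remote address and port. */
static uint32_t mix32(uint32_t x)
{
        x *= 0x9e3779b1u;
        x ^= x >> 16;
        return x;
}

int main(void)
{
        unsigned int before[NBUCKETS] = { 0 }, after[NBUCKETS] = { 0 };
        const uint32_t lport = 443;     /* fixed listener port */

        for (uint32_t peer = 0; peer < 100000; peer++) {
                uint32_t h = mix32(peer);

                before[h % NBUCKETS]++;          /* lport not added */
                after[(h + lport) % NBUCKETS]++; /* lport added afterwards */
        }
        /* Adding a constant only rotates the bucket indices: the set of
         * per-bucket loads is identical.
         */
        for (unsigned int b = 0; b < NBUCKETS; b++)
                assert(after[(b + lport) % NBUCKETS] == before[b]);
        printf("bucket load profile unchanged (rotated by %u)\n",
               (unsigned int)(lport % NBUCKETS));
        return 0;
}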
> Well good news is that I
> tested other cases like TCP_CRR and saw no degradation in performance.
> But they didn't cover establishing from one client to many different
> servers cases.
>
> >
> > After this patch, when __inet_hash_connect() has to try XXXX candidates,
> > the hash table buckets are contiguous and packed, allowing
> > a better use of cpu caches and hardware prefetchers.
> >
> > Tested:
> >
> > Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> > Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
> >
> > Before this patch:
> >
> > utime_start=0.271607
> > utime_end=3.847111
> > stime_start=18.407684
> > stime_end=1997.485557
> > num_transactions=1350742
> > latency_min=0.014131929
> > latency_max=17.895073144
> > latency_mean=0.505675853
> > latency_stddev=2.125164772
> > num_samples=307884
> > throughput=139866.80
> >
> > perf top on client:
> >
> > 56.86% [kernel] [k] __inet6_check_established
> > 17.96% [kernel] [k] __inet_hash_connect
> > 13.88% [kernel] [k] inet6_ehashfn
> > 2.52% [kernel] [k] rcu_all_qs
> > 2.01% [kernel] [k] __cond_resched
> > 0.41% [kernel] [k] _raw_spin_lock
> >
> > After this patch:
> >
> > utime_start=0.286131
> > utime_end=4.378886
> > stime_start=11.952556
> > stime_end=1991.655533
> > num_transactions=1446830
> > latency_min=0.001061085
> > latency_max=12.075275028
> > latency_mean=0.376375302
> > latency_stddev=1.361969596
> > num_samples=306383
> > throughput=151866.56
> >
> > perf top:
> >
> > 50.01% [kernel] [k] __inet6_check_established
> > 20.65% [kernel] [k] __inet_hash_connect
> > 15.81% [kernel] [k] inet6_ehashfn
> > 2.92% [kernel] [k] rcu_all_qs
> > 2.34% [kernel] [k] __cond_resched
> > 0.50% [kernel] [k] _raw_spin_lock
> > 0.34% [kernel] [k] sched_balance_trigger
> > 0.24% [kernel] [k] queued_spin_lock_slowpath
> >
> > There is indeed an increase of throughput and reduction of latency.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Tested-by: Jason Xing <kerneljasonxing@gmail.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
>
> Throughput goes from 12829 to 26072.. The percentage increase - 103% -
> is alluring to me!
>
> Thanks,
> Jason
* Re: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
2025-03-06 8:14 ` Eric Dumazet
@ 2025-03-06 8:19 ` Jason Xing
0 siblings, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-03-06 8:19 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet
On Thu, Mar 6, 2025 at 4:14 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Mar 6, 2025 at 8:54 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Wed, Mar 5, 2025 at 11:46 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > In order to speedup __inet_hash_connect(), we want to ensure hash values
> > > for <source address, port X, destination address, destination port>
> > > are not randomly spread, but monotonically increasing.
> > >
> > > Goal is to allow __inet_hash_connect() to derive the hash value
> > > of a candidate 4-tuple with a single addition in the following
> > > patch in the series.
> > >
> > > Given :
> > > hash_0 = inet_ehashfn(saddr, 0, daddr, dport)
> > > hash_sport = inet_ehashfn(saddr, sport, daddr, dport)
> > >
> > > Then (hash_sport == hash_0 + sport) for all sport values.
> > >
> > > As far as I know, there is no security implication with this change.
> >
> > Good to know this. The moment I read the first paragraph, I was
> > thinking if it might bring potential risk.
> >
> > Sorry that I hesitate to bring up one question: could this new
> > algorithm result in sockets concentrating into several buckets instead
> > of being sufficiently dispersed like before.
>
> As I said, I see no difference for servers, since their sport is a fixed value.
>
> What matters for them is the hash contribution of the remote address and port,
> because the server port is usually well known.
>
> This change does not change the hash distribution, an attacker will not be able
> to target a particular bucket.
Point taken. Thank you very much for the explanation.
Thanks,
Jason
>
> > Well good news is that I
> > tested other cases like TCP_CRR and saw no degradation in performance.
> > But they didn't cover establishing from one client to many different
> > servers cases.
> >
> > >
> > > After this patch, when __inet_hash_connect() has to try XXXX candidates,
> > > the hash table buckets are contiguous and packed, allowing
> > > a better use of cpu caches and hardware prefetchers.
> > >
> > > Tested:
> > >
> > > Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> > > Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
> > >
> > > Before this patch:
> > >
> > > utime_start=0.271607
> > > utime_end=3.847111
> > > stime_start=18.407684
> > > stime_end=1997.485557
> > > num_transactions=1350742
> > > latency_min=0.014131929
> > > latency_max=17.895073144
> > > latency_mean=0.505675853
> > > latency_stddev=2.125164772
> > > num_samples=307884
> > > throughput=139866.80
> > >
> > > perf top on client:
> > >
> > > 56.86% [kernel] [k] __inet6_check_established
> > > 17.96% [kernel] [k] __inet_hash_connect
> > > 13.88% [kernel] [k] inet6_ehashfn
> > > 2.52% [kernel] [k] rcu_all_qs
> > > 2.01% [kernel] [k] __cond_resched
> > > 0.41% [kernel] [k] _raw_spin_lock
> > >
> > > After this patch:
> > >
> > > utime_start=0.286131
> > > utime_end=4.378886
> > > stime_start=11.952556
> > > stime_end=1991.655533
> > > num_transactions=1446830
> > > latency_min=0.001061085
> > > latency_max=12.075275028
> > > latency_mean=0.376375302
> > > latency_stddev=1.361969596
> > > num_samples=306383
> > > throughput=151866.56
> > >
> > > perf top:
> > >
> > > 50.01% [kernel] [k] __inet6_check_established
> > > 20.65% [kernel] [k] __inet_hash_connect
> > > 15.81% [kernel] [k] inet6_ehashfn
> > > 2.92% [kernel] [k] rcu_all_qs
> > > 2.34% [kernel] [k] __cond_resched
> > > 0.50% [kernel] [k] _raw_spin_lock
> > > 0.34% [kernel] [k] sched_balance_trigger
> > > 0.24% [kernel] [k] queued_spin_lock_slowpath
> > >
> > > There is indeed an increase of throughput and reduction of latency.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> >
> > Tested-by: Jason Xing <kerneljasonxing@gmail.com>
> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> >
> > Throughput goes from 12829 to 26072.. The percentage increase - 103% -
> > is alluring to me!
> >
> > Thanks,
> > Jason
* Re: [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect()
2025-03-05 3:45 ` [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect() Eric Dumazet
2025-03-06 4:26 ` Kuniyuki Iwashima
@ 2025-03-06 8:22 ` Jason Xing
1 sibling, 0 replies; 12+ messages in thread
From: Jason Xing @ 2025-03-06 8:22 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet
On Wed, Mar 5, 2025 at 11:46 AM Eric Dumazet <edumazet@google.com> wrote:
>
> inet6_ehashfn() being called from __inet6_check_established()
> has a big impact on performance, as shown in the Tested section.
>
> After prior patch, we can compute the hash for port 0
> from inet6_hash_connect(), and derive each hash in
> __inet_hash_connect() from this initial hash:
>
> hash(saddr, lport, daddr, dport) == hash(saddr, 0, daddr, dport) + lport
>
> Apply the same principle for __inet_check_established(),
> although inet_ehashfn() has a smaller cost.
>
> Tested:
>
> Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before this patch:
>
> utime_start=0.286131
> utime_end=4.378886
> stime_start=11.952556
> stime_end=1991.655533
> num_transactions=1446830
> latency_min=0.001061085
> latency_max=12.075275028
> latency_mean=0.376375302
> latency_stddev=1.361969596
> num_samples=306383
> throughput=151866.56
>
> perf top:
>
> 50.01% [kernel] [k] __inet6_check_established
> 20.65% [kernel] [k] __inet_hash_connect
> 15.81% [kernel] [k] inet6_ehashfn
> 2.92% [kernel] [k] rcu_all_qs
> 2.34% [kernel] [k] __cond_resched
> 0.50% [kernel] [k] _raw_spin_lock
> 0.34% [kernel] [k] sched_balance_trigger
> 0.24% [kernel] [k] queued_spin_lock_slowpath
>
> After this patch:
>
> utime_start=0.315047
> utime_end=9.257617
> stime_start=7.041489
> stime_end=1923.688387
> num_transactions=3057968
> latency_min=0.003041375
> latency_max=7.056589232
> latency_mean=0.141075048 # Better latency metrics
> latency_stddev=0.526900516
> num_samples=312996
> throughput=320677.21 # 111 % increase, and 229 % for the series
>
> perf top: inet6_ehashfn is no longer seen.
>
> 39.67% [kernel] [k] __inet_hash_connect
> 37.06% [kernel] [k] __inet6_check_established
> 4.79% [kernel] [k] rcu_all_qs
> 3.82% [kernel] [k] __cond_resched
> 1.76% [kernel] [k] sched_balance_domains
> 0.82% [kernel] [k] _raw_spin_lock
> 0.81% [kernel] [k] sched_balance_rq
> 0.81% [kernel] [k] sched_balance_trigger
> 0.76% [kernel] [k] queued_spin_lock_slowpath
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Thank you!
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
* Re: [PATCH net-next 0/2] tcp: even faster connect() under stress
2025-03-05 3:45 [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
` (2 preceding siblings ...)
2025-03-05 4:01 ` [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
@ 2025-03-06 23:40 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-03-06 23:40 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, ncardwell, kuniyu, kernelxing, horms, netdev,
eric.dumazet
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 5 Mar 2025 03:45:48 +0000 you wrote:
> This is a followup on the prior series, "tcp: scale connect() under pressure"
>
> Now spinlocks are no longer in the picture, we see a very high cost
> of the inet6_ehashfn() function.
>
> In this series (of 2), I change how lport contributes to inet6_ehashfn()
> to ensure better cache locality and call inet6_ehashfn()
> only once per connect() system call.
>
> [...]
Here is the summary with links:
- [net-next,1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
https://git.kernel.org/netdev/net-next/c/9544d60a2605
- [net-next,2/2] inet: call inet6_ehashfn() once from inet6_hash_connect()
https://git.kernel.org/netdev/net-next/c/d4438ce68bf1
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
2025-03-06 4:24 ` Kuniyuki Iwashima
2025-03-06 7:54 ` Jason Xing
@ 2025-03-17 13:44 ` kernel test robot
2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2025-03-17 13:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: oe-lkp, lkp, netdev, David S . Miller, Jakub Kicinski,
Paolo Abeni, Neal Cardwell, Kuniyuki Iwashima, Jason Xing,
Simon Horman, eric.dumazet, Eric Dumazet, oliver.sang
Hello,
kernel test robot noticed a 26.0% improvement of stress-ng.sockmany.ops_per_sec on:
commit: 265acc444f8a96246e9d42b54b6931d078034218 ("[PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/inet-change-lport-contribution-to-inet_ehashfn-and-inet6_ehashfn/20250305-114734
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git f252f23ab657cd224cb8334ba69966396f3f629b
patch link: https://lore.kernel.org/all/20250305034550.879255-2-edumazet@google.com/
patch subject: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: sockmany
cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.sockmany.ops_per_sec 4.4% improvement |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=sockmany |
| | testtime=60s |
+------------------+---------------------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250317/202503171623.f2e16b60-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sockmany/stress-ng/60s
commit:
f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
f252f23ab657cd22 (base)    265acc444f8a96246e9d42b54b6 (patched)
----------------           ---------------------------
%stddev            %change            %stddev
0.60 ± 6% +0.2 0.75 ± 6% mpstat.cpu.all.soft%
376850 ± 9% +15.7% 436068 ± 9% numa-numastat.node0.local_node
376612 ± 9% +15.8% 435968 ± 9% numa-vmstat.node0.numa_local
54708 +22.0% 66753 ± 2% vmstat.system.cs
2308 +1167.7% 29267 ± 26% perf-c2c.HITM.local
2499 +1078.3% 29447 ± 26% perf-c2c.HITM.total
1413 ± 8% -13.8% 1218 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
28302 +21.2% 34303 ± 2% sched_debug.cpu.nr_switches.avg
39625 ± 6% +63.4% 64761 ± 6% sched_debug.cpu.nr_switches.max
4170 ± 9% +126.1% 9429 ± 8% sched_debug.cpu.nr_switches.stddev
1606932 +25.9% 2023746 ± 3% stress-ng.sockmany.ops
26687 +26.0% 33624 ± 3% stress-ng.sockmany.ops_per_sec
1561801 +28.1% 2000939 ± 3% stress-ng.time.involuntary_context_switches
1731525 +22.3% 2118259 ± 2% stress-ng.time.voluntary_context_switches
84783 +2.6% 86953 proc-vmstat.nr_shmem
5339 ± 6% -26.4% 3931 ± 16% proc-vmstat.numa_hint_faults_local
878479 +6.8% 937819 proc-vmstat.numa_hit
812262 +7.3% 871615 proc-vmstat.numa_local
2550690 +12.5% 2870404 proc-vmstat.pgalloc_normal
2407108 +13.2% 2724922 proc-vmstat.pgfree
21.96 -17.2% 18.18 ± 2% perf-stat.i.MPKI
7.517e+09 +18.8% 8.933e+09 perf-stat.i.branch-instructions
2.70 -0.7 1.96 perf-stat.i.branch-miss-rate%
2.03e+08 -13.1% 1.765e+08 perf-stat.i.branch-misses
60.22 -2.3 57.89 ± 2% perf-stat.i.cache-miss-rate%
1.472e+09 +4.7% 1.542e+09 perf-stat.i.cache-references
56669 +22.3% 69301 ± 2% perf-stat.i.context-switches
5.56 -18.4% 4.53 ± 2% perf-stat.i.cpi
4.24e+10 +19.2% 5.054e+10 perf-stat.i.instructions
0.20 +20.1% 0.24 ± 4% perf-stat.i.ipc
0.49 +21.0% 0.60 ± 8% perf-stat.i.metric.K/sec
21.03 -15.1% 17.85 perf-stat.overall.MPKI
2.70 -0.7 1.98 perf-stat.overall.branch-miss-rate%
60.56 -2.1 58.49 perf-stat.overall.cache-miss-rate%
5.34 -16.6% 4.45 perf-stat.overall.cpi
253.77 -1.7% 249.50 perf-stat.overall.cycles-between-cache-misses
0.19 +19.9% 0.22 perf-stat.overall.ipc
7.395e+09 +18.9% 8.789e+09 perf-stat.ps.branch-instructions
1.997e+08 -13.0% 1.737e+08 perf-stat.ps.branch-misses
1.448e+09 +4.7% 1.517e+09 perf-stat.ps.cache-references
55820 +22.2% 68204 ± 2% perf-stat.ps.context-switches
4.172e+10 +19.2% 4.972e+10 perf-stat.ps.instructions
2.556e+12 +20.2% 3.072e+12 ± 2% perf-stat.total.instructions
0.35 ± 9% -14.9% 0.29 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
0.06 ± 7% -20.5% 0.04 ± 4% perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
0.16 ±218% +798.3% 1.44 ± 40% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.25 ±152% +291.3% 0.99 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
0.11 ±166% +568.2% 0.75 ± 45% perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
0.84 ± 14% +39.2% 1.17 ± 9% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.11 ± 22% +108.5% 0.23 ± 12% perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.08 ± 59% -60.0% 0.03 ± 4% perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
0.16 ±218% +1286.4% 2.22 ± 25% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.13 ±153% +910.1% 1.27 ± 34% perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
9.23 -12.5% 8.08 perf-sched.total_wait_and_delay.average.ms
139892 +15.3% 161338 perf-sched.total_wait_and_delay.count.ms
9.18 -12.5% 8.03 perf-sched.total_wait_time.average.ms
0.70 ± 8% -14.5% 0.60 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
0.11 ± 8% -20.1% 0.09 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
429.48 ± 44% +63.6% 702.60 ± 11% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4.97 -14.0% 4.28 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.23 ± 21% +104.2% 0.46 ± 12% perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
48576 ± 5% +36.3% 66215 ± 2% perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
81.83 +9.8% 89.83 ± 2% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
64098 +16.3% 74560 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
15531 ± 17% -46.2% 8355 ± 6% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.36 ± 8% -14.2% 0.31 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
0.06 ± 7% -20.2% 0.04 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
0.04 ±178% -94.4% 0.00 ±130% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
0.16 ±218% +798.5% 1.44 ± 40% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.11 ±166% +568.6% 0.75 ± 45% perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
427.69 ± 45% +63.1% 697.48 ± 10% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4.95 -14.0% 4.26 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.12 ± 20% +99.9% 0.23 ± 12% perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.16 ±218% +1286.4% 2.22 ± 25% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.13 ±153% +911.4% 1.27 ± 34% perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s
commit:
f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
f252f23ab657cd22 (base)    265acc444f8a96246e9d42b54b6 (patched)
----------------           ---------------------------
%stddev            %change            %stddev
205766 +3.2% 212279 vmstat.system.cs
309724 ± 5% +63.6% 506684 ± 9% sched_debug.cfs_rq:/.avg_vruntime.stddev
309724 ± 5% +63.6% 506684 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
1307371 ± 8% -14.5% 1117523 ± 7% sched_debug.cpu.avg_idle.max
4333131 +4.4% 4525951 stress-ng.sockmany.ops
71816 +4.4% 74988 stress-ng.sockmany.ops_per_sec
7639150 +3.6% 7910527 stress-ng.time.voluntary_context_switches
693603 -18.6% 564616 ± 3% perf-c2c.DRAM.local
611374 -16.8% 508688 ± 2% perf-c2c.DRAM.remote
19509 +994.2% 213470 ± 7% perf-c2c.HITM.local
20252 +957.6% 214187 ± 7% perf-c2c.HITM.total
204521 +3.1% 210765 proc-vmstat.nr_shmem
938137 +2.9% 965493 proc-vmstat.nr_slab_reclaimable
3102658 +3.0% 3196837 proc-vmstat.nr_slab_unreclaimable
2113801 +1.8% 2151131 proc-vmstat.numa_hit
1881174 +2.0% 1919223 proc-vmstat.numa_local
6186586 +3.6% 6406837 proc-vmstat.pgalloc_normal
0.76 ± 46% -83.0% 0.13 ±144% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
0.02 ± 2% -6.3% 0.02 ± 2% perf-sched.sch_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
15.43 -12.6% 13.48 perf-sched.total_wait_and_delay.average.ms
234971 +15.6% 271684 perf-sched.total_wait_and_delay.count.ms
15.37 -12.6% 13.43 perf-sched.total_wait_time.average.ms
140.18 ± 5% -37.2% 88.02 ± 11% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
10.17 -14.1% 8.74 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
4.02 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
104089 +16.4% 121193 perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
88.17 ± 6% +68.1% 148.17 ± 13% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
108724 +16.8% 127034 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
1232 -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
4592 ± 12% +26.1% 5792 ± 14% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
11.29 ± 68% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
9.99 -13.3% 8.66 perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
139.53 ± 6% -37.2% 87.60 ± 11% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
10.15 -14.1% 8.72 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
41.10 -17.2% 34.03 perf-stat.i.MPKI
1.424e+10 +14.6% 1.631e+10 perf-stat.i.branch-instructions
2.28 -0.1 2.17 perf-stat.i.branch-miss-rate%
3.193e+08 +9.4% 3.492e+08 perf-stat.i.branch-misses
77.01 -9.5 67.48 perf-stat.i.cache-miss-rate%
2.981e+09 -5.1% 2.83e+09 perf-stat.i.cache-misses
3.806e+09 +8.4% 4.127e+09 perf-stat.i.cache-references
217129 +3.2% 224056 perf-stat.i.context-switches
8.68 -12.7% 7.58 perf-stat.i.cpi
242.24 +4.0% 251.97 perf-stat.i.cycles-between-cache-misses
7.608e+10 +14.1% 8.679e+10 perf-stat.i.instructions
0.13 +13.3% 0.15 perf-stat.i.ipc
39.15 -16.8% 32.58 perf-stat.overall.MPKI
2.24 -0.1 2.14 perf-stat.overall.branch-miss-rate%
78.30 -9.7 68.56 perf-stat.overall.cache-miss-rate%
8.35 -12.4% 7.31 perf-stat.overall.cpi
213.17 +5.3% 224.53 perf-stat.overall.cycles-between-cache-misses
0.12 +14.1% 0.14 perf-stat.overall.ipc
1.401e+10 +14.6% 1.604e+10 perf-stat.ps.branch-instructions
3.139e+08 +9.4% 3.434e+08 perf-stat.ps.branch-misses
2.931e+09 -5.1% 2.782e+09 perf-stat.ps.cache-misses
3.743e+09 +8.4% 4.058e+09 perf-stat.ps.cache-references
213541 +3.3% 220574 perf-stat.ps.context-switches
7.485e+10 +14.1% 8.539e+10 perf-stat.ps.instructions
4.597e+12 +13.9% 5.235e+12 perf-stat.total.instructions
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 12+ messages
2025-03-05 3:45 [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-05 3:45 ` [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn() Eric Dumazet
2025-03-06 4:24 ` Kuniyuki Iwashima
2025-03-06 7:54 ` Jason Xing
2025-03-06 8:14 ` Eric Dumazet
2025-03-06 8:19 ` Jason Xing
2025-03-17 13:44 ` kernel test robot
2025-03-05 3:45 ` [PATCH net-next 2/2] inet: call inet6_ehashfn() once from inet6_hash_connect() Eric Dumazet
2025-03-06 4:26 ` Kuniyuki Iwashima
2025-03-06 8:22 ` Jason Xing
2025-03-05 4:01 ` [PATCH net-next 0/2] tcp: even faster connect() under stress Eric Dumazet
2025-03-06 23:40 ` patchwork-bot+netdevbpf