* [PATCH net-next 0/4] tcp: scale connect() under pressure
@ 2025-03-02 12:42 Eric Dumazet
2025-03-02 12:42 ` [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established() Eric Dumazet
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-03-02 12:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
Cc: Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet,
Eric Dumazet
Adoption of bhash2 in linux-6.1 made some operations almost twice
as expensive, because of the additional locks.
This series adds RCU to __inet_hash_connect() to help the
case where many attempts must be made before finding
an available 4-tuple.
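The core idea is the classic optimistic pattern: perform a read-only
lookup without the lock, and only take the lock (re-checking under it)
when the port still looks usable. Below is a minimal userspace C
analogue of that pattern, with hypothetical names, a pthread mutex in
place of the kernel spinlocks, and an atomic flag array in place of the
RCU-protected ehash chains; it is a sketch of the idea, not the kernel
code in the patches.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct port_table {
	pthread_mutex_t lock;
	atomic_bool in_use[65536];	/* stand-in for "4-tuple already taken" */
};

static bool try_claim_port(struct port_table *t, unsigned int port)
{
	/* Lockless pre-check: if the port is clearly taken, skip the lock. */
	if (atomic_load_explicit(&t->in_use[port], memory_order_acquire))
		return false;

	/* Slow path: re-check and commit under the lock. */
	pthread_mutex_lock(&t->lock);
	if (atomic_load_explicit(&t->in_use[port], memory_order_relaxed)) {
		pthread_mutex_unlock(&t->lock);
		return false;
	}
	atomic_store_explicit(&t->in_use[port], true, memory_order_release);
	pthread_mutex_unlock(&t->lock);
	return true;
}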
This brings a ~200 % improvement in this experiment:
Server:
ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
Client:
ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
Before series:
utime_start=0.288582
utime_end=1.548707
stime_start=20.637138
stime_end=2002.489845
num_transactions=484453
latency_min=0.156279245
latency_max=20.922042756
latency_mean=1.546521274
latency_stddev=3.936005194
num_samples=312537
throughput=47426.00
perf top on the client:
49.54% [kernel] [k] _raw_spin_lock
25.87% [kernel] [k] _raw_spin_lock_bh
5.97% [kernel] [k] queued_spin_lock_slowpath
5.67% [kernel] [k] __inet_hash_connect
3.53% [kernel] [k] __inet6_check_established
3.48% [kernel] [k] inet6_ehashfn
0.64% [kernel] [k] rcu_all_qs
After this series:
utime_start=0.271607
utime_end=3.847111
stime_start=18.407684
stime_end=1997.485557
num_transactions=1350742
latency_min=0.014131929
latency_max=17.895073144
latency_mean=0.505675853 # Nice reduction of latency metrics
latency_stddev=2.125164772
num_samples=307884
throughput=139866.80 # 194 % increase
perf top on client:
56.86% [kernel] [k] __inet6_check_established
17.96% [kernel] [k] __inet_hash_connect
13.88% [kernel] [k] inet6_ehashfn
2.52% [kernel] [k] rcu_all_qs
2.01% [kernel] [k] __cond_resched
0.41% [kernel] [k] _raw_spin_lock
Eric Dumazet (4):
tcp: use RCU in __inet{6}_check_established()
tcp: optimize inet_use_bhash2_on_bind()
tcp: add RCU management to inet_bind_bucket
tcp: use RCU lookup in __inet_hash_connect()
include/net/inet_hashtables.h | 7 ++--
net/ipv4/inet_connection_sock.c | 8 ++--
net/ipv4/inet_hashtables.c | 65 ++++++++++++++++++++++++---------
net/ipv4/inet_timewait_sock.c | 2 +-
net/ipv6/inet6_hashtables.c | 23 ++++++++++--
5 files changed, 75 insertions(+), 30 deletions(-)
--
2.48.1.711.g2feabab25a-goog
* [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established()
2025-03-02 12:42 [PATCH net-next 0/4] tcp: scale connect() under pressure Eric Dumazet
@ 2025-03-02 12:42 ` Eric Dumazet
2025-03-03 0:24 ` Jason Xing
2025-03-04 0:20 ` Kuniyuki Iwashima
2025-03-02 12:42 ` [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind() Eric Dumazet
` (3 subsequent siblings)
4 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-03-02 12:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
Cc: Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet,
Eric Dumazet
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
__inet_check_established() and/or __inet6_check_established().
This patch adds an RCU lookup to avoid the spinlock
acquisition when the 4-tuple is found in the hash table:
if the lockless walk finds a matching socket that is not in
TIME_WAIT state, we can return -EADDRNOTAVAIL without taking
the lock; otherwise we fall back to the locked lookup.
Note that there are still spin_lock_bh() calls in
__inet_hash_connect() to protect inet_bind_hashbucket;
this is addressed later in this series.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/ipv4/inet_hashtables.c | 19 ++++++++++++++++---
net/ipv6/inet6_hashtables.c | 19 ++++++++++++++++---
2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18275fb50fea8d77adc8a64fb12494..46d39aa2199ec3a405b50e8e85130e990d2c26b7 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -551,11 +551,24 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
unsigned int hash = inet_ehashfn(net, daddr, lport,
saddr, inet->inet_dport);
struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
- spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
- struct sock *sk2;
- const struct hlist_nulls_node *node;
struct inet_timewait_sock *tw = NULL;
+ const struct hlist_nulls_node *node;
+ struct sock *sk2;
+ spinlock_t *lock;
+
+ rcu_read_lock();
+ sk_nulls_for_each(sk2, node, &head->chain) {
+ if (sk2->sk_hash != hash ||
+ !inet_match(net, sk2, acookie, ports, dif, sdif))
+ continue;
+ if (sk2->sk_state == TCP_TIME_WAIT)
+ break;
+ rcu_read_unlock();
+ return -EADDRNOTAVAIL;
+ }
+ rcu_read_unlock();
+ lock = inet_ehash_lockp(hinfo, hash);
spin_lock(lock);
sk_nulls_for_each(sk2, node, &head->chain) {
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 9ec05e354baa69d14e88da37f5a9fce11e874e35..3604a5cae5d29a25d24f9513308334ff8e64b083 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -276,11 +276,24 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
const unsigned int hash = inet6_ehashfn(net, daddr, lport, saddr,
inet->inet_dport);
struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
- spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
- struct sock *sk2;
- const struct hlist_nulls_node *node;
struct inet_timewait_sock *tw = NULL;
+ const struct hlist_nulls_node *node;
+ struct sock *sk2;
+ spinlock_t *lock;
+
+ rcu_read_lock();
+ sk_nulls_for_each(sk2, node, &head->chain) {
+ if (sk2->sk_hash != hash ||
+ !inet6_match(net, sk2, saddr, daddr, ports, dif, sdif))
+ continue;
+ if (sk2->sk_state == TCP_TIME_WAIT)
+ break;
+ rcu_read_unlock();
+ return -EADDRNOTAVAIL;
+ }
+ rcu_read_unlock();
+ lock = inet_ehash_lockp(hinfo, hash);
spin_lock(lock);
sk_nulls_for_each(sk2, node, &head->chain) {
--
2.48.1.711.g2feabab25a-goog
* [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind()
2025-03-02 12:42 [PATCH net-next 0/4] tcp: scale connect() under pressure Eric Dumazet
2025-03-02 12:42 ` [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established() Eric Dumazet
@ 2025-03-02 12:42 ` Eric Dumazet
2025-03-03 0:24 ` Jason Xing
2025-03-04 0:22 ` Kuniyuki Iwashima
2025-03-02 12:42 ` [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket Eric Dumazet
` (2 subsequent siblings)
4 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-03-02 12:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
Cc: Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet,
Eric Dumazet
There is no reason to call ipv6_addr_type().
Instead, use the highly optimized ipv6_addr_any() and
ipv6_addr_v4mapped() helpers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/inet_connection_sock.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index bf9ce0c196575910b4b03fca13001979d4326297..b4e514da22b64f02cbd9f6c10698db359055e0cc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -157,12 +157,10 @@ static bool inet_use_bhash2_on_bind(const struct sock *sk)
{
#if IS_ENABLED(CONFIG_IPV6)
if (sk->sk_family == AF_INET6) {
- int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr);
-
- if (addr_type == IPV6_ADDR_ANY)
+ if (ipv6_addr_any(&sk->sk_v6_rcv_saddr))
return false;
- if (addr_type != IPV6_ADDR_MAPPED)
+ if (!ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
return true;
}
#endif
--
2.48.1.711.g2feabab25a-goog
* [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket
2025-03-02 12:42 [PATCH net-next 0/4] tcp: scale connect() under pressure Eric Dumazet
2025-03-02 12:42 ` [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established() Eric Dumazet
2025-03-02 12:42 ` [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind() Eric Dumazet
@ 2025-03-02 12:42 ` Eric Dumazet
2025-03-03 0:57 ` Jason Xing
2025-03-04 0:43 ` Kuniyuki Iwashima
2025-03-02 12:42 ` [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect() Eric Dumazet
2025-03-05 2:00 ` [PATCH net-next 0/4] tcp: scale connect() under pressure patchwork-bot+netdevbpf
4 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-03-02 12:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
Cc: Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet,
Eric Dumazet
Add RCU protection to the inet_bind_bucket structure:
- Add an rcu_head field to the structure definition.
- Use kfree_rcu() at destroy time, and remove the now unused first
argument of inet_bind_bucket_destroy().
- Use the hlist_del_rcu() and hlist_add_head_rcu() methods.
A generic sketch of the unlink-then-deferred-free pattern follows.
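As a reminder of why the rcu_head is needed (a generic kernel RCU
sketch, not this patch's exact code): an object unlinked from an
RCU-visible hlist may still be referenced by concurrent lockless
readers, so its memory must only be released after a grace period,
which kfree_rcu() arranges.

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	struct hlist_node node;
	struct rcu_head rcu;
	/* ... payload ... */
};

/* Caller holds the bucket lock; readers may walk the chain under RCU. */
static void foo_unlink_and_free(struct foo *f)
{
	hlist_del_rcu(&f->node);	/* new readers no longer find it... */
	kfree_rcu(f, rcu);		/* ...existing readers may still hold a
					 * pointer, so defer the kfree().
					 */
}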
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/inet_hashtables.h | 4 ++--
net/ipv4/inet_connection_sock.c | 2 +-
net/ipv4/inet_hashtables.c | 14 +++++++-------
net/ipv4/inet_timewait_sock.c | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5eea47f135a421ce8275d4cd83c5771b3f448e5c..73c0e4087fd1a6d0d2a40ab0394165e07b08ed6d 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -89,6 +89,7 @@ struct inet_bind_bucket {
bool fast_ipv6_only;
struct hlist_node node;
struct hlist_head bhash2;
+ struct rcu_head rcu;
};
struct inet_bind2_bucket {
@@ -226,8 +227,7 @@ struct inet_bind_bucket *
inet_bind_bucket_create(struct kmem_cache *cachep, struct net *net,
struct inet_bind_hashbucket *head,
const unsigned short snum, int l3mdev);
-void inet_bind_bucket_destroy(struct kmem_cache *cachep,
- struct inet_bind_bucket *tb);
+void inet_bind_bucket_destroy(struct inet_bind_bucket *tb);
bool inet_bind_bucket_match(const struct inet_bind_bucket *tb,
const struct net *net, unsigned short port,
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b4e514da22b64f02cbd9f6c10698db359055e0cc..e93c660340770a76446f97617ba23af32dc136fb 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -598,7 +598,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
if (bhash2_created)
inet_bind2_bucket_destroy(hinfo->bind2_bucket_cachep, tb2);
if (bhash_created)
- inet_bind_bucket_destroy(hinfo->bind_bucket_cachep, tb);
+ inet_bind_bucket_destroy(tb);
}
if (head2_lock_acquired)
spin_unlock(&head2->lock);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 46d39aa2199ec3a405b50e8e85130e990d2c26b7..b737e13f8459c53428980221355344327c4bc8dd 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -76,7 +76,7 @@ struct inet_bind_bucket *inet_bind_bucket_create(struct kmem_cache *cachep,
tb->fastreuse = 0;
tb->fastreuseport = 0;
INIT_HLIST_HEAD(&tb->bhash2);
- hlist_add_head(&tb->node, &head->chain);
+ hlist_add_head_rcu(&tb->node, &head->chain);
}
return tb;
}
@@ -84,11 +84,11 @@ struct inet_bind_bucket *inet_bind_bucket_create(struct kmem_cache *cachep,
/*
* Caller must hold hashbucket lock for this tb with local BH disabled
*/
-void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket *tb)
+void inet_bind_bucket_destroy(struct inet_bind_bucket *tb)
{
if (hlist_empty(&tb->bhash2)) {
- __hlist_del(&tb->node);
- kmem_cache_free(cachep, tb);
+ hlist_del_rcu(&tb->node);
+ kfree_rcu(tb, rcu);
}
}
@@ -201,7 +201,7 @@ static void __inet_put_port(struct sock *sk)
}
spin_unlock(&head2->lock);
- inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
+ inet_bind_bucket_destroy(tb);
spin_unlock(&head->lock);
}
@@ -285,7 +285,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
error:
if (created_inet_bind_bucket)
- inet_bind_bucket_destroy(table->bind_bucket_cachep, tb);
+ inet_bind_bucket_destroy(tb);
spin_unlock(&head2->lock);
spin_unlock(&head->lock);
return -ENOMEM;
@@ -1162,7 +1162,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
spin_unlock(&head2->lock);
if (tb_created)
- inet_bind_bucket_destroy(hinfo->bind_bucket_cachep, tb);
+ inet_bind_bucket_destroy(tb);
spin_unlock(&head->lock);
if (tw)
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 337390ba85b4082701f78f1a0913ba47c1741378..aded4bf1bc16d9f1d9fd80d60f41027dd53f38eb 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -39,7 +39,7 @@ void inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
tw->tw_tb = NULL;
tw->tw_tb2 = NULL;
inet_bind2_bucket_destroy(hashinfo->bind2_bucket_cachep, tb2);
- inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
+ inet_bind_bucket_destroy(tb);
__sock_put((struct sock *)tw);
}
--
2.48.1.711.g2feabab25a-goog
* [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
2025-03-02 12:42 [PATCH net-next 0/4] tcp: scale connect() under pressure Eric Dumazet
` (2 preceding siblings ...)
2025-03-02 12:42 ` [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket Eric Dumazet
@ 2025-03-02 12:42 ` Eric Dumazet
2025-03-03 1:07 ` Jason Xing
` (2 more replies)
2025-03-05 2:00 ` [PATCH net-next 0/4] tcp: scale connect() under pressure patchwork-bot+netdevbpf
4 siblings, 3 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-03-02 12:42 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
Cc: Kuniyuki Iwashima, Jason Xing, Simon Horman, netdev, eric.dumazet,
Eric Dumazet
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
the many spin_lock_bh(&head->lock) calls performed in its loop.
This patch adds an RCU lookup to avoid the spinlock cost.
check_established() gets a new @rcu_lookup argument, for two
reasons: first, no state must be changed while head->lock is
not held; second, the RCU lookup must not be repeated once
the spinlock has been acquired. A simplified sketch of the new
RCU pass is shown below.
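For readability, here is a condensed restatement of the RCU pass added
to the port-search loop (a sketch only: the rcu_read_unlock() calls
before the gotos are elided, see the actual diff below):

	rcu_read_lock();
	hlist_for_each_entry_rcu(tb, &head->chain, node) {
		if (!inet_bind_bucket_match(tb, net, port, l3mdev))
			continue;
		/* Bucket found: decide without taking head->lock. */
		if (tb->fastreuse >= 0 || tb->fastreuseport >= 0)
			goto next_port;	/* port has bind() users, skip it */
		if (check_established(death_row, sk, port, &tw, true))
			goto next_port;	/* 4-tuple(s) already in use */
		break;			/* looks usable: take head->lock */
	}
	rcu_read_unlock();
	/* Second pass, as before: re-check and commit under head->lock. */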
Tested:
Server:
ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
Client:
ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
Before series:
utime_start=0.288582
utime_end=1.548707
stime_start=20.637138
stime_end=2002.489845
num_transactions=484453
latency_min=0.156279245
latency_max=20.922042756
latency_mean=1.546521274
latency_stddev=3.936005194
num_samples=312537
throughput=47426.00
perf top on the client:
49.54% [kernel] [k] _raw_spin_lock
25.87% [kernel] [k] _raw_spin_lock_bh
5.97% [kernel] [k] queued_spin_lock_slowpath
5.67% [kernel] [k] __inet_hash_connect
3.53% [kernel] [k] __inet6_check_established
3.48% [kernel] [k] inet6_ehashfn
0.64% [kernel] [k] rcu_all_qs
After this series:
utime_start=0.271607
utime_end=3.847111
stime_start=18.407684
stime_end=1997.485557
num_transactions=1350742
latency_min=0.014131929
latency_max=17.895073144
latency_mean=0.505675853 # Nice reduction of latency metrics
latency_stddev=2.125164772
num_samples=307884
throughput=139866.80 # 194 % increase
perf top on client:
56.86% [kernel] [k] __inet6_check_established
17.96% [kernel] [k] __inet_hash_connect
13.88% [kernel] [k] inet6_ehashfn
2.52% [kernel] [k] rcu_all_qs
2.01% [kernel] [k] __cond_resched
0.41% [kernel] [k] _raw_spin_lock
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/inet_hashtables.h | 3 +-
net/ipv4/inet_hashtables.c | 52 +++++++++++++++++++++++------------
net/ipv6/inet6_hashtables.c | 24 ++++++++--------
3 files changed, 50 insertions(+), 29 deletions(-)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 73c0e4087fd1a6d0d2a40ab0394165e07b08ed6d..b12797f13c9a3d66fab99c877d059f9c29c30d11 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -529,7 +529,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk, u64 port_offset,
int (*check_established)(struct inet_timewait_death_row *,
struct sock *, __u16,
- struct inet_timewait_sock **));
+ struct inet_timewait_sock **,
+ bool rcu_lookup));
int inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index b737e13f8459c53428980221355344327c4bc8dd..d1b5f45ee718410fdf3e78c113c7ebd4a1ddba40 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -537,7 +537,8 @@ EXPORT_SYMBOL_GPL(__inet_lookup_established);
/* called with local bh disabled */
static int __inet_check_established(struct inet_timewait_death_row *death_row,
struct sock *sk, __u16 lport,
- struct inet_timewait_sock **twp)
+ struct inet_timewait_sock **twp,
+ bool rcu_lookup)
{
struct inet_hashinfo *hinfo = death_row->hashinfo;
struct inet_sock *inet = inet_sk(sk);
@@ -556,17 +557,17 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
struct sock *sk2;
spinlock_t *lock;
- rcu_read_lock();
- sk_nulls_for_each(sk2, node, &head->chain) {
- if (sk2->sk_hash != hash ||
- !inet_match(net, sk2, acookie, ports, dif, sdif))
- continue;
- if (sk2->sk_state == TCP_TIME_WAIT)
- break;
- rcu_read_unlock();
- return -EADDRNOTAVAIL;
+ if (rcu_lookup) {
+ sk_nulls_for_each(sk2, node, &head->chain) {
+ if (sk2->sk_hash != hash ||
+ !inet_match(net, sk2, acookie, ports, dif, sdif))
+ continue;
+ if (sk2->sk_state == TCP_TIME_WAIT)
+ break;
+ return -EADDRNOTAVAIL;
+ }
+ return 0;
}
- rcu_read_unlock();
lock = inet_ehash_lockp(hinfo, hash);
spin_lock(lock);
@@ -1007,7 +1008,8 @@ static u32 *table_perturb;
int __inet_hash_connect(struct inet_timewait_death_row *death_row,
struct sock *sk, u64 port_offset,
int (*check_established)(struct inet_timewait_death_row *,
- struct sock *, __u16, struct inet_timewait_sock **))
+ struct sock *, __u16, struct inet_timewait_sock **,
+ bool rcu_lookup))
{
struct inet_hashinfo *hinfo = death_row->hashinfo;
struct inet_bind_hashbucket *head, *head2;
@@ -1025,7 +1027,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
if (port) {
local_bh_disable();
- ret = check_established(death_row, sk, port, NULL);
+ ret = check_established(death_row, sk, port, NULL, false);
local_bh_enable();
return ret;
}
@@ -1061,6 +1063,21 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
continue;
head = &hinfo->bhash[inet_bhashfn(net, port,
hinfo->bhash_size)];
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(tb, &head->chain, node) {
+ if (!inet_bind_bucket_match(tb, net, port, l3mdev))
+ continue;
+ if (tb->fastreuse >= 0 || tb->fastreuseport >= 0) {
+ rcu_read_unlock();
+ goto next_port;
+ }
+ if (!check_established(death_row, sk, port, &tw, true))
+ break;
+ rcu_read_unlock();
+ goto next_port;
+ }
+ rcu_read_unlock();
+
spin_lock_bh(&head->lock);
/* Does not bother with rcv_saddr checks, because
@@ -1070,12 +1087,12 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
if (inet_bind_bucket_match(tb, net, port, l3mdev)) {
if (tb->fastreuse >= 0 ||
tb->fastreuseport >= 0)
- goto next_port;
+ goto next_port_unlock;
WARN_ON(hlist_empty(&tb->bhash2));
if (!check_established(death_row, sk,
- port, &tw))
+ port, &tw, false))
goto ok;
- goto next_port;
+ goto next_port_unlock;
}
}
@@ -1089,8 +1106,9 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
tb->fastreuse = -1;
tb->fastreuseport = -1;
goto ok;
-next_port:
+next_port_unlock:
spin_unlock_bh(&head->lock);
+next_port:
cond_resched();
}
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 3604a5cae5d29a25d24f9513308334ff8e64b083..9be315496459fcb391123a07ac887e2f59d27360 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -263,7 +263,8 @@ EXPORT_SYMBOL_GPL(inet6_lookup);
static int __inet6_check_established(struct inet_timewait_death_row *death_row,
struct sock *sk, const __u16 lport,
- struct inet_timewait_sock **twp)
+ struct inet_timewait_sock **twp,
+ bool rcu_lookup)
{
struct inet_hashinfo *hinfo = death_row->hashinfo;
struct inet_sock *inet = inet_sk(sk);
@@ -281,17 +282,18 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
struct sock *sk2;
spinlock_t *lock;
- rcu_read_lock();
- sk_nulls_for_each(sk2, node, &head->chain) {
- if (sk2->sk_hash != hash ||
- !inet6_match(net, sk2, saddr, daddr, ports, dif, sdif))
- continue;
- if (sk2->sk_state == TCP_TIME_WAIT)
- break;
- rcu_read_unlock();
- return -EADDRNOTAVAIL;
+ if (rcu_lookup) {
+ sk_nulls_for_each(sk2, node, &head->chain) {
+ if (sk2->sk_hash != hash ||
+ !inet6_match(net, sk2, saddr, daddr,
+ ports, dif, sdif))
+ continue;
+ if (sk2->sk_state == TCP_TIME_WAIT)
+ break;
+ return -EADDRNOTAVAIL;
+ }
+ return 0;
}
- rcu_read_unlock();
lock = inet_ehash_lockp(hinfo, hash);
spin_lock(lock);
--
2.48.1.711.g2feabab25a-goog
* Re: [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established()
2025-03-02 12:42 ` [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established() Eric Dumazet
@ 2025-03-03 0:24 ` Jason Xing
2025-03-04 0:20 ` Kuniyuki Iwashima
1 sibling, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-03-03 0:24 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Simon Horman, netdev, eric.dumazet
On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition when the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed later in this series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Yesterday, I did a few tests on this single patch and managed to see a
~7% increase in performance on my virtual machine[1] :) Thanks!
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
[1]: https://lore.kernel.org/all/CAL+tcoBAVmTk_JBX=OEBqZZuoSzZd8bjuw9rgwRLMd9fvZOSkA@mail.gmail.com/
* Re: [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind()
2025-03-02 12:42 ` [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind() Eric Dumazet
@ 2025-03-03 0:24 ` Jason Xing
2025-03-04 0:22 ` Kuniyuki Iwashima
1 sibling, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-03-03 0:24 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Simon Horman, netdev, eric.dumazet
On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> There is no reason to call ipv6_addr_type().
>
> Instead, use highly optimized ipv6_addr_any() and ipv6_addr_v4mapped().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
* Re: [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket
2025-03-02 12:42 ` [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket Eric Dumazet
@ 2025-03-03 0:57 ` Jason Xing
2025-03-04 0:43 ` Kuniyuki Iwashima
1 sibling, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-03-03 0:57 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Simon Horman, netdev, eric.dumazet
On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Add RCU protection to inet_bind_bucket structure.
>
> - Add rcu_head field to the structure definition.
>
> - Use kfree_rcu() at destroy time, and remove inet_bind_bucket_destroy()
> first argument.
>
> - Use hlist_del_rcu() and hlist_add_head_rcu() methods.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Thanks!
* Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
2025-03-02 12:42 ` [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect() Eric Dumazet
@ 2025-03-03 1:07 ` Jason Xing
2025-03-03 10:25 ` Eric Dumazet
2025-03-04 0:51 ` Kuniyuki Iwashima
2025-03-10 14:03 ` kernel test robot
2 siblings, 1 reply; 17+ messages in thread
From: Jason Xing @ 2025-03-03 1:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Simon Horman, netdev, eric.dumazet
On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> the many spin_lock_bh(&head->lock) performed in its loop.
>
> This patch adds an RCU lookup to avoid the spinlock cost.
>
> check_established() gets a new @rcu_lookup argument.
> First reason is to not make any changes while head->lock
> is not held.
> Second reason is to not make this RCU lookup a second time
> after the spinlock has been acquired.
>
> Tested:
>
> Server:
>
> ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
>
> Client:
>
> ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before series:
>
> utime_start=0.288582
> utime_end=1.548707
> stime_start=20.637138
> stime_end=2002.489845
> num_transactions=484453
> latency_min=0.156279245
> latency_max=20.922042756
> latency_mean=1.546521274
> latency_stddev=3.936005194
> num_samples=312537
> throughput=47426.00
>
> perf top on the client:
>
> 49.54% [kernel] [k] _raw_spin_lock
> 25.87% [kernel] [k] _raw_spin_lock_bh
> 5.97% [kernel] [k] queued_spin_lock_slowpath
> 5.67% [kernel] [k] __inet_hash_connect
> 3.53% [kernel] [k] __inet6_check_established
> 3.48% [kernel] [k] inet6_ehashfn
> 0.64% [kernel] [k] rcu_all_qs
>
> After this series:
>
> utime_start=0.271607
> utime_end=3.847111
> stime_start=18.407684
> stime_end=1997.485557
> num_transactions=1350742
> latency_min=0.014131929
> latency_max=17.895073144
> latency_mean=0.505675853 # Nice reduction of latency metrics
> latency_stddev=2.125164772
> num_samples=307884
> throughput=139866.80 # 190 % increase
>
> perf top on client:
>
> 56.86% [kernel] [k] __inet6_check_established
> 17.96% [kernel] [k] __inet_hash_connect
> 13.88% [kernel] [k] inet6_ehashfn
> 2.52% [kernel] [k] rcu_all_qs
> 2.01% [kernel] [k] __cond_resched
> 0.41% [kernel] [k] _raw_spin_lock
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
I tested only on my virtual machine (with 64 cpus) and got an around
100% performance increase which is really good. And I also noticed
that the spin lock hotspot has gone :)
Thanks for working on this!!!
* Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
2025-03-03 1:07 ` Jason Xing
@ 2025-03-03 10:25 ` Eric Dumazet
2025-03-03 10:39 ` Jason Xing
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2025-03-03 10:25 UTC (permalink / raw)
To: Jason Xing
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Simon Horman, netdev, eric.dumazet
On Mon, Mar 3, 2025 at 2:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > When __inet_hash_connect() has to try many 4-tuples before
> > finding an available one, we see a high spinlock cost from
> > the many spin_lock_bh(&head->lock) performed in its loop.
> >
> > This patch adds an RCU lookup to avoid the spinlock cost.
> >
> > check_established() gets a new @rcu_lookup argument.
> > First reason is to not make any changes while head->lock
> > is not held.
> > Second reason is to not make this RCU lookup a second time
> > after the spinlock has been acquired.
> >
> > Tested:
> >
> > Server:
> >
> > ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> >
> > Client:
> >
> > ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
> >
> > Before series:
> >
> > utime_start=0.288582
> > utime_end=1.548707
> > stime_start=20.637138
> > stime_end=2002.489845
> > num_transactions=484453
> > latency_min=0.156279245
> > latency_max=20.922042756
> > latency_mean=1.546521274
> > latency_stddev=3.936005194
> > num_samples=312537
> > throughput=47426.00
> >
> > perf top on the client:
> >
> > 49.54% [kernel] [k] _raw_spin_lock
> > 25.87% [kernel] [k] _raw_spin_lock_bh
> > 5.97% [kernel] [k] queued_spin_lock_slowpath
> > 5.67% [kernel] [k] __inet_hash_connect
> > 3.53% [kernel] [k] __inet6_check_established
> > 3.48% [kernel] [k] inet6_ehashfn
> > 0.64% [kernel] [k] rcu_all_qs
> >
> > After this series:
> >
> > utime_start=0.271607
> > utime_end=3.847111
> > stime_start=18.407684
> > stime_end=1997.485557
> > num_transactions=1350742
> > latency_min=0.014131929
> > latency_max=17.895073144
> > latency_mean=0.505675853 # Nice reduction of latency metrics
> > latency_stddev=2.125164772
> > num_samples=307884
> > throughput=139866.80 # 190 % increase
> >
> > perf top on client:
> >
> > 56.86% [kernel] [k] __inet6_check_established
> > 17.96% [kernel] [k] __inet_hash_connect
> > 13.88% [kernel] [k] inet6_ehashfn
> > 2.52% [kernel] [k] rcu_all_qs
> > 2.01% [kernel] [k] __cond_resched
> > 0.41% [kernel] [k] _raw_spin_lock
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> Tested-by: Jason Xing <kerneljasonxing@gmail.com>
>
> I tested only on my virtual machine (with 64 cpus) and got an around
> 100% performance increase which is really good. And I also noticed
> that the spin lock hotspot has gone :)
>
> Thanks for working on this!!!
Hold your breath, I have two additional patches bringing the perf to :
local_throughput=353891 # 646 % improvement
I will wait for this first series to be merged before sending these.
* Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
2025-03-03 10:25 ` Eric Dumazet
@ 2025-03-03 10:39 ` Jason Xing
0 siblings, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-03-03 10:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
Kuniyuki Iwashima, Simon Horman, netdev, eric.dumazet
On Mon, Mar 3, 2025 at 6:25 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Mar 3, 2025 at 2:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > When __inet_hash_connect() has to try many 4-tuples before
> > > finding an available one, we see a high spinlock cost from
> > > the many spin_lock_bh(&head->lock) performed in its loop.
> > >
> > > This patch adds an RCU lookup to avoid the spinlock cost.
> > >
> > > check_established() gets a new @rcu_lookup argument.
> > > First reason is to not make any changes while head->lock
> > > is not held.
> > > Second reason is to not make this RCU lookup a second time
> > > after the spinlock has been acquired.
> > >
> > > Tested:
> > >
> > > Server:
> > >
> > > ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> > >
> > > Client:
> > >
> > > ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
> > >
> > > Before series:
> > >
> > > utime_start=0.288582
> > > utime_end=1.548707
> > > stime_start=20.637138
> > > stime_end=2002.489845
> > > num_transactions=484453
> > > latency_min=0.156279245
> > > latency_max=20.922042756
> > > latency_mean=1.546521274
> > > latency_stddev=3.936005194
> > > num_samples=312537
> > > throughput=47426.00
> > >
> > > perf top on the client:
> > >
> > > 49.54% [kernel] [k] _raw_spin_lock
> > > 25.87% [kernel] [k] _raw_spin_lock_bh
> > > 5.97% [kernel] [k] queued_spin_lock_slowpath
> > > 5.67% [kernel] [k] __inet_hash_connect
> > > 3.53% [kernel] [k] __inet6_check_established
> > > 3.48% [kernel] [k] inet6_ehashfn
> > > 0.64% [kernel] [k] rcu_all_qs
> > >
> > > After this series:
> > >
> > > utime_start=0.271607
> > > utime_end=3.847111
> > > stime_start=18.407684
> > > stime_end=1997.485557
> > > num_transactions=1350742
> > > latency_min=0.014131929
> > > latency_max=17.895073144
> > > latency_mean=0.505675853 # Nice reduction of latency metrics
> > > latency_stddev=2.125164772
> > > num_samples=307884
> > > throughput=139866.80 # 190 % increase
> > >
> > > perf top on client:
> > >
> > > 56.86% [kernel] [k] __inet6_check_established
> > > 17.96% [kernel] [k] __inet_hash_connect
> > > 13.88% [kernel] [k] inet6_ehashfn
> > > 2.52% [kernel] [k] rcu_all_qs
> > > 2.01% [kernel] [k] __cond_resched
> > > 0.41% [kernel] [k] _raw_spin_lock
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> >
> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> > Tested-by: Jason Xing <kerneljasonxing@gmail.com>
> >
> > I tested only on my virtual machine (with 64 cpus) and got an around
> > 100% performance increase which is really good. And I also noticed
> > that the spin lock hotspot has gone :)
> >
> > Thanks for working on this!!!
>
> Hold your breath, I have two additional patches bringing the perf to :
>
> local_throughput=353891 # 646 % improvement
>
> I will wait for this first series to be merged before sending these.
OMG, I'm really shocked... It would be super cool :D
Thanks,
Jason
* Re: [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established()
2025-03-02 12:42 ` [PATCH net-next 1/4] tcp: use RCU in __inet{6}_check_established() Eric Dumazet
2025-03-03 0:24 ` Jason Xing
@ 2025-03-04 0:20 ` Kuniyuki Iwashima
1 sibling, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-04 0:20 UTC (permalink / raw)
To: edumazet
Cc: davem, eric.dumazet, horms, kerneljasonxing, kuba, kuniyu,
ncardwell, netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 2 Mar 2025 12:42:34 +0000
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition when the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed later in this series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
* Re: [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind()
2025-03-02 12:42 ` [PATCH net-next 2/4] tcp: optimize inet_use_bhash2_on_bind() Eric Dumazet
2025-03-03 0:24 ` Jason Xing
@ 2025-03-04 0:22 ` Kuniyuki Iwashima
1 sibling, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-04 0:22 UTC (permalink / raw)
To: edumazet
Cc: davem, eric.dumazet, horms, kerneljasonxing, kuba, kuniyu,
ncardwell, netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 2 Mar 2025 12:42:35 +0000
> There is no reason to call ipv6_addr_type().
>
> Instead, use highly optimized ipv6_addr_any() and ipv6_addr_v4mapped().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
* Re: [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket
2025-03-02 12:42 ` [PATCH net-next 3/4] tcp: add RCU management to inet_bind_bucket Eric Dumazet
2025-03-03 0:57 ` Jason Xing
@ 2025-03-04 0:43 ` Kuniyuki Iwashima
1 sibling, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-04 0:43 UTC (permalink / raw)
To: edumazet
Cc: davem, eric.dumazet, horms, kerneljasonxing, kuba, kuniyu,
ncardwell, netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 2 Mar 2025 12:42:36 +0000
> Add RCU protection to inet_bind_bucket structure.
>
> - Add rcu_head field to the structure definition.
>
> - Use kfree_rcu() at destroy time, and remove inet_bind_bucket_destroy()
> first argument.
>
> - Use hlist_del_rcu() and hlist_add_head_rcu() methods.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
* Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
2025-03-02 12:42 ` [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect() Eric Dumazet
2025-03-03 1:07 ` Jason Xing
@ 2025-03-04 0:51 ` Kuniyuki Iwashima
2025-03-10 14:03 ` kernel test robot
2 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-04 0:51 UTC (permalink / raw)
To: edumazet
Cc: davem, eric.dumazet, horms, kerneljasonxing, kuba, kuniyu,
ncardwell, netdev, pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Sun, 2 Mar 2025 12:42:37 +0000
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> the many spin_lock_bh(&head->lock) performed in its loop.
>
> This patch adds an RCU lookup to avoid the spinlock cost.
>
> check_established() gets a new @rcu_lookup argument.
> First reason is to not make any changes while head->lock
> is not held.
> Second reason is to not make this RCU lookup a second time
> after the spinlock has been acquired.
>
> Tested:
>
> Server:
>
> ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
>
> Client:
>
> ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before series:
>
> utime_start=0.288582
> utime_end=1.548707
> stime_start=20.637138
> stime_end=2002.489845
> num_transactions=484453
> latency_min=0.156279245
> latency_max=20.922042756
> latency_mean=1.546521274
> latency_stddev=3.936005194
> num_samples=312537
> throughput=47426.00
>
> perf top on the client:
>
> 49.54% [kernel] [k] _raw_spin_lock
> 25.87% [kernel] [k] _raw_spin_lock_bh
> 5.97% [kernel] [k] queued_spin_lock_slowpath
> 5.67% [kernel] [k] __inet_hash_connect
> 3.53% [kernel] [k] __inet6_check_established
> 3.48% [kernel] [k] inet6_ehashfn
> 0.64% [kernel] [k] rcu_all_qs
>
> After this series:
>
> utime_start=0.271607
> utime_end=3.847111
> stime_start=18.407684
> stime_end=1997.485557
> num_transactions=1350742
> latency_min=0.014131929
> latency_max=17.895073144
> latency_mean=0.505675853 # Nice reduction of latency metrics
> latency_stddev=2.125164772
> num_samples=307884
> throughput=139866.80 # 190 % increase
>
> perf top on client:
>
> 56.86% [kernel] [k] __inet6_check_established
> 17.96% [kernel] [k] __inet_hash_connect
> 13.88% [kernel] [k] inet6_ehashfn
> 2.52% [kernel] [k] rcu_all_qs
> 2.01% [kernel] [k] __cond_resched
> 0.41% [kernel] [k] _raw_spin_lock
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Thanks for the great optimisation!
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
* Re: [PATCH net-next 0/4] tcp: scale connect() under pressure
2025-03-02 12:42 [PATCH net-next 0/4] tcp: scale connect() under pressure Eric Dumazet
` (3 preceding siblings ...)
2025-03-02 12:42 ` [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect() Eric Dumazet
@ 2025-03-05 2:00 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-03-05 2:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, ncardwell, kuniyu, kerneljasonxing, horms,
netdev, eric.dumazet
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sun, 2 Mar 2025 12:42:33 +0000 you wrote:
> Adoption of bhash2 in linux-6.1 made some operations almost twice
> more expensive, because of additional locks.
>
> This series adds RCU in __inet_hash_connect() to help the
> case where many attempts need to be made before finding
> an available 4-tuple.
>
> [...]
Here is the summary with links:
- [net-next,1/4] tcp: use RCU in __inet{6}_check_established()
https://git.kernel.org/netdev/net-next/c/ae9d5b19b322
- [net-next,2/4] tcp: optimize inet_use_bhash2_on_bind()
https://git.kernel.org/netdev/net-next/c/ca79d80b0b9f
- [net-next,3/4] tcp: add RCU management to inet_bind_bucket
https://git.kernel.org/netdev/net-next/c/d186f405fdf4
- [net-next,4/4] tcp: use RCU lookup in __inet_hash_connect()
https://git.kernel.org/netdev/net-next/c/86c2bc293b81
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
2025-03-02 12:42 ` [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect() Eric Dumazet
2025-03-03 1:07 ` Jason Xing
2025-03-04 0:51 ` Kuniyuki Iwashima
@ 2025-03-10 14:03 ` kernel test robot
2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-03-10 14:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: oe-lkp, lkp, netdev, David S . Miller, Jakub Kicinski,
Paolo Abeni, Neal Cardwell, Kuniyuki Iwashima, Jason Xing,
Simon Horman, eric.dumazet, Eric Dumazet, oliver.sang
Hello,
kernel test robot noticed a 6.9% improvement of stress-ng.sockmany.ops_per_sec on:
commit: ba6c94b99d772f431fd589dd2cd606b59063557b ("[PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/tcp-use-RCU-in-__inet-6-_check_established/20250302-204711
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git f77f12010f67259bd0e1ad18877ed27c721b627a
patch link: https://lore.kernel.org/all/20250302124237.3913746-5-edumazet@google.com/
patch subject: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: sockmany
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250310/202503102159.5f78c207-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s
commit:
4f97f75a5b ("tcp: add RCU management to inet_bind_bucket")
ba6c94b99d ("tcp: use RCU lookup in __inet_hash_connect()")
4f97f75a5bfa79ba ba6c94b99d772f431fd589dd2cd
---------------- ---------------------------
%stddev %change %stddev
\ | \
1742139 ± 89% -91.6% 146373 ± 56% numa-meminfo.node1.Unevictable
0.61 ± 3% +0.1 0.71 ± 3% mpstat.cpu.all.irq%
0.42 +0.0 0.46 ± 2% mpstat.cpu.all.usr%
435534 ± 89% -91.6% 36593 ± 56% numa-vmstat.node1.nr_unevictable
435534 ± 89% -91.6% 36593 ± 56% numa-vmstat.node1.nr_zone_unevictable
4057584 +7.0% 4340521 stress-ng.sockmany.ops
67264 +6.9% 71933 stress-ng.sockmany.ops_per_sec
604900 +12.3% 679404 ± 4% perf-c2c.DRAM.local
42998 ± 2% -55.7% 19034 ± 3% perf-c2c.HITM.local
13764 ± 4% -95.2% 663.67 ± 13% perf-c2c.HITM.remote
56762 ± 2% -65.3% 19698 ± 4% perf-c2c.HITM.total
7422009 +13.2% 8403980 ± 2% sched_debug.cfs_rq:/.avg_vruntime.max
195564 ± 5% +62.7% 318178 ± 10% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.23 ± 7% +25.4% 0.29 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
39935 ± 4% +27.0% 50726 ± 29% sched_debug.cfs_rq:/.load_avg.max
7422009 +13.2% 8403980 ± 2% sched_debug.cfs_rq:/.min_vruntime.max
195564 ± 5% +62.7% 318178 ± 10% sched_debug.cfs_rq:/.min_vruntime.stddev
0.23 ± 6% +26.6% 0.29 ± 4% sched_debug.cpu.nr_running.stddev
387640 +5.9% 410501 ± 9% proc-vmstat.nr_active_anon
109911 ± 2% +8.5% 119206 ± 2% proc-vmstat.nr_mapped
200627 +1.9% 204454 proc-vmstat.nr_shmem
895041 +4.9% 939289 proc-vmstat.nr_slab_reclaimable
2982921 +5.0% 3131084 proc-vmstat.nr_slab_unreclaimable
387640 +5.9% 410501 ± 9% proc-vmstat.nr_zone_active_anon
2071760 +2.0% 2112591 proc-vmstat.numa_hit
1839824 +2.2% 1880606 proc-vmstat.numa_local
5905025 +5.2% 6210697 proc-vmstat.pgalloc_normal
5291411 ± 12% +11.9% 5921072 proc-vmstat.pgfree
0.82 ± 13% -29.0% 0.58 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
4.50 ± 16% +29.5% 5.83 ± 15% perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.03 ± 56% -88.8% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
0.07 ±125% +3754.0% 2.67 ± 71% perf-sched.sch_delay.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
19.83 -22.3% 15.41 perf-sched.total_wait_and_delay.average.ms
177991 +32.7% 236147 perf-sched.total_wait_and_delay.count.ms
19.76 -22.3% 15.35 perf-sched.total_wait_time.average.ms
1.64 ± 12% -28.9% 1.17 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
13.69 -26.2% 10.10 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
6844 +11.8% 7651 ± 3% perf-sched.wait_and_delay.count.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
78701 +33.6% 105168 perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
81026 +35.2% 109539 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
2268 ± 14% +90.6% 4325 ± 6% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.82 ± 12% -28.6% 0.59 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
13.49 -26.5% 9.91 perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
3.05 ± 3% +16.5% 3.55 ± 3% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
30.10 ± 20% -64.4% 10.72 ±113% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
1.14 ± 9% +22.2% 1.40 ± 7% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
13.67 -26.3% 10.08 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
7.36 ± 57% +103.9% 15.01 ± 27% perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.03 ± 56% -88.8% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
0.07 ±125% +4e+05% 275.31 ±115% perf-sched.wait_time.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
35.70 +15.3% 41.18 perf-stat.i.MPKI
1.368e+10 +4.6% 1.431e+10 perf-stat.i.branch-instructions
2.15 +0.1 2.27 perf-stat.i.branch-miss-rate%
2.884e+08 +10.7% 3.192e+08 perf-stat.i.branch-misses
71.62 +5.5 77.09 perf-stat.i.cache-miss-rate%
2.377e+09 +26.3% 3.003e+09 perf-stat.i.cache-misses
3.264e+09 +17.4% 3.832e+09 perf-stat.i.cache-references
9.40 -8.1% 8.64 perf-stat.i.cpi
292.27 -18.0% 239.70 perf-stat.i.cycles-between-cache-misses
6.963e+10 +9.8% 7.645e+10 perf-stat.i.instructions
0.12 ± 2% +7.3% 0.13 perf-stat.i.ipc
34.12 +15.0% 39.25 perf-stat.overall.MPKI
2.11 +0.1 2.23 perf-stat.overall.branch-miss-rate%
72.81 +5.5 78.36 perf-stat.overall.cache-miss-rate%
9.07 -8.4% 8.31 perf-stat.overall.cpi
265.92 -20.4% 211.72 perf-stat.overall.cycles-between-cache-misses
0.11 +9.2% 0.12 perf-stat.overall.ipc
1.345e+10 +4.6% 1.408e+10 perf-stat.ps.branch-instructions
2.835e+08 +10.7% 3.139e+08 perf-stat.ps.branch-misses
2.337e+09 +26.3% 2.952e+09 perf-stat.ps.cache-misses
3.209e+09 +17.4% 3.768e+09 perf-stat.ps.cache-references
6.849e+10 +9.8% 7.521e+10 perf-stat.ps.instructions
4.236e+12 +9.1% 4.621e+12 perf-stat.total.instructions
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki