netdev.vger.kernel.org archive mirror
* [PATCH v4 net-next 0/3] udp: Add 4-tuple hash for connected sockets
@ 2024-10-12  1:29 Philo Lu
  2024-10-12  1:29 ` [PATCH v4 net-next 1/3] net/udp: Add a new struct for hash2 slot Philo Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Philo Lu @ 2024-10-12  1:29 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

This patchset introduces a 4-tuple hash for connected UDP sockets, to
make connected UDP lookup faster.

Patch 1: Add a new counter, hash4_cnt, for hslot2 to avoid a cache line
         miss during lookup.
Patches 2 and 3: Implement the 4-tuple hash for IPv4.
(IPv6 support is in progress.)

The detailed motivation is described in Patch 3.

AFAICS the patchset can be further improved by:
(a) Better interaction with hash2/reuseport. Currently hash4 hardly
affects the other mechanisms, but maintaining sockets in both the hash4
and hash2 lists seems unnecessary.
(b) Supporting early demux and IPv6.

Changelog:
v3 -> v4:
- fix mistakes in udp_pernet_table_alloc() (Willem de Bruijn)

RFCv2 -> v3 (Gur Stavi):
- minor fix in udp_hashslot2() and udp_table_init()
- add rcu sync in rehash4()

RFCv1 -> RFCv2:
- add a new struct for hslot2
- remove the sockopt UDP_HASH4 because it has little side effect for
  unconnected sockets
- add rehash in connect()
- re-organize the patch into 3 smaller ones
- other minor fixes

v3:
https://lore.kernel.org/all/20241010090351.79698-1-lulie@linux.alibaba.com/
RFCv2:
https://lore.kernel.org/all/20240924110414.52618-1-lulie@linux.alibaba.com/
RFCv1:
https://lore.kernel.org/all/20240913100941.8565-1-lulie@linux.alibaba.com/

Philo Lu (3):
  net/udp: Add a new struct for hash2 slot
  net/udp: Add 4-tuple hash list basis
  ipv4/udp: Add 4-tuple hash for connected socket

 include/linux/udp.h |   7 ++
 include/net/udp.h   |  44 ++++++++--
 net/ipv4/udp.c      | 197 ++++++++++++++++++++++++++++++++++++++------
 net/ipv6/udp.c      |  17 ++--
 4 files changed, 225 insertions(+), 40 deletions(-)

--
2.32.0.3.g01195cf9f



* [PATCH v4 net-next 1/3] net/udp: Add a new struct for hash2 slot
  2024-10-12  1:29 [PATCH v4 net-next 0/3] udp: Add 4-tuple hash for connected sockets Philo Lu
@ 2024-10-12  1:29 ` Philo Lu
  2024-10-12  1:29 ` [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis Philo Lu
  2024-10-12  1:29 ` [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket Philo Lu
  2 siblings, 0 replies; 11+ messages in thread
From: Philo Lu @ 2024-10-12  1:29 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

Prepare for the UDP 4-tuple hash (uhash4 for short).

To implement uhash4 without extra cache line misses during lookup, hslot2
is used to record the number of sockets hashed in hslot4. Thus add a new
struct, udp_hslot_main, with a hash4_cnt field; it is used for the hash2
table. The new struct is introduced to avoid doubling the size of
udp_hslot.

Before the uhash4 lookup, first check hash4_cnt to see whether there are
any hashed sockets in hslot4. Because hslot2 is always accessed during
lookup anyway, this check adds no cache line miss.

Related helpers are updated, and they are used wherever possible.

uhash4 itself is implemented in the following patches.
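
For illustration, the lookup fast path that patch 3 builds on top of this
is roughly (simplified from that patch):

	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
	hslot2 = udp_hashslot2(udptable, hash2);

	/* hash4_cnt is stored alongside hslot2, which the lookup touches
	 * anyway, so this check adds no extra cache line miss for
	 * unconnected traffic.
	 */
	if (UDP_HSLOT_MAIN(hslot2)->hash4_cnt) {
		result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum,
					  dif, sdif, udptable);
		if (result)
			return result;
	}
	/* otherwise fall through to the existing hash2 lookup */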

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
---
 include/net/udp.h | 27 +++++++++++++++++++++++----
 net/ipv4/udp.c    | 44 +++++++++++++++++++++++---------------------
 net/ipv6/udp.c    | 15 ++++++---------
 3 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 61222545ab1c..595364729138 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -50,7 +50,7 @@ struct udp_skb_cb {
 #define UDP_SKB_CB(__skb)	((struct udp_skb_cb *)((__skb)->cb))
 
 /**
- *	struct udp_hslot - UDP hash slot
+ *	struct udp_hslot - UDP hash slot used by udp_table.hash
  *
  *	@head:	head of list of sockets
  *	@count:	number of sockets in 'head' list
@@ -60,7 +60,19 @@ struct udp_hslot {
 	struct hlist_head	head;
 	int			count;
 	spinlock_t		lock;
-} __attribute__((aligned(2 * sizeof(long))));
+} __aligned(2 * sizeof(long));
+
+/**
+ *	struct udp_hslot_main - UDP hash slot used by udp_table.hash2
+ *
+ *	@hslot:	basic hash slot
+ *	@hash4_cnt: number of sockets in hslot4 of the same (local port, local address)
+ */
+struct udp_hslot_main {
+	struct udp_hslot	hslot; /* must be the first member */
+	u32			hash4_cnt;
+} __aligned(2 * sizeof(long));
+#define UDP_HSLOT_MAIN(__hslot) ((struct udp_hslot_main *)(__hslot))
 
 /**
  *	struct udp_table - UDP table
@@ -72,7 +84,7 @@ struct udp_hslot {
  */
 struct udp_table {
 	struct udp_hslot	*hash;
-	struct udp_hslot	*hash2;
+	struct udp_hslot_main	*hash2;
 	unsigned int		mask;
 	unsigned int		log;
 };
@@ -84,6 +96,13 @@ static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
 {
 	return &table->hash[udp_hashfn(net, num, table->mask)];
 }
+
+static inline struct udp_hslot_main *udp_hashslot2_main(struct udp_table *table,
+							unsigned int hash)
+{
+	return &table->hash2[hash & table->mask];
+}
+
 /*
  * For secondary hash, net_hash_mix() is performed before calling
  * udp_hashslot2(), this explains difference with udp_hashslot()
@@ -91,7 +110,7 @@ static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
 static inline struct udp_hslot *udp_hashslot2(struct udp_table *table,
 					      unsigned int hash)
 {
-	return &table->hash2[hash & table->mask];
+	return &table->hash2[hash & table->mask].hslot;
 }
 
 extern struct proto udp_prot;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8accbf4cb295..3a31e7d6d0dd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -486,13 +486,12 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
 	struct sock *result, *sk;
+	unsigned int hash2;
 
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
@@ -519,8 +518,7 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 
 	/* Lookup wildcard sockets */
 	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  htonl(INADDR_ANY), hnum, dif, sdif,
@@ -2266,7 +2264,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			    udptable->mask;
 		hash2 = ipv4_portaddr_hash(net, daddr, hnum) & udptable->mask;
 start_lookup:
-		hslot = &udptable->hash2[hash2];
+		hslot = &udptable->hash2[hash2].hslot;
 		offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node);
 	}
 
@@ -2537,14 +2535,13 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
 	struct udp_table *udptable = net->ipv4.udp_table;
 	INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr);
 	unsigned short hnum = ntohs(loc_port);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
+	unsigned int hash2;
 	__portpair ports;
 	struct sock *sk;
 
 	hash2 = ipv4_portaddr_hash(net, loc_addr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 	ports = INET_COMBINED_PORTS(rmt_port, hnum);
 
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
@@ -3185,7 +3182,7 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
 	batch_sks = 0;
 
 	for (; state->bucket <= udptable->mask; state->bucket++) {
-		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
+		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket].hslot;
 
 		if (hlist_empty(&hslot2->head))
 			continue;
@@ -3426,10 +3423,11 @@ __setup("uhash_entries=", set_uhash_entries);
 
 void __init udp_table_init(struct udp_table *table, const char *name)
 {
-	unsigned int i;
+	unsigned int i, slot_size;
 
+	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
 	table->hash = alloc_large_system_hash(name,
-					      2 * sizeof(struct udp_hslot),
+					      slot_size,
 					      uhash_entries,
 					      21, /* one slot per 2 MB */
 					      0,
@@ -3438,16 +3436,17 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 					      UDP_HTABLE_SIZE_MIN,
 					      UDP_HTABLE_SIZE_MAX);
 
-	table->hash2 = table->hash + (table->mask + 1);
+	table->hash2 = (void *)(table->hash + (table->mask + 1));
 	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_HEAD(&table->hash[i].head);
 		table->hash[i].count = 0;
 		spin_lock_init(&table->hash[i].lock);
 	}
 	for (i = 0; i <= table->mask; i++) {
-		INIT_HLIST_HEAD(&table->hash2[i].head);
-		table->hash2[i].count = 0;
-		spin_lock_init(&table->hash2[i].lock);
+		INIT_HLIST_HEAD(&table->hash2[i].hslot.head);
+		table->hash2[i].hslot.count = 0;
+		spin_lock_init(&table->hash2[i].hslot.lock);
+		table->hash2[i].hash4_cnt = 0;
 	}
 }
 
@@ -3474,18 +3473,20 @@ static void __net_init udp_sysctl_init(struct net *net)
 static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_entries)
 {
 	struct udp_table *udptable;
+	unsigned int slot_size;
 	int i;
 
 	udptable = kmalloc(sizeof(*udptable), GFP_KERNEL);
 	if (!udptable)
 		goto out;
 
-	udptable->hash = vmalloc_huge(hash_entries * 2 * sizeof(struct udp_hslot),
+	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
+	udptable->hash = vmalloc_huge(hash_entries * slot_size,
 				      GFP_KERNEL_ACCOUNT);
 	if (!udptable->hash)
 		goto free_table;
 
-	udptable->hash2 = udptable->hash + hash_entries;
+	udptable->hash2 = (void *)(udptable->hash + hash_entries);
 	udptable->mask = hash_entries - 1;
 	udptable->log = ilog2(hash_entries);
 
@@ -3494,9 +3495,10 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 		udptable->hash[i].count = 0;
 		spin_lock_init(&udptable->hash[i].lock);
 
-		INIT_HLIST_HEAD(&udptable->hash2[i].head);
-		udptable->hash2[i].count = 0;
-		spin_lock_init(&udptable->hash2[i].lock);
+		INIT_HLIST_HEAD(&udptable->hash2[i].hslot.head);
+		udptable->hash2[i].hslot.count = 0;
+		spin_lock_init(&udptable->hash2[i].hslot.lock);
+		udptable->hash2[i].hash4_cnt = 0;
 	}
 
 	return udptable;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 52dfbb2ff1a8..bbf3352213c4 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -224,13 +224,12 @@ struct sock *__udp6_lib_lookup(const struct net *net,
 			       struct sk_buff *skb)
 {
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
 	struct sock *result, *sk;
+	unsigned int hash2;
 
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	/* Lookup connected or non-wildcard sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
@@ -257,8 +256,7 @@ struct sock *__udp6_lib_lookup(const struct net *net,
 
 	/* Lookup wildcard sockets */
 	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  &in6addr_any, hnum, dif, sdif,
@@ -859,7 +857,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			    udptable->mask;
 		hash2 = ipv6_portaddr_hash(net, daddr, hnum) & udptable->mask;
 start_lookup:
-		hslot = &udptable->hash2[hash2];
+		hslot = &udptable->hash2[hash2].hslot;
 		offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node);
 	}
 
@@ -1065,14 +1063,13 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net,
 {
 	struct udp_table *udptable = net->ipv4.udp_table;
 	unsigned short hnum = ntohs(loc_port);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
+	unsigned int hash2;
 	__portpair ports;
 	struct sock *sk;
 
 	hash2 = ipv6_portaddr_hash(net, loc_addr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 	ports = INET_COMBINED_PORTS(rmt_port, hnum);
 
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-- 
2.32.0.3.g01195cf9f



* [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
  2024-10-12  1:29 [PATCH v4 net-next 0/3] udp: Add 4-tuple hash for connected sockets Philo Lu
  2024-10-12  1:29 ` [PATCH v4 net-next 1/3] net/udp: Add a new struct for hash2 slot Philo Lu
@ 2024-10-12  1:29 ` Philo Lu
  2024-10-14 10:07   ` Paolo Abeni
  2024-10-12  1:29 ` [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket Philo Lu
  2 siblings, 1 reply; 11+ messages in thread
From: Philo Lu @ 2024-10-12  1:29 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

Add a new hash list, hash4, to the UDP table. It will be used to
implement a 4-tuple hash for connected UDP sockets. This patch adds the
hlist to the table and implements the helpers and initialization. The
4-tuple hash itself is implemented in the following patch.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
---
 include/linux/udp.h |  7 +++++++
 include/net/udp.h   | 16 +++++++++++++++-
 net/ipv4/udp.c      | 15 +++++++++++++--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 3eb3f2b9a2a0..c04808360a05 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -56,6 +56,10 @@ struct udp_sock {
 	int		 pending;	/* Any pending frames ? */
 	__u8		 encap_type;	/* Is this an Encapsulation socket? */
 
+	/* For UDP 4-tuple hash */
+	__u16 udp_lrpa_hash;
+	struct hlist_node udp_lrpa_node;
+
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
@@ -206,6 +210,9 @@ static inline void udp_allow_gso(struct sock *sk)
 #define udp_portaddr_for_each_entry_rcu(__sk, list) \
 	hlist_for_each_entry_rcu(__sk, list, __sk_common.skc_portaddr_node)
 
+#define udp_lrpa_for_each_entry_rcu(__up, list) \
+	hlist_for_each_entry_rcu(__up, list, udp_lrpa_node)
+
 #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE)
 
 #endif	/* _LINUX_UDP_H */
diff --git a/include/net/udp.h b/include/net/udp.h
index 595364729138..80f9622d0db3 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -50,7 +50,7 @@ struct udp_skb_cb {
 #define UDP_SKB_CB(__skb)	((struct udp_skb_cb *)((__skb)->cb))
 
 /**
- *	struct udp_hslot - UDP hash slot used by udp_table.hash
+ *	struct udp_hslot - UDP hash slot used by udp_table.hash/hash4
  *
  *	@head:	head of list of sockets
  *	@count:	number of sockets in 'head' list
@@ -79,12 +79,15 @@ struct udp_hslot_main {
  *
  *	@hash:	hash table, sockets are hashed on (local port)
  *	@hash2:	hash table, sockets are hashed on (local port, local address)
+ *	@hash4:	hash table, connected sockets are hashed on
+ *		(local port, local address, remote port, remote address)
  *	@mask:	number of slots in hash tables, minus 1
  *	@log:	log2(number of slots in hash table)
  */
 struct udp_table {
 	struct udp_hslot	*hash;
 	struct udp_hslot_main	*hash2;
+	struct udp_hslot	*hash4;
 	unsigned int		mask;
 	unsigned int		log;
 };
@@ -113,6 +116,17 @@ static inline struct udp_hslot *udp_hashslot2(struct udp_table *table,
 	return &table->hash2[hash & table->mask].hslot;
 }
 
+static inline struct udp_hslot *udp_hashslot4(struct udp_table *table,
+					      unsigned int hash)
+{
+	return &table->hash4[hash & table->mask];
+}
+
+static inline bool udp_hashed4(const struct sock *sk)
+{
+	return !hlist_unhashed(&udp_sk(sk)->udp_lrpa_node);
+}
+
 extern struct proto udp_prot;
 
 extern atomic_long_t udp_memory_allocated;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3a31e7d6d0dd..1498ccb79c58 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3425,7 +3425,7 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 {
 	unsigned int i, slot_size;
 
-	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
+	slot_size = 2 * sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
 	table->hash = alloc_large_system_hash(name,
 					      slot_size,
 					      uhash_entries,
@@ -3437,6 +3437,7 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 					      UDP_HTABLE_SIZE_MAX);
 
 	table->hash2 = (void *)(table->hash + (table->mask + 1));
+	table->hash4 = (void *)(table->hash2 + (table->mask + 1));
 	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_HEAD(&table->hash[i].head);
 		table->hash[i].count = 0;
@@ -3448,6 +3449,11 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 		spin_lock_init(&table->hash2[i].hslot.lock);
 		table->hash2[i].hash4_cnt = 0;
 	}
+	for (i = 0; i <= table->mask; i++) {
+		INIT_HLIST_HEAD(&table->hash4[i].head);
+		table->hash4[i].count = 0;
+		spin_lock_init(&table->hash4[i].lock);
+	}
 }
 
 u32 udp_flow_hashrnd(void)
@@ -3480,13 +3486,14 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 	if (!udptable)
 		goto out;
 
-	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
+	slot_size = 2 * sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
 	udptable->hash = vmalloc_huge(hash_entries * slot_size,
 				      GFP_KERNEL_ACCOUNT);
 	if (!udptable->hash)
 		goto free_table;
 
 	udptable->hash2 = (void *)(udptable->hash + hash_entries);
+	udptable->hash4 = (void *)(udptable->hash2 + hash_entries);
 	udptable->mask = hash_entries - 1;
 	udptable->log = ilog2(hash_entries);
 
@@ -3499,6 +3506,10 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 		udptable->hash2[i].hslot.count = 0;
 		spin_lock_init(&udptable->hash2[i].hslot.lock);
 		udptable->hash2[i].hash4_cnt = 0;
+
+		INIT_HLIST_HEAD(&udptable->hash4[i].head);
+		udptable->hash4[i].count = 0;
+		spin_lock_init(&udptable->hash4[i].lock);
 	}
 
 	return udptable;
-- 
2.32.0.3.g01195cf9f



* [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket
  2024-10-12  1:29 [PATCH v4 net-next 0/3] udp: Add 4-tuple hash for connected sockets Philo Lu
  2024-10-12  1:29 ` [PATCH v4 net-next 1/3] net/udp: Add a new struct for hash2 slot Philo Lu
  2024-10-12  1:29 ` [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis Philo Lu
@ 2024-10-12  1:29 ` Philo Lu
  2024-10-14 10:19   ` Paolo Abeni
  2 siblings, 1 reply; 11+ messages in thread
From: Philo Lu @ 2024-10-12  1:29 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

Currently, the udp_table has two hash tables, the port hash and the
portaddr hash. Usually for UDP servers, all sockets have the same local
port and address, so they all land in the same hash slot within a
reuseport group.

In some applications, UDP servers use connect() to manage clients. In
particular, when first receiving from an unseen 4-tuple, a new socket is
created and connect()ed to the remote addr:port, and that fd is then
used exclusively for the client.

Once there are connected sockets in a reuseport group, UDP has to score
all sockets in the same hash2 slot to find the best match. This can be
inefficient with a large number of connections, resulting in high
softirq overhead.

To solve the problem, this patch implements a 4-tuple hash for connected
UDP sockets. During connect(), the hash4 slot is updated, as well as a
corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(),
hslot4 is searched first if the counter is non-zero; otherwise, hslot2
is used as before. Note that only connected sockets enter this hash4
path; unconnected ones are not affected.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
---
 include/net/udp.h |   3 +-
 net/ipv4/udp.c    | 142 ++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/udp.c    |   2 +-
 3 files changed, 141 insertions(+), 6 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 80f9622d0db3..5633b51cf8d4 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -226,7 +226,7 @@ static inline int udp_lib_hash(struct sock *sk)
 }
 
 void udp_lib_unhash(struct sock *sk);
-void udp_lib_rehash(struct sock *sk, u16 new_hash);
+void udp_lib_rehash(struct sock *sk, u16 new_hash, u16 new_hash4);
 
 static inline void udp_lib_close(struct sock *sk, long timeout)
 {
@@ -319,6 +319,7 @@ int udp_rcv(struct sk_buff *skb);
 int udp_ioctl(struct sock *sk, int cmd, int *karg);
 int udp_init_sock(struct sock *sk);
 int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
+int udp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
 __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1498ccb79c58..d7e3866617e0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -478,6 +478,27 @@ static struct sock *udp4_lib_lookup2(const struct net *net,
 	return result;
 }
 
+static struct sock *udp4_lib_lookup4(const struct net *net,
+				     __be32 saddr, __be16 sport,
+				     __be32 daddr, unsigned int hnum,
+				     int dif, int sdif,
+				     struct udp_table *udptable)
+{
+	unsigned int hash4 = udp_ehashfn(net, daddr, hnum, saddr, sport);
+	const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
+	struct udp_hslot *hslot4 = udp_hashslot4(udptable, hash4);
+	struct udp_sock *up;
+	struct sock *sk;
+
+	INET_ADDR_COOKIE(acookie, saddr, daddr);
+	udp_lrpa_for_each_entry_rcu(up, &hslot4->head) {
+		sk = (struct sock *)up;
+		if (inet_match(net, sk, acookie, ports, dif, sdif))
+			return sk;
+	}
+	return NULL;
+}
+
 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
  * harder than this. -DaveM
  */
@@ -493,6 +514,12 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	hslot2 = udp_hashslot2(udptable, hash2);
 
+	if (UDP_HSLOT_MAIN(hslot2)->hash4_cnt) {
+		result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum, dif, sdif, udptable);
+		if (result) /* udp4_lib_lookup4 return sk or NULL */
+			return result;
+	}
+
 	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
@@ -1931,6 +1958,85 @@ int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 }
 EXPORT_SYMBOL(udp_pre_connect);
 
+/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
+static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
+{
+	struct udp_hslot *hslot4, *nhslot4;
+
+	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+	nhslot4 = udp_hashslot4(udptable, newhash4);
+	udp_sk(sk)->udp_lrpa_hash = newhash4;
+
+	if (hslot4 != nhslot4) {
+		spin_lock_bh(&hslot4->lock);
+		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+		hslot4->count--;
+		spin_unlock_bh(&hslot4->lock);
+
+		synchronize_rcu();
+
+		spin_lock_bh(&nhslot4->lock);
+		hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
+		nhslot4->count++;
+		spin_unlock_bh(&nhslot4->lock);
+	}
+}
+
+/* call with sock lock */
+static void udp4_hash4(struct sock *sk)
+{
+	struct udp_hslot *hslot, *hslot4;
+	struct udp_hslot_main *hslotm2;
+	struct net *net = sock_net(sk);
+	struct udp_table *udptable;
+	unsigned int hash;
+
+	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
+		return;
+
+	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+			   inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+
+	udptable = net->ipv4.udp_table;
+	if (udp_hashed4(sk)) {
+		udp4_rehash4(udptable, sk, hash);
+		return;
+	}
+
+	hslot = udp_hashslot(udptable, net, udp_sk(sk)->udp_port_hash);
+	hslotm2 = udp_hashslot2_main(udptable, udp_sk(sk)->udp_portaddr_hash);
+	hslot4 = udp_hashslot4(udptable, hash);
+	udp_sk(sk)->udp_lrpa_hash = hash;
+
+	spin_lock_bh(&hslot->lock);
+	if (rcu_access_pointer(sk->sk_reuseport_cb))
+		reuseport_detach_sock(sk);
+
+	spin_lock(&hslot4->lock);
+	hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &hslot4->head);
+	hslot4->count++;
+	spin_unlock(&hslot4->lock);
+
+	spin_lock(&hslotm2->hslot.lock);
+	hslotm2->hash4_cnt++;
+	spin_unlock(&hslotm2->hslot.lock);
+
+	spin_unlock_bh(&hslot->lock);
+}
+
+int udp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+{
+	int res;
+
+	lock_sock(sk);
+	res = __ip4_datagram_connect(sk, uaddr, addr_len);
+	if (!res)
+		udp4_hash4(sk);
+	release_sock(sk);
+	return res;
+}
+EXPORT_SYMBOL(udp_connect);
+
 int __udp_disconnect(struct sock *sk, int flags)
 {
 	struct inet_sock *inet = inet_sk(sk);
@@ -1972,7 +2078,7 @@ void udp_lib_unhash(struct sock *sk)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = udp_get_table_prot(sk);
-		struct udp_hslot *hslot, *hslot2;
+		struct udp_hslot *hslot, *hslot2, *hslot4;
 
 		hslot  = udp_hashslot(udptable, sock_net(sk),
 				      udp_sk(sk)->udp_port_hash);
@@ -1990,6 +2096,18 @@ void udp_lib_unhash(struct sock *sk)
 			hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node);
 			hslot2->count--;
 			spin_unlock(&hslot2->lock);
+
+			if (udp_hashed4(sk)) {
+				hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+				spin_lock(&hslot4->lock);
+				hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+				hslot4->count--;
+				spin_unlock(&hslot4->lock);
+
+				spin_lock(&hslot2->lock);
+				UDP_HSLOT_MAIN(hslot2)->hash4_cnt--;
+				spin_unlock(&hslot2->lock);
+			}
 		}
 		spin_unlock_bh(&hslot->lock);
 	}
@@ -1999,7 +2117,7 @@ EXPORT_SYMBOL(udp_lib_unhash);
 /*
  * inet_rcv_saddr was changed, we must rehash secondary hash
  */
-void udp_lib_rehash(struct sock *sk, u16 newhash)
+void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = udp_get_table_prot(sk);
@@ -2031,6 +2149,19 @@ void udp_lib_rehash(struct sock *sk, u16 newhash)
 				spin_unlock(&nhslot2->lock);
 			}
 
+			if (udp_hashed4(sk)) {
+				udp4_rehash4(udptable, sk, newhash4);
+
+				if (hslot2 != nhslot2) {
+					spin_lock(&hslot2->lock);
+					UDP_HSLOT_MAIN(hslot2)->hash4_cnt--;
+					spin_unlock(&hslot2->lock);
+
+					spin_lock(&nhslot2->lock);
+					UDP_HSLOT_MAIN(nhslot2)->hash4_cnt++;
+					spin_unlock(&nhslot2->lock);
+				}
+			}
 			spin_unlock_bh(&hslot->lock);
 		}
 	}
@@ -2042,7 +2173,10 @@ void udp_v4_rehash(struct sock *sk)
 	u16 new_hash = ipv4_portaddr_hash(sock_net(sk),
 					  inet_sk(sk)->inet_rcv_saddr,
 					  inet_sk(sk)->inet_num);
-	udp_lib_rehash(sk, new_hash);
+	u16 new_hash4 = udp_ehashfn(sock_net(sk),
+				    inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+				    inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+	udp_lib_rehash(sk, new_hash, new_hash4);
 }
 
 static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
@@ -2935,7 +3069,7 @@ struct proto udp_prot = {
 	.owner			= THIS_MODULE,
 	.close			= udp_lib_close,
 	.pre_connect		= udp_pre_connect,
-	.connect		= ip4_datagram_connect,
+	.connect		= udp_connect,
 	.disconnect		= udp_disconnect,
 	.ioctl			= udp_ioctl,
 	.init			= udp_init_sock,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index bbf3352213c4..4d3dfcb48a39 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -111,7 +111,7 @@ void udp_v6_rehash(struct sock *sk)
 					  &sk->sk_v6_rcv_saddr,
 					  inet_sk(sk)->inet_num);
 
-	udp_lib_rehash(sk, new_hash);
+	udp_lib_rehash(sk, new_hash, 0); /* 4-tuple hash not implemented */
 }
 
 static int compute_score(struct sock *sk, const struct net *net,
-- 
2.32.0.3.g01195cf9f



* Re: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
  2024-10-12  1:29 ` [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis Philo Lu
@ 2024-10-14 10:07   ` Paolo Abeni
  2024-10-16  6:30     ` Philo Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2024-10-14 10:07 UTC (permalink / raw)
  To: Philo Lu, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

Hi,

On 10/12/24 03:29, Philo Lu wrote:
> @@ -3480,13 +3486,14 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
>   	if (!udptable)
>   		goto out;
>   
> -	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
> +	slot_size = 2 * sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
>   	udptable->hash = vmalloc_huge(hash_entries * slot_size,
>   				      GFP_KERNEL_ACCOUNT);

I'm sorry for the late feedback.

I think it would be better to make the hash4 infra a no op (no lookup, 
no additional memory used) for CONFIG_BASE_SMALL=y builds.

It would be great if you could please share some benchmark showing the 
raw max receive PPS performances for unconnected sockets, with and 
without this series applied, to ensure this does not cause any real 
regression for such workloads.

Thanks,

Paolo



* Re: [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket
  2024-10-12  1:29 ` [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket Philo Lu
@ 2024-10-14 10:19   ` Paolo Abeni
  2024-10-16  7:28     ` Philo Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Abeni @ 2024-10-14 10:19 UTC (permalink / raw)
  To: Philo Lu, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

On 10/12/24 03:29, Philo Lu wrote:
> Currently, the udp_table has two hash table, the port hash and portaddr
> hash. Usually for UDP servers, all sockets have the same local port and
> addr, so they are all on the same hash slot within a reuseport group.
> 
> In some applications, UDP servers use connect() to manage clients. In
> particular, when firstly receiving from an unseen 4 tuple, a new socket
> is created and connect()ed to the remote addr:port, and then the fd is
> used exclusively by the client.

How do you handle the following somewhat racy scenario? A 2nd packet 
belonging to the same 4-tuple lands in the unconnected socket receive 
queue just after the 1st one, before the connected socket is created. 
The server processes such a packet after the connected socket has been created.

How many connected sockets is your system serving concurrently? Possibly 
it would make sense to allocate a larger hash table for the connected 
UDP sockets, using separate/different min/max/scale values WRT to the 
unconnected tables.

Cheers,

Paolo



* Re: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
  2024-10-14 10:07   ` Paolo Abeni
@ 2024-10-16  6:30     ` Philo Lu
  2024-10-16  7:45       ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Philo Lu @ 2024-10-16  6:30 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing



On 2024/10/14 18:07, Paolo Abeni wrote:
> Hi,
> 
> On 10/12/24 03:29, Philo Lu wrote:
>> @@ -3480,13 +3486,14 @@ static struct udp_table __net_init 
>> *udp_pernet_table_alloc(unsigned int hash_ent
>>       if (!udptable)
>>           goto out;
>> -    slot_size = sizeof(struct udp_hslot) + sizeof(struct 
>> udp_hslot_main);
>> +    slot_size = 2 * sizeof(struct udp_hslot) + sizeof(struct 
>> udp_hslot_main);
>>       udptable->hash = vmalloc_huge(hash_entries * slot_size,
>>                         GFP_KERNEL_ACCOUNT);
> 
> I'm sorry for the late feedback.
> 
> I think it would be better to make the hash4 infra a no op (no lookup, 
> no additional memory used) for CONFIG_BASE_SMALL=y builds.
> 

Got it. There are 2 affected structs, udp_hslot and udp_sock. They (as 
well as related helpers like udp4_hash4) can be wrapped with 
CONFIG_BASE_SMALL, so that builds with CONFIG_BASE_SMALL=y see no 
additional overhead from hash4.

```
+struct udp_hslot_main {
+	struct udp_hslot	hslot; /* must be the first member */
+#if !IS_ENABLED(CONFIG_BASE_SMALL)
+	u32			hash4_cnt;
+#endif
+} __aligned(2 * sizeof(long));


@@ -56,6 +56,12 @@ struct udp_sock {
  	int		 pending;	/* Any pending frames ? */
  	__u8		 encap_type;	/* Is this an Encapsulation socket? */

+#if !IS_ENABLED(CONFIG_BASE_SMALL)
+	/* For UDP 4-tuple hash */
+	__u16 udp_lrpa_hash;
+	struct hlist_node udp_lrpa_node;
+#endif
+
```

> It would be great if you could please share some benchmark showing the 
> raw max receive PPS performances for unconnected sockets, with and 
> without this series applied, to ensure this does not cause any real 
> regression for such workloads.
> 

Tested using sockperf tp with the default msgsize (14B), 3 times each 
with and without the patch set; the results show no obvious difference:

[msg/sec]  test1    test2    test3    mean
w/o patch  514,664  519,040  527,115  520.3k
w/  patch  516,863  526,337  527,195  523.5k (+0.6%)

Thank you for review, Paolo.
-- 
Philo



* Re: [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket
  2024-10-14 10:19   ` Paolo Abeni
@ 2024-10-16  7:28     ` Philo Lu
  0 siblings, 0 replies; 11+ messages in thread
From: Philo Lu @ 2024-10-16  7:28 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing



On 2024/10/14 18:19, Paolo Abeni wrote:
> On 10/12/24 03:29, Philo Lu wrote:
>> Currently, the udp_table has two hash table, the port hash and portaddr
>> hash. Usually for UDP servers, all sockets have the same local port and
>> addr, so they are all on the same hash slot within a reuseport group.
>>
>> In some applications, UDP servers use connect() to manage clients. In
>> particular, when firstly receiving from an unseen 4 tuple, a new socket
>> is created and connect()ed to the remote addr:port, and then the fd is
>> used exclusively by the client.
> 
> How do you handle the following somewhat racy scenario? A 2nd packet 
> belonging to the same 4-tuple lands in the unconnected socket receive 
> queue just after the 1st one, before the connected socket is created. 
> The server processes such a packet after the connected socket has been created.
> 

One method is to address it in the application. The application 
maintains per-connection state, so it knows which connection each 
incoming packet should be delivered to.

If the 2nd packet arrives on the "listen" socket (i.e., the initial 
unconnected socket), the app can look up the connection for it. Note 
that by the time the 1st packet has been received, the connection state 
has already been created even though the connected socket is not ready 
yet, so it can be found for the 2nd packet.

In this case, several packets may be processed this way until the new 
connected socket is created. After that it runs as we expect.

So I think the race cannot be prevented, but it can be handled, 
depending on how the application uses it.
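
For illustration, a minimal user-space sketch of this handling (the
connection table and helper names here are only illustrative, not part
of this series):

```
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct conn {
	struct sockaddr_in peer;	/* remote addr:port; local side is fixed */
	int fd;				/* connected socket */
	struct conn *next;
};

static struct conn *conns;

static struct conn *conn_lookup(const struct sockaddr_in *peer)
{
	struct conn *c;

	for (c = conns; c; c = c->next)
		if (c->peer.sin_addr.s_addr == peer->sin_addr.s_addr &&
		    c->peer.sin_port == peer->sin_port)
			return c;
	return NULL;
}

static struct conn *conn_create(const struct sockaddr_in *local,
				const struct sockaddr_in *peer)
{
	struct conn *c = calloc(1, sizeof(*c));
	int one = 1;

	c->peer = *peer;
	c->next = conns;
	conns = c;

	/* New socket bound to the same local addr:port, then connect()ed;
	 * with this series it is inserted into hslot4 at connect() time.
	 */
	c->fd = socket(AF_INET, SOCK_DGRAM, 0);
	setsockopt(c->fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
	bind(c->fd, (const struct sockaddr *)local, sizeof(*local));
	connect(c->fd, (const struct sockaddr *)peer, sizeof(*peer));
	return c;
}

static void listen_loop(int listen_fd, const struct sockaddr_in *local)
{
	char buf[2048];

	for (;;) {
		struct sockaddr_in peer;
		socklen_t plen = sizeof(peer);
		ssize_t len = recvfrom(listen_fd, buf, sizeof(buf), 0,
				       (struct sockaddr *)&peer, &plen);
		struct conn *c;

		if (len < 0)
			continue;

		/* A packet that still lands on the unconnected socket is
		 * matched to its connection by peer address, so a racing
		 * 2nd packet reaches the right connection even before its
		 * connected socket exists.
		 */
		c = conn_lookup(&peer);
		if (!c)
			c = conn_create(local, &peer);
		/* ... hand (buf, len) to c's handler ... */
	}
}
```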

> How many connected sockets is your system serving concurrently? Possibly 

About 10,000 connections in general, so the current same-sized hash4 
table is enough for us for now.

> it would make sense to allocate a larger hash table for the connected 
> UDP sockets, using separate/different min/max/scale values WRT to the 
> unconnected tables.
> 

Agreed, it's a good idea. But IMO it can be left as future work until 
we definitely need it.

Thanks.
-- 
Philo



* Re: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
  2024-10-16  6:30     ` Philo Lu
@ 2024-10-16  7:45       ` Paolo Abeni
  2024-10-16  8:47         ` Philo Lu
  2024-10-17  7:46         ` Philo Lu
  0 siblings, 2 replies; 11+ messages in thread
From: Paolo Abeni @ 2024-10-16  7:45 UTC (permalink / raw)
  To: Philo Lu, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing

On 10/16/24 08:30, Philo Lu wrote:
> On 2024/10/14 18:07, Paolo Abeni wrote:
>> It would be great if you could please share some benchmark showing the
>> raw max receive PPS performances for unconnected sockets, with and
>> without this series applied, to ensure this does not cause any real
>> regression for such workloads.
>>
> 
> Tested using sockperf tp with default msgsize (14B), 3 times for w/ and
> w/o the patch set, and results show no obvious difference:
> 
> [msg/sec]  test1    test2    test3    mean
> w/o patch  514,664  519,040  527,115  520.3k
> w/  patch  516,863  526,337  527,195  523.5k (+0.6%)
> 
> Thank you for review, Paolo.

Are the values in packets per second, or bytes per second? Are you doing 
a loopback test or over the wire? The most important question is: is the 
receiver side keeping (at least) 1 CPU fully busy? Otherwise the test is 
not very relevant.

It looks like you have some setup issue, or you are using a relatively 
low end H/W: the expected packet rate for reasonable server H/W is well 
above 1M (possibly much more than that, but I can't put my hands on 
recent H/W, so I can't provide a more accurate figure).

A single socket, user-space, UDP sender is usually unable to reach such 
tput without USO, and even with USO you likely need to do an 
over-the-wire test to really be able to keep the receiver fully busy. 
AFAICS sockperf does not support USO for the sender.

You could use the udpgso_bench_tx/udpgso_bench_rx pair from the net 
selftests directory instead.

Or you could use pktgen as traffic generator.

Thanks,

Paolo



* Re: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
  2024-10-16  7:45       ` Paolo Abeni
@ 2024-10-16  8:47         ` Philo Lu
  2024-10-17  7:46         ` Philo Lu
  1 sibling, 0 replies; 11+ messages in thread
From: Philo Lu @ 2024-10-16  8:47 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing



On 2024/10/16 15:45, Paolo Abeni wrote:
> On 10/16/24 08:30, Philo Lu wrote:
>> On 2024/10/14 18:07, Paolo Abeni wrote:
>>> It would be great if you could please share some benchmark showing the
>>> raw max receive PPS performances for unconnected sockets, with and
>>> without this series applied, to ensure this does not cause any real
>>> regression for such workloads.
>>>
>>
>> Tested using sockperf tp with default msgsize (14B), 3 times for w/ and
>> w/o the patch set, and results show no obvious difference:
>>
>> [msg/sec]  test1    test2    test3    mean
>> w/o patch  514,664  519,040  527,115  520.3k
>> w/  patch  516,863  526,337  527,195  523.5k (+0.6%)
>>
>> Thank you for review, Paolo.
> 
> Are the values in packets per second, or bytes per second? Are you doing 
> a loopback test or over the wire? The most important question is: is the 
> receiver side keeping (at least) 1 CPU fully busy? Otherwise the test is 
> not very relevant.
> 

It's in packets per second (msg/sec). I make the CPU fully busy by 
binding the NIC IRQ to the same CPU as the server socket. The CPU 
consumption looks like:

%Cpu0:
3.0 us,  35.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  7.0 hi,  55.0 si,  0.0 st

> It looks like you have some setup issue, or you are using a relatively 
> low end H/W: the expected packet rate for reasonable server H/W is well 
> above 1M (possibly much more than that, but I can't put my hands on 
> recent H/W, so I can't provide a more accurate figure).
> 
> A single socket, user-space, UDP sender is usually unable to reach such 
> tput without USO, and even with USO you likely need to do an over-the- 
> wire test to really be able to keep the receiver fully busy. AFAICS 
> sockperf does not support USO for the sender.
> 
> You could use the udpgso_bench_tx/udpgso_bench_rx pair from the net 
> selftests directory instead.
> 

Thank you for your suggestion. I'll try it to see if I can get higher pps.
-- 
Philo



* Re: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
  2024-10-16  7:45       ` Paolo Abeni
  2024-10-16  8:47         ` Philo Lu
@ 2024-10-17  7:46         ` Philo Lu
  1 sibling, 0 replies; 11+ messages in thread
From: Philo Lu @ 2024-10-17  7:46 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, dsahern,
	antony.antony, steffen.klassert, linux-kernel, dust.li, jakub,
	fred.cc, yubing.qiuyubing



On 2024/10/16 15:45, Paolo Abeni wrote:
> On 10/16/24 08:30, Philo Lu wrote:
>> On 2024/10/14 18:07, Paolo Abeni wrote:
>>> It would be great if you could please share some benchmark showing the
>>> raw max receive PPS performances for unconnected sockets, with and
>>> without this series applied, to ensure this does not cause any real
>>> regression for such workloads.
>>>
>>
>> Tested using sockperf tp with default msgsize (14B), 3 times for w/ and
>> w/o the patch set, and results show no obvious difference:
>>
>> [msg/sec]  test1    test2    test3    mean
>> w/o patch  514,664  519,040  527,115  520.3k
>> w/  patch  516,863  526,337  527,195  523.5k (+0.6%)
>>
>> Thank you for review, Paolo.
> 
> Are the values in packets per second, or bytes per second? Are you doing 
> a loopback test or over the wire? The most important question is: is the 
> receiver side keeping (at least) 1 CPU fully busy? Otherwise the test is 
> not very relevant.
> 
> It looks like you have some setup issue, or you are using a relatively 
> low end H/W: the expected packet rate for reasonable server H/W is well 
> above 1M (possibly much more than that, but I can't put my hands on 
> recent H/W, so I can't provide a more accurate figure).
> 
> A single socket, user-space, UDP sender is usually unable to reach such 
> tput without USO, and even with USO you likely need to do an over-the- 
> wire test to really be able to keep the receiver fully busy. AFAICS 
> sockperf does not support USO for the sender.
> 
> You could use the udpgso_bench_tx/udpgso_bench_rx pair from the net 
> selftests directory instead.
> 
> Or you could use pktgen as traffic generator.
> 

I tested it again with udpgso_bench_tx/udpgso_bench_rx. On the server, 
2 CPUs are involved, one for udpgso_bench_rx and the other for the NIC 
rx queue, so that the si of the NIC rx CPU is 100%. udpgso_bench_tx runs 
with payload size 20, and the tx pps is higher than the rx pps, ensuring 
rx is the bottleneck.

The outputs of udpgso_bench_rx:
[without patchset]
udp rx:     20 MB/s  1092546 calls/s
udp rx:     20 MB/s  1095051 calls/s
udp rx:     20 MB/s  1094136 calls/s
udp rx:     20 MB/s  1098860 calls/s
udp rx:     20 MB/s  1097963 calls/s
udp rx:     20 MB/s  1097460 calls/s
udp rx:     20 MB/s  1098370 calls/s
udp rx:     20 MB/s  1098089 calls/s
udp rx:     20 MB/s  1095330 calls/s
udp rx:     20 MB/s  1095486 calls/s

[with patchset]
udp rx:     21 MB/s  1105533 calls/s
udp rx:     21 MB/s  1105475 calls/s
udp rx:     21 MB/s  1104244 calls/s
udp rx:     21 MB/s  1105600 calls/s
udp rx:     21 MB/s  1108019 calls/s
udp rx:     21 MB/s  1101971 calls/s
udp rx:     21 MB/s  1104147 calls/s
udp rx:     21 MB/s  1104874 calls/s
udp rx:     21 MB/s  1101987 calls/s
udp rx:     21 MB/s  1105500 calls/s

The averages with and without the patchset are 1104735 and 1096329; the 
gap is 0.8%, which I think is negligible.

Besides, perf shows ~0.6 percentage points higher CPU consumption in 
__udp4_lib_lookup() with this patchset (increasing from 5.7% to 6.3%).

Thanks.
-- 
Philo


