netdev.vger.kernel.org archive mirror
* [RFC net-next-2.6] net: speedup sk_wake_async()
@ 2009-10-06 23:08 Eric Dumazet
  2009-10-07  0:28 ` David Miller
  2009-10-07  0:42 ` Rick Jones
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2009-10-06 23:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

Latency work, part 1


An incoming datagram must bring a *lot* of cache lines into the CPU cache,
in particular (other parts omitted: hash chains, ip route cache...):

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
offsetof(struct sock, sk_lock)         =0x34   (rw)

offsetof(struct sock, sk_sleep)        =0x50 (read)
offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
offsetof(struct sock, sk_receive_queue)=0x74   (rw)

offsetof(struct sock, sk_forward_alloc)=0x98   (rw)

offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)       =0xf8    (read)

offsetof(struct sock, sk_socket)       =0x138 (read)

offsetof(struct sock, sk_data_ready)   =0x15c   (read)


We can avoid dereferencing sk->sk_socket and socket->fasync_list on sockets
with no fasync() structures. (The socket->fasync_list pointer is probably already
in cache because it shares a cache line with socket->wait, i.e. the location
pointed to by sk->sk_sleep.)

This avoids one cache line load per incoming packet in the common case (no fasync()).

We can leave sk->sk_socket in a cold location (or even move it there in a future patch).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |    3 ++-
 net/socket.c       |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..98398bd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -504,6 +504,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_SOFTWARE,     /* %SOF_TIMESTAMPING_SOFTWARE */
 	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
+	SOCK_FASYNC, /* fasync() active */
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1396,7 +1397,7 @@ static inline unsigned long sock_wspace(struct sock *sk)
 
 static inline void sk_wake_async(struct sock *sk, int how, int band)
 {
-	if (sk->sk_socket && sk->sk_socket->fasync_list)
+	if (sock_flag(sk, SOCK_FASYNC))
 		sock_wake_async(sk->sk_socket, how, band);
 }
 
diff --git a/net/socket.c b/net/socket.c
index 7565536..d53ad11 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1100,11 +1100,14 @@ static int sock_fasync(int fd, struct file *filp, int on)
 		fna->fa_next = sock->fasync_list;
 		write_lock_bh(&sk->sk_callback_lock);
 		sock->fasync_list = fna;
+		sock_set_flag(sk, SOCK_FASYNC);
 		write_unlock_bh(&sk->sk_callback_lock);
 	} else {
 		if (fa != NULL) {
 			write_lock_bh(&sk->sk_callback_lock);
 			*prev = fa->fa_next;
+			if (!sock->fasync_list)
+				sock_reset_flag(sk, SOCK_FASYNC);
 			write_unlock_bh(&sk->sk_callback_lock);
 			kfree(fa);
 		}


* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  2009-10-06 23:08 [RFC net-next-2.6] net: speedup sk_wake_async() Eric Dumazet
@ 2009-10-07  0:28 ` David Miller
  2009-10-07  0:42 ` Rick Jones
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2009-10-07  0:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 01:08:08 +0200

> An incoming datagram must bring a *lot* of cache lines into the CPU cache,
> in particular (other parts omitted: hash chains, ip route cache...):
> 
> On 32bit arches :
 ...
> We can avoid dereferencing sk->sk_socket and socket->fasync_list on sockets
> with no fasync() structures. (The socket->fasync_list pointer is probably already
> in cache because it shares a cache line with socket->wait, i.e. the location
> pointed to by sk->sk_sleep.)
> 
> This avoids one cache line load per incoming packet in the common case (no fasync()).
> 
> We can leave sk->sk_socket in a cold location (or even move it there in a future patch).
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I like it, applied to net-next-2.6, thanks!


* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  2009-10-06 23:08 [RFC net-next-2.6] net: speedup sk_wake_async() Eric Dumazet
  2009-10-07  0:28 ` David Miller
@ 2009-10-07  0:42 ` Rick Jones
  2009-10-07  3:37   ` Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Rick Jones @ 2009-10-07  0:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List

Eric Dumazet wrote:
> Latency work, part 1
> 
> 
> An incoming datagram must bring a *lot* of cache lines into the CPU cache,
> in particular (other parts omitted: hash chains, ip route cache...):
> 
> On 32bit arches :

How about 64-bit?

> offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
> offsetof(struct sock, sk_lock)         =0x34   (rw)
> 
> offsetof(struct sock, sk_sleep)        =0x50 (read)
> offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
> offsetof(struct sock, sk_receive_queue)=0x74   (rw)
> 
> offsetof(struct sock, sk_forward_alloc)=0x98   (rw)
> 
> offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
> offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
> offsetof(struct sock, sk_filter)       =0xf8    (read)
> 
> offsetof(struct sock, sk_socket)       =0x138 (read)
> 
> offsetof(struct sock, sk_data_ready)   =0x15c   (read)
> 
> 
> We can avoid dereferencing sk->sk_socket and socket->fasync_list on sockets
> with no fasync() structures. (The socket->fasync_list pointer is probably already
> in cache because it shares a cache line with socket->wait, i.e. the location
> pointed to by sk->sk_sleep.)
> 
> This avoids one cache line load per incoming packet in the common case (no fasync()).
> 
> We can leave sk->sk_socket in a cold location (or even move it there in a future patch).
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Got any netperf service demand changes?

rick jones


* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  2009-10-07  0:42 ` Rick Jones
@ 2009-10-07  3:37   ` Eric Dumazet
  2009-10-07  4:43     ` [PATCH] udp: extend hash tables to 256 slots Eric Dumazet
  2009-10-07 15:53     ` [RFC net-next-2.6] net: speedup sk_wake_async() Rick Jones
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2009-10-07  3:37 UTC (permalink / raw)
  To: Rick Jones; +Cc: David S. Miller, Linux Netdev List

Rick Jones wrote:
> 
> How about 64-bit?

No data yet, but a larger footprint unfortunately :-(

> 
> Got any netperf service demand changes?

I was going to set up a bench lab with a typical RTP media server: say
4000 UDP sockets, with 2000 sockets each exchanging 50 G.711 A-law/u-law
messages per second, tx and rx. (Total: 100,000 packets per second each way.)

Is netperf able to simulate this workload?


* [PATCH] udp: extend hash tables to 256 slots
  2009-10-07  3:37   ` Eric Dumazet
@ 2009-10-07  4:43     ` Eric Dumazet
  2009-10-07  5:29       ` David Miller
  2009-10-07 15:53     ` [RFC net-next-2.6] net: speedup sk_wake_async() Rick Jones
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2009-10-07  4:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: Rick Jones, Linux Netdev List

Eric Dumazet wrote:
> I was going to set up a bench lab with a typical RTP media server: say
> 4000 UDP sockets, with 2000 sockets each exchanging 50 G.711 A-law/u-law
> messages per second, tx and rx. (Total: 100,000 packets per second each way.)
> 

Hmm, unfortunately it seems there will be too many sockets per UDP hash chain
for this workload to show any improvement.

(~32 sockets per chain: an average of 16 misses to look up the target socket.)

David, I believe UDP_HTABLE_SIZE has never changed from its initial value of 128,
defined 15 years ago. Could we bump it to 256?

(back in 1995, SOCK_ARRAY_SIZE was 256)

(I'll probably use 1024 value for my tests)

[PATCH] udp: extend hash tables to 256 slots

UDP_HTABLE_SIZE was initially defined as 128, which is a bit small for several setups.
4000 active sockets -> 32 sockets per chain on average.

Doubling the hash table size has a memory cost of 128 extra slots (pointer + spinlock each)
for UDP, and the same for UDP-Lite; this should be OK.

It reduces the size of the bitmap used in udp_lib_get_port() and speeds up port allocation:
#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE) -> 256 bits instead of 512 bits

Use CONFIG_BASE_SMALL to keep hash tables small for small machines.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0cf5c4c..8aaa151 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -45,7 +45,7 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
 	return (struct udphdr *)skb_transport_header(skb);
 }
 
-#define UDP_HTABLE_SIZE		128
+#define UDP_HTABLE_SIZE		(CONFIG_BASE_SMALL ? 128 : 256)
 
 static inline int udp_hashfn(struct net *net, const unsigned num)
 {


* Re: [PATCH] udp: extend hash tables to 256 slots
  2009-10-07  4:43     ` [PATCH] udp: extend hash tables to 256 slots Eric Dumazet
@ 2009-10-07  5:29       ` David Miller
  2009-10-07  5:33         ` Eric Dumazet
  2009-10-07 10:37         ` [PATCH net-next-2.6] udp: dynamically size hash tables at boot time Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: David Miller @ 2009-10-07  5:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 06:43:31 +0200

> David, I believe UDP_HTABLE_SIZE never changed from its initial value of 128,
> defined 15 years ago. Could we bump it to 256 ?
> 
> (back in 1995, SOCK_ARRAY_SIZE was 256)
> 
> (I'll probably use 1024 value for my tests)

That's incredible that it's been that low for so long :-)

But please, dynamically size this thing, maybe with a cap of say 64K
to start with.  If you don't have time for it I'll take care of this.


* Re: [PATCH] udp: extend hash tables to 256 slots
  2009-10-07  5:29       ` David Miller
@ 2009-10-07  5:33         ` Eric Dumazet
  2009-10-07  5:35           ` David Miller
  2009-10-07 10:37         ` [PATCH net-next-2.6] udp: dynamically size hash tables at boot time Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2009-10-07  5:33 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev

David Miller wrote:
> 
> That's incredible that it's been that low for so long :-)
> 
> But please, dynamically size this thing, maybe with a cap of say 64K
> to start with.  If you don't have time for it I'll take care of this.


Well, we cannot exceed 65536 slots, given the nature of the UDP protocol :)

Do you mean a static allocation at boot time, with a size that can be
overridden on the cmdline (like tcp and ip route), or something that can
dynamically extend the hash table at runtime?



* Re: [PATCH] udp: extend hash tables to 256 slots
  2009-10-07  5:33         ` Eric Dumazet
@ 2009-10-07  5:35           ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2009-10-07  5:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 07:33:09 +0200

> David Miller wrote:
>> 
>> That's incredible that it's been that low for so long :-)
>> 
>> But please, dynamically size this thing, maybe with a cap of say 64K
>> to start with.  If you don't have time for it I'll take care of this.
> 
> 
> Well, we cannot exceed 65536 slots, given the nature of the UDP protocol :)
> 
> Do you mean a static allocation at boot time, with a size that can be
> overridden on the cmdline (like tcp and ip route), or something that can
> dynamically extend the hash table at runtime?

I mean dynamically size it between 256 and 65536 slots at boot time,
based upon memory size.


* [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
  2009-10-07  5:29       ` David Miller
  2009-10-07  5:33         ` Eric Dumazet
@ 2009-10-07 10:37         ` Eric Dumazet
  2009-10-07 10:47           ` David Miller
  2009-10-08  5:01           ` David Miller
  1 sibling, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2009-10-07 10:37 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev

David Miller wrote:
> 
> That's incredible that it's been that low for so long :-)
> 
> But please, dynamically size this thing, maybe with a cap of say 64K
> to start with.  If you don't have time for it I'll take care of this.

Here we are.

Thank you

[PATCH] udp: dynamically size hash tables at boot time

UDP_HTABLE_SIZE was initially defined as 128, which is a bit small for several setups.

4000 active UDP sockets -> 32 sockets per chain on average. An incoming frame
has to look up all sockets in a chain to find the best match, so long chains hurt latency.

Instead of a fixed-size hash table that can't be right for every need,
let the UDP stack choose its table size at boot time, like tcp/ip route,
using the alloc_large_system_hash() helper.

Add an optional boot parameter, uhash_entries=x, so that an admin can force a size
between 256 and 65536 if needed, like thash_entries and rhash_entries.

dmesg logs two new lines:
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)


The maximal size on 64-bit arches would be 65536 slots, i.e. 1 MByte with non-debug spinlocks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/kernel-parameters.txt |    3
 include/linux/udp.h                 |    6 -
 include/net/udp.h                   |   13 ++-
 net/ipv4/udp.c                      |   91 ++++++++++++++++++--------
 net/ipv4/udplite.c                  |    4 -
 net/ipv6/udp.c                      |    6 -
 6 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6fa7292..02df20b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2589,6 +2589,9 @@ and is between 256 and 4096 characters. It is defined in the file
 	uart6850=	[HW,OSS]
 			Format: <io>,<irq>
 
+	uhash_entries=	[KNL,NET]
+			Set number of hash buckets for UDP/UDP-Lite connections
+
 	uhci-hcd.ignore_oc=
 			[USB] Ignore overcurrent events (default N).
 			Some badly-designed motherboards generate lots of
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0cf5c4c..832361e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -45,11 +45,11 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
 	return (struct udphdr *)skb_transport_header(skb);
 }
 
-#define UDP_HTABLE_SIZE		128
+#define UDP_HTABLE_SIZE_MIN		(CONFIG_BASE_SMALL ? 128 : 256)
 
-static inline int udp_hashfn(struct net *net, const unsigned num)
+static inline int udp_hashfn(struct net *net, unsigned num, unsigned mask)
 {
-	return (num + net_hash_mix(net)) & (UDP_HTABLE_SIZE - 1);
+	return (num + net_hash_mix(net)) & mask;
 }
 
 struct udp_sock {
diff --git a/include/net/udp.h b/include/net/udp.h
index f98abd2..22aa2e7 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -54,12 +54,19 @@ struct udp_hslot {
 	struct hlist_nulls_head	head;
 	spinlock_t		lock;
 } __attribute__((aligned(2 * sizeof(long))));
+
 struct udp_table {
-	struct udp_hslot	hash[UDP_HTABLE_SIZE];
+	struct udp_hslot	*hash;
+	unsigned int mask;
+	unsigned int log;
 };
 extern struct udp_table udp_table;
-extern void udp_table_init(struct udp_table *);
-
+extern void udp_table_init(struct udp_table *, const char *);
+static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
+					     struct net *net, unsigned num)
+{
+	return &table->hash[udp_hashfn(net, num, table->mask)];
+}
 
 /* Note: this must match 'valbool' in sock_setsockopt */
 #define UDP_CSUM_NOXMIT		1
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..194bcdc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -106,7 +106,7 @@
 #include <net/xfrm.h>
 #include "udp_impl.h"
 
-struct udp_table udp_table;
+struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
 
 int sysctl_udp_mem[3] __read_mostly;
@@ -121,14 +121,16 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
 atomic_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
-#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+#define MAX_UDP_PORTS 65536
+#define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			       const struct udp_hslot *hslot,
 			       unsigned long *bitmap,
 			       struct sock *sk,
 			       int (*saddr_comp)(const struct sock *sk1,
-						 const struct sock *sk2))
+						 const struct sock *sk2),
+			       unsigned int log)
 {
 	struct sock *sk2;
 	struct hlist_nulls_node *node;
@@ -142,8 +144,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
 		    (*saddr_comp)(sk, sk2)) {
 			if (bitmap)
-				__set_bit(sk2->sk_hash / UDP_HTABLE_SIZE,
-					  bitmap);
+				__set_bit(sk2->sk_hash >> log, bitmap);
 			else
 				return 1;
 		}
@@ -180,13 +181,15 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		/*
 		 * force rand to be an odd multiple of UDP_HTABLE_SIZE
 		 */
-		rand = (rand | 1) * UDP_HTABLE_SIZE;
-		for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
-			hslot = &udptable->hash[udp_hashfn(net, first)];
+		rand = (rand | 1) * (udptable->mask + 1);
+		for (last = first + udptable->mask + 1;
+		     first != last;
+		     first++) {
+			hslot = udp_hashslot(udptable, net, first);
 			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
 			udp_lib_lport_inuse(net, snum, hslot, bitmap, sk,
-					    saddr_comp);
+					    saddr_comp, udptable->log);
 
 			snum = first;
 			/*
@@ -196,7 +199,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			 */
 			do {
 				if (low <= snum && snum <= high &&
-				    !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+				    !test_bit(snum >> udptable->log, bitmap))
 					goto found;
 				snum += rand;
 			} while (snum != first);
@@ -204,9 +207,10 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		}
 		goto fail;
 	} else {
-		hslot = &udptable->hash[udp_hashfn(net, snum)];
+		hslot = udp_hashslot(udptable, net, snum);
 		spin_lock_bh(&hslot->lock);
-		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
+		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk,
+					saddr_comp, 0))
 			goto fail_unlock;
 	}
 found:
@@ -283,7 +287,7 @@ static struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash = udp_hashfn(net, hnum);
+	unsigned int hash = udp_hashfn(net, hnum, udptable->mask);
 	struct udp_hslot *hslot = &udptable->hash[hash];
 	int score, badness;
 
@@ -1013,8 +1017,8 @@ void udp_lib_unhash(struct sock *sk)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = sk->sk_prot->h.udp_table;
-		unsigned int hash = udp_hashfn(sock_net(sk), sk->sk_hash);
-		struct udp_hslot *hslot = &udptable->hash[hash];
+		struct udp_hslot *hslot = udp_hashslot(udptable, sock_net(sk),
+						     sk->sk_hash);
 
 		spin_lock_bh(&hslot->lock);
 		if (sk_nulls_del_node_init_rcu(sk)) {
@@ -1169,7 +1173,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udp_table *udptable)
 {
 	struct sock *sk;
-	struct udp_hslot *hslot = &udptable->hash[udp_hashfn(net, ntohs(uh->dest))];
+	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
 	spin_lock(&hslot->lock);
@@ -1609,9 +1613,14 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 	struct udp_iter_state *state = seq->private;
 	struct net *net = seq_file_net(seq);
 
-	for (state->bucket = start; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
+	for (state->bucket = start; state->bucket <= state->udp_table->mask;
+	     ++state->bucket) {
 		struct hlist_nulls_node *node;
 		struct udp_hslot *hslot = &state->udp_table->hash[state->bucket];
+
+		if (hlist_nulls_empty(&hslot->head))
+			continue;
+
 		spin_lock_bh(&hslot->lock);
 		sk_nulls_for_each(sk, node, &hslot->head) {
 			if (!net_eq(sock_net(sk), net))
@@ -1636,7 +1645,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 	} while (sk && (!net_eq(sock_net(sk), net) || sk->sk_family != state->family));
 
 	if (!sk) {
-		if (state->bucket < UDP_HTABLE_SIZE)
+		if (state->bucket <= state->udp_table->mask)
 			spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
 		return udp_get_first(seq, state->bucket + 1);
 	}
@@ -1656,7 +1665,7 @@ static struct sock *udp_get_idx(struct seq_file *seq, loff_t pos)
 static void *udp_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct udp_iter_state *state = seq->private;
-	state->bucket = UDP_HTABLE_SIZE;
+	state->bucket = MAX_UDP_PORTS;
 
 	return *pos ? udp_get_idx(seq, *pos-1) : SEQ_START_TOKEN;
 }
@@ -1678,7 +1687,7 @@ static void udp_seq_stop(struct seq_file *seq, void *v)
 {
 	struct udp_iter_state *state = seq->private;
 
-	if (state->bucket < UDP_HTABLE_SIZE)
+	if (state->bucket <= state->udp_table->mask)
 		spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
 }
 
@@ -1738,7 +1747,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 	__u16 destp	  = ntohs(inet->dport);
 	__u16 srcp	  = ntohs(inet->sport);
 
-	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
+	seq_printf(f, "%5d: %08X:%04X %08X:%04X"
 		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d%n",
 		bucket, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
@@ -1804,11 +1813,43 @@ void udp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */
 
-void __init udp_table_init(struct udp_table *table)
+static __initdata unsigned long uhash_entries;
+static int __init set_uhash_entries(char *str)
 {
-	int i;
+	if (!str)
+		return 0;
+	uhash_entries = simple_strtoul(str, &str, 0);
+	if (uhash_entries && uhash_entries < UDP_HTABLE_SIZE_MIN)
+		uhash_entries = UDP_HTABLE_SIZE_MIN;
+	return 1;
+}
+__setup("uhash_entries=", set_uhash_entries);
 
-	for (i = 0; i < UDP_HTABLE_SIZE; i++) {
+void __init udp_table_init(struct udp_table *table, const char *name)
+{
+	unsigned int i;
+
+	if (!CONFIG_BASE_SMALL)
+		table->hash = alloc_large_system_hash(name,
+			sizeof(struct udp_hslot),
+			uhash_entries,
+			21, /* one slot per 2 MB */
+			0,
+			&table->log,
+			&table->mask,
+			64 * 1024);
+	/*
+	 * Make sure hash table has the minimum size
+	 */
+	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
+		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
+				      sizeof(struct udp_hslot), GFP_KERNEL);
+		if (!table->hash)
+			panic(name);
+		table->log = ilog2(UDP_HTABLE_SIZE_MIN);
+		table->mask = UDP_HTABLE_SIZE_MIN - 1;
+	}
+	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_NULLS_HEAD(&table->hash[i].head, i);
 		spin_lock_init(&table->hash[i].lock);
 	}
@@ -1818,7 +1859,7 @@ void __init udp_init(void)
 {
 	unsigned long nr_pages, limit;
 
-	udp_table_init(&udp_table);
+	udp_table_init(&udp_table, "UDP");
 	/* Set the pressure threshold up by the same strategy of TCP. It is a
 	 * fraction of global memory that is up to 1/2 at 256 MB, decreasing
 	 * toward zero with the amount of memory, with a floor of 128 pages.
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 95248d7..a495ca8 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -12,7 +12,7 @@
  */
 #include "udp_impl.h"
 
-struct udp_table 	udplite_table;
+struct udp_table 	udplite_table __read_mostly;
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
@@ -110,7 +110,7 @@ static inline int udplite4_proc_init(void)
 
 void __init udplite4_register(void)
 {
-	udp_table_init(&udplite_table);
+	udp_table_init(&udplite_table, "UDP-Lite");
 	if (proto_register(&udplite_prot, 1))
 		goto out_register_err;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3a60f12..d42f503 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -132,7 +132,7 @@ static struct sock *__udp6_lib_lookup(struct net *net,
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash = udp_hashfn(net, hnum);
+	unsigned int hash = udp_hashfn(net, hnum, udptable->mask);
 	struct udp_hslot *hslot = &udptable->hash[hash];
 	int score, badness;
 
@@ -452,7 +452,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 {
 	struct sock *sk, *sk2;
 	const struct udphdr *uh = udp_hdr(skb);
-	struct udp_hslot *hslot = &udptable->hash[udp_hashfn(net, ntohs(uh->dest))];
+	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
 	spin_lock(&hslot->lock);
@@ -1195,7 +1195,7 @@ static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket
 	destp = ntohs(inet->dport);
 	srcp  = ntohs(inet->sport);
 	seq_printf(seq,
-		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
+		   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
 		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
 		   bucket,
 		   src->s6_addr32[0], src->s6_addr32[1],


* Re: [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
  2009-10-07 10:37         ` [PATCH net-next-2.6] udp: dynamically size hash tables at boot time Eric Dumazet
@ 2009-10-07 10:47           ` David Miller
  2009-10-08  5:01           ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2009-10-07 10:47 UTC (permalink / raw)
  To: eric.dumazet; +Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 12:37:59 +0200

> David Miller wrote:
>> 
>> That's incredible that it's been that low for so long :-)
>> 
>> But please, dynamically size this thing, maybe with a cap of say 64K
>> to start with.  If you don't have time for it I'll take care of this.
> 
> Here we are.
 ...
> [PATCH] udp: dynamically size hash tables at boot time

Looks good to me.

I'll let this sit for at least a day to get some review from
others.

Thanks!


* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  2009-10-07  3:37   ` Eric Dumazet
  2009-10-07  4:43     ` [PATCH] udp: extend hash tables to 256 slots Eric Dumazet
@ 2009-10-07 15:53     ` Rick Jones
  1 sibling, 0 replies; 12+ messages in thread
From: Rick Jones @ 2009-10-07 15:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List

Eric Dumazet wrote:
> Rick Jones wrote:
> 
>>How about 64-bit?
> 
> 
> No data yet, but a larger footprint unfortunately :-(

True - nothing comes for free.  I'm not "in touch" with the embedded side, where
I presume 32-bit will be, if not already is, the primary bitness; but over on
the server side of the world, at least the part I see, 64-bit is de rigueur,
hence my curiosity.

>>Got any netperf service demand changes?
> 
> 
> I was going to set up a bench lab with a typical RTP media server: say
> 4000 UDP sockets, with 2000 sockets each exchanging 50 G.711 A-law/u-law
> messages per second, tx and rx. (Total: 100,000 packets per second each way.)
> 
> Is netperf able to simulate this workload?

Touché :)

It would be, well, cumbersome with netperf2, but possible.  One would 
./configure --enable-intervals and then run some variation of:

netperf -t UDP_STREAM -l <time> -H <remote> -b <burst size> -w <burst interval> 
-- -m <message size>

a large number of times.  Given the lack of test synchronization in netperf2 I 
probably would not try to aggregate the results of N thousand simultaneous 
netperf2 instances and would rely instead on external (relative to netperf) 
packet rate reports.

Still, if the cache miss removed is a non-trivial fraction of the overhead I 
would think that something like:

netperf -t UDP_RR -l <time> -I 99,0.5 -i 30,3 -c -C -H remote -- -r 4

run with and without the change would show a difference in the service demand, 
and if you hit the confidence intervals you would be able, per the above, to be
confident in the "reality" of a CPU utilization difference of +/- 0.25%.
Getting that test to that level of confidence probably means pinning the NIC 
interrupts to a specific CPU and then binding netperf/netserver on either side 
using the global -T option.

Barring getting suitable confidence intervals, somewhere in the middle of all
that would be ./configure --enable-burst and then, still with pinning and
binding for "stability", something like:

netperf -t UDP_RR -l <time> -I 99,0.5 -i 30,3 -H <remote> -- -r 4 -b <burst>

to put multiple transactions in flight across that flow - choosing <burst> to 
take the CPU on which either netperf, netserver, or the interrupts are running 
to 100% saturation.  Here I left-off the CPU utilization since that is often the 
thing that cannot hit the confidence intervals, and leave the aggregate 
throughput as the proxy for efficiency change - which is why <burst> needs to 
take something to saturation in each case.

happy benchmarking,

rick jones


* Re: [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
  2009-10-07 10:37         ` [PATCH net-next-2.6] udp: dynamically size hash tables at boot time Eric Dumazet
  2009-10-07 10:47           ` David Miller
@ 2009-10-08  5:01           ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2009-10-08  5:01 UTC (permalink / raw)
  To: eric.dumazet; +Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 12:37:59 +0200

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
