* [RFC net-next-2.6] net: speedup sk_wake_async()
  From: Eric Dumazet @ 2009-10-06 23:08 UTC
  To: David S. Miller
  Cc: Linux Netdev List

Latency works, part 1

An incoming datagram must bring a *lot* of cache lines into the cpu cache,
in particular (other parts omitted: hash chains, ip route cache, ...):

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)        = 0x30  (read)
offsetof(struct sock, sk_lock)          = 0x34  (rw)

offsetof(struct sock, sk_sleep)         = 0x50  (read)
offsetof(struct sock, sk_rmem_alloc)    = 0x64  (rw)
offsetof(struct sock, sk_receive_queue) = 0x74  (rw)

offsetof(struct sock, sk_forward_alloc) = 0x98  (rw)

offsetof(struct sock, sk_callback_lock) = 0xcc  (rw)
offsetof(struct sock, sk_drops)         = 0xd8  (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)        = 0xf8  (read)

offsetof(struct sock, sk_socket)        = 0x138 (read)

offsetof(struct sock, sk_data_ready)    = 0x15c (read)

We can avoid referencing sk->sk_socket and socket->fasync_list on sockets
with no fasync() structures. (The socket->fasync_list ptr is probably already
in cache anyway, because it shares a cache line with socket->wait, ie the
location pointed to by sk->sk_sleep.)

This avoids one cache line load per incoming packet in the common case
(no fasync()).

We can leave sk->sk_socket in a cold location (or even move it there in a
future patch).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |    3 ++-
 net/socket.c       |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..98398bd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -504,6 +504,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_SOFTWARE, /* %SOF_TIMESTAMPING_SOFTWARE */
 	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
+	SOCK_FASYNC, /* fasync() active */
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1396,7 +1397,7 @@ static inline unsigned long sock_wspace(struct sock *sk)
 
 static inline void sk_wake_async(struct sock *sk, int how, int band)
 {
-	if (sk->sk_socket && sk->sk_socket->fasync_list)
+	if (sock_flag(sk, SOCK_FASYNC))
 		sock_wake_async(sk->sk_socket, how, band);
 }
 
diff --git a/net/socket.c b/net/socket.c
index 7565536..d53ad11 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1100,11 +1100,14 @@ static int sock_fasync(int fd, struct file *filp, int on)
 		fna->fa_next = sock->fasync_list;
 		write_lock_bh(&sk->sk_callback_lock);
 		sock->fasync_list = fna;
+		sock_set_flag(sk, SOCK_FASYNC);
 		write_unlock_bh(&sk->sk_callback_lock);
 	} else {
 		if (fa != NULL) {
 			write_lock_bh(&sk->sk_callback_lock);
 			*prev = fa->fa_next;
+			if (!sock->fasync_list)
+				sock_reset_flag(sk, SOCK_FASYNC);
 			write_unlock_bh(&sk->sk_callback_lock);
 			kfree(fa);
 		}
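For illustration, a minimal user-space sketch of the win (stand-in types, not
the kernel's structures; the bit test mirrors what sock_flag() does): the old
check chases two dependent pointers, each a potential cache miss, while the
new one tests a bit in a flags word the receive path has already touched.

/* Stand-in types for illustration only -- not kernel code. */
#include <stdio.h>

struct fake_socket {
	void *wait;
	void *fasync_list;	/* cold: non-NULL only after fasync() */
};

struct fake_sock {
	unsigned long flags;	/* hot: loaded early in the rx path */
	struct fake_socket *sk_socket;
};

#define FAKE_SOCK_FASYNC 0	/* bit position, standing in for SOCK_FASYNC */

/* Old test: two dependent loads (sk_socket, then fasync_list). */
static int wake_needed_old(const struct fake_sock *sk)
{
	return sk->sk_socket && sk->sk_socket->fasync_list;
}

/* New test: one bit test on an already-cached word. */
static int wake_needed_new(const struct fake_sock *sk)
{
	return (sk->flags >> FAKE_SOCK_FASYNC) & 1;
}

int main(void)
{
	struct fake_socket so = { NULL, NULL };
	struct fake_sock sk = { 0UL, &so };

	printf("old=%d new=%d (no fasync user)\n",
	       wake_needed_old(&sk), wake_needed_new(&sk));
	return 0;
}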
* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  From: David Miller @ 2009-10-07  0:28 UTC
  To: eric.dumazet
  Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 01:08:08 +0200

> An incoming datagram must bring a *lot* of cache lines into the cpu cache,
> in particular (other parts omitted: hash chains, ip route cache, ...):
>
> On 32bit arches :
 ...
> We can avoid referencing sk->sk_socket and socket->fasync_list on sockets
> with no fasync() structures. (The socket->fasync_list ptr is probably already
> in cache anyway, because it shares a cache line with socket->wait, ie the
> location pointed to by sk->sk_sleep.)
>
> This avoids one cache line load per incoming packet in the common case
> (no fasync()).
>
> We can leave sk->sk_socket in a cold location (or even move it there in a
> future patch).
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I like it, applied to net-next-2.6, thanks!
* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  From: Rick Jones @ 2009-10-07  0:42 UTC
  To: Eric Dumazet
  Cc: David S. Miller, Linux Netdev List

Eric Dumazet wrote:
> Latency works, part 1
>
> An incoming datagram must bring a *lot* of cache lines into the cpu cache,
> in particular (other parts omitted: hash chains, ip route cache, ...):
>
> On 32bit arches :

How about 64-bit?

> offsetof(struct sock, sk_rcvbuf)        = 0x30  (read)
> offsetof(struct sock, sk_lock)          = 0x34  (rw)
>
> offsetof(struct sock, sk_sleep)         = 0x50  (read)
> offsetof(struct sock, sk_rmem_alloc)    = 0x64  (rw)
> offsetof(struct sock, sk_receive_queue) = 0x74  (rw)
>
> offsetof(struct sock, sk_forward_alloc) = 0x98  (rw)
>
> offsetof(struct sock, sk_callback_lock) = 0xcc  (rw)
> offsetof(struct sock, sk_drops)         = 0xd8  (read if we add dropcount support, rw if frame dropped)
> offsetof(struct sock, sk_filter)        = 0xf8  (read)
>
> offsetof(struct sock, sk_socket)        = 0x138 (read)
>
> offsetof(struct sock, sk_data_ready)    = 0x15c (read)
>
> We can avoid referencing sk->sk_socket and socket->fasync_list on sockets
> with no fasync() structures. (The socket->fasync_list ptr is probably already
> in cache anyway, because it shares a cache line with socket->wait, ie the
> location pointed to by sk->sk_sleep.)
>
> This avoids one cache line load per incoming packet in the common case
> (no fasync()).
>
> We can leave sk->sk_socket in a cold location (or even move it there in a
> future patch).
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Got any netperf service demand changes?

rick jones
* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  From: Eric Dumazet @ 2009-10-07  3:37 UTC
  To: Rick Jones
  Cc: David S. Miller, Linux Netdev List

Rick Jones wrote:
>
> How about 64-bit?

No data yet, but a larger footprint, unfortunately :-(

>
> Got any netperf service demand changes?

I was going to set up a bench lab, with a typical RTP media server, with say
4000 UDP sockets, 2000 sockets exchanging 50 G.711 A-law/u-law messages per
second tx and rx (total: 100,000 packets per second each way).

Is netperf able to simulate this workload ?
* [PATCH] udp: extend hash tables to 256 slots
  From: Eric Dumazet @ 2009-10-07  4:43 UTC
  To: David S. Miller
  Cc: Rick Jones, Linux Netdev List

Eric Dumazet wrote:
> I was going to set up a bench lab, with a typical RTP media server, with say
> 4000 UDP sockets, 2000 sockets exchanging 50 G.711 A-law/u-law messages per
> second tx and rx (total: 100,000 packets per second each way).

Hmm, it seems we'll have too many sockets per udp hash chain, unfortunately,
for this workload to show any improvement. (~32 sockets per chain : an
average of 16 misses to look up the target socket.)

David, I believe UDP_HTABLE_SIZE never changed from its initial value of
128, defined 15 years ago. Could we bump it to 256 ?

(back in 1995, SOCK_ARRAY_SIZE was 256)

(I'll probably use a value of 1024 for my tests)

[PATCH] udp: extend hash tables to 256 slots

UDP_HTABLE_SIZE was initially defined as 128, which is a bit small for
several setups.

4000 active sockets -> 32 sockets per chain on average.

Doubling the hash table size has a memory cost of 128 extra (pointer +
spinlock) slots for UDP, and the same for UDP-Lite; this should be OK.

It reduces the size of the bitmap used in udp_lib_get_port() and speeds up
port allocation :

#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE) -> 256 bits instead of 512

Use CONFIG_BASE_SMALL to keep hash tables small on small machines.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0cf5c4c..8aaa151 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -45,7 +45,7 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
 	return (struct udphdr *)skb_transport_header(skb);
 }
 
-#define UDP_HTABLE_SIZE		128
+#define UDP_HTABLE_SIZE		(CONFIG_BASE_SMALL ? 128 : 256)
 
 static inline int udp_hashfn(struct net *net, const unsigned num)
 {
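A quick back-of-envelope check of those numbers (a stand-alone sketch for
illustration, not part of the patch): the port bitmap in udp_lib_get_port()
holds 65536 / UDP_HTABLE_SIZE bits, and 4000 sockets spread over the table
give 4000 / UDP_HTABLE_SIZE entries per chain on average.

/* Illustration only: chain length and port-bitmap size vs table size. */
#include <stdio.h>

int main(void)
{
	int sockets = 4000;			/* the RTP workload above */
	int sizes[] = { 128, 256, 1024 };

	for (int i = 0; i < 3; i++) {
		int htable = sizes[i];

		printf("UDP_HTABLE_SIZE=%-4d avg chain=%5.1f  "
		       "bitmap=%3d bits (%2d bytes)\n",
		       htable, (double)sockets / htable,
		       65536 / htable, 65536 / htable / 8);
	}
	return 0;	/* 128 -> ~31 per chain; 256 -> ~16; 1024 -> ~4 */
}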
* Re: [PATCH] udp: extend hash tables to 256 slots
  From: David Miller @ 2009-10-07  5:29 UTC
  To: eric.dumazet
  Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 06:43:31 +0200

> David, I believe UDP_HTABLE_SIZE never changed from its initial value of
> 128, defined 15 years ago. Could we bump it to 256 ?
>
> (back in 1995, SOCK_ARRAY_SIZE was 256)
>
> (I'll probably use a value of 1024 for my tests)

That's incredible that it's been that low for so long :-)

But please, dynamically size this thing, maybe with a cap of say 64K to
start with. If you don't have time for it I'll take care of this.
* Re: [PATCH] udp: extend hash tables to 256 slots
  From: Eric Dumazet @ 2009-10-07  5:33 UTC
  To: David Miller
  Cc: rick.jones2, netdev

David Miller wrote:
>
> That's incredible that it's been that low for so long :-)
>
> But please, dynamically size this thing, maybe with a cap of say 64K to
> start with. If you don't have time for it I'll take care of this.

Well, we cannot exceed 65536 slots, given the nature of the UDP protocol :)

Do you mean a static allocation at boot time, with a size that can be
overridden on the command line (like tcp and the ip route cache), or
something that can dynamically extend the hash table at runtime ?
* Re: [PATCH] udp: extend hash tables to 256 slots
  From: David Miller @ 2009-10-07  5:35 UTC
  To: eric.dumazet
  Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 07:33:09 +0200

> David Miller wrote:
>>
>> That's incredible that it's been that low for so long :-)
>>
>> But please, dynamically size this thing, maybe with a cap of say 64K to
>> start with. If you don't have time for it I'll take care of this.
>
> Well, we cannot exceed 65536 slots, given the nature of the UDP protocol :)
>
> Do you mean a static allocation at boot time, with a size that can be
> overridden on the command line (like tcp and the ip route cache), or
> something that can dynamically extend the hash table at runtime ?

I mean dynamically size it between 256 and 65536 slots at boot time,
based upon memory size.
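What that policy works out to, sketched in user space (an approximation
only: the patch that follows delegates the real computation to
alloc_large_system_hash(), asking for roughly one slot per 2 MB of memory
with a 64K cap; the kernel helper also accounts for reserved memory etc.):

/* Rough model: one slot per 2 MB of RAM (shift 21), rounded up to a
 * power of two, clamped to [256, 65536]. Illustration, not kernel code. */
#include <stdio.h>

static unsigned int udp_slots(unsigned long long mem_bytes)
{
	unsigned long long target = mem_bytes >> 21;	/* one slot per 2 MB */
	unsigned int size = 256;			/* UDP_HTABLE_SIZE_MIN */

	while (size < target && size < 65536)
		size <<= 1;
	return size;
}

int main(void)
{
	unsigned long long mb = 1ULL << 20;
	unsigned long long ram[] = { 256 * mb, 1024 * mb, 16384 * mb, 262144 * mb };

	for (int i = 0; i < 4; i++)
		printf("%6llu MB RAM -> %5u slots\n", ram[i] / mb, udp_slots(ram[i]));
	return 0;	/* 256 MB -> 256; 1 GB -> 512; 16 GB -> 8192; 256 GB -> 65536 */
}

This is consistent with the dmesg lines quoted in the patch below (512
entries, plausibly a ~1 GB machine).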
* [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
  From: Eric Dumazet @ 2009-10-07 10:37 UTC
  To: David Miller
  Cc: rick.jones2, netdev

David Miller wrote:
>
> That's incredible that it's been that low for so long :-)
>
> But please, dynamically size this thing, maybe with a cap of say 64K to
> start with. If you don't have time for it I'll take care of this.

Here we are. Thank you

[PATCH] udp: dynamically size hash tables at boot time

UDP_HTABLE_SIZE was initially defined as 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain on average. An incoming
frame has to look up all sockets in a chain to find the best match, so
long chains hurt latency.

Instead of a fixed-size hash table that can't be perfect for every need,
let the UDP stack choose its table size at boot time, as tcp and the ip
route cache do, using the alloc_large_system_hash() helper.

Add an optional boot parameter, uhash_entries=x, so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines :
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MByte for
non-debugging spinlocks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/kernel-parameters.txt |    3 
 include/linux/udp.h                 |    6 -
 include/net/udp.h                   |   13 ++-
 net/ipv4/udp.c                      |   91 ++++++++++++++++++--------
 net/ipv4/udplite.c                  |    4 -
 net/ipv6/udp.c                      |    6 -
 6 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6fa7292..02df20b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2589,6 +2589,9 @@ and is between 256 and 4096 characters. It is defined in the file
 	uart6850=	[HW,OSS]
 			Format: <io>,<irq>
 
+	uhash_entries=	[KNL,NET]
+			Set number of hash buckets for UDP/UDP-Lite connections
+
 	uhci-hcd.ignore_oc=
 			[USB] Ignore overcurrent events (default N).
 			Some badly-designed motherboards generate lots of
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0cf5c4c..832361e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -45,11 +45,11 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
 	return (struct udphdr *)skb_transport_header(skb);
 }
 
-#define UDP_HTABLE_SIZE		128
+#define UDP_HTABLE_SIZE_MIN	(CONFIG_BASE_SMALL ? 128 : 256)
 
-static inline int udp_hashfn(struct net *net, const unsigned num)
+static inline int udp_hashfn(struct net *net, unsigned num, unsigned mask)
 {
-	return (num + net_hash_mix(net)) & (UDP_HTABLE_SIZE - 1);
+	return (num + net_hash_mix(net)) & mask;
 }
 
 struct udp_sock {
diff --git a/include/net/udp.h b/include/net/udp.h
index f98abd2..22aa2e7 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -54,12 +54,19 @@ struct udp_hslot {
 	struct hlist_nulls_head	head;
 	spinlock_t		lock;
 } __attribute__((aligned(2 * sizeof(long))));
+
 struct udp_table {
-	struct udp_hslot	hash[UDP_HTABLE_SIZE];
+	struct udp_hslot	*hash;
+	unsigned int		mask;
+	unsigned int		log;
 };
 extern struct udp_table udp_table;
-extern void udp_table_init(struct udp_table *);
-
+extern void udp_table_init(struct udp_table *, const char *);
+static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
+					     struct net *net, unsigned num)
+{
+	return &table->hash[udp_hashfn(net, num, table->mask)];
+}
 
 /* Note: this must match 'valbool' in sock_setsockopt */
 #define UDP_CSUM_NOXMIT		1
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..194bcdc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -106,7 +106,7 @@
 #include <net/xfrm.h>
 #include "udp_impl.h"
 
-struct udp_table udp_table;
+struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
 
 int sysctl_udp_mem[3] __read_mostly;
@@ -121,14 +121,16 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
 atomic_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
-#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)
+#define MAX_UDP_PORTS 65536
+#define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
 static int udp_lib_lport_inuse(struct net *net, __u16 num,
 			       const struct udp_hslot *hslot,
 			       unsigned long *bitmap,
 			       struct sock *sk,
 			       int (*saddr_comp)(const struct sock *sk1,
-						 const struct sock *sk2))
+						 const struct sock *sk2),
+			       unsigned int log)
 {
 	struct sock *sk2;
 	struct hlist_nulls_node *node;
@@ -142,8 +144,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
 		     || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
 		    (*saddr_comp)(sk, sk2)) {
 			if (bitmap)
-				__set_bit(sk2->sk_hash / UDP_HTABLE_SIZE,
-					  bitmap);
+				__set_bit(sk2->sk_hash >> log, bitmap);
 			else
 				return 1;
 		}
@@ -180,13 +181,15 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		/*
 		 * force rand to be an odd multiple of UDP_HTABLE_SIZE
 		 */
-		rand = (rand | 1) * UDP_HTABLE_SIZE;
-		for (last = first + UDP_HTABLE_SIZE; first != last; first++) {
-			hslot = &udptable->hash[udp_hashfn(net, first)];
+		rand = (rand | 1) * (udptable->mask + 1);
+		for (last = first + udptable->mask + 1;
+		     first != last;
+		     first++) {
+			hslot = udp_hashslot(udptable, net, first);
 			bitmap_zero(bitmap, PORTS_PER_CHAIN);
 			spin_lock_bh(&hslot->lock);
 			udp_lib_lport_inuse(net, snum, hslot, bitmap, sk,
-					    saddr_comp);
+					    saddr_comp, udptable->log);
 
 			snum = first;
 			/*
@@ -196,7 +199,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			 */
 			do {
 				if (low <= snum && snum <= high &&
-				    !test_bit(snum / UDP_HTABLE_SIZE, bitmap))
+				    !test_bit(snum >> udptable->log, bitmap))
 					goto found;
 				snum += rand;
 			} while (snum != first);
@@ -204,9 +207,10 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		}
 		goto fail;
 	} else {
-		hslot = &udptable->hash[udp_hashfn(net, snum)];
+		hslot = udp_hashslot(udptable, net, snum);
 		spin_lock_bh(&hslot->lock);
-		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, saddr_comp))
+		if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk,
+					saddr_comp, 0))
 			goto fail_unlock;
 	}
 found:
@@ -283,7 +287,7 @@ static struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash = udp_hashfn(net, hnum);
+	unsigned int hash = udp_hashfn(net, hnum, udptable->mask);
 	struct udp_hslot *hslot = &udptable->hash[hash];
 	int score, badness;
@@ -1013,8 +1017,8 @@ void udp_lib_unhash(struct sock *sk)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = sk->sk_prot->h.udp_table;
-		unsigned int hash = udp_hashfn(sock_net(sk), sk->sk_hash);
-		struct udp_hslot *hslot = &udptable->hash[hash];
+		struct udp_hslot *hslot = udp_hashslot(udptable, sock_net(sk),
+						       sk->sk_hash);
 
 		spin_lock_bh(&hslot->lock);
 		if (sk_nulls_del_node_init_rcu(sk)) {
@@ -1169,7 +1173,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udp_table *udptable)
 {
 	struct sock *sk;
-	struct udp_hslot *hslot = &udptable->hash[udp_hashfn(net, ntohs(uh->dest))];
+	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
 	spin_lock(&hslot->lock);
@@ -1609,9 +1613,14 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 	struct udp_iter_state *state = seq->private;
 	struct net *net = seq_file_net(seq);
 
-	for (state->bucket = start; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
+	for (state->bucket = start; state->bucket <= state->udp_table->mask;
+	     ++state->bucket) {
 		struct hlist_nulls_node *node;
 		struct udp_hslot *hslot = &state->udp_table->hash[state->bucket];
+
+		if (hlist_nulls_empty(&hslot->head))
+			continue;
+
 		spin_lock_bh(&hslot->lock);
 		sk_nulls_for_each(sk, node, &hslot->head) {
 			if (!net_eq(sock_net(sk), net))
@@ -1636,7 +1645,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 	} while (sk && (!net_eq(sock_net(sk), net) || sk->sk_family != state->family));
 
 	if (!sk) {
-		if (state->bucket < UDP_HTABLE_SIZE)
+		if (state->bucket <= state->udp_table->mask)
 			spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
 		return udp_get_first(seq, state->bucket + 1);
 	}
@@ -1656,7 +1665,7 @@ static struct sock *udp_get_idx(struct seq_file *seq, loff_t pos)
 static void *udp_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct udp_iter_state *state = seq->private;
-	state->bucket = UDP_HTABLE_SIZE;
+	state->bucket = MAX_UDP_PORTS;
 
 	return *pos ? udp_get_idx(seq, *pos-1) : SEQ_START_TOKEN;
 }
@@ -1678,7 +1687,7 @@ static void udp_seq_stop(struct seq_file *seq, void *v)
 {
 	struct udp_iter_state *state = seq->private;
 
-	if (state->bucket < UDP_HTABLE_SIZE)
+	if (state->bucket <= state->udp_table->mask)
 		spin_unlock_bh(&state->udp_table->hash[state->bucket].lock);
 }
@@ -1738,7 +1747,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 	__u16 destp	  = ntohs(inet->dport);
 	__u16 srcp	  = ntohs(inet->sport);
 
-	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
+	seq_printf(f, "%5d: %08X:%04X %08X:%04X"
 		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d%n",
 		bucket, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
@@ -1804,11 +1813,43 @@ void udp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */
 
-void __init udp_table_init(struct udp_table *table)
+static __initdata unsigned long uhash_entries;
+static int __init set_uhash_entries(char *str)
 {
-	int i;
+	if (!str)
+		return 0;
+	uhash_entries = simple_strtoul(str, &str, 0);
+	if (uhash_entries && uhash_entries < UDP_HTABLE_SIZE_MIN)
+		uhash_entries = UDP_HTABLE_SIZE_MIN;
+	return 1;
+}
+__setup("uhash_entries=", set_uhash_entries);
 
-	for (i = 0; i < UDP_HTABLE_SIZE; i++) {
+void __init udp_table_init(struct udp_table *table, const char *name)
+{
+	unsigned int i;
+
+	if (!CONFIG_BASE_SMALL)
+		table->hash = alloc_large_system_hash(name,
+			sizeof(struct udp_hslot),
+			uhash_entries,
+			21, /* one slot per 2 MB */
+			0,
+			&table->log,
+			&table->mask,
+			64 * 1024);
+	/*
+	 * Make sure hash table has the minimum size
+	 */
+	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
+		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
+				      sizeof(struct udp_hslot), GFP_KERNEL);
+		if (!table->hash)
+			panic(name);
+		table->log = ilog2(UDP_HTABLE_SIZE_MIN);
+		table->mask = UDP_HTABLE_SIZE_MIN - 1;
+	}
+	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_NULLS_HEAD(&table->hash[i].head, i);
 		spin_lock_init(&table->hash[i].lock);
 	}
@@ -1818,7 +1859,7 @@ void __init udp_init(void)
 {
 	unsigned long nr_pages, limit;
 
-	udp_table_init(&udp_table);
+	udp_table_init(&udp_table, "UDP");
 
 	/* Set the pressure threshold up by the same strategy of TCP. It is a
 	 * fraction of global memory that is up to 1/2 at 256 MB, decreasing
 	 * toward zero with the amount of memory, with a floor of 128 pages.
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 95248d7..a495ca8 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -12,7 +12,7 @@
  */
 #include "udp_impl.h"
 
-struct udp_table	udplite_table;
+struct udp_table	udplite_table __read_mostly;
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
@@ -110,7 +110,7 @@ static inline int udplite4_proc_init(void)
 
 void __init udplite4_register(void)
 {
-	udp_table_init(&udplite_table);
+	udp_table_init(&udplite_table, "UDP-Lite");
 	if (proto_register(&udplite_prot, 1))
 		goto out_register_err;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3a60f12..d42f503 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -132,7 +132,7 @@ static struct sock *__udp6_lib_lookup(struct net *net,
 	struct sock *sk, *result;
 	struct hlist_nulls_node *node;
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash = udp_hashfn(net, hnum);
+	unsigned int hash = udp_hashfn(net, hnum, udptable->mask);
 	struct udp_hslot *hslot = &udptable->hash[hash];
 	int score, badness;
@@ -452,7 +452,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 {
 	struct sock *sk, *sk2;
 	const struct udphdr *uh = udp_hdr(skb);
-	struct udp_hslot *hslot = &udptable->hash[udp_hashfn(net, ntohs(uh->dest))];
+	struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));
 	int dif;
 
 	spin_lock(&hslot->lock);
@@ -1195,7 +1195,7 @@ static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket
 	destp = ntohs(inet->dport);
 	srcp  = ntohs(inet->sport);
 	seq_printf(seq,
-		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
+		   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
		   bucket,
		   src->s6_addr32[0], src->s6_addr32[1],
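To make the new boot parameter concrete, a user-space mock (illustration
only) of set_uhash_entries() above, plus the power-of-two rounding that
alloc_large_system_hash() is assumed to apply, showing the log/mask pair
stored in struct udp_table. Booting with, say, uhash_entries=1024 would
give 1024 slots.

/* Mock of uhash_entries= handling: clamp to UDP_HTABLE_SIZE_MIN, round
 * up to a power of two, cap at 64K. Illustration, not kernel code. */
#include <stdio.h>

#define UDP_HTABLE_SIZE_MIN 256		/* the !CONFIG_BASE_SMALL case */

int main(void)
{
	unsigned long requested[] = { 100, 1000, 70000 };

	for (int i = 0; i < 3; i++) {
		unsigned long want = requested[i];
		unsigned int size = UDP_HTABLE_SIZE_MIN, log = 8;

		if (want < UDP_HTABLE_SIZE_MIN)	/* set_uhash_entries() clamp */
			want = UDP_HTABLE_SIZE_MIN;
		while (size < want && size < 64 * 1024) {	/* high limit */
			size <<= 1;
			log++;
		}
		printf("uhash_entries=%-5lu -> %5u slots (log=%2u, mask=0x%x)\n",
		       requested[i], size, log, size - 1);
	}
	return 0;	/* 100 -> 256; 1000 -> 1024; 70000 -> 65536 */
}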
* Re: [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
  From: David Miller @ 2009-10-07 10:47 UTC
  To: eric.dumazet
  Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 12:37:59 +0200

> David Miller wrote:
>>
>> That's incredible that it's been that low for so long :-)
>>
>> But please, dynamically size this thing, maybe with a cap of say 64K to
>> start with. If you don't have time for it I'll take care of this.
>
> Here we are.
 ...
> [PATCH] udp: dynamically size hash tables at boot time

Looks good to me. I'll let this sit for at least a day to get some review
from others.

Thanks!
* Re: [PATCH net-next-2.6] udp: dynamically size hash tables at boot time
  From: David Miller @ 2009-10-08  5:01 UTC
  To: eric.dumazet
  Cc: rick.jones2, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Oct 2009 12:37:59 +0200

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
* Re: [RFC net-next-2.6] net: speedup sk_wake_async()
  From: Rick Jones @ 2009-10-07 15:53 UTC
  To: Eric Dumazet
  Cc: David S. Miller, Linux Netdev List

Eric Dumazet wrote:
> Rick Jones wrote:
>
>> How about 64-bit?
>
> No data yet, but a larger footprint, unfortunately :-(

True - nothing comes for free. I'm not "in touch" with the embedded side,
where I presume 32-bit is, or soon will be, the primary bitness, but over
in the server side of the world, at least the part I see, 64-bit is de
rigueur - hence my curiosity.

>> Got any netperf service demand changes?
>
> I was going to set up a bench lab, with a typical RTP media server, with say
> 4000 UDP sockets, 2000 sockets exchanging 50 G.711 A-law/u-law messages per
> second tx and rx (total: 100,000 packets per second each way).
>
> Is netperf able to simulate this workload ?

Touché :)  It would be, well, cumbersome with netperf2, but possible. One
would ./configure --enable-intervals and then run some variation of:

netperf -t UDP_STREAM -l <time> -H <remote> -b <burst size> -w <burst interval> -- -m <message size>

a large number of times. Given the lack of test synchronization in netperf2,
I probably would not try to aggregate the results of N thousand simultaneous
netperf2 instances, and would rely instead on external (relative to netperf)
packet rate reports.

Still, if the cache miss removed is a non-trivial fraction of the overhead,
I would think that something like:

netperf -t UDP_RR -l <time> -I 99,0.5 -i 30,3 -c -C -H <remote> -- -r 4

run with and without the change would show a difference in the service
demand, and if you hit the confidence intervals you would be able, per the
above, to be confident in the "reality" of a CPU utilization difference of
+/- 0.25%.

Getting that test to that level of confidence probably means pinning the NIC
interrupts to a specific CPU and then binding netperf/netserver on either
side using the global -T option.

Barring getting suitable confidence intervals, somewhere in the middle of
all that would be ./configure --enable-burst and then, still with pinning
and binding for "stability", something like:

netperf -t UDP_RR -l <time> -I 99,0.5 -i 30,3 -H <remote> -- -r 4 -b <burst>

to put multiple transactions in flight across that flow - choosing <burst>
to take the CPU on which either netperf, netserver, or the interrupts are
running to 100% saturation. Here I left off the CPU utilization since that
is often the thing that cannot hit the confidence intervals, and leave the
aggregate throughput as the proxy for efficiency change - which is why
<burst> needs to take something to saturation in each case.

happy benchmarking,

rick jones