* [PATCH] Bound TSO defer time (resend) @ 2006-10-17 0:53 John Heffner 2006-10-17 3:20 ` Stephen Hemminger 0 siblings, 1 reply; 32+ messages in thread From: John Heffner @ 2006-10-17 0:53 UTC (permalink / raw) To: netdev The original message didn't show up on the list. I'm assuming it's because the filters didn't like the attached postscript. I posted PDFs of the figures on the web: http://www.psc.edu/~jheffner/tmp/a.pdf http://www.psc.edu/~jheffner/tmp/b.pdf http://www.psc.edu/~jheffner/tmp/c.pdf -John ---------- Forwarded message ---------- Date: Mon, 16 Oct 2006 15:55:53 -0400 (EDT) From: John Heffner <jheffner@psc.edu> To: David Miller <davem@davemloft.net> Cc: netdev <netdev@vger.kernel.org> Subject: [PATCH] Bound TSO defer time This patch limits the amount of time you will defer sending a TSO segment to less than two clock ticks, or the time between two acks, whichever is longer. On slow links, deferring causes significant bursts. See attached plots, which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness causes significant jitter, tends to overflow queues early (bad for short queues), and makes delay-based congestion control more difficult. Deferring by a couple clock ticks I believe will have a relatively small impact on performance. Signed-off-by: John Heffner <jheffner@psc.edu> diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 0e058a2..27ae4b2 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -341,7 +341,9 @@ #endif int linger2; unsigned long last_synq_overflow; - + + __u32 tso_deferred; + /* Receiver side RTT estimation */ struct { __u32 rtt; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 9a253fa..3ea8973 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1087,11 +1087,15 @@ static int tcp_tso_should_defer(struct s u32 send_win, cong_win, limit, in_flight; if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) - return 0; + goto send_now; if (icsk->icsk_ca_state != TCP_CA_Open) - return 0; + goto send_now; + /* Defer for less than two clock ticks. */ + if (!tp->tso_deferred && ((jiffies<<1)>>1) - (tp->tso_deferred>>1) > 1) + goto send_now; + in_flight = tcp_packets_in_flight(tp); BUG_ON(tcp_skb_pcount(skb) <= 1 || @@ -1106,8 +1110,8 @@ static int tcp_tso_should_defer(struct s /* If a full-sized TSO skb can be sent, do it. */ if (limit >= 65536) - return 0; - + goto send_now; + if (sysctl_tcp_tso_win_divisor) { u32 chunk = min(tp->snd_wnd, tp->snd_cwnd * tp->mss_cache); @@ -1116,7 +1120,7 @@ static int tcp_tso_should_defer(struct s */ chunk /= sysctl_tcp_tso_win_divisor; if (limit >= chunk) - return 0; + goto send_now; } else { /* Different approach, try not to defer past a single * ACK. Receiver should ACK every other full sized @@ -1124,11 +1128,17 @@ static int tcp_tso_should_defer(struct s * then send now. */ if (limit > tcp_max_burst(tp) * tp->mss_cache) - return 0; + goto send_now; } - + /* Ok, it looks like it is advisable to defer. */ + tp->tso_deferred = 1 | (jiffies<<1); + return 1; + +send_now: + tp->tso_deferred = 0; + return 0; } /* Create a new MTU probe if we are ready. ^ permalink raw reply related [flat|nested] 32+ messages in thread
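The trickiest part of the patch above is how tso_deferred packs two things into one 32-bit word: the low bit records that a deferral is in progress, and the upper 31 bits record jiffies at the moment deferral started, which is why both sides of the comparison are shifted left and then right by one. Below is a minimal user-space sketch of that encoding and of the "give up once more than one tick has elapsed" test the changelog describes; the helper names are illustrative, not from the kernel tree.

#include <stdint.h>
#include <stdbool.h>

/* Start a deferral: low bit = "deferring", upper 31 bits = jiffies now. */
static uint32_t defer_start(uint32_t jiffies_now)
{
        return 1u | (jiffies_now << 1);
}

/* True once a recorded deferral has outlived the two-tick budget from the
 * changelog, i.e. more than one full jiffy has elapsed since it started.
 * Both values are truncated to 31 bits, mirroring the shifts in the patch. */
static bool defer_expired(uint32_t deferred, uint32_t jiffies_now)
{
        if (!deferred)
                return false;   /* no deferral recorded yet */
        return ((jiffies_now << 1) >> 1) - (deferred >> 1) > 1;
}

In the patch itself the check sits at the top of tcp_tso_should_defer(), and the send_now path clears tso_deferred so the next deferral starts a fresh budget.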
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-17 0:53 [PATCH] Bound TSO defer time (resend) John Heffner @ 2006-10-17 3:20 ` Stephen Hemminger 2006-10-17 4:18 ` John Heffner 0 siblings, 1 reply; 32+ messages in thread From: Stephen Hemminger @ 2006-10-17 3:20 UTC (permalink / raw) To: John Heffner; +Cc: netdev On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) John Heffner <jheffner@psc.edu> wrote: > The original message didn't show up on the list. I'm assuming it's > because the filters didn't like the attached postscript. I posted PDFs of > the figures on the web: > > http://www.psc.edu/~jheffner/tmp/a.pdf > http://www.psc.edu/~jheffner/tmp/b.pdf > http://www.psc.edu/~jheffner/tmp/c.pdf > > -John > > > ---------- Forwarded message ---------- > Date: Mon, 16 Oct 2006 15:55:53 -0400 (EDT) > From: John Heffner <jheffner@psc.edu> > To: David Miller <davem@davemloft.net> > Cc: netdev <netdev@vger.kernel.org> > Subject: [PATCH] Bound TSO defer time > > This patch limits the amount of time you will defer sending a TSO segment > to less than two clock ticks, or the time between two acks, whichever is > longer. > > On slow links, deferring causes significant bursts. See attached plots, > which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue > for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness > causes significant jitter, tends to overflow queues early (bad for short > queues), and makes delay-based congestion control more difficult. > > Deferring by a couple clock ticks I believe will have a relatively small > impact on performance. > > > Signed-off-by: John Heffner <jheffner@psc.edu> Okay, but doing any timing on clock ticks makes the behavior dependent on the value of HZ which doesn't seem desirable. Should this be based on RTT or a real-time values? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-17 3:20 ` Stephen Hemminger @ 2006-10-17 4:18 ` John Heffner 2006-10-17 5:35 ` David Miller 2006-10-18 15:37 ` [PATCH] Bound TSO defer time (resend) Andi Kleen 0 siblings, 2 replies; 32+ messages in thread From: John Heffner @ 2006-10-17 4:18 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev Stephen Hemminger wrote: > On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) > John Heffner <jheffner@psc.edu> wrote: >> This patch limits the amount of time you will defer sending a TSO segment >> to less than two clock ticks, or the time between two acks, whichever is >> longer. > > Okay, but doing any timing on clock ticks makes the behavior dependent > on the value of HZ which doesn't seem desirable. Should this be based > on RTT or a real-time values? It would be nice to use a high res clock so you don't depend on HZ, but this is still expensive on most SMP arch's as I understand it. -John ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-17 4:18 ` John Heffner @ 2006-10-17 5:35 ` David Miller 2006-10-17 12:22 ` John Heffner 2006-10-17 12:58 ` [PATCH] [NET] Size listen hash tables using backlog hint Eric Dumazet Hi 2006-10-18 15:37 ` [PATCH] Bound TSO defer time (resend) Andi Kleen 1 sibling, 2 replies; 32+ messages in thread From: David Miller @ 2006-10-17 5:35 UTC (permalink / raw) To: jheffner; +Cc: shemminger, netdev From: John Heffner <jheffner@psc.edu> Date: Tue, 17 Oct 2006 00:18:33 -0400 > Stephen Hemminger wrote: > > On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) > > John Heffner <jheffner@psc.edu> wrote: > > >> This patch limits the amount of time you will defer sending a TSO segment > >> to less than two clock ticks, or the time between two acks, whichever is > >> longer. > > > > > Okay, but doing any timing on clock ticks makes the behavior dependent > > on the value of HZ which doesn't seem desirable. Should this be based > > on RTT or a real-time values? > > It would be nice to use a high res clock so you don't depend on HZ, but > this is still expensive on most SMP arch's as I understand it. Right so we do need to use a jiffies based solution. Since HZ is variable, I have a feeling that the thing to do here is pick some timeout in msec. Then replace the "2 clock ticks" with some msec_to_jiffies() calls, bottoming out at 1 jiffie. How does that sound? ^ permalink raw reply [flat|nested] 32+ messages in thread
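A minimal sketch of that suggestion, assuming a hypothetical TSO_DEFER_MS constant (the thread never settles on a number) and the kernel's msecs_to_jiffies() helper; clamping to one jiffy keeps the bound meaningful at low HZ:

/* Hypothetical, HZ-independent variant of the two-tick budget. */
#define TSO_DEFER_MS    2       /* illustrative value only */

static unsigned long tso_defer_budget_jiffies(void)
{
        unsigned long j = msecs_to_jiffies(TSO_DEFER_MS);

        return j ? j : 1;       /* bottom out at one jiffy, as suggested */
}

The deferral check would then compare the stored jiffies stamp against this budget instead of the hard-coded "> 1".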
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-17 5:35 ` David Miller @ 2006-10-17 12:22 ` John Heffner 2006-10-19 3:39 ` David Miller 2006-10-17 12:58 ` [PATCH] [NET] Size listen hash tables using backlog hint Eric Dumazet Hi 1 sibling, 1 reply; 32+ messages in thread From: John Heffner @ 2006-10-17 12:22 UTC (permalink / raw) To: David Miller; +Cc: shemminger, netdev David Miller wrote: > From: John Heffner <jheffner@psc.edu> > Date: Tue, 17 Oct 2006 00:18:33 -0400 > >> Stephen Hemminger wrote: >>> On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) >>> John Heffner <jheffner@psc.edu> wrote: >>>> This patch limits the amount of time you will defer sending a TSO segment >>>> to less than two clock ticks, or the time between two acks, whichever is >>>> longer. >>> Okay, but doing any timing on clock ticks makes the behavior dependent >>> on the value of HZ which doesn't seem desirable. Should this be based >>> on RTT or a real-time values? >> It would be nice to use a high res clock so you don't depend on HZ, but >> this is still expensive on most SMP arch's as I understand it. > > Right so we do need to use a jiffies based solution. > > Since HZ is variable, I have a feeling that the thing to do here > is pick some timeout in msec. Then replace the "2 clock ticks" > with some msec_to_jiffies() calls, bottoming out at 1 jiffie. > > How does that sound? That's actually how I originally coded it. :) But then it occurred to me that if you've already been waiting for a full clock tick, the marginal CPU savings of waiting longer will not be great. Which is why I chose the value of 2 ticks so you're guaranteed to have waited at least one full tick. -John ^ permalink raw reply [flat|nested] 32+ messages in thread
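To make the two-tick argument concrete: jiffies only identifies which tick interval you are in, so a deferral recorded at jiffies == N may actually have begun almost a full tick period after tick N fired. Insisting that the jiffies difference exceed 1 means deferral stops being allowed only once jiffies reaches N + 2, by which point at least one complete tick period, and at most two, has really elapsed; that is the guarantee John describes.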
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-17 12:22 ` John Heffner @ 2006-10-19 3:39 ` David Miller 0 siblings, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-19 3:39 UTC (permalink / raw) To: jheffner; +Cc: shemminger, netdev From: John Heffner <jheffner@psc.edu> Date: Tue, 17 Oct 2006 08:22:11 -0400 > That's actually how I originally coded it. :) But then it occurred to > me that if you've already been waiting for a full clock tick, the > marginal CPU savings of waiting longer will not be great. Which is why > I chose the value of 2 ticks so you're guaranteed to have waited at > least one full tick. Fair enough, patch applied, thanks. BTW, like some others using Thunderbird to send patches, lines with nothing but spaces are being corrupted into fully empty lines. In fact, in your patch some trailing whitespace of existing code lines was also eliminated by Thunderbird, further corrupting the patch. I fixed this all up by hand, but please try to get this fixed up for future submissions. Thanks. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-17 5:35 ` David Miller 2006-10-17 12:22 ` John Heffner @ 2006-10-17 12:58 ` Eric Dumazet Hi 2006-10-18 7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet ` (2 more replies) 1 sibling, 3 replies; 32+ messages in thread From: Eric Dumazet Hi @ 2006-10-17 12:58 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 1891 bytes --] Hi David We currently allocate a fixed size 512 (TCP_SYNQ_HSIZE) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (/proc/sys/net/core/somaxconn , default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) - and available LOWMEM ram reqsk_queue_alloc() goal is to use a power of two size for the whole listen_sock structure, to avoid wasting memory for large backlogs, meaning the hash table nr_table_entries is not anymore a power of two. (Hence one AND (nr_table_entries - 1) must be replaced by MODULO nr_table_entries) We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). In case memory allocation has problems, reqsk_queue_alloc() reduces the size of the hash table to allow a successfull listen() call, without giving feedback to user application, as this 'backlog' was advisory. Thank you include/net/request_sock.h | 8 ++++---- include/net/tcp.h | 1 - net/core/request_sock.c | 39 +++++++++++++++++++++++++++++---------- net/dccp/ipv4.c | 2 +- net/dccp/proto.c | 6 +++--- net/ipv4/af_inet.c | 2 +- net/ipv4/inet_connection_sock.c | 8 +++++--- net/ipv4/tcp_ipv4.c | 6 +++--- net/ipv6/tcp_ipv6.c | 2 +- 9 files changed, 47 insertions(+), 27 deletions(-) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: size_listen_hash_table.patch --] [-- Type: text/plain, Size: 7676 bytes --] --- linux-2.6.19-rc2/net/core/request_sock.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/core/request_sock.c 2006-10-17 14:47:48.000000000 +0200 @@ -29,29 +29,48 @@ * it is absolutely not enough even at 100conn/sec. 256 cures most * of problems. This value is adjusted to 128 for very small machines * (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb). - * Further increasing requires to change hash table size. 
*/ int sysctl_max_syn_backlog = 256; int reqsk_queue_alloc(struct request_sock_queue *queue, - const int nr_table_entries) + u32 nr_entries) { - const int lopt_size = sizeof(struct listen_sock) + - nr_table_entries * sizeof(struct request_sock *); - struct listen_sock *lopt = kzalloc(lopt_size, GFP_KERNEL); + struct listen_sock *lopt; + size_t size = sizeof(struct listen_sock); - if (lopt == NULL) - return -ENOMEM; + nr_entries = min_t(u32, nr_entries, sysctl_max_syn_backlog); + nr_entries = max_t(u32, nr_entries, 8); + size += nr_entries*sizeof(struct request_sock *); + size = roundup_pow_of_two(size); + while (1) { + lopt = kzalloc(size, GFP_KERNEL); + if (lopt != NULL) + break; + size >>= 1; + if (size < sizeof(struct listen_sock) + + 8 * sizeof(struct request_sock *)) + return -ENOMEM; + } + lopt->nr_table_entries = (size - sizeof(struct listen_sock)) / + sizeof(struct request_sock *); - for (lopt->max_qlen_log = 6; - (1 << lopt->max_qlen_log) < sysctl_max_syn_backlog; + /* + * max_qlen_log computation is based on the backlog (nr_entries), + * not on actual hash size (lopt->nr_table_entries). + */ + for (lopt->max_qlen_log = 3; + (1 << lopt->max_qlen_log) < nr_entries; lopt->max_qlen_log++); get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd)); rwlock_init(&queue->syn_wait_lock); queue->rskq_accept_head = NULL; - lopt->nr_table_entries = nr_table_entries; + /* + * This write_lock_bh()/write_unlock_bh() pair forces this CPU to commit + * its memory changes and let readers (which acquire syn_wait_lock in + * reader mode) operate without seeing random content. + */ write_lock_bh(&queue->syn_wait_lock); queue->listen_opt = lopt; write_unlock_bh(&queue->syn_wait_lock); --- linux-2.6.19-rc2/net/ipv4/af_inet.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/ipv4/af_inet.c 2006-10-17 10:32:22.000000000 +0200 @@ -204,7 +204,7 @@ * we can only allow the backlog to be adjusted. 
*/ if (old_state != TCP_LISTEN) { - err = inet_csk_listen_start(sk, TCP_SYNQ_HSIZE); + err = inet_csk_listen_start(sk, backlog); if (err) goto out; } --- linux-2.6.19-rc2/net/ipv4/tcp_ipv4.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/ipv4/tcp_ipv4.c 2006-10-17 12:19:38.000000000 +0200 @@ -715,7 +715,7 @@ return dopt; } -struct request_sock_ops tcp_request_sock_ops = { +struct request_sock_ops tcp_request_sock_ops __read_mostly = { .family = PF_INET, .obj_size = sizeof(struct tcp_request_sock), .rtx_syn_ack = tcp_v4_send_synack, @@ -1385,7 +1385,7 @@ if (st->state == TCP_SEQ_STATE_OPENREQ) { struct request_sock *req = cur; - icsk = inet_csk(st->syn_wait_sk); + icsk = inet_csk(st->syn_wait_sk); req = req->dl_next; while (1) { while (req) { @@ -1395,7 +1395,7 @@ } req = req->dl_next; } - if (++st->sbucket >= TCP_SYNQ_HSIZE) + if (++st->sbucket >= icsk->icsk_accept_queue.listen_opt->nr_table_entries) break; get_req: req = icsk->icsk_accept_queue.listen_opt->syn_table[st->sbucket]; --- linux-2.6.19-rc2/net/dccp/proto.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/dccp/proto.c 2006-10-17 10:32:22.000000000 +0200 @@ -262,12 +262,12 @@ EXPORT_SYMBOL_GPL(dccp_destroy_sock); -static inline int dccp_listen_start(struct sock *sk) +static inline int dccp_listen_start(struct sock *sk, int backlog) { struct dccp_sock *dp = dccp_sk(sk); dp->dccps_role = DCCP_ROLE_LISTEN; - return inet_csk_listen_start(sk, TCP_SYNQ_HSIZE); + return inet_csk_listen_start(sk, backlog); } int dccp_disconnect(struct sock *sk, int flags) @@ -788,7 +788,7 @@ * FIXME: here it probably should be sk->sk_prot->listen_start * see tcp_listen_start */ - err = dccp_listen_start(sk); + err = dccp_listen_start(sk, backlog); if (err) goto out; } --- linux-2.6.19-rc2/net/dccp/ipv4.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/dccp/ipv4.c 2006-10-17 10:44:21.000000000 +0200 @@ -1020,7 +1020,7 @@ kfree(inet_rsk(req)->opt); } -static struct request_sock_ops dccp_request_sock_ops = { +static struct request_sock_ops dccp_request_sock_ops _read_mostly = { .family = PF_INET, .obj_size = sizeof(struct dccp_request_sock), .rtx_syn_ack = dccp_v4_send_response, --- linux-2.6.19-rc2/net/ipv6/tcp_ipv6.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/ipv6/tcp_ipv6.c 2006-10-17 10:44:21.000000000 +0200 @@ -526,7 +526,7 @@ kfree_skb(inet6_rsk(req)->pktopts); } -static struct request_sock_ops tcp6_request_sock_ops = { +static struct request_sock_ops tcp6_request_sock_ops _read_mostly = { .family = AF_INET6, .obj_size = sizeof(struct tcp6_request_sock), .rtx_syn_ack = tcp_v6_send_synack, --- linux-2.6.19-rc2/net/ipv4/inet_connection_sock.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/ipv4/inet_connection_sock.c 2006-10-17 10:32:22.000000000 +0200 @@ -343,9 +343,9 @@ EXPORT_SYMBOL_GPL(inet_csk_route_req); static inline u32 inet_synq_hash(const __be32 raddr, const __be16 rport, - const u32 rnd, const u16 synq_hsize) + const u32 rnd, const u32 synq_hsize) { - return jhash_2words((__force u32)raddr, (__force u32)rport, rnd) & (synq_hsize - 1); + return jhash_2words((__force u32)raddr, (__force u32)rport, rnd) % synq_hsize; } #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) @@ -478,7 +478,9 @@ reqp = &req->dl_next; } - i = (i + 1) & (lopt->nr_table_entries - 1); + i++; + if (i == lopt->nr_table_entries) + i = 0; } while (--budget > 0); --- linux-2.6.19-rc2/include/net/tcp.h 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/include/net/tcp.h 2006-10-17 
10:51:51.000000000 +0200 @@ -138,7 +138,6 @@ #define MAX_TCP_SYNCNT 127 #define TCP_SYNQ_INTERVAL (HZ/5) /* Period of SYNACK timer */ -#define TCP_SYNQ_HSIZE 512 /* Size of SYNACK hash table */ #define TCP_PAWS_24DAYS (60 * 60 * 24 * 24) #define TCP_PAWS_MSL 60 /* Per-host timestamps are invalidated --- linux-2.6.19-rc2/include/net/request_sock.h 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/include/net/request_sock.h 2006-10-17 12:33:18.000000000 +0200 @@ -28,8 +28,8 @@ struct request_sock_ops { int family; - kmem_cache_t *slab; int obj_size; + kmem_cache_t *slab; int (*rtx_syn_ack)(struct sock *sk, struct request_sock *req, struct dst_entry *dst); @@ -51,12 +51,12 @@ u32 rcv_wnd; /* rcv_wnd offered first time */ u32 ts_recent; unsigned long expires; - struct request_sock_ops *rsk_ops; + const struct request_sock_ops *rsk_ops; struct sock *sk; u32 secid; }; -static inline struct request_sock *reqsk_alloc(struct request_sock_ops *ops) +static inline struct request_sock *reqsk_alloc(const struct request_sock_ops *ops) { struct request_sock *req = kmem_cache_alloc(ops->slab, SLAB_ATOMIC); @@ -120,7 +120,7 @@ }; extern int reqsk_queue_alloc(struct request_sock_queue *queue, - const int nr_table_entries); + unsigned int nr_table_entries); static inline struct listen_sock *reqsk_queue_yank_listen_sk(struct request_sock_queue *queue) { ^ permalink raw reply [flat|nested] 32+ messages in thread
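The sizing logic the changelog describes, modelled in stand-alone C so the "not a power of two any more" point is visible. The 8-entry floor and the tcp_max_syn_backlog cap come from the patch; the names below are illustrative, and the kernel patch additionally retries with a smaller table when kzalloc() fails, which this sketch omits.

#include <stddef.h>
#include <stdint.h>

static size_t round_up_pow2(size_t x)
{
        size_t p = 1;

        while (p < x)
                p <<= 1;
        return p;
}

/* How many request_sock pointers fit once the whole listen_sock allocation
 * (header + N slots, allocated as one block) is rounded to a power of two. */
static uint32_t synq_slots(uint32_t backlog, uint32_t max_syn_backlog,
                           size_t header, size_t slot)
{
        uint32_t n = backlog;
        size_t size;

        if (n > max_syn_backlog)
                n = max_syn_backlog;    /* tunable cap */
        if (n < 8)
                n = 8;                  /* floor from the patch */
        size = round_up_pow2(header + n * slot);
        /* The table gets whatever room is left after the header, so the
         * slot count is generally *not* a power of two -- hence the switch
         * from "hash & (entries - 1)" to "hash % entries" in the patch. */
        return (size - header) / slot;
}

For example, assuming a 32-byte header and 8-byte pointers, a backlog of 256 gives 32 + 256*8 = 2080 bytes, rounded up to 4096, leaving room for 508 slots: not a power of two, which is what triggers the modulus discussion below.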
* [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS 2006-10-17 12:58 ` [PATCH] [NET] Size listen hash tables using backlog hint Eric Dumazet Hi @ 2006-10-18 7:38 ` Eric Dumazet 2006-10-18 16:35 ` [PATCH] [NET] reduce per cpu ram used for loopback stats Eric Dumazet ` (2 more replies) 2006-10-19 3:31 ` [PATCH] [NET] Size listen hash tables using backlog hint David Miller 2006-10-19 9:27 ` Eric Dumazet 2 siblings, 3 replies; 32+ messages in thread From: Eric Dumazet @ 2006-10-18 7:38 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 261 bytes --] Hi David Lot of routers still use CPUS with 32 bytes cache lines. (Intel PIII) It make sense to make sure fields used at lookup time are in the same cache line, to reduce cache footprint and speedup lookups. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: inetpeer_speedup.patch --] [-- Type: text/plain, Size: 773 bytes --] --- linux/include/net/inetpeer.h 2006-10-18 09:30:12.000000000 +0200 +++ linux-ed/include/net/inetpeer.h 2006-10-18 09:32:17.000000000 +0200 @@ -17,14 +17,15 @@ struct inet_peer { + /* group together avl_left,avl_right,v4daddr to speedup lookups */ struct inet_peer *avl_left, *avl_right; + __u32 v4daddr; /* peer's address */ + __u16 avl_height; + __u16 ip_id_count; /* IP ID for the next packet */ struct inet_peer *unused_next, **unused_prevp; __u32 dtime; /* the time of last use of not * referenced entries */ atomic_t refcnt; - __u32 v4daddr; /* peer's address */ - __u16 avl_height; - __u16 ip_id_count; /* IP ID for the next packet */ atomic_t rid; /* Frag reception counter */ __u32 tcp_ts; unsigned long tcp_ts_stamp; ^ permalink raw reply [flat|nested] 32+ messages in thread
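A quick way to sanity-check the point of the reordering: on both 32-bit and 64-bit layouts the lookup fields (plus the two small fields that ride along) now end within the first 32 bytes of the object. A user-space approximation, assuming a straight mapping of the struct above:

#include <stddef.h>
#include <stdint.h>
#include <assert.h>

struct inet_peer_layout {               /* illustrative copy, not the kernel struct */
        struct inet_peer_layout *avl_left, *avl_right;
        uint32_t v4daddr;
        uint16_t avl_height;
        uint16_t ip_id_count;
        /* colder fields (unused list, dtime, refcnt, ...) follow */
};

/* 2 pointers + 4 + 2 + 2 bytes: 16 on 32-bit, 24 on 64-bit -- both fit in a
 * 32-byte cache line, which is the whole point of the patch. */
static_assert(offsetof(struct inet_peer_layout, ip_id_count) +
              sizeof(uint16_t) <= 32,
              "lookup fields spill out of a 32-byte line");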
* [PATCH] [NET] reduce per cpu ram used for loopback stats 2006-10-18 7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet @ 2006-10-18 16:35 ` Eric Dumazet 2006-10-18 17:00 ` [PATCH, resent] " Eric Dumazet 2006-10-19 3:53 ` [PATCH] " David Miller 2006-10-19 3:44 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS David Miller 2006-10-19 10:57 ` Eric Dumazet 2 siblings, 2 replies; 32+ messages in thread From: Eric Dumazet @ 2006-10-18 16:35 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 276 bytes --] We dont need a full struct net_device_stats (currently 23 long : 184 bytes on x86_64) per possible CPU, but only two counters : bytes and packets We save few CPU cycles too in loopback_xmit() not updating 4 fields, but 2. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: loopback.patch --] [-- Type: text/plain, Size: 1893 bytes --] --- linux/drivers/net/loopback.c 2006-10-18 17:28:20.000000000 +0200 +++ linux-eddrivers/net/loopback.c 2006-10-18 18:26:41.000000000 +0200 @@ -58,7 +58,11 @@ #include <linux/tcp.h> #include <linux/percpu.h> -static DEFINE_PER_CPU(struct net_device_stats, loopback_stats); +struct pcpu_lstats { + unsigned long packets; + unsigned long bytes; +}; +static DEFINE_PER_CPU(struct pcpu_lstats, pcpu_lstats); #define LOOPBACK_OVERHEAD (128 + MAX_HEADER + 16 + 16) @@ -128,7 +132,7 @@ */ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev) { - struct net_device_stats *lb_stats; + struct pcpu_lstats *lb_stats; skb_orphan(skb); @@ -149,11 +153,9 @@ #endif dev->last_rx = jiffies; - lb_stats = &per_cpu(loopback_stats, get_cpu()); - lb_stats->rx_bytes += skb->len; - lb_stats->tx_bytes = lb_stats->rx_bytes; - lb_stats->rx_packets++; - lb_stats->tx_packets = lb_stats->rx_packets; + lb_stats = &per_cpu(pcpu_lstats, get_cpu()); + lb_stats->bytes += skb->len; + lb_stats->packets++; put_cpu(); netif_rx(skb); @@ -166,20 +168,21 @@ static struct net_device_stats *get_stats(struct net_device *dev) { struct net_device_stats *stats = &loopback_stats; + unsigned long bytes = 0; + unsigned long packets = 0; int i; - memset(stats, 0, sizeof(struct net_device_stats)); - for_each_possible_cpu(i) { - struct net_device_stats *lb_stats; + const struct pcpu_lstats *lb_stats; - lb_stats = &per_cpu(loopback_stats, i); - stats->rx_bytes += lb_stats->rx_bytes; - stats->tx_bytes += lb_stats->tx_bytes; - stats->rx_packets += lb_stats->rx_packets; - stats->tx_packets += lb_stats->tx_packets; + lb_stats = &per_cpu(pcpu_lstats, i); + bytes += lb_stats->bytes; + packets += lb_stats->packets; } - + stats->rx_packets = packets; + stats->tx_packets = packets; + stats->rx_bytes = bytes; + stats->tx_bytes = bytes; return stats; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH, resent] [NET] reduce per cpu ram used for loopback stats 2006-10-18 16:35 ` [PATCH] [NET] reduce per cpu ram used for loopback stats Eric Dumazet @ 2006-10-18 17:00 ` Eric Dumazet 2006-10-19 3:53 ` David Miller 2006-10-19 3:53 ` [PATCH] " David Miller 1 sibling, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2006-10-18 17:00 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 403 bytes --] Sorry David, the previous attachment had a missing / in one filename [NET] reduce per cpu ram used for loopback device stats We dont need a full struct net_device_stats (currently 23 long : 184 bytes on x86_64) per possible CPU, but only two counters : bytes and packets We save few CPU cycles too in loopback_xmit() not updating 4 fields, but 2. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: loopback.patch --] [-- Type: text/plain, Size: 1894 bytes --] --- linux/drivers/net/loopback.c 2006-10-18 17:28:20.000000000 +0200 +++ linux-ed/drivers/net/loopback.c 2006-10-18 18:26:41.000000000 +0200 @@ -58,7 +58,11 @@ #include <linux/tcp.h> #include <linux/percpu.h> -static DEFINE_PER_CPU(struct net_device_stats, loopback_stats); +struct pcpu_lstats { + unsigned long packets; + unsigned long bytes; +}; +static DEFINE_PER_CPU(struct pcpu_lstats, pcpu_lstats); #define LOOPBACK_OVERHEAD (128 + MAX_HEADER + 16 + 16) @@ -128,7 +132,7 @@ */ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev) { - struct net_device_stats *lb_stats; + struct pcpu_lstats *lb_stats; skb_orphan(skb); @@ -149,11 +153,9 @@ #endif dev->last_rx = jiffies; - lb_stats = &per_cpu(loopback_stats, get_cpu()); - lb_stats->rx_bytes += skb->len; - lb_stats->tx_bytes = lb_stats->rx_bytes; - lb_stats->rx_packets++; - lb_stats->tx_packets = lb_stats->rx_packets; + lb_stats = &per_cpu(pcpu_lstats, get_cpu()); + lb_stats->bytes += skb->len; + lb_stats->packets++; put_cpu(); netif_rx(skb); @@ -166,20 +168,21 @@ static struct net_device_stats *get_stats(struct net_device *dev) { struct net_device_stats *stats = &loopback_stats; + unsigned long bytes = 0; + unsigned long packets = 0; int i; - memset(stats, 0, sizeof(struct net_device_stats)); - for_each_possible_cpu(i) { - struct net_device_stats *lb_stats; + const struct pcpu_lstats *lb_stats; - lb_stats = &per_cpu(loopback_stats, i); - stats->rx_bytes += lb_stats->rx_bytes; - stats->tx_bytes += lb_stats->tx_bytes; - stats->rx_packets += lb_stats->rx_packets; - stats->tx_packets += lb_stats->tx_packets; + lb_stats = &per_cpu(pcpu_lstats, i); + bytes += lb_stats->bytes; + packets += lb_stats->packets; } - + stats->rx_packets = packets; + stats->tx_packets = packets; + stats->rx_bytes = bytes; + stats->tx_bytes = bytes; return stats; } ^ permalink raw reply [flat|nested] 32+ messages in thread
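The shape of the change, reduced to a stand-alone model: each CPU bumps its own two counters locklessly on transmit, and the full net_device_stats is only synthesized when somebody reads the stats. Names and the fixed CPU count below are illustrative, not kernel code.

#define NR_CPUS_DEMO 4

struct pcpu_lstats_demo { unsigned long packets, bytes; };
static struct pcpu_lstats_demo demo_stats[NR_CPUS_DEMO];

/* hot path: the transmitting CPU touches only its own pair of counters */
static void demo_xmit(int cpu, unsigned long len)
{
        demo_stats[cpu].bytes   += len;
        demo_stats[cpu].packets += 1;
}

/* slow path: fold the per-CPU pairs into totals only when stats are read;
 * rx and tx are identical for loopback, so one pair per CPU is enough */
static void demo_fold(unsigned long *packets, unsigned long *bytes)
{
        *packets = *bytes = 0;
        for (int i = 0; i < NR_CPUS_DEMO; i++) {
                *packets += demo_stats[i].packets;
                *bytes   += demo_stats[i].bytes;
        }
}

On x86_64 this shrinks the per-CPU footprint from the 184-byte net_device_stats quoted in the changelog to 16 bytes, and the transmit path updates two fields instead of four.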
* Re: [PATCH, resent] [NET] reduce per cpu ram used for loopback stats 2006-10-18 17:00 ` [PATCH, resent] " Eric Dumazet @ 2006-10-19 3:53 ` David Miller 0 siblings, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-19 3:53 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Wed, 18 Oct 2006 19:00:03 +0200 > Sorry David, the previous attachment had a missing / in one filename Hehe, and I read this after replying to you about that :-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] reduce per cpu ram used for loopback stats 2006-10-18 16:35 ` [PATCH] [NET] reduce per cpu ram used for loopback stats Eric Dumazet 2006-10-18 17:00 ` [PATCH, resent] " Eric Dumazet @ 2006-10-19 3:53 ` David Miller 1 sibling, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-19 3:53 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Wed, 18 Oct 2006 18:35:48 +0200 Applied Eric, but the file paths in your patch were bogus and needed to be fixed up: > --- linux/drivers/net/loopback.c 2006-10-18 17:28:20.000000000 +0200 > +++ linux-eddrivers/net/loopback.c 2006-10-18 18:26:41.000000000 +0200 This would never apply, since "-p1" patch treatment would use "net/loopback.c" as the patch which is obviously wrong and should be "drivers/net/loopback.c" I don't know what you use to generate patches, but you might stand to gain from using some automated tools for this :-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS 2006-10-18 7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet 2006-10-18 16:35 ` [PATCH] [NET] reduce per cpu ram used for loopback stats Eric Dumazet @ 2006-10-19 3:44 ` David Miller 2006-10-19 10:57 ` Eric Dumazet 2 siblings, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-19 3:44 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Wed, 18 Oct 2006 09:38:38 +0200 > Lot of routers still use CPUS with 32 bytes cache lines. (Intel PIII) > It make sense to make sure fields used at lookup time are in the same cache > line, to reduce cache footprint and speedup lookups. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Looks fine but patch doesn't apply: - __u32 v4daddr; /* peer's address */ in my tree v4daddr is a __be32, not a __u32. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS 2006-10-18 7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet 2006-10-18 16:35 ` [PATCH] [NET] reduce per cpu ram used for loopback stats Eric Dumazet 2006-10-19 3:44 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS David Miller @ 2006-10-19 10:57 ` Eric Dumazet 2006-10-19 15:45 ` [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() Eric Dumazet 2006-10-20 7:28 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS David Miller 2 siblings, 2 replies; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 10:57 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 306 bytes --] Hi David Lot of routers/embedded devices still use CPUS with 16/32 bytes cache lines. (486, Pentium, ... PIII) It makes sense to group together fields used at lookup time so they fit in one cache line. This reduce cache footprint and speedup lookups. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: inetpeer_speedup.patch --] [-- Type: text/plain, Size: 781 bytes --] --- net-2.6/include/net/inetpeer.h 2006-10-19 12:50:29.000000000 +0200 +++ net-2.6-ed/include/net/inetpeer.h 2006-10-19 12:52:08.000000000 +0200 @@ -17,14 +17,15 @@ struct inet_peer { + /* group together avl_left,avl_right,v4daddr to speedup lookups */ struct inet_peer *avl_left, *avl_right; + __be32 v4daddr; /* peer's address */ + __u16 avl_height; + __u16 ip_id_count; /* IP ID for the next packet */ struct inet_peer *unused_next, **unused_prevp; __u32 dtime; /* the time of last use of not * referenced entries */ atomic_t refcnt; - __be32 v4daddr; /* peer's address */ - __u16 avl_height; - __u16 ip_id_count; /* IP ID for the next packet */ atomic_t rid; /* Frag reception counter */ __u32 tcp_ts; unsigned long tcp_ts_stamp; ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() 2006-10-19 10:57 ` Eric Dumazet @ 2006-10-19 15:45 ` Eric Dumazet 2006-10-20 7:22 ` David Miller 2006-10-20 7:28 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS David Miller 1 sibling, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 15:45 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 150 bytes --] I believe this NET_INC_STATS() call can be replaced by NET_INC_STATS_BH(), a little bit cheaper. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: OUTOFWINDOWICMPS.patch --] [-- Type: text/plain, Size: 383 bytes --] --- linux/net/ipv4/tcp_ipv4.c.orig 2006-10-19 17:37:22.000000000 +0200 +++ linux-ed/net/ipv4/tcp_ipv4.c 2006-10-19 17:37:43.000000000 +0200 @@ -373,7 +373,7 @@ seq = ntohl(th->seq); if (sk->sk_state != TCP_LISTEN && !between(seq, tp->snd_una, tp->snd_nxt)) { - NET_INC_STATS(LINUX_MIB_OUTOFWINDOWICMPS); + NET_INC_STATS_BH(LINUX_MIB_OUTOFWINDOWICMPS); goto out; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() 2006-10-19 15:45 ` [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() Eric Dumazet @ 2006-10-20 7:22 ` David Miller 2006-10-20 14:21 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 32+ messages in thread From: David Miller @ 2006-10-20 7:22 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 19 Oct 2006 17:45:26 +0200 > I believe this NET_INC_STATS() call can be replaced by NET_INC_STATS_BH(), a > little bit cheaper. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Applied, although I hope tcp_v4_err() never becomes a fast path :-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() 2006-10-20 7:22 ` David Miller @ 2006-10-20 14:21 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 32+ messages in thread From: Arnaldo Carvalho de Melo @ 2006-10-20 14:21 UTC (permalink / raw) To: David Miller; +Cc: dada1, netdev On 10/20/06, David Miller <davem@davemloft.net> wrote: > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Thu, 19 Oct 2006 17:45:26 +0200 > > > I believe this NET_INC_STATS() call can be replaced by NET_INC_STATS_BH(), a > > little bit cheaper. > > > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > > Applied, although I hope tcp_v4_err() never becomes a fast path :-) I'll queue a cset in my net-2.6 tree to do the equivalent for dccp_v4_err() :-) - Arnaldo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS 2006-10-19 10:57 ` Eric Dumazet 2006-10-19 15:45 ` [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() Eric Dumazet @ 2006-10-20 7:28 ` David Miller 1 sibling, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-20 7:28 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 19 Oct 2006 12:57:42 +0200 > Lot of routers/embedded devices still use CPUS with 16/32 bytes cache lines. > (486, Pentium, ... PIII) > It makes sense to group together fields used at lookup time so they fit in one > cache line. > This reduce cache footprint and speedup lookups. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Applied, thanks Eric. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-17 12:58 ` [PATCH] [NET] Size listen hash tables using backlog hint Eric Dumazet Hi 2006-10-18 7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet @ 2006-10-19 3:31 ` David Miller 2006-10-19 4:54 ` Stephen Hemminger 2006-10-19 5:12 ` Eric Dumazet 2006-10-19 9:27 ` Eric Dumazet 2 siblings, 2 replies; 32+ messages in thread From: David Miller @ 2006-10-19 3:31 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet Hi <dada1@cosmosbay.com> Date: Tue, 17 Oct 2006 14:58:37 +0200 > reqsk_queue_alloc() goal is to use a power of two size for the whole > listen_sock structure, to avoid wasting memory for large backlogs, > meaning the hash table nr_table_entries is not anymore a power of > two. (Hence one AND (nr_table_entries - 1) must be replaced by > MODULO nr_table_entries) Modulus can be very expensive for some small/slow cpus. Please round down to a power-of-2 instead of up if you think the wastage really matters. Thanks. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 3:31 ` [PATCH] [NET] Size listen hash tables using backlog hint David Miller @ 2006-10-19 4:54 ` Stephen Hemminger 2006-10-19 5:08 ` David Miller 2006-10-19 5:12 ` Eric Dumazet 1 sibling, 1 reply; 32+ messages in thread From: Stephen Hemminger @ 2006-10-19 4:54 UTC (permalink / raw) To: David Miller; +Cc: dada1, netdev David Miller wrote: > From: Eric Dumazet Hi <dada1@cosmosbay.com> > Date: Tue, 17 Oct 2006 14:58:37 +0200 > > >> reqsk_queue_alloc() goal is to use a power of two size for the whole >> listen_sock structure, to avoid wasting memory for large backlogs, >> meaning the hash table nr_table_entries is not anymore a power of >> two. (Hence one AND (nr_table_entries - 1) must be replaced by >> MODULO nr_table_entries) >> > > Modulus can be very expensive for some small/slow cpus. Please round > down to a power-of-2 instead of up if you think the wastage really > matters. > Reminds me, anyone know why GCC is too stupid to convert modulus of a constant power of 2 into a mask operation? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 4:54 ` Stephen Hemminger @ 2006-10-19 5:08 ` David Miller 0 siblings, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-19 5:08 UTC (permalink / raw) To: shemminger; +Cc: dada1, netdev From: Stephen Hemminger <shemminger@osdl.org> Date: Wed, 18 Oct 2006 21:54:08 -0700 > Reminds me, anyone know why GCC is too stupid to convert modulus of a > constant power of 2 into a mask operation? If the computation ends up being signed it can't perform this optimization. ^ permalink raw reply [flat|nested] 32+ messages in thread
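Two one-line functions make the difference visible. With an unsigned operand the compiler can strength-reduce the modulo by a constant power of two into a mask; with a signed operand the C remainder must carry the sign of the dividend (e.g. -1 % 512 is -1, not 511), so a plain mask would be wrong and extra fixup code, or a real division, gets emitted:

unsigned int umod(unsigned int h) { return h % 512; }  /* typically compiles to h & 511 */
int smod(int h)                   { return h % 512; }  /* must preserve sign, no plain mask */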
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 3:31 ` [PATCH] [NET] Size listen hash tables using backlog hint David Miller 2006-10-19 4:54 ` Stephen Hemminger @ 2006-10-19 5:12 ` Eric Dumazet 2006-10-19 6:12 ` David Miller 1 sibling, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 5:12 UTC (permalink / raw) To: David Miller; +Cc: netdev David Miller a écrit : > From: Eric Dumazet Hi <dada1@cosmosbay.com> > Date: Tue, 17 Oct 2006 14:58:37 +0200 > >> reqsk_queue_alloc() goal is to use a power of two size for the whole >> listen_sock structure, to avoid wasting memory for large backlogs, >> meaning the hash table nr_table_entries is not anymore a power of >> two. (Hence one AND (nr_table_entries - 1) must be replaced by >> MODULO nr_table_entries) > > Modulus can be very expensive for some small/slow cpus. Please round > down to a power-of-2 instead of up if you think the wastage really > matters. > > Thanks. I am not sure I understand your points. Rounding up or down still need the modulus. Only the size changes by a two factor. I feel you want me to remove the modulus, thats unrelated to rounding. A 66 MHz 486 can perform 1.000.000 divisions per second. Is it a 'slow' cpu ? If we stay with a power-of-two, say 2^X hash slots, using (2^X)*sizeof(void*), the extra bits added by struct listen_sock will *need* the same amount of memory, because of kmalloc() alignment to next power-of-two. That basically wastes half of the ram taken by struct listen_sock allocation, unless we add yet another pointer to hash table and do two kmallocs(), one for pure power-of-two hash table, one for struct listen_sock. If we keep current scheme, the current max kmalloc size of 131072 bytes would limit us to 65536 bytes for the hash table itself, so 8192 slots on 64bits platforms. I was expecting to use a 16380 slots hash size instead. The modulus is done on two places : inet_csk_search_req() : called from tcp_v4_err()/dccp_v4_err() only after checks. Frequency of such events is rather low. tcp_v4_hnd_req() : called from tcp_v4_do_rcv() for TCP_LISTEN state. Frequency of such events is rather low, especially on machines driven by small/slow cpus... inet_csk_reqsk_queue_hash_add()called from tcp_v4_conn_request() when a new connection attempt is stored in hash table. Thats in normal conditions two modulus done per new tcp/dccp sessions establishments. In DOS situation, I doubt the extra cycles will do any difference. So... what do you prefer : 1) Keep the modulus 2) allocate two blocks of ram (powser-of -two hash size, but one extra indirection) 3) waste near half of ram because one block allocated, and power-of-two hash size. Thank you Eric ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 5:12 ` Eric Dumazet @ 2006-10-19 6:12 ` David Miller 2006-10-19 6:34 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: David Miller @ 2006-10-19 6:12 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 19 Oct 2006 07:12:58 +0200 > A 66 MHz 486 can perform 1.000.000 divisions per second. Is it a 'slow' cpu ? Sparc and some other embedded chips have no division/modulus integer instruction and do it in software. > So... what do you prefer : > > 1) Keep the modulus > 2) allocate two blocks of ram (powser-of -two hash size, but one extra > indirection) > 3) waste near half of ram because one block allocated, and power-of-two hash size. I thought the problem was that you use a modulus and non-power-of-2 hash table size because rounding up to the next power of 2 wastes a lot of space? Given that, my suggestion is simply to not round up to the next power-of-2, or only do so when we are very very close to that next power-of-2. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 6:12 ` David Miller @ 2006-10-19 6:34 ` Eric Dumazet 2006-10-19 6:57 ` David Miller 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 6:34 UTC (permalink / raw) To: David Miller; +Cc: netdev David Miller a écrit : > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Thu, 19 Oct 2006 07:12:58 +0200 > >> A 66 MHz 486 can perform 1.000.000 divisions per second. Is it a 'slow' cpu ? > > Sparc and some other embedded chips have no division/modulus integer > instruction and do it in software. How many times this division will be done ? As I said, tcp session establishment. Are you aware a division is done in slab code when you kfree() one network frames ? That is much more problematic than SYN packets. > >> So... what do you prefer : >> >> 1) Keep the modulus >> 2) allocate two blocks of ram (powser-of -two hash size, but one extra >> indirection) >> 3) waste near half of ram because one block allocated, and power-of-two hash size. > > I thought the problem was that you use a modulus and non-power-of-2 > hash table size because rounding up to the next power of 2 wastes > a lot of space? Given that, my suggestion is simply to not round > up to the next power-of-2, or only do so when we are very very close > to that next power-of-2. My main problem is being able to use a large hash table on big servers. With power-of two constraint, plus kmalloc max size constraint, we can use half the size we could. Are you suggesting something like : Allocation time: ---------------- if (cpu is very_very_slow or hash size small_enough) { ptr->size = power_of_too; ptr->size_mask = (power_of_two - 1); } else { ptr->size = somevalue; ptr->size_mask = ~0; } Lookup time : --------------- if (ptr->size_mask != ~0) slot = hash & ptr->size_mask; else slot = hash % ptr->size; The extra conditional branch may be more expensive than just doing division on 99% of cpus... ^ permalink raw reply [flat|nested] 32+ messages in thread
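Eric's pseudocode above, written out compilably for clarity (this is the alternative he is arguing against, not code from any tree): remember a mask when the table size is a power of two, and only fall back to a real division for odd sizes.

struct synq_demo {
        unsigned int size;      /* number of hash slots */
        unsigned int mask;      /* size - 1 if size is a power of two, else ~0u */
};

static inline unsigned int synq_slot(const struct synq_demo *q, unsigned int hash)
{
        if (q->mask != ~0u)
                return hash & q->mask;  /* power-of-two table: cheap */
        return hash % q->size;          /* odd-sized table: division */
}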
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 6:34 ` Eric Dumazet @ 2006-10-19 6:57 ` David Miller 2006-10-19 8:29 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: David Miller @ 2006-10-19 6:57 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 19 Oct 2006 08:34:53 +0200 > My main problem is being able to use a large hash table on big servers. > > With power-of two constraint, plus kmalloc max size constraint, we can use > half the size we could. Switch to vmalloc() at the kmalloc() cut-off point, just like I did for the other hashes in the tree. > Are you suggesting something like : Not at all. BTW, this all reminds me that we need to be careful that this isn't allowing arbitrary users to eat up a ton of unswappable ram. It's pretty easy to open up a lot of listening sockets :) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 6:57 ` David Miller @ 2006-10-19 8:29 ` Eric Dumazet 2006-10-19 8:41 ` David Miller 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 8:29 UTC (permalink / raw) To: David Miller; +Cc: netdev On Thursday 19 October 2006 08:57, David Miller wrote: > Switch to vmalloc() at the kmalloc() cut-off point, just like > I did for the other hashes in the tree. Yes, so you basically want option 4) :) 4) Use vmalloc() if size_lopt > PAGE_SIZE keep a power_of two : nr_table_entries = 2 ^ X; size_lopt = sizeof(listen_sock) + nr_table_entries*sizeof(void*); if (size > PAGE_SIZE) ptr = vmalloc(size_lopt); else ptr = kmalloc(size_lopt); Pros : Only under one page is wasted (ie allocated but not used) vmalloc() is nicer for NUMA, so I am pleased :) vmalloc() has more chances to succeed when memory is fragmented keep a power-of-two hash table size Cons : TLB cost // for reference struct listen_sock { u8 max_qlen_log; /* 3 bytes hole, try to use */ int qlen; int qlen_young; int clock_hand; u32 hash_rnd; u32 nr_table_entries; struct request_sock *syn_table[0]; /* hash table follow this header */ }; > BTW, this all reminds me that we need to be careful that this > isn't allowing arbitrary users to eat up a ton of unswappable > ram. It's pretty easy to open up a lot of listening sockets :) With actual somaxconn=128 limit, my patch ends in allocating less ram (half of a page) than current x86_64 kernel (2 pages) Thank you ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 8:29 ` Eric Dumazet @ 2006-10-19 8:41 ` David Miller 2006-10-19 9:11 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: David Miller @ 2006-10-19 8:41 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 19 Oct 2006 10:29:00 +0200 > Cons : > TLB cost For those hot x86 and x86_64 cpus you tend to be using, this particular cost is relatively small. :-) It's effectively like another memory reference in the worst case, in the best case it's "free". > With actual somaxconn=128 limit, my patch ends in allocating less > ram (half of a page) than current x86_64 kernel (2 pages) Understood. But the issue is that there are greater security implications than before when increasing this sysctl. To be honest, it's probably water under the bridge, because if you can stuff up SOMAXCONN number of sockets into the system per listening socket which is a lot more than the hash table eats up. :-) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 8:41 ` David Miller @ 2006-10-19 9:11 ` Eric Dumazet 0 siblings, 0 replies; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 9:11 UTC (permalink / raw) To: David Miller; +Cc: netdev On Thursday 19 October 2006 10:41, David Miller wrote: > From: Eric Dumazet <dada1@cosmosbay.com> > Date: Thu, 19 Oct 2006 10:29:00 +0200 > > > Cons : > > TLB cost > > For those hot x86 and x86_64 cpus you tend to be using, this > particular cost is relatively small. :-) It's effectively like > another memory reference in the worst case, in the best case > it's "free". Well, it was a private joke with you, as you *use* machines that take a fault on a TLB miss :) BTW I do care of old machines too... ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-17 12:58 ` [PATCH] [NET] Size listen hash tables using backlog hint Eric Dumazet Hi 2006-10-18 7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet 2006-10-19 3:31 ` [PATCH] [NET] Size listen hash tables using backlog hint David Miller @ 2006-10-19 9:27 ` Eric Dumazet 2006-10-20 7:27 ` David Miller 2 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2006-10-19 9:27 UTC (permalink / raw) To: David Miller; +Cc: netdev [-- Attachment #1: Type: text/plain, Size: 1562 bytes --] Hi David Here is the second try for this patch. Many thanks for your feedback. [PATCH] [NET] Size listen hash tables using backlog hint We currently allocate a fixed size 512 (TCP_SYNQ_HSIZE) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). include/net/request_sock.h | 8 ++++---- include/net/tcp.h | 1 - net/core/request_sock.c | 38 +++++++++++++++++++++++++++++--------- net/dccp/ipv4.c | 2 +- net/dccp/proto.c | 6 +++--- net/ipv4/af_inet.c | 2 +- net/ipv4/inet_connection_sock.c | 2 +- net/ipv4/tcp_ipv4.c | 6 +++--- net/ipv6/tcp_ipv6.c | 2 +- 9 files changed, 43 insertions(+), 24 deletions(-) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: size_listen_hash_table.patch --] [-- Type: text/plain, Size: 7933 bytes --] --- linux-2.6.19-rc2/net/core/request_sock.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/core/request_sock.c 2006-10-19 11:05:56.000000000 +0200 @@ -15,6 +15,7 @@ #include <linux/random.h> #include <linux/slab.h> #include <linux/string.h> +#include <linux/vmalloc.h> #include <net/request_sock.h> @@ -29,22 +30,31 @@ * it is absolutely not enough even at 100conn/sec. 256 cures most * of problems. This value is adjusted to 128 for very small machines * (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb). - * Further increasing requires to change hash table size. + * Note : Dont forget somaxconn that may limit backlog too. 
*/ int sysctl_max_syn_backlog = 256; int reqsk_queue_alloc(struct request_sock_queue *queue, - const int nr_table_entries) + unsigned int nr_table_entries) { - const int lopt_size = sizeof(struct listen_sock) + - nr_table_entries * sizeof(struct request_sock *); - struct listen_sock *lopt = kzalloc(lopt_size, GFP_KERNEL); + size_t lopt_size = sizeof(struct listen_sock); + struct listen_sock *lopt; + nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog); + nr_table_entries = max_t(u32, nr_table_entries, 8); + nr_table_entries = roundup_pow_of_two(nr_table_entries + 1); + lopt_size += nr_table_entries * sizeof(struct request_sock *); + if (lopt_size > PAGE_SIZE) + lopt = __vmalloc(lopt_size, + GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO, + PAGE_KERNEL); + else + lopt = kzalloc(lopt_size, GFP_KERNEL); if (lopt == NULL) return -ENOMEM; - for (lopt->max_qlen_log = 6; - (1 << lopt->max_qlen_log) < sysctl_max_syn_backlog; + for (lopt->max_qlen_log = 3; + (1 << lopt->max_qlen_log) < nr_table_entries; lopt->max_qlen_log++); get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd)); @@ -52,6 +62,11 @@ queue->rskq_accept_head = NULL; lopt->nr_table_entries = nr_table_entries; + /* + * This write_lock_bh()/write_unlock_bh() pair forces this CPU to commit + * its memory changes and let readers (which acquire syn_wait_lock in + * reader mode) operate without seeing random content. + */ write_lock_bh(&queue->syn_wait_lock); queue->listen_opt = lopt; write_unlock_bh(&queue->syn_wait_lock); @@ -65,9 +80,11 @@ { /* make all the listen_opt local to us */ struct listen_sock *lopt = reqsk_queue_yank_listen_sk(queue); + size_t lopt_size = sizeof(struct listen_sock) + + lopt->nr_table_entries * sizeof(struct request_sock *); if (lopt->qlen != 0) { - int i; + unsigned int i; for (i = 0; i < lopt->nr_table_entries; i++) { struct request_sock *req; @@ -81,7 +98,10 @@ } BUG_TRAP(lopt->qlen == 0); - kfree(lopt); + if (lopt_size > PAGE_SIZE) + vfree(lopt); + else + kfree(lopt); } EXPORT_SYMBOL(reqsk_queue_destroy); --- linux-2.6.19-rc2/net/ipv4/af_inet.c 2006-10-13 18:25:04.000000000 +0200 +++ linux-2.6.19-rc2-ed/net/ipv4/af_inet.c 2006-10-17 10:32:22.000000000 +0200 @@ -204,7 +204,7 @@ * we can only allow the backlog to be adjusted. 
 	 */
 	if (old_state != TCP_LISTEN) {
-		err = inet_csk_listen_start(sk, TCP_SYNQ_HSIZE);
+		err = inet_csk_listen_start(sk, backlog);
 		if (err)
 			goto out;
 	}
--- linux-2.6.19-rc2/net/ipv4/tcp_ipv4.c	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/net/ipv4/tcp_ipv4.c	2006-10-17 12:19:38.000000000 +0200
@@ -715,7 +715,7 @@
 	return dopt;
 }

-struct request_sock_ops tcp_request_sock_ops = {
+struct request_sock_ops tcp_request_sock_ops __read_mostly = {
 	.family		= PF_INET,
 	.obj_size	= sizeof(struct tcp_request_sock),
 	.rtx_syn_ack	= tcp_v4_send_synack,
@@ -1385,7 +1385,7 @@
 	if (st->state == TCP_SEQ_STATE_OPENREQ) {
 		struct request_sock *req = cur;

-		icsk = inet_csk(st->syn_wait_sk);
+		icsk = inet_csk(st->syn_wait_sk);
 		req = req->dl_next;
 		while (1) {
 			while (req) {
@@ -1395,7 +1395,7 @@
 				}
 				req = req->dl_next;
 			}
-			if (++st->sbucket >= TCP_SYNQ_HSIZE)
+			if (++st->sbucket >= icsk->icsk_accept_queue.listen_opt->nr_table_entries)
 				break;
get_req:
 			req = icsk->icsk_accept_queue.listen_opt->syn_table[st->sbucket];
--- linux-2.6.19-rc2/net/dccp/proto.c	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/net/dccp/proto.c	2006-10-17 10:32:22.000000000 +0200
@@ -262,12 +262,12 @@

 EXPORT_SYMBOL_GPL(dccp_destroy_sock);

-static inline int dccp_listen_start(struct sock *sk)
+static inline int dccp_listen_start(struct sock *sk, int backlog)
 {
 	struct dccp_sock *dp = dccp_sk(sk);

 	dp->dccps_role = DCCP_ROLE_LISTEN;
-	return inet_csk_listen_start(sk, TCP_SYNQ_HSIZE);
+	return inet_csk_listen_start(sk, backlog);
 }

 int dccp_disconnect(struct sock *sk, int flags)
@@ -788,7 +788,7 @@
 	 * FIXME: here it probably should be sk->sk_prot->listen_start
 	 * see tcp_listen_start
 	 */
-		err = dccp_listen_start(sk);
+		err = dccp_listen_start(sk, backlog);
 		if (err)
 			goto out;
 	}
--- linux-2.6.19-rc2/net/dccp/ipv4.c	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/net/dccp/ipv4.c	2006-10-17 10:44:21.000000000 +0200
@@ -1020,7 +1020,7 @@
 	kfree(inet_rsk(req)->opt);
 }

-static struct request_sock_ops dccp_request_sock_ops = {
+static struct request_sock_ops dccp_request_sock_ops __read_mostly = {
 	.family		= PF_INET,
 	.obj_size	= sizeof(struct dccp_request_sock),
 	.rtx_syn_ack	= dccp_v4_send_response,
--- linux-2.6.19-rc2/net/ipv6/tcp_ipv6.c	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/net/ipv6/tcp_ipv6.c	2006-10-17 10:44:21.000000000 +0200
@@ -526,7 +526,7 @@
 	kfree_skb(inet6_rsk(req)->pktopts);
 }

-static struct request_sock_ops tcp6_request_sock_ops = {
+static struct request_sock_ops tcp6_request_sock_ops __read_mostly = {
 	.family		= AF_INET6,
 	.obj_size	= sizeof(struct tcp6_request_sock),
 	.rtx_syn_ack	= tcp_v6_send_synack,
--- linux-2.6.19-rc2/net/ipv4/inet_connection_sock.c	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/net/ipv4/inet_connection_sock.c	2006-10-19 10:51:26.000000000 +0200
@@ -343,7 +343,7 @@
 EXPORT_SYMBOL_GPL(inet_csk_route_req);

 static inline u32 inet_synq_hash(const __be32 raddr, const __be16 rport,
-				 const u32 rnd, const u16 synq_hsize)
+				 const u32 rnd, const u32 synq_hsize)
 {
 	return jhash_2words((__force u32)raddr, (__force u32)rport, rnd) & (synq_hsize - 1);
 }
--- linux-2.6.19-rc2/include/net/tcp.h	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/include/net/tcp.h	2006-10-17 10:51:51.000000000 +0200
@@ -138,7 +138,6 @@
 #define MAX_TCP_SYNCNT		127

 #define TCP_SYNQ_INTERVAL	(HZ/5)	/* Period of SYNACK timer */
-#define TCP_SYNQ_HSIZE		512	/* Size of SYNACK hash table */

 #define TCP_PAWS_24DAYS	(60 * 60 * 24 * 24)
 #define TCP_PAWS_MSL	60		/* Per-host timestamps are invalidated
--- linux-2.6.19-rc2/include/net/request_sock.h	2006-10-13 18:25:04.000000000 +0200
+++ linux-2.6.19-rc2-ed/include/net/request_sock.h	2006-10-17 12:33:18.000000000 +0200
@@ -28,8 +28,8 @@

 struct request_sock_ops {
 	int		family;
-	kmem_cache_t	*slab;
 	int		obj_size;
+	kmem_cache_t	*slab;
 	int		(*rtx_syn_ack)(struct sock *sk,
 				       struct request_sock *req,
 				       struct dst_entry *dst);
@@ -51,12 +51,12 @@
 	u32				rcv_wnd;	/* rcv_wnd offered first time */
 	u32				ts_recent;
 	unsigned long			expires;
-	struct request_sock_ops		*rsk_ops;
+	const struct request_sock_ops	*rsk_ops;
 	struct sock			*sk;
 	u32				secid;
 };

-static inline struct request_sock *reqsk_alloc(struct request_sock_ops *ops)
+static inline struct request_sock *reqsk_alloc(const struct request_sock_ops *ops)
 {
 	struct request_sock *req = kmem_cache_alloc(ops->slab, SLAB_ATOMIC);
@@ -120,7 +120,7 @@
 };

 extern int reqsk_queue_alloc(struct request_sock_queue *queue,
-			     const int nr_table_entries);
+			     unsigned int nr_table_entries);

 static inline struct listen_sock *reqsk_queue_yank_listen_sk(struct request_sock_queue *queue)
 {

^ permalink raw reply	[flat|nested] 32+ messages in thread
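The hunks above retire the fixed TCP_SYNQ_HSIZE and let the listen() backlog drive the SYN-queue hash size; inet_synq_hash() still reduces the hash with "& (synq_hsize - 1)", so whatever reqsk_queue_alloc() picks must be a power of two. The sizing code itself is not quoted in these hunks, so the standalone C sketch below only illustrates, under assumed lower/upper clamps, how a backlog hint could be rounded up to such a size; it is not the patch's actual reqsk_queue_alloc() logic.

/*
 * Hypothetical userspace sketch: derive a power-of-two SYN-queue hash
 * size from a listen() backlog hint.  The clamp values (8 and 1024)
 * are assumptions for illustration, not taken from the patch.
 */
#include <stdio.h>

static unsigned int roundup_pow_of_two(unsigned int x)
{
	unsigned int r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

/* Turn the backlog hint into a power-of-two table size. */
static unsigned int synq_size_from_backlog(unsigned int backlog)
{
	unsigned int n = backlog;

	if (n > 1024)		/* assumed upper clamp (e.g. a sysctl limit) */
		n = 1024;
	if (n < 8)		/* assumed lower clamp */
		n = 8;
	return roundup_pow_of_two(n + 1);
}

int main(void)
{
	unsigned int backlogs[] = { 5, 128, 511, 1000, 20000 };
	unsigned int i;

	for (i = 0; i < sizeof(backlogs) / sizeof(backlogs[0]); i++)
		printf("backlog %5u -> %4u buckets (mask 0x%x)\n",
		       backlogs[i], synq_size_from_backlog(backlogs[i]),
		       synq_size_from_backlog(backlogs[i]) - 1);
	return 0;
}

With numbers like these, a small daemon listening with backlog 5 would get a 16-bucket table instead of the old unconditional 512, while a busy server asking for thousands of connections gets a table sized to its (clamped) backlog.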
* Re: [PATCH] [NET] Size listen hash tables using backlog hint 2006-10-19 9:27 ` Eric Dumazet @ 2006-10-20 7:27 ` David Miller 0 siblings, 0 replies; 32+ messages in thread From: David Miller @ 2006-10-20 7:27 UTC (permalink / raw) To: dada1; +Cc: netdev From: Eric Dumazet <dada1@cosmosbay.com> Date: Thu, 19 Oct 2006 11:27:50 +0200 > Here is the second try for this patch. Many thanks for your feedback. > > [PATCH] [NET] Size listen hash tables using backlog hint This version looks very good. It's not a major bug fix (obviously) so we'll have to defer it to 2.6.20, so please resubmit once I open up the net-2.6.20 tree. Thanks a lot! ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-17 4:18 ` John Heffner 2006-10-17 5:35 ` David Miller @ 2006-10-18 15:37 ` Andi Kleen 2006-10-18 16:40 ` Stephen Hemminger 1 sibling, 1 reply; 32+ messages in thread From: Andi Kleen @ 2006-10-18 15:37 UTC (permalink / raw) To: John Heffner; +Cc: Stephen Hemminger, netdev On Tuesday 17 October 2006 06:18, John Heffner wrote: > Stephen Hemminger wrote: > > On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) > > John Heffner <jheffner@psc.edu> wrote: > > >> This patch limits the amount of time you will defer sending a TSO segment > >> to less than two clock ticks, or the time between two acks, whichever is > >> longer. > > > > > Okay, but doing any timing on clock ticks makes the behavior dependent > > on the value of HZ which doesn't seem desirable. Should this be based > > on RTT or a real-time values? > > It would be nice to use a high res clock so you don't depend on HZ, but > this is still expensive on most SMP arch's as I understand it. You can always use xtime. It doesn't have better resolution than jiffies though, but it gives you real time. Drawback is that there is some work towards tickless kernels and with that xtime will be more expensive again. But hopefully not by that much. -Andi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] Bound TSO defer time (resend) 2006-10-18 15:37 ` [PATCH] Bound TSO defer time (resend) Andi Kleen @ 2006-10-18 16:40 ` Stephen Hemminger 0 siblings, 0 replies; 32+ messages in thread From: Stephen Hemminger @ 2006-10-18 16:40 UTC (permalink / raw) To: Andi Kleen; +Cc: John Heffner, netdev On Wed, 18 Oct 2006 17:37:36 +0200 Andi Kleen <ak@suse.de> wrote: > On Tuesday 17 October 2006 06:18, John Heffner wrote: > > Stephen Hemminger wrote: > > > On Mon, 16 Oct 2006 20:53:20 -0400 (EDT) > > > John Heffner <jheffner@psc.edu> wrote: > > > > >> This patch limits the amount of time you will defer sending a TSO segment > > >> to less than two clock ticks, or the time between two acks, whichever is > > >> longer. > > > > > > > > Okay, but doing any timing on clock ticks makes the behavior dependent > > > on the value of HZ which doesn't seem desirable. Should this be based > > > on RTT or a real-time values? > > > > It would be nice to use a high res clock so you don't depend on HZ, but > > this is still expensive on most SMP arch's as I understand it. > > You can always use xtime. It doesn't have better resolution than jiffies > though, but it gives you real time. > > Drawback is that there is some work towards tickless kernels and with > that xtime will be more expensive again. But hopefully not by that much. > > -Andi Actually the thing to use now is ktime. It would then be compatible with hrtimers. But it seems a bit of overkill in this case. -- Stephen Hemminger <shemminger@osdl.org> ^ permalink raw reply [flat|nested] 32+ messages in thread
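To put numbers on the HZ dependence discussed above: a two-tick bound allows up to 20 ms of deferral at HZ=100 but only 2 ms at HZ=1000, whereas a deadline taken from a high-resolution clock (ktime/hrtimers, as suggested) would be the same on every configuration. The small userspace sketch below just prints that arithmetic; the 2 ms real-time figure is an assumed example, not a value from the patch.

/*
 * Userspace sketch of the HZ-dependence issue: the same tick-based
 * bound translates to very different wall-clock limits depending on
 * HZ, while a real-time bound does not.  Arithmetic only; this is not
 * kernel code.
 */
#include <stdio.h>

int main(void)
{
	const unsigned int hz_values[] = { 100, 250, 1000 };
	const unsigned int defer_ticks = 2;	/* bound used by the patch */
	const long ktime_bound_ns = 2000000;	/* assumed 2 ms real-time bound */
	unsigned int i;

	for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++)
		printf("HZ=%4u: %u ticks = %6.1f ms of allowed deferral\n",
		       hz_values[i], defer_ticks,
		       defer_ticks * 1000.0 / hz_values[i]);

	printf("fixed real-time bound: %.1f ms regardless of HZ\n",
	       ktime_bound_ns / 1e6);
	return 0;
}

At HZ=100 the tick-based limit already permits an order of magnitude more burst accumulation than at HZ=1000, which is exactly the configuration dependence a ktime-style deadline would remove.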
end of thread, other threads:[~2006-10-20 14:21 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-10-17  0:53 [PATCH] Bound TSO defer time (resend) John Heffner
2006-10-17  3:20 ` Stephen Hemminger
2006-10-17  4:18 ` John Heffner
2006-10-17  5:35 ` David Miller
2006-10-17 12:22 ` John Heffner
2006-10-19  3:39 ` David Miller
2006-10-17 12:58 ` [PATCH] [NET] Size listen hash tables using backlog hint Eric Dumazet
2006-10-18  7:38 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS Eric Dumazet
2006-10-18 16:35 ` [PATCH] [NET] reduce per cpu ram used for loopback stats Eric Dumazet
2006-10-18 17:00 ` [PATCH, resent] " Eric Dumazet
2006-10-19  3:53 ` David Miller
2006-10-19  3:53 ` [PATCH] " David Miller
2006-10-19  3:44 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS David Miller
2006-10-19 10:57 ` Eric Dumazet
2006-10-19 15:45 ` [PATCH] [NET] One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() Eric Dumazet
2006-10-20  7:22 ` David Miller
2006-10-20 14:21 ` Arnaldo Carvalho de Melo
2006-10-20  7:28 ` [PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS David Miller
2006-10-19  3:31 ` [PATCH] [NET] Size listen hash tables using backlog hint David Miller
2006-10-19  4:54 ` Stephen Hemminger
2006-10-19  5:08 ` David Miller
2006-10-19  5:12 ` Eric Dumazet
2006-10-19  6:12 ` David Miller
2006-10-19  6:34 ` Eric Dumazet
2006-10-19  6:57 ` David Miller
2006-10-19  8:29 ` Eric Dumazet
2006-10-19  8:41 ` David Miller
2006-10-19  9:11 ` Eric Dumazet
2006-10-19  9:27 ` Eric Dumazet
2006-10-20  7:27 ` David Miller
2006-10-18 15:37 ` [PATCH] Bound TSO defer time (resend) Andi Kleen
2006-10-18 16:40 ` Stephen Hemminger