* [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6
@ 2007-09-20 1:39 Tom Quetchenbach
  2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
  2007-09-21 14:08 ` [PATCH 0/2] " Ilpo Järvinen
  0 siblings, 2 replies; 7+ messages in thread

From: Tom Quetchenbach @ 2007-09-20 1:39 UTC (permalink / raw)
To: netdev

Hello,

I've been experimenting with David Miller's red-black tree patch for
SACK processing. We're sending TCP traffic between two machines with
10Gbps cards over a 1Gbps bottleneck link and were getting very high
CPU load with large windows. With a few tweaks, this patch seems to
provide a pretty substantial improvement.

David: this seems like excellent work so far.

Here are a couple of patches against 2.6.22.6. The first one is just
David's patches tweaked for 2.6.22.6, with a couple of minor bugfixes
to get it to compile and not crash. (I also changed
__tcp_add_write_queue_tail() to set the fack_count of the new packet
to the fack_count of the tail plus the packet count of the tail, not
the packet count of the new skb, because I think that's how it was
intended to be. Right?)

In the second patch there are a couple of significant changes. One is
(as Baruch suggested) to modify the existing SACK fast path so that we
don't tag packets we've already tagged when we advance by a packet.

The other issue is that the cached fack_counts seem to be wrong,
because they're set when we insert into the queue, but
tcp_set_skb_tso_segs() is called later, just before we send, so all
the fack_counts are zero. My solution was to set the fack_count when
we advance the send_head. Also I changed tcp_reset_fack_counts() so
that it exits when it hits an skb whose tcp_skb_pcount() is zero or
whose fack_count is already correct. (This really helps when TSO is
on, since there's lots of inserting into the middle of the queue.)

Please let me know how I can help get this tested and debugged.
Reducing the SACK processing load is really going to be essential for
us to start testing experimental TCP variants with large windows.

Thanks
-Tom

^ permalink raw reply	[flat|nested] 7+ messages in thread
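To make the tail-append bookkeeping described above concrete: each queued
segment's fack_count is the running total of packets (tcp_skb_pcount) in all
segments ahead of it, so the value for a new tail must come from the old
tail's count plus the old tail's pcount, not the new skb's. A minimal
userspace sketch of that invariant, with a plain array standing in for the
kernel's sk_buff queue (all names here are illustrative, not kernel code):

	/* Sketch of the fack_count invariant: fack_count of a segment is
	 * the number of packets queued before it, so appending a tail uses
	 * the old tail's fack_count plus the old tail's pcount. */
	#include <stdio.h>

	struct seg {
		unsigned int fack_count;	/* packets queued before this segment */
		unsigned int pcount;		/* TSO packet count of this segment */
	};

	static void queue_tail(struct seg *q, int *n, unsigned int pcount)
	{
		unsigned int fc = 0;

		if (*n > 0)	/* fc = tail->fack_count + pcount(tail) */
			fc = q[*n - 1].fack_count + q[*n - 1].pcount;
		q[*n].fack_count = fc;
		q[*n].pcount = pcount;
		(*n)++;
	}

	int main(void)
	{
		struct seg q[4];
		int n = 0;

		queue_tail(q, &n, 3);	/* fack_count 0 */
		queue_tail(q, &n, 2);	/* fack_count 3: tail's pcount, not ours */
		queue_tail(q, &n, 5);	/* fack_count 5 */
		for (int i = 0; i < n; i++)
			printf("seg %d: fack_count=%u\n", i, q[i].fack_count);
		return 0;
	}

With the off-by-pcount variant the cover letter describes (adding the new
skb's pcount instead), the second segment here would get fack_count 2
rather than 3, and every later count would drift whenever pcounts differ.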
* [PATCH 1/2] David Miller's rbtree patches for 2.6.22.6
  2007-09-20 1:39 [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6 Tom Quetchenbach
@ 2007-09-20 1:42 ` Tom Quetchenbach
  2007-09-20 1:44   ` [PATCH 2/2] " Tom Quetchenbach
                     ` (2 more replies)
  2007-09-21 14:08 ` [PATCH 0/2] " Ilpo Järvinen
  1 sibling, 3 replies; 7+ messages in thread

From: Tom Quetchenbach @ 2007-09-20 1:42 UTC (permalink / raw)
To: netdev

Patch 1: David Miller's red-black tree code, tweaked for 2.6.22.6,
with some bugfixes

-Tom

---
diff -ur linux-2.6.22.6/include/linux/skbuff.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h
--- linux-2.6.22.6/include/linux/skbuff.h	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h	2007-09-13 18:23:16.000000000 -0700
@@ -18,6 +18,7 @@
 #include <linux/compiler.h>
 #include <linux/time.h>
 #include <linux/cache.h>
+#include <linux/rbtree.h>

 #include <asm/atomic.h>
 #include <asm/types.h>
@@ -242,6 +243,8 @@
 	struct sk_buff		*next;
 	struct sk_buff		*prev;

+	struct rb_node		rb;
+
 	struct sock		*sk;
 	ktime_t			tstamp;
 	struct net_device	*dev;
diff -ur linux-2.6.22.6/include/linux/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h
--- linux-2.6.22.6/include/linux/tcp.h	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h	2007-09-13 18:23:16.000000000 -0700
@@ -174,6 +174,7 @@

 #include <linux/skbuff.h>
 #include <linux/dmaengine.h>
+#include <linux/rbtree.h>
 #include <net/sock.h>
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -321,6 +322,7 @@
 	u32	snd_cwnd_used;
 	u32	snd_cwnd_stamp;

+	struct rb_root		write_queue_rb;
 	struct sk_buff_head	out_of_order_queue; /* Out of order segments go here */

 	u32	rcv_wnd;	/* Current receiver window */
@@ -339,9 +341,7 @@
 	struct sk_buff *scoreboard_skb_hint;
 	struct sk_buff *retransmit_skb_hint;
 	struct sk_buff *forward_skb_hint;
-	struct sk_buff *fastpath_skb_hint;

-	int     fastpath_cnt_hint;
 	int     lost_cnt_hint;
 	int     retransmit_cnt_hint;
 	int     forward_cnt_hint;
diff -ur linux-2.6.22.6/include/net/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h
--- linux-2.6.22.6/include/net/tcp.h	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h	2007-09-19 17:36:07.000000000 -0700
@@ -540,6 +540,7 @@
 	__u32		seq;		/* Starting sequence number */
 	__u32		end_seq;	/* SEQ + FIN + SYN + datalen */
 	__u32		when;		/* used to compute rtt's */
+	unsigned int	fack_count;	/* speed up SACK processing */
 	__u8		flags;		/* TCP header flags. */

 	/* NOTE: These must match up to the flags byte in a
@@ -1043,12 +1044,12 @@
 }

 /*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+static inline void clear_all_retrans_hints(struct tcp_sock *tp)
+{
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
-	tp->fastpath_skb_hint = NULL;
 }

 /* MD5 Signature */
@@ -1166,6 +1167,7 @@

 	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
 		sk_stream_free_skb(sk, skb);
+	tcp_sk(sk)->write_queue_rb = RB_ROOT;
 	sk_stream_mem_reclaim(sk);
 }

@@ -1208,7 +1210,7 @@
 {
 	struct tcp_sock *tp = tcp_sk(sk);

-	sk->sk_send_head = skb->next;
+	sk->sk_send_head = tcp_write_queue_next(sk, skb);
 	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
 		sk->sk_send_head = NULL;
 	/* Don't override Nagle indefinately with F-RTO */
@@ -1227,9 +1229,61 @@
 	sk->sk_send_head = NULL;
 }

+static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
+{
+	struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
+	struct sk_buff *skb = NULL;
+
+	while (rb_node) {
+		struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
+		if (TCP_SKB_CB(tmp)->end_seq > seq) {
+			skb = tmp;
+			if (TCP_SKB_CB(tmp)->seq <= seq)
+				break;
+			rb_node = rb_node->rb_left;
+		} else
+			rb_node = rb_node->rb_right;
+
+	}
+	return skb;
+}
+
+static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
+{
+	struct rb_node **rb_link, *rb_parent;
+	__u32 seq = TCP_SKB_CB(skb)->seq;
+
+	rb_link = &root->rb_node;
+	rb_parent = NULL;
+	while (*rb_link != NULL) {
+		struct sk_buff *tmp = rb_entry(*rb_link,struct sk_buff,rb);
+		rb_parent = *rb_link;
+		if (TCP_SKB_CB(tmp)->end_seq > seq) {
+			BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);
+			rb_link = &rb_parent->rb_left;
+		} else {
+			rb_link = &rb_parent->rb_right;
+		}
+	}
+	rb_link_node(&skb->rb, rb_parent, rb_link);
+	rb_insert_color(&skb->rb, root);
+}
+
+static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
+{
+	rb_erase(&skb->rb, root);
+}
+
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {
+	struct sk_buff *tail = tcp_write_queue_tail(sk);
+	unsigned int fc = 0;
+
+	if (tail)
+		fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(tail);
+	TCP_SKB_CB(skb)->fack_count = fc;
 	__skb_queue_tail(&sk->sk_write_queue, skb);
+	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }

 static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
@@ -1241,9 +1295,35 @@
 	sk->sk_send_head = skb;
 }

+/* This is only used for tcp_send_synack(), so the write queue should
+ * be empty. If that stops being true, the fack_count assignment
+ * will need to be more elaborate.
+ */
 static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
 {
+	BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
 	__skb_queue_head(&sk->sk_write_queue, skb);
+	TCP_SKB_CB(skb)->fack_count = 0;
+	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
+}
+
+/* An insert into the middle of the write queue causes the fack
+ * counts in subsequent packets to become invalid, fix them up.
+ */
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+{
+	struct sk_buff *prev = first->prev;
+	unsigned int fc = 0;
+
+	if (prev != (struct sk_buff *) &sk->sk_write_queue)
+		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+	while (first != (struct sk_buff *)&sk->sk_write_queue) {
+		TCP_SKB_CB(first)->fack_count = fc;
+
+		fc += tcp_skb_pcount(first);
+		first = first->next;
+	}
 }

 /* Insert buff after skb on the write queue of sk. */
@@ -1252,19 +1332,24 @@
 						struct sock *sk)
 {
 	__skb_append(skb, buff, &sk->sk_write_queue);
+	tcp_reset_fack_counts(sk, buff);
+	tcp_rb_insert(buff, &tcp_sk(sk)->write_queue_rb);
 }

-/* Insert skb between prev and next on the write queue of sk. */
+/* Insert new before skb on the write queue of sk. */
 static inline void tcp_insert_write_queue_before(struct sk_buff *new,
 						  struct sk_buff *skb,
 						  struct sock *sk)
 {
 	__skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
+	tcp_reset_fack_counts(sk, new);
+	tcp_rb_insert(new, &tcp_sk(sk)->write_queue_rb);
 }

 static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
 {
 	__skb_unlink(skb, &sk->sk_write_queue);
+	tcp_rb_unlink(skb, &tcp_sk(sk)->write_queue_rb);
 }

 static inline int tcp_skb_is_last(const struct sock *sk,
diff -ur linux-2.6.22.6/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c
--- linux-2.6.22.6/net/ipv4/tcp_input.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c	2007-09-13 18:23:16.000000000 -0700
@@ -947,14 +947,13 @@
 	unsigned char *ptr = (skb_transport_header(ack_skb) +
 			      TCP_SKB_CB(ack_skb)->sacked);
 	struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
-	struct sk_buff *cached_skb;
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
 	int reord = tp->packets_out;
 	int prior_fackets;
 	u32 lost_retrans = 0;
 	int flag = 0;
 	int found_dup_sack = 0;
-	int cached_fack_count;
+	int fack_count_base;
 	int i;
 	int first_sack_index;

@@ -1020,7 +1019,6 @@
 		num_sacks = 1;
 	else {
 		int j;
-		tp->fastpath_skb_hint = NULL;

 		/* order SACK blocks to allow in order walk of the retrans queue */
 		for (i = num_sacks-1; i > 0; i--) {
@@ -1045,14 +1043,7 @@
 	/* clear flag as used for different purpose in following code */
 	flag = 0;

-	/* Use SACK fastpath hint if valid */
-	cached_skb = tp->fastpath_skb_hint;
-	cached_fack_count = tp->fastpath_cnt_hint;
-	if (!cached_skb) {
-		cached_skb = tcp_write_queue_head(sk);
-		cached_fack_count = 0;
-	}
-
+	fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
 	for (i=0; i<num_sacks; i++, sp++) {
 		struct sk_buff *skb;
 		__u32 start_seq = ntohl(sp->start_seq);
@@ -1060,8 +1051,10 @@
 		int fack_count;
 		int dup_sack = (found_dup_sack && (i == first_sack_index));

-		skb = cached_skb;
-		fack_count = cached_fack_count;
+		skb = tcp_write_queue_find(sk, start_seq);
+		if (!skb)
+			continue;
+		fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;

 		/* Event "B" in the comment above. */
 		if (after(end_seq, tp->high_seq))
@@ -1074,13 +1067,6 @@
 			if (skb == tcp_send_head(sk))
 				break;

-			cached_skb = skb;
-			cached_fack_count = fack_count;
-			if (i == first_sack_index) {
-				tp->fastpath_skb_hint = skb;
-				tp->fastpath_cnt_hint = fack_count;
-			}
-
 			/* The retransmission queue is always in order, so
 			 * we can short-circuit the walk early.
 			 */
diff -ur linux-2.6.22.6/net/ipv4/tcp_ipv4.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c
--- linux-2.6.22.6/net/ipv4/tcp_ipv4.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c	2007-09-13 18:23:16.000000000 -0700
@@ -1845,6 +1845,7 @@
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);

+	tp->write_queue_rb = RB_ROOT;
 	skb_queue_head_init(&tp->out_of_order_queue);
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);
diff -ur linux-2.6.22.6/net/ipv4/tcp_minisocks.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c
--- linux-2.6.22.6/net/ipv4/tcp_minisocks.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c	2007-09-13 18:23:16.000000000 -0700
@@ -421,6 +421,7 @@
 		tcp_set_ca_state(newsk, TCP_CA_Open);
 		tcp_init_xmit_timers(newsk);
+		newtp->write_queue_rb = RB_ROOT;
 		skb_queue_head_init(&newtp->out_of_order_queue);
 		newtp->write_seq = treq->snt_isn + 1;
 		newtp->pushed_seq = newtp->write_seq;
diff -ur linux-2.6.22.6/net/ipv4/tcp_output.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c
--- linux-2.6.22.6/net/ipv4/tcp_output.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c	2007-09-13 18:23:16.000000000 -0700
@@ -1278,11 +1278,11 @@
 	sk_charge_skb(sk, nskb);

 	skb = tcp_send_head(sk);
+	TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
+	TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
 	tcp_insert_write_queue_before(nskb, skb, sk);
 	tcp_advance_send_head(sk, skb);

-	TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
-	TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
 	TCP_SKB_CB(nskb)->flags = TCPCB_FLAG_ACK;
 	TCP_SKB_CB(nskb)->sacked = 0;
 	nskb->csum = 0;
diff -ur linux-2.6.22.6/net/ipv6/tcp_ipv6.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c
--- linux-2.6.22.6/net/ipv6/tcp_ipv6.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c	2007-09-13 18:23:16.000000000 -0700
@@ -1900,6 +1900,7 @@
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);

+	tp->write_queue_rb = RB_ROOT;
 	skb_queue_head_init(&tp->out_of_order_queue);
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);

^ permalink raw reply	[flat|nested] 7+ messages in thread
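The contract of tcp_write_queue_find() above is easiest to see outside the
kernel: it returns the leftmost segment whose end_seq is above the target,
i.e. the segment containing the target sequence if one exists, otherwise the
first segment after the gap. A standalone sketch of the same predicate, with
a sorted array standing in for the rb-tree (an assumed simplification, and
sequence wraparound is ignored here):

	#include <stdio.h>

	struct seg { unsigned int seq, end_seq; };

	/* Return the first segment with end_seq > target: the segment
	 * containing target if any, else the next one past the gap --
	 * the same contract as tcp_write_queue_find(). */
	static const struct seg *find(const struct seg *q, int n, unsigned int target)
	{
		int lo = 0, hi = n;

		while (lo < hi) {
			int mid = (lo + hi) / 2;
			if (q[mid].end_seq > target)
				hi = mid;	/* candidate; go left, like rb_left */
			else
				lo = mid + 1;	/* like rb_right */
		}
		return lo < n ? &q[lo] : NULL;
	}

	int main(void)
	{
		const struct seg q[] = { {0, 100}, {100, 250}, {400, 500} };
		const struct seg *s = find(q, 3, 120);	/* inside the second segment */

		if (s)
			printf("found [%u, %u)\n", s->seq, s->end_seq);
		s = find(q, 3, 300);			/* in the gap: yields {400, 500} */
		if (s)
			printf("gap maps to [%u, %u)\n", s->seq, s->end_seq);
		return 0;
	}

This is why the SACK walk can simply continue when the lookup returns NULL:
a NULL result means the whole queue ends before the SACK block starts.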
* [PATCH 2/2] David Miller's rbtree patches for 2.6.22.6
  2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
@ 2007-09-20 1:44   ` Tom Quetchenbach
  2007-09-21 13:48     ` Ilpo Järvinen
  0 siblings, 1 reply; 7+ messages in thread

From: Tom Quetchenbach @ 2007-09-20 1:44 UTC (permalink / raw)
To: netdev

Patch 2: fixes to fack_counts and enhancement of SACK fast path

-Tom

---
diff -ur linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h linux-2.6.22.6-rbtree-tomq/include/net/tcp.h
--- linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h	2007-09-19 17:36:07.000000000 -0700
+++ linux-2.6.22.6-rbtree-tomq/include/net/tcp.h	2007-09-19 12:22:06.000000000 -0700
@@ -1213,6 +1213,11 @@
 	sk->sk_send_head = tcp_write_queue_next(sk, skb);
 	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
 		sk->sk_send_head = NULL;
+	else
+		/* update fack_count of send_head. Since we've sent skb already,
+		 * its packet count must be set by now. */
+		TCP_SKB_CB(sk->sk_send_head)->fack_count =
+			TCP_SKB_CB(skb)->fack_count + tcp_skb_pcount(skb);

 	/* Don't override Nagle indefinately with F-RTO */
 	if (tp->frto_counter == 2)
 		tp->frto_counter = 3;
@@ -1310,19 +1315,22 @@
 /* An insert into the middle of the write queue causes the fack
  * counts in subsequent packets to become invalid, fix them up.
  */
-static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *skb)
 {
-	struct sk_buff *prev = first->prev;
+	struct sk_buff *prev = skb->prev;
 	unsigned int fc = 0;

 	if (prev != (struct sk_buff *) &sk->sk_write_queue)
 		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);

-	while (first != (struct sk_buff *)&sk->sk_write_queue) {
-		TCP_SKB_CB(first)->fack_count = fc;
+	while (skb != (struct sk_buff *)&sk->sk_write_queue) {
+		if (TCP_SKB_CB(skb)->fack_count == fc || !tcp_skb_pcount(skb))
+			break;

-		fc += tcp_skb_pcount(first);
-		first = first->next;
+		TCP_SKB_CB(skb)->fack_count = fc;
+
+		fc += tcp_skb_pcount(skb);
+		skb = skb->next;
 	}
 }

diff -ur linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c
--- linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c	2007-09-13 18:23:16.000000000 -0700
+++ linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c	2007-09-19 12:27:42.000000000 -0700
@@ -956,6 +956,7 @@
 	int fack_count_base;
 	int i;
 	int first_sack_index;
+	u32 prev_end_seq = 0;

 	if (!tp->sacked_out)
 		tp->fackets_out = 0;
@@ -1000,6 +1001,7 @@
 		if (i == 0) {
 			if (tp->recv_sack_cache[i].start_seq != start_seq)
 				flag = 0;
+			prev_end_seq = ntohl(tp->recv_sack_cache[i].end_seq);
 		} else {
 			if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
 			    (tp->recv_sack_cache[i].end_seq != end_seq))
@@ -1016,9 +1018,16 @@

 	first_sack_index = 0;
 	if (flag)
+		/* all that has changed is end of first SACK block. So all we
+		 * need to do is tag those skbs that weren't tagged last time. */
 		num_sacks = 1;
 	else {
 		int j;
+
+		/* more than just end of first SACK block has changed; invalidate
+		 * prev_end_seq */
+
+		prev_end_seq = 0;

 		/* order SACK blocks to allow in order walk of the retrans queue */
 		for (i = num_sacks-1; i > 0; i--) {
@@ -1051,6 +1060,8 @@
 		int fack_count;
 		int dup_sack = (found_dup_sack && (i == first_sack_index));

+		if (prev_end_seq) start_seq = prev_end_seq;
+
 		skb = tcp_write_queue_find(sk, start_seq);
 		if (!skb)
 			continue;

^ permalink raw reply	[flat|nested] 7+ messages in thread
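The fast-path change in this patch comes down to one decision: when the only
difference from the cached SACK state is that the first block's end advanced,
start the tag walk at the previous end instead of the block's start. A
condensed, self-contained model of that decision (the struct and names are
illustrative stand-ins for recv_sack_cache entries; sequence wraparound and
DSACK handling are deliberately ignored):

	#include <stdio.h>

	struct block { unsigned int start, end; };

	/* Pick where the tag walk should begin: fast path skips the span
	 * that was already tagged on the previous ACK. */
	static unsigned int walk_start(struct block now, struct block cached)
	{
		if (now.start == cached.start && now.end > cached.end)
			return cached.end;	/* fast path: tag only the new span */
		return now.start;		/* slow path: walk the whole block */
	}

	int main(void)
	{
		struct block cached = { 1000, 3000 };
		struct block grown  = { 1000, 4500 };	/* same block, end advanced */
		struct block moved  = { 5000, 6000 };	/* different block entirely */

		printf("grown: start walk at %u\n", walk_start(grown, cached)); /* 3000 */
		printf("moved: start walk at %u\n", walk_start(moved, cached)); /* 5000 */
		return 0;
	}

The rb-tree lookup makes this pay off: tcp_write_queue_find(prev_end_seq)
lands directly on the first untagged skb in O(log n) instead of walking there.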
* Re: [PATCH 2/2] David Miller's rbtree patches for 2.6.22.6
  2007-09-20 1:44 ` [PATCH 2/2] " Tom Quetchenbach
@ 2007-09-21 13:48   ` Ilpo Järvinen
  0 siblings, 0 replies; 7+ messages in thread

From: Ilpo Järvinen @ 2007-09-21 13:48 UTC (permalink / raw)
To: Tom Quetchenbach; +Cc: Netdev

On Wed, 19 Sep 2007, Tom Quetchenbach wrote:

> Patch 2: fixes to fack_counts and enhancement of SACK fast path

...Usually these are not combined into one patch; use a separate patch
per change.

> diff -ur linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h linux-2.6.22.6-rbtree-tomq/include/net/tcp.h
> --- linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h	2007-09-19 17:36:07.000000000 -0700
> +++ linux-2.6.22.6-rbtree-tomq/include/net/tcp.h	2007-09-19 12:22:06.000000000 -0700
> @@ -1213,6 +1213,11 @@

...Btw, please always use -p with diff as well, as it helps review :-).

>  	sk->sk_send_head = tcp_write_queue_next(sk, skb);
>  	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
>  		sk->sk_send_head = NULL;
> +	else
> +		/* update fack_count of send_head. Since we've sent skb already,
> +		 * its packet count must be set by now. */
> +		TCP_SKB_CB(sk->sk_send_head)->fack_count =
> +			TCP_SKB_CB(skb)->fack_count + tcp_skb_pcount(skb);
>
>  	/* Don't override Nagle indefinately with F-RTO */
>  	if (tp->frto_counter == 2)
>  		tp->frto_counter = 3;
> @@ -1310,19 +1315,22 @@
>  /* An insert into the middle of the write queue causes the fack
>   * counts in subsequent packets to become invalid, fix them up.
>   */
> -static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
> +static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *skb)
>  {
> -	struct sk_buff *prev = first->prev;
> +	struct sk_buff *prev = skb->prev;
>  	unsigned int fc = 0;
>
>  	if (prev != (struct sk_buff *) &sk->sk_write_queue)
>  		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
>
> -	while (first != (struct sk_buff *)&sk->sk_write_queue) {
> -		TCP_SKB_CB(first)->fack_count = fc;
> +	while (skb != (struct sk_buff *)&sk->sk_write_queue) {
> +		if (TCP_SKB_CB(skb)->fack_count == fc || !tcp_skb_pcount(skb))

What is this !tcp_skb_pcount(skb) for???? I think what we really want
here is this:

	|| (skb == tcp_send_head(sk))

> +			break;
>
> -		fc += tcp_skb_pcount(first);
> -		first = first->next;
> +		TCP_SKB_CB(skb)->fack_count = fc;
> +
> +		fc += tcp_skb_pcount(skb);
> +		skb = skb->next;
>  	}
>  }
>
> diff -ur linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c
> --- linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c	2007-09-13 18:23:16.000000000 -0700
> +++ linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c	2007-09-19 12:27:42.000000000 -0700
> @@ -956,6 +956,7 @@
>  	int fack_count_base;
>  	int i;
>  	int first_sack_index;
> +	u32 prev_end_seq = 0;
>
>  	if (!tp->sacked_out)
>  		tp->fackets_out = 0;
> @@ -1000,6 +1001,7 @@
>  		if (i == 0) {
>  			if (tp->recv_sack_cache[i].start_seq != start_seq)
>  				flag = 0;
> +			prev_end_seq = ntohl(tp->recv_sack_cache[i].end_seq);
>  		} else {
>  			if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
>  			    (tp->recv_sack_cache[i].end_seq != end_seq))
> @@ -1016,9 +1018,16 @@
>
>  	first_sack_index = 0;
>  	if (flag)
> +		/* all that has changed is end of first SACK block. So all we
> +		 * need to do is tag those skbs that weren't tagged last time. */

IMHO, a bit verbose for a comment. Besides, we already explain above
what the SACK fastpath is all about...

>  		num_sacks = 1;
>  	else {
>  		int j;
> +
> +		/* more than just end of first SACK block has changed; invalidate
> +		 * prev_end_seq */
> +
> +		prev_end_seq = 0;

Also, don't state in a comment what's obvious from the code.

>  		/* order SACK blocks to allow in order walk of the retrans queue */
>  		for (i = num_sacks-1; i > 0; i--) {
> @@ -1051,6 +1060,8 @@
>  		int fack_count;
>  		int dup_sack = (found_dup_sack && (i == first_sack_index));
>
> +		if (prev_end_seq) start_seq = prev_end_seq;
> +

...We never add statements on the same line after if statements (see
Documentation/CodingStyle)

--
 i.

^ permalink raw reply	[flat|nested] 7+ messages in thread
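Folding both of Ilpo's remarks back into the helper gives roughly the
following shape (a sketch of his proposal against the patch above, not a
tested kernel change; tcp_send_head() marks the first not-yet-sent skb,
whose pcount is not meaningful yet):

	/* Sketch: tcp_reset_fack_counts() with the termination Ilpo
	 * suggests. Stop once the counts are already consistent, or once
	 * we reach the send head -- nothing at or past it has been sent,
	 * so its count can wait until tcp_advance_send_head() sets it. */
	static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *skb)
	{
		struct sk_buff *prev = skb->prev;
		unsigned int fc = 0;

		if (prev != (struct sk_buff *)&sk->sk_write_queue)
			fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);

		while (skb != (struct sk_buff *)&sk->sk_write_queue) {
			if (TCP_SKB_CB(skb)->fack_count == fc ||
			    skb == tcp_send_head(sk))
				break;

			TCP_SKB_CB(skb)->fack_count = fc;
			fc += tcp_skb_pcount(skb);
			skb = skb->next;
		}
	}

The send-head test expresses the intent directly, whereas !tcp_skb_pcount(skb)
only catches unsent skbs indirectly, through the side effect that their
pcount has not been set yet.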
* [PATCH 1/2 revised] David Miller's rbtree patches for 2.6.22.6
  2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
  2007-09-20 1:44   ` [PATCH 2/2] " Tom Quetchenbach
@ 2007-09-20 21:27   ` Tom Quetchenbach
  2007-09-21 13:37   ` [PATCH 1/2] " Ilpo Järvinen
  2 siblings, 0 replies; 7+ messages in thread

From: Tom Quetchenbach @ 2007-09-20 21:27 UTC (permalink / raw)
To: netdev

> Patch 1: David Miller's red-black tree code, tweaked for 2.6.22.6,
> with some bugfixes

oops... The original patch can cause a kernel panic if
tcp_sacktag_write_queue is called when the write queue is empty. Use
this one instead.

-Tom

---
diff -ur linux-2.6.22.6/include/linux/skbuff.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h
--- linux-2.6.22.6/include/linux/skbuff.h	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h	2007-09-13 18:23:16.000000000 -0700
@@ -18,6 +18,7 @@
 #include <linux/compiler.h>
 #include <linux/time.h>
 #include <linux/cache.h>
+#include <linux/rbtree.h>

 #include <asm/atomic.h>
 #include <asm/types.h>
@@ -242,6 +243,8 @@
 	struct sk_buff		*next;
 	struct sk_buff		*prev;

+	struct rb_node		rb;
+
 	struct sock		*sk;
 	ktime_t			tstamp;
 	struct net_device	*dev;
diff -ur linux-2.6.22.6/include/linux/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h
--- linux-2.6.22.6/include/linux/tcp.h	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h	2007-09-13 18:23:16.000000000 -0700
@@ -174,6 +174,7 @@

 #include <linux/skbuff.h>
 #include <linux/dmaengine.h>
+#include <linux/rbtree.h>
 #include <net/sock.h>
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -321,6 +322,7 @@
 	u32	snd_cwnd_used;
 	u32	snd_cwnd_stamp;

+	struct rb_root		write_queue_rb;
 	struct sk_buff_head	out_of_order_queue; /* Out of order segments go here */

 	u32	rcv_wnd;	/* Current receiver window */
@@ -339,9 +341,7 @@
 	struct sk_buff *scoreboard_skb_hint;
 	struct sk_buff *retransmit_skb_hint;
 	struct sk_buff *forward_skb_hint;
-	struct sk_buff *fastpath_skb_hint;

-	int     fastpath_cnt_hint;
 	int     lost_cnt_hint;
 	int     retransmit_cnt_hint;
 	int     forward_cnt_hint;
diff -ur linux-2.6.22.6/include/net/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h
--- linux-2.6.22.6/include/net/tcp.h	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h	2007-09-19 17:36:07.000000000 -0700
@@ -540,6 +540,7 @@
 	__u32		seq;		/* Starting sequence number */
 	__u32		end_seq;	/* SEQ + FIN + SYN + datalen */
 	__u32		when;		/* used to compute rtt's */
+	unsigned int	fack_count;	/* speed up SACK processing */
 	__u8		flags;		/* TCP header flags. */

 	/* NOTE: These must match up to the flags byte in a
@@ -1043,12 +1044,12 @@
 }

 /*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+static inline void clear_all_retrans_hints(struct tcp_sock *tp)
+{
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
-	tp->fastpath_skb_hint = NULL;
 }

 /* MD5 Signature */
@@ -1166,6 +1167,7 @@

 	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
 		sk_stream_free_skb(sk, skb);
+	tcp_sk(sk)->write_queue_rb = RB_ROOT;
 	sk_stream_mem_reclaim(sk);
 }

@@ -1208,7 +1210,7 @@
 {
 	struct tcp_sock *tp = tcp_sk(sk);

-	sk->sk_send_head = skb->next;
+	sk->sk_send_head = tcp_write_queue_next(sk, skb);
 	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
 		sk->sk_send_head = NULL;
 	/* Don't override Nagle indefinately with F-RTO */
@@ -1227,9 +1229,61 @@
 	sk->sk_send_head = NULL;
 }

+static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
+{
+	struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
+	struct sk_buff *skb = NULL;
+
+	while (rb_node) {
+		struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
+		if (TCP_SKB_CB(tmp)->end_seq > seq) {
+			skb = tmp;
+			if (TCP_SKB_CB(tmp)->seq <= seq)
+				break;
+			rb_node = rb_node->rb_left;
+		} else
+			rb_node = rb_node->rb_right;
+
+	}
+	return skb;
+}
+
+static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
+{
+	struct rb_node **rb_link, *rb_parent;
+	__u32 seq = TCP_SKB_CB(skb)->seq;
+
+	rb_link = &root->rb_node;
+	rb_parent = NULL;
+	while (*rb_link != NULL) {
+		struct sk_buff *tmp = rb_entry(*rb_link,struct sk_buff,rb);
+		rb_parent = *rb_link;
+		if (TCP_SKB_CB(tmp)->end_seq > seq) {
+			BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);
+			rb_link = &rb_parent->rb_left;
+		} else {
+			rb_link = &rb_parent->rb_right;
+		}
+	}
+	rb_link_node(&skb->rb, rb_parent, rb_link);
+	rb_insert_color(&skb->rb, root);
+}
+
+static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
+{
+	rb_erase(&skb->rb, root);
+}
+
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {
+	struct sk_buff *tail = tcp_write_queue_tail(sk);
+	unsigned int fc = 0;
+
+	if (tail)
+		fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(tail);
+	TCP_SKB_CB(skb)->fack_count = fc;
 	__skb_queue_tail(&sk->sk_write_queue, skb);
+	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }

 static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
@@ -1241,9 +1295,35 @@
 	sk->sk_send_head = skb;
 }

+/* This is only used for tcp_send_synack(), so the write queue should
+ * be empty. If that stops being true, the fack_count assignment
+ * will need to be more elaborate.
+ */
 static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
 {
+	BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
 	__skb_queue_head(&sk->sk_write_queue, skb);
+	TCP_SKB_CB(skb)->fack_count = 0;
+	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
+}
+
+/* An insert into the middle of the write queue causes the fack
+ * counts in subsequent packets to become invalid, fix them up.
+ */
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+{
+	struct sk_buff *prev = first->prev;
+	unsigned int fc = 0;
+
+	if (prev != (struct sk_buff *) &sk->sk_write_queue)
+		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+	while (first != (struct sk_buff *)&sk->sk_write_queue) {
+		TCP_SKB_CB(first)->fack_count = fc;
+
+		fc += tcp_skb_pcount(first);
+		first = first->next;
+	}
 }

 /* Insert buff after skb on the write queue of sk. */
@@ -1252,19 +1332,24 @@
 						struct sock *sk)
 {
 	__skb_append(skb, buff, &sk->sk_write_queue);
+	tcp_reset_fack_counts(sk, buff);
+	tcp_rb_insert(buff, &tcp_sk(sk)->write_queue_rb);
 }

-/* Insert skb between prev and next on the write queue of sk. */
+/* Insert new before skb on the write queue of sk. */
 static inline void tcp_insert_write_queue_before(struct sk_buff *new,
 						  struct sk_buff *skb,
 						  struct sock *sk)
 {
 	__skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
+	tcp_reset_fack_counts(sk, new);
+	tcp_rb_insert(new, &tcp_sk(sk)->write_queue_rb);
 }

 static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
 {
 	__skb_unlink(skb, &sk->sk_write_queue);
+	tcp_rb_unlink(skb, &tcp_sk(sk)->write_queue_rb);
 }

 static inline int tcp_skb_is_last(const struct sock *sk,
diff -ur linux-2.6.22.6/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c
--- linux-2.6.22.6/net/ipv4/tcp_input.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c	2007-09-20 14:10:26.000000000 -0700
@@ -947,14 +947,13 @@
 	unsigned char *ptr = (skb_transport_header(ack_skb) +
 			      TCP_SKB_CB(ack_skb)->sacked);
 	struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
-	struct sk_buff *cached_skb;
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
 	int reord = tp->packets_out;
 	int prior_fackets;
 	u32 lost_retrans = 0;
 	int flag = 0;
 	int found_dup_sack = 0;
-	int cached_fack_count;
+	int fack_count_base;
 	int i;
 	int first_sack_index;

@@ -1020,7 +1019,6 @@
 		num_sacks = 1;
 	else {
 		int j;
-		tp->fastpath_skb_hint = NULL;

 		/* order SACK blocks to allow in order walk of the retrans queue */
 		for (i = num_sacks-1; i > 0; i--) {
@@ -1045,13 +1043,8 @@
 	/* clear flag as used for different purpose in following code */
 	flag = 0;

-	/* Use SACK fastpath hint if valid */
-	cached_skb = tp->fastpath_skb_hint;
-	cached_fack_count = tp->fastpath_cnt_hint;
-	if (!cached_skb) {
-		cached_skb = tcp_write_queue_head(sk);
-		cached_fack_count = 0;
-	}
+	if (likely(tcp_write_queue_head(sk)))
+		fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;

 	for (i=0; i<num_sacks; i++, sp++) {
 		struct sk_buff *skb;
@@ -1060,8 +1053,10 @@
 		int fack_count;
 		int dup_sack = (found_dup_sack && (i == first_sack_index));

-		skb = cached_skb;
-		fack_count = cached_fack_count;
+		skb = tcp_write_queue_find(sk, start_seq);
+		if (!skb)
+			continue;
+		fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;

 		/* Event "B" in the comment above. */
 		if (after(end_seq, tp->high_seq))
@@ -1074,13 +1067,6 @@
 			if (skb == tcp_send_head(sk))
 				break;

-			cached_skb = skb;
-			cached_fack_count = fack_count;
-			if (i == first_sack_index) {
-				tp->fastpath_skb_hint = skb;
-				tp->fastpath_cnt_hint = fack_count;
-			}
-
 			/* The retransmission queue is always in order, so
 			 * we can short-circuit the walk early.
 			 */
diff -ur linux-2.6.22.6/net/ipv4/tcp_ipv4.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c
--- linux-2.6.22.6/net/ipv4/tcp_ipv4.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c	2007-09-13 18:23:16.000000000 -0700
@@ -1845,6 +1845,7 @@
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);

+	tp->write_queue_rb = RB_ROOT;
 	skb_queue_head_init(&tp->out_of_order_queue);
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);
diff -ur linux-2.6.22.6/net/ipv4/tcp_minisocks.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c
--- linux-2.6.22.6/net/ipv4/tcp_minisocks.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c	2007-09-13 18:23:16.000000000 -0700
@@ -421,6 +421,7 @@
 		tcp_set_ca_state(newsk, TCP_CA_Open);
 		tcp_init_xmit_timers(newsk);
+		newtp->write_queue_rb = RB_ROOT;
 		skb_queue_head_init(&newtp->out_of_order_queue);
 		newtp->write_seq = treq->snt_isn + 1;
 		newtp->pushed_seq = newtp->write_seq;
diff -ur linux-2.6.22.6/net/ipv4/tcp_output.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c
--- linux-2.6.22.6/net/ipv4/tcp_output.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c	2007-09-13 18:23:16.000000000 -0700
@@ -1278,11 +1278,11 @@
 	sk_charge_skb(sk, nskb);

 	skb = tcp_send_head(sk);
+	TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
+	TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
 	tcp_insert_write_queue_before(nskb, skb, sk);
 	tcp_advance_send_head(sk, skb);

-	TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
-	TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
 	TCP_SKB_CB(nskb)->flags = TCPCB_FLAG_ACK;
 	TCP_SKB_CB(nskb)->sacked = 0;
 	nskb->csum = 0;
diff -ur linux-2.6.22.6/net/ipv6/tcp_ipv6.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c
--- linux-2.6.22.6/net/ipv6/tcp_ipv6.c	2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c	2007-09-13 18:23:16.000000000 -0700
@@ -1900,6 +1900,7 @@
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);

+	tp->write_queue_rb = RB_ROOT;
 	skb_queue_head_init(&tp->out_of_order_queue);
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] David Miller's rbtree patches for 2.6.22.6
  2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
  2007-09-20 1:44   ` [PATCH 2/2] " Tom Quetchenbach
  2007-09-20 21:27   ` [PATCH 1/2 revised] " Tom Quetchenbach
@ 2007-09-21 13:37   ` Ilpo Järvinen
  2 siblings, 0 replies; 7+ messages in thread

From: Ilpo Järvinen @ 2007-09-21 13:37 UTC (permalink / raw)
To: Tom Quetchenbach; +Cc: netdev

On Wed, 19 Sep 2007, Tom Quetchenbach wrote:

> Patch 1: David Miller's red-black tree code, tweaked for 2.6.22.6,
> with some bugfixes

It would help if you would leave the original changes as-is (rb-tree
and fack_count separated) and add your work on top of that...

> diff -ur linux-2.6.22.6/include/net/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h
> --- linux-2.6.22.6/include/net/tcp.h	2007-08-30 23:21:01.000000000 -0700
> +++ linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h	2007-09-19 17:36:07.000000000 -0700
> @@ -540,6 +540,7 @@
>  	__u32		seq;		/* Starting sequence number */
>  	__u32		end_seq;	/* SEQ + FIN + SYN + datalen */
>  	__u32		when;		/* used to compute rtt's */
> +	unsigned int	fack_count;	/* speed up SACK processing */
>  	__u8		flags;		/* TCP header flags. */
>
>  	/* NOTE: These must match up to the flags byte in a
> @@ -1043,12 +1044,12 @@
>  }
>
>  /*from STCP */
> -static inline void clear_all_retrans_hints(struct tcp_sock *tp){
> +static inline void clear_all_retrans_hints(struct tcp_sock *tp)
> +{

Unrelated change, please don't do that. Besides, it's already fixed in
net-2.6.24.

>  	tp->lost_skb_hint = NULL;
>  	tp->scoreboard_skb_hint = NULL;
>  	tp->retransmit_skb_hint = NULL;
>  	tp->forward_skb_hint = NULL;
> -	tp->fastpath_skb_hint = NULL;
>  }
>
>  /* MD5 Signature */
> @@ -1227,9 +1229,61 @@
>  	sk->sk_send_head = NULL;
>  }
>
> +static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
> +{
> +	struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
> +	struct sk_buff *skb = NULL;
> +
> +	while (rb_node) {
> +		struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
> +		if (TCP_SKB_CB(tmp)->end_seq > seq) {

This is an old and buggy version of the rb-tree code. Get the latest
rb-tree patch from the tcp-2.6 tree.

> +			skb = tmp;
> +			if (TCP_SKB_CB(tmp)->seq <= seq)

...fixed in tcp-2.6.

> +				break;
> +			rb_node = rb_node->rb_left;
> +		} else
> +			rb_node = rb_node->rb_right;
> +
> +	}
> +	return skb;
> +}
> +
> +static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
> +{
> +	struct rb_node **rb_link, *rb_parent;
> +	__u32 seq = TCP_SKB_CB(skb)->seq;
> +
> +	rb_link = &root->rb_node;
> +	rb_parent = NULL;
> +	while (*rb_link != NULL) {
> +		struct sk_buff *tmp = rb_entry(*rb_link,struct sk_buff,rb);
> +		rb_parent = *rb_link;
> +		if (TCP_SKB_CB(tmp)->end_seq > seq) {
> +			BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);

...these are broken as well.

> +			rb_link = &rb_parent->rb_left;
> +		} else {
> +			rb_link = &rb_parent->rb_right;
> +		}
> +	}
> +	rb_link_node(&skb->rb, rb_parent, rb_link);
> +	rb_insert_color(&skb->rb, root);
> +}
> +
> +static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
> +{
> +	rb_erase(&skb->rb, root);
> +}
> +
>  static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
>  {
> +	struct sk_buff *tail = tcp_write_queue_tail(sk);
> +	unsigned int fc = 0;
> +
> +	if (tail)
> +		fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(tail);
> +	TCP_SKB_CB(skb)->fack_count = fc;
>  	__skb_queue_tail(&sk->sk_write_queue, skb);
> +	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
>  }
>
>  static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
> diff -ur linux-2.6.22.6/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c
> --- linux-2.6.22.6/net/ipv4/tcp_input.c	2007-08-30 23:21:01.000000000 -0700
> +++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c	2007-09-13 18:23:16.000000000 -0700
> @@ -947,14 +947,13 @@
>  	unsigned char *ptr = (skb_transport_header(ack_skb) +
>  			      TCP_SKB_CB(ack_skb)->sacked);
>  	struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
> -	struct sk_buff *cached_skb;
>  	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
>  	int reord = tp->packets_out;
>  	int prior_fackets;
>  	u32 lost_retrans = 0;
>  	int flag = 0;
>  	int found_dup_sack = 0;
> -	int cached_fack_count;
> +	int fack_count_base;
>  	int i;
>  	int first_sack_index;
>
> @@ -1020,7 +1019,6 @@
>  		num_sacks = 1;
>  	else {
>  		int j;
> -		tp->fastpath_skb_hint = NULL;
>
>  		/* order SACK blocks to allow in order walk of the retrans queue */
>  		for (i = num_sacks-1; i > 0; i--) {
> @@ -1045,14 +1043,7 @@
>  	/* clear flag as used for different purpose in following code */
>  	flag = 0;
>
> -	/* Use SACK fastpath hint if valid */
> -	cached_skb = tp->fastpath_skb_hint;
> -	cached_fack_count = tp->fastpath_cnt_hint;
> -	if (!cached_skb) {
> -		cached_skb = tcp_write_queue_head(sk);
> -		cached_fack_count = 0;
> -	}
> -
> +	fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
>  	for (i=0; i<num_sacks; i++, sp++) {
>  		struct sk_buff *skb;
>  		__u32 start_seq = ntohl(sp->start_seq);
> @@ -1060,8 +1051,10 @@
>  		int fack_count;
>  		int dup_sack = (found_dup_sack && (i == first_sack_index));
>
> -		skb = cached_skb;
> -		fack_count = cached_fack_count;
> +		skb = tcp_write_queue_find(sk, start_seq);
> +		if (!skb)
> +			continue;

In net-2.6.24 we validate SACK blocks early. ...This is not a working
solution anyway, since tcp_write_queue_find(end_seq) might be valid,
don't you think (though with the validator it should only happen when
dup_sack is set)? For non-DSACK cases, a better alternative is like
this:

	if (WARN_ON(skb == NULL))
		continue;

> +		fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;
>
>  		/* Event "B" in the comment above. */
>  		if (after(end_seq, tp->high_seq))

--
 i.

^ permalink raw reply	[flat|nested] 7+ messages in thread
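The early validation Ilpo refers to amounts to sanity checks of roughly this
shape before any queue walk happens (a paraphrase from the general SACK
rules, not the actual net-2.6.24 function; before() and after() are the
kernel's wrapping sequence-number comparisons, and D-SACK blocks are the
legitimate exception that may sit below snd_una):

	/* Sketch of early SACK-block sanity checking: reject blocks that
	 * cannot describe data currently in flight, so the tag walk never
	 * runs on garbage ranges. A paraphrase, not the real kernel code. */
	static int sack_block_sane(u32 start_seq, u32 end_seq,
				   u32 snd_una, u32 snd_nxt)
	{
		if (!before(start_seq, end_seq))	/* empty or inverted block */
			return 0;
		if (after(end_seq, snd_nxt))		/* covers data never sent */
			return 0;
		if (!after(end_seq, snd_una))		/* entirely below the cumulative ACK */
			return 0;
		return 1;
	}

With blocks screened this way, a NULL return from tcp_write_queue_find()
really is unexpected, which is why WARN_ON plus continue is the better
shape than silently skipping.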
* Re: [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6
  2007-09-20 1:39 [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6 Tom Quetchenbach
  2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
@ 2007-09-21 14:08 ` Ilpo Järvinen
  1 sibling, 0 replies; 7+ messages in thread

From: Ilpo Järvinen @ 2007-09-21 14:08 UTC (permalink / raw)
To: Tom Quetchenbach; +Cc: Netdev

On Wed, 19 Sep 2007, Tom Quetchenbach wrote:

> Here are a couple of patches against 2.6.22.6. The first one is just
> David's patches tweaked for 2.6.22.6, with a couple of minor bugfixes to
> get it to compile and not crash.

Why did you combine the original patches into a single larger one? I
think Dave made them separate on purpose.

> (I also changed
> __tcp_add_write_queue_tail() to set the fack_count of the new packet
> to the fack_count of the tail plus the packet count of the tail, not the
> packet count of the new skb, because I think that's how it was intended
> to be. Right?)

I think I noticed a similar "off-by-pcount" error when I looked at that
a long time ago, so I guess you're correct. We're only interested in
the delta of it anyway and add the current skb's pcount to it (which is
not fixed until tcp_fragment in sacktag is past).

> In the second patch there are a couple of significant changes. One is
> (as Baruch suggested) to modify the existing SACK fast path so that we
> don't tag packets we've already tagged when we advance by a packet.

This solution would still spend an extensive amount of time in the
processing loop whenever the recv_sack_cache fast path is not taken,
that is, e.g. when a cumulative ACK after retransmissions arrives or a
new hole becomes visible (which are not very exceptional events after
all :-)). In the cumulative ACK case especially, this processing is
very likely _fully_ wasted walking. So there is still room for large
improvements.

I made an improved version of the current sacktag walk a couple of
days ago (it's in a state where it didn't crash, but it's likely still
very buggy); I'll post it here soon... The idea is to embed the
recv_sack_cache checking fully into the walking loop. By doing that,
previously done work is not duplicated. The patch is currently against
non-rbtree stuff, but incorporating the rb-tree things on top of it
should be very trivial, and the synergy benefits with the rb-tree
should be considerable, because the non-rbtree code has to do "fast
walk" skipping for skbs that are under highest_sack, which is prone to
cache misses.

> The other issue is that the cached fack_counts seem to be wrong,
> because they're set when we insert into the queue, but
> tcp_set_skb_tso_segs() is called later, just before we send, so all
> the fack_counts are zero. My solution was to set the fack_count when
> we advance the send_head.

I think it's a better solution anyway, since we might have to do
reset_fack_counts() in between and there's no need to update past
sk_send_head.

> Also I changed tcp_reset_fack_counts() so that it exits when it hits
> an skb whose tcp_skb_pcount() is zero

Do you mind explaining what the purpose of that is?

> or whose fack_count is already correct.
> (This really helps when TSO is on, since there's lots of inserting into
> the middle of the queue.)

Good point.

> Please let me know how I can help get this tested and debugged.

Most network development happens against the latest net-2.6(.x) trees.
In addition, there's the experimental tcp-2.6 tree, but it's currently
a bit outdated already (and DaveM is very busy with the phenomenal
merge they're doing for 2.6.24 :-), so it's not too likely to be
updated very soon).

...Anyway, thanks for your interest in these things.

--
 i.

^ permalink raw reply	[flat|nested] 7+ messages in thread
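The walk restructuring Ilpo describes, consulting recv_sack_cache inside the
walk itself, comes down to jumping over any span whose tagging cannot have
changed since the previous ACK. A toy model of that skip decision (all names
are illustrative, not the kernel's; the real patch he mentions was posted
separately):

	#include <stdio.h>

	/* Toy model of embedding the recv_sack_cache check in the tag walk:
	 * spans already processed on a previous ACK are jumped over rather
	 * than re-tagged. */
	struct range { unsigned int start, end; };

	static unsigned int skip_processed(const struct range *cache, int n,
					   unsigned int seq)
	{
		for (int i = 0; i < n; i++)
			if (cache[i].start <= seq && seq < cache[i].end)
				return cache[i].end;	/* resume after the known span */
		return seq;				/* unknown: must be walked */
	}

	int main(void)
	{
		const struct range cache[] = { {1000, 2000}, {3000, 4000} };

		printf("%u\n", skip_processed(cache, 2, 1500));	/* 2000: skip ahead */
		printf("%u\n", skip_processed(cache, 2, 2500));	/* 2500: walk here */
		return 0;
	}

Combined with the rb-tree, each skip becomes a single O(log n) lookup to the
resume point, which is exactly the synergy benefit mentioned above.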
end of thread, other threads:[~2007-09-21 14:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2007-09-20 1:39 [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6 Tom Quetchenbach
2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
2007-09-20 1:44   ` [PATCH 2/2] " Tom Quetchenbach
2007-09-21 13:48     ` Ilpo Järvinen
2007-09-20 21:27   ` [PATCH 1/2 revised] " Tom Quetchenbach
2007-09-21 13:37   ` [PATCH 1/2] " Ilpo Järvinen
2007-09-21 14:08 ` [PATCH 0/2] " Ilpo Järvinen