* [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6
@ 2007-09-20 1:39 Tom Quetchenbach
2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
2007-09-21 14:08 ` [PATCH 0/2] " Ilpo Järvinen
0 siblings, 2 replies; 7+ messages in thread
From: Tom Quetchenbach @ 2007-09-20 1:39 UTC (permalink / raw)
To: netdev
Hello,
I've been experimenting with David Miller's red-black tree patch for
SACK processing. We've been sending TCP traffic between two machines with
10Gbps cards over a 1Gbps bottleneck link and were seeing very high CPU
load with large windows. With a few tweaks, this patch seems to provide
a pretty substantial improvement. David: this seems like excellent work
so far.
Here are a couple of patches against 2.6.22.6. The first one is just
David's patches tweaked for 2.6.22.6, with a couple of minor bugfixes to
get it to compile and not crash. (I also changed
__tcp_insert_write_queue_tail() to set the fack_count of the new packet
to the fack_count of the tail plus the packet count of the tail, not the
packet count of the new skb, because I think that's how it was intended
to be. Right?)
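(To put numbers on it: if the current tail has fack_count 10 and a
tcp_skb_pcount() of 4, the new tail should get fack_count 14 -- the number
of packets queued ahead of it -- not its own packet count.)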
In the second patch there are a couple of significant changes. One is
(as Baruch suggested) to modify the existing SACK fast path so that we
don't tag packets we've already tagged when we advance by a packet.
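Roughly, that tweak condenses to the following (an illustrative sketch
using the prev_end_seq variable introduced in patch 2; not the literal
hunk):

	/* Only the end of the first SACK block advanced since the last
	 * ACK, so everything below the previously seen end_seq is already
	 * tagged.  Resume the tag walk there instead of at start_seq. */
	if (prev_end_seq)
		start_seq = prev_end_seq;
	skb = tcp_write_queue_find(sk, start_seq); /* O(log n) rbtree lookup */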
The other issue is that the cached fack_counts seem to be wrong, because
they're set when we insert into the queue, but tcp_set_tso_segs() is
called later, just before we send, so all the fack_counts are zero. My
solution was to set the fack_count when we advance the send_head. Also I
changed tcp_reset_fack_counts() so that it exits when it hits an skb
whose tcp_skb_pcount() is zero or whose fack_count is already correct.
(This really helps when TSO is on, since there's lots of inserting into
the middle of the queue.)
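To summarize how the pieces fit together, per-SACK-block processing ends
up looking roughly like this (names as in the hunks below; assuming a
non-empty write queue):

	/* Jump straight to the first skb covering start_seq via the
	 * rbtree, then recover its fack count relative to the head of
	 * the write queue. */
	fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
	skb = tcp_write_queue_find(sk, start_seq);
	if (skb)
		fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;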
Please let me know how I can help get this tested and debugged. Reducing
the SACK processing load is really going to be essential for us to start
testing experimental TCP variants with large windows.
Thanks
-Tom
* [PATCH 1/2] David Miller's rbtree patches for 2.6.22.6
2007-09-20 1:39 [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6 Tom Quetchenbach
@ 2007-09-20 1:42 ` Tom Quetchenbach
2007-09-20 1:44 ` [PATCH 2/2] " Tom Quetchenbach
` (2 more replies)
2007-09-21 14:08 ` [PATCH 0/2] " Ilpo Järvinen
1 sibling, 3 replies; 7+ messages in thread
From: Tom Quetchenbach @ 2007-09-20 1:42 UTC (permalink / raw)
To: netdev
Patch 1: David Miller's red-black tree code, tweaked for 2.6.22.6,
with some bugfixes
-Tom
---
diff -ur linux-2.6.22.6/include/linux/skbuff.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h
--- linux-2.6.22.6/include/linux/skbuff.h 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h 2007-09-13 18:23:16.000000000 -0700
@@ -18,6 +18,7 @@
#include <linux/compiler.h>
#include <linux/time.h>
#include <linux/cache.h>
+#include <linux/rbtree.h>
#include <asm/atomic.h>
#include <asm/types.h>
@@ -242,6 +243,8 @@
struct sk_buff *next;
struct sk_buff *prev;
+ struct rb_node rb;
+
struct sock *sk;
ktime_t tstamp;
struct net_device *dev;
diff -ur linux-2.6.22.6/include/linux/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h
--- linux-2.6.22.6/include/linux/tcp.h 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h 2007-09-13 18:23:16.000000000 -0700
@@ -174,6 +174,7 @@
#include <linux/skbuff.h>
#include <linux/dmaengine.h>
+#include <linux/rbtree.h>
#include <net/sock.h>
#include <net/inet_connection_sock.h>
#include <net/inet_timewait_sock.h>
@@ -321,6 +322,7 @@
u32 snd_cwnd_used;
u32 snd_cwnd_stamp;
+ struct rb_root write_queue_rb;
struct sk_buff_head out_of_order_queue; /* Out of order segments go here */
u32 rcv_wnd; /* Current receiver window */
@@ -339,9 +341,7 @@
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
- struct sk_buff *fastpath_skb_hint;
- int fastpath_cnt_hint;
int lost_cnt_hint;
int retransmit_cnt_hint;
int forward_cnt_hint;
diff -ur linux-2.6.22.6/include/net/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h
--- linux-2.6.22.6/include/net/tcp.h 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h 2007-09-19 17:36:07.000000000 -0700
@@ -540,6 +540,7 @@
__u32 seq; /* Starting sequence number */
__u32 end_seq; /* SEQ + FIN + SYN + datalen */
__u32 when; /* used to compute rtt's */
+ unsigned int fack_count; /* speed up SACK processing */
__u8 flags; /* TCP header flags. */
/* NOTE: These must match up to the flags byte in a
@@ -1043,12 +1044,12 @@
}
/*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+static inline void clear_all_retrans_hints(struct tcp_sock *tp)
+{
tp->lost_skb_hint = NULL;
tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
tp->forward_skb_hint = NULL;
- tp->fastpath_skb_hint = NULL;
}
/* MD5 Signature */
@@ -1166,6 +1167,7 @@
while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
sk_stream_free_skb(sk, skb);
+ tcp_sk(sk)->write_queue_rb = RB_ROOT;
sk_stream_mem_reclaim(sk);
}
@@ -1208,7 +1210,7 @@
{
struct tcp_sock *tp = tcp_sk(sk);
- sk->sk_send_head = skb->next;
+ sk->sk_send_head = tcp_write_queue_next(sk, skb);
if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
sk->sk_send_head = NULL;
/* Don't override Nagle indefinately with F-RTO */
@@ -1227,9 +1229,61 @@
sk->sk_send_head = NULL;
}
+static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
+{
+ struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
+ struct sk_buff *skb = NULL;
+
+ while (rb_node) {
+ struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
+ if (TCP_SKB_CB(tmp)->end_seq > seq) {
+ skb = tmp;
+ if (TCP_SKB_CB(tmp)->seq <= seq)
+ break;
+ rb_node = rb_node->rb_left;
+ } else
+ rb_node = rb_node->rb_right;
+
+ }
+ return skb;
+}
+
+static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
+{
+ struct rb_node **rb_link, *rb_parent;
+ __u32 seq = TCP_SKB_CB(skb)->seq;
+
+ rb_link = &root->rb_node;
+ rb_parent = NULL;
+ while (*rb_link != NULL) {
+ struct sk_buff *tmp = rb_entry(*rb_link,struct sk_buff,rb);
+ rb_parent = *rb_link;
+ if (TCP_SKB_CB(tmp)->end_seq > seq) {
+ BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);
+ rb_link = &rb_parent->rb_left;
+ } else {
+ rb_link = &rb_parent->rb_right;
+ }
+ }
+ rb_link_node(&skb->rb, rb_parent, rb_link);
+ rb_insert_color(&skb->rb, root);
+}
+
+static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
+{
+ rb_erase(&skb->rb, root);
+}
+
static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
{
+ struct sk_buff *tail = tcp_write_queue_tail(sk);
+ unsigned int fc = 0;
+
+ if (tail)
+ fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(tail);
+ TCP_SKB_CB(skb)->fack_count = fc;
__skb_queue_tail(&sk->sk_write_queue, skb);
+ tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
}
static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
@@ -1241,9 +1295,35 @@
sk->sk_send_head = skb;
}
+/* This is only used for tcp_send_synack(), so the write queue should
+ * be empty. If that stops being true, the fack_count assignment
+ * will need to be more elaborate.
+ */
static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
{
+ BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
__skb_queue_head(&sk->sk_write_queue, skb);
+ TCP_SKB_CB(skb)->fack_count = 0;
+ tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
+}
+
+/* An insert into the middle of the write queue causes the fack
+ * counts in subsequent packets to become invalid, fix them up.
+ */
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+{
+ struct sk_buff *prev = first->prev;
+ unsigned int fc = 0;
+
+ if (prev != (struct sk_buff *) &sk->sk_write_queue)
+ fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+ while (first != (struct sk_buff *)&sk->sk_write_queue) {
+ TCP_SKB_CB(first)->fack_count = fc;
+
+ fc += tcp_skb_pcount(first);
+ first = first->next;
+ }
}
/* Insert buff after skb on the write queue of sk. */
@@ -1252,19 +1332,24 @@
struct sock *sk)
{
__skb_append(skb, buff, &sk->sk_write_queue);
+ tcp_reset_fack_counts(sk, buff);
+ tcp_rb_insert(buff, &tcp_sk(sk)->write_queue_rb);
}
-/* Insert skb between prev and next on the write queue of sk. */
+/* Insert new before skb on the write queue of sk. */
static inline void tcp_insert_write_queue_before(struct sk_buff *new,
struct sk_buff *skb,
struct sock *sk)
{
__skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
+ tcp_reset_fack_counts(sk, new);
+ tcp_rb_insert(new, &tcp_sk(sk)->write_queue_rb);
}
static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
{
__skb_unlink(skb, &sk->sk_write_queue);
+ tcp_rb_unlink(skb, &tcp_sk(sk)->write_queue_rb);
}
static inline int tcp_skb_is_last(const struct sock *sk,
diff -ur linux-2.6.22.6/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c
--- linux-2.6.22.6/net/ipv4/tcp_input.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c 2007-09-13 18:23:16.000000000 -0700
@@ -947,14 +947,13 @@
unsigned char *ptr = (skb_transport_header(ack_skb) +
TCP_SKB_CB(ack_skb)->sacked);
struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
- struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
u32 lost_retrans = 0;
int flag = 0;
int found_dup_sack = 0;
- int cached_fack_count;
+ int fack_count_base;
int i;
int first_sack_index;
@@ -1020,7 +1019,6 @@
num_sacks = 1;
else {
int j;
- tp->fastpath_skb_hint = NULL;
/* order SACK blocks to allow in order walk of the retrans queue */
for (i = num_sacks-1; i > 0; i--) {
@@ -1045,14 +1043,7 @@
/* clear flag as used for different purpose in following code */
flag = 0;
- /* Use SACK fastpath hint if valid */
- cached_skb = tp->fastpath_skb_hint;
- cached_fack_count = tp->fastpath_cnt_hint;
- if (!cached_skb) {
- cached_skb = tcp_write_queue_head(sk);
- cached_fack_count = 0;
- }
-
+ fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
for (i=0; i<num_sacks; i++, sp++) {
struct sk_buff *skb;
__u32 start_seq = ntohl(sp->start_seq);
@@ -1060,8 +1051,10 @@
int fack_count;
int dup_sack = (found_dup_sack && (i == first_sack_index));
- skb = cached_skb;
- fack_count = cached_fack_count;
+ skb = tcp_write_queue_find(sk, start_seq);
+ if (!skb)
+ continue;
+ fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;
/* Event "B" in the comment above. */
if (after(end_seq, tp->high_seq))
@@ -1074,13 +1067,6 @@
if (skb == tcp_send_head(sk))
break;
- cached_skb = skb;
- cached_fack_count = fack_count;
- if (i == first_sack_index) {
- tp->fastpath_skb_hint = skb;
- tp->fastpath_cnt_hint = fack_count;
- }
-
/* The retransmission queue is always in order, so
* we can short-circuit the walk early.
*/
diff -ur linux-2.6.22.6/net/ipv4/tcp_ipv4.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c
--- linux-2.6.22.6/net/ipv4/tcp_ipv4.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c 2007-09-13 18:23:16.000000000 -0700
@@ -1845,6 +1845,7 @@
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
+ tp->write_queue_rb = RB_ROOT;
skb_queue_head_init(&tp->out_of_order_queue);
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
diff -ur linux-2.6.22.6/net/ipv4/tcp_minisocks.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c
--- linux-2.6.22.6/net/ipv4/tcp_minisocks.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c 2007-09-13 18:23:16.000000000 -0700
@@ -421,6 +421,7 @@
tcp_set_ca_state(newsk, TCP_CA_Open);
tcp_init_xmit_timers(newsk);
+ newtp->write_queue_rb = RB_ROOT;
skb_queue_head_init(&newtp->out_of_order_queue);
newtp->write_seq = treq->snt_isn + 1;
newtp->pushed_seq = newtp->write_seq;
diff -ur linux-2.6.22.6/net/ipv4/tcp_output.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c
--- linux-2.6.22.6/net/ipv4/tcp_output.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c 2007-09-13 18:23:16.000000000 -0700
@@ -1278,11 +1278,11 @@
sk_charge_skb(sk, nskb);
skb = tcp_send_head(sk);
+ TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
+ TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
tcp_insert_write_queue_before(nskb, skb, sk);
tcp_advance_send_head(sk, skb);
- TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
- TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
TCP_SKB_CB(nskb)->flags = TCPCB_FLAG_ACK;
TCP_SKB_CB(nskb)->sacked = 0;
nskb->csum = 0;
diff -ur linux-2.6.22.6/net/ipv6/tcp_ipv6.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c
--- linux-2.6.22.6/net/ipv6/tcp_ipv6.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c 2007-09-13 18:23:16.000000000 -0700
@@ -1900,6 +1900,7 @@
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
+ tp->write_queue_rb = RB_ROOT;
skb_queue_head_init(&tp->out_of_order_queue);
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
* [PATCH 2/2] David Miller's rbtree patches for 2.6.22.6
2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
@ 2007-09-20 1:44 ` Tom Quetchenbach
2007-09-21 13:48 ` Ilpo Järvinen
2007-09-20 21:27 ` [PATCH 1/2 revised] " Tom Quetchenbach
2007-09-21 13:37 ` [PATCH 1/2] " Ilpo Järvinen
2 siblings, 1 reply; 7+ messages in thread
From: Tom Quetchenbach @ 2007-09-20 1:44 UTC (permalink / raw)
To: netdev
Patch 2: fixes to fack_counts and enhancement of SACK fast path
-Tom
---
diff -ur linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h linux-2.6.22.6-rbtree-tomq/include/net/tcp.h
--- linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h 2007-09-19 17:36:07.000000000 -0700
+++ linux-2.6.22.6-rbtree-tomq/include/net/tcp.h 2007-09-19 12:22:06.000000000 -0700
@@ -1213,6 +1213,11 @@
sk->sk_send_head = tcp_write_queue_next(sk, skb);
if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
sk->sk_send_head = NULL;
+ else
+ /* update fack_count of send_head. Since we've sent skb already,
+ * its packet count must be set by now. */
+ TCP_SKB_CB(sk->sk_send_head)->fack_count =
+ TCP_SKB_CB(skb)->fack_count + tcp_skb_pcount(skb);
/* Don't override Nagle indefinately with F-RTO */
if (tp->frto_counter == 2)
tp->frto_counter = 3;
@@ -1310,19 +1315,22 @@
/* An insert into the middle of the write queue causes the fack
* counts in subsequent packets to become invalid, fix them up.
*/
-static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *skb)
{
- struct sk_buff *prev = first->prev;
+ struct sk_buff *prev = skb->prev;
unsigned int fc = 0;
if (prev != (struct sk_buff *) &sk->sk_write_queue)
fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
- while (first != (struct sk_buff *)&sk->sk_write_queue) {
- TCP_SKB_CB(first)->fack_count = fc;
+ while (skb != (struct sk_buff *)&sk->sk_write_queue) {
+ if (TCP_SKB_CB(skb)->fack_count == fc || !tcp_skb_pcount(skb))
+ break;
- fc += tcp_skb_pcount(first);
- first = first->next;
+ TCP_SKB_CB(skb)->fack_count = fc;
+
+ fc += tcp_skb_pcount(skb);
+ skb = skb->next;
}
}
diff -ur linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c
--- linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c 2007-09-13 18:23:16.000000000 -0700
+++ linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c 2007-09-19 12:27:42.000000000 -0700
@@ -956,6 +956,7 @@
int fack_count_base;
int i;
int first_sack_index;
+ u32 prev_end_seq = 0;
if (!tp->sacked_out)
tp->fackets_out = 0;
@@ -1000,6 +1001,7 @@
if (i == 0) {
if (tp->recv_sack_cache[i].start_seq != start_seq)
flag = 0;
+ prev_end_seq = ntohl(tp->recv_sack_cache[i].end_seq);
} else {
if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
(tp->recv_sack_cache[i].end_seq != end_seq))
@@ -1016,9 +1018,16 @@
first_sack_index = 0;
if (flag)
+ /* all that has changed is end of first SACK block. So all we
+ * need to do is tag those skbs that weren't tagged last time. */
num_sacks = 1;
else {
int j;
+
+ /* more than just end of first SACK block has changed; invalidate
+ * prev_end_seq */
+
+ prev_end_seq = 0;
/* order SACK blocks to allow in order walk of the retrans queue */
for (i = num_sacks-1; i > 0; i--) {
@@ -1051,6 +1060,8 @@
int fack_count;
int dup_sack = (found_dup_sack && (i == first_sack_index));
+ if (prev_end_seq) start_seq = prev_end_seq;
+
skb = tcp_write_queue_find(sk, start_seq);
if (!skb)
continue;
* [PATCH 1/2 revised] David Miller's rbtree patches for 2.6.22.6
2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
2007-09-20 1:44 ` [PATCH 2/2] " Tom Quetchenbach
@ 2007-09-20 21:27 ` Tom Quetchenbach
2007-09-21 13:37 ` [PATCH 1/2] " Ilpo Järvinen
2 siblings, 0 replies; 7+ messages in thread
From: Tom Quetchenbach @ 2007-09-20 21:27 UTC (permalink / raw)
To: netdev
> Patch 1: David Miller's red-black tree code, tweaked for 2.6.22.6,
> with some bugfixes
oops...
The original patch can cause a kernel panic if tcp_sacktag_write_queue
is called when the write queue is empty. Use this one instead.
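The functional change relative to the earlier posting is a guard on the
fack_count_base read in tcp_sacktag_write_queue() (see the tcp_input.c hunk
below):

	if (likely(tcp_write_queue_head(sk)))
		fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;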
-Tom
---
diff -ur linux-2.6.22.6/include/linux/skbuff.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h
--- linux-2.6.22.6/include/linux/skbuff.h 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/skbuff.h 2007-09-13 18:23:16.000000000 -0700
@@ -18,6 +18,7 @@
#include <linux/compiler.h>
#include <linux/time.h>
#include <linux/cache.h>
+#include <linux/rbtree.h>
#include <asm/atomic.h>
#include <asm/types.h>
@@ -242,6 +243,8 @@
struct sk_buff *next;
struct sk_buff *prev;
+ struct rb_node rb;
+
struct sock *sk;
ktime_t tstamp;
struct net_device *dev;
diff -ur linux-2.6.22.6/include/linux/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h
--- linux-2.6.22.6/include/linux/tcp.h 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/linux/tcp.h 2007-09-13 18:23:16.000000000 -0700
@@ -174,6 +174,7 @@
#include <linux/skbuff.h>
#include <linux/dmaengine.h>
+#include <linux/rbtree.h>
#include <net/sock.h>
#include <net/inet_connection_sock.h>
#include <net/inet_timewait_sock.h>
@@ -321,6 +322,7 @@
u32 snd_cwnd_used;
u32 snd_cwnd_stamp;
+ struct rb_root write_queue_rb;
struct sk_buff_head out_of_order_queue; /* Out of order segments go here */
u32 rcv_wnd; /* Current receiver window */
@@ -339,9 +341,7 @@
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
- struct sk_buff *fastpath_skb_hint;
- int fastpath_cnt_hint;
int lost_cnt_hint;
int retransmit_cnt_hint;
int forward_cnt_hint;
diff -ur linux-2.6.22.6/include/net/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h
--- linux-2.6.22.6/include/net/tcp.h 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h 2007-09-19 17:36:07.000000000 -0700
@@ -540,6 +540,7 @@
__u32 seq; /* Starting sequence number */
__u32 end_seq; /* SEQ + FIN + SYN + datalen */
__u32 when; /* used to compute rtt's */
+ unsigned int fack_count; /* speed up SACK processing */
__u8 flags; /* TCP header flags. */
/* NOTE: These must match up to the flags byte in a
@@ -1043,12 +1044,12 @@
}
/*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+static inline void clear_all_retrans_hints(struct tcp_sock *tp)
+{
tp->lost_skb_hint = NULL;
tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
tp->forward_skb_hint = NULL;
- tp->fastpath_skb_hint = NULL;
}
/* MD5 Signature */
@@ -1166,6 +1167,7 @@
while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
sk_stream_free_skb(sk, skb);
+ tcp_sk(sk)->write_queue_rb = RB_ROOT;
sk_stream_mem_reclaim(sk);
}
@@ -1208,7 +1210,7 @@
{
struct tcp_sock *tp = tcp_sk(sk);
- sk->sk_send_head = skb->next;
+ sk->sk_send_head = tcp_write_queue_next(sk, skb);
if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
sk->sk_send_head = NULL;
/* Don't override Nagle indefinately with F-RTO */
@@ -1227,9 +1229,61 @@
sk->sk_send_head = NULL;
}
+static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
+{
+ struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
+ struct sk_buff *skb = NULL;
+
+ while (rb_node) {
+ struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
+ if (TCP_SKB_CB(tmp)->end_seq > seq) {
+ skb = tmp;
+ if (TCP_SKB_CB(tmp)->seq <= seq)
+ break;
+ rb_node = rb_node->rb_left;
+ } else
+ rb_node = rb_node->rb_right;
+
+ }
+ return skb;
+}
+
+static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
+{
+ struct rb_node **rb_link, *rb_parent;
+ __u32 seq = TCP_SKB_CB(skb)->seq;
+
+ rb_link = &root->rb_node;
+ rb_parent = NULL;
+ while (*rb_link != NULL) {
+ struct sk_buff *tmp = rb_entry(*rb_link,struct sk_buff,rb);
+ rb_parent = *rb_link;
+ if (TCP_SKB_CB(tmp)->end_seq > seq) {
+ BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);
+ rb_link = &rb_parent->rb_left;
+ } else {
+ rb_link = &rb_parent->rb_right;
+ }
+ }
+ rb_link_node(&skb->rb, rb_parent, rb_link);
+ rb_insert_color(&skb->rb, root);
+}
+
+static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
+{
+ rb_erase(&skb->rb, root);
+}
+
static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
{
+ struct sk_buff *tail = tcp_write_queue_tail(sk);
+ unsigned int fc = 0;
+
+ if (tail)
+ fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(tail);
+ TCP_SKB_CB(skb)->fack_count = fc;
__skb_queue_tail(&sk->sk_write_queue, skb);
+ tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
}
static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
@@ -1241,9 +1295,35 @@
sk->sk_send_head = skb;
}
+/* This is only used for tcp_send_synack(), so the write queue should
+ * be empty. If that stops being true, the fack_count assignment
+ * will need to be more elaborate.
+ */
static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
{
+ BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
__skb_queue_head(&sk->sk_write_queue, skb);
+ TCP_SKB_CB(skb)->fack_count = 0;
+ tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
+}
+
+/* An insert into the middle of the write queue causes the fack
+ * counts in subsequent packets to become invalid, fix them up.
+ */
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+{
+ struct sk_buff *prev = first->prev;
+ unsigned int fc = 0;
+
+ if (prev != (struct sk_buff *) &sk->sk_write_queue)
+ fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+ while (first != (struct sk_buff *)&sk->sk_write_queue) {
+ TCP_SKB_CB(first)->fack_count = fc;
+
+ fc += tcp_skb_pcount(first);
+ first = first->next;
+ }
}
/* Insert buff after skb on the write queue of sk. */
@@ -1252,19 +1332,24 @@
struct sock *sk)
{
__skb_append(skb, buff, &sk->sk_write_queue);
+ tcp_reset_fack_counts(sk, buff);
+ tcp_rb_insert(buff, &tcp_sk(sk)->write_queue_rb);
}
-/* Insert skb between prev and next on the write queue of sk. */
+/* Insert new before skb on the write queue of sk. */
static inline void tcp_insert_write_queue_before(struct sk_buff *new,
struct sk_buff *skb,
struct sock *sk)
{
__skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
+ tcp_reset_fack_counts(sk, new);
+ tcp_rb_insert(new, &tcp_sk(sk)->write_queue_rb);
}
static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
{
__skb_unlink(skb, &sk->sk_write_queue);
+ tcp_rb_unlink(skb, &tcp_sk(sk)->write_queue_rb);
}
static inline int tcp_skb_is_last(const struct sock *sk,
diff -ur linux-2.6.22.6/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c
--- linux-2.6.22.6/net/ipv4/tcp_input.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c 2007-09-20 14:10:26.000000000 -0700
@@ -947,14 +947,13 @@
unsigned char *ptr = (skb_transport_header(ack_skb) +
TCP_SKB_CB(ack_skb)->sacked);
struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
- struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
u32 lost_retrans = 0;
int flag = 0;
int found_dup_sack = 0;
- int cached_fack_count;
+ int fack_count_base;
int i;
int first_sack_index;
@@ -1020,7 +1019,6 @@
num_sacks = 1;
else {
int j;
- tp->fastpath_skb_hint = NULL;
/* order SACK blocks to allow in order walk of the retrans queue */
for (i = num_sacks-1; i > 0; i--) {
@@ -1045,13 +1043,8 @@
/* clear flag as used for different purpose in following code */
flag = 0;
- /* Use SACK fastpath hint if valid */
- cached_skb = tp->fastpath_skb_hint;
- cached_fack_count = tp->fastpath_cnt_hint;
- if (!cached_skb) {
- cached_skb = tcp_write_queue_head(sk);
- cached_fack_count = 0;
- }
+ if (likely(tcp_write_queue_head(sk)))
+ fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
for (i=0; i<num_sacks; i++, sp++) {
struct sk_buff *skb;
@@ -1060,8 +1053,10 @@
int fack_count;
int dup_sack = (found_dup_sack && (i == first_sack_index));
- skb = cached_skb;
- fack_count = cached_fack_count;
+ skb = tcp_write_queue_find(sk, start_seq);
+ if (!skb)
+ continue;
+ fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;
/* Event "B" in the comment above. */
if (after(end_seq, tp->high_seq))
@@ -1074,13 +1069,6 @@
if (skb == tcp_send_head(sk))
break;
- cached_skb = skb;
- cached_fack_count = fack_count;
- if (i == first_sack_index) {
- tp->fastpath_skb_hint = skb;
- tp->fastpath_cnt_hint = fack_count;
- }
-
/* The retransmission queue is always in order, so
* we can short-circuit the walk early.
*/
diff -ur linux-2.6.22.6/net/ipv4/tcp_ipv4.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c
--- linux-2.6.22.6/net/ipv4/tcp_ipv4.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_ipv4.c 2007-09-13 18:23:16.000000000 -0700
@@ -1845,6 +1845,7 @@
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
+ tp->write_queue_rb = RB_ROOT;
skb_queue_head_init(&tp->out_of_order_queue);
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
diff -ur linux-2.6.22.6/net/ipv4/tcp_minisocks.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c
--- linux-2.6.22.6/net/ipv4/tcp_minisocks.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_minisocks.c 2007-09-13 18:23:16.000000000 -0700
@@ -421,6 +421,7 @@
tcp_set_ca_state(newsk, TCP_CA_Open);
tcp_init_xmit_timers(newsk);
+ newtp->write_queue_rb = RB_ROOT;
skb_queue_head_init(&newtp->out_of_order_queue);
newtp->write_seq = treq->snt_isn + 1;
newtp->pushed_seq = newtp->write_seq;
diff -ur linux-2.6.22.6/net/ipv4/tcp_output.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c
--- linux-2.6.22.6/net/ipv4/tcp_output.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_output.c 2007-09-13 18:23:16.000000000 -0700
@@ -1278,11 +1278,11 @@
sk_charge_skb(sk, nskb);
skb = tcp_send_head(sk);
+ TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
+ TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
tcp_insert_write_queue_before(nskb, skb, sk);
tcp_advance_send_head(sk, skb);
- TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
- TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
TCP_SKB_CB(nskb)->flags = TCPCB_FLAG_ACK;
TCP_SKB_CB(nskb)->sacked = 0;
nskb->csum = 0;
diff -ur linux-2.6.22.6/net/ipv6/tcp_ipv6.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c
--- linux-2.6.22.6/net/ipv6/tcp_ipv6.c 2007-08-30 23:21:01.000000000 -0700
+++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv6/tcp_ipv6.c 2007-09-13 18:23:16.000000000 -0700
@@ -1900,6 +1900,7 @@
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
+ tp->write_queue_rb = RB_ROOT;
skb_queue_head_init(&tp->out_of_order_queue);
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
* Re: [PATCH 1/2] David Miller's rbtree patches for 2.6.22.6
2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
2007-09-20 1:44 ` [PATCH 2/2] " Tom Quetchenbach
2007-09-20 21:27 ` [PATCH 1/2 revised] " Tom Quetchenbach
@ 2007-09-21 13:37 ` Ilpo Järvinen
2 siblings, 0 replies; 7+ messages in thread
From: Ilpo Järvinen @ 2007-09-21 13:37 UTC (permalink / raw)
To: Tom Quetchenbach; +Cc: netdev
On Wed, 19 Sep 2007, Tom Quetchenbach wrote:
> Patch 1: David Miller's red-black tree code, tweaked for 2.6.22.6,
> with some bugfixes
It would help if you would leave the original changes as is (rb-tree and
fack_count separated) and add your work on top of that...
> diff -ur linux-2.6.22.6/include/net/tcp.h linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h
> --- linux-2.6.22.6/include/net/tcp.h 2007-08-30 23:21:01.000000000 -0700
> +++ linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h 2007-09-19 17:36:07.000000000 -0700
> @@ -540,6 +540,7 @@
> __u32 seq; /* Starting sequence number */
> __u32 end_seq; /* SEQ + FIN + SYN + datalen */
> __u32 when; /* used to compute rtt's */
> + unsigned int fack_count; /* speed up SACK processing */
> __u8 flags; /* TCP header flags. */
>
> /* NOTE: These must match up to the flags byte in a
> @@ -1043,12 +1044,12 @@
> }
>
> /*from STCP */
> -static inline void clear_all_retrans_hints(struct tcp_sock *tp){
> +static inline void clear_all_retrans_hints(struct tcp_sock *tp)
> +{
Unrelated change, please don't do that. Besides, it's already fixed in
net-2.6.24.
> tp->lost_skb_hint = NULL;
> tp->scoreboard_skb_hint = NULL;
> tp->retransmit_skb_hint = NULL;
> tp->forward_skb_hint = NULL;
> - tp->fastpath_skb_hint = NULL;
> }
>
> /* MD5 Signature */
> @@ -1227,9 +1229,61 @@
> sk->sk_send_head = NULL;
> }
>
> +static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
> +{
> + struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
> + struct sk_buff *skb = NULL;
> +
> + while (rb_node) {
> + struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
> + if (TCP_SKB_CB(tmp)->end_seq > seq) {
This is an old and buggy version of the rb-tree code. Get the latest
rb-tree patch from the tcp-2.6 tree.
> + skb = tmp;
> + if (TCP_SKB_CB(tmp)->seq <= seq)
...fixed in tcp-2.6.
> + break;
> + rb_node = rb_node->rb_left;
> + } else
> + rb_node = rb_node->rb_right;
> +
> + }
> + return skb;
> +}
> +
> +static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
> +{
> + struct rb_node **rb_link, *rb_parent;
> + __u32 seq = TCP_SKB_CB(skb)->seq;
> +
> + rb_link = &root->rb_node;
> + rb_parent = NULL;
> + while (*rb_link != NULL) {
> + struct sk_buff *tmp = rb_entry(*rb_link,struct sk_buff,rb);
> + rb_parent = *rb_link;
> + if (TCP_SKB_CB(tmp)->end_seq > seq) {
> + BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);
...these are broken as well.
>
> + rb_link = &rb_parent->rb_left;
> + } else {
> + rb_link = &rb_parent->rb_right;
> + }
> + }
> + rb_link_node(&skb->rb, rb_parent, rb_link);
> + rb_insert_color(&skb->rb, root);
> +}
> +
> +static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
> +{
> + rb_erase(&skb->rb, root);
> +}
> +
> static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
> {
> + struct sk_buff *tail = tcp_write_queue_tail(sk);
> + unsigned int fc = 0;
> +
> + if (tail)
> + fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(tail);
> + TCP_SKB_CB(skb)->fack_count = fc;
> __skb_queue_tail(&sk->sk_write_queue, skb);
> + tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
> }
>
> static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
> diff -ur linux-2.6.22.6/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c
> --- linux-2.6.22.6/net/ipv4/tcp_input.c 2007-08-30 23:21:01.000000000 -0700
> +++ linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c 2007-09-13 18:23:16.000000000 -0700
> @@ -947,14 +947,13 @@
> unsigned char *ptr = (skb_transport_header(ack_skb) +
> TCP_SKB_CB(ack_skb)->sacked);
> struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
> - struct sk_buff *cached_skb;
> int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
> int reord = tp->packets_out;
> int prior_fackets;
> u32 lost_retrans = 0;
> int flag = 0;
> int found_dup_sack = 0;
> - int cached_fack_count;
> + int fack_count_base;
> int i;
> int first_sack_index;
>
> @@ -1020,7 +1019,6 @@
> num_sacks = 1;
> else {
> int j;
> - tp->fastpath_skb_hint = NULL;
>
> /* order SACK blocks to allow in order walk of the retrans queue */
> for (i = num_sacks-1; i > 0; i--) {
> @@ -1045,14 +1043,7 @@
> /* clear flag as used for different purpose in following code */
> flag = 0;
>
> - /* Use SACK fastpath hint if valid */
> - cached_skb = tp->fastpath_skb_hint;
> - cached_fack_count = tp->fastpath_cnt_hint;
> - if (!cached_skb) {
> - cached_skb = tcp_write_queue_head(sk);
> - cached_fack_count = 0;
> - }
> -
> + fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
> for (i=0; i<num_sacks; i++, sp++) {
> struct sk_buff *skb;
> __u32 start_seq = ntohl(sp->start_seq);
> @@ -1060,8 +1051,10 @@
> int fack_count;
> int dup_sack = (found_dup_sack && (i == first_sack_index));
>
> - skb = cached_skb;
> - fack_count = cached_fack_count;
> + skb = tcp_write_queue_find(sk, start_seq);
> + if (!skb)
> + continue;
In net-2.6.24 we validate SACK blocks early. ...This is not a working
solution anyway, since tcp_write_queue_find(end_seq) might be valid, don't
you think (though with the validator it should only happen when dup_sack is
set)? For non-DSACK cases, a better alternative is something like this:
if (WARN_ON(skb == NULL))
continue;
> + fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;
>
> /* Event "B" in the comment above. */
> if (after(end_seq, tp->high_seq))
--
i.
* Re: [PATCH 2/2] David Miller's rbtree patches for 2.6.22.6
2007-09-20 1:44 ` [PATCH 2/2] " Tom Quetchenbach
@ 2007-09-21 13:48 ` Ilpo Järvinen
0 siblings, 0 replies; 7+ messages in thread
From: Ilpo Järvinen @ 2007-09-21 13:48 UTC (permalink / raw)
To: Tom Quetchenbach; +Cc: Netdev
On Wed, 19 Sep 2007, Tom Quetchenbach wrote:
> Patch 2: fixes to fack_counts and enhancement of SACK fast path
...Usually these are not combined into one patch; rather, there's a
separate patch per change.
> diff -ur linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h linux-2.6.22.6-rbtree-tomq/include/net/tcp.h
> --- linux-2.6.22.6-rbtree-davem-fixed/include/net/tcp.h 2007-09-19 17:36:07.000000000 -0700
> +++ linux-2.6.22.6-rbtree-tomq/include/net/tcp.h 2007-09-19 12:22:06.000000000 -0700
> @@ -1213,6 +1213,11 @@
...Btw, please always use -p with diff as well, since it helps review :-).
> sk->sk_send_head = tcp_write_queue_next(sk, skb);
> if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
> sk->sk_send_head = NULL;
> + else
> + /* update fack_count of send_head. Since we've sent skb already,
> + * its packet count must be set by now. */
> + TCP_SKB_CB(sk->sk_send_head)->fack_count =
> + TCP_SKB_CB(skb)->fack_count + tcp_skb_pcount(skb);
>
> /* Don't override Nagle indefinately with F-RTO */
> if (tp->frto_counter == 2)
> tp->frto_counter = 3;
> @@ -1310,19 +1315,22 @@
> /* An insert into the middle of the write queue causes the fack
> * counts in subsequent packets to become invalid, fix them up.
> */
> -static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
> +static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *skb)
> {
> - struct sk_buff *prev = first->prev;
> + struct sk_buff *prev = skb->prev;
> unsigned int fc = 0;
>
> if (prev != (struct sk_buff *) &sk->sk_write_queue)
> fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
>
> - while (first != (struct sk_buff *)&sk->sk_write_queue) {
> - TCP_SKB_CB(first)->fack_count = fc;
> + while (skb != (struct sk_buff *)&sk->sk_write_queue) {
> + if (TCP_SKB_CB(skb)->fack_count == fc || !tcp_skb_pcount(skb))
What is this !tcp_skb_pcount(skb) for???? I think what we really want here
is this: || (skb == tcp_send_head(sk))
> + break;
>
> - fc += tcp_skb_pcount(first);
> - first = first->next;
> + TCP_SKB_CB(skb)->fack_count = fc;
> +
> + fc += tcp_skb_pcount(skb);
> + skb = skb->next;
> }
> }
>
> diff -ur linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c
> --- linux-2.6.22.6-rbtree-davem-fixed/net/ipv4/tcp_input.c 2007-09-13 18:23:16.000000000 -0700
> +++ linux-2.6.22.6-rbtree-tomq/net/ipv4/tcp_input.c 2007-09-19 12:27:42.000000000 -0700
> @@ -956,6 +956,7 @@
> int fack_count_base;
> int i;
> int first_sack_index;
> + u32 prev_end_seq = 0;
>
> if (!tp->sacked_out)
> tp->fackets_out = 0;
> @@ -1000,6 +1001,7 @@
> if (i == 0) {
> if (tp->recv_sack_cache[i].start_seq != start_seq)
> flag = 0;
> + prev_end_seq = ntohl(tp->recv_sack_cache[i].end_seq);
> } else {
> if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
> (tp->recv_sack_cache[i].end_seq != end_seq))
> @@ -1016,9 +1018,16 @@
>
> first_sack_index = 0;
> if (flag)
> + /* all that has changed is end of first SACK block. So all we
> + * need to do is tag those skbs that weren't tagged last time. */
IMHO, a bit verbose for a comment. Besides, we already explain above what
the SACK fastpath is all about...
> num_sacks = 1;
> else {
> int j;
> +
> + /* more than just end of first SACK block has changed; invalidate
> + * prev_end_seq */
> +
> + prev_end_seq = 0;
Don't state what's obvious from the code, either.
> /* order SACK blocks to allow in order walk of the retrans queue */
> for (i = num_sacks-1; i > 0; i--) {
> @@ -1051,6 +1060,8 @@
> int fack_count;
> int dup_sack = (found_dup_sack && (i == first_sack_index));
>
> + if (prev_end_seq) start_seq = prev_end_seq;
> +
...We never add statements on the same line after if statements (see
Documentation/CodingStyle).
--
i.
* Re: [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6
2007-09-20 1:39 [PATCH 0/2] David Miller's rbtree patches for 2.6.22.6 Tom Quetchenbach
2007-09-20 1:42 ` [PATCH 1/2] " Tom Quetchenbach
@ 2007-09-21 14:08 ` Ilpo Järvinen
1 sibling, 0 replies; 7+ messages in thread
From: Ilpo Järvinen @ 2007-09-21 14:08 UTC (permalink / raw)
To: Tom Quetchenbach; +Cc: Netdev
On Wed, 19 Sep 2007, Tom Quetchenbach wrote:
> Here are a couple of patches against 2.6.22.6. The first one is just
> David's patches tweaked for 2.6.22.6, with a couple of minor bugfixes to
> get it to compile and not crash.
Why did you combine the original patches into a single larger one? I think
Dave made them separate on purpose.
> (I also changed
> __tcp_insert_write_queue_tail() to set the fack_count of the new packet
> to the fack_count of the tail plus the packet count of the tail, not the
> packet count of the new skb, because I think that's how it was intended
> to be. Right?
I think I noticed a similar "off-by-pcount" error when I looked at this a
long time ago, so I guess you're correct. We're only interested in the
delta of it anyway, and we add the current skb's pcount to it (which is
not fixed until tcp_fragment in sacktag is past).
> In the second patch there are a couple of significant changes. One is
> (as Baruch suggested) to modify the existing SACK fast path so that we
> don't tag packets we've already tagged when we advance by a packet.
This solution would still spend an extensive amount of time in the
processing loop whenever the recv_sack_cache fast path is not taken, that
is, e.g. when a cumulative ACK arrives after retransmissions or a new hole
becomes visible (which are not very exceptional events after all :-)). In
the cumulative ACK case especially, this processing is very likely _fully_
wasted walking.
So there is still room for large improvements. I made an improved version
of the current sacktag walk a couple of days ago (it's in a state where it
didn't crash, but it's likely still very buggy); I'll post it here soon...
The idea is to embed the recv_sack_cache checking fully into the walking
loop. By doing that, previously known work is not duplicated. The patch is
currently against non-rbtree stuff, but incorporating the rb-tree things
on top of it should be very trivial, and the synergy benefits with the
rbtree should be considerable, because the non-rbtree code has to do "fast
walk" skipping for skbs that are below highest_sack, which is prone to
cache misses.
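In sketch form, the walk would be shaped something like this (note that
sack_cache_lookup() is only an illustrative placeholder for the cache
check, not an existing function):

	skb = tcp_write_queue_head(sk);
	while (skb && skb != tcp_send_head(sk)) {
		const struct tcp_sack_block *c =
			sack_cache_lookup(tp, TCP_SKB_CB(skb)->seq);

		if (c) {
			/* Range was tagged by an earlier ACK: skip past
			 * it in O(log n) instead of re-walking it. */
			skb = tcp_write_queue_find(sk, c->end_seq);
			continue;
		}
		/* ...otherwise tag the skb exactly as the current loop
		 * does... */
		skb = tcp_write_queue_next(sk, skb);
	}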
> The other issue is that the cached fack_counts seem to be wrong, because
> they're set when we insert into the queue, but tcp_set_tso_segs() is
> called later, just before we send, so all the fack_counts are zero. My
> solution was to set the fack_count when we advance the send_head.
I think it's a better solution anyway, since we might have to do
reset_fack_counts() in between and there's no need to update past
sk_send_head.
> Also I
> changed tcp_reset_fack_counts() so that it exits when it hits an skb
> whose tcp_skb_pcount() is zero
Would you mind explaining what the purpose of that is?
> or whose fack_count is already correct.
> (This really helps when TSO is on, since there's lots of inserting into
> the middle of the queue.)
Good point.
> Please let me know how I can help get this tested and debugged.
Most network development happens against the latest net-2.6(.x) trees.
In addition, there's the experimental tcp-2.6 tree, but it's currently a
bit outdated already (and DaveM is very busy with the phenomenal merge
they're doing for 2.6.24 :-), so it's not too likely to be updated very
soon).
...Anyway, thanks for your interest in these things.
--
i.