[PATCH net-2.6.24 0/9]: TCP improvements & cleanups

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-2.6.24 0/9]: TCP improvements & cleanups
@ 2007-09-20 12:17 Ilpo Järvinen
  2007-09-20 12:17 ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb Ilpo Järvinen
  0 siblings, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Hi Dave,

Just in case you're short on what to do ;-) here are some TCP
related cleanups & improvements to net-2.6.24. Including FRTO
undo fix which finally should allow FRTO to be turned on, and
some simple fastpath tweaks simple enough to the 2.6.24
schedule. ...I've a larger fastpath_hint removal patch coming
up later too but it's really a monster which needs more time
though I guess it could really cut down the SACK processing
latencies people are experience with high-speed flows (I'll
probably post it with RFC once you've picked these up).

These were boot (& couple of hours) tested on the top of
net-2.6.24 (something after the first "large" rebase you did,
so you could count that as success report of it too :-)).
Not sure if all those fragment/collapse paths I modified got
executed though.

--
 i.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb
  2007-09-20 12:17 [PATCH net-2.6.24 0/9]: TCP improvements & cleanups Ilpo Järvinen
@ 2007-09-20 12:17 ` Ilpo Järvinen
  2007-09-20 12:17   ` [PATCH 2/9] [TCP]: Make fackets_out accurate Ilpo Järvinen
  2007-09-20 18:29   ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb David Miller
  0 siblings, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

In general, it should not be necessary to call tcp_fragment for
already SACKed skbs, but it's better to be safe than sorry. And
indeed, it can be called from sacktag when a DSACK arrives or
some ACK (with SACK) reordering occurs (sacktag could be made
to avoid the call in the latter case though I'm not sure if it's
worth of the trouble and added complexity to cover such marginal
case).

The collapse case has return for SACKED_ACKED case earlier, so
just WARN_ON if internal inconsistency is detected for some
reason.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d65d17b..9df5b2a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -692,6 +692,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	TCP_SKB_CB(buff)->end_seq = TCP_SKB_CB(skb)->end_seq;
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(buff)->seq;
 
+	if (tp->sacked_out && (TCP_SKB_CB(skb)->seq == tp->highest_sack))
+		tp->highest_sack = TCP_SKB_CB(buff)->seq;
+
 	/* PSH and FIN should only be set in the second packet. */
 	flags = TCP_SKB_CB(skb)->flags;
 	TCP_SKB_CB(skb)->flags = flags & ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH);
@@ -1723,6 +1726,10 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		/* Update sequence range on original skb. */
 		TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
 
+		if (WARN_ON(tp->sacked_out &&
+		    (TCP_SKB_CB(next_skb)->seq == tp->highest_sack)))
+			return;
+
 		/* Merge over control information. */
 		flags |= TCP_SKB_CB(next_skb)->flags; /* This moves PSH/FIN etc. over */
 		TCP_SKB_CB(skb)->flags = flags;
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/9] [TCP]: Make fackets_out accurate
  2007-09-20 12:17 ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb Ilpo Järvinen
@ 2007-09-20 12:17   ` Ilpo Järvinen
  2007-09-20 12:17     ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ Ilpo Järvinen
  2007-09-20 18:30     ` [PATCH 2/9] [TCP]: Make fackets_out accurate David Miller
  2007-09-20 18:29   ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Substraction for fackets_out is unconditional when snd_una
advances, thus there's no need to do it inside the loop. Just
make sure correct bounds are honored.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c  |   10 +++-------
 net/ipv4/tcp_output.c |   44 ++++++++++++++++++++++++++------------------
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fd0ae4d..09b6b1d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2302,8 +2302,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	 * 1. Reno does not count dupacks (sacked_out) automatically. */
 	if (!tp->packets_out)
 		tp->sacked_out = 0;
-	/* 2. SACK counts snd_fack in packets inaccurately. */
-	if (tp->sacked_out == 0)
+
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
 		tp->fackets_out = 0;
 
 	/* Now state machine starts.
@@ -2571,10 +2571,6 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
 		} else if (*seq_rtt < 0)
 			*seq_rtt = now - scb->when;
 
-		if (tp->fackets_out) {
-			__u32 dval = min(tp->fackets_out, packets_acked);
-			tp->fackets_out -= dval;
-		}
 		tp->packets_out -= packets_acked;
 
 		BUG_ON(tcp_skb_pcount(skb) == 0);
@@ -2657,7 +2653,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 			seq_rtt = now - scb->when;
 			last_ackt = skb->tstamp;
 		}
-		tcp_dec_pcount_approx(&tp->fackets_out, skb);
 		tp->packets_out -= tcp_skb_pcount(skb);
 		tcp_unlink_write_queue(skb, sk);
 		sk_stream_free_skb(sk, skb);
@@ -2672,6 +2667,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		tcp_ack_update_rtt(sk, acked, seq_rtt);
 		tcp_rearm_rto(sk);
 
+		tp->fackets_out -= min(pkts_acked, tp->fackets_out);
 		if (tcp_is_reno(tp))
 			tcp_remove_reno_sacks(sk, pkts_acked);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9df5b2a..cbe8bf6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -652,6 +652,26 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb, unsigned
 	}
 }
 
+/* When a modification to fackets out becomes necessary, we need to check
+ * skb is counted to fackets_out or not. Another important thing is to
+ * tweak SACK fastpath hint too as it would overwrite all changes unless
+ * hint is also changed.
+ */
+static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb,
+				   int decr)
+{
+	if (!tp->sacked_out)
+		return;
+
+	if (!before(tp->highest_sack, TCP_SKB_CB(skb)->seq))
+		tp->fackets_out -= decr;
+
+	/* cnt_hint is "off-by-one" compared with fackets_out (see sacktag) */
+	if (tp->fastpath_skb_hint != NULL &&
+	    after(TCP_SKB_CB(tp->fastpath_skb_hint)->seq, TCP_SKB_CB(skb)->seq))
+		tp->fastpath_cnt_hint -= decr;
+}
+
 /* Function to create two new TCP segments.  Shrinks the given segment
  * to the specified size and appends a new segment with the rest of the
  * packet to the list.  This won't be called frequently, I hope.
@@ -746,21 +766,12 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
 			tp->lost_out -= diff;
 
-		if (diff > 0) {
-			/* Adjust Reno SACK estimate. */
-			if (tcp_is_reno(tp)) {
-				tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-				tcp_verify_left_out(tp);
-			}
-
-			tcp_dec_pcount_approx_int(&tp->fackets_out, diff);
-			/* SACK fastpath might overwrite it unless dealt with */
-			if (tp->fastpath_skb_hint != NULL &&
-			    after(TCP_SKB_CB(tp->fastpath_skb_hint)->seq,
-				  TCP_SKB_CB(skb)->seq)) {
-				tcp_dec_pcount_approx_int(&tp->fastpath_cnt_hint, diff);
-			}
+		/* Adjust Reno SACK estimate. */
+		if (tcp_is_reno(tp) && diff > 0) {
+			tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
+			tcp_verify_left_out(tp);
 		}
+		tcp_adjust_fackets_out(tp, skb, diff);
 	}
 
 	/* Link BUFF into the send queue. */
@@ -1746,10 +1757,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		if (tcp_is_reno(tp) && tp->sacked_out)
 			tcp_dec_pcount_approx(&tp->sacked_out, next_skb);
 
-		/* Not quite right: it can be > snd.fack, but
-		 * it is better to underestimate fackets.
-		 */
-		tcp_dec_pcount_approx(&tp->fackets_out, next_skb);
+		tcp_adjust_fackets_out(tp, skb, tcp_skb_pcount(next_skb));
 		tp->packets_out -= tcp_skb_pcount(next_skb);
 		sk_stream_free_skb(sk, next_skb);
 	}
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_
  2007-09-20 12:17   ` [PATCH 2/9] [TCP]: Make fackets_out accurate Ilpo Järvinen
@ 2007-09-20 12:17     ` Ilpo Järvinen
  2007-09-20 12:17       ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue Ilpo Järvinen
  2007-09-20 18:31       ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ David Miller
  2007-09-20 18:30     ` [PATCH 2/9] [TCP]: Make fackets_out accurate David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

In addition, fix its function comment spacing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h     |    4 ++--
 net/ipv4/tcp_input.c  |   10 +++++-----
 net/ipv4/tcp_output.c |    6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f28f382..16dfe3c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1066,8 +1066,8 @@ static inline void tcp_mib_init(void)
 	TCP_ADD_STATS_USER(TCP_MIB_MAXCONN, -1);
 }
 
-/*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+/* from STCP */
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 09b6b1d..89162a9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1670,7 +1670,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tp->high_seq = tp->frto_highmark;
 	TCP_ECN_queue_cwr(tp);
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -1741,7 +1741,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 	/* Abort FRTO algorithm if one is in progress */
 	tp->frto_counter = 0;
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 }
 
 static int tcp_check_sack_reneging(struct sock *sk)
@@ -2106,7 +2106,7 @@ static void tcp_undo_cwr(struct sock *sk, const int undo)
 
 	/* There is something screwy going on with the retrans hints after
 	   an undo */
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 }
 
 static inline int tcp_may_undo(struct tcp_sock *tp)
@@ -2199,7 +2199,7 @@ static int tcp_try_undo_loss(struct sock *sk)
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
 		}
 
-		clear_all_retrans_hints(tp);
+		tcp_clear_all_retrans_hints(tp);
 
 		DBGUNDO(sk, "partial loss");
 		tp->lost_out = 0;
@@ -2656,7 +2656,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		tp->packets_out -= tcp_skb_pcount(skb);
 		tcp_unlink_write_queue(skb, sk);
 		sk_stream_free_skb(sk, skb);
-		clear_all_retrans_hints(tp);
+		tcp_clear_all_retrans_hints(tp);
 	}
 
 	if (acked&FLAG_ACKED) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cbe8bf6..f46d24b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -687,7 +687,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 
 	BUG_ON(len > skb->len);
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 	nsize = skb_headlen(skb) - len;
 	if (nsize < 0)
 		nsize = 0;
@@ -1719,7 +1719,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		       tcp_skb_pcount(next_skb) != 1);
 
 		/* changing transmit queue under us so clear hints */
-		clear_all_retrans_hints(tp);
+		tcp_clear_all_retrans_hints(tp);
 
 		/* Ok.	We will be able to collapse the packet. */
 		tcp_unlink_write_queue(next_skb, sk);
@@ -1792,7 +1792,7 @@ void tcp_simple_retransmit(struct sock *sk)
 		}
 	}
 
-	clear_all_retrans_hints(tp);
+	tcp_clear_all_retrans_hints(tp);
 
 	if (!lost)
 		return;
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue
  2007-09-20 12:17     ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ Ilpo Järvinen
@ 2007-09-20 12:17       ` Ilpo Järvinen
  2007-09-20 12:17         ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Ilpo Järvinen
  2007-09-20 18:33         ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue David Miller
  2007-09-20 18:31       ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

The accounting code is pretty much the same, so it's a shame
we do it in two places.

I'm not too sure if added fully_acked check in MTU probing is
really what we want perhaps the added end_seq could be used in
the after() comparison.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   75 +++++++++++++++++++++----------------------------
 1 files changed, 32 insertions(+), 43 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 89162a9..d340fd5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2528,14 +2528,12 @@ static void tcp_rearm_rto(struct sock *sk)
 	}
 }
 
-static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
-			 __u32 now, __s32 *seq_rtt)
+static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
 	__u32 seq = tp->snd_una;
 	__u32 packets_acked;
-	int acked = 0;
 
 	/* If we get here, the whole TSO packet has not been
 	 * acked.
@@ -2548,36 +2546,11 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
 	packets_acked -= tcp_skb_pcount(skb);
 
 	if (packets_acked) {
-		__u8 sacked = scb->sacked;
-
-		acked |= FLAG_DATA_ACKED;
-		if (sacked) {
-			if (sacked & TCPCB_RETRANS) {
-				if (sacked & TCPCB_SACKED_RETRANS)
-					tp->retrans_out -= packets_acked;
-				acked |= FLAG_RETRANS_DATA_ACKED;
-				*seq_rtt = -1;
-			} else if (*seq_rtt < 0)
-				*seq_rtt = now - scb->when;
-			if (sacked & TCPCB_SACKED_ACKED)
-				tp->sacked_out -= packets_acked;
-			if (sacked & TCPCB_LOST)
-				tp->lost_out -= packets_acked;
-			if (sacked & TCPCB_URG) {
-				if (tp->urg_mode &&
-				    !before(seq, tp->snd_up))
-					tp->urg_mode = 0;
-			}
-		} else if (*seq_rtt < 0)
-			*seq_rtt = now - scb->when;
-
-		tp->packets_out -= packets_acked;
-
 		BUG_ON(tcp_skb_pcount(skb) == 0);
 		BUG_ON(!before(scb->seq, scb->end_seq));
 	}
 
-	return acked;
+	return packets_acked;
 }
 
 /* Remove acknowledged frames from the retransmission queue. */
@@ -2587,6 +2560,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb;
 	__u32 now = tcp_time_stamp;
+	int fully_acked = 1;
 	int acked = 0;
 	int prior_packets = tp->packets_out;
 	__s32 seq_rtt = -1;
@@ -2595,6 +2569,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 	while ((skb = tcp_write_queue_head(sk)) &&
 	       skb != tcp_send_head(sk)) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
+		u32 end_seq;
+		u32 packets_acked;
 		__u8 sacked = scb->sacked;
 
 		/* If our packet is before the ack sequence we can
@@ -2602,11 +2578,19 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		 * the other end.
 		 */
 		if (after(scb->end_seq, tp->snd_una)) {
-			if (tcp_skb_pcount(skb) > 1 &&
-			    after(tp->snd_una, scb->seq))
-				acked |= tcp_tso_acked(sk, skb,
-						       now, &seq_rtt);
-			break;
+			if (tcp_skb_pcount(skb) == 1 ||
+			    !after(tp->snd_una, scb->seq))
+				break;
+
+			packets_acked = tcp_tso_acked(sk, skb);
+			if (!packets_acked)
+				break;
+
+			fully_acked = 0;
+			end_seq = tp->snd_una;
+		} else {
+			packets_acked = tcp_skb_pcount(skb);
+			end_seq = scb->end_seq;
 		}
 
 		/* Initial outgoing SYN's get put onto the write_queue
@@ -2624,7 +2608,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		}
 
 		/* MTU probing checks */
-		if (icsk->icsk_mtup.probe_size) {
+		if (fully_acked && icsk->icsk_mtup.probe_size) {
 			if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) {
 				tcp_mtup_probe_success(sk, skb);
 			}
@@ -2633,27 +2617,32 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		if (sacked) {
 			if (sacked & TCPCB_RETRANS) {
 				if (sacked & TCPCB_SACKED_RETRANS)
-					tp->retrans_out -= tcp_skb_pcount(skb);
+					tp->retrans_out -= packets_acked;
 				acked |= FLAG_RETRANS_DATA_ACKED;
 				seq_rtt = -1;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
-				last_ackt = skb->tstamp;
+				if (fully_acked)
+					last_ackt = skb->tstamp;
 			}
 			if (sacked & TCPCB_SACKED_ACKED)
-				tp->sacked_out -= tcp_skb_pcount(skb);
+				tp->sacked_out -= packets_acked;
 			if (sacked & TCPCB_LOST)
-				tp->lost_out -= tcp_skb_pcount(skb);
+				tp->lost_out -= packets_acked;
 			if (sacked & TCPCB_URG) {
-				if (tp->urg_mode &&
-				    !before(scb->end_seq, tp->snd_up))
+				if (tp->urg_mode && !before(end_seq, tp->snd_up))
 					tp->urg_mode = 0;
 			}
 		} else if (seq_rtt < 0) {
 			seq_rtt = now - scb->when;
-			last_ackt = skb->tstamp;
+			if (fully_acked)
+				last_ackt = skb->tstamp;
 		}
-		tp->packets_out -= tcp_skb_pcount(skb);
+		tp->packets_out -= packets_acked;
+
+		if (!fully_acked)
+			break;
+
 		tcp_unlink_write_queue(skb, sk);
 		sk_stream_free_skb(sk, skb);
 		tcp_clear_all_retrans_hints(tp);
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue
  2007-09-20 12:17       ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue Ilpo Järvinen
@ 2007-09-20 12:17         ` Ilpo Järvinen
  2007-09-20 12:17           ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users Ilpo Järvinen
  2007-09-20 18:33           ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue David Miller
  2007-09-20 18:33         ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Implements following cleanups:
- Comment re-placement (CodingStyle)
- tcp_tso_acked() local (wrapper-like) variable removal
  (readability)
- __-types removed (IMHO they make local variables jumpy looking
  and just was space)
- acked -> flag (naming conventions elsewhere in TCP code)
- linebreak adjustments (readability)
- nested if()s combined (reduced indentation)
- clarifying newlines added

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   66 ++++++++++++++++++++++---------------------------
 1 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d340fd5..74accb0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2528,55 +2528,49 @@ static void tcp_rearm_rto(struct sock *sk)
 	}
 }
 
+/* If we get here, the whole TSO packet has not been acked. */
 static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
-	__u32 seq = tp->snd_una;
-	__u32 packets_acked;
+	u32 packets_acked;
 
-	/* If we get here, the whole TSO packet has not been
-	 * acked.
-	 */
-	BUG_ON(!after(scb->end_seq, seq));
+	BUG_ON(!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una));
 
 	packets_acked = tcp_skb_pcount(skb);
-	if (tcp_trim_head(sk, skb, seq - scb->seq))
+	if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
 		return 0;
 	packets_acked -= tcp_skb_pcount(skb);
 
 	if (packets_acked) {
 		BUG_ON(tcp_skb_pcount(skb) == 0);
-		BUG_ON(!before(scb->seq, scb->end_seq));
+		BUG_ON(!before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq));
 	}
 
 	return packets_acked;
 }
 
-/* Remove acknowledged frames from the retransmission queue. */
-static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
+/* Remove acknowledged frames from the retransmission queue. If our packet
+ * is before the ack sequence we can discard it as it's confirmed to have
+ * arrived at the other end.
+ */
+static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb;
-	__u32 now = tcp_time_stamp;
+	u32 now = tcp_time_stamp;
 	int fully_acked = 1;
-	int acked = 0;
+	int flag = 0;
 	int prior_packets = tp->packets_out;
-	__s32 seq_rtt = -1;
+	s32 seq_rtt = -1;
 	ktime_t last_ackt = net_invalid_timestamp();
 
-	while ((skb = tcp_write_queue_head(sk)) &&
-	       skb != tcp_send_head(sk)) {
+	while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
 		u32 end_seq;
 		u32 packets_acked;
-		__u8 sacked = scb->sacked;
+		u8 sacked = scb->sacked;
 
-		/* If our packet is before the ack sequence we can
-		 * discard it as it's confirmed to have arrived at
-		 * the other end.
-		 */
 		if (after(scb->end_seq, tp->snd_una)) {
 			if (tcp_skb_pcount(skb) == 1 ||
 			    !after(tp->snd_una, scb->seq))
@@ -2601,38 +2595,38 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		 * quickly.  This is severely frowned upon behavior.
 		 */
 		if (!(scb->flags & TCPCB_FLAG_SYN)) {
-			acked |= FLAG_DATA_ACKED;
+			flag |= FLAG_DATA_ACKED;
 		} else {
-			acked |= FLAG_SYN_ACKED;
+			flag |= FLAG_SYN_ACKED;
 			tp->retrans_stamp = 0;
 		}
 
 		/* MTU probing checks */
-		if (fully_acked && icsk->icsk_mtup.probe_size) {
-			if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) {
-				tcp_mtup_probe_success(sk, skb);
-			}
+		if (fully_acked && icsk->icsk_mtup.probe_size &&
+		    !after(tp->mtu_probe.probe_seq_end, scb->end_seq)) {
+			tcp_mtup_probe_success(sk, skb);
 		}
 
 		if (sacked) {
 			if (sacked & TCPCB_RETRANS) {
 				if (sacked & TCPCB_SACKED_RETRANS)
 					tp->retrans_out -= packets_acked;
-				acked |= FLAG_RETRANS_DATA_ACKED;
+				flag |= FLAG_RETRANS_DATA_ACKED;
 				seq_rtt = -1;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
 				if (fully_acked)
 					last_ackt = skb->tstamp;
 			}
+
 			if (sacked & TCPCB_SACKED_ACKED)
 				tp->sacked_out -= packets_acked;
 			if (sacked & TCPCB_LOST)
 				tp->lost_out -= packets_acked;
-			if (sacked & TCPCB_URG) {
-				if (tp->urg_mode && !before(end_seq, tp->snd_up))
-					tp->urg_mode = 0;
-			}
+
+			if ((sacked & TCPCB_URG) && tp->urg_mode &&
+			    !before(end_seq, tp->snd_up))
+				tp->urg_mode = 0;
 		} else if (seq_rtt < 0) {
 			seq_rtt = now - scb->when;
 			if (fully_acked)
@@ -2648,12 +2642,12 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 		tcp_clear_all_retrans_hints(tp);
 	}
 
-	if (acked&FLAG_ACKED) {
+	if (flag & FLAG_ACKED) {
 		u32 pkts_acked = prior_packets - tp->packets_out;
 		const struct tcp_congestion_ops *ca_ops
 			= inet_csk(sk)->icsk_ca_ops;
 
-		tcp_ack_update_rtt(sk, acked, seq_rtt);
+		tcp_ack_update_rtt(sk, flag, seq_rtt);
 		tcp_rearm_rto(sk);
 
 		tp->fackets_out -= min(pkts_acked, tp->fackets_out);
@@ -2664,7 +2658,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 			s32 rtt_us = -1;
 
 			/* Is the ACK triggering packet unambiguous? */
-			if (!(acked & FLAG_RETRANS_DATA_ACKED)) {
+			if (!(flag & FLAG_RETRANS_DATA_ACKED)) {
 				/* High resolution needed and available? */
 				if (ca_ops->flags & TCP_CONG_RTT_STAMP &&
 				    !ktime_equal(last_ackt,
@@ -2703,7 +2697,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 	}
 #endif
 	*seq_rtt_p = seq_rtt;
-	return acked;
+	return flag;
 }
 
 static void tcp_ack_probe(struct sock *sk)
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users
  2007-09-20 12:17         ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Ilpo Järvinen
@ 2007-09-20 12:17           ` Ilpo Järvinen
  2007-09-20 12:17             ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation Ilpo Järvinen
  2007-09-20 18:35             ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users David Miller
  2007-09-20 18:33           ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Basically this change enables it, previously other undo_marker
users were left with nothing. Reverse undo_marker logic
completely to get it set right in CA_Loss. On the other hand,
when spurious RTO is detected, clear it. Clearing might be too
heavy for some scenarios but seems safe enough starting point
for now and shouldn't have much effect except in majority of
cases (if in any).

By adding a new FLAG_ we avoid looping through write_queue when
RTO occurs.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   42 +++++++++++++++++++++++++++---------------
 1 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 74accb0..948e79a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -104,6 +104,7 @@ int sysctl_tcp_abc __read_mostly;
 #define FLAG_ONLY_ORIG_SACKED	0x200 /* SACKs only non-rexmit sent before RTO */
 #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
 #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained DSACK info */
+#define FLAG_NONHEAD_RETRANS_ACKED	0x1000 /* Non-head rexmitted data was ACKed */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -1597,6 +1598,8 @@ void tcp_enter_frto(struct sock *sk)
 	tp->undo_retrans = 0;
 
 	skb = tcp_write_queue_head(sk);
+	if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+		tp->undo_marker = 0;
 	if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
 		TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
 		tp->retrans_out -= tcp_skb_pcount(skb);
@@ -1646,6 +1649,8 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 			/* ...enter this if branch just for the first segment */
 			flag |= FLAG_DATA_ACKED;
 		} else {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+				tp->undo_marker = 0;
 			TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
 		}
 
@@ -1661,7 +1666,6 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
 	tp->snd_cwnd_stamp = tcp_time_stamp;
-	tp->undo_marker = 0;
 	tp->frto_counter = 0;
 
 	tp->reordering = min_t(unsigned int, tp->reordering,
@@ -2587,20 +2591,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 			end_seq = scb->end_seq;
 		}
 
-		/* Initial outgoing SYN's get put onto the write_queue
-		 * just like anything else we transmit.  It is not
-		 * true data, and if we misinform our callers that
-		 * this ACK acks real data, we will erroneously exit
-		 * connection startup slow start one packet too
-		 * quickly.  This is severely frowned upon behavior.
-		 */
-		if (!(scb->flags & TCPCB_FLAG_SYN)) {
-			flag |= FLAG_DATA_ACKED;
-		} else {
-			flag |= FLAG_SYN_ACKED;
-			tp->retrans_stamp = 0;
-		}
-
 		/* MTU probing checks */
 		if (fully_acked && icsk->icsk_mtup.probe_size &&
 		    !after(tp->mtu_probe.probe_seq_end, scb->end_seq)) {
@@ -2613,6 +2603,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 					tp->retrans_out -= packets_acked;
 				flag |= FLAG_RETRANS_DATA_ACKED;
 				seq_rtt = -1;
+				if ((flag & FLAG_DATA_ACKED) ||
+				    (packets_acked > 1))
+					flag |= FLAG_NONHEAD_RETRANS_ACKED;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
 				if (fully_acked)
@@ -2634,6 +2627,20 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 		}
 		tp->packets_out -= packets_acked;
 
+		/* Initial outgoing SYN's get put onto the write_queue
+		 * just like anything else we transmit.  It is not
+		 * true data, and if we misinform our callers that
+		 * this ACK acks real data, we will erroneously exit
+		 * connection startup slow start one packet too
+		 * quickly.  This is severely frowned upon behavior.
+		 */
+		if (!(scb->flags & TCPCB_FLAG_SYN)) {
+			flag |= FLAG_DATA_ACKED;
+		} else {
+			flag |= FLAG_SYN_ACKED;
+			tp->retrans_stamp = 0;
+		}
+
 		if (!fully_acked)
 			break;
 
@@ -2852,6 +2859,10 @@ static int tcp_process_frto(struct sock *sk, int flag)
 	if (flag&FLAG_DATA_ACKED)
 		inet_csk(sk)->icsk_retransmits = 0;
 
+	if ((flag & FLAG_NONHEAD_RETRANS_ACKED) ||
+	    ((tp->frto_counter >= 2) && (flag & FLAG_RETRANS_DATA_ACKED)))
+		tp->undo_marker = 0;
+
 	if (!before(tp->snd_una, tp->frto_highmark)) {
 		tcp_enter_frto_loss(sk, (tp->frto_counter == 1 ? 2 : 3), flag);
 		return 1;
@@ -2916,6 +2927,7 @@ static int tcp_process_frto(struct sock *sk, int flag)
 			break;
 		}
 		tp->frto_counter = 0;
+		tp->undo_marker = 0;
 	}
 	return 0;
 }
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/9] [TCP] FRTO: Update sysctl documentation
  2007-09-20 12:17           ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users Ilpo Järvinen
@ 2007-09-20 12:17             ` Ilpo Järvinen
  2007-09-20 12:17               ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default Ilpo Järvinen
  2007-09-20 18:35               ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation David Miller
  2007-09-20 18:35             ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Since the SACK enhanced FRTO was added, the code has been
under test numerous times so remove "experimental" claim
from the documentation. Also be a bit more verbose about
the usage.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 Documentation/networking/ip-sysctl.txt |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 32c2e9d..6ae2fef 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -180,13 +180,20 @@ tcp_fin_timeout - INTEGER
 	to live longer.	Cf. tcp_max_orphans.
 
 tcp_frto - INTEGER
-	Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
+	Enables Forward RTO-Recovery (F-RTO) defined in RFC4138.
+	F-RTO is an enhanced recovery algorithm for TCP retransmission
 	timeouts.  It is particularly beneficial in wireless environments
 	where packet loss is typically due to random radio interference
-	rather than intermediate router congestion. If set to 1, basic
-	version is enabled. 2 enables SACK enhanced F-RTO, which is
-	EXPERIMENTAL. The basic version can be used also when SACK is
-	enabled for a flow through tcp_sack sysctl.
+	rather than intermediate router congestion.  FRTO is sender-side
+	only modification.  Therefore it does not require any support from
+	the peer, but in a typical case, however, where wireless link is
+	the local access link and most of the data flows downlink, the
+	faraway servers should have FRTO enabled to take advantage of it.
+	If set to 1, basic version is enabled.  2 enables SACK enhanced
+	F-RTO if flow uses SACK.  The basic version can be used also when
+	SACK is in use though scenario(s) with it exists where FRTO
+	interacts badly with the packet counting of the SACK enabled TCP
+	flow.
 
 tcp_frto_response - INTEGER
 	When F-RTO has detected that a TCP retransmission timeout was
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default
  2007-09-20 12:17             ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation Ilpo Järvinen
@ 2007-09-20 12:17               ` Ilpo Järvinen
  2007-09-20 12:17                 ` [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations Ilpo Järvinen
  2007-09-20 18:36                 ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default David Miller
  2007-09-20 18:35               ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Most of the description that follows comes from my mail to
netdev (some editing done):

Main obstacle to FRTO use is its deployment as it has to be on
the sender side where as wireless link is often the receiver's
access link. Take initiative on behalf of unlucky receivers and
enable it by default in future Linux TCP senders. Also IETF
seems to interested in advancing FRTO from experimental [1].

How does FRTO help?
===================

FRTO detects spurious RTOs and avoids a number of unnecessary
retransmissions and a couple of other problems that can arise
due to incorrect guess made at RTO (i.e., that segments were
lost when they actually got delayed which is likely to occur
e.g. in wireless environments with link-layer retransmission).
Though FRTO cannot prevent the first (potentially unnecessary)
retransmission at RTO, I suspect that it won't cost that much
even if you have to pay for each bit (won't be that high
percentage out of all packets after all :-)). However, usually
when you have a spurious RTO, not only the first segment
unnecessarily retransmitted but the *whole window*. It goes like
this: all cumulative ACKs got delayed due to in-order delivery,
then TCP will actually send 1.5*original cwnd worth of data in
the RTO's slow-start when the delayed ACKs arrive (basically the
original cwnd worth of it unnecessarily). In case one is
interested in minimizing unnecessary retransmissions e.g. due to
cost, those rexmissions must never see daylight. Besides, in the
worst case the generated burst overloads the bottleneck buffers
which is likely to significantly delay the further progress of
the flow. In case of ll rexmissions, ACK compression often
occurs at the same time making the burst very "sharp edged" (in
that case TCP often loses most of the segments above high_seq
=> very bad performance too). When FRTO is enabled, those
unnecessary retransmissions are fully avoided except for the
first segment and the cwnd behavior after detected spurious RTO
is determined by the response (one can tune that by sysctl).

Basic version (non-SACK enhanced one), FRTO can fail to detect
spurious RTO as spurious and falls back to conservative
behavior. ACK lossage is much less significant than reordering,
usually the FRTO can detect spurious RTO if at least 2
cumulative ACKs from original window are preserved (excluding
the ACK that advances to high_seq). With SACK-enhanced version,
the detection is quite robust.

FRTO should remove the need to set a high lower bound for the
RTO estimator due to delay spikes that occur relatively common
in some environments (esp. in wireless/cellular ones).

[1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 948e79a..02b549b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -85,7 +85,7 @@ int sysctl_tcp_adv_win_scale __read_mostly = 2;
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
-int sysctl_tcp_frto __read_mostly;
+int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_frto_response __read_mostly;
 int sysctl_tcp_nometrics_save __read_mostly;

-- 
1.5.0.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations
  2007-09-20 12:17               ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default Ilpo Järvinen
@ 2007-09-20 12:17                 ` Ilpo Järvinen
  2007-09-20 18:41                   ` David Miller
  2007-09-20 18:36                 ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default David Miller
  1 sibling, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

There's no reason to clear the sacktag skb hint when small part
of the rexmit queue changes. Account changes (if any) instead when
fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h     |    6 +++++-
 net/ipv4/tcp_input.c  |   14 ++++++++------
 net/ipv4/tcp_output.c |   12 ++++++++----
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 16dfe3c..07b1faa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1067,11 +1067,15 @@ static inline void tcp_mib_init(void)
 }
 
 /* from STCP */
-static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) {
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
+}
+
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+	tcp_clear_retrans_hints_partial(tp);
 	tp->fastpath_skb_hint = NULL;
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 02b549b..1092b5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1674,7 +1674,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tp->high_seq = tp->frto_highmark;
 	TCP_ECN_queue_cwr(tp);
 
-	tcp_clear_all_retrans_hints(tp);
+	tcp_clear_retrans_hints_partial(tp);
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -1714,10 +1714,14 @@ void tcp_enter_loss(struct sock *sk, int how)
 	tp->bytes_acked = 0;
 	tcp_clear_retrans(tp);
 
-	/* Push undo marker, if it was plain RTO and nothing
-	 * was retransmitted. */
-	if (!how)
+	if (!how) {
+		/* Push undo marker, if it was plain RTO and nothing
+		 * was retransmitted. */
 		tp->undo_marker = tp->snd_una;
+		tcp_clear_retrans_hints_partial(tp);
+	} else {
+		tcp_clear_all_retrans_hints(tp);
+	}
 
 	tcp_for_write_queue(skb, sk) {
 		if (skb == tcp_send_head(sk))
@@ -1744,8 +1748,6 @@ void tcp_enter_loss(struct sock *sk, int how)
 	TCP_ECN_queue_cwr(tp);
 	/* Abort FRTO algorithm if one is in progress */
 	tp->frto_counter = 0;
-
-	tcp_clear_all_retrans_hints(tp);
 }
 
 static int tcp_check_sack_reneging(struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f46d24b..cbb83ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -687,7 +687,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 
 	BUG_ON(len > skb->len);
 
-	tcp_clear_all_retrans_hints(tp);
+	tcp_clear_retrans_hints_partial(tp);
 	nsize = skb_headlen(skb) - len;
 	if (nsize < 0)
 		nsize = 0;
@@ -1718,9 +1718,6 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		BUG_ON(tcp_skb_pcount(skb) != 1 ||
 		       tcp_skb_pcount(next_skb) != 1);
 
-		/* changing transmit queue under us so clear hints */
-		tcp_clear_all_retrans_hints(tp);
-
 		/* Ok.	We will be able to collapse the packet. */
 		tcp_unlink_write_queue(next_skb, sk);
 
@@ -1759,6 +1756,13 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 
 		tcp_adjust_fackets_out(tp, skb, tcp_skb_pcount(next_skb));
 		tp->packets_out -= tcp_skb_pcount(next_skb);
+
+		/* changed transmit queue under us so clear hints */
+		tcp_clear_retrans_hints_partial(tp);
+		/* manually tune sacktag skb hint */
+		if (tp->fastpath_skb_hint == next_skb)
+			tp->fastpath_skb_hint = skb;
+
 		sk_stream_free_skb(sk, next_skb);
 	}
 }
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb
  2007-09-20 12:17 ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb Ilpo Järvinen
  2007-09-20 12:17   ` [PATCH 2/9] [TCP]: Make fackets_out accurate Ilpo Järvinen
@ 2007-09-20 18:29   ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:29 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:44 +0300

> In general, it should not be necessary to call tcp_fragment for
> already SACKed skbs, but it's better to be safe than sorry. And
> indeed, it can be called from sacktag when a DSACK arrives or
> some ACK (with SACK) reordering occurs (sacktag could be made
> to avoid the call in the latter case though I'm not sure if it's
> worth of the trouble and added complexity to cover such marginal
> case).
> 
> The collapse case has return for SACKED_ACKED case earlier, so
> just WARN_ON if internal inconsistency is detected for some
> reason.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, thanks Ilpo.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/9] [TCP]: Make fackets_out accurate
  2007-09-20 12:17   ` [PATCH 2/9] [TCP]: Make fackets_out accurate Ilpo Järvinen
  2007-09-20 12:17     ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ Ilpo Järvinen
@ 2007-09-20 18:30     ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:30 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:45 +0300

> Substraction for fackets_out is unconditional when snd_una
> advances, thus there's no need to do it inside the loop. Just
> make sure correct bounds are honored.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_
  2007-09-20 12:17     ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ Ilpo Järvinen
  2007-09-20 12:17       ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue Ilpo Järvinen
@ 2007-09-20 18:31       ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:31 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:46 +0300

> In addition, fix its function comment spacing.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

> -/*from STCP */
> -static inline void clear_all_retrans_hints(struct tcp_sock *tp){
> +/* from STCP */
> +static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {

This brace should also be on a line by itself.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue
  2007-09-20 12:17       ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue Ilpo Järvinen
  2007-09-20 12:17         ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Ilpo Järvinen
@ 2007-09-20 18:33         ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:33 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:47 +0300

> The accounting code is pretty much the same, so it's a shame
> we do it in two places.
> 
> I'm not too sure if added fully_acked check in MTU probing is
> really what we want perhaps the added end_seq could be used in
> the after() comparison.

Indeed there are a bunch of tradeoffs to consider when
handling the TSO-partial-ack cases.

> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, thanks Ilpo.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue
  2007-09-20 12:17         ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Ilpo Järvinen
  2007-09-20 12:17           ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users Ilpo Järvinen
@ 2007-09-20 18:33           ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:33 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:48 +0300

> Implements following cleanups:
> - Comment re-placement (CodingStyle)
> - tcp_tso_acked() local (wrapper-like) variable removal
>   (readability)
> - __-types removed (IMHO they make local variables jumpy looking
>   and just was space)
> - acked -> flag (naming conventions elsewhere in TCP code)
> - linebreak adjustments (readability)
> - nested if()s combined (reduced indentation)
> - clarifying newlines added
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users
  2007-09-20 12:17           ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users Ilpo Järvinen
  2007-09-20 12:17             ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation Ilpo Järvinen
@ 2007-09-20 18:35             ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:35 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:49 +0300

> Basically this change enables it, previously other undo_marker
> users were left with nothing. Reverse undo_marker logic
> completely to get it set right in CA_Loss. On the other hand,
> when spurious RTO is detected, clear it. Clearing might be too
> heavy for some scenarios but seems safe enough starting point
> for now and shouldn't have much effect except in majority of
> cases (if in any).
> 
> By adding a new FLAG_ we avoid looping through write_queue when
> RTO occurs.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.  Thanks for following up on all of this stuff to
get FRTO in shape.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/9] [TCP] FRTO: Update sysctl documentation
  2007-09-20 12:17             ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation Ilpo Järvinen
  2007-09-20 12:17               ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default Ilpo Järvinen
@ 2007-09-20 18:35               ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:35 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:50 +0300

> Since the SACK enhanced FRTO was added, the code has been
> under test numerous times so remove "experimental" claim
> from the documentation. Also be a bit more verbose about
> the usage.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

APplied.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default
  2007-09-20 12:17               ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default Ilpo Järvinen
  2007-09-20 12:17                 ` [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations Ilpo Järvinen
@ 2007-09-20 18:36                 ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-09-20 18:36 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:51 +0300

> Most of the description that follows comes from my mail to
> netdev (some editing done):
 ...
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, thanks Ilpo!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations
  2007-09-20 12:17                 ` [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations Ilpo Järvinen
@ 2007-09-20 18:41                   ` David Miller
  2007-09-20 19:18                     ` Ilpo Järvinen
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2007-09-20 18:41 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 20 Sep 2007 15:17:52 +0300

> There's no reason to clear the sacktag skb hint when small part
> of the rexmit queue changes. Account changes (if any) instead when
> fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
> no need to discard SACK tag hint at all.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, and I followed it up with this coding style fixlet.

Thanks!

commit e3723ad866a1e0690f3bc32443180ec1f6657f4a
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Thu Sep 20 11:40:37 2007 -0700

    [TCP]: Minor coding style fixup.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 07b1faa..991ccdc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1067,14 +1067,16 @@ static inline void tcp_mib_init(void)
 }
 
 /* from STCP */
-static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) {
+static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
+{
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
 }
 
-static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
+{
 	tcp_clear_retrans_hints_partial(tp);
 	tp->fastpath_skb_hint = NULL;
 }

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations
  2007-09-20 18:41                   ` David Miller
@ 2007-09-20 19:18                     ` Ilpo Järvinen
  0 siblings, 0 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-09-20 19:18 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 673 bytes --]

On Thu, 20 Sep 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Thu, 20 Sep 2007 15:17:52 +0300
> 
> > There's no reason to clear the sacktag skb hint when small part
> > of the rexmit queue changes. Account changes (if any) instead when
> > fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
> > no need to discard SACK tag hint at all.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> Applied, and I followed it up with this coding style fixlet.

Yeah, that's for fixing it... ...Just didn't notice it was left wrong 
while doing things that required more thinking to get them right...

-- 
 i.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-09-20 19:18 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-20 12:17 [PATCH net-2.6.24 0/9]: TCP improvements & cleanups Ilpo Järvinen
2007-09-20 12:17 ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb Ilpo Järvinen
2007-09-20 12:17   ` [PATCH 2/9] [TCP]: Make fackets_out accurate Ilpo Järvinen
2007-09-20 12:17     ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ Ilpo Järvinen
2007-09-20 12:17       ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue Ilpo Järvinen
2007-09-20 12:17         ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue Ilpo Järvinen
2007-09-20 12:17           ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users Ilpo Järvinen
2007-09-20 12:17             ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation Ilpo Järvinen
2007-09-20 12:17               ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default Ilpo Järvinen
2007-09-20 12:17                 ` [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations Ilpo Järvinen
2007-09-20 18:41                   ` David Miller
2007-09-20 19:18                     ` Ilpo Järvinen
2007-09-20 18:36                 ` [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default David Miller
2007-09-20 18:35               ` [PATCH 7/9] [TCP] FRTO: Update sysctl documentation David Miller
2007-09-20 18:35             ` [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users David Miller
2007-09-20 18:33           ` [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue David Miller
2007-09-20 18:33         ` [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue David Miller
2007-09-20 18:31       ` [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_ David Miller
2007-09-20 18:30     ` [PATCH 2/9] [TCP]: Make fackets_out accurate David Miller
2007-09-20 18:29   ` [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).