* [PATCH 0/9]: tcp-2.6 patchset
@ 2007-05-26 8:35 Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier Ilpo Järvinen
2007-05-26 23:44 ` [PATCH 0/9]: tcp-2.6 patchset David Miller
0 siblings, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller
Hi,
Here are some changes to TCP I've been baking. Before doing this
patchset, I rebased the tcp-2.6 branch to the current net-2.6 (it goes
almost cleanly) because there are some dependencies on the TCP work in
there.
I booted these today and no very obvious problems showed up (OOPSes,
BUG()s, reported scoreboard leaks, etc.) while I did some transfers
with varying sysctl settings (SACK/FACK on/off and FRTO enabled).
However, no in-depth analysis of the behavior has been done.
The last patch is highly experimental (and incomplete); it should
NOT be applied yet. I suspect that there are at least off-by-MSS
errors in it.
I'm asking for your comments.
Dave, you could consider applying all but the last one if they
seem ok to you too (in that case you'll first need to rebase your
tcp-2.6 so that the ones touching tcp_sync_left_out apply cleanly :-)).
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier
2007-05-26 8:35 [PATCH 0/9]: tcp-2.6 patchset Ilpo Järvinen
@ 2007-05-26 8:35 ` Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting Ilpo Järvinen
2007-05-31 8:39 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier David Miller
2007-05-26 23:44 ` [PATCH 0/9]: tcp-2.6 patchset David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 98 ++++++++++++++++++++++++--------------------------
1 files changed, 47 insertions(+), 51 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cb81e31..925a26f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1319,6 +1319,53 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
/* F-RTO can only be used if TCP has never retransmitted anything other than
* head (SACK enhanced variant from Appendix B of RFC4138 is more robust here)
*/
+static void tcp_check_reno_reordering(struct sock *sk, const int addend)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ u32 holes;
+
+ holes = max(tp->lost_out, 1U);
+ holes = min(holes, tp->packets_out);
+
+ if ((tp->sacked_out + holes) > tp->packets_out) {
+ tp->sacked_out = tp->packets_out - holes;
+ tcp_update_reordering(sk, tp->packets_out + addend, 0);
+ }
+}
+
+/* Emulate SACKs for SACKless connection: account for a new dupack. */
+
+static void tcp_add_reno_sack(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ tp->sacked_out++;
+ tcp_check_reno_reordering(sk, 0);
+ tcp_sync_left_out(tp);
+}
+
+/* Account for ACK, ACKing some data in Reno Recovery phase. */
+
+static void tcp_remove_reno_sacks(struct sock *sk, int acked)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (acked > 0) {
+ /* One ACK acked hole. The rest eat duplicate ACKs. */
+ if (acked-1 >= tp->sacked_out)
+ tp->sacked_out = 0;
+ else
+ tp->sacked_out -= acked-1;
+ }
+ tcp_check_reno_reordering(sk, acked);
+ tcp_sync_left_out(tp);
+}
+
+static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
+{
+ tp->sacked_out = 0;
+ tp->left_out = tp->lost_out;
+}
+
int tcp_use_frto(struct sock *sk)
{
const struct tcp_sock *tp = tcp_sk(sk);
@@ -1734,57 +1781,6 @@ static int tcp_time_to_recover(struct sock *sk)
return 0;
}
-/* If we receive more dupacks than we expected counting segments
- * in assumption of absent reordering, interpret this as reordering.
- * The only another reason could be bug in receiver TCP.
- */
-static void tcp_check_reno_reordering(struct sock *sk, const int addend)
-{
- struct tcp_sock *tp = tcp_sk(sk);
- u32 holes;
-
- holes = max(tp->lost_out, 1U);
- holes = min(holes, tp->packets_out);
-
- if ((tp->sacked_out + holes) > tp->packets_out) {
- tp->sacked_out = tp->packets_out - holes;
- tcp_update_reordering(sk, tp->packets_out + addend, 0);
- }
-}
-
-/* Emulate SACKs for SACKless connection: account for a new dupack. */
-
-static void tcp_add_reno_sack(struct sock *sk)
-{
- struct tcp_sock *tp = tcp_sk(sk);
- tp->sacked_out++;
- tcp_check_reno_reordering(sk, 0);
- tcp_sync_left_out(tp);
-}
-
-/* Account for ACK, ACKing some data in Reno Recovery phase. */
-
-static void tcp_remove_reno_sacks(struct sock *sk, int acked)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (acked > 0) {
- /* One ACK acked hole. The rest eat duplicate ACKs. */
- if (acked-1 >= tp->sacked_out)
- tp->sacked_out = 0;
- else
- tp->sacked_out -= acked-1;
- }
- tcp_check_reno_reordering(sk, acked);
- tcp_sync_left_out(tp);
-}
-
-static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
-{
- tp->sacked_out = 0;
- tp->left_out = tp->lost_out;
-}
-
/* RFC: This is from the original, I doubt that this is necessary at all:
* clear xmit_retrans hint if seq of this skb is beyond hint. How could we
* retransmitted past LOST markings in the first place? I'm not fully sure
--
1.5.0.6
* [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting
2007-05-26 8:35 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier Ilpo Järvinen
@ 2007-05-26 8:35 ` Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out Ilpo Järvinen
2007-05-31 8:40 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting David Miller
2007-05-31 8:39 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
F-RTO does not touch the SACKED_ACKED bits at all, so there is no
need to recount them in tcp_enter_frto_loss. After removal of
the else branch, the nested ifs can be combined.
This must also reset sacked_out when SACK is not in use, as TCP
could have received some duplicate ACKs prior to the RTO. To achieve
that in a sane manner, tcp_reset_reno_sack was moved earlier by the
previous patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 23 +++++++----------------
1 files changed, 7 insertions(+), 16 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 925a26f..4dab5d7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1479,17 +1479,15 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
{
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
- int cnt = 0;
- tp->sacked_out = 0;
tp->lost_out = 0;
- tp->fackets_out = 0;
tp->retrans_out = 0;
+ if (IsReno(tp))
+ tcp_reset_reno_sack(tp);
tcp_for_write_queue(skb, sk) {
if (skb == tcp_send_head(sk))
break;
- cnt += tcp_skb_pcount(skb);
/*
* Count the retransmission made on RTO correctly (only when
* waiting for the first ACK and did not get it)...
@@ -1501,19 +1499,12 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
} else {
TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
}
- if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED)) {
- /* Do not mark those segments lost that were
- * forward transmitted after RTO
- */
- if (!after(TCP_SKB_CB(skb)->end_seq,
- tp->frto_highmark)) {
- TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
- tp->lost_out += tcp_skb_pcount(skb);
- }
- } else {
- tp->sacked_out += tcp_skb_pcount(skb);
- tp->fackets_out = cnt;
+ /* Don't mark skbs lost that were fwd transmitted after RTO */
+ if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) &&
+ !after(TCP_SKB_CB(skb)->end_seq, tp->frto_highmark)) {
+ TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
+ tp->lost_out += tcp_skb_pcount(skb);
}
}
tcp_sync_left_out(tp);
--
1.5.0.6
* [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out
2007-05-26 8:35 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting Ilpo Järvinen
@ 2007-05-26 8:35 ` Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint Ilpo Järvinen
2007-05-31 8:42 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out David Miller
2007-05-31 8:40 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
left_out is easily calculable when needed, and its users are not
that many after all.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/linux/tcp.h | 1 -
include/net/tcp.h | 4 ++--
net/ipv4/tcp_input.c | 11 ++---------
net/ipv4/tcp_minisocks.c | 1 -
net/ipv4/tcp_output.c | 12 +++---------
5 files changed, 7 insertions(+), 22 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 728321f..63e04f4 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -305,7 +305,6 @@ struct tcp_sock {
u32 rtt_seq; /* sequence number to update rttvar */
u32 packets_out; /* Packets which are "in flight" */
- u32 left_out; /* Packets which leaved network */
u32 retrans_out; /* Retransmitted packets out */
/*
* Options received (usually on last packet, some only on SYN packets).
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dd58cf2..5d866a4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -716,7 +716,8 @@ static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
*/
static inline unsigned int tcp_packets_in_flight(const struct tcp_sock *tp)
{
- return (tp->packets_out - tp->left_out + tp->retrans_out);
+ return tp->packets_out - (tp->sacked_out + tp->lost_out) +
+ tp->retrans_out;
}
/* If cwnd > ssthresh, we may raise ssthresh to be half-way to cwnd.
@@ -738,7 +739,6 @@ static inline void tcp_sync_left_out(struct tcp_sock *tp)
{
BUG_ON(tp->rx_opt.sack_ok &&
(tp->sacked_out + tp->lost_out > tp->packets_out));
- tp->left_out = tp->sacked_out + tp->lost_out;
}
extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4dab5d7..c17e077 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1301,8 +1301,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
}
}
- tp->left_out = tp->sacked_out + tp->lost_out;
-
if ((state.reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
(!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
tcp_update_reordering(sk, ((tp->fackets_out + 1) - state.reord), 0);
@@ -1363,7 +1361,6 @@ static void tcp_remove_reno_sacks(struct sock *sk, int acked)
static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
{
tp->sacked_out = 0;
- tp->left_out = tp->lost_out;
}
int tcp_use_frto(struct sock *sk)
@@ -1526,7 +1523,6 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
void tcp_clear_retrans(struct tcp_sock *tp)
{
- tp->left_out = 0;
tp->retrans_out = 0;
tp->fackets_out = 0;
@@ -2023,7 +2019,7 @@ static void DBGUNDO(struct sock *sk, const char *msg)
printk(KERN_DEBUG "Undo %s %u.%u.%u.%u/%u c%u l%u ss%u/%u p%u\n",
msg,
NIPQUAD(inet->daddr), ntohs(inet->dport),
- tp->snd_cwnd, tp->left_out,
+ tp->snd_cwnd, tp->sacked_out + tp->lost_out,
tp->snd_ssthresh, tp->prior_ssthresh,
tp->packets_out);
}
@@ -2152,7 +2148,6 @@ static int tcp_try_undo_loss(struct sock *sk)
DBGUNDO(sk, "partial loss");
tp->lost_out = 0;
- tp->left_out = tp->sacked_out;
tcp_undo_cwr(sk, 1);
NET_INC_STATS_BH(LINUX_MIB_TCPLOSSUNDO);
inet_csk(sk)->icsk_retransmits = 0;
@@ -2176,8 +2171,6 @@ static void tcp_try_to_open(struct sock *sk, int flag)
{
struct tcp_sock *tp = tcp_sk(sk);
- tp->left_out = tp->sacked_out;
-
if (tp->retrans_out == 0)
tp->retrans_stamp = 0;
@@ -2187,7 +2180,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)
if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
int state = TCP_CA_Open;
- if (tp->left_out || tp->retrans_out || tp->undo_marker)
+ if (tp->sacked_out || tp->retrans_out || tp->undo_marker)
state = TCP_CA_Disorder;
if (inet_csk(sk)->icsk_ca_state != state) {
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 42588eb..598ddec 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -399,7 +399,6 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
newicsk->icsk_rto = TCP_TIMEOUT_INIT;
newtp->packets_out = 0;
- newtp->left_out = 0;
newtp->retrans_out = 0;
newtp->sacked_out = 0;
newtp->fackets_out = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8891775..d67fb3d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -682,10 +682,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)
tp->retrans_out -= diff;
- if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
+ if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
tp->lost_out -= diff;
- tp->left_out -= diff;
- }
if (diff > 0) {
/* Adjust Reno SACK estimate. */
@@ -1675,15 +1673,11 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked&(TCPCB_EVER_RETRANS|TCPCB_AT_TAIL);
if (TCP_SKB_CB(next_skb)->sacked&TCPCB_SACKED_RETRANS)
tp->retrans_out -= tcp_skb_pcount(next_skb);
- if (TCP_SKB_CB(next_skb)->sacked&TCPCB_LOST) {
+ if (TCP_SKB_CB(next_skb)->sacked&TCPCB_LOST)
tp->lost_out -= tcp_skb_pcount(next_skb);
- tp->left_out -= tcp_skb_pcount(next_skb);
- }
/* Reno case is special. Sigh... */
- if (!tp->rx_opt.sack_ok && tp->sacked_out) {
+ if (!tp->rx_opt.sack_ok && tp->sacked_out)
tcp_dec_pcount_approx(&tp->sacked_out, next_skb);
- tp->left_out -= tcp_skb_pcount(next_skb);
- }
/* Not quite right: it can be > snd.fack, but
* it is better to underestimate fackets.
--
1.5.0.6
* [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint
2007-05-26 8:35 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out Ilpo Järvinen
@ 2007-05-26 8:35 ` Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it Ilpo Järvinen
2007-05-31 8:43 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint David Miller
2007-05-31 8:42 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
In addition, this adds a reference documenting the purpose of the
loop (rule 3 of NextSeg() as specified in RFC3517).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/linux/tcp.h | 1 -
net/ipv4/tcp_output.c | 23 +++++++++--------------
2 files changed, 9 insertions(+), 15 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 63e04f4..1ce9fd4 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -343,7 +343,6 @@ struct tcp_sock {
int fastpath_cnt_hint;
int retransmit_cnt_hint;
- int forward_cnt_hint;
u16 advmss; /* Advertised MSS */
u16 prior_ssthresh; /* ssthresh saved at recovery start */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d67fb3d..f33e25b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1931,33 +1931,28 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
* and retransmission... Both ways have their merits...
*
* For now we do not retransmit anything, while we have some new
- * segments to send.
+ * segments to send. In the other cases, follow rule 3 for
+ * NextSeg() specified in RFC3517.
*/
if (tcp_may_send_now(sk))
return;
- if (tp->forward_skb_hint) {
+ /* If nothing is SACKed, highest_sack in the loop won't be valid */
+ if (!tp->sacked_out)
+ return;
+
+ if (tp->forward_skb_hint)
skb = tp->forward_skb_hint;
- packet_cnt = tp->forward_cnt_hint;
- } else{
+ else
skb = tcp_write_queue_head(sk);
- packet_cnt = 0;
- }
tcp_for_write_queue_from(skb, sk) {
if (skb == tcp_send_head(sk))
break;
- tp->forward_cnt_hint = packet_cnt;
tp->forward_skb_hint = skb;
- /* Similar to the retransmit loop above we
- * can pretend that the retransmitted SKB
- * we send out here will be composed of one
- * real MSS sized packet because tcp_retransmit_skb()
- * will fragment it if necessary.
- */
- if (++packet_cnt > tp->fackets_out)
+ if (after(TCP_SKB_CB(skb)->seq, tp->highest_sack))
break;
if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
--
1.5.0.6
* [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it
2007-05-26 8:35 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint Ilpo Järvinen
@ 2007-05-26 8:35 ` Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
2007-05-31 8:43 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it David Miller
2007-05-31 8:43 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
No other users of tcp_ecn.h exist. Very few things need to remain
in tcp.h; for most of the TCP ECN functions the callers reside
within a single .c file, so they can be placed there.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/net/tcp.h | 14 ++++-
include/net/tcp_ecn.h | 130 ----------------------------------------------
net/ipv4/tcp_input.c | 50 ++++++++++++++++++
net/ipv4/tcp_minisocks.c | 6 ++
net/ipv4/tcp_output.c | 50 ++++++++++++++++++
5 files changed, 118 insertions(+), 132 deletions(-)
delete mode 100644 include/net/tcp_ecn.h
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5d866a4..6f88d13 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -39,6 +39,7 @@
#include <net/snmp.h>
#include <net/ip.h>
#include <net/tcp_states.h>
+#include <net/inet_ecn.h>
#include <linux/seq_file.h>
@@ -324,6 +325,17 @@ static inline void tcp_clear_options(struct tcp_options_received *rx_opt)
rx_opt->tstamp_ok = rx_opt->sack_ok = rx_opt->wscale_ok = rx_opt->snd_wscale = 0;
}
+#define TCP_ECN_OK 1
+#define TCP_ECN_QUEUE_CWR 2
+#define TCP_ECN_DEMAND_CWR 4
+
+static __inline__ void
+TCP_ECN_create_request(struct request_sock *req, struct tcphdr *th)
+{
+ if (sysctl_tcp_ecn && th->ece && th->cwr)
+ inet_rsk(req)->ecn_ok = 1;
+}
+
enum tcp_tw_status
{
TCP_TW_SUCCESS = 0,
@@ -567,8 +579,6 @@ struct tcp_skb_cb {
#define TCP_SKB_CB(__skb) ((struct tcp_skb_cb *)&((__skb)->cb[0]))
-#include <net/tcp_ecn.h>
-
/* Due to TSO, an SKB can be composed of multiple actual
* packets. To keep these tracked properly, we use this.
*/
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
deleted file mode 100644
index 89eb3e0..0000000
--- a/include/net/tcp_ecn.h
+++ /dev/null
@@ -1,130 +0,0 @@
-#ifndef _NET_TCP_ECN_H_
-#define _NET_TCP_ECN_H_ 1
-
-#include <net/inet_ecn.h>
-#include <net/request_sock.h>
-
-#define TCP_HP_BITS (~(TCP_RESERVED_BITS|TCP_FLAG_PSH))
-
-#define TCP_ECN_OK 1
-#define TCP_ECN_QUEUE_CWR 2
-#define TCP_ECN_DEMAND_CWR 4
-
-static inline void TCP_ECN_queue_cwr(struct tcp_sock *tp)
-{
- if (tp->ecn_flags&TCP_ECN_OK)
- tp->ecn_flags |= TCP_ECN_QUEUE_CWR;
-}
-
-
-/* Output functions */
-
-static inline void TCP_ECN_send_synack(struct tcp_sock *tp,
- struct sk_buff *skb)
-{
- TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_CWR;
- if (!(tp->ecn_flags&TCP_ECN_OK))
- TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_ECE;
-}
-
-static inline void TCP_ECN_send_syn(struct sock *sk, struct sk_buff *skb)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- tp->ecn_flags = 0;
- if (sysctl_tcp_ecn) {
- TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
- tp->ecn_flags = TCP_ECN_OK;
- }
-}
-
-static __inline__ void
-TCP_ECN_make_synack(struct request_sock *req, struct tcphdr *th)
-{
- if (inet_rsk(req)->ecn_ok)
- th->ece = 1;
-}
-
-static inline void TCP_ECN_send(struct sock *sk, struct sk_buff *skb,
- int tcp_header_len)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (tp->ecn_flags & TCP_ECN_OK) {
- /* Not-retransmitted data segment: set ECT and inject CWR. */
- if (skb->len != tcp_header_len &&
- !before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) {
- INET_ECN_xmit(sk);
- if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
- tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
- tcp_hdr(skb)->cwr = 1;
- skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
- }
- } else {
- /* ACK or retransmitted segment: clear ECT|CE */
- INET_ECN_dontxmit(sk);
- }
- if (tp->ecn_flags & TCP_ECN_DEMAND_CWR)
- tcp_hdr(skb)->ece = 1;
- }
-}
-
-/* Input functions */
-
-static inline void TCP_ECN_accept_cwr(struct tcp_sock *tp, struct sk_buff *skb)
-{
- if (tcp_hdr(skb)->cwr)
- tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
-}
-
-static inline void TCP_ECN_withdraw_cwr(struct tcp_sock *tp)
-{
- tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
-}
-
-static inline void TCP_ECN_check_ce(struct tcp_sock *tp, struct sk_buff *skb)
-{
- if (tp->ecn_flags&TCP_ECN_OK) {
- if (INET_ECN_is_ce(TCP_SKB_CB(skb)->flags))
- tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
- /* Funny extension: if ECT is not set on a segment,
- * it is surely retransmit. It is not in ECN RFC,
- * but Linux follows this rule. */
- else if (INET_ECN_is_not_ect((TCP_SKB_CB(skb)->flags)))
- tcp_enter_quickack_mode((struct sock *)tp);
- }
-}
-
-static inline void TCP_ECN_rcv_synack(struct tcp_sock *tp, struct tcphdr *th)
-{
- if ((tp->ecn_flags&TCP_ECN_OK) && (!th->ece || th->cwr))
- tp->ecn_flags &= ~TCP_ECN_OK;
-}
-
-static inline void TCP_ECN_rcv_syn(struct tcp_sock *tp, struct tcphdr *th)
-{
- if ((tp->ecn_flags&TCP_ECN_OK) && (!th->ece || !th->cwr))
- tp->ecn_flags &= ~TCP_ECN_OK;
-}
-
-static inline int TCP_ECN_rcv_ecn_echo(struct tcp_sock *tp, struct tcphdr *th)
-{
- if (th->ece && !th->syn && (tp->ecn_flags&TCP_ECN_OK))
- return 1;
- return 0;
-}
-
-static inline void TCP_ECN_openreq_child(struct tcp_sock *tp,
- struct request_sock *req)
-{
- tp->ecn_flags = inet_rsk(req)->ecn_ok ? TCP_ECN_OK : 0;
-}
-
-static __inline__ void
-TCP_ECN_create_request(struct request_sock *req, struct tcphdr *th)
-{
- if (sysctl_tcp_ecn && th->ece && th->cwr)
- inet_rsk(req)->ecn_ok = 1;
-}
-
-#endif
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c17e077..452a3b0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -116,6 +116,7 @@ int sysctl_tcp_abc __read_mostly;
#define IsSackFrto() (sysctl_tcp_frto == 0x2)
#define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH)
+#define TCP_HP_BITS (~(TCP_RESERVED_BITS|TCP_FLAG_PSH))
/* Adapt the MSS value used to make delayed ack decision to the
* real world.
@@ -196,6 +197,55 @@ static inline int tcp_in_quickack_mode(const struct sock *sk)
return icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong;
}
+static inline void TCP_ECN_queue_cwr(struct tcp_sock *tp)
+{
+ if (tp->ecn_flags&TCP_ECN_OK)
+ tp->ecn_flags |= TCP_ECN_QUEUE_CWR;
+}
+
+static inline void TCP_ECN_accept_cwr(struct tcp_sock *tp, struct sk_buff *skb)
+{
+ if (tcp_hdr(skb)->cwr)
+ tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
+}
+
+static inline void TCP_ECN_withdraw_cwr(struct tcp_sock *tp)
+{
+ tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
+}
+
+static inline void TCP_ECN_check_ce(struct tcp_sock *tp, struct sk_buff *skb)
+{
+ if (tp->ecn_flags&TCP_ECN_OK) {
+ if (INET_ECN_is_ce(TCP_SKB_CB(skb)->flags))
+ tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
+ /* Funny extension: if ECT is not set on a segment,
+ * it is surely retransmit. It is not in ECN RFC,
+ * but Linux follows this rule. */
+ else if (INET_ECN_is_not_ect((TCP_SKB_CB(skb)->flags)))
+ tcp_enter_quickack_mode((struct sock *)tp);
+ }
+}
+
+static inline void TCP_ECN_rcv_synack(struct tcp_sock *tp, struct tcphdr *th)
+{
+ if ((tp->ecn_flags&TCP_ECN_OK) && (!th->ece || th->cwr))
+ tp->ecn_flags &= ~TCP_ECN_OK;
+}
+
+static inline void TCP_ECN_rcv_syn(struct tcp_sock *tp, struct tcphdr *th)
+{
+ if ((tp->ecn_flags&TCP_ECN_OK) && (!th->ece || !th->cwr))
+ tp->ecn_flags &= ~TCP_ECN_OK;
+}
+
+static inline int TCP_ECN_rcv_ecn_echo(struct tcp_sock *tp, struct tcphdr *th)
+{
+ if (th->ece && !th->syn && (tp->ecn_flags&TCP_ECN_OK))
+ return 1;
+ return 0;
+}
+
/* Buffer size and advertised window tuning.
*
* 1. Tuning sk->sk_sndbuf, when connection enters established state.
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 598ddec..0ad4f36 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -368,6 +368,12 @@ void tcp_twsk_destructor(struct sock *sk)
EXPORT_SYMBOL_GPL(tcp_twsk_destructor);
+static inline void TCP_ECN_openreq_child(struct tcp_sock *tp,
+ struct request_sock *req)
+{
+ tp->ecn_flags = inet_rsk(req)->ecn_ok ? TCP_ECN_OK : 0;
+}
+
/* This is not only more efficient than what we used to do, it eliminates
* a lot of code duplication between IPv4/IPv6 SYN recv processing. -DaveM
*
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f33e25b..23ee283 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -269,6 +269,56 @@ static u16 tcp_select_window(struct sock *sk)
return new_win;
}
+static inline void TCP_ECN_send_synack(struct tcp_sock *tp,
+ struct sk_buff *skb)
+{
+ TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_CWR;
+ if (!(tp->ecn_flags&TCP_ECN_OK))
+ TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_ECE;
+}
+
+static inline void TCP_ECN_send_syn(struct sock *sk, struct sk_buff *skb)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ tp->ecn_flags = 0;
+ if (sysctl_tcp_ecn) {
+ TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
+ tp->ecn_flags = TCP_ECN_OK;
+ }
+}
+
+static __inline__ void
+TCP_ECN_make_synack(struct request_sock *req, struct tcphdr *th)
+{
+ if (inet_rsk(req)->ecn_ok)
+ th->ece = 1;
+}
+
+static inline void TCP_ECN_send(struct sock *sk, struct sk_buff *skb,
+ int tcp_header_len)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (tp->ecn_flags & TCP_ECN_OK) {
+ /* Not-retransmitted data segment: set ECT and inject CWR. */
+ if (skb->len != tcp_header_len &&
+ !before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) {
+ INET_ECN_xmit(sk);
+ if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
+ tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
+ tcp_hdr(skb)->cwr = 1;
+ skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
+ }
+ } else {
+ /* ACK or retransmitted segment: clear ECT|CE */
+ INET_ECN_dontxmit(sk);
+ }
+ if (tp->ecn_flags & TCP_ECN_DEMAND_CWR)
+ tcp_hdr(skb)->ece = 1;
+ }
+}
+
static void tcp_build_and_update_options(__be32 *ptr, struct tcp_sock *tp,
__u32 tstamp, __u8 **md5_hash)
{
--
1.5.0.6
* [PATCH 6/9] [TCP]: Reorganize lost marking code
2007-05-26 8:35 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it Ilpo Järvinen
@ 2007-05-26 8:35 ` Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq Ilpo Järvinen
` (2 more replies)
2007-05-31 8:43 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it David Miller
1 sibling, 3 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:35 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
The indentation started to get scary, so I reorganized the code so
that some trivial ifs are in tcp_update_scoreboard and the main
loops remain in tcp_update_scoreboard_fack.
It's much easier to view the actual changes with git-diff -w
than from the messy-looking (default) diff.
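[Editor's note: a self-contained demo of the git-diff -w tip above,
run against a throwaway repo; -w (--ignore-all-space) hides
whitespace-only changes such as the re-indentation in this patch.]

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q .
printf 'do_x();\ndo_y();\n' > f.c
git add f.c
git -c user.email=a@b -c user.name=tester commit -qm base
# Re-indent only; no code change at all:
printf '\tdo_x();\n\tdo_y();\n' > f.c
git diff -w | wc -l    # prints 0: -w suppresses the whole hunk
git diff | wc -l       # non-zero: the plain diff shows every line
```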
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 156 +++++++++++++++++++++++++-------------------------
1 files changed, 78 insertions(+), 78 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 452a3b0..fbce87f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1866,7 +1866,6 @@ static void tcp_mark_head_lost_single(struct sock *sk)
before(tp->snd_una, tp->high_seq)) {
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
tp->lost_out += tcp_skb_pcount(skb);
- tcp_sync_left_out(tp);
}
}
@@ -1905,99 +1904,84 @@ static void tcp_mark_head_lost_single(struct sock *sk)
* Key difference here is that FACK uses both SACK blocks and holes while
* RFC3517 is using only SACK blocks when counting for reordering.
*/
-static void tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
- int fast_rexmit)
+struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
+ int fast_rexmit)
{
struct tcp_sock *tp = tcp_sk(sk);
/* Beware: timedout_continue might be pointing to sk_write_queue */
struct sk_buff *timedout_continue = NULL;
struct sk_buff *skb;
-
- if (!tp->sacked_out) {
- if (IsFack(tp) || fast_rexmit) {
- tcp_mark_head_lost_single(sk);
- skb = tcp_write_queue_head(sk);
- timedout_continue = tcp_write_queue_next(sk, skb);
- }
-
+ unsigned int holes_seen = 0;
+
+ skb = tcp_write_queue_find(sk, entry_seq);
+ /* If this ever becomes expensive, it can be delayed */
+ timedout_continue = tcp_write_queue_next(sk, skb);
+ if (entry_seq != tp->highest_sack) {
+ /* Not interested in "the last" SACKed one we got */
+ /* RFC: find_below could help here too */
+ skb = tcp_write_queue_prev(sk, skb);
+ if (IsFack(tp) && tcp_skb_timedout(sk, skb) &&
+ (tp->fackets_out < tp->packets_out)) {
+ timedout_continue = tcp_write_queue_find(sk, tp->highest_sack);
+ timedout_continue = tcp_write_queue_next(sk, timedout_continue);
+ } else
+ timedout_continue = NULL;
} else {
- unsigned int holes_seen = 0;
-
- skb = tcp_write_queue_find(sk, entry_seq);
- /* If this ever becomes expensive, it can be delayed */
- timedout_continue = tcp_write_queue_next(sk, skb);
- if (entry_seq != tp->highest_sack) {
- /* Not interested in "the last" SACKed one we got */
- /* RFC: find_below could help here too */
- skb = tcp_write_queue_prev(sk, skb);
- if (IsFack(tp) && tcp_skb_timedout(sk, skb) &&
- (tp->fackets_out < tp->packets_out)) {
- timedout_continue = tcp_write_queue_find(sk, tp->highest_sack);
- timedout_continue = tcp_write_queue_next(sk, timedout_continue);
- } else
- timedout_continue = NULL;
- } else {
- unsigned int reord_count = IsFack(tp) ? 0 : 1;
-
- /* Phase I: Search until TCP can mark */
- tcp_for_write_queue_backwards_from(skb, sk) {
- if ((tp->fackets_out <= tp->sacked_out +
- tp->lost_out +
- holes_seen) ||
- (TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
- goto backwards_walk_done;
-
- if (IsFack(tp) && tcp_skb_timedout(sk, skb))
- break;
- else
- timedout_continue = NULL;
-
- if (IsFack(tp) ||
- (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
- reord_count += tcp_skb_pcount(skb);
- if (reord_count > tp->reordering) {
- if (after(TCP_SKB_CB(skb)->seq, tp->high_seq)) {
- /* RFC: should we have find_below? */
- skb = tcp_write_queue_find(sk, tp->high_seq);
- /* Timedout top is again uncertain? */
- if (tcp_skb_timedout(sk, skb))
- timedout_continue = skb;
- skb = tcp_write_queue_prev(sk, skb);
- }
- /* TODO?: do skb fragmentation if necessary */
- break;
- }
-
- if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
- holes_seen += tcp_skb_pcount(skb);
- }
- }
+ unsigned int reord_count = IsFack(tp) ? 0 : 1;
- /* Phase II: Marker */
+ /* Phase I: Search until TCP can mark */
tcp_for_write_queue_backwards_from(skb, sk) {
if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
holes_seen) ||
(TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
goto backwards_walk_done;
- if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
- TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
- tp->lost_out += tcp_skb_pcount(skb);
- tcp_verify_retransmit_hint(tp, skb);
+ if (IsFack(tp) && tcp_skb_timedout(sk, skb))
+ break;
+ else
+ timedout_continue = NULL;
+
+ if (IsFack(tp) ||
+ (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
+ reord_count += tcp_skb_pcount(skb);
+ if (reord_count > tp->reordering) {
+ if (after(TCP_SKB_CB(skb)->seq, tp->high_seq)) {
+ /* RFC: should we have find_below? */
+ skb = tcp_write_queue_find(sk, tp->high_seq);
+ /* Timedout top is again uncertain? */
+ if (tcp_skb_timedout(sk, skb))
+ timedout_continue = skb;
+ skb = tcp_write_queue_prev(sk, skb);
+ }
+ /* TODO?: do skb fragmentation if necessary */
+ break;
}
+
+ if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
+ holes_seen += tcp_skb_pcount(skb);
}
+ }
+
+ /* Phase II: Marker */
+ tcp_for_write_queue_backwards_from(skb, sk) {
+ if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
+ holes_seen) ||
+ (TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
+ goto backwards_walk_done;
- /* Phase III: Nothing is still marked?, mark head then */
- if ((IsFack(tp) || fast_rexmit) && !tp->lost_out)
- tcp_mark_head_lost_single(sk);
+ if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
+ TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
+ tp->lost_out += tcp_skb_pcount(skb);
+ tcp_verify_retransmit_hint(tp, skb);
+ }
}
-backwards_walk_done:
- /* Continue with timedout work */
- if (IsFack(tp) && (timedout_continue != NULL))
- tcp_timedout_mark_forward(sk, timedout_continue);
+ /* Phase III: Nothing is still marked?, mark head then */
+ if ((IsFack(tp) || fast_rexmit) && !tp->lost_out)
+ tcp_mark_head_lost_single(sk);
- tcp_sync_left_out(tp);
+backwards_walk_done:
+ return timedout_continue;
}
/* Account newly detected lost packet(s) */
@@ -2005,11 +1989,27 @@ static void tcp_update_scoreboard(struct sock *sk, u32 sack_entry_seq,
int fast_rexmit)
{
struct tcp_sock *tp = tcp_sk(sk);
+ struct sk_buff *skb = NULL;
+
+ if (!IsReno(tp)) {
+ if (!tp->sacked_out) {
+ if (IsFack(tp) || fast_rexmit) {
+ tcp_mark_head_lost_single(sk);
+ skb = tcp_write_queue_head(sk);
+ skb = tcp_write_queue_next(sk, skb);
+ }
+ } else
+ skb = tcp_update_scoreboard_fack(sk, sack_entry_seq,
+ fast_rexmit);
- if (!IsReno(tp))
- tcp_update_scoreboard_fack(sk, sack_entry_seq, fast_rexmit);
- else
+ /* Continue with timedout work */
+ if (IsFack(tp) && (skb != NULL))
+ tcp_timedout_mark_forward(sk, skb);
+
+ } else
tcp_mark_head_lost_single(sk);
+
+ tcp_sync_left_out(tp);
}
/* CWND moderation, preventing bursts due to too big ACKs
--
1.5.0.6
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq
2007-05-26 8:35 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
@ 2007-05-26 8:36 ` Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue Ilpo Järvinen
2007-05-31 8:44 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq David Miller
2007-05-27 7:38 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
2007-05-31 8:44 ` David Miller
2 siblings, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:36 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
In addition, find_below was implemented using a minus-one
sequence lookup. Some reorganization was necessary to keep the
code efficient.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 31 +++++++++++++++++++++----------
1 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fbce87f..8c1e38a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1913,22 +1913,33 @@ struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
struct sk_buff *skb;
unsigned int holes_seen = 0;
- skb = tcp_write_queue_find(sk, entry_seq);
- /* If this ever becomes expensive, it can be delayed */
- timedout_continue = tcp_write_queue_next(sk, skb);
if (entry_seq != tp->highest_sack) {
- /* Not interested in "the last" SACKed one we got */
- /* RFC: find_below could help here too */
- skb = tcp_write_queue_prev(sk, skb);
+ /* Look for skb below min_seq(entry_seq, tp->high_seq) */
+ if (!after(entry_seq, tp->high_seq)) {
+ skb = tcp_write_queue_find(sk, entry_seq - 1);
+ } else {
+ BUG_ON(tp->snd_una == tp->high_seq);
+ skb = tcp_write_queue_find(sk, tp->high_seq - 1);
+ }
+
+ timedout_continue = NULL;
if (IsFack(tp) && tcp_skb_timedout(sk, skb) &&
(tp->fackets_out < tp->packets_out)) {
- timedout_continue = tcp_write_queue_find(sk, tp->highest_sack);
- timedout_continue = tcp_write_queue_next(sk, timedout_continue);
- } else
- timedout_continue = NULL;
+ timedout_continue = tcp_write_queue_next(sk, skb);
+ if (!after(entry_seq, tp->high_seq)) {
+ /* Use latest SACK info in skipping past skbs */
+ timedout_continue = tcp_write_queue_find(sk, tp->highest_sack);
+ timedout_continue = tcp_write_queue_next(sk, timedout_continue);
+ }
+ }
+
} else {
unsigned int reord_count = IsFack(tp) ? 0 : 1;
+ skb = tcp_write_queue_find(sk, entry_seq);
+ /* If this ever becomes expensive, it can be delayed */
+ timedout_continue = tcp_write_queue_next(sk, skb);
+
/* Phase I: Search until TCP can mark */
tcp_for_write_queue_backwards_from(skb, sk) {
if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
--
1.5.0.6
* [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue
2007-05-26 8:36 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq Ilpo Järvinen
@ 2007-05-26 8:36 ` Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 9/9] [RFC] [TCP]: Kill tp->fackets_out (tcp_sock diet program) Ilpo Järvinen
2007-07-03 5:00 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue David Miller
2007-05-31 8:44 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq David Miller
1 sibling, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:36 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
Previously TCP had a transitional state during which reno
counted segments that were already below the current window into
sacked_out, which is now prevented. Also re-try unconditional
S+L catching. (I wonder if we could get the BUG_ON site
pinpointed more exactly in the oops; inlining now makes it lose
its original context, as it always seems to be in tcp_ack.
Would a #define help?)
Beware, this change is not a trivial one and might have some
unexpected side-effects under obscure conditions, since state
tracking now happens much later and the reno sack counting
depended heavily on the current state.
This approach conservatively calls just remove_sack and leaves
the reset_sack() calls alone. The best solution to the whole
problem would be to first calculate the new sacked_out fully
(this patch does not move reno_sack_reset calls from their
original sites and thus does not implement that). However, that
would require a very invasive change to fastretrans_alert
(perhaps even slicing it into two halves). Alternatively, all
callers of tcp_packets_in_flight (i.e., users that depend on
sacked_out) could be postponed until the new sacked_out has been
calculated, but that is not any simpler.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/net/tcp.h | 3 +--
net/ipv4/tcp_input.c | 19 +++++++++++--------
2 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6f88d13..5255d51 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -747,8 +747,7 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
static inline void tcp_sync_left_out(struct tcp_sock *tp)
{
- BUG_ON(tp->rx_opt.sack_ok &&
- (tp->sacked_out + tp->lost_out > tp->packets_out));
+ BUG_ON(tp->sacked_out + tp->lost_out > tp->packets_out);
}
extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8c1e38a..9c12c90 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2295,7 +2295,7 @@ static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb)
*/
static void
tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
- int prior_packets, int flag, u32 mark_lost_entry_seq)
+ int pkts_acked, int flag, u32 mark_lost_entry_seq)
{
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
@@ -2380,12 +2380,8 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
if (prior_snd_una == tp->snd_una) {
if (IsReno(tp) && is_dupack)
tcp_add_reno_sack(sk);
- } else {
- int acked = prior_packets - tp->packets_out;
- if (IsReno(tp))
- tcp_remove_reno_sacks(sk, acked);
- is_dupack = tcp_try_undo_partial(sk, acked);
- }
+ } else
+ is_dupack = tcp_try_undo_partial(sk, pkts_acked);
break;
case TCP_CA_Loss:
if (flag&FLAG_DATA_ACKED)
@@ -2602,6 +2598,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
struct sk_buff *skb;
__u32 now = tcp_time_stamp;
int acked = 0;
+ int prior_packets = tp->packets_out;
__s32 seq_rtt = -1;
u32 pkts_acked = 0;
ktime_t last_ackt = ktime_set(0,0);
@@ -2682,6 +2679,11 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_ack_packets_out(sk);
+ if (IsReno(tp)) {
+ int acked = prior_packets - tp->packets_out;
+ tcp_remove_reno_sacks(sk, acked);
+ }
+
if (ca_ops->pkts_acked)
ca_ops->pkts_acked(sk, pkts_acked, last_ackt);
}
@@ -3017,7 +3019,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
if ((flag & FLAG_DATA_ACKED) && !frto_cwnd &&
tcp_may_raise_cwnd(sk, flag))
tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 0);
- tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag,
+ tcp_fastretrans_alert(sk, prior_snd_una,
+ prior_packets - tp->packets_out, flag,
mark_lost_entry_seq);
} else {
if ((flag & FLAG_DATA_ACKED) && !frto_cwnd)
--
1.5.0.6
* [PATCH 9/9] [RFC] [TCP]: Kill tp->fackets_out (tcp_sock diet program)
2007-05-26 8:36 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue Ilpo Järvinen
@ 2007-05-26 8:36 ` Ilpo Järvinen
2007-07-03 5:00 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue David Miller
1 sibling, 0 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-26 8:36 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Ilpo Järvinen
From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
The replacement fastpath uses arithmetic to find out fackets_out
(or a necessary subset of it) when the total in-flight size is
less than one MSS off from MSS*packets_out. The slowpath walks
through the write queue.
Saves some space in tcp_sock. Both fackets_out and the SACK
processing fack count hint can be removed right away. However,
this also enables further improvements to SACK processing, since
the strict ordering constraints due to fack counting are removed;
e.g., highest_sack might be a useful replacement for the skb
hint (we should measure this on some real server, e.g., by adding
MIB counters or so, to see how often highest_sack and the fastpath
skb hint really point to different things; I suspect that is a
really marginal proportion, if it ever occurs).
Negative side:
- Sending many data segments with SACK blocks (bi-dir flow and
losses at the same round-trip for both directions) can cause TCP
to need slow path.
- No-Nagler will take a great hit but that's what they're
asking for...
Unsolved issues:
- MSS shrinkage must be handled without failure in the
arithmetic. Either run tcp_fragment for every too-large skb
(which is not going to be a safe alternative since allocs can
fail) or somehow force the slow path (so far I haven't found a
solution that would consume less than a u32 seqno in tcp_sock)!
- ...A number of potential off-by-one (or off-by-MSS in this
case) places haven't yet been triple checked...
Enhancement ideas:
- It might be possible to extend the fastpath by checking whether
snd_sml is below / above highest_sack (could allow handling even
the up-to-2*MSS-off case using the fast path)...
All in all, a very scary patch. Please do not apply. This is not
yet completely thought through.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/linux/tcp.h | 2 -
include/net/tcp.h | 1 +
net/ipv4/tcp.c | 2 +-
net/ipv4/tcp_input.c | 216 +++++++++++++++++++++++++++++++++-------------
net/ipv4/tcp_minisocks.c | 1 -
net/ipv4/tcp_output.c | 22 ++----
6 files changed, 162 insertions(+), 82 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 1ce9fd4..d4ecb1f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -341,14 +341,12 @@ struct tcp_sock {
struct sk_buff *forward_skb_hint;
struct sk_buff *fastpath_skb_hint;
- int fastpath_cnt_hint;
int retransmit_cnt_hint;
u16 advmss; /* Advertised MSS */
u16 prior_ssthresh; /* ssthresh saved at recovery start */
u32 lost_out; /* Lost packets */
u32 sacked_out; /* SACK'd packets */
- u32 fackets_out; /* FACK'd packets */
u32 high_seq; /* snd_nxt at onset of congestion */
u32 retrans_stamp; /* Timestamp of the last retransmit,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5255d51..f96a466 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -359,6 +359,7 @@ extern int tcp_use_frto(struct sock *sk);
extern void tcp_enter_frto(struct sock *sk);
extern void tcp_enter_loss(struct sock *sk, int how);
extern void tcp_clear_retrans(struct tcp_sock *tp);
+extern int tcp_calc_fackets_out(struct sock *sk);
extern void tcp_update_metrics(struct sock *sk);
extern void tcp_close(struct sock *sk,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..3f9a021 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2020,7 +2020,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_sacked = tp->sacked_out;
info->tcpi_lost = tp->lost_out;
info->tcpi_retrans = tp->retrans_out;
- info->tcpi_fackets = tp->fackets_out;
+ info->tcpi_fackets = tcp_calc_fackets_out(sk);
info->tcpi_last_data_sent = jiffies_to_msecs(now - tp->lsndtime);
info->tcpi_last_data_recv = jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c12c90..02cb546 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -913,6 +913,82 @@ reset:
}
}
+static u32 tcp_expected_fullsized(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ struct inet_connection_sock *icsk = inet_csk(sk);
+ u32 expected_size;
+
+ expected_size = tp->packets_out * tp->mss_cache;
+ if (icsk->icsk_mtup.probe_size)
+ expected_size += icsk->icsk_mtup.probe_size - tp->mss_cache;
+
+ return expected_size;
+}
+
+static u32 tcp_missing_from_fullsized(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ return tcp_expected_fullsized(sk) - (tp->snd_nxt - tp->snd_una);
+}
+
+/* Returns 1 if less than MSS bytes are missing (off from fullsized
+ * segments). Under such conditions, simple arithmetic can be used to
+ * calculate segment boundaries cheaply (without write_queue walking).
+ *
+ * Idea: range_almost_fullsized could be done to allow even 2*MSS-off
+ * cases, though not with 100% coverage, as it's possible to exclude one
+ * additional non-MSS sized skb by checking if snd_sml is outside the
+ * range.
+ */
+static int tcp_win_almost_fullsized(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ return tcp_missing_from_fullsized(sk) < tp->mss_cache;
+}
+
+/* Figures out the pcount between skb and end_seq, skb == NULL means head.
+ * in_pkts controls the return format (pcount/bytes). To limit walk lengths,
+ * also maxiter is provided (set to packets_out if nothing else is valid).
+ *
+ * Fastpath: use arithmetic
+ * Slowpath: walk skbs
+ */
+static int tcp_size_pkts(struct sock *sk, struct sk_buff *skb,
+ u32 end_seq, int maxiter, int in_pkts)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ u32 size;
+
+ if (!tp->packets_out)
+ return 0;
+ if (skb == NULL)
+ skb = tcp_write_queue_head(sk);
+ if (TCP_SKB_CB(skb)->seq == end_seq)
+ return 0;
+
+ if (tcp_win_almost_fullsized(sk)) {
+ size = end_seq + tp->mss_cache - TCP_SKB_CB(skb)->seq;
+ if (in_pkts)
+ size = DIV_ROUND_UP(size, tp->mss_cache);
+
+ } else {
+ size = 0;
+ tcp_for_write_queue_from(skb, sk) {
+ size += tcp_skb_pcount(skb);
+ if (!before(TCP_SKB_CB(skb)->seq, end_seq) ||
+ (size >= maxiter))
+ break;
+ }
+ if (!in_pkts)
+ size *= tp->mss_cache;
+ }
+
+ return size;
+}
+
static void tcp_update_reordering(struct sock *sk, const int metric,
const int ts)
{
@@ -930,10 +1006,9 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
else
NET_INC_STATS_BH(LINUX_MIB_TCPSACKREORDER);
#if FASTRETRANS_DEBUG > 1
- printk(KERN_DEBUG "Disorder%d %d %u f%u s%u rr%d\n",
+ printk(KERN_DEBUG "Disorder%d %d %u s%u rr%d\n",
tp->rx_opt.sack_ok, inet_csk(sk)->icsk_ca_state,
tp->reordering,
- tp->fackets_out,
tp->sacked_out,
tp->undo_marker ? tp->undo_retrans : 0);
#endif
@@ -993,12 +1068,20 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
struct tcp_sacktag_state {
unsigned int flag;
int dup_sack;
- int reord;
- int prior_fackets;
+ struct sk_buff *reord;
u32 lost_retrans;
+ u32 prior_hisack;
int first_sack_index;
};
+static void tcp_skb_reordered(struct tcp_sacktag_state *state,
+ struct sk_buff *skb)
+{
+ if ((state->reord == NULL) ||
+ before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(state->reord)->seq))
+ state->reord = skb;
+}
+
static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
struct tcp_sack_block_wire *sp, int num_sacks,
u32 prior_snd_una)
@@ -1034,7 +1117,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
struct tcp_sacktag_state *state, int in_sack,
- int fack_count, u32 end_seq)
+ u32 end_seq)
{
u8 sacked = TCP_SKB_CB(skb)->sacked;
@@ -1049,12 +1132,12 @@ static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
if (sacked & TCPCB_RETRANS) {
if ((state->dup_sack && in_sack) &&
(sacked & TCPCB_SACKED_ACKED))
- state->reord = min(fack_count, state->reord);
+ tcp_skb_reordered(state, skb);
} else {
/* If it was in a hole, we detected reordering. */
- if (fack_count < state->prior_fackets &&
+ if (before(TCP_SKB_CB(skb)->seq, state->prior_hisack) &&
!(sacked & TCPCB_SACKED_ACKED))
- state->reord = min(fack_count, state->reord);
+ tcp_skb_reordered(state, skb);
}
/* Nothing to do; acked frame is about to be dropped. */
@@ -1089,8 +1172,8 @@ static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
* which was in hole. It is reordering.
*/
if (!(sacked & TCPCB_RETRANS) &&
- fack_count < state->prior_fackets)
- state->reord = min(fack_count, state->reord);
+ before(TCP_SKB_CB(skb)->seq, state->prior_hisack))
+ tcp_skb_reordered(state, skb);
if (sacked & TCPCB_LOST) {
TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
@@ -1117,16 +1200,11 @@ static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
state->flag |= FLAG_DATA_SACKED;
tp->sacked_out += tcp_skb_pcount(skb);
- if (fack_count > tp->fackets_out)
- tp->fackets_out = fack_count;
-
if (after(TCP_SKB_CB(skb)->seq,
tp->highest_sack))
tp->highest_sack = TCP_SKB_CB(skb)->seq;
- } else {
- if (state->dup_sack && (sacked&TCPCB_RETRANS))
- state->reord = min(fack_count, state->reord);
- }
+ } else if (state->dup_sack && (sacked&TCPCB_RETRANS))
+ tcp_skb_reordered(state, skb);
/* D-SACK. We can detect redundant retransmission
* in S|R and plain R frames and clear it.
@@ -1153,14 +1231,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
struct tcp_sacktag_state state;
- int cached_fack_count;
int i;
int force_one_sack;
- if (!tp->sacked_out) {
- tp->fackets_out = 0;
+ if (!tp->sacked_out)
tp->highest_sack = tp->snd_una;
- } else
+ else
*mark_lost_entry_seq = tp->highest_sack;
state.dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
@@ -1227,25 +1303,20 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
/* Use SACK fastpath hint if valid */
cached_skb = tp->fastpath_skb_hint;
- cached_fack_count = tp->fastpath_cnt_hint;
- if (!cached_skb) {
+ if (!cached_skb)
cached_skb = tcp_write_queue_head(sk);
- cached_fack_count = 0;
- }
state.flag = 0;
- state.reord = tp->packets_out;
- state.prior_fackets = tp->fackets_out;
+ state.reord = NULL;
+ state.prior_hisack = tp->highest_sack;
state.lost_retrans = 0;
for (i=0; i<num_sacks; i++, sp++) {
struct sk_buff *skb;
__u32 start_seq = ntohl(sp->start_seq);
__u32 end_seq = ntohl(sp->end_seq);
- int fack_count;
skb = cached_skb;
- fack_count = cached_fack_count;
/* Event "B" in the comment above. */
if (after(end_seq, tp->high_seq))
@@ -1258,11 +1329,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
break;
cached_skb = skb;
- cached_fack_count = fack_count;
- if (i == state.first_sack_index) {
+ if (i == state.first_sack_index)
tp->fastpath_skb_hint = skb;
- tp->fastpath_cnt_hint = fack_count;
- }
/* The retransmission queue is always in order, so
* we can short-circuit the walk early.
@@ -1293,10 +1361,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
pcount = tcp_skb_pcount(skb);
}
- fack_count += pcount;
-
- tcp_sacktag_one(skb, tp, &state, in_sack,
- fack_count, end_seq);
+ tcp_sacktag_one(skb, tp, &state, in_sack, end_seq);
}
/* Prepare non-reno LOST marking fast path entry point, the
@@ -1351,9 +1416,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
}
}
- if ((state.reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
- (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
- tcp_update_reordering(sk, ((tp->fackets_out + 1) - state.reord), 0);
+ if ((state.reord != NULL) && icsk->icsk_ca_state != TCP_CA_Loss &&
+ (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark))) {
+ int reord_cnt;
+ BUG_ON(after(TCP_SKB_CB(state.reord)->seq, tp->highest_sack));
+
+ reord_cnt = tcp_size_pkts(sk, state.reord,
+ tp->highest_sack + tp->mss_cache,
+ TCP_MAX_REORDERING, 1);
+ tcp_update_reordering(sk, reord_cnt, 0);
+ }
#if FASTRETRANS_DEBUG > 0
BUG_TRAP((int)tp->sacked_out >= 0);
@@ -1575,7 +1647,6 @@ void tcp_clear_retrans(struct tcp_sock *tp)
{
tp->retrans_out = 0;
- tp->fackets_out = 0;
tp->sacked_out = 0;
tp->lost_out = 0;
@@ -1626,7 +1697,6 @@ void tcp_enter_loss(struct sock *sk, int how)
tp->lost_out += tcp_skb_pcount(skb);
} else {
tp->sacked_out += tcp_skb_pcount(skb);
- tp->fackets_out = cnt;
}
}
tcp_sync_left_out(tp);
@@ -1667,10 +1737,26 @@ static int tcp_check_sack_reneging(struct sock *sk)
return 0;
}
-static inline int tcp_fackets_out(struct tcp_sock *tp)
+
+
+int tcp_calc_fackets_out(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (IsReno(tp))
+ return tp->sacked_out + 1;
+
+ return tcp_size_pkts(sk, NULL, tp->highest_sack, tp->packets_out, 1);
+}
+
+static inline int tcp_enough_reordering(struct sock *sk)
{
- return (IsReno(tp) || Is3517Sack(tp)) ? tp->sacked_out + 1 :
- tp->fackets_out;
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (!IsFack(tp))
+ return (tp->sacked_out + 1) * tp->mss_cache;
+
+ return tcp_size_pkts(sk, NULL, tp->highest_sack, tp->reordering, 0);
}
static inline int tcp_skb_timedout(struct sock *sk, struct sk_buff *skb)
@@ -1793,7 +1879,8 @@ static int tcp_time_to_recover(struct sock *sk)
return 1;
/* Not-A-Trick#2 : Classic rule... */
- if (tcp_fackets_out(tp) > tp->reordering)
+ /* CHECKME: off-by-one? */
+ if (tcp_enough_reordering(sk) > tp->reordering * tp->mss_cache)
return 1;
/* Trick#3 : when we use RFC2988 timer restart, fast
@@ -1912,6 +1999,16 @@ struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
struct sk_buff *timedout_continue = NULL;
struct sk_buff *skb;
unsigned int holes_seen = 0;
+ u32 fackets_out;
+
+ /* fackets_out is calculated only when it's cheap; in other cases
+ * we just overestimate it, which keeps our arithmetic valid
+ */
+ if (tcp_win_almost_fullsized(sk))
+ fackets_out = tcp_size_pkts(sk, NULL, tp->highest_sack,
+ tp->packets_out, 0);
+ else
+ fackets_out = tp->packets_out * tp->mss_cache;
if (entry_seq != tp->highest_sack) {
/* Look for skb below min_seq(entry_seq, tp->high_seq) */
@@ -1924,7 +2021,9 @@ struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
timedout_continue = NULL;
if (IsFack(tp) && tcp_skb_timedout(sk, skb) &&
- (tp->fackets_out < tp->packets_out)) {
+ (!(before(tp->snd_sml, tp->highest_sack) ||
+ tcp_win_almost_fullsized(sk)) ||
+ tp->highest_sack + tp->mss_cache < tp->snd_nxt)) {
timedout_continue = tcp_write_queue_next(sk, skb);
if (!after(entry_seq, tp->high_seq)) {
/* Use latest SACK info in skipping past skbs */
@@ -1942,8 +2041,8 @@ struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
/* Phase I: Search until TCP can mark */
tcp_for_write_queue_backwards_from(skb, sk) {
- if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
- holes_seen) ||
+ if ((fackets_out <= (tp->sacked_out + tp->lost_out +
+ holes_seen) * tp->mss_cache) ||
(TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
goto backwards_walk_done;
@@ -1975,8 +2074,8 @@ struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
/* Phase II: Marker */
tcp_for_write_queue_backwards_from(skb, sk) {
- if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
- holes_seen) ||
+ if ((fackets_out <= (tp->sacked_out + tp->lost_out +
+ holes_seen) * tp->mss_cache) ||
(TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
goto backwards_walk_done;
@@ -2168,7 +2267,8 @@ static int tcp_try_undo_partial(struct sock *sk, int acked)
{
struct tcp_sock *tp = tcp_sk(sk);
/* Partial ACK arrived. Force Hoe's retransmit. */
- int failed = IsReno(tp) || (tcp_fackets_out(tp) > tp->reordering);
+ int failed = IsReno(tp) ||
+ (tcp_enough_reordering(sk) > tp->reordering * tp->mss_cache);
if (tcp_may_undo(tp)) {
/* Plain luck! Hole if filled with delayed
@@ -2177,7 +2277,7 @@ static int tcp_try_undo_partial(struct sock *sk, int acked)
if (tp->retrans_out == 0)
tp->retrans_stamp = 0;
- tcp_update_reordering(sk, tcp_fackets_out(tp) + acked, 1);
+ tcp_update_reordering(sk, tcp_calc_fackets_out(sk) + acked, 1);
DBGUNDO(sk, "Hoe");
tcp_undo_cwr(sk, 0);
@@ -2306,9 +2406,6 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
* 1. Reno does not count dupacks (sacked_out) automatically. */
if (!tp->packets_out)
tp->sacked_out = 0;
- /* 2. SACK counts snd_fack in packets inaccurately. */
- if (tp->sacked_out == 0)
- tp->fackets_out = 0;
/* Now state machine starts.
* A. ECE, hence prohibit cwnd undoing, the reduction is required. */
@@ -2323,7 +2420,7 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
if (IsFack(tp) && (flag&FLAG_DATA_LOST) &&
before(tp->snd_una, tp->high_seq) &&
icsk->icsk_ca_state != TCP_CA_Open &&
- tp->fackets_out > tp->reordering) {
+ tcp_enough_reordering(sk) > tp->reordering * tp->mss_cache) {
tcp_update_scoreboard_fack(sk, mark_lost_entry_seq, 0);
NET_INC_STATS_BH(LINUX_MIB_TCPLOSS);
}
@@ -2577,10 +2674,6 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
} else if (*seq_rtt < 0)
*seq_rtt = now - scb->when;
- if (tp->fackets_out) {
- __u32 dval = min(tp->fackets_out, packets_acked);
- tp->fackets_out -= dval;
- }
tp->packets_out -= packets_acked;
BUG_ON(tcp_skb_pcount(skb) == 0);
@@ -2665,7 +2758,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
seq_rtt = now - scb->when;
last_ackt = skb->tstamp;
}
- tcp_dec_pcount_approx(&tp->fackets_out, skb);
tcp_packets_out_dec(tp, skb);
tcp_unlink_write_queue(skb, sk);
sk_stream_free_skb(sk, skb);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 0ad4f36..44f548d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -407,7 +407,6 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
newtp->packets_out = 0;
newtp->retrans_out = 0;
newtp->sacked_out = 0;
- newtp->fackets_out = 0;
newtp->snd_ssthresh = 0x7fffffff;
/* So many TCP implementations out there (incorrectly) count the
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 23ee283..1753683 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -735,18 +735,12 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
tp->lost_out -= diff;
- if (diff > 0) {
- /* Adjust Reno SACK estimate. */
- if (!tp->rx_opt.sack_ok) {
- tp->sacked_out -= diff;
- if ((int)tp->sacked_out < 0)
- tp->sacked_out = 0;
- tcp_sync_left_out(tp);
- }
-
- tp->fackets_out -= diff;
- if ((int)tp->fackets_out < 0)
- tp->fackets_out = 0;
+ /* Adjust Reno SACK estimate. */
+ if (diff > 0 && !tp->rx_opt.sack_ok) {
+ tp->sacked_out -= diff;
+ if ((int)tp->sacked_out < 0)
+ tp->sacked_out = 0;
+ tcp_sync_left_out(tp);
}
}
@@ -1729,10 +1723,6 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
if (!tp->rx_opt.sack_ok && tp->sacked_out)
tcp_dec_pcount_approx(&tp->sacked_out, next_skb);
- /* Not quite right: it can be > snd.fack, but
- * it is better to underestimate fackets.
- */
- tcp_dec_pcount_approx(&tp->fackets_out, next_skb);
tcp_packets_out_dec(tp, next_skb);
sk_stream_free_skb(sk, next_skb);
}
--
1.5.0.6
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-26 8:35 [PATCH 0/9]: tcp-2.6 patchset Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier Ilpo Järvinen
@ 2007-05-26 23:44 ` David Miller
2007-05-27 7:58 ` Ilpo Järvinen
1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2007-05-26 23:44 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:53 +0300
> Dave, you could consider applying other than the last one if they
> seem ok to you too (you'll need to rebase your tcp-2.6 in that case
> first to apply cleanly those that touch tcp_sync_left_out :-)).
Absolutely, I'll do the quick rebase of tcp-2.6 and review +
applying of your patches (except the last one) over the weekend.
Thanks!
* Re: [PATCH 6/9] [TCP]: Reorganize lost marking code
2007-05-26 8:35 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq Ilpo Järvinen
@ 2007-05-27 7:38 ` Ilpo Järvinen
2007-05-31 8:44 ` David Miller
2 siblings, 0 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-27 7:38 UTC (permalink / raw)
To: David Miller; +Cc: Netdev
On Sat, 26 May 2007, Ilpo Järvinen wrote:
> -static void tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
> - int fast_rexmit)
> +struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
> + int fast_rexmit)
Noticed that the static got dropped here unintentionally; a 2nd try is
included below.
[PATCH 6/9] [TCP]: Reorganize lost marking code
The indentation started to get scary, so I reorganized code so
that some trivial ifs are in tcp_update_scoreboard and the main
loops remain in tcp_update_scoreboard_fack.
It's much easier to view the actual changes with git-diff -w
than in the messy-looking (default) diff.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 157 +++++++++++++++++++++++++-------------------------
1 files changed, 79 insertions(+), 78 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 452a3b0..c5eda37 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1866,7 +1866,6 @@ static void tcp_mark_head_lost_single(struct sock *sk)
before(tp->snd_una, tp->high_seq)) {
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
tp->lost_out += tcp_skb_pcount(skb);
- tcp_sync_left_out(tp);
}
}
@@ -1905,99 +1904,85 @@ static void tcp_mark_head_lost_single(struct sock *sk)
* Key difference here is that FACK uses both SACK blocks and holes while
* RFC3517 is using only SACK blocks when counting for reordering.
*/
-static void tcp_update_scoreboard_fack(struct sock *sk, u32 entry_seq,
- int fast_rexmit)
+static struct sk_buff *tcp_update_scoreboard_fack(struct sock *sk,
+ u32 entry_seq,
+ int fast_rexmit)
{
struct tcp_sock *tp = tcp_sk(sk);
/* Beware: timedout_continue might be pointing to sk_write_queue */
struct sk_buff *timedout_continue = NULL;
struct sk_buff *skb;
-
- if (!tp->sacked_out) {
- if (IsFack(tp) || fast_rexmit) {
- tcp_mark_head_lost_single(sk);
- skb = tcp_write_queue_head(sk);
- timedout_continue = tcp_write_queue_next(sk, skb);
- }
-
+ unsigned int holes_seen = 0;
+
+ skb = tcp_write_queue_find(sk, entry_seq);
+ /* If this ever becomes expensive, it can be delayed */
+ timedout_continue = tcp_write_queue_next(sk, skb);
+ if (entry_seq != tp->highest_sack) {
+ /* Not interested in "the last" SACKed one we got */
+ /* RFC: find_below could help here too */
+ skb = tcp_write_queue_prev(sk, skb);
+ if (IsFack(tp) && tcp_skb_timedout(sk, skb) &&
+ (tp->fackets_out < tp->packets_out)) {
+ timedout_continue = tcp_write_queue_find(sk, tp->highest_sack);
+ timedout_continue = tcp_write_queue_next(sk, timedout_continue);
+ } else
+ timedout_continue = NULL;
} else {
- unsigned int holes_seen = 0;
-
- skb = tcp_write_queue_find(sk, entry_seq);
- /* If this ever becomes expensive, it can be delayed */
- timedout_continue = tcp_write_queue_next(sk, skb);
- if (entry_seq != tp->highest_sack) {
- /* Not interested in "the last" SACKed one we got */
- /* RFC: find_below could help here too */
- skb = tcp_write_queue_prev(sk, skb);
- if (IsFack(tp) && tcp_skb_timedout(sk, skb) &&
- (tp->fackets_out < tp->packets_out)) {
- timedout_continue = tcp_write_queue_find(sk, tp->highest_sack);
- timedout_continue = tcp_write_queue_next(sk, timedout_continue);
- } else
- timedout_continue = NULL;
- } else {
- unsigned int reord_count = IsFack(tp) ? 0 : 1;
-
- /* Phase I: Search until TCP can mark */
- tcp_for_write_queue_backwards_from(skb, sk) {
- if ((tp->fackets_out <= tp->sacked_out +
- tp->lost_out +
- holes_seen) ||
- (TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
- goto backwards_walk_done;
-
- if (IsFack(tp) && tcp_skb_timedout(sk, skb))
- break;
- else
- timedout_continue = NULL;
-
- if (IsFack(tp) ||
- (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
- reord_count += tcp_skb_pcount(skb);
- if (reord_count > tp->reordering) {
- if (after(TCP_SKB_CB(skb)->seq, tp->high_seq)) {
- /* RFC: should we have find_below? */
- skb = tcp_write_queue_find(sk, tp->high_seq);
- /* Timedout top is again uncertain? */
- if (tcp_skb_timedout(sk, skb))
- timedout_continue = skb;
- skb = tcp_write_queue_prev(sk, skb);
- }
- /* TODO?: do skb fragmentation if necessary */
- break;
- }
-
- if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
- holes_seen += tcp_skb_pcount(skb);
- }
- }
+ unsigned int reord_count = IsFack(tp) ? 0 : 1;
- /* Phase II: Marker */
+ /* Phase I: Search until TCP can mark */
tcp_for_write_queue_backwards_from(skb, sk) {
if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
holes_seen) ||
(TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
goto backwards_walk_done;
- if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
- TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
- tp->lost_out += tcp_skb_pcount(skb);
- tcp_verify_retransmit_hint(tp, skb);
+ if (IsFack(tp) && tcp_skb_timedout(sk, skb))
+ break;
+ else
+ timedout_continue = NULL;
+
+ if (IsFack(tp) ||
+ (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
+ reord_count += tcp_skb_pcount(skb);
+ if (reord_count > tp->reordering) {
+ if (after(TCP_SKB_CB(skb)->seq, tp->high_seq)) {
+ /* RFC: should we have find_below? */
+ skb = tcp_write_queue_find(sk, tp->high_seq);
+ /* Timedout top is again uncertain? */
+ if (tcp_skb_timedout(sk, skb))
+ timedout_continue = skb;
+ skb = tcp_write_queue_prev(sk, skb);
+ }
+ /* TODO?: do skb fragmentation if necessary */
+ break;
}
+
+ if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
+ holes_seen += tcp_skb_pcount(skb);
}
+ }
+
+ /* Phase II: Marker */
+ tcp_for_write_queue_backwards_from(skb, sk) {
+ if ((tp->fackets_out <= tp->sacked_out + tp->lost_out +
+ holes_seen) ||
+ (TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
+ goto backwards_walk_done;
- /* Phase III: Nothing is still marked?, mark head then */
- if ((IsFack(tp) || fast_rexmit) && !tp->lost_out)
- tcp_mark_head_lost_single(sk);
+ if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
+ TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
+ tp->lost_out += tcp_skb_pcount(skb);
+ tcp_verify_retransmit_hint(tp, skb);
+ }
}
-backwards_walk_done:
- /* Continue with timedout work */
- if (IsFack(tp) && (timedout_continue != NULL))
- tcp_timedout_mark_forward(sk, timedout_continue);
+ /* Phase III: Nothing is still marked?, mark head then */
+ if ((IsFack(tp) || fast_rexmit) && !tp->lost_out)
+ tcp_mark_head_lost_single(sk);
- tcp_sync_left_out(tp);
+backwards_walk_done:
+ return timedout_continue;
}
/* Account newly detected lost packet(s) */
@@ -2005,11 +1990,27 @@ static void tcp_update_scoreboard(struct sock *sk, u32 sack_entry_seq,
int fast_rexmit)
{
struct tcp_sock *tp = tcp_sk(sk);
+ struct sk_buff *skb = NULL;
+
+ if (!IsReno(tp)) {
+ if (!tp->sacked_out) {
+ if (IsFack(tp) || fast_rexmit) {
+ tcp_mark_head_lost_single(sk);
+ skb = tcp_write_queue_head(sk);
+ skb = tcp_write_queue_next(sk, skb);
+ }
+ } else
+ skb = tcp_update_scoreboard_fack(sk, sack_entry_seq,
+ fast_rexmit);
- if (!IsReno(tp))
- tcp_update_scoreboard_fack(sk, sack_entry_seq, fast_rexmit);
- else
+ /* Continue with timedout work */
+ if (IsFack(tp) && (skb != NULL))
+ tcp_timedout_mark_forward(sk, skb);
+
+ } else
tcp_mark_head_lost_single(sk);
+
+ tcp_sync_left_out(tp);
}
/* CWND moderation, preventing bursts due to too big ACKs
--
1.5.0.6
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-26 23:44 ` [PATCH 0/9]: tcp-2.6 patchset David Miller
@ 2007-05-27 7:58 ` Ilpo Järvinen
2007-05-27 9:11 ` David Miller
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-27 7:58 UTC (permalink / raw)
To: David Miller, Stephen Hemminger, Herbert Xu; +Cc: Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1194 bytes --]
On Sat, 26 May 2007, David Miller wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Sat, 26 May 2007 11:35:53 +0300
>
> > Dave, you could consider applying other than the last one if they
> > seem ok to you too (you'll need to rebase your tcp-2.6 in that case
> > first to apply cleanly those that touch tcp_sync_left_out :-)).
>
> Absolutely, I'll do the quick rebase of tcp-2.6 and review +
> applying of your patches (except the last one) over the weekend.
While you're in the right context (reviewing patch 8), you could also
check whether tcp_clean_rtx_queue does the right thing when passing a strange
pkts_acked to congestion control modules. I wonder if it really should
ignore GSO the way it does currently... I read some cc module code, and
some of it was adding the value to snd_cwnd_cnt, etc., which is a strong
indication that GSO should be considered... Also, if the head is a GSO skb
that is not completely acked, the loop breaks with pkts_acked being zero;
I doubt that can be correct...
...added Stephen H. & Herbert X. to this thread since also they might have
some idea about it.
...Anyway, if it's wrong, the fix should go to net-2.6 rather than to
tcp-2.6...
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-27 7:58 ` Ilpo Järvinen
@ 2007-05-27 9:11 ` David Miller
2007-05-27 11:16 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2007-05-27 9:11 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: shemminger, herbert, netdev
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sun, 27 May 2007 10:58:27 +0300 (EEST)
> On Sat, 26 May 2007, David Miller wrote:
>
> > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > Date: Sat, 26 May 2007 11:35:53 +0300
> >
> > > Dave, you could consider applying other than the last one if they
> > > seem ok to you too (you'll need to rebase your tcp-2.6 in that case
> > > first to apply cleanly those that touch tcp_sync_left_out :-)).
> >
> > Absolutely, I'll do the quick rebase of tcp-2.6 and review +
> > applying of your patches (except the last one) over the weekend.
>
> While you're in the right context (reviewing patch 8), you could also
> check whether tcp_clean_rtx_queue does the right thing when passing a strange
> pkts_acked to congestion control modules. I wonder if it really should
> ignore GSO the way it does currently... I read some cc module code, and
> some of it was adding the value to snd_cwnd_cnt, etc., which is a strong
> indication that GSO should be considered... Also, if the head is a GSO skb
> that is not completely acked, the loop breaks with pkts_acked being zero;
> I doubt that can be correct...
I rebased tcp-2.6, applied your patches 1-7 (which I'll comment
a bit on tomorrow) and will likely take a look at these issues
wrt. patch 8 tomorrow.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-27 9:11 ` David Miller
@ 2007-05-27 11:16 ` Ilpo Järvinen
2007-05-27 14:04 ` Baruch Even
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-27 11:16 UTC (permalink / raw)
To: David Miller; +Cc: Stephen Hemminger, Herbert Xu, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1838 bytes --]
On Sun, 27 May 2007, David Miller wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Sun, 27 May 2007 10:58:27 +0300 (EEST)
>
> > While you're in the right context (reviewing patch 8), you could also
> > check whether tcp_clean_rtx_queue does the right thing when passing a strange
> > pkts_acked to congestion control modules. I wonder if it really should
> > ignore GSO the way it does currently... I read some cc module code, and
> > some of it was adding the value to snd_cwnd_cnt, etc., which is a strong
> > indication that GSO should be considered... Also, if the head is a GSO skb
> > that is not completely acked, the loop breaks with pkts_acked being zero;
> > I doubt that can be correct...
>
> [...snip...]
> will likely take a look at these issues wrt. patch 8 tomorrow.
...I hope I got myself understood correctly (with my non-native
English :-))... ...My intention was to say that there might be
a bug in tcp_clean_rtx_queue too, which on a logical level is
unrelated to patch 8 itself and to the change it makes (but it
caught my attention while I was doing that patch, as it resides in
the same function). ...There could have been confusion since I'm also
changing things in the same function and because my article usage is
usually far from perfect... :-) Also, the fastretrans_alert arg rename
to the same name (i.e., pkts_acked) in patch 8 is unrelated to the
potential bug.
Thus, my original question basically culminates in this: should cc
modules be passed the number of packets acked or the number of skbs acked?
...The latter makes no sense to me unless the value is intended to
be interpreted as the number of timestamps acked or something along those
lines. ...I briefly tried looking for documentation on the cc module
interface but didn't find anything useful about this, and thus asked in
the first place...
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-27 11:16 ` Ilpo Järvinen
@ 2007-05-27 14:04 ` Baruch Even
2007-05-27 16:56 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: Baruch Even @ 2007-05-27 14:04 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: David Miller, Stephen Hemminger, Herbert Xu, Netdev
* Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> [070527 14:16]:
> On Sun, 27 May 2007, David Miller wrote:
>
> > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > Date: Sun, 27 May 2007 10:58:27 +0300 (EEST)
> >
> > > While you're in the right context (reviewing patch 8), you could also
> > > check whether tcp_clean_rtx_queue does the right thing when passing a strange
> > > pkts_acked to congestion control modules. I wonder if it really should
> > > ignore GSO the way it does currently... I read some cc module code, and
> > > some of it was adding the value to snd_cwnd_cnt, etc., which is a strong
> > > indication that GSO should be considered... Also, if the head is a GSO skb
> > > that is not completely acked, the loop breaks with pkts_acked being zero;
> > > I doubt that can be correct...
> >
> > [...snip...]
> > will likely take a look at these issues wrt. patch 8 tomorrow.
>
> [...snip...]
>
> Thus, my original question basically culminates in this: should cc
> modules be passed number of packets acked or number of skbs acked?
> ...The latter makes no sense to me unless the value is intended to
> be interpreted as the number of timestamps acked or something along those
> lines. ...I briefly tried looking for documentation on the cc module
> interface but didn't find anything useful about this, and thus asked in
> the first place...
At least the htcp module that I wrote assumes that the number is the actual
number of TCP packets, so GSO should be considered.
The consequences of this bug are not too large, but it does make all
congestion control algorithms a lot less aggressive. On my machines GSO
is disabled by default (e1000 at 100 Mbps & Tigon3 at 1 Gbps).
Cheers,
Baruch
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-27 14:04 ` Baruch Even
@ 2007-05-27 16:56 ` Ilpo Järvinen
2007-05-28 10:27 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-27 16:56 UTC (permalink / raw)
To: Baruch Even; +Cc: David Miller, Stephen Hemminger, Herbert Xu, Netdev
On Sun, 27 May 2007, Baruch Even wrote:
> * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> [070527 14:16]:
> >
> > Thus, my original question basically culminates in this: should cc
> > modules be passed number of packets acked or number of skbs acked?
> > ...The latter makes no sense to me unless the value is intended to
> > be interpreted as the number of timestamps acked or something along those
> > lines. ...I briefly tried looking for documentation on the cc module
> > interface but didn't find anything useful about this, and thus asked in
> > the first place...
>
> At least the htcp module that I wrote assumes that the number is the actual
> number of TCP packets, so GSO should be considered.
Thanks for the info! It is what I suspected... ...I'll write a patch for
it tomorrow against net-2.6... Dave, beware that it will partially
overlap with the changes made in patch 8, so you might choose to put
patch 8 on hold until this issue is resolved...
> The consequences of this bug are not too large, but it does make all
> congestion control algorithms a lot less aggressive. On my machines GSO
> is disabled by default (e1000 at 100 Mbps & Tigon3 at 1 Gbps).
Agreed, that's my impression too. However, some algorithms do things
like > 0 checks for it, so it might disturb their dynamics even more
than in the "too small value" cases...
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-27 16:56 ` Ilpo Järvinen
@ 2007-05-28 10:27 ` Ilpo Järvinen
2007-05-29 16:14 ` Stephen Hemminger
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-28 10:27 UTC (permalink / raw)
To: David Miller; +Cc: Baruch Even, Stephen Hemminger, Herbert Xu, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3637 bytes --]
On Sun, 27 May 2007, Ilpo Järvinen wrote:
> On Sun, 27 May 2007, Baruch Even wrote:
>
> > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> [070527 14:16]:
> > >
> > > Thus, my original question basically culminates in this: should cc
> > > modules be passed number of packets acked or number of skbs acked?
> > > ...The latter makes no sense to me unless the value is intended to
> > > be interpreted as the number of timestamps acked or something along those
> > > lines. ...I briefly tried looking for documentation on the cc module
> > > interface but didn't find anything useful about this, and thus asked in
> > > the first place...
> >
> > At least the htcp module that I wrote assumes that the number is the actual
> > number of TCP packets, so GSO should be considered.
>
> Thanks for the info! It is what I suspected... ...I'll write a patch for
> it tomorrow against net-2.6... Dave, beware that it will partially
> overlap with the changes made in patch 8, so you might choose to put
> patch 8 on hold until this issue is resolved...
>
> > The consequences of this bug are not too large, but it does make all
> > congestion control algorithms a lot less aggressive. On my machines GSO
> > is disabled by default (e1000 at 100 Mbps & Tigon3 at 1 Gbps).
>
> Agreed, that's my impression too. However, some algorithms do things
> like > 0 checks for it, so it might disturb their dynamics even more
> than in the "too small value" cases...
Hmm, there seems to be another case that I'm not too sure of...
Please check the alternative I chose for SYN handling below...
...hmm... While exploring this SYN thingie, I noticed that commit
164891aadf1721fca4dce473bb0e0998181537c6 drops the !FLAG_RETRANS_DATA_ACKED
check from the rtt_sample call (when combining it with the pkts_acked call).
I hope that's intentional?!? ...the commit message didn't say anything
about it, nor was anything in cc modules changed to accommodate that.
[PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
The code used to ignore GSO completely, passing either a far too
small or a zero pkts_acked when a GSO skb or part of it got ACKed.
In addition, there is no need to calculate the value in the loop;
simple arithmetic after the loop is sufficient.
It is not very clear how SYN segments should be handled, so I
chose to follow the previous implementation in this respect.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 38cb25b..7dfaabb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2407,8 +2407,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
struct sk_buff *skb;
__u32 now = tcp_time_stamp;
int acked = 0;
+ int prior_packets = tp->packets_out;
__s32 seq_rtt = -1;
- u32 pkts_acked = 0;
ktime_t last_ackt = ktime_set(0,0);
while ((skb = tcp_write_queue_head(sk)) &&
@@ -2437,7 +2437,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
*/
if (!(scb->flags & TCPCB_FLAG_SYN)) {
acked |= FLAG_DATA_ACKED;
- ++pkts_acked;
} else {
acked |= FLAG_SYN_ACKED;
tp->retrans_stamp = 0;
@@ -2481,9 +2480,13 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
}
if (acked&FLAG_ACKED) {
+ u32 pkts_acked = prior_packets - tp->packets_out;
const struct tcp_congestion_ops *ca_ops
= inet_csk(sk)->icsk_ca_ops;
+ if (acked & FLAG_SYN_ACKED)
+ pkts_acked--;
+
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_ack_packets_out(sk);
--
1.5.0.6
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-28 10:27 ` Ilpo Järvinen
@ 2007-05-29 16:14 ` Stephen Hemminger
2007-05-29 20:07 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2007-05-29 16:14 UTC (permalink / raw)
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Cc: David Miller, Baruch Even, Herbert Xu, Netdev
On Mon, 28 May 2007 13:27:03 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> On Sun, 27 May 2007, Ilpo Järvinen wrote:
>
> > On Sun, 27 May 2007, Baruch Even wrote:
> >
> > > * Ilpo J?rvinen <ilpo.jarvinen@helsinki.fi> [070527 14:16]:
> > > >
> > > > Thus, my original question basically culminates in this: should cc
> > > > modules be passed number of packets acked or number of skbs acked?
> > > > ...The latter makes no sense to me unless the value is intended to
> > > > be interpreted as the number of timestamps acked or something along those
> > > > lines. ...I briefly tried looking for documentation on the cc module
> > > > interface but didn't find anything useful about this, and thus asked in
> > > > the first place...
> > >
> > > At least the htcp module that I wrote assumes that the number is the actual
> > > number of TCP packets, so GSO should be considered.
> >
> > Thanks for the info! It is what I suspected... ...I'll write a patch for
> > it tomorrow against net-2.6... Dave, beware that it will partially
> > overlap with the changes made in patch 8, so you might choose to put
> > patch 8 on hold until this issue is resolved...
> >
> > > The consequences of this bug are not too large, but it does make all
> > > congestion control algorithms a lot less aggressive. On my machines GSO
> > > is disabled by default (e1000 at 100 Mbps & Tigon3 at 1 Gbps).
> >
> > Agreed, that's my impression too. However, some algorithms do things
> > like > 0 checks for it, so it might disturb their dynamics even more
> > than in the "too small value" cases...
>
> Hmm, there seems to be another case that I'm not too sure of...
> Please check the alternative I chose for SYN handling below...
>
> ...hmm... While exploring this SYN thingie, I noticed that commit
> 164891aadf1721fca4dce473bb0e0998181537c6 drops the !FLAG_RETRANS_DATA_ACKED
> check from the rtt_sample call (when combining it with the pkts_acked call).
> I hope that's intentional?!? ...the commit message didn't say anything
> about it, nor was anything in cc modules changed to accommodate that.
>
>
> [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
>
> The code used to ignore GSO completely, passing either a far too
> small or a zero pkts_acked when a GSO skb or part of it got ACKed.
> In addition, there is no need to calculate the value in the loop;
> simple arithmetic after the loop is sufficient.
Yes, thanks for fixing this. Wonder how it affects measurements.
> It is not very clear how SYN segments should be handled, so I
> chose to follow the previous implementation in this respect.
Since we don't invoke congestion control modules until after the SYN
handshake, this is not a problem.
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-29 16:14 ` Stephen Hemminger
@ 2007-05-29 20:07 ` Ilpo Järvinen
2007-05-29 20:19 ` Stephen Hemminger
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-29 20:07 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Baruch Even, David Miller, Herbert Xu, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1500 bytes --]
On Tue, 29 May 2007, Stephen Hemminger wrote:
> On Mon, 28 May 2007 13:27:03 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > On Sun, 27 May 2007, Ilpo Järvinen wrote:
> >
> > [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
>
> Yes, thanks for fixing this. Wonder how it affects measurements.
...It's a bit hard to tell since the dynamics change so dramatically
in the > 0 check cases; the resulting behavior in the too-small-value cases
may be easier to predict... It's possible that this could explain some
anomalies you've been seeing in your measurements.
> > It is not very clear how SYN segments should be handled, so I
> > chose to follow the previous implementation in this respect.
>
> Since we don't invoke congestion control modules until after the SYN
> handshake this is not a problem.
Just curious, do you mean that cc modules cannot measure, e.g., the initial
RTT through this mechanism (though they could do that in the init() cb then,
I suppose)... Or do you mean that they are already called for the ACK
that completes the SYN handshake, and therefore its skb is being cleaned
from the queue right now (this is the case I refer to above)?
In the first case the decrementer code is a NOP. If the latter, then it
is just an interface specification question, i.e., whether SYNs are treated as
zero or one in num_acked for the pkts_acked callback (I have no opinion
on this but was just trying to make sure cc modules get what they
expect :-)).
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-29 20:07 ` Ilpo Järvinen
@ 2007-05-29 20:19 ` Stephen Hemminger
2007-05-29 20:58 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2007-05-29 20:19 UTC (permalink / raw)
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Cc: Baruch Even, David Miller, Herbert Xu, Netdev
On Tue, 29 May 2007 23:07:00 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> On Tue, 29 May 2007, Stephen Hemminger wrote:
>
> > On Mon, 28 May 2007 13:27:03 +0300 (EEST)
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > > On Sun, 27 May 2007, Ilpo Järvinen wrote:
> > >
> > > [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
> >
> > Yes, thanks for fixing this. Wonder how it affects measurements.
>
> ...It's a bit hard to tell since the dynamics change so dramatically
> in the > 0 check cases; the resulting behavior in the too-small-value cases
> may be easier to predict... It's possible that this could explain some
> anomalies you've been seeing in your measurements.
>
> > > It is not very clear how SYN segments should be handled, so I
> > > chose to follow the previous implementation in this respect.
> >
> > Since we don't invoke congestion control modules until after the SYN
> > handshake this is not a problem.
>
> Just curious, do you mean that cc modules cannot measure, e.g., the initial
> RTT through this mechanism (though they could do that in the init() cb then,
> I suppose)... Or do you mean that they are already called for the ACK
> that completes the SYN handshake, and therefore its skb is being cleaned
> from the queue right now (this is the case I refer to above)?
> In the first case the decrementer code is a NOP. If the latter, then it
> is just an interface specification question, i.e., whether SYNs are treated as
> zero or one in num_acked for the pkts_acked callback (I have no opinion
> on this but was just trying to make sure cc modules get what they
> expect :-)).
We don't switch a socket out of Reno until after the initial handshake.
It evolved that way, and has a couple of advantages:
* incomplete connections don't bump the module refcount, so it is easier
to debug
* in the future, we can choose the congestion control to use based on
route metrics.
As an interface, it makes sense to keep the API with the SYN counting
as a packet.
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-29 20:19 ` Stephen Hemminger
@ 2007-05-29 20:58 ` Ilpo Järvinen
2007-05-29 21:15 ` Stephen Hemminger
2007-05-30 9:10 ` [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) Ilpo Järvinen
0 siblings, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-29 20:58 UTC (permalink / raw)
To: Stephen Hemminger, David Miller; +Cc: Baruch Even, Herbert Xu, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3421 bytes --]
On Tue, 29 May 2007, Stephen Hemminger wrote:
> On Tue, 29 May 2007 23:07:00 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > On Tue, 29 May 2007, Stephen Hemminger wrote:
> >
> > > Since we don't invoke congestion control modules until after the SYN
> > > handshake this is not a problem.
> >
> > Just curious, do you mean that cc modules cannot measure, e.g., the initial
> > RTT through this mechanism (though they could do that in the init() cb then,
> > I suppose)... Or do you mean that they are already called for the ACK
> > that completes the SYN handshake, and therefore its skb is being cleaned
> > from the queue right now (this is the case I refer to above)?
> > In the first case the decrementer code is a NOP. If the latter, then it
> > is just an interface specification question, i.e., whether SYNs are treated as
> > zero or one in num_acked for the pkts_acked callback (I have no opinion
> > on this but was just trying to make sure cc modules get what they
> > expect :-)).
>
> We don't switch a socket out of Reno until after the initial handshake.
...It's still not very clear to me what exactly your "after" means (both
here and in your earlier description), i.e., whether the clean_rtx_queue call
that cleans the SYN skb from the queue happens before or after the switch out
of Reno... If I understand the code correctly, this specific
clean_rtx_queue call happens after "your after", but I could be
misunderstanding the current 3-way handshake code. :-)
> As an interface, it makes sense to keep the API with the SYN counting
> as a packet.
Ok, this one answers the remaining question concerning the patch; here
it is without the decrementer for the SYN case (which IMHO wasn't very
beautiful looking anyway :-)).
Dave, please consider this for net-2.6. It could be a stable candidate
as well; I haven't tested yet whether it applies cleanly to stable:
[PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
The code used to ignore GSO completely, passing either a far too
small or a zero pkts_acked when a GSO skb or part of it got ACKed.
In addition, there is no need to calculate the value in the loop;
simple arithmetic after the loop is sufficient. The pkts_acked
callback of congestion control modules includes the SYN as a packet
from now on.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 38cb25b..74683d8 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2407,8 +2407,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
struct sk_buff *skb;
__u32 now = tcp_time_stamp;
int acked = 0;
+ int prior_packets = tp->packets_out;
__s32 seq_rtt = -1;
- u32 pkts_acked = 0;
ktime_t last_ackt = ktime_set(0,0);
while ((skb = tcp_write_queue_head(sk)) &&
@@ -2437,7 +2437,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
*/
if (!(scb->flags & TCPCB_FLAG_SYN)) {
acked |= FLAG_DATA_ACKED;
- ++pkts_acked;
} else {
acked |= FLAG_SYN_ACKED;
tp->retrans_stamp = 0;
@@ -2481,6 +2480,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
}
if (acked&FLAG_ACKED) {
+ u32 pkts_acked = prior_packets - tp->packets_out;
const struct tcp_congestion_ops *ca_ops
= inet_csk(sk)->icsk_ca_ops;
--
1.5.0.6
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9]: tcp-2.6 patchset
2007-05-29 20:58 ` Ilpo Järvinen
@ 2007-05-29 21:15 ` Stephen Hemminger
2007-05-30 9:10 ` [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) Ilpo Järvinen
1 sibling, 0 replies; 37+ messages in thread
From: Stephen Hemminger @ 2007-05-29 21:15 UTC (permalink / raw)
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Cc: David Miller, Baruch Even, Herbert Xu, Netdev
On Tue, 29 May 2007 23:58:39 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> On Tue, 29 May 2007, Stephen Hemminger wrote:
>
> > On Tue, 29 May 2007 23:07:00 +0300 (EEST)
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > > On Tue, 29 May 2007, Stephen Hemminger wrote:
> > >
> > > > Since we don't invoke congestion control modules until after the SYN
> > > > handshake this is not a problem.
> > >
> > > Just curious, do you mean that cc modules cannot measure, e.g., the initial
> > > RTT through this mechanism (though they could do that in the init() cb then,
> > > I suppose)... Or do you mean that they are already called for the ACK
> > > that completes the SYN handshake, and therefore its skb is being cleaned
> > > from the queue right now (this is the case I refer to above)?
> > > In the first case the decrementer code is a NOP. If the latter, then it
> > > is just an interface specification question, i.e., whether SYNs are treated as
> > > zero or one in num_acked for the pkts_acked callback (I have no opinion
> > > on this but was just trying to make sure cc modules get what they
> > > expect :-)).
> >
> > We don't switch a socket out of Reno until after the initial handshake.
>
> ...It's still not very clear to me what exactly your "after" means (both
> here and in your earlier description), i.e., whether the clean_rtx_queue
> call that cleans the SYN skb from the queue happens before or after the
> switch out of reno... If I understand the code correctly, this specific
> clean_rtx_queue call happens after "your after", but I could be
> misunderstanding the current 3-way handshake code. :-)
The call to clean_rtx_queue and congestion control routine happens here:
tcp_ack
tcp_clean_rtx_queue
cong->pkts_acked
Setup of congestion control happens here:
tcp_ack
tcp_rcv_syn_sent_state_process
tcp_init_congestion_control
or here:
tcp_ack
tcp_rcv_state_process (case ack of TCP_SYN_RECV)
tcp_init_congestion_control
Another benefit of this is that when tcp_init_congestion_control is called,
things like the srtt and the initial sequence number have already been set up.
> > As an interface, it makes sense to keep the API with the SYN counting
> > as a packet.
>
> Ok, this one answers the remaining question concerning the patch; here
> it is without the decrementer for the SYN case (which IMHO wasn't very
> beautiful looking anyway :-)).
>
> Dave, please consider this for net-2.6. It could be a stable candidate
> as well; I haven't tested yet whether it applies cleanly to stable:
--
Stephen Hemminger <shemminger@linux-foundation.org>
* [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
2007-05-29 20:58 ` Ilpo Järvinen
2007-05-29 21:15 ` Stephen Hemminger
@ 2007-05-30 9:10 ` Ilpo Järvinen
2007-06-01 4:38 ` David Miller
1 sibling, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-30 9:10 UTC (permalink / raw)
To: David Miller; +Cc: Netdev
On Tue, 29 May 2007, Ilpo Järvinen wrote:
> [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
>
> The code used to ignore GSO completely, passing either a far too
> small or a zero pkts_acked when a GSO skb or part of it got ACKed.
> In addition, there is no need to calculate the value in the loop;
> simple arithmetic after the loop is sufficient. The pkts_acked
> callback of congestion control modules includes SYN as a packet
> from now on.
Based on feedback from Stephen, I made the commit msg a bit
clearer; the patch remains the same.
[PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
The code used to ignore GSO completely, passing either a far too
small or a zero pkts_acked when a GSO skb or part of it got ACKed.
In addition, there is no need to calculate the value in the loop;
simple arithmetic after the loop is sufficient. There is no need
to handle the SYN case specially because congestion control
modules are not yet initialized when FLAG_SYN_ACKED is set.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 38cb25b..74683d8 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2407,8 +2407,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
struct sk_buff *skb;
__u32 now = tcp_time_stamp;
int acked = 0;
+ int prior_packets = tp->packets_out;
__s32 seq_rtt = -1;
- u32 pkts_acked = 0;
ktime_t last_ackt = ktime_set(0,0);
while ((skb = tcp_write_queue_head(sk)) &&
@@ -2437,7 +2437,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
*/
if (!(scb->flags & TCPCB_FLAG_SYN)) {
acked |= FLAG_DATA_ACKED;
- ++pkts_acked;
} else {
acked |= FLAG_SYN_ACKED;
tp->retrans_stamp = 0;
@@ -2481,6 +2480,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
}
if (acked&FLAG_ACKED) {
+ u32 pkts_acked = prior_packets - tp->packets_out;
const struct tcp_congestion_ops *ca_ops
= inet_csk(sk)->icsk_ca_ops;
--
1.5.0.6
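For readers following the archive: the arithmetic the patch switches to can be sketched in a few lines of plain C. This is an illustrative userspace model, not kernel code; the struct and function names are made up, and a per-skb `pcount` field stands in for what the kernel derives from GSO segmentation.

```c
#include <assert.h>

/* Toy model of the fix: with GSO one skb can cover several packets, so
 * incrementing "++pkts_acked" once per cleaned skb undercounts.  Taking
 * the difference of packets_out before and after the clean-up loop
 * counts every ACKed packet regardless of how many packets each skb
 * carries. */
struct toy_skb { unsigned int pcount; };

static unsigned int toy_clean_rtx_queue(const struct toy_skb *acked,
					int n, unsigned int *packets_out)
{
	unsigned int prior_packets = *packets_out;
	int i;

	for (i = 0; i < n; i++)
		*packets_out -= acked[i].pcount; /* may be > 1 for a GSO skb */

	return prior_packets - *packets_out;     /* GSO-aware pkts_acked */
}
```

With, say, three skbs carrying 1, 4 and 2 packets, the old per-skb counter would report 3 acked packets while the difference correctly reports 7.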
* Re: [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier
2007-05-26 8:35 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting Ilpo Järvinen
@ 2007-05-31 8:39 ` David Miller
1 sibling, 0 replies; 37+ messages in thread
From: David Miller @ 2007-05-31 8:39 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:54 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
This is of course trivial and fine, applied.
* Re: [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting
2007-05-26 8:35 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out Ilpo Järvinen
@ 2007-05-31 8:40 ` David Miller
1 sibling, 0 replies; 37+ messages in thread
From: David Miller @ 2007-05-31 8:40 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:55 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> F-RTO does not touch SACKED_ACKED bits at all, so there is no
> need to recount them in tcp_enter_frto_loss. After removal of
> the else branch, nested ifs can be combined.
>
> This must also reset sacked_out when SACK is not in use, as TCP
> could have received some duplicate ACKs prior to the RTO. To achieve
> that in a sane manner, tcp_reset_reno_sack was relocated by the
> previous patch.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Looks good, applied.
* Re: [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out
2007-05-26 8:35 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint Ilpo Järvinen
@ 2007-05-31 8:42 ` David Miller
2007-05-31 16:31 ` Ilpo Järvinen
1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2007-05-31 8:42 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:56 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> It is easily calculable when needed and its users are not that many
> after all.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
This looks good, but are you absolutely sure we never used
a stale value of tp->left_out on purpose?
I tried to audit this but there are so many cases :-)
What I mean here is, was there a case where we updated
sacked_out or lost_out, but then used left_out before
it was updated, and we were doing this on purpose?
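The invariant behind the patch under discussion is that left_out was only ever the sum of sacked_out and lost_out, so it can be recomputed wherever it is read. A hypothetical sketch (the struct and helper names are illustrative, not the real tcp_sock):

```c
#include <assert.h>

/* left_out was a cached copy of sacked_out + lost_out that had to be
 * kept in sync via tcp_sync_left_out().  Computing it on demand removes
 * both the field and the risk of reading a stale cached value. */
struct toy_tp { unsigned int sacked_out, lost_out; };

static unsigned int toy_left_out(const struct toy_tp *tp)
{
	return tp->sacked_out + tp->lost_out;
}
```

Since the value is recomputed at every call site, the "stale on purpose" scenario Dave asks about cannot arise once the cached field is gone.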
* Re: [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint
2007-05-26 8:35 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it Ilpo Järvinen
@ 2007-05-31 8:43 ` David Miller
1 sibling, 0 replies; 37+ messages in thread
From: David Miller @ 2007-05-31 8:43 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:57 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> In addition, added a reference about the purpose of the loop.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Nice observation, patch applied.
* Re: [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it
2007-05-26 8:35 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
@ 2007-05-31 8:43 ` David Miller
1 sibling, 0 replies; 37+ messages in thread
From: David Miller @ 2007-05-31 8:43 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:58 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> No other users exist for tcp_ecn.h. Very few things remain in
> tcp.h, for most TCP ECN functions callers reside within a
> single .c file and can be placed there.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Looks good, applied.
* Re: [PATCH 6/9] [TCP]: Reorganize lost marking code
2007-05-26 8:35 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq Ilpo Järvinen
2007-05-27 7:38 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
@ 2007-05-31 8:44 ` David Miller
2 siblings, 0 replies; 37+ messages in thread
From: David Miller @ 2007-05-31 8:44 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:35:59 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> The indentation started to get scary, so I reorganized code so
> that some trivial ifs are in tcp_update_scoreboard and the main
> loops remain in tcp_update_scoreboard_fack.
>
> It's much easier to view the actual changes with git-diff -w
> than from the messy looking (default) diff.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Applied, thanks.
* Re: [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq
2007-05-26 8:36 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue Ilpo Järvinen
@ 2007-05-31 8:44 ` David Miller
1 sibling, 0 replies; 37+ messages in thread
From: David Miller @ 2007-05-31 8:44 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:36:00 +0300
> From: =?ISO-8859-1?q?Ilpo_J=E4rvinen?= <ilpo.jarvinen@helsinki.fi>
>
> In addition, implemented find_below using minus one. Some
> reorganization was necessary to make code efficient again.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
I wonder if this adds some case where timedout_continue can
be left as NULL and accessed?
But applied, thanks!
* Re: [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out
2007-05-31 8:42 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out David Miller
@ 2007-05-31 16:31 ` Ilpo Järvinen
2007-07-03 5:02 ` David Miller
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2007-05-31 16:31 UTC (permalink / raw)
To: David Miller; +Cc: Netdev
Thanks for your comments...
On Thu, 31 May 2007, David Miller wrote:
> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Sat, 26 May 2007 11:35:56 +0300
>
> > It is easily calculable when needed and its users are not that many
> > after all.
>
> This looks good, but are you absolutely sure we never used
> a stale value of tp->left_out on purpose?
>
> I tried to audit this but there are so many cases :-)
Yes, I know... The complexity in these kinds of things often really
perplexes me too. In a way this is somewhat similar to the FRTO case
with no SACKed bits touched (which just looked very obvious and safe)
but would have caused the dreaded S+L situation unless special handling
for the IsReno(tp) case had been added.
> What I mean here is, was there a case where we updated
> sacked_out or lost_out, but then used left_out before
> it was updated, and we were doing this on purpose?
I understand what you mean, even though I think such an approach would
be very fragile if it were being done (and should be fixed). I was
actually wondering earlier whether that could be the only reason for the
separate variable in the first place. So far I don't know of any such
cases... ...but the code isn't always that trivial, like you say :-).
However, once proven safe, this will make the code cleaner and save the
next person from the same wondering... ;-)
Anyway, I'll try some grep tricks to reduce the number of cases that
need an expensive manual audit to the smaller set where tcp_sync_left_out
is not right after the adjustment. As plenty of functions end in
tcp_sync_left_out without returning in between, that could help a lot;
of course the calls within such a function could have issues too, but
those are relatively easy to check (the common pattern is a trivial
loop + sync, as you know)... Usually it is only when the "stale state"
leaks from a function that we're in trouble with the number of possible
paths. ...I'll come back to this later on after verifying it and perhaps
provide the same shortened list for you too, so that you can see if I
overlooked something...
It might actually end up being easier to audit all tcp_packets_in_flight
callers instead to see if there's always sync_left_out before them.
Another thing: I was thinking of renaming tcp_sync_left_out to
tcp_verify_left_out now that it only does BUG traps... Do you have a
hunch about this (from the patch 8 msg): I wonder if we could get that
BUG_ON place pinpointed more exactly in the oops, because inlining now
makes it lose its original context (they always seem to end up in
tcp_ack); would a #define help?
(i.e., how does it show up in the oops if tcp_verify_left_out looks like this):
#define tcp_verify_left_out(tp) BUG_ON(...)
...does it still point to the tcp.h line xxxx then, or to the
tcp_sync_left_out(tp) call @ line yyyy in tcp_input.c? ...I'll test that
later on by myself in case you don't know the answer.
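The macro-vs-function question can be illustrated outside the kernel: a macro body is expanded at the call site, so `__FILE__`/`__LINE__`/`__func__` refer to the caller, while a function (inline or not) evaluates them in its own body. A minimal userspace sketch (all names here are made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* A macro body is pasted into the caller, so __func__ names the caller,
 * just as a BUG_ON() hidden behind a #define would be attributed to the
 * function that uses it in an oops. */
#define WHERE_MACRO() __func__

/* A function evaluates __func__ in its own body, analogous to an
 * inlined helper whose BUG_ON() context gets attributed to the helper
 * (or wherever the compiler placed it) rather than to the caller. */
static const char *where_func(void)
{
	return __func__;
}
```

Calling `WHERE_MACRO()` from some function `foo()` yields "foo", while `where_func()` always yields "where_func", regardless of where it is called.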
--
i.
* Re: [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
2007-05-30 9:10 ` [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) Ilpo Järvinen
@ 2007-06-01 4:38 ` David Miller
2007-06-01 11:22 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2007-06-01 4:38 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 30 May 2007 12:10:06 +0300 (EEST)
> Based on feedback from Stephen, I made the commit msg a bit
> clearer; the patch remains the same.
>
> [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
>
> The code used to ignore GSO completely, passing either a far too
> small or a zero pkts_acked when a GSO skb or part of it got ACKed.
> In addition, there is no need to calculate the value in the loop;
> simple arithmetic after the loop is sufficient. There is no need
> to handle the SYN case specially because congestion control
> modules are not yet initialized when FLAG_SYN_ACKED is set.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Patch applied, thanks a lot!
I'll push this one to -stable too.
Thanks again.
* Re: [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
2007-06-01 4:38 ` David Miller
@ 2007-06-01 11:22 ` Ilpo Järvinen
0 siblings, 0 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-06-01 11:22 UTC (permalink / raw)
To: David Miller; +Cc: Netdev
On Thu, 31 May 2007, David Miller wrote:
> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Wed, 30 May 2007 12:10:06 +0300 (EEST)
>
> > [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
> >
> > The code used to ignore GSO completely, passing either a far too
> > small or a zero pkts_acked when a GSO skb or part of it got ACKed.
> > In addition, there is no need to calculate the value in the loop;
> > simple arithmetic after the loop is sufficient. There is no need
> > to handle the SYN case specially because congestion control
> > modules are not yet initialized when FLAG_SYN_ACKED is set.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
> Patch applied, thanks a lot!
...It's good to be on the alert with this one because it might uncover
some hidden bugs in congestion control module code (it changes some
behavior quite radically), which will obviously then have to be fixed
too...
--
i.
* Re: [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue
2007-05-26 8:36 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 9/9] [RFC] [TCP]: Kill tp->fackets_out (tcp_sock diet program) Ilpo Järvinen
@ 2007-07-03 5:00 ` David Miller
2007-07-03 11:08 ` Ilpo Järvinen
1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2007-07-03 5:00 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 26 May 2007 11:36:01 +0300
> Previously TCP had a transitional state during which reno
> counted segments that were already below the current window into
> sacked_out; this is now prevented. Re-try the now unconditional
> S+L catching (I wonder if we could get that BUG_ON place
> pinpointed more exactly in the oops, because inlining now makes
> it lose its original context, as they always seem to be in
> tcp_ack; would a #define help?).
>
> Beware, this change is not a trivial one and might have some
> unexpected side-effects under obscure conditions, since state
> tracking now happens much later on and the reno sack counting
> was highly dependent on the current state.
>
> This approach conservatively calls just remove_sack and leaves
> reset_sack() calls alone. The best solution to the whole problem
> would be to first calculate the new sacked_out fully (this patch
> does not move reno_sack_reset calls from original sites and thus
> does not implement this). However, that would require very
> invasive change to fastretrans_alert (perhaps even slicing it to
> two halves). Alternatively, all callers of tcp_packets_in_flight
> (i.e., users that depend on sacked_out) should be postponed
> until the new sacked_out has been calculated, but that is not a
> simpler alternative.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
So basically the idea behind this patch is to do the update of
the fake RENO sacks in clean_rtx_queue instead of fixing it up
at the very last moment right before we invoke tcp_try_to_undo_partial().
I like this patch and I can't find any holes in the idea.
But some things have changed in the meantime and this patch
(and probably 9/9 too) don't apply cleanly. Could you respin
these against current tcp-2.6 so I can apply them?
Thanks.
* Re: [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out
2007-05-31 16:31 ` Ilpo Järvinen
@ 2007-07-03 5:02 ` David Miller
0 siblings, 0 replies; 37+ messages in thread
From: David Miller @ 2007-07-03 5:02 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 31 May 2007 19:31:21 +0300 (EEST)
> (i.e, how does it show up in oops if tcp_verify_left_out looks like this):
>
> #define tcp_verify_left_out(tp) BUG_ON(...)
>
> ...does it still point to the tcp.h line xxxx then or to the
> tcp_sync_left_out(tp) @ line yyyy in tcp_input.c? ...I'll test that
> later on by myself in case you don't know the answer.
Yes, a define would make the BUG appear in the function it is
used in.
Depending upon the configuration and the implementation of BUG()
the same might happen for inline functions too. For example, if
the BUG() is configured in non-verbose mode and is implemented using
a trap instruction or __builtin_trap(). But you cannot depend
upon this universally of course.
* Re: [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue
2007-07-03 5:00 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue David Miller
@ 2007-07-03 11:08 ` Ilpo Järvinen
0 siblings, 0 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2007-07-03 11:08 UTC (permalink / raw)
To: David Miller; +Cc: Netdev
On Mon, 2 Jul 2007, David Miller wrote:
> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Sat, 26 May 2007 11:36:01 +0300
>
> > Previously TCP had a transitional state during which reno
> > counted segments that were already below the current window into
> > sacked_out; this is now prevented. Re-try the now unconditional
> > S+L catching (I wonder if we could get that BUG_ON place
> > pinpointed more exactly in the oops, because inlining now makes
> > it lose its original context, as they always seem to be in
> > tcp_ack; would a #define help?).
> >
> > Beware, this change is not a trivial one and might have some
> > unexpected side-effects under obscure conditions, since state
> > tracking now happens much later on and the reno sack counting
> > was highly dependent on the current state.
> >
> > This approach conservatively calls just remove_sack and leaves
> > reset_sack() calls alone. The best solution to the whole problem
> > would be to first calculate the new sacked_out fully (this patch
> > does not move reno_sack_reset calls from original sites and thus
> > does not implement this). However, that would require very
> > invasive change to fastretrans_alert (perhaps even slicing it to
> > two halves). Alternatively, all callers of tcp_packets_in_flight
> > (i.e., users that depend on sacked_out) should be postponed
> > until the new sacked_out has been calculated, but that is not a
> > simpler alternative.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
> So basically the idea behind this patch is to do the update of
> the fake RENO sakcs in clean_rtx_queue instead of fixing it up
> at the very last moment right before we invoke tcp_try_to_undo_partial().
Yeah, it would change the sacked_out that things see before the
undo_partial point (if there were any), but IMHO that would qualify
as a fix for those cases too rather than a bug. I checked that F-RTO
(with reno) can never access a stale sacked_out because of other
constraints based on the ACK's snd_una, and that's pretty much all
that's being done in between.
TCP might still do another update later on via tcp_reset_reno_sack,
but that's always going to be a more aggressive update (or at least
an equal one).
> I like this patch and I can't find any holes in the idea.
>
> But some things have changed in the meantime and this patch
> (and probably 9/9 too) don't apply cleanly. Could you respin
> these against current tcp-2.6 so I can apply them?
I knew that :-). It was postponed due to other, more important issues,
and because the trees had some dependencies at that point I decided it
was better to wait for a while until tcp-2.6 gets all the stuff that's
to be in net-2.6... But now that things seem to have settled, I'll
provide the updated one later on. The 9/9 should be dropped; I've
noticed a cpu processing vulnerability in it which I previously didn't
see there (I describe it in my other post).
--
i.
end of thread, other threads: [~2007-07-03 11:08 UTC | newest]
Thread overview: 37+ messages
2007-05-26 8:35 [PATCH 0/9]: tcp-2.6 patchset Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it Ilpo Järvinen
2007-05-26 8:35 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue Ilpo Järvinen
2007-05-26 8:36 ` [PATCH 9/9] [RFC] [TCP]: Kill tp->fackets_out (tcp_sock diet program) Ilpo Järvinen
2007-07-03 5:00 ` [PATCH 8/9] [TCP]: Reduce sacked_out with reno when purging write_queue David Miller
2007-07-03 11:08 ` Ilpo Järvinen
2007-05-31 8:44 ` [PATCH 7/9] [TCP]: Correct fastpath entrypoint below high_seq David Miller
2007-05-27 7:38 ` [PATCH 6/9] [TCP]: Reorganize lost marking code Ilpo Järvinen
2007-05-31 8:44 ` David Miller
2007-05-31 8:43 ` [PATCH 5/9] [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it David Miller
2007-05-31 8:43 ` [PATCH 4/9] [TCP]: Access to highest_sack obsoletes forward_cnt_hint David Miller
2007-05-31 8:42 ` [PATCH 3/9] [TCP]: Tighten tcp_sock's belt, drop left_out David Miller
2007-05-31 16:31 ` Ilpo Järvinen
2007-07-03 5:02 ` David Miller
2007-05-31 8:40 ` [PATCH 2/9] [TCP] FRTO: remove unnecessary fackets/sacked_out recounting David Miller
2007-05-31 8:39 ` [PATCH 1/9] [TCP]: Move Reno SACKed_out counter functions earlier David Miller
2007-05-26 23:44 ` [PATCH 0/9]: tcp-2.6 patchset David Miller
2007-05-27 7:58 ` Ilpo Järvinen
2007-05-27 9:11 ` David Miller
2007-05-27 11:16 ` Ilpo Järvinen
2007-05-27 14:04 ` Baruch Even
2007-05-27 16:56 ` Ilpo Järvinen
2007-05-28 10:27 ` Ilpo Järvinen
2007-05-29 16:14 ` Stephen Hemminger
2007-05-29 20:07 ` Ilpo Järvinen
2007-05-29 20:19 ` Stephen Hemminger
2007-05-29 20:58 ` Ilpo Järvinen
2007-05-29 21:15 ` Stephen Hemminger
2007-05-30 9:10 ` [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) Ilpo Järvinen
2007-06-01 4:38 ` David Miller
2007-06-01 11:22 ` Ilpo Järvinen