netdev.vger.kernel.org archive mirror
* Re: simplify microsecond rtt sampling
@ 2006-09-27 11:21 John Heffner
  2006-09-27 15:13 ` John Heffner
  0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-27 11:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 488 bytes --]

About commit 2d2abbab63f6726a147ae61ada39bf2c9ee0db9a:

It looks like this patch bypassed the enforcement of Karn's algorithm in 
tcp_ack_no_tstamp() for the purposes of usec RTT sampling used by 
congestion control modules.  This will give them bad RTT data when there 
are retransmits.  I haven't actually observed this, but it seems like it 
would be the case. ;)  Please correct me if I'm wrong.
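
For anyone reading along without Karn's algorithm in their head: an ACK for a retransmitted segment cannot be matched to one particular transmission, so an RTT measured from it is ambiguous and should be discarded. Below is a minimal user-space sketch of that rule. The names `struct seg_model` and `karn_rtt_sample()` are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a sent segment; not a kernel structure. */
struct seg_model {
	uint32_t sent_at_us;     /* time of the most recent transmission */
	bool     retransmitted;  /* true if the segment was sent more than once */
};

/*
 * Store an RTT sample in *rtt_us and return true only when the sample is
 * unambiguous, i.e. the segment was transmitted exactly once.
 * Returns false and leaves *rtt_us untouched otherwise.
 */
static bool karn_rtt_sample(const struct seg_model *seg, uint32_t now_us,
			    uint32_t *rtt_us)
{
	assert(seg != NULL && rtt_us != NULL);

	if (seg->retransmitted)
		return false;	/* the ACK may belong to either transmission */

	*rtt_us = now_us - seg->sent_at_us;
	return true;
}
```

A congestion control module fed samples without this check could see an RTT that is far too short (ACK of the original, timed from the retransmit) or far too long (the reverse).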

Here's a patch that should be a fix.



Signed-off-by: John Heffner <jheffner@psc.edu>

[-- Attachment #2: rtt_sample_fix.patch --]
[-- Type: text/plain, Size: 1061 bytes --]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 159fa3f..725c868 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2306,8 +2306,6 @@ static int tcp_clean_rtx_queue(struct so
 				seq_rtt = -1;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
-				if (rtt_sample)
-					(*rtt_sample)(sk, tcp_usrtt(skb));
 			}
 			if (sacked & TCPCB_SACKED_ACKED)
 				tp->sacked_out -= tcp_skb_pcount(skb);
@@ -2320,8 +2318,6 @@ static int tcp_clean_rtx_queue(struct so
 			}
 		} else if (seq_rtt < 0) {
 			seq_rtt = now - scb->when;
-			if (rtt_sample)
-				(*rtt_sample)(sk, tcp_usrtt(skb));
 		}
 		tcp_dec_pcount_approx(&tp->fackets_out, skb);
 		tcp_packets_out_dec(tp, skb);
@@ -2333,6 +2329,8 @@ static int tcp_clean_rtx_queue(struct so
 	if (acked&FLAG_ACKED) {
 		tcp_ack_update_rtt(sk, acked, seq_rtt);
 		tcp_ack_packets_out(sk, tp);
+		if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
+			(*rtt_sample)(sk, tcp_usrtt(skb));
 
 		if (icsk->icsk_ca_ops->pkts_acked)
 			icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);


* Re: simplify microsecond rtt sampling
  2006-09-27 11:21 simplify microsecond rtt sampling John Heffner
@ 2006-09-27 15:13 ` John Heffner
  2006-09-28 12:35   ` John Heffner
  0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-27 15:13 UTC (permalink / raw)
  To: John Heffner; +Cc: Stephen Hemminger, David Miller, netdev

Okay, this patch is junk (never trust compile-tested code).  Will send 
something better soon.

   -John


John Heffner wrote:
> About commit 2d2abbab63f6726a147ae61ada39bf2c9ee0db9a:
> 
> It looks like this patch bypassed the enforcement of Karn's algorithm in 
> tcp_ack_no_tstamp() for the purposes of usec RTT sampling used by 
> congestion control modules.  This will give them bad RTT data when there 
> are retransmits.  I haven't actually observed this, but it seems like it 
> would be the case. ;)  Please correct me if I'm wrong.
> 
> Here's a patch that should be a fix.
> 
> 
> 
> Signed-off-by: John Heffner <jheffner@psc.edu>
> 



* Re: simplify microsecond rtt sampling
  2006-09-27 15:13 ` John Heffner
@ 2006-09-28 12:35   ` John Heffner
  2006-09-28 12:47     ` John Heffner
  0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-28 12:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 77 bytes --]

Here is a corrected patch.


Signed-off-by: John Heffner <jheffner@psc.edu>


[-- Attachment #2: rtt_sample_fix.patch --]
[-- Type: text/plain, Size: 1686 bytes --]

-static u32 tcp_usrtt(const struct sk_buff *skb)
+static u32 tcp_usrtt(struct timeval *tv)
 {
-	struct timeval tv, now;
+	struct timeval now;
 
 	do_gettimeofday(&now);
-	skb_get_timestamp(skb, &tv);
-	return (now.tv_sec - tv.tv_sec) * 1000000 + (now.tv_usec - tv.tv_usec);
+	return (now.tv_sec - tv->tv_sec) * 1000000 + (now.tv_usec - tv->tv_usec);
 }
 
 /* Remove acknowledged frames from the retransmission queue. */
@@ -2249,6 +2248,7 @@ static int tcp_clean_rtx_queue(struct so
 	u32 pkts_acked = 0;
 	void (*rtt_sample)(struct sock *sk, u32 usrtt)
 		= icsk->icsk_ca_ops->rtt_sample;
+	struct timeval tv;
 
 	while ((skb = skb_peek(&sk->sk_write_queue)) &&
 	       skb != sk->sk_send_head) {
@@ -2297,8 +2297,7 @@ static int tcp_clean_rtx_queue(struct so
 				seq_rtt = -1;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
-				if (rtt_sample)
-					(*rtt_sample)(sk, tcp_usrtt(skb));
+				skb_get_timestamp(skb, &tv);
 			}
 			if (sacked & TCPCB_SACKED_ACKED)
 				tp->sacked_out -= tcp_skb_pcount(skb);
@@ -2311,8 +2310,7 @@ static int tcp_clean_rtx_queue(struct so
 			}
 		} else if (seq_rtt < 0) {
 			seq_rtt = now - scb->when;
-			if (rtt_sample)
-				(*rtt_sample)(sk, tcp_usrtt(skb));
+			skb_get_timestamp(skb, &tv);
 		}
 		tcp_dec_pcount_approx(&tp->fackets_out, skb);
 		tcp_packets_out_dec(tp, skb);
@@ -2324,6 +2322,8 @@ static int tcp_clean_rtx_queue(struct so
 	if (acked&FLAG_ACKED) {
 		tcp_ack_update_rtt(sk, acked, seq_rtt);
 		tcp_ack_packets_out(sk, tp);
+		if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
+			(*rtt_sample)(sk, tcp_usrtt(&tv));
 
 		if (icsk->icsk_ca_ops->pkts_acked)
 			icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);


* Re: simplify microsecond rtt sampling
  2006-09-28 12:35   ` John Heffner
@ 2006-09-28 12:47     ` John Heffner
  2006-09-28 16:36       ` Stephen Hemminger
  0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-28 12:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 337 bytes --]

Sigh.  Here's one that's not corrupted.  And for the record:


This changes the microsecond RTT sampling so that samples are taken in 
the same way that RTT samples are taken for the RTO calculator: on the 
last segment acknowledged, and only when the segment hasn't been 
retransmitted.


Signed-off-by: John Heffner <jheffner@psc.edu>

[-- Attachment #2: rtt_sample_fix.patch --]
[-- Type: text/plain, Size: 1913 bytes --]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b5521a9..d0f6bd6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2228,13 +2228,12 @@ static int tcp_tso_acked(struct sock *sk
 	return acked;
 }
 
-static u32 tcp_usrtt(const struct sk_buff *skb)
+static u32 tcp_usrtt(struct timeval *tv)
 {
-	struct timeval tv, now;
+	struct timeval now;
 
 	do_gettimeofday(&now);
-	skb_get_timestamp(skb, &tv);
-	return (now.tv_sec - tv.tv_sec) * 1000000 + (now.tv_usec - tv.tv_usec);
+	return (now.tv_sec - tv->tv_sec) * 1000000 + (now.tv_usec - tv->tv_usec);
 }
 
 /* Remove acknowledged frames from the retransmission queue. */
@@ -2249,6 +2248,7 @@ static int tcp_clean_rtx_queue(struct so
 	u32 pkts_acked = 0;
 	void (*rtt_sample)(struct sock *sk, u32 usrtt)
 		= icsk->icsk_ca_ops->rtt_sample;
+	struct timeval tv;
 
 	while ((skb = skb_peek(&sk->sk_write_queue)) &&
 	       skb != sk->sk_send_head) {
@@ -2297,8 +2297,7 @@ static int tcp_clean_rtx_queue(struct so
 				seq_rtt = -1;
 			} else if (seq_rtt < 0) {
 				seq_rtt = now - scb->when;
-				if (rtt_sample)
-					(*rtt_sample)(sk, tcp_usrtt(skb));
+				skb_get_timestamp(skb, &tv);
 			}
 			if (sacked & TCPCB_SACKED_ACKED)
 				tp->sacked_out -= tcp_skb_pcount(skb);
@@ -2311,8 +2310,7 @@ static int tcp_clean_rtx_queue(struct so
 			}
 		} else if (seq_rtt < 0) {
 			seq_rtt = now - scb->when;
-			if (rtt_sample)
-				(*rtt_sample)(sk, tcp_usrtt(skb));
+			skb_get_timestamp(skb, &tv);
 		}
 		tcp_dec_pcount_approx(&tp->fackets_out, skb);
 		tcp_packets_out_dec(tp, skb);
@@ -2324,6 +2322,8 @@ static int tcp_clean_rtx_queue(struct so
 	if (acked&FLAG_ACKED) {
 		tcp_ack_update_rtt(sk, acked, seq_rtt);
 		tcp_ack_packets_out(sk, tp);
+		if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
+			(*rtt_sample)(sk, tcp_usrtt(&tv));
 
 		if (icsk->icsk_ca_ops->pkts_acked)
 			icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);
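
The tcp_usrtt() helper in the patch above converts the difference between two `struct timeval` values into microseconds. The same arithmetic can be tried in user space with the sketch below. `elapsed_usec()` is a hypothetical name, and the sketch assumes the end time is not earlier than the start time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Microseconds elapsed from *start to *end.
 * Assumes *end is not earlier than *start.
 */
static uint32_t elapsed_usec(const struct timeval *start,
			     const struct timeval *end)
{
	assert(start != NULL && end != NULL);

	int64_t sec  = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
	int64_t usec = (int64_t)end->tv_usec - (int64_t)start->tv_usec;

	/*
	 * usec is negative when the microsecond field wrapped into the
	 * next second; the sum below is still correct in that case.
	 */
	return (uint32_t)(sec * 1000000 + usec);
}
```

For example, a start of 10.900000 s and an end of 11.100000 s gives 1 * 1000000 + (100000 - 900000) = 200000 microseconds.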


* Re: simplify microsecond rtt sampling
  2006-09-28 12:47     ` John Heffner
@ 2006-09-28 16:36       ` Stephen Hemminger
  2006-09-28 21:49         ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2006-09-28 16:36 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev

On Thu, 28 Sep 2006 13:47:28 +0100
John Heffner <jheffner@psc.edu> wrote:

> Sigh.  Here's one that's not corrupted.  And for the record:
> 
> 
> This changes the microsecond RTT sampling so that samples are taken in 
> the same way that RTT samples are taken for the RTO calculator: on the 
> last segment acknowledged, and only when the segment hasn't been 
> retransmitted.
> 
> 
> Signed-off-by: John Heffner <jheffner@psc.edu>

Looks ok. I wish skbuff timestamp was converted to ktime.


* Re: simplify microsecond rtt sampling
  2006-09-28 16:36       ` Stephen Hemminger
@ 2006-09-28 21:49         ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2006-09-28 21:49 UTC (permalink / raw)
  To: shemminger; +Cc: jheffner, netdev

From: Stephen Hemminger <shemminger@osdl.org>
Date: Thu, 28 Sep 2006 09:36:50 -0700

> On Thu, 28 Sep 2006 13:47:28 +0100
> John Heffner <jheffner@psc.edu> wrote:
> 
> > Sigh.  Here's one that's not corrupted.  And for the record:
> > 
> > 
> > This changes the microsecond RTT sampling so that samples are taken in 
> > the same way that RTT samples are taken for the RTO calculator: on the 
> > last segment acknowledged, and only when the segment hasn't been 
> > retransmitted.
> > 
> > 
> > Signed-off-by: John Heffner <jheffner@psc.edu>
> 
> Looks ok. I wish skbuff timestamp was converted to ktime.

It looks good to me too, applied.

Since this is really a bug fix, I'm going to push this into
-stable as well.

Thanks a lot.

