* Re: simplify microsecond rtt sampling
@ 2006-09-27 11:21 John Heffner
2006-09-27 15:13 ` John Heffner
0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-27 11:21 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 488 bytes --]
About commit 2d2abbab63f6726a147ae61ada39bf2c9ee0db9a:
It looks like this patch bypassed the enforcement of Karn's algorithm in
tcp_ack_no_tstamp() for the purposes of usec RTT sampling used by
congestion control modules. This will give them bad RTT data when there
are retransmits. I haven't actually observed this, but it seems like it
would be the case. ;) Please correct me if I'm wrong.
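For anyone not familiar with Karn's algorithm, here is a minimal sketch of the rule. It is an illustration only, not kernel code: struct seg_info, its fields, and karn_rtt_sample() are hypothetical names made up for this example. The idea is that an ACK for a segment that was sent more than once is ambiguous, because it could be acknowledging either transmission, so no RTT sample should be taken from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment bookkeeping; NOT the kernel's struct sk_buff. */
struct seg_info {
	uint64_t sent_us;       /* time of the most recent transmission, usec */
	bool     retransmitted; /* true if the segment was sent more than once */
};

/*
 * Karn's rule: only a segment that was transmitted exactly once yields a
 * usable RTT sample. Returns true and stores the sample in *rtt_us when the
 * sample is valid; returns false and leaves *rtt_us untouched otherwise.
 */
static bool karn_rtt_sample(const struct seg_info *seg, uint64_t ack_us,
			    uint32_t *rtt_us)
{
	assert(seg != NULL && rtt_us != NULL);

	if (seg->retransmitted)
		return false;	/* ambiguous ACK, discard */
	if (ack_us < seg->sent_us)
		return false;	/* clock went backwards, discard */

	*rtt_us = (uint32_t)(ack_us - seg->sent_us);
	return true;
}
```

A caller would feed the result to its RTT estimator only when the function returns true.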
Here's a patch that should be a fix.
Signed-off-by: John Heffner <jheffner@psc.edu>
[-- Attachment #2: rtt_sample_fix.patch --]
[-- Type: text/plain, Size: 1061 bytes --]
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 159fa3f..725c868 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2306,8 +2306,6 @@ static int tcp_clean_rtx_queue(struct so
seq_rtt = -1;
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
- if (rtt_sample)
- (*rtt_sample)(sk, tcp_usrtt(skb));
}
if (sacked & TCPCB_SACKED_ACKED)
tp->sacked_out -= tcp_skb_pcount(skb);
@@ -2320,8 +2318,6 @@ static int tcp_clean_rtx_queue(struct so
}
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
- if (rtt_sample)
- (*rtt_sample)(sk, tcp_usrtt(skb));
}
tcp_dec_pcount_approx(&tp->fackets_out, skb);
tcp_packets_out_dec(tp, skb);
@@ -2333,6 +2329,8 @@ static int tcp_clean_rtx_queue(struct so
if (acked&FLAG_ACKED) {
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_ack_packets_out(sk, tp);
+ if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
+ (*rtt_sample)(sk, tcp_usrtt(skb));
if (icsk->icsk_ca_ops->pkts_acked)
icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);
* Re: simplify microsecond rtt sampling
2006-09-27 11:21 simplify microsecond rtt sampling John Heffner
@ 2006-09-27 15:13 ` John Heffner
2006-09-28 12:35 ` John Heffner
0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-27 15:13 UTC (permalink / raw)
To: John Heffner; +Cc: Stephen Hemminger, David Miller, netdev
Okay, this patch is junk (never trust compile-tested code). Will send
something better soon.
-John
John Heffner wrote:
> About commit 2d2abbab63f6726a147ae61ada39bf2c9ee0db9a:
>
> It looks like this patch bypassed the enforcement of Karn's algorithm in
> tcp_ack_no_tstamp() for the purposes of usec RTT sampling used by
> congestion control modules. This will give them bad RTT data when there
> are retransmits. I haven't actually observed this, but it seems like it
> would be the case. ;) Please correct me if I'm wrong.
>
> Here's a patch that should be a fix.
>
>
>
> Signed-off-by: John Heffner <jheffner@psc.edu>
>
>
> ------------------------------------------------------------------------
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 159fa3f..725c868 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2306,8 +2306,6 @@ static int tcp_clean_rtx_queue(struct so
> seq_rtt = -1;
> } else if (seq_rtt < 0) {
> seq_rtt = now - scb->when;
> - if (rtt_sample)
> - (*rtt_sample)(sk, tcp_usrtt(skb));
> }
> if (sacked & TCPCB_SACKED_ACKED)
> tp->sacked_out -= tcp_skb_pcount(skb);
> @@ -2320,8 +2318,6 @@ static int tcp_clean_rtx_queue(struct so
> }
> } else if (seq_rtt < 0) {
> seq_rtt = now - scb->when;
> - if (rtt_sample)
> - (*rtt_sample)(sk, tcp_usrtt(skb));
> }
> tcp_dec_pcount_approx(&tp->fackets_out, skb);
> tcp_packets_out_dec(tp, skb);
> @@ -2333,6 +2329,8 @@ static int tcp_clean_rtx_queue(struct so
> if (acked&FLAG_ACKED) {
> tcp_ack_update_rtt(sk, acked, seq_rtt);
> tcp_ack_packets_out(sk, tp);
> + if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
> + (*rtt_sample)(sk, tcp_usrtt(skb));
>
> if (icsk->icsk_ca_ops->pkts_acked)
> icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);
* Re: simplify microsecond rtt sampling
2006-09-27 15:13 ` John Heffner
@ 2006-09-28 12:35 ` John Heffner
2006-09-28 12:47 ` John Heffner
0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-28 12:35 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 77 bytes --]
Here is a corrected patch.
Signed-off-by: John Heffner <jheffner@psc.edu>
[-- Attachment #2: rtt_sample_fix.patch --]
[-- Type: text/plain, Size: 1686 bytes --]
-static u32 tcp_usrtt(const struct sk_buff *skb)
+static u32 tcp_usrtt(struct timeval *tv)
{
- struct timeval tv, now;
+ struct timeval now;
do_gettimeofday(&now);
- skb_get_timestamp(skb, &tv);
- return (now.tv_sec - tv.tv_sec) * 1000000 + (now.tv_usec - tv.tv_usec);
+ return (now.tv_sec - tv->tv_sec) * 1000000 + (now.tv_usec - tv->tv_usec);
}
/* Remove acknowledged frames from the retransmission queue. */
@@ -2249,6 +2248,7 @@ static int tcp_clean_rtx_queue(struct so
u32 pkts_acked = 0;
void (*rtt_sample)(struct sock *sk, u32 usrtt)
= icsk->icsk_ca_ops->rtt_sample;
+ struct timeval tv;
while ((skb = skb_peek(&sk->sk_write_queue)) &&
skb != sk->sk_send_head) {
@@ -2297,8 +2297,7 @@ static int tcp_clean_rtx_queue(struct so
seq_rtt = -1;
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
- if (rtt_sample)
- (*rtt_sample)(sk, tcp_usrtt(skb));
+ skb_get_timestamp(skb, &tv);
}
if (sacked & TCPCB_SACKED_ACKED)
tp->sacked_out -= tcp_skb_pcount(skb);
@@ -2311,8 +2310,7 @@ static int tcp_clean_rtx_queue(struct so
}
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
- if (rtt_sample)
- (*rtt_sample)(sk, tcp_usrtt(skb));
+ skb_get_timestamp(skb, &tv);
}
tcp_dec_pcount_approx(&tp->fackets_out, skb);
tcp_packets_out_dec(tp, skb);
@@ -2324,6 +2322,8 @@ static int tcp_clean_rtx_queue(struct so
if (acked&FLAG_ACKED) {
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_ack_packets_out(sk, tp);
+ if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
+ (*rtt_sample)(sk, tcp_usrtt(&tv));
if (icsk->icsk_ca_ops->pkts_acked)
icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);
* Re: simplify microsecond rtt sampling
2006-09-28 12:35 ` John Heffner
@ 2006-09-28 12:47 ` John Heffner
2006-09-28 16:36 ` Stephen Hemminger
0 siblings, 1 reply; 6+ messages in thread
From: John Heffner @ 2006-09-28 12:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 337 bytes --]
Sigh. Here's one that's not corrupted. And for the record:
This changes the microsecond RTT sampling so that samples are taken in
the same way that RTT samples are taken for the RTO calculator: on the
last segment acknowledged, and only when the segment hasn't been
retransmitted.
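The shape of that change can be sketched outside the kernel roughly as follows. This is a simplified model under stated assumptions, not the patch itself: struct acked_seg, sample_once_per_ack(), store_sample() and the void *ctx argument are hypothetical names for this example, and keeping the timestamp of the last segment walked is an assumption taken from the description above. The point is that the callback fires at most once per ACK, after the walk over the acknowledged segments, and not at all if any of them had been retransmitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one segment covered by an incoming ACK. */
struct acked_seg {
	uint64_t sent_us;       /* transmit timestamp, usec */
	bool     retransmitted; /* true if the segment was ever resent */
};

/* Hypothetical stand-in for a congestion control rtt_sample hook. */
typedef void (*rtt_sample_fn)(void *ctx, uint32_t usrtt);

/* Example hook: stores the sample in the uint32_t that ctx points to. */
static void store_sample(void *ctx, uint32_t usrtt)
{
	*(uint32_t *)ctx = usrtt;
}

/*
 * Walk the segments covered by one ACK, remembering a timestamp and whether
 * any segment was retransmitted. Deliver a single sample after the loop,
 * and only if no retransmitted data was acknowledged.
 * Returns true if a sample was delivered.
 */
static bool sample_once_per_ack(const struct acked_seg *segs, size_t n,
				uint64_t now_us, rtt_sample_fn cb, void *ctx)
{
	bool retrans_acked = false;
	uint64_t stamp = 0;
	size_t i;

	assert(segs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (segs[i].retransmitted)
			retrans_acked = true;
		stamp = segs[i].sent_us; /* assumption: keep the last one */
	}

	if (n == 0 || retrans_acked || cb == NULL || now_us < stamp)
		return false;

	cb(ctx, (uint32_t)(now_us - stamp));
	return true;
}
```

Doing the call after the loop also means the timestamp has to be copied out while the segment is still at hand, which is why the patch passes a saved struct timeval to tcp_usrtt() instead of the skb.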
Signed-off-by: John Heffner <jheffner@psc.edu>
[-- Attachment #2: rtt_sample_fix.patch --]
[-- Type: text/plain, Size: 1913 bytes --]
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b5521a9..d0f6bd6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2228,13 +2228,12 @@ static int tcp_tso_acked(struct sock *sk
return acked;
}
-static u32 tcp_usrtt(const struct sk_buff *skb)
+static u32 tcp_usrtt(struct timeval *tv)
{
- struct timeval tv, now;
+ struct timeval now;
do_gettimeofday(&now);
- skb_get_timestamp(skb, &tv);
- return (now.tv_sec - tv.tv_sec) * 1000000 + (now.tv_usec - tv.tv_usec);
+ return (now.tv_sec - tv->tv_sec) * 1000000 + (now.tv_usec - tv->tv_usec);
}
/* Remove acknowledged frames from the retransmission queue. */
@@ -2249,6 +2248,7 @@ static int tcp_clean_rtx_queue(struct so
u32 pkts_acked = 0;
void (*rtt_sample)(struct sock *sk, u32 usrtt)
= icsk->icsk_ca_ops->rtt_sample;
+ struct timeval tv;
while ((skb = skb_peek(&sk->sk_write_queue)) &&
skb != sk->sk_send_head) {
@@ -2297,8 +2297,7 @@ static int tcp_clean_rtx_queue(struct so
seq_rtt = -1;
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
- if (rtt_sample)
- (*rtt_sample)(sk, tcp_usrtt(skb));
+ skb_get_timestamp(skb, &tv);
}
if (sacked & TCPCB_SACKED_ACKED)
tp->sacked_out -= tcp_skb_pcount(skb);
@@ -2311,8 +2310,7 @@ static int tcp_clean_rtx_queue(struct so
}
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
- if (rtt_sample)
- (*rtt_sample)(sk, tcp_usrtt(skb));
+ skb_get_timestamp(skb, &tv);
}
tcp_dec_pcount_approx(&tp->fackets_out, skb);
tcp_packets_out_dec(tp, skb);
@@ -2324,6 +2322,8 @@ static int tcp_clean_rtx_queue(struct so
if (acked&FLAG_ACKED) {
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_ack_packets_out(sk, tp);
+ if (rtt_sample && !(acked & FLAG_RETRANS_DATA_ACKED))
+ (*rtt_sample)(sk, tcp_usrtt(&tv));
if (icsk->icsk_ca_ops->pkts_acked)
icsk->icsk_ca_ops->pkts_acked(sk, pkts_acked);
* Re: simplify microsecond rtt sampling
2006-09-28 12:47 ` John Heffner
@ 2006-09-28 16:36 ` Stephen Hemminger
2006-09-28 21:49 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2006-09-28 16:36 UTC (permalink / raw)
To: John Heffner; +Cc: David Miller, netdev
On Thu, 28 Sep 2006 13:47:28 +0100
John Heffner <jheffner@psc.edu> wrote:
> Sigh. Here's one that's not corrupted. And for the record:
>
>
> This changes the microsecond RTT sampling so that samples are taken in
> the same way that RTT samples are taken for the RTO calculator: on the
> last segment acknowledged, and only when the segment hasn't been
> retransmitted.
>
>
> Signed-off-by: John Heffner <jheffner@psc.edu>
Looks ok. I wish skbuff timestamp was converted to ktime.
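To illustrate the difference, here is a small userspace sketch, not kernel code. It compares the two-field struct timeval arithmetic that tcp_usrtt() does with the same computation on a single signed 64-bit nanosecond count, which is assumed here to be roughly what a ktime-style timestamp would look like. The function names usec_diff_timeval() and usec_diff_ns() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Elapsed microseconds between two struct timeval values (two fields each). */
static int64_t usec_diff_timeval(const struct timeval *now,
				 const struct timeval *then)
{
	assert(now != NULL && then != NULL);

	/* The usec term may be negative; the seconds term compensates. */
	return (int64_t)(now->tv_sec - then->tv_sec) * 1000000
		+ (now->tv_usec - then->tv_usec);
}

/* Same result from scalar nanosecond timestamps: one subtract, one divide. */
static int64_t usec_diff_ns(int64_t now_ns, int64_t then_ns)
{
	return (now_ns - then_ns) / 1000;
}
```

With a scalar timestamp the caller has one value to copy and compare instead of a seconds/microseconds pair.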
* Re: simplify microsecond rtt sampling
2006-09-28 16:36 ` Stephen Hemminger
@ 2006-09-28 21:49 ` David Miller
0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2006-09-28 21:49 UTC (permalink / raw)
To: shemminger; +Cc: jheffner, netdev
From: Stephen Hemminger <shemminger@osdl.org>
Date: Thu, 28 Sep 2006 09:36:50 -0700
> On Thu, 28 Sep 2006 13:47:28 +0100
> John Heffner <jheffner@psc.edu> wrote:
>
> > Sigh. Here's one that's not corrupted. And for the record:
> >
> >
> > This changes the microsecond RTT sampling so that samples are taken in
> > the same way that RTT samples are taken for the RTO calculator: on the
> > last segment acknowledged, and only when the segment hasn't been
> > retransmitted.
> >
> >
> > Signed-off-by: John Heffner <jheffner@psc.edu>
>
> Looks ok. I wish skbuff timestamp was converted to ktime.
It looks good to me too, applied.
Since this is really a bug fix, I'm going to push this into
-stable as well.
Thanks a lot.