* [PATCH/RFC] TCP: use non-delayed ACK for congestion control RTT
@ 2007-12-17 13:44 Gavin McCullagh
2007-12-18 20:40 ` [PATCH/RFC] [v2] " Gavin McCullagh
0 siblings, 1 reply; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-17 13:44 UTC (permalink / raw)
To: netdev
When a delayed ACK representing two packets arrives, there are two RTT
samples available, one for each packet. The first (in order of seq number)
will be artificially long due to the delay waiting for the second packet,
the second will trigger the ACK and so will not itself be delayed.
According to rfc1323, the SRTT used for RTO calculation should use the
first rtt, so receivers echo the timestamp from the first packet in the
delayed ack. For congestion control however, it seems measuring delayed
ack delay is not desirable as it varies independently of congestion.
The patch below causes seq_rtt to be updated with any available later
packet rtts which should have less (and hopefully zero) delack delay. The
lower seq_rtt then gets passed to ca_ops->pkts_acked().
For non-delay based congestion control (cubic, h-tcp), rtt is sometimes
used for rtt-scaling. In shortening the RTT, this may make them a little
less aggressive. Delay-based schemes (eg vegas, illinois) should get a
considerably cleaner, more accurate congestion signal, particularly for
small cwnds. The congestion control module can potentially also filter out
bad RTTs due to the delayed ack alarm by looking at the associated cnt
which (where delayed acking is in use) should probably be 1 if the alarm
went off or greater if the ACK was triggered by a packet.
I seem to be undoing a design decision here so perhaps there is some reason
this should not be done? Comments/explanations appreciated...
Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
--- a/net/ipv4/tcp_input.c 2007-12-15 00:22:23.000000000 +0000
+++ b/net/ipv4/tcp_input.c 2007-12-17 13:35:16.000000000 +0000
@@ -2691,11 +2691,9 @@ static int tcp_clean_rtx_queue(struct so
(packets_acked > 1))
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
- if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
- if (fully_acked)
- last_ackt = skb->tstamp;
- }
+ seq_rtt = now - scb->when;
+ if (fully_acked)
+ last_ackt = skb->tstamp;
if (!(sacked & TCPCB_SACKED_ACKED))
reord = min(cnt, reord);
}
@@ -2709,11 +2707,9 @@ static int tcp_clean_rtx_queue(struct so
!before(end_seq, tp->snd_up))
tp->urg_mode = 0;
} else {
- if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
- if (fully_acked)
- last_ackt = skb->tstamp;
- }
+ seq_rtt = now - scb->when;
+ if (fully_acked)
+ last_ackt = skb->tstamp;
reord = min(cnt, reord);
}
tp->packets_out -= packets_acked;
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-17 13:44 [PATCH/RFC] TCP: use non-delayed ACK for congestion control RTT Gavin McCullagh
@ 2007-12-18 20:40 ` Gavin McCullagh
2007-12-19 10:28 ` Gavin McCullagh
2007-12-19 11:08 ` Ilpo Järvinen
0 siblings, 2 replies; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-18 20:40 UTC (permalink / raw)
To: netdev
The last attempt didn't take account of the situation where a timestamp
wasn't available and tcp_clean_rtx_queue() has to feed both the RTO and the
congestion avoidance. This updated patch stores both RTTs, making the
delayed one available for the RTO and the other (ca_seq_rtt) available for
congestion control.
Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 889c893..6fb7989 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
u32 cnt = 0;
u32 reord = tp->packets_out;
s32 seq_rtt = -1;
+ s32 ca_seq_rtt = -1;
ktime_t last_ackt = net_invalid_timestamp();
while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
@@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
if (sacked & TCPCB_SACKED_RETRANS)
tp->retrans_out -= packets_acked;
flag |= FLAG_RETRANS_DATA_ACKED;
+ ca_seq_rtt = -1;
seq_rtt = -1;
if ((flag & FLAG_DATA_ACKED) ||
(packets_acked > 1))
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
+ ca_seq_rtt = now - scb->when;
if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
+ seq_rtt = ca_seq_rtt;
if (fully_acked)
last_ackt = skb->tstamp;
}
@@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
!before(end_seq, tp->snd_up))
tp->urg_mode = 0;
} else {
+ ca_seq_rtt = now - scb->when;
if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
+ seq_rtt = ca_seq_rtt;
if (fully_acked)
last_ackt = skb->tstamp;
}
@@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
net_invalid_timestamp()))
rtt_us = ktime_us_delta(ktime_get_real(),
last_ackt);
- else if (seq_rtt > 0)
- rtt_us = jiffies_to_usecs(seq_rtt);
+ else if (ca_seq_rtt > 0)
+ rtt_us = jiffies_to_usecs(ca_seq_rtt);
}
ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-18 20:40 ` [PATCH/RFC] [v2] " Gavin McCullagh
@ 2007-12-19 10:28 ` Gavin McCullagh
2007-12-19 11:08 ` Ilpo Järvinen
1 sibling, 0 replies; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-19 10:28 UTC (permalink / raw)
To: netdev
Hi,
On Tue, 18 Dec 2007, Gavin McCullagh wrote:
> The last attempt didn't take account of the situation where a timestamp
> wasn't available and tcp_clean_rtx_queue() has to feed both the RTO and the
> congestion avoidance. This updated patch stores both RTTs, making the
> delayed one available for the RTO and the other (ca_seq_rtt) available for
> congestion control.
I forgot to include some data to show the difference this can make to the
RTT signal:
http://www.hamilton.ie/gavinmc/linux/tcp_clean_rtx_queue.html
Gavin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-18 20:40 ` [PATCH/RFC] [v2] " Gavin McCullagh
2007-12-19 10:28 ` Gavin McCullagh
@ 2007-12-19 11:08 ` Ilpo Järvinen
2007-12-19 11:31 ` Gavin McCullagh
1 sibling, 1 reply; 18+ messages in thread
From: Ilpo Järvinen @ 2007-12-19 11:08 UTC (permalink / raw)
To: Gavin McCullagh; +Cc: Netdev
On Tue, 18 Dec 2007, Gavin McCullagh wrote:
> The last attempt didn't take account of the situation where a timestamp
> wasn't available and tcp_clean_rtx_queue() has to feed both the RTO and the
> congestion avoidance. This updated patch stores both RTTs, making the
> delayed one available for the RTO and the other (ca_seq_rtt) available for
> congestion control.
>
>
> Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
>
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 889c893..6fb7989 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
> if (sacked & TCPCB_SACKED_RETRANS)
> tp->retrans_out -= packets_acked;
> flag |= FLAG_RETRANS_DATA_ACKED;
> + ca_seq_rtt = -1;
> seq_rtt = -1;
> if ((flag & FLAG_DATA_ACKED) ||
> (packets_acked > 1))
> flag |= FLAG_NONHEAD_RETRANS_ACKED;
> } else {
> + ca_seq_rtt = now - scb->when;
Isn't it also much better this way in a case where ACK losses happened,
taking the longest RTT in that case is clearly questionable as it
may over-estimate considerably.
However, another thing to consider is the possibility of this value being
used in "timeout-like" fashion in ca modules (I haven't read enough ca
modules code to know if any of them does that), on contrary to
determinating just rtt or packet's delay in which case this change seems
appropriate (most modules do the latter). Therefore, if timeout-like
module exists one should also add TCP_CONG_RTT_STAMP_LONGEST for that
particular module and keep using seq_rtt for it like previously and use
ca_seq_rtt only for others.
> if (seq_rtt < 0) {
> - seq_rtt = now - scb->when;
> + seq_rtt = ca_seq_rtt;
> if (fully_acked)
> last_ackt = skb->tstamp;
> }
> @@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
> !before(end_seq, tp->snd_up))
> tp->urg_mode = 0;
> } else {
> + ca_seq_rtt = now - scb->when;
> if (seq_rtt < 0) {
> - seq_rtt = now - scb->when;
> + seq_rtt = ca_seq_rtt;
> if (fully_acked)
> last_ackt = skb->tstamp;
> }
This part doesn't exists anymore in development tree. Please base this
patch (and anything in future) you intend to get included to mainline
onto net-2.6.25 unless there's a very good reason to not do so or
whatever 2.6.xx is the correct net development tree at that time (if
one exists). Thanks.
--
i.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-19 11:08 ` Ilpo Järvinen
@ 2007-12-19 11:31 ` Gavin McCullagh
2007-12-19 13:30 ` Ilpo Järvinen
0 siblings, 1 reply; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-19 11:31 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Netdev
Hi,
On Wed, 19 Dec 2007, Ilpo Järvinen wrote:
> Isn't it also much better this way in a case where ACK losses happened,
> taking the longest RTT in that case is clearly questionable as it
> may over-estimate considerably.
Quite so.
> However, another thing to consider is the possibility of this value being
> used in "timeout-like" fashion in ca modules (I haven't read enough ca
> modules code to know if any of them does that), on contrary to
> determinating just rtt or packet's delay in which case this change seems
> appropriate (most modules do the latter).
I'm not aware of any, but I haven't read them all either. I would have
thought tp->srtt was the value to use in this instance, but perhaps the
individual timestamps including delack delay are useful.
> Therefore, if timeout-like module exists one should also add
> TCP_CONG_RTT_STAMP_LONGEST for that particular module and keep using
> seq_rtt for it like previously and use ca_seq_rtt only for others.
Seems reasonable. I'll add this.
> This part doesn't exists anymore in development tree. Please base this
> patch (and anything in future) you intend to get included to mainline
> onto net-2.6.25 unless there's a very good reason to not do so or
> whatever 2.6.xx is the correct net development tree at that time (if
> one exists). Thanks.
Will do. I gather I should use the latest net- tree in future when
submitting patches.
Thanks for the helpful comments,
Gavin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-19 11:31 ` Gavin McCullagh
@ 2007-12-19 13:30 ` Ilpo Järvinen
2007-12-21 11:14 ` David Miller
0 siblings, 1 reply; 18+ messages in thread
From: Ilpo Järvinen @ 2007-12-19 13:30 UTC (permalink / raw)
To: Gavin McCullagh; +Cc: Netdev, David Miller
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2350 bytes --]
On Wed, 19 Dec 2007, Gavin McCullagh wrote:
> On Wed, 19 Dec 2007, Ilpo Järvinen wrote:
>
> > Isn't it also much better this way in a case where ACK losses happened,
> > taking the longest RTT in that case is clearly questionable as it
> > may over-estimate considerably.
>
> Quite so.
>
> > However, another thing to consider is the possibility of this value being
> > used in "timeout-like" fashion in ca modules (I haven't read enough ca
> > modules code to know if any of them does that), on contrary to
> > determinating just rtt or packet's delay in which case this change seems
> > appropriate (most modules do the latter).
>
> I'm not aware of any, but I haven't read them all either. I would have
> thought tp->srtt was the value to use in this instance,
Very likely so...
> > Therefore, if timeout-like module exists one should also add
> > TCP_CONG_RTT_STAMP_LONGEST for that particular module and keep using
> > seq_rtt for it like previously and use ca_seq_rtt only for others.
>
> Seems reasonable. I'll add this.
...therefore I said "if". I'm not sure what they all do, haven't read them
all that carefully... :-) Please check first if ..._LONGEST is necessary
at all by quickly going through how the ca modules use it, I guess most of
them won't be that complicated, one can probably figure out the intented
usage by couple of minutes review. If there aren't any modules who need
delayed ACK & other path caused delays included, ..._LONGEST would just
end up being unnecessary cruft :-).
> > This part doesn't exists anymore in development tree. Please base this
> > patch (and anything in future) you intend to get included to mainline
> > onto net-2.6.25 unless there's a very good reason to not do so or
> > whatever 2.6.xx is the correct net development tree at that time (if
> > one exists). Thanks.
>
> Will do. I gather I should use the latest net- tree in future when
> submitting patches.
Doh, I owe you apology as I was probably too hasty to point you towards
net-2.6.25. I suppose this could by considered as fix as well and
therefore could probably be accepted to net-2.6 as well, which is for
bugfixes only after merge window is closed. But it's Dave how will make
such decisions, not me :-), and it's he who gets to deal with all
the resulting conflicts ;-) (I added Cc to him).
--
i.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-19 13:30 ` Ilpo Järvinen
@ 2007-12-21 11:14 ` David Miller
2007-12-21 13:31 ` Gavin McCullagh
0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2007-12-21 11:14 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: Gavin.McCullagh, netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 19 Dec 2007 15:30:03 +0200 (EET)
> On Wed, 19 Dec 2007, Gavin McCullagh wrote:
>
> > Will do. I gather I should use the latest net- tree in future when
> > submitting patches.
>
> Doh, I owe you apology as I was probably too hasty to point you towards
> net-2.6.25. I suppose this could by considered as fix as well and
> therefore could probably be accepted to net-2.6 as well, which is for
> bugfixes only after merge window is closed. But it's Dave how will make
> such decisions, not me :-), and it's he who gets to deal with all
> the resulting conflicts ;-) (I added Cc to him).
When Gavin respins the patch I'll look at in the context of submitting
it as a bug fix. So Gavin please generate the patch against Linus's
vanilla GIT tree or net-2.6, your choise.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-21 11:14 ` David Miller
@ 2007-12-21 13:31 ` Gavin McCullagh
2007-12-21 14:05 ` David Miller
2007-12-21 14:07 ` Ilpo Järvinen
0 siblings, 2 replies; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-21 13:31 UTC (permalink / raw)
To: David Miller; +Cc: ilpo.jarvinen, netdev
On Fri, 21 Dec 2007, David Miller wrote:
> When Gavin respins the patch I'll look at in the context of submitting
> it as a bug fix. So Gavin please generate the patch against Linus's
> vanilla GIT tree or net-2.6, your choise.
The existing patch was against Linus' linux-2.6.git from a few days ago so
I've updated my tree and regenerated the patch (below). Is that the right
one?
I'm just checking through the existing CA modules. I don't see the rtt
used for RTO anywhere. This is what I gather they're each using rtt for.
tcp_highspeed.c doesn't implement .pkts_acked
tcp_hybla.c doesn't implement .pkts_acked
tcp_scalable.c doesn't implement .pkts_acked
tcp_bic.c ignores rtt value from .pkts_acked
tcp_lp.c seems to ignore rtt value from .pkts_acked (despite setting
TCP_CONG_RTT_STAMP for high res rtts -- why?)
tcp_vegas.c uses high res rtt to measure congestion signal, increase,
backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_veno.c uses high res rtt to measure congestion signal, increase,
backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_yeah.c uses high res rtt to measure congestion signal, increase,
backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_illinois.c uses rtt to scale increase, backoff
-- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_htcp.c uses rtt to scale increase, backoff
tcp_cubic.c uses rtt to scale increase, backoff
tcp_westwood.c scales backoff using rtt
So as far as I can tell, timeout stuff is not ever altered using
pkts_acked() so I guess this fix only affects westwood, htcp and cubic just
now.
I need to re-read properly, but I think the same problem affects the
microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
yeah, illinois). I might follow up with another patch which changes the
behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.
Thanks,
Gavin
Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 889c893..6fb7989 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
u32 cnt = 0;
u32 reord = tp->packets_out;
s32 seq_rtt = -1;
+ s32 ca_seq_rtt = -1;
ktime_t last_ackt = net_invalid_timestamp();
while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
@@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
if (sacked & TCPCB_SACKED_RETRANS)
tp->retrans_out -= packets_acked;
flag |= FLAG_RETRANS_DATA_ACKED;
+ ca_seq_rtt = -1;
seq_rtt = -1;
if ((flag & FLAG_DATA_ACKED) ||
(packets_acked > 1))
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
+ ca_seq_rtt = now - scb->when;
if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
+ seq_rtt = ca_seq_rtt;
if (fully_acked)
last_ackt = skb->tstamp;
}
@@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
!before(end_seq, tp->snd_up))
tp->urg_mode = 0;
} else {
+ ca_seq_rtt = now - scb->when;
if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
+ seq_rtt = ca_seq_rtt;
if (fully_acked)
last_ackt = skb->tstamp;
}
@@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
net_invalid_timestamp()))
rtt_us = ktime_us_delta(ktime_get_real(),
last_ackt);
- else if (seq_rtt > 0)
- rtt_us = jiffies_to_usecs(seq_rtt);
+ else if (ca_seq_rtt > 0)
+ rtt_us = jiffies_to_usecs(ca_seq_rtt);
}
ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-21 13:31 ` Gavin McCullagh
@ 2007-12-21 14:05 ` David Miller
2007-12-21 14:07 ` Ilpo Järvinen
1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2007-12-21 14:05 UTC (permalink / raw)
To: Gavin.McCullagh; +Cc: ilpo.jarvinen, netdev
From: Gavin McCullagh <Gavin.McCullagh@nuim.ie>
Date: Fri, 21 Dec 2007 13:31:06 +0000
> On Fri, 21 Dec 2007, David Miller wrote:
>
> > When Gavin respins the patch I'll look at in the context of submitting
> > it as a bug fix. So Gavin please generate the patch against Linus's
> > vanilla GIT tree or net-2.6, your choise.
>
> The existing patch was against Linus' linux-2.6.git from a few days ago so
> I've updated my tree and regenerated the patch (below). Is that the right
> one?
Yep, it is.
> I need to re-read properly, but I think the same problem affects the
> microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
> yeah, illinois). I might follow up with another patch which changes the
> behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.
Ok, let us know what you find.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-21 13:31 ` Gavin McCullagh
2007-12-21 14:05 ` David Miller
@ 2007-12-21 14:07 ` Ilpo Järvinen
2007-12-21 14:10 ` Ilpo Järvinen
2007-12-30 1:15 ` [PATCH/RFC] [v3] " Gavin McCullagh
1 sibling, 2 replies; 18+ messages in thread
From: Ilpo Järvinen @ 2007-12-21 14:07 UTC (permalink / raw)
To: Gavin McCullagh; +Cc: David Miller, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3414 bytes --]
On Fri, 21 Dec 2007, Gavin McCullagh wrote:
> On Fri, 21 Dec 2007, David Miller wrote:
>
> > When Gavin respins the patch I'll look at in the context of submitting
> > it as a bug fix. So Gavin please generate the patch against Linus's
> > vanilla GIT tree or net-2.6, your choise.
>
> The existing patch was against Linus' linux-2.6.git from a few days ago so
> I've updated my tree and regenerated the patch (below). Is that the right
> one?
>
> I'm just checking through the existing CA modules. I don't see the rtt
> used for RTO anywhere. This is what I gather they're each using rtt for.
I meant more timeout like fashion (e.g., to "timeout" some internal
phase but I don't find that too likely)...
Thanks for checking them.
> So as far as I can tell, timeout stuff is not ever altered using
> pkts_acked() so I guess this fix only affects westwood, htcp and cubic just
> now.
>
> I need to re-read properly, but I think the same problem affects the
> microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
> yeah, illinois). I might follow up with another patch which changes the
> behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.
Please do, you might have to remove fully_acked checks to do that right
though so it won't be as straight-forward change as this one and requires
some amount of thinking to result in a right thing.
> Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 889c893..6fb7989 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
> u32 cnt = 0;
> u32 reord = tp->packets_out;
> s32 seq_rtt = -1;
> + s32 ca_seq_rtt = -1;
> ktime_t last_ackt = net_invalid_timestamp();
>
> while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
> @@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
> if (sacked & TCPCB_SACKED_RETRANS)
> tp->retrans_out -= packets_acked;
> flag |= FLAG_RETRANS_DATA_ACKED;
> + ca_seq_rtt = -1;
> seq_rtt = -1;
> if ((flag & FLAG_DATA_ACKED) ||
> (packets_acked > 1))
> flag |= FLAG_NONHEAD_RETRANS_ACKED;
> } else {
> + ca_seq_rtt = now - scb->when;
> if (seq_rtt < 0) {
> - seq_rtt = now - scb->when;
> + seq_rtt = ca_seq_rtt;
> if (fully_acked)
> last_ackt = skb->tstamp;
> }
> @@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
> !before(end_seq, tp->snd_up))
> tp->urg_mode = 0;
> } else {
> + ca_seq_rtt = now - scb->when;
> if (seq_rtt < 0) {
> - seq_rtt = now - scb->when;
> + seq_rtt = ca_seq_rtt;
> if (fully_acked)
> last_ackt = skb->tstamp;
> }
> @@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
> net_invalid_timestamp()))
> rtt_us = ktime_us_delta(ktime_get_real(),
> last_ackt);
> - else if (seq_rtt > 0)
> - rtt_us = jiffies_to_usecs(seq_rtt);
> + else if (ca_seq_rtt > 0)
> + rtt_us = jiffies_to_usecs(ca_seq_rtt);
> }
>
> ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
/ Reviewed-by... whatever, I don't know if they really started to use it
or not... :-)
--
i.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
2007-12-21 14:07 ` Ilpo Järvinen
@ 2007-12-21 14:10 ` Ilpo Järvinen
2007-12-30 1:15 ` [PATCH/RFC] [v3] " Gavin McCullagh
1 sibling, 0 replies; 18+ messages in thread
From: Ilpo Järvinen @ 2007-12-21 14:10 UTC (permalink / raw)
To: Gavin McCullagh; +Cc: Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 433 bytes --]
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:
> On Fri, 21 Dec 2007, Gavin McCullagh wrote:
>
> > I'm just checking through the existing CA modules. I don't see the rtt
> > used for RTO anywhere. This is what I gather they're each using rtt for.
>
> I meant more timeout like fashion (e.g., to "timeout" some internal
> phase but I don't find that too likely)...
I meant didn't, no need to recheck them due to that... :-)
--
i.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-21 14:07 ` Ilpo Järvinen
2007-12-21 14:10 ` Ilpo Järvinen
@ 2007-12-30 1:15 ` Gavin McCullagh
2007-12-30 1:25 ` Gavin McCullagh
` (2 more replies)
1 sibling, 3 replies; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-30 1:15 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: David Miller, Netdev
Hi,
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:
> > I need to re-read properly, but I think the same problem affects the
> > microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
> > yeah, illinois). I might follow up with another patch which changes the
> > behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.
>
> Please do, you might have to remove fully_acked checks to do that right
> though so it won't be as straight-forward change as this one and requires
> some amount of thinking to result in a right thing.
The TCP_CONG_RTT_STAMP code does need to be fixed similarly. A combined
patch will follow this mail. As I understand it, the fully_acked checks
kick in where the ACK is a portion of a TSO chunk and doesn't completely
ACK that chunk. Leaving the checks in place means you get one rtt for each
TSO chunk, based on the ACK for the last byte of the chunk. This rtt
therefore is the maximum available and includes the time-lag between the
first and last chunk being acked. Removing the tests gives you an RTT
value for each ACK in a tso chunk, including the minimum and maximum. It
seems the minimum rtt is the best indicator of congestion. On the other
hand having all available RTTs gives the congestion avoidance greater
knowledge of how the RTT is evolving (albeit somewhat coloured by TSO
delays which don't seem too severe in my tests).
The patch I'll suggest for now takes all RTT values available, rather than
the old maximum or adding extra logic to pluck out the minimum.
I've captured some rtt values from timestamps and from
tcp_clean_rtx_queue() to demonstrate the effect of this patch:
http://www.hamilton.ie/gavinmc/linux/tcp_clean_rtx_queue.html#usec
As always, comments are most welcome,
Gavin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-30 1:15 ` [PATCH/RFC] [v3] " Gavin McCullagh
@ 2007-12-30 1:25 ` Gavin McCullagh
2007-12-30 3:09 ` David Miller
2007-12-30 3:06 ` David Miller
2007-12-30 9:43 ` Ilpo Järvinen
2 siblings, 1 reply; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-30 1:25 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: David Miller, Netdev
When a delayed ACK representing two packets arrives, there are two RTT
samples available, one for each packet. The first (in order of seq number)
will be artificially long due to the delay waiting for the second packet,
the second will trigger the ACK and so will not itself be delayed.
According to rfc1323, the SRTT used for RTO calculation should use the
first rtt, so receivers echo the timestamp from the first packet in the
delayed ack. For congestion control however, it seems measuring delayed
ack delay is not desirable as it varies independently of congestion.
The patch below causes seq_rtt and last_ackt to be updated with any
available later packet rtts which should have less (and hopefully zero)
delack delay. The rtt value then gets passed to ca_ops->pkts_acked().
Where TCP_CONG_RTT_STAMP was set, effort was made to supress RTTs from
within a TSO chunk (!fully_acked), using only the final ACK (which includes
any TSO delay) to generate RTTs. This patch removes these checks so RTTs
are passed for each ACK to ca_ops->pkts_acked().
For non-delay based congestion control (cubic, h-tcp), rtt is sometimes
used for rtt-scaling. In shortening the RTT, this may make them a little
less aggressive. Delay-based schemes (eg vegas, veno, illinois) should get
a cleaner, more accurate congestion signal, particularly for small cwnds.
The congestion control module can potentially also filter out bad RTTs due
to the delayed ack alarm by looking at the associated cnt which (where
delayed acking is in use) should probably be 1 if the alarm went off or
greater if the ACK was triggered by a packet.
Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 889c893..cbba288 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
u32 cnt = 0;
u32 reord = tp->packets_out;
s32 seq_rtt = -1;
+ s32 ca_seq_rtt = -1;
ktime_t last_ackt = net_invalid_timestamp();
while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
@@ -2659,6 +2660,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
u32 packets_acked;
u8 sacked = scb->sacked;
+ /* Determine how many packets and what bytes were acked, tso and else */
if (after(scb->end_seq, tp->snd_una)) {
if (tcp_skb_pcount(skb) == 1 ||
!after(tp->snd_una, scb->seq))
@@ -2686,15 +2688,16 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
if (sacked & TCPCB_SACKED_RETRANS)
tp->retrans_out -= packets_acked;
flag |= FLAG_RETRANS_DATA_ACKED;
+ ca_seq_rtt = -1;
seq_rtt = -1;
if ((flag & FLAG_DATA_ACKED) ||
(packets_acked > 1))
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
+ ca_seq_rtt = now - scb->when;
+ last_ackt = skb->tstamp;
if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
- if (fully_acked)
- last_ackt = skb->tstamp;
+ seq_rtt = ca_seq_rtt;
}
if (!(sacked & TCPCB_SACKED_ACKED))
reord = min(cnt, reord);
@@ -2709,10 +2712,10 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
!before(end_seq, tp->snd_up))
tp->urg_mode = 0;
} else {
+ ca_seq_rtt = now - scb->when;
+ last_ackt = skb->tstamp;
if (seq_rtt < 0) {
- seq_rtt = now - scb->when;
- if (fully_acked)
- last_ackt = skb->tstamp;
+ seq_rtt = ca_seq_rtt;
}
reord = min(cnt, reord);
}
@@ -2772,8 +2775,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
net_invalid_timestamp()))
rtt_us = ktime_us_delta(ktime_get_real(),
last_ackt);
- else if (seq_rtt > 0)
- rtt_us = jiffies_to_usecs(seq_rtt);
+ else if (ca_seq_rtt > 0)
+ rtt_us = jiffies_to_usecs(ca_seq_rtt);
}
ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-30 1:15 ` [PATCH/RFC] [v3] " Gavin McCullagh
2007-12-30 1:25 ` Gavin McCullagh
@ 2007-12-30 3:06 ` David Miller
2007-12-30 9:43 ` Ilpo Järvinen
2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2007-12-30 3:06 UTC (permalink / raw)
To: Gavin.McCullagh; +Cc: ilpo.jarvinen, netdev
From: Gavin McCullagh <Gavin.McCullagh@nuim.ie>
Date: Sun, 30 Dec 2007 01:15:00 +0000
> A combined patch will follow this mail.
Please send a relative patch, I've already applied your
original patch to:
kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git
and will be sending that to Linus shortly.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-30 1:25 ` Gavin McCullagh
@ 2007-12-30 3:09 ` David Miller
2007-12-30 12:20 ` Gavin McCullagh
0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2007-12-30 3:09 UTC (permalink / raw)
To: Gavin.McCullagh; +Cc: ilpo.jarvinen, netdev
Never mind about making the relative patch, I didn't want to have
to wait for you to send me that and have it block my merge of
fixes with Linus this evening.
The following is what I applied on top of your other patch:
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6fb7989..cbba288 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2660,6 +2660,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
u32 packets_acked;
u8 sacked = scb->sacked;
+ /* Determine how many packets and what bytes were acked, tso and else */
if (after(scb->end_seq, tp->snd_una)) {
if (tcp_skb_pcount(skb) == 1 ||
!after(tp->snd_una, scb->seq))
@@ -2694,10 +2695,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
ca_seq_rtt = now - scb->when;
+ last_ackt = skb->tstamp;
if (seq_rtt < 0) {
seq_rtt = ca_seq_rtt;
- if (fully_acked)
- last_ackt = skb->tstamp;
}
if (!(sacked & TCPCB_SACKED_ACKED))
reord = min(cnt, reord);
@@ -2713,10 +2713,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
tp->urg_mode = 0;
} else {
ca_seq_rtt = now - scb->when;
+ last_ackt = skb->tstamp;
if (seq_rtt < 0) {
seq_rtt = ca_seq_rtt;
- if (fully_acked)
- last_ackt = skb->tstamp;
}
reord = min(cnt, reord);
}
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-30 1:15 ` [PATCH/RFC] [v3] " Gavin McCullagh
2007-12-30 1:25 ` Gavin McCullagh
2007-12-30 3:06 ` David Miller
@ 2007-12-30 9:43 ` Ilpo Järvinen
2007-12-30 12:35 ` Gavin McCullagh
2 siblings, 1 reply; 18+ messages in thread
From: Ilpo Järvinen @ 2007-12-30 9:43 UTC (permalink / raw)
To: Gavin McCullagh; +Cc: David Miller, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1680 bytes --]
On Sun, 30 Dec 2007, Gavin McCullagh wrote:
> Hi,
>
> On Fri, 21 Dec 2007, Ilpo Järvinen wrote:
>
> > > I need to re-read properly, but I think the same problem affects the
> > > microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
> > > yeah, illinois). I might follow up with another patch which changes the
> > > behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.
> >
> > Please do, you might have to remove fully_acked checks to do that right
> > though so it won't be as straight-forward change as this one and requires
> > some amount of thinking to result in a right thing.
>
> The TCP_CONG_RTT_STAMP code does need to be fixed similarly. A combined
> patch will follow this mail. As I understand it, the fully_acked checks
> kick in where the ACK is a portion of a TSO chunk and doesn't completely
> ACK that chunk. Leaving the checks in place means you get one rtt for each
> TSO chunk, based on the ACK for the last byte of the chunk. This rtt
> therefore is the maximum available and includes the time-lag between the
> first and last chunk being acked. Removing the tests gives you an RTT
> value for each ACK in a tso chunk, including the minimum and maximum. It
> seems the minimum rtt is the best indicator of congestion. On the other
> hand having all available RTTs gives the congestion avoidance greater
> knowledge of how the RTT is evolving (albeit somewhat coloured by TSO
> delays which don't seem too severe in my tests).
I guess the non-minimum TSO delays are only significant in case there was
something unexpectional happening and in such case we definately want to
have some measurements taken.
--
i.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-30 3:09 ` David Miller
@ 2007-12-30 12:20 ` Gavin McCullagh
0 siblings, 0 replies; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-30 12:20 UTC (permalink / raw)
To: David Miller; +Cc: ilpo.jarvinen, netdev
Hi,
On Sat, 29 Dec 2007, David Miller wrote:
> Never mind about making the relative patch, I didn't want to have
> to wait for you to send me that and have it block my merge of
> fixes with Linus this evening.
Ah, sorry for the inconvenience. I didn't realise you had merged it yet.
> The following is what I applied on top of your other patch:
That looks fine.
Gavin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH/RFC] [v3] TCP: use non-delayed ACK for congestion control RTT
2007-12-30 9:43 ` Ilpo Järvinen
@ 2007-12-30 12:35 ` Gavin McCullagh
0 siblings, 0 replies; 18+ messages in thread
From: Gavin McCullagh @ 2007-12-30 12:35 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: David Miller, Netdev
On Sun, 30 Dec 2007, Ilpo Järvinen wrote:
> I guess the non-minimum TSO delays are only significant in case there was
> something unexpectional happening and in such case we definately want to
> have some measurements taken.
Broadly speaking, delay-based schemes need as much information about the
queueing delay as possible so more points are generally useful,
particularly if the delay is fluctuating rapidly.
When we started looking at delay-based schemes we had trouble with delay
information fluctuating wildly due to TSO. John Heffner made a change
(that I can't find at the minute) which reduced a TSO timeout and seemed to
reduce this problem greatly. It's worth bearing in mind though that TSO
may cause spuriously high delay measurements.
Gavin
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2007-12-30 12:35 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-17 13:44 [PATCH/RFC] TCP: use non-delayed ACK for congestion control RTT Gavin McCullagh
2007-12-18 20:40 ` [PATCH/RFC] [v2] " Gavin McCullagh
2007-12-19 10:28 ` Gavin McCullagh
2007-12-19 11:08 ` Ilpo Järvinen
2007-12-19 11:31 ` Gavin McCullagh
2007-12-19 13:30 ` Ilpo Järvinen
2007-12-21 11:14 ` David Miller
2007-12-21 13:31 ` Gavin McCullagh
2007-12-21 14:05 ` David Miller
2007-12-21 14:07 ` Ilpo Järvinen
2007-12-21 14:10 ` Ilpo Järvinen
2007-12-30 1:15 ` [PATCH/RFC] [v3] " Gavin McCullagh
2007-12-30 1:25 ` Gavin McCullagh
2007-12-30 3:09 ` David Miller
2007-12-30 12:20 ` Gavin McCullagh
2007-12-30 3:06 ` David Miller
2007-12-30 9:43 ` Ilpo Järvinen
2007-12-30 12:35 ` Gavin McCullagh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).