* snd_cwnd drawn and quartered
From: Werner Almesberger @ 2002-12-25 1:50 UTC
To: netdev; +Cc: chengjin

Hi all,

how about a little bug for Christmas ? :-)

There seems to be a bug in how fast recovery halves cwnd: Linux TCP
decrements snd_cwnd for every second incoming ACK until we leave fast
recovery. Using the NewReno procedure, recovery ends somewhere between
the time of the initial loss plus one RTT (if there are no further
losses) and about one RTT later than that (if we lose the packet sent
just before we detected the initial loss).

So, in the worst case (i.e. the drawn-out recovery), cwnd gets
decremented for every second incoming ACK during two RTTs. In the
first RTT, we get the equivalent of the old cwnd of ACKs, while in the
second RTT, we've slowed down, and get only half as many ACKs. So we
end up with cwnd being a quarter of its initial value.

Now, one could argue that this actually kind of makes sense (i.e. no
discontinuity for loss near the end of recovery), but the code quite
clearly tries to avoid this case (net/ipv4/tcp_input.c:tcp_cwnd_down):

	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
		tp->snd_cwnd -= decr;

Unfortunately, snd_ssthresh has already been halved at this point, so
the test actually does nothing. So I'd suggest changing this to

	if (decr && tp->snd_cwnd > tp->snd_ssthresh)

This was found by Cheng Jin and me in his simulator based on 2.4.18,
but the code still seems to be the same in 2.5.53.

BTW, ironically, at least gcc 3.1 generates slightly worse code if I
change the second line to

	tp->snd_cwnd--;

It may still be desirable to change this for clarity, though.

- Werner
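To make the quartering arithmetic concrete, here is a minimal stand-alone
C sketch of the decrement rule described above. It models only the
"decrement cwnd on every second ACK" behaviour with a configurable
cut-off, not the kernel's actual code path, and it uses the idealized
ACK counts from the message (one cwnd of ACKs in the first RTT, half
that in the second); the function name is made up for illustration.

	#include <stdio.h>

	/* Count down cwnd on every second ACK, stopping at "cutoff". */
	static unsigned final_cwnd(unsigned prior_cwnd, unsigned cutoff)
	{
		unsigned cwnd = prior_cwnd;
		unsigned acks = prior_cwnd + prior_cwnd / 2;	/* two RTTs of ACKs */
		unsigned i;

		for (i = 1; i <= acks; i++)
			if (i % 2 == 0 && cwnd > cutoff)	/* every second ACK */
				cwnd--;
		return cwnd;
	}

	int main(void)
	{
		unsigned prior = 100, ssthresh = prior / 2;

		printf("cut-off ssthresh/2: cwnd ends at %u\n",
		       final_cwnd(prior, ssthresh / 2));	/* prints 25 */
		printf("cut-off ssthresh:   cwnd ends at %u\n",
		       final_cwnd(prior, ssthresh));		/* prints 50 */
		return 0;
	}

With the current ssthresh/2 cut-off the drawn-out recovery ends at a
quarter of the prior cwnd; with the proposed cut-off it stops at half.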
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-02 1:38 UTC
To: Werner Almesberger; +Cc: netdev

Hello!

> 	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
> 		tp->snd_cwnd -= decr;
>
> Unfortunately, snd_ssthresh has already been halved at this
> point, so the test actually does nothing.

It does the thing which it is supposed to do: it prevents reducing cwnd
below 1/4 of the original one. This was proposed in one of the
rate-halving related drafts with a title something like "...boundary
checks..."; I forget the exact title, but can find it if you are
curious.

> 	if (decr && tp->snd_cwnd > tp->snd_ssthresh)

Maybe this is even correct, but I do not see why it can be essential.
cwnd falls too low not because of the rate-halving decrement, but
because in_flight drains out when we are not able to keep the pipe
full.

Please, show.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-02 6:08 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> It does the thing which it is supposed to do: it prevents reducing cwnd
> below 1/4 of the original one.

Okay, this it does. I guess the only case where this would make a
difference is if your network duplicates packets.

> This was proposed in one of the rate-halving related drafts with a
> title something like "...boundary checks..."; I forget the exact
> title, but can find it if you are curious.

I searched around but didn't spot anything. A pointer would be
welcome, thanks !

> Maybe this is even correct, but I do not see why it can be essential.
> cwnd falls too low not because of the rate-halving decrement, but
> because in_flight drains out when we are not able to keep the pipe full.

Yes, but rate-halving is what causes in-flight to drop in the first
place (assuming we have enough fresh data to send, of course), no ?

> Please, show.

I've put graphs of a simulation run (with and without the change) at

  http://www.almesberger.net/misc/half.eps
  http://www.almesberger.net/misc/quarter.eps

The y-axis is in segments, the x-axis is in some arbitrary time unit,
the RTT is one initial cwnd (100 packets), and the path is asymmetric,
with a zero-delay and lossless backward channel. (While unusual, this
shouldn't actually affect what TCP does in recovery.) Losses happen
right before the packet hits the receiver.

I've also asked Cheng if he can send you a copy of his simulator.

Thanks,
- Werner
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-02 8:31 UTC
To: kuznet; +Cc: netdev, chengjin

I wrote:
> Okay, this it does. I guess the only case where this would make a
> difference is if your network duplicates packets.

... and, of course, when losing retransmitted segments, oops.

- Werner
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-02 21:26 UTC
To: kuznet; +Cc: netdev, chengjin

I wrote:
> I searched around but didn't spot anything. A pointer would be
> welcome, thanks !

I think I found it. Is it "The Rate-Halving Algorithm for TCP
Congestion Control", section 5.14, "Application of Bounding
Paremeters" (sic!) ?

  http://www.psc.edu/networking/ftp/papers/draft-ratehalving.txt

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 0:12 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> I searched around but didn't spot anything. A pointer would be
> welcome, thanks !

My apologies for the silence. I see you have already found it.

> Yes, but rate-halving is what causes in-flight to drop in the first
> place (assuming we have enough fresh data to send, of course), no ?

Of course. But draining happens when you have received more ACKs than
you sent packets. When such a pathology happens we just have to do
something, or at least to understand when it happens under normal
conditions.

Probably the reason for the confusion is that original rh seems to
include some bits of "loss-sensitive recovery", so cwnd really is
supposed to shrink to cwnd/2 - lost there. See? We do not do this, and
the check for 1/4 really looks like an alien.

> I've put graphs of a simulation run (with and without the change) at
>
>   http://www.almesberger.net/misc/half.eps
>   http://www.almesberger.net/misc/quarter.eps

I see. I do not quite understand the reason though. :-) You said
something about lost retransmissions... How many retransmits were lost
in the simulation? Are you aware that each lost retransmission, if we
behaved honestly, would collapse cwnd to 1? :-)

Alexey
* Re: snd_cwnd drawn and quartered
From: Cheng Jin @ 2003-01-14 1:20 UTC
To: kuznet@ms2.inr.ac.ru; +Cc: Werner Almesberger, netdev@oss.sgi.com

> I see. I do not quite understand the reason though. :-) You said
> something about lost retransmissions... How many retransmits were lost
> in the simulation? Are you aware that each lost retransmission, if we
> behaved honestly, would collapse cwnd to 1? :-)

Yes, I am aware of this, and when this happens, cwnd will sit @ 1 until
TCP gets out of recovery (1 retransmit per RTT). It's debatable whether
cwnd should remain 1 for the rest of the recovery period even if really
severe congestion brings cwnd down to 1. For example, if there are 100
pkts to be rexmitted when cwnd gets to 1, it will take at least 100 RTTs
to leave recovery. By then, whatever network condition caused the
congestion in the first place might be long gone already, but one
wouldn't know the true state of the network with such a small cwnd.
Maybe cwnd could be clamped at some minimum value (greater than 1)
depending on ssthresh?

From what I have observed, cwnd gets reduced to 1 mainly because of
lost retransmits. I think the tcp_rs can show that. Linux TCP deals
with a lost retransmit by reducing retrans_out by one, so in_flight
ends up becoming one less. In tcp_cwnd_down, cwnd is always clamped at
in_flight, so if many rexmits are lost (over many RTTs), cwnd gets
reduced quite often. However, when cwnd is small, lost retransmits are
not reliable signals for the severity of congestion; for example, when
cwnd is 10, does losing one pkt mean 10% packet loss in the network?

Again, I am not saying that how cwnd changes is right or wrong, but it
is something that we could think about.

Thanks,

Cheng
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 1:46 UTC
To: Cheng Jin; +Cc: wa, netdev

Hello!

> Yes, I am aware of this, and when this happens, cwnd will sit @ 1 until
> TCP gets out of recovery (1 retransmit per RTT). It's debatable

Nothing to debate, really. Following the specs, a lost retransmission
must trigger an RTO timeout with a subsequent slow start starting from
1. We are _more_ liberal when SACKs are on; this really can be argued,
and I am ready to defend the liberal values. :-) But when we cannot
detect loss of a retransmission, we have to return to the standard
go-back-n behaviour.

> pkts to be rexmitted when cwnd gets to 1, it will take at least 100 RTTs to

Not a big deal. You have to wait the same 100 RTTs at the start of each
connection, even though the initial slow start _assumes_ that no
congestion is present. So, when you have experienced real congestion,
slow start is really fair. :-)

> In tcp_cwnd_down, cwnd is always clamped at in_flight, so if many
> rexmits are lost (over many RTTs), cwnd gets reduced quite often.

It is (half of) loss-sensitive recovery. Luckily, we are not mad enough
to shrink ssthresh as well. But imagine, a month ago I had a long,
boring argument with a guy who insisted that ssthresh must be shrunk
too, otherwise "cwnd grows too fast". :-)

> does losing one pkt mean 10% packet loss in the network?

Not less than 10% loss. Maybe more, but you were lucky and your packets
percolated through a deadly congested router. :-) It is the basic
assumption of congestion avoidance: each loss is considered a loss due
to congestion.

Alexey
* Re: snd_cwnd drawn and quartered
From: Cheng Jin @ 2003-01-14 1:58 UTC
To: kuznet@ms2.inr.ac.ru; +Cc: wa@almesberger.net, netdev@oss.sgi.com

> > Yes, I am aware of this, and when this happens, cwnd will sit @ 1 until
> > TCP gets out of recovery (1 retransmit per RTT). It's debatable
>
> Nothing to debate, really. Following the specs, a lost retransmission
> must trigger an RTO timeout with a subsequent slow start starting from 1.

I don't always see time-out+slow-start in this case. I have seen TCP
staying in recovery for a very long time with cwnd=1. I think Linux TCP
has various detection mechanisms for lost rexmits, so a time-out doesn't
always happen. I think there is lost rexmit detection code at the end of
tcp_sacktag_write_queue.

> We are _more_ liberal when SACKs are on; this really can be argued,
> and I am ready to defend the liberal values. :-) But when we cannot
> detect loss of a retransmission, we have to return to the standard
> go-back-n behaviour.

Yes, Linux does indeed do more aggressive loss detection than what's in
the spec. I am fine with that.

> Not a big deal. You have to wait the same 100 RTTs at the start of each
> connection, even though the initial slow start _assumes_ that no
> congestion is present. So, when you have experienced real congestion,
> slow start is really fair. :-)

I am not sure what you mean here; slow start takes much less than 100
RTTs to transmit 100 pkts. Maybe there is a misunderstanding here: when
cwnd=1 in congestion recovery, it will stay at 1 until it exits
recovery, unless some packet times out first.

> Not less than 10% loss. Maybe more, but you were lucky and your packets
> percolated through a deadly congested router. :-) It is the basic
> assumption of congestion avoidance: each loss is considered a loss due
> to congestion.

Certainly. It's difficult to know what the right cwnd should be when
this happens, and a conservative approach may be the best one to take.
I am just curious whether people can come up with better ways to set
cwnd.

Cheng
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 2:12 UTC
To: Cheng Jin; +Cc: wa, netdev

Hello!

> I am not sure what you mean here; slow start takes much less than 100
> RTTs to transmit 100 pkts. Maybe there is a misunderstanding here: when
> cwnd=1 in congestion recovery, it will stay at 1 until it exits
> recovery, unless some packet times out first.

I see! It seems you have spotted a real problem. I missed this. I feel
we should add a trigger stopping such "fast" :-) retransmit when cwnd
falls low. Maybe the same <cwnd/4 is a good criterion. I need to think;
the situation is really weird.

Could you, please, prepare a demo pseudo-tcpdump with your simulator?

> just curious whether people can come up with better ways to set cwnd.

Come up! :-)

Alexey
* Re: snd_cwnd drawn and quartered
From: Cheng Jin @ 2003-01-14 2:19 UTC
To: kuznet@ms2.inr.ac.ru; +Cc: wa@almesberger.net, netdev@oss.sgi.com

> Could you, please, prepare a demo pseudo-tcpdump with your simulator?

What would you like to see in the demo? Just the straight output from
my simulator?

Cheng
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 5:07 UTC
To: Cheng Jin; +Cc: wa, netdev

Hello!

> What would you like to see in the demo? Just the straight output from
> my simulator?

Something which will allow restoring the history, to demonstrate how it
is possible to finish in the recovery state with a cwnd of 1 and lots
of data in flight. I see that this is not impossible, but I simply
cannot figure out when this madness happens.

The format does not matter. seq/(s)ack pairs are good.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-14 4:01 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> Of course. But draining happens when you have received more ACKs than
> you sent packets. When such a pathology happens we just have to do
> something, or at least to understand when it happens under normal
> conditions.

Yes, that's a separate problem. There are (at least :-) two problems
we're looking at:

 - cwnd getting too small, probably due to "natural causes", and not
   recovering properly
 - cwnd getting reduced too much by the ssthresh/2 test

Cheng's been talking about the first one, I'm on the latter. Note that
in my case, no retransmissions are lost. There are only some additional
losses in the initial cwnd (actually, just one loss would be
sufficient), which then extend the recovery period.

I went through draft-ratehalving and compared it with what our TCP is
doing. It seems that the test is actually about 50% right :-) Below is
my analysis of the situation. How do you like the idea of adding
another variable ? :-)

Here we go:

To analyze whether changing tcp_input.c:tcp_cwnd_down from

	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)

to

	if (decr && tp->snd_cwnd > tp->snd_ssthresh)

yields valid TCP behaviour, I'm comparing the algorithm specification
in draft-ratehalving [1] with the implementation of Linux TCP. The goal
is to show whether Linux TCP still performs according to
draft-ratehalving after the change.

[1] http://www.psc.edu/networking/ftp/papers/draft-ratehalving.txt

draft-ratehalving generally aims to set cwnd (they call it rhcwnd) to
(prior_cwnd-loss)/2 (section 5.14), where "loss" is the number of
packets that have been lost from the RTT sent before we entered
recovery, and "prior_cwnd" is the cwnd at the time when we began
recovery. This is also explained in section 3.1 of RFC 2581.

For simplicity, let's assume there is no reordering, no ACK loss, and
no jumps in delay.

Without NewReno, there are two cases: if the loss is indicated by an
"exact" means (ECN, SACK), it reduces cwnd by half the distance by
which fack is advanced, plus half the size of any "holes" found via
SACK (5.6). At the end of recovery, cwnd should therefore reach
(prior_cwnd-loss)/2, as specified above.

Still without NewReno, if loss is indicated by a duplicate ACK without
SACK, cwnd is reduced by half a segment for each duplicate ACK received
(4.7). This way, cwnd will shrink to (prior_cwnd+loss)/2. (No typo -
it's "+".)

With NewReno, the algorithms are the same, but cwnd stops decrementing
at cwnd <= prior_cwnd/2 (5.12).

Once we get out of recovery, cwnd gets set to
(prior_cwnd-num_retrans)/2, where num_retrans is the number of
retransmissions in the "repair interval" (4.13). This is effectively
the number of retransmissions we needed to fix the initial loss. I'm
not entirely sure how this changes if we lose a retransmission.
RFC 2581 requires cwnd to be halved twice in this case (4.3).

At the end (4.14), draft-ratehalving forces the new cwnd below
prior_cwnd/2 (in case we didn't decrement enough, e.g. in the second
"old" Reno case). It also sets ssthresh to the new cwnd, but makes sure
it (ssthresh) does not drop below prior_cwnd/4 to ensure "that the TCP
connection is not unduly harmed by extreme network conditions" (5.14,
probably meaning reordering).

When entering congestion (tcp_input.c:tcp_fastretrans_alert), Linux TCP
sets ssthresh to roughly half cwnd (tcp.h:tcp_recalc_ssthresh). Note
that this differs from the requirement of setting ssthresh to half the
amount of data in flight.

During recovery, cwnd reduction is done by tcp_input.c:tcp_cwnd_down as
follows: snd_cwnd is decremented for every second (duplicate) ACK,
which corresponds to sections 4.7 and 4.12 of draft-ratehalving, except
that snd_cwnd reduction stops at snd_ssthresh/2 (corresponding roughly
to prior_cwnd/4) instead of prior_cwnd/2. Additionally, cwnd may be
further reduced if there are fewer than cwnd packets in flight. (This
deserves further analysis.)

The equivalent of the first part of 4.14 happens in tcp_complete_cwr:
cwnd is set to the minimum of cwnd and ssthresh, where the latter is
(roughly) prior_cwnd/2.

Raising the cut-off point for cwnd reduction to ssthresh would still
yield the cwnd decrease described in section 4.7, and the cut-off would
occur at the point described in section 4.12. Furthermore, at the end
of recovery, snd_cwnd is capped at prior_cwnd/2, which is consistent
with section 5.14. So far, so good.

Unfortunately, there are two exceptions: a loss outside the cwnd in
which the initial loss occurred (i.e. loss of data above high_seq) or
the loss of a retransmission is required to cause another halving of
cwnd.

A loss above high_seq is detected and handled as a separate loss after
the current loss episode has ended, and therefore does not need to
concern us here. However, loss of a retransmission is handled
implicitly, as follows: it extends the recovery interval to at least
2*RTT. This causes the current implementation of tcp_cwnd_down to
decrement snd_cwnd to ssthresh/2, yielding the correct result.

The most correct solution therefore seems to be to introduce yet
another TCP state variable, cwnd_bound, that limits how far
tcp_cwnd_down can decrement snd_cwnd. Initially,
tcp_input.c:tcp_clear_retrans and tcp_input.c:tcp_fastretrans_alert
would set cwnd_bound to the reduced snd_ssthresh, limiting snd_cwnd
reductions to prior_cwnd/2. If tcp_input.c:tcp_sacktag_write_queue
detects loss of a retransmission, it sets cwnd_bound to ssthresh/2,
allowing reduction down to prior_cwnd/4.

- Werner
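A minimal self-contained sketch of the cwnd_bound idea follows. The
struct, function names, and ACK counts are made up for illustration;
this is not kernel code, and it deliberately ignores the in_flight
clamp and everything else tcp_cwnd_down does besides the decrement.

	#include <stdio.h>

	struct toy_tcp {
		unsigned snd_cwnd;
		unsigned snd_ssthresh;
		unsigned cwnd_bound;	/* the proposed new state variable */
	};

	/* roughly what tcp_fastretrans_alert/tcp_clear_retrans would do */
	static void enter_recovery(struct toy_tcp *tp)
	{
		tp->snd_ssthresh = tp->snd_cwnd / 2;	/* ~ prior_cwnd/2 */
		tp->cwnd_bound = tp->snd_ssthresh;	/* allow reduction to prior_cwnd/2 */
	}

	/* roughly what tcp_sacktag_write_queue would do on a lost retransmission */
	static void lost_retransmission(struct toy_tcp *tp)
	{
		tp->cwnd_bound = tp->snd_ssthresh / 2;	/* now allow prior_cwnd/4 */
	}

	/* the decrement step of tcp_cwnd_down, with the new bound swapped in */
	static void cwnd_down_step(struct toy_tcp *tp, unsigned decr)
	{
		if (decr && tp->snd_cwnd > tp->cwnd_bound)
			tp->snd_cwnd -= decr;
	}

	int main(void)
	{
		struct toy_tcp tp = { .snd_cwnd = 100 };
		int i;

		enter_recovery(&tp);
		for (i = 0; i < 150; i++)	/* drawn-out recovery, no lost rexmit */
			cwnd_down_step(&tp, i & 1);
		printf("no lost retransmission: cwnd = %u\n", tp.snd_cwnd);	/* 50 */

		tp.snd_cwnd = 100;
		enter_recovery(&tp);
		lost_retransmission(&tp);
		for (i = 0; i < 150; i++)
			cwnd_down_step(&tp, i & 1);
		printf("lost retransmission:    cwnd = %u\n", tp.snd_cwnd);	/* 25 */
		return 0;
	}

Without a lost retransmission the drawn-out recovery stops at half the
prior cwnd; with one, the bound drops and the quartering is allowed.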
[parent not found: <200301140502.IAA10733@sex.inr.ac.ru>]
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-14 5:25 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> Losses in the same window must not result in multiplicative collapse
> of cwnd or extension of recovery period. If they do, we are really buggy.

Oh, according to NewReno, extending the recovery period is just fine.
You just need to stop counting ... And what we have is definitely
NewReno-ish, with recovery ending at high_seq.

> Which is funny and unexpected, I read the paper only after the implementation
> has been mostly finished (based on old Hoe's papers) and just added
> some colorful details, sort of this one. :-)

That explains why things are so similar but still not quite the same :-))

> It is required to stop all the smartness and to enter RTO timeout,

Okay, that also makes the ssthresh/2 fix easier, because it eliminates
the one case where the current behaviour is actually right.

> I did not get this. This limit is the boundary check which you referred
> to several sentences above. Not falling under prior_cwnd/2 does not
> need a special check, we do not receive more than prior_cwnd ACKs
> within a single recovery period in any case.

Yes yes, we do ... that's what NewReno is all about. Let's say you
lose the 0th and the 99th segment (with cwnd = 100). You'll detect
the first loss at t = 100, and enter recovery. You leave recovery
once snd_nxt at that time (stored in high_seq) has been ack'ed, so
this is 100. At time t = 199, we find out about the loss of the 99th
segment, and retransmit. This gets ack'ed at time t = 299. So it's
only then that we leave recovery.

draft-ratehalving distinguishes the adjustment interval and the
repair interval. The latter lasts until we've fixed all losses,
while the former should indeed not exceed one RTT. It's this
limitation that's missing in Linux.

> Huh. We can just shoot this silly heuristic, if it hurts. :-)

If you enter RTO upon loss of retransmission, we can, yes. (And we
don't even need a new variable ;-)

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 6:14 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> > Losses in the same window must not result in multiplicative collapse
> > of cwnd or extension of recovery period. If they do, we are really buggy.
>
> Oh, according to NewReno, extending the recovery period is just fine.

Sorry, it is just nonsense in newreno; it is just what high_seq makes
happen. Well, and this is surely not fine: losses in several
consecutive windows must result in multiplicative reduction of cwnd.

> Yes yes, we do ... that's what NewReno is all about. Let's say you
> lose the 0th and the 99th segment (with cwnd = 100). You'll detect
> the first loss at t = 100, and enter recovery. You leave recovery
> once snd_nxt at that time (stored in high_seq) has been ack'ed, so
> this is 100. At time t = 199, we find out about the loss of the 99th
> segment, and retransmit. This gets ack'ed at time t = 299. So it's
> only then that we leave recovery.

I do not understand what you have just said. You can't discover the
loss of the 99th segment _after_ the 100th was happily ACKed. :-) :-)

> draft-ratehalving distinguishes the adjustment interval and the
> repair interval. The latter lasts until we've fixed all losses,
> while the former should indeed not exceed one RTT. It's this
> limitation that's missing in Linux.

But this does not matter, because it handles the opposite case, when
cwnd was not reduced enough within one RTT, so rh falls into the hold
state to decrease cwnd smoothly. It is just an unnecessary complication
in my opinion. Did I get this wrong?

Picture, Werner! What does happen in picture quarter.eps???? That is
the question.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-14 6:36 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> Sorry, it is just nonsense in newreno; it is just what high_seq makes
> happen. Well, and this is surely not fine: losses in several
> consecutive windows must result in multiplicative reduction of cwnd.

This is precisely what NewReno does. If you lose anything within
that cwnd, recovery is extended. If you lose something after the
cwnd, you'll first finish the old recovery cycle (since snd_una has
now passed high_seq), and then enter a new one, with the appropriate
reduction of ssthresh and cwnd.

> > Yes yes, we do ... that's what NewReno is all about. Let's say you
> > lose the 0th and the 99th segment (with cwnd = 100). You'll detect
> > the first loss at t = 100, and enter recovery. You leave recovery
> > once snd_nxt at that time (stored in high_seq) has been ack'ed, so
> > this is 100. At time t = 199, we find out about the loss of the 99th
> > segment, and retransmit. This gets ack'ed at time t = 299. So it's
> > only then that we leave recovery.
>
> I do not understand what you have just said. You can't discover the
> loss of the 99th segment _after_ the 100th was happily ACKed. :-) :-)

That would be kind of evil :-) But that's not what I meant. The
100 refers to high_seq, i.e. the segment we need to get ack'ed
for leaving recovery.

> But this does not matter, because it handles the opposite case, when
> cwnd was not reduced enough within one RTT, so rh falls into the hold
> state to decrease cwnd smoothly. It is just an unnecessary complication
> in my opinion. Did I get this wrong?

Hmm, I think it's like I said. The case of cwnd not getting reduced
enough is handled when exiting repair, i.e. sections 4.13 and 4.14.
Note that those intervals don't have to be explicitly tracked in any
way. E.g. the adjustment interval can just be implemented by stopping
the decrementing of cwnd at the new ssthresh.

> Picture, Werner! What does happen in picture quarter.eps????

t = 0: we have the first loss. At this time, snd_nxt is 100, so we set
       high_seq accordingly, slash ssthresh, and enter recovery.
~98:   we have the last loss within our cwnd (the simulation just stops
       losing packets at this point; in real life, this transmission
       should of course be smoother).
100:   we've recovered our initial loss, but snd_una is still below
       high_seq, because of all the other losses in that cwnd.
~200:  we recover the last loss, and snd_una finally increases above
       high_seq, so we leave recovery now.

All that is just fine, if I interpret NewReno right. The only problem
is that we should have stopped reducing cwnd after the first RTT.

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-15 17:50 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> This is precisely what NewReno does. If you lose anything within
> that cwnd, recovery is extended.

Werner, where did you get this information? In that case recovery
will not finish. :-)

> The 100 refers to high_seq, i.e. the segment we need to get ack'ed
> for leaving recovery.

I still do not understand. Apparently it is based on the assumption of
an extension of high_seq, which must not happen.

> 100:   we've recovered our initial loss, but snd_una is still below
>        high_seq, because of all the other losses in that cwnd.

This must not happen. I did not mean this in the code and cannot see
how it can happen. high_seq is set once during a single recovery cycle.

Something is buggy.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-15 18:25 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> > This is precisely what NewReno does. If you lose anything within
> > that cwnd, recovery is extended.
>
> Werner, where did you get this information? In that case recovery
> will not finish. :-)

Maybe I used the wrong word. The sequence number we're waiting for
(high_seq) doesn't change, of course. But the recovery takes longer
than just one RTT, because it takes longer for snd_una to reach
high_seq - due to the second loss. And because recovery takes longer
than one RTT, we decrement cwnd too much.

> > 100:   we've recovered our initial loss, but snd_una is still below
> >        high_seq, because of all the other losses in that cwnd.
>
> This must not happen. I did not mean this in the code and cannot see
> how it can happen. high_seq is set once during a single recovery cycle.
> Something is buggy.

Yes, high_seq is set only once. That's okay. It's snd_una that
(correctly) takes more than one RTT to reach high_seq.

1) tcp_enter_loss sets high_seq = snd_nxt. At that time (t = 0),
   snd_una is 0, snd_nxt is 100.

2) tcp_fastretrans_alert tries to exit recovery only if snd_una
   reaches high_seq.

Am I reading this right ?

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-15 18:43 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> Am I reading this right ?

Yes, most likely. Dumb me understood you finally. :-)

We receive some amount of dupacks for segments > high_seq before the
ACK for the retransmitted 99th segment arrives and terminates recovery.
You meant this, right? :-)

Yup, this is a bug. OK, this case is sorted out. I think that the small
fix proposed by you is enough for a beginning.

Ough... another weirdness, with cwnd drained to 1, still remains.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-15 19:37 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> We receive some amount of dupacks for segments > high_seq before the
> ACK for the retransmitted 99th segment arrives and terminates recovery.
> You meant this, right? :-)

Yes, that's it.

> Yup, this is a bug. OK, this case is sorted out. I think that the small
> fix proposed by you is enough for a beginning.

I think, in addition to this fix, you also need to do RTO upon loss of
a retransmission. Otherwise, my fix would make you halve cwnd only once
in this case.

- Werner
* example showing how cwnd gets to one
From: Cheng Jin @ 2003-01-19 6:55 UTC
To: kuznet, werner; +Cc: netdev@oss.sgi.com

[-- Attachment #1: Type: TEXT/PLAIN, Size: 860 bytes --]

Hi Alexey,

Sorry about the delay in getting you the example. I wanted to clean up
the output/code a little bit. I have attached the tarball of the source
code of the simulator, plus an example showing how cwnd goes from 200
to 1. It's under 20% random packet loss. You might laugh because the
loss rate is "unrealistically" high, but on long latency paths this can
happen if the bottleneck queue is not very large.

If you only care about acks/packets sent, please do a 'grep ">>>" file'
on the example file. I have included the TCP recovery state variable to
make things clear in the output. maize is the sender, and blue is the
receiver in the simulation. tcp_time_stamp in the simulation is kept in
rounds/windows of packets, i.e., 0 corresponds to the first round where
the first loss happens, instead of jiffies.

Thanks,

Cheng

Lab # 626 395 8820

[-- Attachment #2: Type: APPLICATION/X-GZIP, Size: 22130 bytes --]
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 0:54 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> http://www.almesberger.net/misc/quarter.eps

So... recovery is supposed to terminate when snd.una reaches 100
(snd.nxt at the beginning of fast retransmit). In this case cwnd would
be orig_cwnd/2, as expected.

But it did not stop! Hence, something extraordinary happened during
recovery, which resulted in a second recovery. But I do not understand
why snd_ssthresh was not shrunk too. All this smells like a bug.

I do not see from the picture what this was. Can you make a
pseudo-tcpdump instead of a picture?

Alexey