* snd_cwnd drawn and quartered
From: Werner Almesberger @ 2002-12-25 1:50 UTC
To: netdev; +Cc: chengjin

Hi all,

how about a little bug for Christmas ? :-)

There seems to be a bug in how fast recovery halves cwnd: Linux TCP
decrements snd_cwnd for every second incoming ACK until we leave fast
recovery. Using the NewReno procedure, recovery ends somewhere between
the time of the initial loss plus one RTT (if there are no further
losses) and about one RTT later than that (if we lose the packet sent
just before we detected the initial loss).

So, in the worst case (i.e. the drawn-out recovery), cwnd gets
decremented for every second incoming ACK during two RTTs. In the
first RTT, we get the equivalent of the old cwnd of ACKs, while in the
second RTT, we've slowed down, and get only half as many ACKs. So we
end up with cwnd being a quarter of its initial value.

Now, one could argue that this actually kind of makes sense (i.e. no
discontinuity for loss near the end of recovery), but the code quite
clearly tries to avoid this case (net/ipv4/tcp_input.c:tcp_cwnd_down):

	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
		tp->snd_cwnd -= decr;

Unfortunately, snd_ssthresh has already been halved at this point, so
the test actually does nothing. So I'd suggest changing this to

	if (decr && tp->snd_cwnd > tp->snd_ssthresh)

This was found by Cheng Jin and me in his simulator based on 2.4.18,
but the code still seems to be the same in 2.5.53.

BTW, ironically, at least gcc 3.1 generates slightly worse code if I
change the second line to

	tp->snd_cwnd--;

It may still be desirable to change this for clarity, though.

- Werner
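To make the quartering arithmetic concrete, here is a minimal stand-alone
C sketch of the decrement rule described above. It models only the
"decrement cwnd on every second ACK" behaviour with a configurable
cut-off, not the kernel's actual code path, and it uses the idealized
ACK counts from the message (one cwnd of ACKs in the first RTT, half
that in the second); the function name is made up for illustration.

	#include <stdio.h>

	/* Count down cwnd on every second ACK, stopping at "cutoff". */
	static unsigned final_cwnd(unsigned prior_cwnd, unsigned cutoff)
	{
		unsigned cwnd = prior_cwnd;
		unsigned acks = prior_cwnd + prior_cwnd / 2;	/* two RTTs of ACKs */
		unsigned i;

		for (i = 1; i <= acks; i++)
			if (i % 2 == 0 && cwnd > cutoff)	/* every second ACK */
				cwnd--;
		return cwnd;
	}

	int main(void)
	{
		unsigned prior = 100, ssthresh = prior / 2;

		printf("cut-off ssthresh/2: cwnd ends at %u\n",
		       final_cwnd(prior, ssthresh / 2));	/* prints 25 */
		printf("cut-off ssthresh:   cwnd ends at %u\n",
		       final_cwnd(prior, ssthresh));		/* prints 50 */
		return 0;
	}

With the current ssthresh/2 cut-off the drawn-out recovery ends at a
quarter of the prior cwnd; with the proposed cut-off it stops at half.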
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-02 1:38 UTC
To: Werner Almesberger; +Cc: netdev

Hello!

> 	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
> 		tp->snd_cwnd -= decr;
>
> Unfortunately, snd_ssthresh has already been halved at this
> point, so the test actually does nothing.

It does the thing which it is supposed to do: it prevents reducing cwnd
below 1/4 of the original one. This was proposed in one of the
rate-halving related drafts with a title something like "...boundary
checks..."; I forget the exact title, but can find it if you are
curious.

> 	if (decr && tp->snd_cwnd > tp->snd_ssthresh)

Maybe this is even correct, but I do not see why it can be essential.
cwnd falls too low not because of the rate-halving decrement, but
because in_flight drains out when we are not able to keep the pipe
full.

Please, show.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-02 6:08 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> It does the thing which it is supposed to do: it prevents reducing cwnd
> below 1/4 of the original one.

Okay, this it does. I guess the only case where this would make a
difference is if your network duplicates packets.

> This was proposed in one of the rate-halving related drafts with a
> title something like "...boundary checks..."; I forget the exact
> title, but can find it if you are curious.

I searched around but didn't spot anything. A pointer would be
welcome, thanks !

> Maybe this is even correct, but I do not see why it can be essential.
> cwnd falls too low not because of the rate-halving decrement, but
> because in_flight drains out when we are not able to keep the pipe full.

Yes, but rate-halving is what causes in-flight to drop in the first
place (assuming we have enough fresh data to send, of course), no ?

> Please, show.

I've put graphs of a simulation run (with and without the change) at

  http://www.almesberger.net/misc/half.eps
  http://www.almesberger.net/misc/quarter.eps

The y-axis is in segments, the x-axis is in some arbitrary time unit,
the RTT is one initial cwnd (100 packets), and the path is asymmetric,
with a zero-delay and lossless backward channel. (While unusual, this
shouldn't actually affect what TCP does in recovery.) Losses happen
right before the packet hits the receiver.

I've also asked Cheng if he can send you a copy of his simulator.

Thanks,
- Werner
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-02 8:31 UTC
To: kuznet; +Cc: netdev, chengjin

I wrote:
> Okay, this it does. I guess the only case where this would make a
> difference is if your network duplicates packets.

... and, of course, when losing retransmitted segments, oops.

- Werner
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-02 21:26 UTC
To: kuznet; +Cc: netdev, chengjin

I wrote:
> I searched around but didn't spot anything. A pointer would be
> welcome, thanks !

I think I found it. Is it "The Rate-Halving Algorithm for TCP
Congestion Control", section 5.14, "Application of Bounding
Paremeters" (sic!) ?

  http://www.psc.edu/networking/ftp/papers/draft-ratehalving.txt

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 0:12 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> I searched around but didn't spot anything. A pointer would be
> welcome, thanks !

My apologies for the silence. I see you have already found it.

> Yes, but rate-halving is what causes in-flight to drop in the first
> place (assuming we have enough fresh data to send, of course), no ?

Of course. But draining happens when you have received more ACKs than
you sent packets. When such a pathology happens we just have to do
something, or at least to understand when it happens under normal
conditions.

Probably the reason for the confusion is that original rh seems to
include some bits of "loss-sensitive recovery", so cwnd really is
supposed to shrink to cwnd/2 - lost there. See? We do not do this, and
the check for 1/4 really looks like an alien.

> I've put graphs of a simulation run (with and without the change) at
>
>   http://www.almesberger.net/misc/half.eps
>   http://www.almesberger.net/misc/quarter.eps

I see. I do not quite understand the reason though. :-) You said
something about lost retransmissions... How many retransmits were lost
in the simulation? Are you aware that each lost retransmission, if we
behaved honestly, would collapse cwnd to 1? :-)

Alexey
* Re: snd_cwnd drawn and quartered
From: Cheng Jin @ 2003-01-14 1:20 UTC
To: kuznet@ms2.inr.ac.ru; +Cc: Werner Almesberger, netdev@oss.sgi.com

> I see. I do not quite understand the reason though. :-) You said
> something about lost retransmissions... How many retransmits were lost
> in the simulation? Are you aware that each lost retransmission, if we
> behaved honestly, would collapse cwnd to 1? :-)

Yes, I am aware of this, and when this happens, cwnd will sit @ 1 until
TCP gets out of recovery (1 retransmit per RTT). It's debatable whether
cwnd should remain 1 for the rest of the recovery period even if really
severe congestion brings cwnd down to 1. For example, if there are 100
pkts to be rexmitted when cwnd gets to 1, it will take at least 100 RTTs
to leave recovery. By then, whatever network condition caused the
congestion in the first place might be long gone already, but one
wouldn't know the true state of the network with such a small cwnd.
Maybe cwnd could be clamped at some minimum value (greater than 1)
depending on ssthresh?

From what I have observed, cwnd gets reduced to 1 mainly because of
lost retransmits. I think the tcp_rs can show that. Linux TCP deals
with a lost retransmit by reducing retrans_out by one, so in_flight
ends up becoming one less. In tcp_cwnd_down, cwnd is always clamped at
in_flight, so if many rexmits are lost (over many RTTs), cwnd gets
reduced quite often. However, when cwnd is small, lost retransmits are
not reliable signals for the severity of congestion; for example, when
cwnd is 10, does losing one pkt mean 10% packet loss in the network?

Again, I am not saying that how cwnd changes is right or wrong, but it
is something that we could think about.

Thanks,

Cheng
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 1:46 UTC
To: Cheng Jin; +Cc: wa, netdev

Hello!

> Yes, I am aware of this, and when this happens, cwnd will sit @ 1 until
> TCP gets out of recovery (1 retransmit per RTT). It's debatable

Nothing to debate, really. Following the specs, a lost retransmission
must trigger an RTO timeout with a subsequent slow start starting from
1. We are _more_ liberal when SACKs are on; this really can be argued,
and I am ready to defend the liberal values. :-) But when we cannot
detect loss of a retransmission, we have to return to the standard
go-back-n behaviour.

> pkts to be rexmitted when cwnd gets to 1, it will take at least 100 RTTs to

Not a big deal. You have to wait the same 100 RTTs at the start of each
connection, even though the initial slow start _assumes_ that no
congestion is present. So, when you have experienced real congestion,
slow start is really fair. :-)

> In tcp_cwnd_down, cwnd is always clamped at in_flight, so if many
> rexmits are lost (over many RTTs), cwnd gets reduced quite often.

It is (half of) loss-sensitive recovery. Luckily, we are not mad enough
to shrink ssthresh as well. But imagine, a month ago I had a long,
boring argument with a guy who insisted that ssthresh must be shrunk
too, otherwise "cwnd grows too fast". :-)

> does losing one pkt mean 10% packet loss in the network?

Not less than 10% loss. Maybe more, but you were lucky and your packets
percolated through a deadly congested router. :-) It is the basic
assumption of congestion avoidance: each loss is considered a loss due
to congestion.

Alexey
* Re: snd_cwnd drawn and quartered
From: Cheng Jin @ 2003-01-14 1:58 UTC
To: kuznet@ms2.inr.ac.ru; +Cc: wa@almesberger.net, netdev@oss.sgi.com

> > Yes, I am aware of this, and when this happens, cwnd will sit @ 1 until
> > TCP gets out of recovery (1 retransmit per RTT). It's debatable
>
> Nothing to debate, really. Following the specs, a lost retransmission
> must trigger an RTO timeout with a subsequent slow start starting from 1.

I don't always see time-out+slow-start in this case. I have seen TCP
staying in recovery for a very long time with cwnd=1. I think Linux TCP
has various detection mechanisms for lost rexmits, so a time-out doesn't
always happen. I think there is lost rexmit detection code at the end of
tcp_sacktag_write_queue.

> We are _more_ liberal when SACKs are on; this really can be argued,
> and I am ready to defend the liberal values. :-) But when we cannot
> detect loss of a retransmission, we have to return to the standard
> go-back-n behaviour.

Yes, Linux does indeed do more aggressive loss detection than what's in
the spec. I am fine with that.

> Not a big deal. You have to wait the same 100 RTTs at the start of each
> connection, even though the initial slow start _assumes_ that no
> congestion is present. So, when you have experienced real congestion,
> slow start is really fair. :-)

I am not sure what you mean here; slow start takes much less than 100
RTTs to transmit 100 pkts. Maybe there is a misunderstanding here: when
cwnd=1 in congestion recovery, it will stay at 1 until it exits
recovery, unless some packet times out first.

> Not less than 10% loss. Maybe more, but you were lucky and your packets
> percolated through a deadly congested router. :-) It is the basic
> assumption of congestion avoidance: each loss is considered a loss due
> to congestion.

Certainly. It's difficult to know what the right cwnd should be when
this happens, and a conservative approach may be the best one to take.
I am just curious whether people can come up with better ways to set
cwnd.

Cheng
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 2:12 UTC
To: Cheng Jin; +Cc: wa, netdev

Hello!

> I am not sure what you mean here; slow start takes much less than 100
> RTTs to transmit 100 pkts. Maybe there is a misunderstanding here: when
> cwnd=1 in congestion recovery, it will stay at 1 until it exits
> recovery, unless some packet times out first.

I see! It seems you have spotted a real problem. I missed this. I feel
we should add a trigger stopping such "fast" :-) retransmit when cwnd
falls low. Maybe the same <cwnd/4 is a good criterion. I need to think;
the situation is really weird.

Could you, please, prepare a demo pseudo-tcpdump with your simulator?

> just curious whether people can come up with better ways to set cwnd.

Come up! :-)

Alexey
* Re: snd_cwnd drawn and quartered
From: Cheng Jin @ 2003-01-14 2:19 UTC
To: kuznet@ms2.inr.ac.ru; +Cc: wa@almesberger.net, netdev@oss.sgi.com

> Could you, please, prepare a demo pseudo-tcpdump with your simulator?

What would you like to see in the demo? Just the straight output from
my simulator?

Cheng
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 5:07 UTC
To: Cheng Jin; +Cc: wa, netdev

Hello!

> What would you like to see in the demo? Just the straight output from
> my simulator?

Something which will allow restoring the history, to demonstrate how it
is possible to finish in the recovery state with a cwnd of 1 and lots
of data in flight. I see that this is not impossible, but I simply
cannot figure out when this madness happens.

The format does not matter. seq/(s)ack pairs are good.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-14 4:01 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> Of course. But draining happens when you have received more ACKs than
> you sent packets. When such a pathology happens we just have to do
> something, or at least to understand when it happens under normal
> conditions.

Yes, that's a separate problem. There are (at least :-) two problems
we're looking at:

 - cwnd getting too small, probably due to "natural causes", and not
   recovering properly
 - cwnd getting reduced too much by the ssthresh/2 test

Cheng's been talking about the first one, I'm on the latter. Note that
in my case, no retransmissions are lost. There are only some additional
losses in the initial cwnd (actually, just one loss would be
sufficient), which then extend the recovery period.

I went through draft-ratehalving and compared it with what our TCP is
doing. It seems that the test is actually about 50% right :-) Below is
my analysis of the situation. How do you like the idea of adding
another variable ? :-)

Here we go:

To analyze whether changing tcp_input.c:tcp_cwnd_down from

	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)

to

	if (decr && tp->snd_cwnd > tp->snd_ssthresh)

yields valid TCP behaviour, I'm comparing the algorithm specification
in draft-ratehalving [1] with the implementation of Linux TCP. The goal
is to show whether Linux TCP still performs according to
draft-ratehalving after the change.

[1] http://www.psc.edu/networking/ftp/papers/draft-ratehalving.txt

draft-ratehalving generally aims to set cwnd (they call it rhcwnd) to
(prior_cwnd-loss)/2 (section 5.14), where "loss" is the number of
packets that have been lost from the RTT sent before we entered
recovery, and "prior_cwnd" is the cwnd at the time when we began
recovery. This is also explained in section 3.1 of RFC 2581.

For simplicity, let's assume there is no reordering, no ACK loss, and
no jumps in delay.

Without NewReno, there are two cases: if the loss is indicated by an
"exact" means (ECN, SACK), it reduces cwnd by half the distance by
which fack is advanced, plus half the size of any "holes" found via
SACK (5.6). At the end of recovery, cwnd should therefore reach
(prior_cwnd-loss)/2, as specified above.

Still without NewReno, if loss is indicated by a duplicate ACK without
SACK, cwnd is reduced by half a segment for each duplicate ACK received
(4.7). This way, cwnd will shrink to (prior_cwnd+loss)/2. (No typo -
it's "+".)

With NewReno, the algorithms are the same, but cwnd stops decrementing
at cwnd <= prior_cwnd/2 (5.12).

Once we get out of recovery, cwnd gets set to
(prior_cwnd-num_retrans)/2, where num_retrans is the number of
retransmissions in the "repair interval" (4.13). This is effectively
the number of retransmissions we needed to fix the initial loss. I'm
not entirely sure how this changes if we lose a retransmission.
RFC 2581 requires cwnd to be halved twice in this case (4.3).

At the end (4.14), draft-ratehalving forces the new cwnd below
prior_cwnd/2 (in case we didn't decrement enough, e.g. in the second
"old" Reno case). It also sets ssthresh to the new cwnd, but makes sure
it (ssthresh) does not drop below prior_cwnd/4 to ensure "that the TCP
connection is not unduly harmed by extreme network conditions" (5.14,
probably meaning reordering).

When entering congestion (tcp_input.c:tcp_fastretrans_alert), Linux TCP
sets ssthresh to roughly half cwnd (tcp.h:tcp_recalc_ssthresh). Note
that this differs from the requirement of setting ssthresh to half the
amount of data in flight.

During recovery, cwnd reduction is done by tcp_input.c:tcp_cwnd_down as
follows: snd_cwnd is decremented for every second (duplicate) ACK,
which corresponds to sections 4.7 and 4.12 of draft-ratehalving, except
that snd_cwnd reduction stops at snd_ssthresh/2 (corresponding roughly
to prior_cwnd/4) instead of prior_cwnd/2. Additionally, cwnd may be
further reduced if there are fewer than cwnd packets in flight. (This
deserves further analysis.)

The equivalent of the first part of 4.14 happens in tcp_complete_cwr:
cwnd is set to the minimum of cwnd and ssthresh, where the latter is
(roughly) prior_cwnd/2.

Raising the cut-off point for cwnd reduction to ssthresh would still
yield the cwnd decrease described in section 4.7, and the cut-off would
occur at the point described in section 4.12. Furthermore, at the end
of recovery, snd_cwnd is capped at prior_cwnd/2, which is consistent
with section 5.14. So far, so good.

Unfortunately, there are two exceptions: a loss outside the cwnd in
which the initial loss occurred (i.e. loss of data above high_seq) or
the loss of a retransmission is required to cause another halving of
cwnd.

A loss above high_seq is detected and handled as a separate loss after
the current loss episode has ended, and therefore does not need to
concern us here. However, loss of a retransmission is handled
implicitly, as follows: it extends the recovery interval to at least
2*RTT. This causes the current implementation of tcp_cwnd_down to
decrement snd_cwnd to ssthresh/2, yielding the correct result.

The most correct solution therefore seems to be to introduce yet
another TCP state variable, cwnd_bound, that limits how far
tcp_cwnd_down can decrement snd_cwnd. Initially,
tcp_input.c:tcp_clear_retrans and tcp_input.c:tcp_fastretrans_alert
would set cwnd_bound to the reduced snd_ssthresh, limiting snd_cwnd
reductions to prior_cwnd/2. If tcp_input.c:tcp_sacktag_write_queue
detects loss of a retransmission, it sets cwnd_bound to ssthresh/2,
allowing reduction down to prior_cwnd/4.

- Werner
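A minimal self-contained sketch of the cwnd_bound idea follows. The
struct, function names, and ACK counts are made up for illustration;
this is not kernel code, and it deliberately ignores the in_flight
clamp and everything else tcp_cwnd_down does besides the decrement.

	#include <stdio.h>

	struct toy_tcp {
		unsigned snd_cwnd;
		unsigned snd_ssthresh;
		unsigned cwnd_bound;	/* the proposed new state variable */
	};

	/* roughly what tcp_fastretrans_alert/tcp_clear_retrans would do */
	static void enter_recovery(struct toy_tcp *tp)
	{
		tp->snd_ssthresh = tp->snd_cwnd / 2;	/* ~ prior_cwnd/2 */
		tp->cwnd_bound = tp->snd_ssthresh;	/* allow reduction to prior_cwnd/2 */
	}

	/* roughly what tcp_sacktag_write_queue would do on a lost retransmission */
	static void lost_retransmission(struct toy_tcp *tp)
	{
		tp->cwnd_bound = tp->snd_ssthresh / 2;	/* now allow prior_cwnd/4 */
	}

	/* the decrement step of tcp_cwnd_down, with the new bound swapped in */
	static void cwnd_down_step(struct toy_tcp *tp, unsigned decr)
	{
		if (decr && tp->snd_cwnd > tp->cwnd_bound)
			tp->snd_cwnd -= decr;
	}

	int main(void)
	{
		struct toy_tcp tp = { .snd_cwnd = 100 };
		int i;

		enter_recovery(&tp);
		for (i = 0; i < 150; i++)	/* drawn-out recovery, no lost rexmit */
			cwnd_down_step(&tp, i & 1);
		printf("no lost retransmission: cwnd = %u\n", tp.snd_cwnd);	/* 50 */

		tp.snd_cwnd = 100;
		enter_recovery(&tp);
		lost_retransmission(&tp);
		for (i = 0; i < 150; i++)
			cwnd_down_step(&tp, i & 1);
		printf("lost retransmission:    cwnd = %u\n", tp.snd_cwnd);	/* 25 */
		return 0;
	}

Without a lost retransmission the drawn-out recovery stops at half the
prior cwnd; with one, the bound drops and the quartering is allowed.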
[parent not found: <200301140502.IAA10733@sex.inr.ac.ru>]
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-14 5:25 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> Losses in the same window must not result in multiplicative collapse
> of cwnd or extension of recovery period. If they do, we are really buggy.

Oh, according to NewReno, extending the recovery period is just fine.
You just need to stop counting ... And what we have is definitely
NewReno-ish, with recovery ending at high_seq.

> Which is funny and unexpected, I read the paper only after the implementation
> has been mostly finished (based on old Hoe's papers) and just added
> some colorful details, sort of this one. :-)

That explains why things are so similar but still not quite the same :-))

> It is required to stop all the smartness and to enter RTO timeout,

Okay, that also makes the ssthresh/2 fix easier, because it eliminates
the one case where the current behaviour is actually right.

> I did not get this. This limit is the boundary check which you referred
> to several sentences above. Not falling under prior_cwnd/2 does not
> need a special check, we do not receive more than prior_cwnd ACKs
> within a single recovery period in any case.

Yes yes, we do ... that's what NewReno is all about. Let's say you
lose the 0th and the 99th segment (with cwnd = 100). You'll detect
the first loss at t = 100, and enter recovery. You leave recovery
once snd_nxt at that time (stored in high_seq) has been ack'ed, so
this is 100. At time t = 199, we find out about the loss of the 99th
segment, and retransmit. This gets ack'ed at time t = 299. So it's
only then that we leave recovery.

draft-ratehalving distinguishes the adjustment interval and the
repair interval. The latter lasts until we've fixed all losses,
while the former should indeed not exceed one RTT. It's this
limitation that's missing in Linux.

> Huh. We can just shoot this silly heuristic, if it hurts. :-)

If you enter RTO upon loss of retransmission, we can, yes. (And we
don't even need a new variable ;-)

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 6:14 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> > Losses in the same window must not result in multiplicative collapse
> > of cwnd or extension of recovery period. If they do, we are really buggy.
>
> Oh, according to NewReno, extending the recovery period is just fine.

Sorry, it is just nonsense in newreno; it is just what high_seq makes
happen. Well, and this is surely not fine: losses in several
consecutive windows must result in multiplicative reduction of cwnd.

> Yes yes, we do ... that's what NewReno is all about. Let's say you
> lose the 0th and the 99th segment (with cwnd = 100). You'll detect
> the first loss at t = 100, and enter recovery. You leave recovery
> once snd_nxt at that time (stored in high_seq) has been ack'ed, so
> this is 100. At time t = 199, we find out about the loss of the 99th
> segment, and retransmit. This gets ack'ed at time t = 299. So it's
> only then that we leave recovery.

I do not understand what you have just said. You can't discover the
loss of the 99th segment _after_ the 100th was happily ACKed. :-) :-)

> draft-ratehalving distinguishes the adjustment interval and the
> repair interval. The latter lasts until we've fixed all losses,
> while the former should indeed not exceed one RTT. It's this
> limitation that's missing in Linux.

But this does not matter, because it handles the opposite case, when
cwnd was not reduced enough within one RTT, so rh falls into the hold
state to decrease cwnd smoothly. It is just an unnecessary complication
in my opinion. Did I get this wrong?

Picture, Werner! What does happen in picture quarter.eps???? That is
the question.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-14 6:36 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> Sorry, it is just nonsense in newreno; it is just what high_seq makes
> happen. Well, and this is surely not fine: losses in several
> consecutive windows must result in multiplicative reduction of cwnd.

This is precisely what NewReno does. If you lose anything within
that cwnd, recovery is extended. If you lose something after the
cwnd, you'll first finish the old recovery cycle (since snd_una has
now passed high_seq), and then enter a new one, with the appropriate
reduction of ssthresh and cwnd.

> > Yes yes, we do ... that's what NewReno is all about. Let's say you
> > lose the 0th and the 99th segment (with cwnd = 100). You'll detect
> > the first loss at t = 100, and enter recovery. You leave recovery
> > once snd_nxt at that time (stored in high_seq) has been ack'ed, so
> > this is 100. At time t = 199, we find out about the loss of the 99th
> > segment, and retransmit. This gets ack'ed at time t = 299. So it's
> > only then that we leave recovery.
>
> I do not understand what you have just said. You can't discover the
> loss of the 99th segment _after_ the 100th was happily ACKed. :-) :-)

That would be kind of evil :-) But that's not what I meant. The
100 refers to high_seq, i.e. the segment we need to get ack'ed
for leaving recovery.

> But this does not matter, because it handles the opposite case, when
> cwnd was not reduced enough within one RTT, so rh falls into the hold
> state to decrease cwnd smoothly. It is just an unnecessary complication
> in my opinion. Did I get this wrong?

Hmm, I think it's like I said. The case of cwnd not getting reduced
enough is handled when exiting repair, i.e. sections 4.13 and 4.14.
Note that those intervals don't have to be explicitly tracked in any
way. E.g. the adjustment interval can just be implemented by stopping
the decrementing of cwnd at the new ssthresh.

> Picture, Werner! What does happen in picture quarter.eps????

t = 0: we have the first loss. At this time, snd_nxt is 100, so we set
       high_seq accordingly, slash ssthresh, and enter recovery.
~98:   we have the last loss within our cwnd (the simulation just stops
       losing packets at this point; in real life, this transmission
       should of course be smoother).
100:   we've recovered our initial loss, but snd_una is still below
       high_seq, because of all the other losses in that cwnd.
~200:  we recover the last loss, and snd_una finally increases above
       high_seq, so we leave recovery now.

All that is just fine, if I interpret NewReno right. The only problem
is that we should have stopped reducing cwnd after the first RTT.

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-15 17:50 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> This is precisely what NewReno does. If you lose anything within
> that cwnd, recovery is extended.

Werner, where did you get this information? In that case recovery
will not finish. :-)

> The 100 refers to high_seq, i.e. the segment we need to get ack'ed
> for leaving recovery.

I still do not understand. Apparently it is based on the assumption of
an extension of high_seq, which must not happen.

> 100:   we've recovered our initial loss, but snd_una is still below
>        high_seq, because of all the other losses in that cwnd.

This must not happen. I did not mean this in the code and cannot see
how it can happen. high_seq is set once during a single recovery cycle.

Something is buggy.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-15 18:25 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> > This is precisely what NewReno does. If you lose anything within
> > that cwnd, recovery is extended.
>
> Werner, where did you get this information? In that case recovery
> will not finish. :-)

Maybe I used the wrong word. The sequence number we're waiting for
(high_seq) doesn't change, of course. But the recovery takes longer
than just one RTT, because it takes longer for snd_una to reach
high_seq - due to the second loss. And because recovery takes longer
than one RTT, we decrement cwnd too much.

> > 100:   we've recovered our initial loss, but snd_una is still below
> >        high_seq, because of all the other losses in that cwnd.
>
> This must not happen. I did not mean this in the code and cannot see
> how it can happen. high_seq is set once during a single recovery cycle.
> Something is buggy.

Yes, high_seq is set only once. That's okay. It's snd_una that
(correctly) takes more than one RTT to reach high_seq.

1) tcp_enter_loss sets high_seq = snd_nxt. At that time (t = 0),
   snd_una is 0, snd_nxt is 100.

2) tcp_fastretrans_alert tries to exit recovery only if snd_una
   reaches high_seq.

Am I reading this right ?

- Werner
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-15 18:43 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> Am I reading this right ?

Yes, most likely. Dumb me understood you finally. :-)

We receive some amount of dupacks for segments > high_seq before the
ACK for the retransmitted 99th segment arrives and terminates recovery.
You meant this, right? :-)

Yup, this is a bug. OK, this case is sorted out. I think that the small
fix proposed by you is enough for a beginning.

Ough... another weirdness, with cwnd drained to 1, still remains.

Alexey
* Re: snd_cwnd drawn and quartered
From: Werner Almesberger @ 2003-01-15 19:37 UTC
To: kuznet; +Cc: netdev, chengjin

kuznet@ms2.inr.ac.ru wrote:
> We receive some amount of dupacks for segments > high_seq before the
> ACK for the retransmitted 99th segment arrives and terminates recovery.
> You meant this, right? :-)

Yes, that's it.

> Yup, this is a bug. OK, this case is sorted out. I think that the small
> fix proposed by you is enough for a beginning.

I think, in addition to this fix, you also need to do RTO upon loss of
a retransmission. Otherwise, my fix would make you halve cwnd only once
in this case.

- Werner
* example showing how cwnd gets to one
From: Cheng Jin @ 2003-01-19 6:55 UTC
To: kuznet, werner; +Cc: netdev@oss.sgi.com

[-- Attachment #1: Type: TEXT/PLAIN, Size: 860 bytes --]

Hi Alexey,

Sorry about the delay in getting you the example. I wanted to clean up
the output/code a little bit. I have attached the tarball of the source
code of the simulator, plus an example showing how cwnd goes from 200
to 1. It's under 20% random packet loss. You might laugh because the
loss rate is "unrealistically" high, but on long latency paths this can
happen if the bottleneck queue is not very large.

If you only care about acks/packets sent, please do a 'grep ">>>" file'
on the example file. I have included the TCP recovery state variable to
make things clear in the output. maize is the sender, and blue is the
receiver in the simulation. tcp_time_stamp in the simulation is kept in
rounds/windows of packets, i.e., 0 corresponds to the first round where
the first loss happens, instead of jiffies.

Thanks,

Cheng

Lab # 626 395 8820

[-- Attachment #2: Type: APPLICATION/X-GZIP, Size: 22130 bytes --]
* Re: snd_cwnd drawn and quartered
From: kuznet @ 2003-01-14 0:54 UTC
To: Werner Almesberger; +Cc: netdev, chengjin

Hello!

> http://www.almesberger.net/misc/quarter.eps

So... recovery is supposed to terminate when snd.una reaches 100
(snd.nxt at the beginning of fast retransmit). In this case cwnd would
be orig_cwnd/2, as expected.

But it did not stop! Hence, something extraordinary happened during
recovery, which resulted in a second recovery. But I do not understand
why snd_ssthresh was not shrunk too. All this smells like a bug.

I do not see from the picture what this was. Can you make a
pseudo-tcpdump instead of a picture?

Alexey