TCP IPv4 strange retransmits

netdev.vger.kernel.org archive mirror

* TCP IPv4 strange retransmits
@ 2008-03-04 13:00 Arnd Hannemann
  2008-03-04 13:36 ` Ilpo Järvinen
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Hannemann @ 2008-03-04 13:00 UTC
  To: Netdev

Hi,

I'm observing some retransmits with kernel 2.6.24.2, which I don't understand.
For instance in this cutout[1] of a sequence diagram which was captured[2]
on the TCP sender, 4 retransmits are made.
According to netstat -st output[3][4] all those 4 retransmits were "fast retransmit".
But there are no three DUPACKs which I expected would be needed for fast retransmit?

Also interesting all retransmits happen _after_ those segments were
already acked and sacked, internal queuing or latency issues?

It would be great if somebody could shed some light on this,
why those segments are retransmitted.
Dumps and xplots are available here[5].

Scenario details:

192.168.0.5 <---------------\ /------------------> 192.168.0.7
	                     |
   [ tc qdisc add dev wldev root netem delay 10ms reorder 25% ]
                             |
                             |
                         192.168.0.6

192.168.0.5 establishes connection to 192.168.0.7 via 192.168.0.6.
Then a bulk TCP transfer was performed from 192.168.0.5 to 192.168.0.7 for 500 ms.
Default tcp configuration of 2.6.24.2 was used (cc=cubic).
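To see why this setup produces SACK/DUPACK patterns at all, here is a simplified model of the netem rule above. It assumes, as the tc-netem documentation describes for `delay 10ms reorder 25%`, that each packet is either sent immediately (25% of packets) or held for the 10 ms delay. Queueing and link serialization are ignored, the per-packet decision is passed in as an array so the result is deterministic, and the function name is hypothetical, not part of netem.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of "netem delay D reorder P%": a packet either leaves
 * immediately (immediate[i] != 0) or after delay_ms. Queueing and link
 * serialization are ignored. Writes each packet's arrival time into
 * arrival_ms and returns how many packets arrive before at least one
 * packet that was sent earlier (i.e. arrive out of order). */
size_t netem_model_arrivals(const long *send_ms, const int *immediate,
                            size_t n, long delay_ms, long *arrival_ms)
{
    size_t reordered = 0;
    long latest = 0; /* latest arrival among earlier-sent packets */

    assert(delay_ms >= 0);
    for (size_t i = 0; i < n; i++) {
        arrival_ms[i] = send_ms[i] + (immediate[i] ? 0 : delay_ms);
        if (i > 0 && arrival_ms[i] < latest)
            reordered++;
        if (i == 0 || arrival_ms[i] > latest)
            latest = arrival_ms[i];
    }
    return reordered;
}
```

For example, with five packets sent 1 ms apart and only the third one sent immediately, it arrives several milliseconds ahead of the two packets sent before it, so the receiver sees a hole and answers with a duplicate ACK carrying a SACK block.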

Best regards,
Arnd

[1] http://www.umic-mesh.net/~hannemann/strange-reorder/strange_reorder.png
[2] http://www.umic-mesh.net/~hannemann/strange-reorder/sender.dump
[3] http://www.umic-mesh.net/~hannemann/strange-reorder/netstat-sender.before
[4] http://www.umic-mesh.net/~hannemann/strange-reorder/netstat-sender.after
[5] http://www.umic-mesh.net/~hannemann/strange-reorder/


* Re: TCP IPv4 strange retransmits
  2008-03-04 13:00 TCP IPv4 strange retransmits Arnd Hannemann
@ 2008-03-04 13:36 ` Ilpo Järvinen
  2008-03-04 14:31   ` Arnd Hannemann
  0 siblings, 1 reply; 11+ messages in thread
From: Ilpo Järvinen @ 2008-03-04 13:36 UTC
  To: Arnd Hannemann; +Cc: Netdev

On Tue, 4 Mar 2008, Arnd Hannemann wrote:

> I'm observing some retransmits with kernel 2.6.24.2, which I don't 
> understand. For instance in this cutout[1] of a sequence diagram which 
> was captured[2] on the TCP sender, 4 retransmits are made.

They don't correspond to each other?

> According to netstat -st output[3][4] all those 4 retransmits were "fast 
> retransmit".
> But there are no three DUPACKs which I expected would be needed for fast 
> retransmit?

With FACK it's enough that you have fackets_out > tp->reordering 
(=dupThresh).
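As a rough illustration of the difference, here is a sketch comparing the classic trigger of three duplicate ACKs with the FACK-style condition quoted above (`fackets_out > tp->reordering`). The function names are hypothetical, and the kernel's real recovery decision checks more conditions than this single comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Classic trigger: enter fast retransmit once the number of duplicate
 * ACKs reaches the duplicate-ACK threshold (3 by default). */
bool classic_dupack_trigger(unsigned dupacks, unsigned dupthresh)
{
    assert(dupthresh > 0);
    return dupacks >= dupthresh;
}

/* FACK-style trigger as described above: recovery may start as soon as
 * the forward-most SACKed data lies more than `reordering` segments
 * beyond snd_una, even if only one ACK carried that SACK block. */
bool fack_trigger(unsigned fackets_out, unsigned reordering)
{
    return fackets_out > reordering;
}
```

A single duplicate ACK whose SACK block lies about six segments beyond snd_una gives `classic_dupack_trigger(1, 3) == false` but `fack_trigger(6, 3) == true`, which is why three DUPACKs need not be visible in the dump.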

> Also interesting all retransmits happen _after_ those segments were
> already acked and sacked, internal queuing or latency issues?

I think your viewer is doing something wrong, sender.dump is not giving 
such information (or you draw that from wrong end?). Or it just draws
DSACK like that?

> It would be great if somebody could shed some light on this,
> why those segments are retransmitted.
> Dumps and xplots are available here[5].

...I quickly glanced over it and found no strange behavior in
the sender.dump.

-- 
 i.


* Re: TCP IPv4 strange retransmits
  2008-03-04 13:36 ` Ilpo Järvinen
@ 2008-03-04 14:31   ` Arnd Hannemann
  2008-03-04 21:04     ` H. Willstrand
  2008-03-04 21:07     ` Ilpo Järvinen
  0 siblings, 2 replies; 11+ messages in thread
From: Arnd Hannemann @ 2008-03-04 14:31 UTC
  To: Ilpo Järvinen; +Cc: Netdev

Hi,

Ilpo Järvinen wrote:
> On Tue, 4 Mar 2008, Arnd Hannemann wrote:
> 
>> I'm observing some retransmits with kernel 2.6.24.2, which I don't 
>> understand. For instance in this cutout[1] of a sequence diagram which 
>> was captured[2] on the TCP sender, 4 retransmits are made.
> 
> They don't correspond to each other?

Hmm, they should.

> 
>> According to netstat -st output[3][4] all those 4 retransmits were "fast 
>> retransmit".
>> But there are no three DUPACKs which I expected would be needed for fast 
>> retransmit?
> 
> With FACK it's enough that you have fackets_out > tp->reordering 
> (=dupThresh).

If it is FACK shouldn't it be accounted for LINUX_MIB_TCPFORWARDRETRANS
instead of LINUX_MIB_TCPFASTRETRANS?

> 
>> Also interesting all retransmits happen _after_ those segments were
>> already acked and sacked, internal queuing or latency issues?
> 
> I think your viewer is doing something wrong, sender.dump is not giving 
> such information (or you draw that from wrong end?). Or it just draws
> DSACK like that?

Viewer is tcptrace and xplot. So nothing special at all.
You see it also in wireshark, if you draw a sequence diagram.
You also see it in wireshark if you sort by capture timestamp. I always thought
that capture timestamp order is correct and not dump order, but maybe I'm wrong?
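One way to check whether a capture's file (dump) order and its timestamp order disagree is to scan the records in file order and flag every record whose timestamp is earlier than the one before it. A minimal sketch, assuming the timestamps have already been extracted as microseconds; the function name is hypothetical and no pcap library is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count "time leaps": records whose capture timestamp (in microseconds)
 * is earlier than that of the record preceding them in file order.
 * A non-zero result means sorting by timestamp changes the order
 * relative to the order in which records were written to the dump. */
size_t count_time_leaps(const int64_t *ts_us, size_t n)
{
    size_t leaps = 0;

    assert(n == 0 || ts_us != NULL);
    for (size_t i = 1; i < n; i++)
        if (ts_us[i] < ts_us[i - 1])
            leaps++;
    return leaps;
}
```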

Tcpdump:

12:08:20.667538 IP 192.168.0.7.33824 > 192.168.0.5.50139: . ack 23485 win 22720 <nop,nop,timestamp 969759 972885,nop,nop,sack 2 {24905:26325}{27745:29165}>
^^^^^ got acked at .667538

12:08:20.646749 IP 192.168.0.5.50139 > 192.168.0.7.33824: . 22065:23485(1420) ack 1 win 2864 <nop,nop,timestamp 972885 969754>
^^^^^ got retransmitted at .646749

> 
>> It would be great if somebody could shed some light on this,
>> why those segments are retransmitted.
>> Dumps and xplots are available here[5].
> 
> ...I quickly glanced over it and found no strange behavior in
> the sender.dump.

Best regards,
Arnd



* Re: TCP IPv4 strange retransmits
  2008-03-04 14:31   ` Arnd Hannemann
@ 2008-03-04 21:04     ` H. Willstrand
  2008-03-04 22:41       ` Arnd Hannemann
  2008-03-04 21:07     ` Ilpo Järvinen
  1 sibling, 1 reply; 11+ messages in thread
From: H. Willstrand @ 2008-03-04 21:04 UTC
  To: Arnd Hannemann; +Cc: Ilpo Järvinen, Netdev

On Tue, Mar 4, 2008 at 3:31 PM, Arnd Hannemann
<hannemann@nets.rwth-aachen.de> wrote:
> Hi,
>
>
>  Ilpo Järvinen wrote:
>  > On Tue, 4 Mar 2008, Arnd Hannemann wrote:
>  >
>  >> I'm observing some retransmits with kernel 2.6.24.2, which I don't
>  >> understand. For instance in this cutout[1] of a sequence diagram which
>  >> was captured[2] on the TCP sender, 4 retransmits are made.
>  >
>  > They don't correspond to each other?
>
>  Hmm, they should.
>
>
>  >
>  >> According to netstat -st output[3][4] all those 4 retransmits were "fast
>  >> retransmit".
>  >> But there are no three DUPACKs which I expected would be needed for fast
>  >> retransmit?
>  >
>  > With FACK it's enough that you have fackets_out > tp->reordering
>  > (=dupThresh).
>
>  If it is FACK shouldn't it be accounted for LINUX_MIB_TCPFORWARDRETRANS
>  instead of LINUX_MIB_TCPFASTRETRANS?
>
>
>  >
>  >> Also interesting all retransmits happen _after_ those segments were
>  >> already acked and sacked, internal queuing or latency issues?
>  >
>  > I think your viewer is doing something wrong, sender.dump is not giving
>  > such information (or you draw that from wrong end?). Or it just draws
>  > DSACK like that?
>
>  Viewer is tcptrace and xplot. So nothing special at all.
>  You see it also in wireshark, if you draw a sequence diagram.
>  You also see it in wireshark if you sort by capture timestamp. I always thought
>  that capture timestamp order is correct and not dump order, but maybe I'm wrong?
>
>  Tcpdump:
>
>  12:08:20.667538 IP 192.168.0.7.33824 > 192.168.0.5.50139: . ack 23485 win 22720 <nop,nop,timestamp 969759 972885,nop,nop,sack 2 {24905:26325}{27745:29165}>
>  ^^^^^ got acked at .667538
>
>  12:08:20.646749 IP 192.168.0.5.50139 > 192.168.0.7.33824: . 22065:23485(1420) ack 1 win 2864 <nop,nop,timestamp 972885 969754>
>  ^^^^^ got retransmitted at .646749
>
>
>  >
>  >> It would be great if somebody could shed some light on this,
>  >> why those segments are retransmitted.
>  >> Dumps and xplots are available here[5].
>  >
>  > ...I quickly glanced over it and found no strange behavior in
>  > the sender.dump.
>
>  Best regards,
>  Arnd

Hi!

I recommend capturing packets on both the sender side and the
receiver side to verify the tcpdump.

Regards,
Harri


* Re: TCP IPv4 strange retransmits
  2008-03-04 21:04     ` H. Willstrand
@ 2008-03-04 22:41       ` Arnd Hannemann
  0 siblings, 0 replies; 11+ messages in thread
From: Arnd Hannemann @ 2008-03-04 22:41 UTC
  To: H. Willstrand; +Cc: Ilpo Järvinen, Netdev

H. Willstrand wrote:

[snip]

> 
> Hi!
> 
> I recommend capturing packets on both the sender side and the
> receiver side to verify the tcpdump.

In fact I did:

http://www.umic-mesh.net/~hannemann/strange-reorder/receiver.dump

> Regards,
> Harri
> 

Regards,
Arnd



* Re: TCP IPv4 strange retransmits
  2008-03-04 14:31   ` Arnd Hannemann
  2008-03-04 21:04     ` H. Willstrand
@ 2008-03-04 21:07     ` Ilpo Järvinen
  2008-03-04 21:19       ` Ilpo Järvinen
  2008-03-04 23:03       ` Arnd Hannemann
  1 sibling, 2 replies; 11+ messages in thread
From: Ilpo Järvinen @ 2008-03-04 21:07 UTC
  To: Arnd Hannemann; +Cc: Netdev

On Tue, 4 Mar 2008, Arnd Hannemann wrote:

> Hi,
> 
> Ilpo Järvinen wrote:
> > On Tue, 4 Mar 2008, Arnd Hannemann wrote:
> > 
> >> I'm observing some retransmits with kernel 2.6.24.2, which I don't 
> >> understand. For instance in this cutout[1] of a sequence diagram which 
> >> was captured[2] on the TCP sender, 4 retransmits are made.
> > 
> > They don't correspond to each other?
> 
> Hmm, they should.

Yeah, they probably do, I was just too hasty and failed to notice those 
small negative offsets.

> >> According to netstat -st output[3][4] all those 4 retransmits were "fast 
> >> retransmit".
> >> But there are no three DUPACKs which I expected would be needed for fast 
> >> retransmit?
> > 
> > With FACK it's enough that you have fackets_out > tp->reordering 
> > (=dupThresh).
> 
> If it is FACK shouldn't it be accounted for LINUX_MIB_TCPFORWARDRETRANS
> instead of LINUX_MIB_TCPFASTRETRANS?

No: if there's any skb which is more than fackets_out-tp->reordering from
the highest SACKed skb, it will be marked TCPCB_LOST (see
tcp_mark_head_lost & its caller), and all LOST segments are retransmitted
by the earlier loop (at least for a while still: I'm very likely going to
change that in net-2.6.26; commits consolidating the two nearly identical
loops are already in my local git tree awaiting some testing).

Forwardretrans is only incremented when there isn't TCPCB_LOST set for a 
segment and it doesn't apply in this case anyway because you have new data 
to send (see the decision making for forward retransmits, it's well 
commented btw).
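The marking step can be sketched roughly as follows. This is a simplified model under stated assumptions, not the kernel code: the write queue is an array of one-segment entries starting at snd_una, `fackets_out - reordering` segments (at least 1) are counted from the head, and only the un-SACKed ones among them get the LOST tag. Per the explanation above, it is the retransmissions of these LOST segments that end up in LINUX_MIB_TCPFASTRETRANS. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment scoreboard entry (one MSS-sized segment each). */
struct seg {
    bool sacked;
    bool lost;
};

/* Simplified FACK marking: count `fackets_out - reordering` segments
 * (minimum 1) from the head of the queue and tag the un-SACKed ones LOST.
 * Returns the number of segments newly tagged LOST, i.e. the segments the
 * "lost" retransmit loop would resend. */
size_t mark_head_lost_sketch(struct seg *q, size_t n,
                             unsigned fackets_out, unsigned reordering)
{
    size_t packets = fackets_out > reordering ? fackets_out - reordering : 1;
    size_t marked = 0;

    assert(fackets_out <= n);
    for (size_t i = 0; i < n && i < packets; i++) {
        if (!q[i].sacked && !q[i].lost) {
            q[i].lost = true;
            marked++;
        }
    }
    return marked;
}
```

Under this toy model, with snd_una 19225, 1420-byte segments, a single SACK block {27745:29165} (the seventh segment) and tp->reordering 3, the segments starting at 19225, 20645, 22065 and 23485 are tagged LOST; that set includes the 22065:23485 retransmission seen in the dump. The real kernel state also depends on ACKs arriving in the meantime.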

> >> Also interesting all retransmits happen _after_ those segments were
> >> already acked and sacked, internal queuing or latency issues?
> > 
> > I think your viewer is doing something wrong, sender.dump is not giving 
> > such information (or you draw that from wrong end?). Or it just draws
> > DSACK like that?
> 
> Viewer is tcptrace and xplot. So nothing special at all.
> You see it also in wireshark, if you draw a sequence diagram.

Ah, now I noticed those small time leaps, small enough not to
catch my eye earlier, as the amount of numbers on such a screen is just
overwhelming... :-)

> You also see it in wireshark if you sort by capture timestamp. I always 
> thought that capture timestamp order is correct and not dump order, but 
> maybe I'm wrong?

I'm not sure; in the other order they make perfect sense. In addition, 
the ACKs are processed in order and their effects are immediate even if 
there's more information waiting to be processed.

> Tcpdump:
> 
> 12:08:20.667538 IP 192.168.0.7.33824 > 192.168.0.5.50139: . ack 23485 win 22720 <nop,nop,timestamp 969759 972885,nop,nop,sack 2 {24905:26325}{27745:29165}>
> ^^^^^ got acked at .667538

Did you paste the wrong timestamp, as 667538 == 667538? ...It just makes no 
sense to me; what are you trying to say here?

> 12:08:20.646749 IP 192.168.0.5.50139 > 192.168.0.7.33824: . 22065:23485(1420) ack 1 win 2864 <nop,nop,timestamp 972885 969754>
> ^^^^^ got retransmitted at .646749

What's the problem here? At .646749 something was retransmitted, but only 
after .667538 it was acked? Again, this makes very little sense to me...
Why did you copy them the wrong way around from the tcpdump log? Or are these 
two lines related at all?

-- 
 i.


* Re: TCP IPv4 strange retransmits
  2008-03-04 21:07     ` Ilpo Järvinen
@ 2008-03-04 21:19       ` Ilpo Järvinen
  2008-03-04 23:03       ` Arnd Hannemann
  1 sibling, 0 replies; 11+ messages in thread
From: Ilpo Järvinen @ 2008-03-04 21:19 UTC
  To: Arnd Hannemann; +Cc: Netdev

On Tue, 4 Mar 2008, Ilpo Järvinen wrote:

> In addition, the ACKs are processed in order and their effects are 
> immediate even if there's more information waiting to be processed.

Before somebody asks or suggests it (one might be tempted to think it's a
good idea): no, we likely don't want to do it the other way around unless
somebody first proves that it won't negatively affect TCP's ACK clock;
it would benefit only some corner cases like this one (and even in such a
case, one might get bitten by tcp_max_burst). It would of course be
possible to come up with a solution that reverses those _and_ fixes the
ACK clock problems caused by such an approach. The problems are similar to
what LRO is causing btw, so it might not be a complete waste of effort to
fix the ACK clock problems and reuse the solution in LRO as well.
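To make the ACK-clock concern concrete, here is a toy comparison of ACKs processed one at a time versus the same ACKs collapsed into a single event, roughly what LRO does on the receive side. This is an assumption-laden sketch, not kernel code: the burst cap is modeled as a hypothetical `max_burst` parameter limiting how many segments one ACK event may release, and the kernel's actual moderation logic differs in detail.

```c
#include <assert.h>

/* Segments released by one ACK event that newly acknowledges `acked`
 * segments, when a sender may send at most `max_burst` segments per
 * event (a hypothetical cap standing in for burst moderation). */
static unsigned released(unsigned acked, unsigned max_burst)
{
    return acked < max_burst ? acked : max_burst;
}

/* Total segments sent in response to `n_acks` ACKs that each acknowledge
 * one segment, either processed individually (collapse == 0) or collapsed
 * into a single cumulative ACK event (collapse != 0). */
unsigned sent_after_acks(unsigned n_acks, unsigned max_burst, int collapse)
{
    unsigned total = 0;

    assert(max_burst > 0);
    if (collapse)
        return released(n_acks, max_burst);
    for (unsigned i = 0; i < n_acks; i++)
        total += released(1, max_burst);
    return total;
}
```

Processed individually, six one-segment ACKs clock out six new segments; collapsed into one event, a cap of three lets only three out, which is the kind of ACK-clock distortion the message warns about.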

-- 
 i.


* Re: TCP IPv4 strange retransmits
  2008-03-04 21:07     ` Ilpo Järvinen
  2008-03-04 21:19       ` Ilpo Järvinen
@ 2008-03-04 23:03       ` Arnd Hannemann
  2008-03-05  7:00         ` Ilpo Järvinen
  1 sibling, 1 reply; 11+ messages in thread
From: Arnd Hannemann @ 2008-03-04 23:03 UTC
  To: Ilpo Järvinen; +Cc: Netdev

Ilpo Järvinen wrote:
> On Tue, 4 Mar 2008, Arnd Hannemann wrote:
> 
>> Hi,
>>
>> Ilpo Järvinen wrote:
>>> On Tue, 4 Mar 2008, Arnd Hannemann wrote:
>>>
>>>> I'm observing some retransmits with kernel 2.6.24.2, which I don't 
>>>> understand. For instance in this cutout[1] of a sequence diagram which 
>>>> was captured[2] on the TCP sender, 4 retransmits are made.
>>> They don't correspond to each other?
>> Hmm, they should.
> 
> Yeah, they probably do, I was just too hasty and failed to notice those 
> small negative offsets.
> 
>>>> According to netstat -st output[3][4] all those 4 retransmits were "fast 
>>>> retransmit".
>>>> But there are no three DUPACKs which I expected would be needed for fast 
>>>> retransmit?
>>> With FACK it's enough that you have fackets_out > tp->reordering 
>>> (=dupThresh).
>> If it is FACK shouldn't it be accounted for LINUX_MIB_TCPFORWARDRETRANS
>> instead of LINUX_MIB_TCPFASTRETRANS?
> 
> No: if there's any skb which is more than fackets_out-tp->reordering from
> the highest SACKed skb, it will be marked TCPCB_LOST (see
> tcp_mark_head_lost & its caller), and all LOST segments are retransmitted
> by the earlier loop (at least for a while still: I'm very likely going to
> change that in net-2.6.26; commits consolidating the two nearly identical
> loops are already in my local git tree awaiting some testing).
> 
> Forwardretrans is only incremented when there isn't TCPCB_LOST set for a 
> segment and it doesn't apply in this case anyway because you have new data 
> to send (see the decision making for forward retransmits, it's well 
> commented btw).

Ah, I see. Thank you for clarifying.
However fackets_out is not so well documented ;-)
But it now makes all sense (with dump order):
An ACK 19225 arrives with SACK block {27745:29165}, so fackets_out becomes ~6 ((27745-19225)/1450)
tp->reordering is 3 at this time so he starts to retransmit.
However some SACK ACK comes early enough so he stops at 4 retransmits.
Or something like that...
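The estimate above can be redone in a few lines. This sketch assumes full-sized segments of 1420 bytes (the length shown for 22065:23485(1420) in the dump) and that fackets_out counts segments from snd_una up to and including the end of the highest SACK block. Under those assumptions the count comes out at 7 rather than ~6, which still exceeds tp->reordering = 3. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of MSS-sized segments between snd_una and the end of the highest
 * SACK block (fack_end), rounded up: a rough stand-in for fackets_out when
 * every segment is full-sized. Sequence-number wraparound is ignored. */
uint32_t estimate_fackets_out(uint32_t snd_una, uint32_t fack_end, uint32_t mss)
{
    assert(mss > 0 && fack_end >= snd_una);
    return (fack_end - snd_una + mss - 1) / mss;
}
```

`estimate_fackets_out(19225, 29165, 1420)` gives 7; stopping at the start of the SACK block (27745) instead gives 6, matching the estimate in the message.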

> 
>>>> Also interesting all retransmits happen _after_ those segments were
>>>> already acked and sacked, internal queuing or latency issues?
>>> I think your viewer is doing something wrong, sender.dump is not giving 
>>> such information (or you draw that from wrong end?). Or it just draws
>>> DSACK like that?
>> Viewer is tcptrace and xplot. So nothing special at all.
>> You see it also in wireshark, if you draw a sequence diagram.
> 
> Ah, now I noticed those small time leaps, small enough not to
> catch my eye earlier, as the amount of numbers on such a screen is just
> overwhelming... :-)

Very small indeed. Probably the time a packet takes to travel through the kernel's layers
is longer than the difference between the ACK and the retransmit.

> 
>> You also see it in wireshark if you sort by capture timestamp. I always 
>> thought that capture timestamp order is correct and not dump order, but 
>> maybe I'm wrong?
> 
> I'm not sure; in the other order they make perfect sense. In addition, 
> the ACKs are processed in order and their effects are immediate even if 
> there's more information waiting to be processed.

> 
>> Tcpdump:
>>
>> 12:08:20.667538 IP 192.168.0.7.33824 > 192.168.0.5.50139: . ack 23485 win 22720 <nop,nop,timestamp 969759 972885,nop,nop,sack 2 {24905:26325}{27745:29165}>
>> ^^^^^ got acked at .667538
> 
> Did you paste the wrong timestamp, as 667538 == 667538? ...It just makes no 
> sense to me; what are you trying to say here?
> 
>> 12:08:20.646749 IP 192.168.0.5.50139 > 192.168.0.7.33824: . 22065:23485(1420) ack 1 win 2864 <nop,nop,timestamp 972885 969754>
>> ^^^^^ got retransmitted at .646749
> 
> What's the problem here? At .646749 something was retransmitted, but only 
> after .667538 it was acked? Again, this makes very little sense to me...
> Why did you copy them the wrong way around from the tcpdump log? Or are these 
> two lines related at all?

Sorry, this was just bogus. Just wanted to point out the timestamp differences and made a
wrong example. Screen full of numbers... ;-)

Thanks for your help.

Best regards,
Arnd




* Re: TCP IPv4 strange retransmits
  2008-03-04 23:03       ` Arnd Hannemann
@ 2008-03-05  7:00         ` Ilpo Järvinen
  2008-03-05 13:04           ` Arnd Hannemann
  0 siblings, 1 reply; 11+ messages in thread
From: Ilpo Järvinen @ 2008-03-05  7:00 UTC
  To: Arnd Hannemann; +Cc: Netdev

On Wed, 5 Mar 2008, Arnd Hannemann wrote:

> Ilpo Järvinen wrote:
>
> > No: if there's any skb which is more than fackets_out-tp->reordering from
> > the highest SACKed skb, it will be marked TCPCB_LOST (see
> > tcp_mark_head_lost & its caller), and all LOST segments are retransmitted
> > by the earlier loop (at least for a while still: I'm very likely going to
> > change that in net-2.6.26; commits consolidating the two nearly identical
> > loops are already in my local git tree awaiting some testing).
> > 
> > Forwardretrans is only incremented when there isn't TCPCB_LOST set for a 
> > segment and it doesn't apply in this case anyway because you have new data 
> > to send (see the decision making for forward retransmits, it's well 
> > commented btw).
> 
> Ah, I see. Thank you for clarifying.
> However fackets_out is not so well documented ;-)

I think I've fixed this for 2.6.25... :-) :

...
/* Heurestics to calculate number of duplicate ACKs. There's no dupACKs
 * counter when SACK is enabled (without SACK, sacked_out is used for
 * that purpose).
 *
 * Instead, with FACK TCP uses fackets_out that includes both SACKed
 * segments up to the highest received SACK block so far and holes in
 * between them.
 *
 * With reordering, holes may still be in flight, so RFC3517 recovery
 * uses pure sacked_out (total number of SACKed segments) even though
 * it violates the RFC that uses duplicate ACKs, often these are equal
 * but when e.g. out-of-window ACKs or packet duplication occurs,
 * they differ. Since neither occurs due to loss, TCP should really
 * ignore them.
 */
static inline int tcp_dupack_heurestics(struct tcp_sock *tp)
...

...Though some FACK comments seem to be saying something else still.
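The helper's body is elided in the excerpt above. As a hedged illustration of what its comment describes (not the kernel's actual code, which may differ in detail), here is a standalone sketch that picks between the two counters, using a hypothetical struct instead of struct tcp_sock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant tcp_sock fields. */
struct sack_counters {
    bool fack_enabled;     /* FACK still in use (not disabled by reordering) */
    unsigned sacked_out;   /* total number of SACKed segments */
    unsigned fackets_out;  /* SACKed segments plus holes up to highest SACK */
};

/* Duplicate-ACK estimate following the comment above: with FACK, count
 * everything up to the forward-most SACK; otherwise count only segments
 * actually SACKed, since holes may just be reordered and still in flight. */
unsigned dupack_estimate(const struct sack_counters *c)
{
    assert(c != NULL);
    return c->fack_enabled ? c->fackets_out : c->sacked_out;
}
```

For one SACKed segment sitting seven segments beyond snd_una, the FACK estimate is 7 while the pure SACK count is 1, which is why the FACK variant reaches the threshold much sooner under reordering.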

> But it now makes all sense (with dump order):
> An ACK 19225 arrives with SACK block {27745:29165}, so fackets_out becomes 
> ~6 ((27745-19225)/1450)
> tp->reordering is 3 at this time so he starts to retransmit.
> However some SACK ACK comes early enough so he stops at 4 retransmits.
> Or something like that...

Another thing you should consider is reordering detection which hopefully 
worked at 13:08:20.667529 through the newly discovered SACK block which is 
_lower_ than the highest SACK block received so far. That results in 
FACK -> RFC3517; FACK is built on in-order assumptions and whenever we find 
that untrue, e.g., due to SACK/ACK for non-rexmit when something larger 
has been confirmed received we disable it. Ah, but this was 2.6.24.y? It 
doesn't yet do RFC3517 IIRC, but has something remotely resembling 
newreno, but only for the first packet because the next cumulative ACK may 
often trigger the timedout loop which basically marks everything lost (I don't 
remember if the latter was changed to occur only with FACK ages ago or 
not).
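The detection step described here can be sketched as a simplified model under stated assumptions, not the kernel's SACK tagging code: when a SACK newly covers a never-retransmitted segment at position `pos` (counted from snd_una) while `fackets_out` already reaches further, the distance `fackets_out - pos` is taken as a reordering sample. If it exceeds the current estimate, the estimate is raised (capped) and FACK is switched off. The names and the cap of 127 are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define REORDERING_CAP 127u /* assumed upper bound for the estimate */

struct reorder_state {
    unsigned reordering;  /* current dupthresh-like estimate, starts at 3 */
    bool fack_enabled;
};

/* Called for a SACK that newly covers a never-retransmitted segment located
 * `pos` segments after snd_una, while data up to `fackets_out` segments has
 * already been SACKed. A segment SACKed below the forward-most SACK means
 * the in-order assumption behind FACK was false. */
void on_sack_below_fack(struct reorder_state *s, unsigned pos,
                        unsigned fackets_out)
{
    unsigned metric;

    assert(pos < fackets_out);
    metric = fackets_out - pos;
    if (metric > s->reordering) {
        s->reordering = metric < REORDERING_CAP ? metric : REORDERING_CAP;
        s->fack_enabled = false;
    }
}
```

A sample no larger than the current estimate leaves everything unchanged; a larger one both raises the estimate and disables FACK, so later loss detection no longer treats holes below the highest SACK as lost by default.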

> >> Tcpdump:
> 
> Sorry, this was just bogus. Just wanted to point out the timestamp 
> differences and made a wrong example. Screen full of numbers... ;-)

I thought so :-).

...Large, nearly equal numbers in two dimensions; maybe some day 
I'll wake up and notice I've read them for so long that catching 
this kind of thing is no longer a problem for me... :-/

-- 
 i.


* Re: TCP IPv4 strange retransmits
  2008-03-05  7:00         ` Ilpo Järvinen
@ 2008-03-05 13:04           ` Arnd Hannemann
  2008-03-05 19:32             ` Ilpo Järvinen
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Hannemann @ 2008-03-05 13:04 UTC
  To: Ilpo Järvinen; +Cc: Netdev

Ilpo Järvinen wrote:
> On Wed, 5 Mar 2008, Arnd Hannemann wrote:
> 
>> Ilpo Järvinen wrote:
>>
>>> No: if there's any skb which is more than fackets_out-tp->reordering from
>>> the highest SACKed skb, it will be marked TCPCB_LOST (see
>>> tcp_mark_head_lost & its caller), and all LOST segments are retransmitted
>>> by the earlier loop (at least for a while still: I'm very likely going to
>>> change that in net-2.6.26; commits consolidating the two nearly identical
>>> loops are already in my local git tree awaiting some testing).
>>>
>>> Forwardretrans is only incremented when there isn't TCPCB_LOST set for a 
>>> segment and it doesn't apply in this case anyway because you have new data 
>>> to send (see the decision making for forward retransmits, it's well 
>>> commented btw).
>> Ah, I see. Thank you for clarifying.
>> However fackets_out is not so well documented ;-)
> 
> I think I've fixed this for 2.6.25... :-) :
> 
> ...
> /* Heurestics to calculate number of duplicate ACKs. There's no dupACKs
>  * counter when SACK is enabled (without SACK, sacked_out is used for
>  * that purpose).
>  *
>  * Instead, with FACK TCP uses fackets_out that includes both SACKed
>  * segments up to the highest received SACK block so far and holes in
>  * between them.
>  *
>  * With reordering, holes may still be in flight, so RFC3517 recovery
>  * uses pure sacked_out (total number of SACKed segments) even though
>  * it violates the RFC that uses duplicate ACKs, often these are equal
>  * but when e.g. out-of-window ACKs or packet duplication occurs,
>  * they differ. Since neither occurs due to loss, TCP should really
>  * ignore them.
>  */
> static inline int tcp_dupack_heurestics(struct tcp_sock *tp)
> ...

Great :-) But shouldn't it read "heuristics"?

> ...Though some FACK comments seem to be saying something else still.
> 
>> But it now makes all sense (with dump order):
>> An ACK 19225 arrives with SACK block {27745:29165}, so fackets_out becomes 
>> ~6 ((27745-19225)/1450)
>> tp->reordering is 3 at this time so he starts to retransmit.
>> However some SACK ACK comes early enough so he stops at 4 retransmits.
>> Or something like that...
> 
> Another thing you should consider is reordering detection which hopefully 
> worked at 13:08:20.667529 through the newly discovered SACK block which is 
> _lower_ than the highest SACK block received so far. That results in 
> FACK -> RFC3517; FACK is built on in-order assumptions and whenever we find 
> that untrue, e.g., due to SACK/ACK for non-rexmit when something larger 
> has been confirmed received we disable it. Ah, but this was 2.6.24.y? It 

Yes, it was 2.6.24.2. Actually you can see reordering detection at work here[3],
the tool[4] we are using to measure TCP throughput samples the tcp_info struct and the
column #reor should reflect tp->reordering.
First it is 3, then it grows up to 16. Of course this is only a hint because
tcp_info is only sampled every 50ms in this example, but at least it shows that some
reordering detection took place...
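For reference, the reordering estimate can be sampled from user space with `getsockopt(..., IPPROTO_TCP, TCP_INFO, ...)` and the `tcpi_reordering` field of `struct tcp_info`. A minimal sketch, assuming Linux and glibc's `<netinet/tcp.h>`; it is not flowgrind's code, and the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Read the kernel's current reordering estimate (tp->reordering, exported
 * as tcpi_reordering) for a TCP socket. Returns 0 on success and -1 on
 * failure (errno is set by getsockopt). */
int sample_reordering(int fd, unsigned *reordering)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    assert(reordering != NULL);
    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
        return -1;
    *reordering = info.tcpi_reordering;
    return 0;
}
```

Calling this periodically (every 50 ms in the run above) only shows the value at each sampling instant, which is why such a column is a hint about reordering detection rather than an exact event log.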

> doesn't yet do RFC3517 IIRC, but has something remotely resembling 
> newreno, but only for the first packet because the next cumulative ACK may 
> often trigger the timedout loop which basically marks everything lost (I don't 
> remember if the latter was changed to occur only with FACK ages ago or 
> not).

Not sure if I understood this. Will have to look into this some more.

> 
>>>> Tcpdump:
>> Sorry, this was just bogus. Just wanted to point out the timestamp 
>> differences and made a wrong example. Screen full of numbers... ;-)
> 
> I thought so :-).
> 
> ...Large, nearly equal numbers in two dimensions; maybe some day 
> I'll wake up and notice I've read them for so long that catching 
> this kind of thing is no longer a problem for me... :-/
> 

[3] http://www.umic-mesh.net/~hannemann/strange-reorder/flowgrind.output
[4] http://www.umic-mesh.net/research/tcp/flowgrind.html


* Re: TCP IPv4 strange retransmits
  2008-03-05 13:04           ` Arnd Hannemann
@ 2008-03-05 19:32             ` Ilpo Järvinen
  0 siblings, 0 replies; 11+ messages in thread
From: Ilpo Järvinen @ 2008-03-05 19:32 UTC
  To: Arnd Hannemann; +Cc: Netdev

On Wed, 5 Mar 2008, Arnd Hannemann wrote:

> Ilpo Järvinen wrote:
> > On Wed, 5 Mar 2008, Arnd Hannemann wrote:
> > 
> >> Ilpo Järvinen wrote:
> > 
> > ...
> > /* Heurestics to calculate number of duplicate ACKs. There's no dupACKs
> 
> Great :-) But shouldn't it read "heuristics"?

Sure, a Finnish vowel leaked into it. If somebody had asked,
I wouldn't even have known which was the right form in English.

> > Another thing you should consider is reordering detection which hopefully 
> > worked at 13:08:20.667529 through the newly discovered SACK block which is 
> > _lower_ than the highest SACK block received so far. That results in 
> > FACK -> RFC3517; FACK is built on in-order assumptions and whenever we find 
> > that untrue, e.g., due to SACK/ACK for non-rexmit when something larger 
> > has been confirmed received we disable it. Ah, but this was 2.6.24.y? It 
> 
> Yes, it was 2.6.24.2. Actually you can see reordering detection at work here[3],
> the tool[4] we are using to measure TCP throughput samples the tcp_info struct and the
> column #reor should reflect tp->reordering.
> First it is 3, then it grows up to 16. Of course this is only a hint because
> tcp_info is only sampled every 50ms in this example, but at least it shows that some
> reordering detection took place...

OK. I can usually determine the exact events from tcpdump; I'm too used to
them... :-)

> > doesn't yet do RFC3517 IIRC, but has something remotely resembling 
> > newreno, but only for the first packet because the next cumulative ACK may 
> > often trigger the timedout loop which basically marks everything lost (I don't 
> > remember if the latter was changed to occur only with FACK ages ago or 
> > not).
> 
> Not sure if I understood this. Will have to look into this some more.

Before 2.6.25, non-FACK SACK was quite a strange mixture of things. 
It doesn't resemble anything RFC-like by any means, unless the timedout loop 
(see the loop that plays with scoreboard_skb_hint) was already changed to 
be used with FACK only in 2.6.24 or earlier (I don't remember whether I ever 
submitted that, because making non-FACK SACK behave very close to what 
RFC3517 does was just around the corner as well).
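Roughly, the "timedout loop" walks the queue from the head and marks as LOST every un-SACKed segment that was sent longer ago than the retransmission timeout, stopping at the first segment that has not timed out; that is how it can end up marking nearly everything lost. The sketch below is a simplified model with hypothetical names; the kernel version also uses scoreboard_skb_hint to resume the walk, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scoreboard entry: send time plus SACK/LOST flags. */
struct tx_seg {
    long sent_ms;
    bool sacked;
    bool lost;
};

/* Walk from the head of the retransmit queue, marking every un-SACKed
 * segment whose age exceeds rto_ms as LOST. Returns the number of
 * segments newly marked LOST. */
size_t mark_timedout_lost(struct tx_seg *q, size_t n, long now_ms, long rto_ms)
{
    size_t marked = 0;

    assert(rto_ms > 0);
    for (size_t i = 0; i < n; i++) {
        if (now_ms - q[i].sent_ms <= rto_ms)
            break; /* stop at the first segment that has not timed out */
        if (!q[i].sacked && !q[i].lost) {
            q[i].lost = true;
            marked++;
        }
    }
    return marked;
}
```

If the head of the queue has been outstanding longer than the RTO, a single pass can tag a long run of segments at once, which is the "marks everything lost" effect mentioned earlier in the thread.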


-- 
 i.



Thread overview: 11+ messages
2008-03-04 13:00 TCP IPv4 strange retransmits Arnd Hannemann
2008-03-04 13:36 ` Ilpo Järvinen
2008-03-04 14:31   ` Arnd Hannemann
2008-03-04 21:04     ` H. Willstrand
2008-03-04 22:41       ` Arnd Hannemann
2008-03-04 21:07     ` Ilpo Järvinen
2008-03-04 21:19       ` Ilpo Järvinen
2008-03-04 23:03       ` Arnd Hannemann
2008-03-05  7:00         ` Ilpo Järvinen
2008-03-05 13:04           ` Arnd Hannemann
2008-03-05 19:32             ` Ilpo Järvinen
