netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3][v2] tcp: fix ICMP-RTO war
@ 2010-01-29 22:15 Damian Lukowski
  2010-02-01  7:33 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Damian Lukowski @ 2010-01-29 22:15 UTC (permalink / raw)
  To: Netdev

This patches fix the current RTO calculation routine, when
srtt and rttvar are zero, yielding an RTO of zero 
Under some circumstances, TCPs srtt and rttvar are zero,
yielding a calculated RTO of zero.
This is particularly unfortunate for ICMP based RTO recalculation
as introduced in f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO
on ICMP destination unreachable), as it results in RTO retransmission
flooding.

Thanks to Ilpo Jarvinen for providing debug patches and to
Denys Fedoryshchenko for reporting and testing.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3][v2] tcp: fix ICMP-RTO war
  2010-01-29 22:15 [PATCH 0/3][v2] tcp: fix ICMP-RTO war Damian Lukowski
@ 2010-02-01  7:33 ` David Miller
  2010-02-08 12:34   ` Damian Lukowski
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2010-02-01  7:33 UTC (permalink / raw)
  To: damian; +Cc: netdev

From: Damian Lukowski <damian@tvk.rwth-aachen.de>
Date: Fri, 29 Jan 2010 23:15:51 +0100

> This patches fix the current RTO calculation routine, when
> srtt and rttvar are zero, yielding an RTO of zero 
> Under some circumstances, TCPs srtt and rttvar are zero,
> yielding a calculated RTO of zero.
> This is particularly unfortunate for ICMP based RTO recalculation
> as introduced in f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO
> on ICMP destination unreachable), as it results in RTO retransmission
> flooding.
> 
> Thanks to Ilpo Jarvinen for providing debug patches and to
> Denys Fedoryshchenko for reporting and testing.
> 
> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>

I still haven't seen a detailed enough analysis of why these
tiny RTOs can come to exist in the first place.

Please show me a list of events, function by function, the value of
relevant variables and per-socket TCP state, in the TCP stack, that
show how this ends up happening.

Thanks for all of your work on this so far.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3][v2] tcp: fix ICMP-RTO war
  2010-02-01  7:33 ` David Miller
@ 2010-02-08 12:34   ` Damian Lukowski
  2010-02-08 21:58     ` Damian Lukowski
  0 siblings, 1 reply; 6+ messages in thread
From: Damian Lukowski @ 2010-02-08 12:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Am 01.02.2010, 08:33 Uhr, schrieb David Miller <davem@davemloft.net>:

> From: Damian Lukowski <damian@tvk.rwth-aachen.de>
> Date: Fri, 29 Jan 2010 23:15:51 +0100
>
>> This patches fix the current RTO calculation routine, when
>> srtt and rttvar are zero, yielding an RTO of zero
>> Under some circumstances, TCPs srtt and rttvar are zero,
>> yielding a calculated RTO of zero.
>> This is particularly unfortunate for ICMP based RTO recalculation
>> as introduced in f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO
>> on ICMP destination unreachable), as it results in RTO retransmission
>> flooding.
>>
>> Thanks to Ilpo Jarvinen for providing debug patches and to
>> Denys Fedoryshchenko for reporting and testing.
>>
>> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
>
> I still haven't seen a detailed enough analysis of why these
> tiny RTOs can come to exist in the first place.
>
> Please show me a list of events, function by function, the value of
> relevant variables and per-socket TCP state, in the TCP stack, that
> show how this ends up happening.
>
> Thanks for all of your work on this so far.

I might have figured it out, but could not verify it, so maybe you can
comment my thought.

When a listening TCP receives a SYN, it will send a SYN+ACK
and wait for an ACK to complete the handshake.
Look at tcp_rcv_state_process::step 5::case SYN_RECV::acceptable
and the code after the comment "tcp_ack considers this ACK as duplicate
and does not calculate rtt".

If the connecting client has disabled timestamps, the rtt statistics
won't be updated here, while the state is changed above.
I printk'ed at the very end of TCP_SYN_RECV and got the following:
state 1 (ESTABLISHED), srtt 0, rttvar 0.

So my suspicion is: If connectivity breaks right after a listening TCP
has completed the handshake without timestamps, and the listening TCP
sends data after establishing the connection, we will get the observed
behaviour.

Regards
  Damian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3][v2] tcp: fix ICMP-RTO war
  2010-02-08 12:34   ` Damian Lukowski
@ 2010-02-08 21:58     ` Damian Lukowski
  2010-02-09 12:37       ` Ilpo Järvinen
  0 siblings, 1 reply; 6+ messages in thread
From: Damian Lukowski @ 2010-02-08 21:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Damian Lukowski schrieb:
> Am 01.02.2010, 08:33 Uhr, schrieb David Miller <davem@davemloft.net>:
> 
>> From: Damian Lukowski <damian@tvk.rwth-aachen.de>
>> Date: Fri, 29 Jan 2010 23:15:51 +0100
>>
>>> This patches fix the current RTO calculation routine, when
>>> srtt and rttvar are zero, yielding an RTO of zero
>>> Under some circumstances, TCPs srtt and rttvar are zero,
>>> yielding a calculated RTO of zero.
>>> This is particularly unfortunate for ICMP based RTO recalculation
>>> as introduced in f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO
>>> on ICMP destination unreachable), as it results in RTO retransmission
>>> flooding.
>>>
>>> Thanks to Ilpo Jarvinen for providing debug patches and to
>>> Denys Fedoryshchenko for reporting and testing.
>>>
>>> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
>>
>> I still haven't seen a detailed enough analysis of why these
>> tiny RTOs can come to exist in the first place.
>>
>> Please show me a list of events, function by function, the value of
>> relevant variables and per-socket TCP state, in the TCP stack, that
>> show how this ends up happening.
>>
>> Thanks for all of your work on this so far.
> 
> I might have figured it out, but could not verify it, so maybe you can
> comment my thought.
> 
> When a listening TCP receives a SYN, it will send a SYN+ACK
> and wait for an ACK to complete the handshake.
> Look at tcp_rcv_state_process::step 5::case SYN_RECV::acceptable
> and the code after the comment "tcp_ack considers this ACK as duplicate
> and does not calculate rtt".
> 
> If the connecting client has disabled timestamps, the rtt statistics
> won't be updated here, while the state is changed above.
> I printk'ed at the very end of TCP_SYN_RECV and got the following:
> state 1 (ESTABLISHED), srtt 0, rttvar 0.
> 
> So my suspicion is: If connectivity breaks right after a listening TCP
> has completed the handshake without timestamps, and the listening TCP
> sends data after establishing the connection, we will get the observed
> behaviour.

Just verified that by dropping pure ACKs coming from the originally
listening TCP using iptables.

Regards
 Damian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3][v2] tcp: fix ICMP-RTO war
  2010-02-08 21:58     ` Damian Lukowski
@ 2010-02-09 12:37       ` Ilpo Järvinen
  2010-02-09 22:29         ` Damian Lukowski
  0 siblings, 1 reply; 6+ messages in thread
From: Ilpo Järvinen @ 2010-02-09 12:37 UTC (permalink / raw)
  To: Damian Lukowski; +Cc: David Miller, Netdev

On Mon, 8 Feb 2010, Damian Lukowski wrote:

> Damian Lukowski schrieb:
> > Am 01.02.2010, 08:33 Uhr, schrieb David Miller <davem@davemloft.net>:
> > 
> >> From: Damian Lukowski <damian@tvk.rwth-aachen.de>
> >> Date: Fri, 29 Jan 2010 23:15:51 +0100
> >>
> >>> This patches fix the current RTO calculation routine, when
> >>> srtt and rttvar are zero, yielding an RTO of zero
> >>> Under some circumstances, TCPs srtt and rttvar are zero,
> >>> yielding a calculated RTO of zero.
> >>> This is particularly unfortunate for ICMP based RTO recalculation
> >>> as introduced in f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO
> >>> on ICMP destination unreachable), as it results in RTO retransmission
> >>> flooding.
> >>>
> >>> Thanks to Ilpo Jarvinen for providing debug patches and to
> >>> Denys Fedoryshchenko for reporting and testing.
> >>>
> >>> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
> >>
> >> I still haven't seen a detailed enough analysis of why these
> >> tiny RTOs can come to exist in the first place.
> >>
> >> Please show me a list of events, function by function, the value of
> >> relevant variables and per-socket TCP state, in the TCP stack, that
> >> show how this ends up happening.
> >>
> >> Thanks for all of your work on this so far.
> > 
> > I might have figured it out, but could not verify it, so maybe you can
> > comment my thought.
> > 
> > When a listening TCP receives a SYN, it will send a SYN+ACK
> > and wait for an ACK to complete the handshake.
> > Look at tcp_rcv_state_process::step 5::case SYN_RECV::acceptable
> > and the code after the comment "tcp_ack considers this ACK as duplicate
> > and does not calculate rtt".
> > 
> > If the connecting client has disabled timestamps, the rtt statistics
> > won't be updated here, while the state is changed above.
> > I printk'ed at the very end of TCP_SYN_RECV and got the following:
> > state 1 (ESTABLISHED), srtt 0, rttvar 0.
> > 
> > So my suspicion is: If connectivity breaks right after a listening TCP
> > has completed the handshake without timestamps, and the listening TCP
> > sends data after establishing the connection, we will get the observed
> > behaviour.
> 
> Just verified that by dropping pure ACKs coming from the originally
> listening TCP using iptables.

Isn't that "tcp_ack considers" comment like this: we have bug elsewhere in 
the code but workaround it here, at least for some of the cases? (I'd put 
it other way around: but alas, only for part of the cases.) ...It sound 
like that to me. What exactly is the reason why rtt shouldn't be 
calculated/initialized on such ACK, anything I'm missing?

-- 
 i.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3][v2] tcp: fix ICMP-RTO war
  2010-02-09 12:37       ` Ilpo Järvinen
@ 2010-02-09 22:29         ` Damian Lukowski
  0 siblings, 0 replies; 6+ messages in thread
From: Damian Lukowski @ 2010-02-09 22:29 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev

Am 09.02.2010, 13:37 Uhr, schrieb Ilpo Järvinen  
<ilpo.jarvinen@helsinki.fi>:

> On Mon, 8 Feb 2010, Damian Lukowski wrote:
>
>> Damian Lukowski schrieb:
>> > Am 01.02.2010, 08:33 Uhr, schrieb David Miller <davem@davemloft.net>:
>> >
>> >> From: Damian Lukowski <damian@tvk.rwth-aachen.de>
>> >> Date: Fri, 29 Jan 2010 23:15:51 +0100
>> >>
>> >>> This patches fix the current RTO calculation routine, when
>> >>> srtt and rttvar are zero, yielding an RTO of zero
>> >>> Under some circumstances, TCPs srtt and rttvar are zero,
>> >>> yielding a calculated RTO of zero.
>> >>> This is particularly unfortunate for ICMP based RTO recalculation
>> >>> as introduced in f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO
>> >>> on ICMP destination unreachable), as it results in RTO  
>> retransmission
>> >>> flooding.
>> >>>
>> >>> Thanks to Ilpo Jarvinen for providing debug patches and to
>> >>> Denys Fedoryshchenko for reporting and testing.
>> >>>
>> >>> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
>> >>
>> >> I still haven't seen a detailed enough analysis of why these
>> >> tiny RTOs can come to exist in the first place.
>> >>
>> >> Please show me a list of events, function by function, the value of
>> >> relevant variables and per-socket TCP state, in the TCP stack, that
>> >> show how this ends up happening.
>> >>
>> >> Thanks for all of your work on this so far.
>> >
>> > I might have figured it out, but could not verify it, so maybe you can
>> > comment my thought.
>> >
>> > When a listening TCP receives a SYN, it will send a SYN+ACK
>> > and wait for an ACK to complete the handshake.
>> > Look at tcp_rcv_state_process::step 5::case SYN_RECV::acceptable
>> > and the code after the comment "tcp_ack considers this ACK as  
>> duplicate
>> > and does not calculate rtt".
>> >
>> > If the connecting client has disabled timestamps, the rtt statistics
>> > won't be updated here, while the state is changed above.
>> > I printk'ed at the very end of TCP_SYN_RECV and got the following:
>> > state 1 (ESTABLISHED), srtt 0, rttvar 0.
>> >
>> > So my suspicion is: If connectivity breaks right after a listening TCP
>> > has completed the handshake without timestamps, and the listening TCP
>> > sends data after establishing the connection, we will get the observed
>> > behaviour.
>>
>> Just verified that by dropping pure ACKs coming from the originally
>> listening TCP using iptables.
>
> Isn't that "tcp_ack considers" comment like this: we have bug elsewhere  
> in
> the code but workaround it here, at least for some of the cases? (I'd put
> it other way around: but alas, only for part of the cases.) ...It sound
> like that to me. What exactly is the reason why rtt shouldn't be
> calculated/initialized on such ACK, anything I'm missing?


The RTT is extracted when traversing tcp_write_queue_head() in
tcp_clean_rtx_queue() but is null when the SYNACK is acknowledged,
so it may have seemed to be a harder issue for the commentator.
I also have been thinking a while how to obtain the timestamp of the
sent SYNACK, but there is no need for it.
We just need to call tcp_ack_no_tstamp(), which will set the
RTO to 3 seconds as defined in RFC 2988.

Regards
   Damian

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-02-09 22:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-29 22:15 [PATCH 0/3][v2] tcp: fix ICMP-RTO war Damian Lukowski
2010-02-01  7:33 ` David Miller
2010-02-08 12:34   ` Damian Lukowski
2010-02-08 21:58     ` Damian Lukowski
2010-02-09 12:37       ` Ilpo Järvinen
2010-02-09 22:29         ` Damian Lukowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).