scp stalls mysteriously

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* scp stalls mysteriously
@ 2009-11-27 21:35 Frederic Leroy
  2009-11-27 22:12 ` Ilpo Järvinen
  2009-11-27 22:22 ` Frederic Leroy
  0 siblings, 2 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-11-27 21:35 UTC (permalink / raw)
  To: Netdev; +Cc: Ilpo Järvinen, Asdo

Ilpo,

I put traces of stall here : 
http://www.starox.org/pub/scp_stall/

openssh versions : 

houba tmp # equery l openssh
[ Searching for package 'openssh' in all categories among: ]
 * installed packages
[I--] [  ] net-misc/openssh-5.2_p1-r3 (0)

fredo@sg:~$ dpkg -l | grep ssh
ii  libssh2-1                         0.18-1                    SSH2 client-side library
ii  openssh-blacklist                 0.4.1                     list of default blacklisted OpenSSH RSA and 
ii  openssh-client                    1:5.1p1-5                 secure shell client, an rlogin/rsh/rcp repla
ii  openssh-server                    1:5.1p1-5                 secure shell server, an rshd replacement
ii  ssh                               1:5.1p1-5                 secure shell client and server (metapackage)

I will downgrade gentoo ssh and give another try ...

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-27 21:35 scp stalls mysteriously Frederic Leroy
@ 2009-11-27 22:12 ` Ilpo Järvinen
       [not found]   ` <20091128010156.1219012a@houba>
  2009-11-27 22:22 ` Frederic Leroy
  1 sibling, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-27 22:12 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

On Fri, 27 Nov 2009, Frederic Leroy wrote:

> Ilpo,
> 
> I put traces of stall here : 
> http://www.starox.org/pub/scp_stall/
> 
> openssh versions : 
> 
> houba tmp # equery l openssh
> [ Searching for package 'openssh' in all categories among: ]
>  * installed packages
> [I--] [  ] net-misc/openssh-5.2_p1-r3 (0)
> 
> fredo@sg:~$ dpkg -l | grep ssh
> ii  libssh2-1                         0.18-1                    SSH2 client-side library
> ii  openssh-blacklist                 0.4.1                     list of default blacklisted OpenSSH RSA and 
> ii  openssh-client                    1:5.1p1-5                 secure shell client, an rlogin/rsh/rcp repla
> ii  openssh-server                    1:5.1p1-5                 secure shell server, an rshd replacement
> ii  ssh                               1:5.1p1-5                 secure shell client and server (metapackage)
> 
> I will downgrade gentoo ssh and give another try ...

...I think we can drop ssh from suspect lists for now.

Your proc/net/tcp capture on houba was perhaps made too late? ...The 
connection is missing already.

But anyway, at least the problem is visible... It seems that 3998:4046 
gets never retransmitted, not even by RTO which seems very very strange
to me... And after this: 23:21:56.154269 IP 192.168.1.19.50028 > 
192.168.1.15.22: . ack 3998 win 379 ... sack 3 
{4238:4286}{4142:4190}{4046:4094}> also fast retransmit should have 
already triggered. ...I'll look more into this if I can figure it out 
from the current traces but it'll take a while.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-27 21:35 scp stalls mysteriously Frederic Leroy
  2009-11-27 22:12 ` Ilpo Järvinen
@ 2009-11-27 22:22 ` Frederic Leroy
  2009-11-27 22:28   ` Ilpo Järvinen
  2009-11-27 22:30   ` Ilpo Järvinen
  1 sibling, 2 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-11-27 22:22 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Ilpo Järvinen, Asdo

Le Fri, 27 Nov 2009 22:35:45 +0100,
Frederic Leroy <fredo@starox.org> a écrit :
> openssh versions : 
> 
> houba tmp # equery l openssh
> [ Searching for package 'openssh' in all categories among: ]
>  * installed packages
> [I--] [  ] net-misc/openssh-5.2_p1-r3 (0)
> 
> fredo@sg:~$ dpkg -l | grep ssh
> ii  libssh2-1                         0.18-1                    SSH2
> client-side library ii  openssh-blacklist
> 0.4.1                     list of default blacklisted OpenSSH RSA and
> ii  openssh-client                    1:5.1p1-5
> secure shell client, an rlogin/rsh/rcp repla ii
> openssh-server                    1:5.1p1-5                 secure
> shell server, an rshd replacement ii
> ssh                               1:5.1p1-5                 secure
> shell client and server (metapackage)
> 
> I will downgrade gentoo ssh and give another try ...

Argh !!! 

I emerged openssh-5.1_p1-r2 on houba.
Restarted ssh server (/etc/init.d/sshd restart)
Log on sg and make a scp as usual. 
It worked fine.
Make a second copy, and It worked fine also.

Then, I emerged openssh-5.2_p1-r3 on houba, restarted sshd, make three
big copies ... and finally it worked fine. 

The bug mysteriously escaped from the dungeon !

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-27 22:22 ` Frederic Leroy
@ 2009-11-27 22:28   ` Ilpo Järvinen
  2009-11-27 22:30   ` Ilpo Järvinen
  1 sibling, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-27 22:28 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1557 bytes --]

On Fri, 27 Nov 2009, Frederic Leroy wrote:

> Le Fri, 27 Nov 2009 22:35:45 +0100,
> Frederic Leroy <fredo@starox.org> a écrit :
> > openssh versions : 
> > 
> > houba tmp # equery l openssh
> > [ Searching for package 'openssh' in all categories among: ]
> >  * installed packages
> > [I--] [  ] net-misc/openssh-5.2_p1-r3 (0)
> > 
> > fredo@sg:~$ dpkg -l | grep ssh
> > ii  libssh2-1                         0.18-1                    SSH2
> > client-side library ii  openssh-blacklist
> > 0.4.1                     list of default blacklisted OpenSSH RSA and
> > ii  openssh-client                    1:5.1p1-5
> > secure shell client, an rlogin/rsh/rcp repla ii
> > openssh-server                    1:5.1p1-5                 secure
> > shell server, an rshd replacement ii
> > ssh                               1:5.1p1-5                 secure
> > shell client and server (metapackage)
> > 
> > I will downgrade gentoo ssh and give another try ...
> 
> Argh !!! 
> 
> I emerged openssh-5.1_p1-r2 on houba.
> Restarted ssh server (/etc/init.d/sshd restart)
> Log on sg and make a scp as usual. 
> It worked fine.
> Make a second copy, and It worked fine also.
> 
> Then, I emerged openssh-5.2_p1-r3 on houba, restarted sshd, make three
> big copies ... and finally it worked fine. 
> 
> The bug mysteriously escaped from the dungeon !

...You probably just lacked the losses which are crucial so it certainly 
hasn't escaped anywhere (I guess that due to some random factor you didn't 
get the losses this time and all of course will then work).

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-27 22:22 ` Frederic Leroy
  2009-11-27 22:28   ` Ilpo Järvinen
@ 2009-11-27 22:30   ` Ilpo Järvinen
  1 sibling, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-27 22:30 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1448 bytes --]

On Fri, 27 Nov 2009, Frederic Leroy wrote:

> Le Fri, 27 Nov 2009 22:35:45 +0100,
> Frederic Leroy <fredo@starox.org> a écrit :
> > openssh versions : 
> > 
> > houba tmp # equery l openssh
> > [ Searching for package 'openssh' in all categories among: ]
> >  * installed packages
> > [I--] [  ] net-misc/openssh-5.2_p1-r3 (0)
> > 
> > fredo@sg:~$ dpkg -l | grep ssh
> > ii  libssh2-1                         0.18-1                    SSH2
> > client-side library ii  openssh-blacklist
> > 0.4.1                     list of default blacklisted OpenSSH RSA and
> > ii  openssh-client                    1:5.1p1-5
> > secure shell client, an rlogin/rsh/rcp repla ii
> > openssh-server                    1:5.1p1-5                 secure
> > shell server, an rshd replacement ii
> > ssh                               1:5.1p1-5                 secure
> > shell client and server (metapackage)
> > 
> > I will downgrade gentoo ssh and give another try ...
> 
> Argh !!! 
> 
> I emerged openssh-5.1_p1-r2 on houba.
> Restarted ssh server (/etc/init.d/sshd restart)
> Log on sg and make a scp as usual. 
> It worked fine.
> Make a second copy, and It worked fine also.
> 
> Then, I emerged openssh-5.2_p1-r3 on houba, restarted sshd, make three
> big copies ... and finally it worked fine. 
> 
> The bug mysteriously escaped from the dungeon !

...Anyway, I'll probably try to reproduce tomorrow myself based on the 
patterns I see in your logs.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
       [not found]   ` <20091128010156.1219012a@houba>
@ 2009-11-28 11:31     ` Ilpo Järvinen
  2009-11-29 22:13       ` Ilpo Järvinen
  2009-11-30 18:50       ` Frederic Leroy
  0 siblings, 2 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-28 11:31 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2071 bytes --]

I restored Ccs. Please keep them.

On Sat, 28 Nov 2009, Frederic Leroy wrote:

> Le Sat, 28 Nov 2009 00:12:23 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Fri, 27 Nov 2009, Frederic Leroy wrote:
> > 
> > > I put traces of stall here : 
> > > http://www.starox.org/pub/scp_stall/
> >
> > Your proc/net/tcp capture on houba was perhaps made too late? ...The 
> > connection is missing already.
> 
> It could be ! I had a doubt while using my 2 keyboards ... 
> 
> For information for the pcaps, I filtered and used "tcpdump ... ether
> host xx:xx:xx:xx:xx"
> I waited a bit after the stall and kill the scp with ctrl-c.
> 
> > But anyway, at least the problem is visible...
> 
> Great!
> 
> > It seems that
> > 3998:4046 gets never retransmitted, not even by RTO which seems very
> > very strange to me... And after this: 23:21:56.154269 IP
> > 192.168.1.19.50028 > 192.168.1.15.22: . ack 3998 win 379 ... sack 3 
> > {4238:4286}{4142:4190}{4046:4094}> also fast retransmit should have 
> > already triggered. ...I'll look more into this if I can figure it out 
> > from the current traces but it'll take a while.
> 
> Can it help you, if I make other traces ?
> 
> I won't be available until monday.

Perhaps having the /proc/net/tcp would at least tell what state the timer 
is (if I cannot reproduce right away). ...It is rather strange that two 
independent mechanisms for loss recovery seem both to fail to get 
triggered here, no traces of retransmission whatsoever. I think it is for 
now enough to concentrate on what happens on 192.168.1.15 (=houba?) and 
get tcpdump and proc/net/tcp from there, the other end/direction has very 
little significance here (except for the fact that bidirectionality might 
be needed to actually trigger it). You could even think of getting 
proc/net/tcp a bit more often, right from the start:

while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee scp_stall-houba.x.proc_net_tcp

...Please wait at least 2 minutes before hitting ctrl-c or otherwise 
artificially intervening.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-28 11:31     ` Ilpo Järvinen
@ 2009-11-29 22:13       ` Ilpo Järvinen
  2009-11-30 18:50       ` Frederic Leroy
  1 sibling, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-29 22:13 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2569 bytes --]

On Sat, 28 Nov 2009, Ilpo Järvinen wrote:

> I restored Ccs. Please keep them.
> 
> On Sat, 28 Nov 2009, Frederic Leroy wrote:
> 
> > Le Sat, 28 Nov 2009 00:12:23 +0200 (EET),
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> > 
> > > On Fri, 27 Nov 2009, Frederic Leroy wrote:
> > > 
> > > > I put traces of stall here : 
> > > > http://www.starox.org/pub/scp_stall/
> > >
> > > Your proc/net/tcp capture on houba was perhaps made too late? ...The 
> > > connection is missing already.
> > 
> > It could be ! I had a doubt while using my 2 keyboards ... 
> > 
> > For information for the pcaps, I filtered and used "tcpdump ... ether
> > host xx:xx:xx:xx:xx"
> > I waited a bit after the stall and kill the scp with ctrl-c.
> > 
> > > But anyway, at least the problem is visible...
> > 
> > Great!
> > 
> > > It seems that
> > > 3998:4046 gets never retransmitted, not even by RTO which seems very
> > > very strange to me... And after this: 23:21:56.154269 IP
> > > 192.168.1.19.50028 > 192.168.1.15.22: . ack 3998 win 379 ... sack 3 
> > > {4238:4286}{4142:4190}{4046:4094}> also fast retransmit should have 
> > > already triggered. ...I'll look more into this if I can figure it out 
> > > from the current traces but it'll take a while.
> > 
> > Can it help you, if I make other traces ?
> > 
> > I won't be available until monday.
> 
> Perhaps having the /proc/net/tcp would at least tell what state the timer 
> is (if I cannot reproduce right away). ...It is rather strange that two 
> independent mechanisms for loss recovery seem both to fail to get 
> triggered here, no traces of retransmission whatsoever. I think it is for 
> now enough to concentrate on what happens on 192.168.1.15 (=houba?) and 
> get tcpdump and proc/net/tcp from there, the other end/direction has very 
> little significance here (except for the fact that bidirectionality might 
> be needed to actually trigger it). You could even think of getting 
> proc/net/tcp a bit more often, right from the start:
> 
> while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee scp_stall-houba.x.proc_net_tcp
> 
> ...Please wait at least 2 minutes before hitting ctrl-c or otherwise 
> artificially intervening.

So far no luck in reproducing the exactly same scenario as you do, 
however, I'm currently solving another problem I found related to excess 
growth in RTT estimator which is enough for me to get a temporal, but 
long-lasting, - stalled - with scp (that growth happens only with 
timestamps so if I disable them I've better success with the transfer).

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-28 11:31     ` Ilpo Järvinen
  2009-11-29 22:13       ` Ilpo Järvinen
@ 2009-11-30 18:50       ` Frederic Leroy
  2009-11-30 19:10         ` Ilpo Järvinen
  2009-11-30 20:18         ` Ilpo Järvinen
  1 sibling, 2 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-11-30 18:50 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Sat, 28 Nov 2009 13:31:14 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> I restored Ccs. Please keep them.

Sorry, 

> On Sat, 28 Nov 2009, Frederic Leroy wrote:
> 
> > Le Sat, 28 Nov 2009 00:12:23 +0200 (EET),
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> > 
> > > On Fri, 27 Nov 2009, Frederic Leroy wrote:
> > > 
> > > > I put traces of stall here : 
> > > > http://www.starox.org/pub/scp_stall/
> [...]
> Perhaps having the /proc/net/tcp would at least tell what state the
> timer is (if I cannot reproduce right away). ...It is rather strange
> that two independent mechanisms for loss recovery seem both to fail
> to get triggered here, no traces of retransmission whatsoever. I
> think it is for now enough to concentrate on what happens on
> 192.168.1.15 (=houba?) and get tcpdump and proc/net/tcp from there,
> the other end/direction has very little significance here (except for
> the fact that bidirectionality might be needed to actually trigger
> it). You could even think of getting proc/net/tcp a bit more often,
> right from the start:
> 
> while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee
> scp_stall-houba.x.proc_net_tcp
> 
> ...Please wait at least 2 minutes before hitting ctrl-c or otherwise 
> artificially intervening.

You're right, 192.168.1.15 is houba.
The faulty behaviour comes back today, I did new captures on
192.168.1.15 only : 
http://www.starox.org/pub/scp_stall/scp_stall-houba.2.pcap
http://www.starox.org/pub/scp_stall/scp_stall-houba.2.proc_net_tcp
http://www.starox.org/pub/scp_stall/scp_stall-houba.3.pcap
http://www.starox.org/pub/scp_stall/scp_stall-houba.3.proc_net_tcp

Which tool do you use to analyze /proc/net/tcp ?

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 18:50       ` Frederic Leroy
@ 2009-11-30 19:10         ` Ilpo Järvinen
  2009-11-30 20:18         ` Ilpo Järvinen
  1 sibling, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-30 19:10 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2004 bytes --]

On Mon, 30 Nov 2009, Frederic Leroy wrote:

> Le Sat, 28 Nov 2009 13:31:14 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Sat, 28 Nov 2009, Frederic Leroy wrote:
> > 
> > > Le Sat, 28 Nov 2009 00:12:23 +0200 (EET),
> > > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> > > 
> > > > On Fri, 27 Nov 2009, Frederic Leroy wrote:
> > > > 
> > > > > I put traces of stall here : 
> > > > > http://www.starox.org/pub/scp_stall/
> > [...]
> > Perhaps having the /proc/net/tcp would at least tell what state the
> > timer is (if I cannot reproduce right away). ...It is rather strange
> > that two independent mechanisms for loss recovery seem both to fail
> > to get triggered here, no traces of retransmission whatsoever. I
> > think it is for now enough to concentrate on what happens on
> > 192.168.1.15 (=houba?) and get tcpdump and proc/net/tcp from there,
> > the other end/direction has very little significance here (except for
> > the fact that bidirectionality might be needed to actually trigger
> > it). You could even think of getting proc/net/tcp a bit more often,
> > right from the start:
> > 
> > while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee
> > scp_stall-houba.x.proc_net_tcp
> > 
> > ...Please wait at least 2 minutes before hitting ctrl-c or otherwise 
> > artificially intervening.
> 
> You're right, 192.168.1.15 is houba.
> The faulty behaviour comes back today, I did new captures on
> 192.168.1.15 only : 
> http://www.starox.org/pub/scp_stall/scp_stall-houba.2.pcap
> http://www.starox.org/pub/scp_stall/scp_stall-houba.2.proc_net_tcp
> http://www.starox.org/pub/scp_stall/scp_stall-houba.3.pcap
> http://www.starox.org/pub/scp_stall/scp_stall-houba.3.proc_net_tcp

Thanks, I'll take a look.

> Which tool do you use to analyze /proc/net/tcp ?

I just recall the order of fields from net/ipv4/tcp_ipv4.c and apply my 
knowledge about TCP internals from there on. ...I've no particularly fancy 
tools for the job.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 18:50       ` Frederic Leroy
  2009-11-30 19:10         ` Ilpo Järvinen
@ 2009-11-30 20:18         ` Ilpo Järvinen
  2009-11-30 20:37           ` Frederic Leroy
  2009-11-30 21:24           ` Frederic Leroy
  1 sibling, 2 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-30 20:18 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3284 bytes --]

On Mon, 30 Nov 2009, Frederic Leroy wrote:

> Le Sat, 28 Nov 2009 13:31:14 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Sat, 28 Nov 2009, Frederic Leroy wrote:
> > 
> > > Le Sat, 28 Nov 2009 00:12:23 +0200 (EET),
> > > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> > > 
> > > > On Fri, 27 Nov 2009, Frederic Leroy wrote:
> > > > 
> > > > > I put traces of stall here : 
> > > > > http://www.starox.org/pub/scp_stall/
> > [...]
> > Perhaps having the /proc/net/tcp would at least tell what state the
> > timer is (if I cannot reproduce right away). ...It is rather strange
> > that two independent mechanisms for loss recovery seem both to fail
> > to get triggered here, no traces of retransmission whatsoever. I
> > think it is for now enough to concentrate on what happens on
> > 192.168.1.15 (=houba?) and get tcpdump and proc/net/tcp from there,
> > the other end/direction has very little significance here (except for
> > the fact that bidirectionality might be needed to actually trigger
> > it). You could even think of getting proc/net/tcp a bit more often,
> > right from the start:
> > 
> > while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee
> > scp_stall-houba.x.proc_net_tcp
> > 
> > ...Please wait at least 2 minutes before hitting ctrl-c or otherwise 
> > artificially intervening.
> 
> You're right, 192.168.1.15 is houba.
> The faulty behaviour comes back today, I did new captures on
> 192.168.1.15 only : 
> http://www.starox.org/pub/scp_stall/scp_stall-houba.2.pcap
> http://www.starox.org/pub/scp_stall/scp_stall-houba.2.proc_net_tcp
> http://www.starox.org/pub/scp_stall/scp_stall-houba.3.pcap
> http://www.starox.org/pub/scp_stall/scp_stall-houba.3.proc_net_tcp
> 
> Which tool do you use to analyze /proc/net/tcp ?

Thanks, at least ....3.... shows that cwnd (congestion window, the 2nd to 
last field in /proc/net/tcp) goes to zero which it never ever should. 
...I'll start reading some code to figure out the cause for that.

How much is your RTT btw? The RTO timeout easily goes up to 120 secods 
similar to what I found myself already. Thus I'm not sure if this stall 
would be "infinite" after all.

I've some problem in .3. case in  matching timing of the proc/net/tcp dump 
and the tcpdump ... did you perhaps cut it somewhere? This might point out 
to yet another problem, however, I'm not sure what I can atm trust in the 
logs. ...Timestamps would probably help me a bit (I've changed also sleep 
to do 0.01 granularity):

while [ : ]; do date "+%s.%N"; grep ":0016" /proc/net/tcp; sleep 0.01; done | tee scp_stall-houba.x.proc_net_tcp

(Please run start while loop before starting the transfer and all the way 
until the connection is already killed and only then stop the loop).

What happened in .2. case I've very little idea, proc/net/tcp does claim 
retransmission would have happened on TCP layer but that never went into 
tcpdump log (the kind of thing might have happened in the .3. case too 
which is why I asked for the timestamps for matching captures more exactly 
but there were some retransmissions earlier in the log so I cannot be 
sure). Can you also check that tcpdump doesn't lose any packets, it 
should show: "0 packets dropped by kernel" when you ctrl-c it.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 20:18         ` Ilpo Järvinen
@ 2009-11-30 20:37           ` Frederic Leroy
  2009-11-30 21:37             ` Ilpo Järvinen
  2009-11-30 21:24           ` Frederic Leroy
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-11-30 20:37 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Mon, 30 Nov 2009 22:18:22 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Mon, 30 Nov 2009, Frederic Leroy wrote:
> 
> Thanks, at least ....3.... shows that cwnd (congestion window, the
> 2nd to last field in /proc/net/tcp) goes to zero which it never ever
> should. ...I'll start reading some code to figure out the cause for
> that.
> 
> How much is your RTT btw?

The RTT is ~ 1.4ms.
The following trace has a rtt as big as 5ms but most of the time, it's
under 1.4ms and periodically grow to 3ms. 

-------------------------------------------
64 bytes from sg (192.168.1.19): icmp_seq=681 ttl=64 time=1.26 ms
64 bytes from sg (192.168.1.19): icmp_seq=682 ttl=64 time=1.26 ms
64 bytes from sg (192.168.1.19): icmp_seq=683 ttl=64 time=1.34 ms
64 bytes from sg (192.168.1.19): icmp_seq=684 ttl=64 time=2.01 ms
64 bytes from sg (192.168.1.19): icmp_seq=686 ttl=64 time=2.15 ms
64 bytes from sg (192.168.1.19): icmp_seq=687 ttl=64 time=5.07 ms
64 bytes from sg (192.168.1.19): icmp_seq=688 ttl=64 time=1.19 ms
64 bytes from sg (192.168.1.19): icmp_seq=689 ttl=64 time=1.30 ms
64 bytes from sg (192.168.1.19): icmp_seq=690 ttl=64 time=1.19 ms
64 bytes from sg (192.168.1.19): icmp_seq=691 ttl=64 time=1.14 ms
-------------------------------------------

> The RTO timeout easily goes up to 120
> secods similar to what I found myself already. Thus I'm not sure if
> this stall would be "infinite" after all.

I notice that grep '":0016" /proc/net/tcp' stop to report something when
it stalls. Is the connection closed ?
 
> I've some problem in .3. case in  matching timing of the proc/net/tcp
> dump and the tcpdump ... did you perhaps cut it somewhere?

No, I didn't cut it.

> This might
> point out to yet another problem, however, I'm not sure what I can
> atm trust in the logs. ...Timestamps would probably help me a bit
> (I've changed also sleep to do 0.01 granularity):
> 
> while [ : ]; do date "+%s.%N"; grep ":0016" /proc/net/tcp; sleep
> 0.01; done | tee scp_stall-houba.x.proc_net_tcp
 
> (Please run start while loop before starting the transfer and all the
> way until the connection is already killed and only then stop the
> loop).

I took care to timings this time. I did as you said.

> What happened in .2. case I've very little idea, proc/net/tcp does
> claim retransmission would have happened on TCP layer but that never
> went into tcpdump log (the kind of thing might have happened in
> the .3. case too which is why I asked for the timestamps for matching
> captures more exactly but there were some retransmissions earlier in
> the log so I cannot be sure). Can you also check that tcpdump doesn't
> lose any packets, it should show: "0 packets dropped by kernel" when
> you ctrl-c it.

I still have the logs, O packet dropped : 

-------------------------------------------
houba tmp # tcpdump -i wlan1 -s 1500 -w /tmp/ssh_stall-houba.1.pcap -vv
ether host 00:40:45:09:3f:3f tcpdump: listening on wlan1, link-type
EN10MB (Ethernet), capture size 1500 bytes ^C8923 packets captured
8923 packets received by filter
0 packets dropped by kernel
houba tmp # tcpdump -i wlan1 -s 1500 -w /tmp/scp_stall-houba.3.pcap
-vv ether host 00:40:45:09:3f:3f tcpdump: listening on wlan1, link-type
EN10MB (Ethernet), capture size 1500 bytes ^C17045 packets captured
17045 packets received by filter
0 packets dropped by kernel
houba tmp # 
-------------------------------------------


-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 20:18         ` Ilpo Järvinen
  2009-11-30 20:37           ` Frederic Leroy
@ 2009-11-30 21:24           ` Frederic Leroy
  2009-11-30 21:26             ` Ilpo Järvinen
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-11-30 21:24 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Mon, 30 Nov 2009 22:18:22 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Mon, 30 Nov 2009, Frederic Leroy wrote:
> 
> > Le Sat, 28 Nov 2009 13:31:14 +0200 (EET),
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> point out to yet another problem, however, I'm not sure what I can
> atm trust in the logs. ...Timestamps would probably help me a bit
> (I've changed also sleep to do 0.01 granularity):

I made 2 other traces with 0.01 granularity, but it works normally. I
suppose there are less interference in the evening.

The .4 is a bad trace where I forgot to put timestamp. And when I
realized it was copying successfully, I stopped it before it tooks too
much space.

It can be interesting because the stream had a little "pause".

The .5 is the trace of a successfull copy. I limited packet capture size
to 96 bytes. It has stalled once in the middle.

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 21:24           ` Frederic Leroy
@ 2009-11-30 21:26             ` Ilpo Järvinen
  2009-11-30 21:54               ` Frederic Leroy
  0 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-30 21:26 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1072 bytes --]

On Mon, 30 Nov 2009, Frederic Leroy wrote:

> Le Mon, 30 Nov 2009 22:18:22 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Mon, 30 Nov 2009, Frederic Leroy wrote:
> > 
> > > Le Sat, 28 Nov 2009 13:31:14 +0200 (EET),
> > > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> > point out to yet another problem, however, I'm not sure what I can
> > atm trust in the logs. ...Timestamps would probably help me a bit
> > (I've changed also sleep to do 0.01 granularity):
> 
> I made 2 other traces with 0.01 granularity, but it works normally. I
> suppose there are less interference in the evening.
> 
> The .4 is a bad trace where I forgot to put timestamp. And when I
> realized it was copying successfully, I stopped it before it tooks too
> much space.
> 
> It can be interesting because the stream had a little "pause".
> 
> The .5 is the trace of a successfull copy. I limited packet capture size
> to 96 bytes. It has stalled once in the middle.

Could you compress such big ones. ...I only have 1Mbps downlink here :-).

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 20:37           ` Frederic Leroy
@ 2009-11-30 21:37             ` Ilpo Järvinen
  2009-11-30 22:19               ` Frederic Leroy
  2009-12-01 20:19               ` Frederic Leroy
  0 siblings, 2 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-11-30 21:37 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4574 bytes --]

On Mon, 30 Nov 2009, Frederic Leroy wrote:

> Le Mon, 30 Nov 2009 22:18:22 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Mon, 30 Nov 2009, Frederic Leroy wrote:
> > 
> > Thanks, at least ....3.... shows that cwnd (congestion window, the
> > 2nd to last field in /proc/net/tcp) goes to zero which it never ever
> > should. ...I'll start reading some code to figure out the cause for
> > that.
> > 
> > How much is your RTT btw?
> 
> The RTT is ~ 1.4ms.
> The following trace has a rtt as big as 5ms but most of the time, it's
> under 1.4ms and periodically grow to 3ms. 
> 
> -------------------------------------------
> 64 bytes from sg (192.168.1.19): icmp_seq=681 ttl=64 time=1.26 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=682 ttl=64 time=1.26 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=683 ttl=64 time=1.34 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=684 ttl=64 time=2.01 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=686 ttl=64 time=2.15 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=687 ttl=64 time=5.07 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=688 ttl=64 time=1.19 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=689 ttl=64 time=1.30 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=690 ttl=64 time=1.19 ms
> 64 bytes from sg (192.168.1.19): icmp_seq=691 ttl=64 time=1.14 ms
> -------------------------------------------

Thanks, quite much what I expected (a low one).

> > The RTO timeout easily goes up to 120
> > secods similar to what I found myself already. Thus I'm not sure if
> > this stall would be "infinite" after all.
> 
> I notice that grep '":0016" /proc/net/tcp' stop to report something when
> it stalls. Is the connection closed ?

...that seems possible because of the reset in the end (however, the 
earlier fin in .3. does not get reset as response for some reason). 
Another option is that the socket is locked so long that grep has to wait 
until read succeeds.

...I might have been wrong with this 120 seconds thing btw, I misread the 
units.

> > I've some problem in .3. case in  matching timing of the proc/net/tcp
> > dump and the tcpdump ... did you perhaps cut it somewhere?
> 
> No, I didn't cut it.

Ok.

> > This might
> > point out to yet another problem, however, I'm not sure what I can
> > atm trust in the logs. ...Timestamps would probably help me a bit
> > (I've changed also sleep to do 0.01 granularity):
> > 
> > while [ : ]; do date "+%s.%N"; grep ":0016" /proc/net/tcp; sleep
> > 0.01; done | tee scp_stall-houba.x.proc_net_tcp
>  
> > (Please run start while loop before starting the transfer and all the
> > way until the connection is already killed and only then stop the
> > loop).
> 
> I took care to timings this time. I did as you said.
> 
> > What happened in .2. case I've very little idea, proc/net/tcp does
> > claim retransmission would have happened on TCP layer but that never
> > went into tcpdump log (the kind of thing might have happened in
> > the .3. case too which is why I asked for the timestamps for matching
> > captures more exactly but there were some retransmissions earlier in
> > the log so I cannot be sure). Can you also check that tcpdump doesn't
> > lose any packets, it should show: "0 packets dropped by kernel" when
> > you ctrl-c it.
> 
> I still have the logs, O packet dropped : 
> 
> -------------------------------------------
> houba tmp # tcpdump -i wlan1 -s 1500 -w /tmp/ssh_stall-houba.1.pcap -vv
> ether host 00:40:45:09:3f:3f tcpdump: listening on wlan1, link-type
> EN10MB (Ethernet), capture size 1500 bytes ^C8923 packets captured
> 8923 packets received by filter
> 0 packets dropped by kernel
> houba tmp # tcpdump -i wlan1 -s 1500 -w /tmp/scp_stall-houba.3.pcap
> -vv ether host 00:40:45:09:3f:3f tcpdump: listening on wlan1, link-type
> EN10MB (Ethernet), capture size 1500 bytes ^C17045 packets captured
> 17045 packets received by filter
> 0 packets dropped by kernel
> houba tmp # 

Ok. Just wanted to confirm since it can be that that not all packets are 
in the tcpdump that TCP was sending. For what reason is unknown at the 
moment. Any fancy firewall (iptables) rules on houba that could do such a 
trick here? Also, have you changed sysctl settings from the defaults?

Could you try btw to disable frto (set tcp_frto sysctl to zero), in theory 
it could cause zero cwnd if something else is wrong too (I haven't yet 
read them all through so there might be other places where a zero cwnd 
can be set). Also I'm interested what happens if you disable timestamps, 
is the problem still reproducable.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 21:26             ` Ilpo Järvinen
@ 2009-11-30 21:54               ` Frederic Leroy
  0 siblings, 0 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-11-30 21:54 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Mon, 30 Nov 2009 23:26:55 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Mon, 30 Nov 2009, Frederic Leroy wrote:
> 
> > Le Mon, 30 Nov 2009 22:18:22 +0200 (EET),
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> [...]
> > The .5 is the trace of a successfull copy. I limited packet capture
> > size to 96 bytes. It has stalled once in the middle.
> 
> Could you compress such big ones. ...I only have 1Mbps downlink
> here :-).

Done ... and will do for the next :) 

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 21:37             ` Ilpo Järvinen
@ 2009-11-30 22:19               ` Frederic Leroy
  2009-12-01 20:19               ` Frederic Leroy
  1 sibling, 0 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-11-30 22:19 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Mon, 30 Nov 2009 23:37:56 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> Ok. Just wanted to confirm since it can be that that not all packets
> are in the tcpdump that TCP was sending. For what reason is unknown
> at the moment. Any fancy firewall (iptables) rules on houba that
> could do such a trick here?

Nothing with netfilter or schedulers on houba, neither on sg ( no log
because no iproute installed ...)
http://www.starox.org/pub/scp_stall/houba.netfilter.log

> Also, have you changed sysctl settings from the defaults?

No but I could have change defaults in the linux config file.

I saved the current content of /proc/sys/net and proc/net : 
http://www.starox.org/pub/scp_stall/sg-proc_net-proc_sys_net.tbz2
http://www.starox.org/pub/scp_stall/houba-proc_net-proc_sys_net.tbz2

I put also linux config files in the directory.

> Could you try btw to disable frto (set tcp_frto sysctl to zero), in
> theory it could cause zero cwnd if something else is wrong too (I
> haven't yet read them all through so there might be other places
> where a zero cwnd can be set). Also I'm interested what happens if
> you disable timestamps, is the problem still reproducable.
 
Ok, I'll try these tomorrow when it will fault again :)

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-11-30 21:37             ` Ilpo Järvinen
  2009-11-30 22:19               ` Frederic Leroy
@ 2009-12-01 20:19               ` Frederic Leroy
  2009-12-01 20:27                 ` Ilpo Järvinen
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-01 20:19 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Mon, 30 Nov 2009 23:37:56 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Mon, 30 Nov 2009, Frederic Leroy wrote:
>
> Could you try btw to disable frto (set tcp_frto sysctl to zero), in
> theory it could cause zero cwnd if something else is wrong too (I
> haven't yet read them all through so there might be other places
> where a zero cwnd can be set). Also I'm interested what happens if
> you disable timestamps, is the problem still reproducable.

I made 3 new traces and put it in the usual place.
One with frto set to 0, one just after with frto set to 2.

And the last with timestamp disabled. 

All stream has stalled.

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-01 20:19               ` Frederic Leroy
@ 2009-12-01 20:27                 ` Ilpo Järvinen
  2009-12-02  7:59                   ` Frederic Leroy
  0 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-01 20:27 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1178 bytes --]

On Tue, 1 Dec 2009, Frederic Leroy wrote:

> Le Mon, 30 Nov 2009 23:37:56 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Mon, 30 Nov 2009, Frederic Leroy wrote:
> >
> > Could you try btw to disable frto (set tcp_frto sysctl to zero), in
> > theory it could cause zero cwnd if something else is wrong too (I
> > haven't yet read them all through so there might be other places
> > where a zero cwnd can be set). Also I'm interested what happens if
> > you disable timestamps, is the problem still reproducable.
> 
> I made 3 new traces and put it in the usual place.
> One with frto set to 0, one just after with frto set to 2.
> 
> And the last with timestamp disabled. 
> 
> All stream has stalled.

Thanks, I'll hopefully have some time tomorrow to go through all the new 
ones. In the meantime, can you provide your tso configuration on houba 
(ethtool -k dev). If tso is on it might be worth to try without it 
(set with -K), however, I don't expect it to change anything. Also,
I wonder if you have had some working (past) kernel version?

No need for additional traces atm, I'll get back to this once I've gone 
through the logs.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-01 20:27                 ` Ilpo Järvinen
@ 2009-12-02  7:59                   ` Frederic Leroy
  2009-12-02 12:59                     ` Ilpo Järvinen
  0 siblings, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-02  7:59 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo

Le Tue, 1 Dec 2009 22:27:01 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Tue, 1 Dec 2009, Frederic Leroy wrote:
>
> Thanks, I'll hopefully have some time tomorrow to go through all the
> new ones. In the meantime, can you provide your tso configuration on
> houba (ethtool -k dev).

No tso on the interface : 

houba ~ # ethtool -k wlan1
Offload parameters for wlan1:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not
supported Cannot get device udp large send offload settings: Operation
not supported Cannot get device generic segmentation offload settings:
Operation not supported no offload info available

> Also,
> I wonder if you have had some working (past) kernel version?

I tried a 2.6.31, and it seems to work. If it really work (connection
don't stall everytime at the moment), I'll try to make a "git bisect"
asap.

> No need for additional traces atm, I'll get back to this once I've
> gone through the logs.

Ok :)

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02  7:59                   ` Frederic Leroy
@ 2009-12-02 12:59                     ` Ilpo Järvinen
  2009-12-02 15:44                       ` Frederic Leroy
  0 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-02 12:59 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Netdev, Asdo, David Miller, Eric Dumazet, Herbert Xu

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1192 bytes --]

On Wed, 2 Dec 2009, Frederic Leroy wrote:

> Le Tue, 1 Dec 2009 22:27:01 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Tue, 1 Dec 2009, Frederic Leroy wrote:
> >
> > Thanks, I'll hopefully have some time tomorrow to go through all the
> > new ones.

Thanks for the logs. ...It is very much confirmed now that the connection 
suddently dies.

What is directly inferrable from the logs: Losses happen. RTO is triggered 
for the first time (for some reason also fast retransmit is missing and it 
most likely it should have preceeded RTO), however, the retransmission 
that RTO should make is never captured into the tcpdump. Then when RTO 
triggers for the second time, the connection silently dies immediately and 
consistently for some reason. ...From that point on resets are sent if the 
peer keeps trying.

> > Also,
> > I wonder if you have had some working (past) kernel version?
> 
> I tried a 2.6.31, and it seems to work. If it really work (connection
> don't stall everytime at the moment), I'll try to make a "git bisect"
> asap.

In one of the cases, also the sg end dies (the 4th case). I suppose that 
was running earlier kernel already?

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02 12:59                     ` Ilpo Järvinen
@ 2009-12-02 15:44                       ` Frederic Leroy
  2009-12-02 16:05                         ` Ilpo Järvinen
  0 siblings, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-02 15:44 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, Asdo, David Miller, Eric Dumazet, Herbert Xu

On Wed, Dec 02, 2009 at 02:59:20PM +0200, Ilpo Järvinen wrote:
> On Wed, 2 Dec 2009, Frederic Leroy wrote:
> 
> > Le Tue, 1 Dec 2009 22:27:01 +0200 (EET),
> > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> What is directly inferrable from the logs: Losses happen. RTO is triggered 
> for the first time (for some reason also fast retransmit is missing and it 
> most likely it should have preceeded RTO), however, the retransmission 
> that RTO should make is never captured into the tcpdump. Then when RTO 
> triggers for the second time, the connection silently dies immediately and 
> consistently for some reason. ...From that point on resets are sent if the 
> peer keeps trying.

If you want, I can add printk when a rto is made, in order to compare with tcpdump.

> > > I wonder if you have had some working (past) kernel version?
> > I tried a 2.6.31, and it seems to work. If it really work (connection
> > don't stall everytime at the moment), I'll try to make a "git bisect"
> > asap.

I can confirm now that 2.6.31 works well.

I made a little change during the test : 
Instead of launching scp on sg, I launched it from houba.
The scp command return then  to bash with an error, without stalling.
Which appears to me logical because the connection is killed from houba point of view.
 
> In one of the cases, also the sg end dies (the 4th case). I suppose that 
> was running earlier kernel already?

This case was the one with tcp_frto=2 and tcp_timestamps=0 on houba.

I only rebooted sg two days ago. It still 2.6.32-rc5. Nothing had been changed on this host during all tests.
tcp_frto=2
tcp_timestamps=1

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02 15:44                       ` Frederic Leroy
@ 2009-12-02 16:05                         ` Ilpo Järvinen
  2009-12-02 17:34                           ` Frederic Leroy
  2009-12-03  8:56                           ` Frederic Leroy
  0 siblings, 2 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-02 16:05 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Netdev, Asdo, David Miller, Eric Dumazet, Herbert Xu,
	Damian Lukowski

[-- Attachment #1: Type: TEXT/PLAIN, Size: 13452 bytes --]

On Wed, 2 Dec 2009, Frederic Leroy wrote:

> On Wed, Dec 02, 2009 at 02:59:20PM +0200, Ilpo Järvinen wrote:
> > On Wed, 2 Dec 2009, Frederic Leroy wrote:
> > 
> > > Le Tue, 1 Dec 2009 22:27:01 +0200 (EET),
> > > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> > What is directly inferrable from the logs: Losses happen. RTO is triggered 
> > for the first time (for some reason also fast retransmit is missing and it 
> > most likely it should have preceeded RTO), however, the retransmission 
> > that RTO should make is never captured into the tcpdump. Then when RTO 
> > triggers for the second time, the connection silently dies immediately and 
> > consistently for some reason. ...From that point on resets are sent if the 
> > peer keeps trying.
> 
> If you want, I can add printk when a rto is made, in order to compare 
> with tcpdump. 

It's rather conclusive already that the first one did happen
(icsk_retransmit is increased in very few places and your proc/net/tcp 
shows increase of that to 1 every time). What happened on the second one 
is unknown though (icsk_retransmits==1 at that point of time), I suppose 
it takes some path in the RTO code which leads to death of the connection.

> > > > I wonder if you have had some working (past) kernel version?
> > > I tried a 2.6.31, and it seems to work. If it really work (connection
> > > don't stall everytime at the moment), I'll try to make a "git bisect"
> > > asap.
> 
> I can confirm now that 2.6.31 works well.
> 
> I made a little change during the test : 
> Instead of launching scp on sg, I launched it from houba.
> The scp command return then  to bash with an error, without stalling.
> Which appears to me logical because the connection is killed from houba point of view.
>  
> > In one of the cases, also the sg end dies (the 4th case). I suppose that 
> > was running earlier kernel already?
> 
> This case was the one with tcp_frto=2 and tcp_timestamps=0 on houba.

I suppose we're confused, I was refering to .4. case, did you perhaps mix 
that up with the latest set of tests which yields .8.?

> I only rebooted sg two days ago. It still 2.6.32-rc5. Nothing had been 
> changed on this host during all tests. 
> tcp_frto=2
> tcp_timestamps=1

...Ah so it was running the new kernel all the way along, my bad, I 
somehow was able to misread your kernel version numbers in the earlier 
mail :-).

IIRC your kernels Asdo were older than that, right?

In the recent work, the most suspicious things are the new timeout things, 
I'll read them through once I've some time (but so far I've not found 
anything wrong in them but I of course can miss something subtle).  
...I've added Damian as CC if he has some idea. If you want you can try 
with a trivial revert of that stuff, I've included a patch for that below.


-- 
 i.


--
[PATCH] Revert new RTO backoff stuff

---
 Documentation/networking/ip-sysctl.txt |   37 ++++++---------------
 include/net/tcp.h                      |   35 ---------------------
 net/ipv4/tcp_input.c                   |    5 ++-
 net/ipv4/tcp_ipv4.c                    |   53 +++++---------------------------
 net/ipv4/tcp_timer.c                   |   13 +++-----
 5 files changed, 27 insertions(+), 116 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index fbe427a..da07602 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -311,12 +311,9 @@ tcp_no_metrics_save - BOOLEAN
 	connections.
 
 tcp_orphan_retries - INTEGER
-	This value influences the timeout of a locally closed TCP connection,
-	when RTO retransmissions remain unacknowledged.
-	See tcp_retries2 for more details.
-
-	The default value is 7.
-	If your machine is a loaded WEB server,
+	How may times to retry before killing TCP connection, closed
+	by our side. Default value 7 corresponds to ~50sec-16min
+	depending on RTO. If you machine is loaded WEB server,
 	you should think about lowering this value, such sockets
 	may consume significant resources. Cf. tcp_max_orphans.
 
@@ -330,28 +327,16 @@ tcp_retrans_collapse - BOOLEAN
 	certain TCP stacks.
 
 tcp_retries1 - INTEGER
-	This value influences the time, after which TCP decides, that
-	something is wrong due to unacknowledged RTO retransmissions,
-	and reports this suspicion to the network layer.
-	See tcp_retries2 for more details.
-
-	RFC 1122 recommends at least 3 retransmissions, which is the
-	default.
+	How many times to retry before deciding that something is wrong
+	and it is necessary to report this suspicion to network layer.
+	Minimal RFC value is 3, it is default, which corresponds
+	to ~3sec-8min depending on RTO.
 
 tcp_retries2 - INTEGER
-	This value influences the timeout of an alive TCP connection,
-	when RTO retransmissions remain unacknowledged.
-	Given a value of N, a hypothetical TCP connection following
-	exponential backoff with an initial RTO of TCP_RTO_MIN would
-	retransmit N times before killing the connection at the (N+1)th RTO.
-
-	The default value of 15 yields a hypothetical timeout of 924.6
-	seconds and is a lower bound for the effective timeout.
-	TCP will effectively time out at the first RTO which exceeds the
-	hypothetical timeout.
-
-	RFC 1122 recommends at least 100 seconds for the timeout,
-	which corresponds to a value of at least 8.
+	How may times to retry before killing alive TCP connection.
+	RFC1122 says that the limit should be longer than 100 sec.
+	It is too small number.	Default value 15 corresponds to ~13-30min
+	depending on RTO.
 
 tcp_rfc1337 - BOOLEAN
 	If set, the TCP stack behaves conforming to RFC1337. If unset,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..983367e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -469,7 +469,6 @@ extern void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 				      int nonagle);
 extern int tcp_may_send_now(struct sock *sk);
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
-extern void tcp_retransmit_timer(struct sock *sk);
 extern void tcp_xmit_retransmit_queue(struct sock *);
 extern void tcp_simple_retransmit(struct sock *);
 extern int tcp_trim_head(struct sock *, struct sk_buff *, u32);
@@ -522,17 +521,6 @@ extern int tcp_mtu_to_mss(struct sock *sk, int pmtu);
 extern int tcp_mss_to_mtu(struct sock *sk, int mss);
 extern void tcp_mtup_init(struct sock *sk);
 
-static inline void tcp_bound_rto(const struct sock *sk)
-{
-	if (inet_csk(sk)->icsk_rto > TCP_RTO_MAX)
-		inet_csk(sk)->icsk_rto = TCP_RTO_MAX;
-}
-
-static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
-{
-	return (tp->srtt >> 3) + tp->rttvar;
-}
-
 static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
 {
 	tp->pred_flags = htonl((tp->tcp_header_len << 26) |
@@ -1259,29 +1247,6 @@ static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_bu
 #define tcp_for_write_queue_from_safe(skb, tmp, sk)			\
 	skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp)
 
-/* This function calculates a "timeout" which is equivalent to the timeout of a
- * TCP connection after "boundary" unsucessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN.
- */
-static inline bool retransmits_timed_out(const struct sock *sk,
-					 unsigned int boundary)
-{
-	unsigned int timeout, linear_backoff_thresh;
-
-	if (!inet_csk(sk)->icsk_retransmits)
-		return false;
-
-	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
-
-	if (boundary <= linear_backoff_thresh)
-		timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
-	else
-		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
-			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
-
-	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
-}
-
 static inline struct sk_buff *tcp_send_head(struct sock *sk)
 {
 	return sk->sk_send_head;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..6322e62 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -685,7 +685,7 @@ static inline void tcp_set_rto(struct sock *sk)
 	 *    is invisible. Actually, Linux-2.4 also generates erratic
 	 *    ACKs in some circumstances.
 	 */
-	inet_csk(sk)->icsk_rto = __tcp_set_rto(tp);
+	inet_csk(sk)->icsk_rto = (tp->srtt >> 3) + tp->rttvar;
 
 	/* 2. Fixups made earlier cannot be right.
 	 *    If we do not estimate RTO correctly without them,
@@ -696,7 +696,8 @@ static inline void tcp_set_rto(struct sock *sk)
 	/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
 	 * guarantees that rto is higher.
 	 */
-	tcp_bound_rto(sk);
+	if (inet_csk(sk)->icsk_rto > TCP_RTO_MAX)
+		inet_csk(sk)->icsk_rto = TCP_RTO_MAX;
 }
 
 /* Save metrics learned by this TCP session.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..702ce88 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -328,29 +328,26 @@ static void do_pmtu_discovery(struct sock *sk, struct iphdr *iph, u32 mtu)
  *
  */
 
-void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
+void tcp_v4_err(struct sk_buff *skb, u32 info)
 {
-	struct iphdr *iph = (struct iphdr *)icmp_skb->data;
-	struct tcphdr *th = (struct tcphdr *)(icmp_skb->data + (iph->ihl << 2));
-	struct inet_connection_sock *icsk;
+	struct iphdr *iph = (struct iphdr *)skb->data;
+	struct tcphdr *th = (struct tcphdr *)(skb->data + (iph->ihl << 2));
 	struct tcp_sock *tp;
 	struct inet_sock *inet;
-	const int type = icmp_hdr(icmp_skb)->type;
-	const int code = icmp_hdr(icmp_skb)->code;
+	const int type = icmp_hdr(skb)->type;
+	const int code = icmp_hdr(skb)->code;
 	struct sock *sk;
-	struct sk_buff *skb;
 	__u32 seq;
-	__u32 remaining;
 	int err;
-	struct net *net = dev_net(icmp_skb->dev);
+	struct net *net = dev_net(skb->dev);
 
-	if (icmp_skb->len < (iph->ihl << 2) + 8) {
+	if (skb->len < (iph->ihl << 2) + 8) {
 		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
 		return;
 	}
 
 	sk = inet_lookup(net, &tcp_hashinfo, iph->daddr, th->dest,
-			iph->saddr, th->source, inet_iif(icmp_skb));
+			iph->saddr, th->source, inet_iif(skb));
 	if (!sk) {
 		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
 		return;
@@ -370,7 +367,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 	if (sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	icsk = inet_csk(sk);
 	tp = tcp_sk(sk);
 	seq = ntohl(th->seq);
 	if (sk->sk_state != TCP_LISTEN &&
@@ -397,39 +393,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 		}
 
 		err = icmp_err_convert[code].errno;
-		/* check if icmp_skb allows revert of backoff
-		 * (see draft-zimmermann-tcp-lcd) */
-		if (code != ICMP_NET_UNREACH && code != ICMP_HOST_UNREACH)
-			break;
-		if (seq != tp->snd_una  || !icsk->icsk_retransmits ||
-		    !icsk->icsk_backoff)
-			break;
-
-		icsk->icsk_backoff--;
-		inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
-					 icsk->icsk_backoff;
-		tcp_bound_rto(sk);
-
-		skb = tcp_write_queue_head(sk);
-		BUG_ON(!skb);
-
-		remaining = icsk->icsk_rto - min(icsk->icsk_rto,
-				tcp_time_stamp - TCP_SKB_CB(skb)->when);
-
-		if (remaining) {
-			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-						  remaining, TCP_RTO_MAX);
-		} else if (sock_owned_by_user(sk)) {
-			/* RTO revert clocked out retransmission,
-			 * but socket is locked. Will defer. */
-			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-						  HZ/20, TCP_RTO_MAX);
-		} else {
-			/* RTO revert clocked out retransmission.
-			 * Will retransmit now */
-			tcp_retransmit_timer(sk);
-		}
-
 		break;
 	case ICMP_TIME_EXCEEDED:
 		err = EHOSTUNREACH;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index cdb2ca7..c520fb6 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -137,14 +137,13 @@ static int tcp_write_timeout(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int retry_until;
-	bool do_reset;
 
 	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
 		if (icsk->icsk_retransmits)
 			dst_negative_advice(&sk->sk_dst_cache);
 		retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
 	} else {
-		if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
+		if (icsk->icsk_retransmits >= sysctl_tcp_retries1) {
 			/* Black hole detection */
 			tcp_mtu_probing(icsk, sk);
 
@@ -156,15 +155,13 @@ static int tcp_write_timeout(struct sock *sk)
 			const int alive = (icsk->icsk_rto < TCP_RTO_MAX);
 
 			retry_until = tcp_orphan_retries(sk, alive);
-			do_reset = alive ||
-				   !retransmits_timed_out(sk, retry_until);
 
-			if (tcp_out_of_resources(sk, do_reset))
+			if (tcp_out_of_resources(sk, alive || icsk->icsk_retransmits < retry_until))
 				return 1;
 		}
 	}
 
-	if (retransmits_timed_out(sk, retry_until)) {
+	if (icsk->icsk_retransmits >= retry_until) {
 		/* Has it gone just too far? */
 		tcp_write_err(sk);
 		return 1;
@@ -282,7 +279,7 @@ static void tcp_probe_timer(struct sock *sk)
  *	The TCP retransmit timer.
  */
 
-void tcp_retransmit_timer(struct sock *sk)
+static void tcp_retransmit_timer(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -388,7 +385,7 @@ void tcp_retransmit_timer(struct sock *sk)
 out_reset_timer:
 	icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
-	if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
+	if (icsk->icsk_retransmits > sysctl_tcp_retries1)
 		__sk_dst_reset(sk);
 
 out:;
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02 16:05                         ` Ilpo Järvinen
@ 2009-12-02 17:34                           ` Frederic Leroy
  2009-12-02 19:17                             ` Damian Lukowski
  2009-12-03  8:56                           ` Frederic Leroy
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-02 17:34 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Netdev, Asdo, David Miller, Eric Dumazet, Herbert Xu,
	Damian Lukowski

Le Wed, 2 Dec 2009 18:05:24 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> > > In one of the cases, also the sg end dies (the 4th case). I
> > > suppose that was running earlier kernel already?
> > 
> > This case was the one with tcp_frto=2 and tcp_timestamps=0 on houba.
> 
> I suppose we're confused, I was refering to .4. case, did you perhaps
> mix that up with the latest set of tests which yields .8.?

You're right, I was talking about .8

> In the recent work, the most suspicious things are the new timeout
> things, I'll read them through once I've some time (but so far I've
> not found anything wrong in them but I of course can miss something
> subtle). ...I've added Damian as CC if he has some idea. If you want
> you can try with a trivial revert of that stuff, I've included a
> patch for that below.

I will try it. 

I just discover something not good. I use tuxonice[1] branch on houba
for 2.6.32*.
Although my 2.6.31 kernel is vanilla[2].

[1] git://git.kernel.org/pub/scm/linux/kernel/git/nigelc/tuxonice-head.git
[2] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

Now, I'm gonna go compile kernels again and again and make more
tests ... :)

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02 17:34                           ` Frederic Leroy
@ 2009-12-02 19:17                             ` Damian Lukowski
  2009-12-03  8:59                               ` Frederic Leroy
  0 siblings, 1 reply; 63+ messages in thread
From: Damian Lukowski @ 2009-12-02 19:17 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Ilpo Järvinen, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu

Frederic Leroy schrieb:
> Le Wed, 2 Dec 2009 18:05:24 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
>>>> In one of the cases, also the sg end dies (the 4th case). I
>>>> suppose that was running earlier kernel already?
>>> This case was the one with tcp_frto=2 and tcp_timestamps=0 on houba.
>> I suppose we're confused, I was refering to .4. case, did you perhaps
>> mix that up with the latest set of tests which yields .8.?
> 
> You're right, I was talking about .8
> 
>> In the recent work, the most suspicious things are the new timeout
>> things, I'll read them through once I've some time (but so far I've
>> not found anything wrong in them but I of course can miss something
>> subtle). ...I've added Damian as CC if he has some idea. If you want
>> you can try with a trivial revert of that stuff, I've included a
>> patch for that below.
> 
> I will try it. 

Hi,
could you please printk retrans_stamp just before the return in 
include/net/tcp.h:retransmits_timed_out()?
If the value is not monotonically increasing but is reset to 0 at some
point, this might lead to problems in tcp_write_timeout().
It's the only idea I have now.

> 
> I just discover something not good. I use tuxonice[1] branch on houba
> for 2.6.32*.
> Although my 2.6.31 kernel is vanilla[2].
> 
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/nigelc/tuxonice-head.git
> [2] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> 
> Now, I'm gonna go compile kernels again and again and make more
> tests ... :)
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02 16:05                         ` Ilpo Järvinen
  2009-12-02 17:34                           ` Frederic Leroy
@ 2009-12-03  8:56                           ` Frederic Leroy
  1 sibling, 0 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-12-03  8:56 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Netdev, Asdo, David Miller, Eric Dumazet, Herbert Xu,
	Damian Lukowski

On Wed, Dec 02, 2009 at 06:05:24PM +0200, Ilpo Järvinen wrote:
>
> In the recent work, the most suspicious things are the new timeout things, 
> I'll read them through once I've some time (but so far I've not found 
> anything wrong in them but I of course can miss something subtle).  
> ...I've added Damian as CC if he has some idea. If you want you can try 
> with a trivial revert of that stuff, I've included a patch for that below.

> [PATCH] Revert new RTO backoff stuff

So I recompiled my kernel with linus-stable tree. It works (bad) as the tuxonice 2.6.32-rc5.
All further test are on linus-stable tree.

I made 3 test with your patch. All 3 copy worked well.
So it's in the new RTO backoff.

I made only one trace (.9).

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-02 19:17                             ` Damian Lukowski
@ 2009-12-03  8:59                               ` Frederic Leroy
  2009-12-03 10:29                                 ` Ilpo Järvinen
  0 siblings, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-03  8:59 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Ilpo Järvinen, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu

On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
> could you please printk retrans_stamp just before the return in 
> include/net/tcp.h:retransmits_timed_out()?
> If the value is not monotonically increasing but is reset to 0 at some
> point, this might lead to problems in tcp_write_timeout().
> It's the only idea I have now.

Your idea is good.
Only one out of 4 value is not null.

Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10

I make 2 attempts. Printk corresponding to .10 are those after the line 
"wlan1 enter promiscuous mode"

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03  8:59                               ` Frederic Leroy
@ 2009-12-03 10:29                                 ` Ilpo Järvinen
  2009-12-03 10:34                                   ` David Miller
                                                     ` (4 more replies)
  0 siblings, 5 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 10:29 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Damian Lukowski, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

I've added Greg as CC to make him aware of this issue in early as it now 
affects 2.6.32 too (rather important to get dealt quickly in stable once 
we have a tested solution since TCP is pretty broken with the silent 
deaths this problem seems to cause). ...One possibility would be to just 
queue the tested revert to stable and sort this thing out for 2.6.33 in 
net-2.6.

Opinions, Dave?, Greg?

Now back to the issue...

You said in the other mail that "All further test are on linus-stable 
tree.", which has this contradiction that Linus does not maintain stable 
trees. Which exactly was the tree used for the .9. test, Linus' tree or 
the 2.6.31 stable tree? I suppose the former since the revert wouldn't 
apply to 2.6.31 so I just want to confirm.

On Thu, 3 Dec 2009, Frederic Leroy wrote:
> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
> > could you please printk retrans_stamp just before the return in 
> > include/net/tcp.h:retransmits_timed_out()?
> > If the value is not monotonically increasing but is reset to 0 at some
> > point, this might lead to problems in tcp_write_timeout().
> > It's the only idea I have now.
> 
> Your idea is good.
> Only one out of 4 value is not null.
>
> Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10
> 
> I make 2 attempts. Printk corresponding to .10 are those after the line 
> "wlan1 enter promiscuous mode"

Nice thinking indeed Damian, thanks. ...But but, where exactly did you 
print? ...There are multiple returns and the return false branch is 
expected to have a zero retrans_stamp in a typical case but that is not
a problem because we never use the value.

...Anyway, if I'm wrong with my suspicion and it still holds that we have 
zero retrans_stamp in the substraction too, it could have something to do 
with this snippet:

static void tcp_try_to_open(struct sock *sk, int flag)
{
        struct tcp_sock *tp = tcp_sk(sk);

        tcp_verify_left_out(tp);

        if (!tp->frto_counter && tp->retrans_out == 0)
                tp->retrans_stamp = 0;

...It bit me last time when FRTO was enabled after very small modification 
(without running a full verification after the trivial looking 
modification). ...So I've worked around this clearing for FRTO as you 
can see :-).

Also, we have the another mystery to be solved, the fast retransmission is 
not triggered for some reason (or alternatively not captured in to a 
log), even in the working .9. case. It would be easy to see whether it 
works at all from TCP point of view by looking into mibs once you have 
have some transfers in a working configuration:

grep -A1 TCP /proc/net/netstat

...luckily this fast retransmit issue is less crucial as almost all people 
are pretty happy already if their RTO-based recovery works even if the 
fast recovery would not. So figuring it out can be postponed (if one has 
to prioritize) until the silent death issue is out of the way.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:29                                 ` Ilpo Järvinen
@ 2009-12-03 10:34                                   ` David Miller
  2009-12-03 10:48                                     ` Ilpo Järvinen
  2009-12-03 11:57                                   ` Damian Lukowski
                                                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2009-12-03 10:34 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: fredo, damian, netdev, asdo, eric.dumazet, herbert, gregkh

From: "Ilpo J^[$(D+#^[(Brvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 3 Dec 2009 12:29:39 +0200 (EET)

> I've added Greg as CC to make him aware of this issue in early as it now 
> affects 2.6.32 too (rather important to get dealt quickly in stable once 
> we have a tested solution since TCP is pretty broken with the silent 
> deaths this problem seems to cause). ...One possibility would be to just 
> queue the tested revert to stable and sort this thing out for 2.6.33 in 
> net-2.6.

What revert?  Zero context provided here...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:34                                   ` David Miller
@ 2009-12-03 10:48                                     ` Ilpo Järvinen
  2009-12-03 12:19                                       ` Asdo
  0 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 10:48 UTC (permalink / raw)
  To: David Miller
  Cc: fredo, damian, Netdev, asdo, eric.dumazet, Herbert Xu, Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 12326 bytes --]

On Thu, 3 Dec 2009, David Miller wrote:

> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Thu, 3 Dec 2009 12:29:39 +0200 (EET)
> 
> > I've added Greg as CC to make him aware of this issue in early as it now 
> > affects 2.6.32 too (rather important to get dealt quickly in stable once 
> > we have a tested solution since TCP is pretty broken with the silent 
> > deaths this problem seems to cause). ...One possibility would be to just 
> > queue the tested revert to stable and sort this thing out for 2.6.33 in 
> > net-2.6.
> 
> What revert?  Zero context provided here...

...I've CCed this thread to you earlier so I'd have expected such question 
only from Greg (but I know you are busy and read mails in a pseudorandom 
order so I understand :-))... It's the revert for the RTO backoff revert 
thing (its talks about reverting backoff so it's easy to get confused here 
by the terminology :-)). I've included the earlier revert patch in a 
slightly more verbose form below. ...Certainly not in a minimal one form 
with the initial rename patch but I didn't take changes while we did the 
initial testing to find the cause.

-- 
 i.

--
[PATCH] TCP: Revert new RTO backoff stuff

Reverts:

5d7892298a819743b3892df08bb496992fe85951
	RTO connection timeout: sysctl documentation update
5152fc7de3ae31b46692022ea63ce0501280f5b1
	RTO connection timeout: coding style fixes and comments
6fa12c85031485dff38ce550c24f10da23b0adaa
	Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
f1ecd5d9e7366609d640ff4040304ea197fbc618
	Revert Backoff [v3]: Revert RTO on ICMP destination unreachable
4d1a2d9ec1c17df077ed09a0d135bccf5637a3b7
	Revert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()

---
 Documentation/networking/ip-sysctl.txt |   37 ++++++---------------
 include/net/tcp.h                      |   35 ---------------------
 net/ipv4/tcp_input.c                   |    5 ++-
 net/ipv4/tcp_ipv4.c                    |   53 +++++---------------------------
 net/ipv4/tcp_timer.c                   |   13 +++-----
 5 files changed, 27 insertions(+), 116 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index fbe427a..da07602 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -311,12 +311,9 @@ tcp_no_metrics_save - BOOLEAN
 	connections.
 
 tcp_orphan_retries - INTEGER
-	This value influences the timeout of a locally closed TCP connection,
-	when RTO retransmissions remain unacknowledged.
-	See tcp_retries2 for more details.
-
-	The default value is 7.
-	If your machine is a loaded WEB server,
+	How may times to retry before killing TCP connection, closed
+	by our side. Default value 7 corresponds to ~50sec-16min
+	depending on RTO. If you machine is loaded WEB server,
 	you should think about lowering this value, such sockets
 	may consume significant resources. Cf. tcp_max_orphans.
 
@@ -330,28 +327,16 @@ tcp_retrans_collapse - BOOLEAN
 	certain TCP stacks.
 
 tcp_retries1 - INTEGER
-	This value influences the time, after which TCP decides, that
-	something is wrong due to unacknowledged RTO retransmissions,
-	and reports this suspicion to the network layer.
-	See tcp_retries2 for more details.
-
-	RFC 1122 recommends at least 3 retransmissions, which is the
-	default.
+	How many times to retry before deciding that something is wrong
+	and it is necessary to report this suspicion to network layer.
+	Minimal RFC value is 3, it is default, which corresponds
+	to ~3sec-8min depending on RTO.
 
 tcp_retries2 - INTEGER
-	This value influences the timeout of an alive TCP connection,
-	when RTO retransmissions remain unacknowledged.
-	Given a value of N, a hypothetical TCP connection following
-	exponential backoff with an initial RTO of TCP_RTO_MIN would
-	retransmit N times before killing the connection at the (N+1)th RTO.
-
-	The default value of 15 yields a hypothetical timeout of 924.6
-	seconds and is a lower bound for the effective timeout.
-	TCP will effectively time out at the first RTO which exceeds the
-	hypothetical timeout.
-
-	RFC 1122 recommends at least 100 seconds for the timeout,
-	which corresponds to a value of at least 8.
+	How may times to retry before killing alive TCP connection.
+	RFC1122 says that the limit should be longer than 100 sec.
+	It is too small number.	Default value 15 corresponds to ~13-30min
+	depending on RTO.
 
 tcp_rfc1337 - BOOLEAN
 	If set, the TCP stack behaves conforming to RFC1337. If unset,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..983367e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -469,7 +469,6 @@ extern void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 				      int nonagle);
 extern int tcp_may_send_now(struct sock *sk);
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
-extern void tcp_retransmit_timer(struct sock *sk);
 extern void tcp_xmit_retransmit_queue(struct sock *);
 extern void tcp_simple_retransmit(struct sock *);
 extern int tcp_trim_head(struct sock *, struct sk_buff *, u32);
@@ -522,17 +521,6 @@ extern int tcp_mtu_to_mss(struct sock *sk, int pmtu);
 extern int tcp_mss_to_mtu(struct sock *sk, int mss);
 extern void tcp_mtup_init(struct sock *sk);
 
-static inline void tcp_bound_rto(const struct sock *sk)
-{
-	if (inet_csk(sk)->icsk_rto > TCP_RTO_MAX)
-		inet_csk(sk)->icsk_rto = TCP_RTO_MAX;
-}
-
-static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
-{
-	return (tp->srtt >> 3) + tp->rttvar;
-}
-
 static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
 {
 	tp->pred_flags = htonl((tp->tcp_header_len << 26) |
@@ -1259,29 +1247,6 @@ static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_bu
 #define tcp_for_write_queue_from_safe(skb, tmp, sk)			\
 	skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp)
 
-/* This function calculates a "timeout" which is equivalent to the timeout of a
- * TCP connection after "boundary" unsucessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN.
- */
-static inline bool retransmits_timed_out(const struct sock *sk,
-					 unsigned int boundary)
-{
-	unsigned int timeout, linear_backoff_thresh;
-
-	if (!inet_csk(sk)->icsk_retransmits)
-		return false;
-
-	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
-
-	if (boundary <= linear_backoff_thresh)
-		timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
-	else
-		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
-			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
-
-	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
-}
-
 static inline struct sk_buff *tcp_send_head(struct sock *sk)
 {
 	return sk->sk_send_head;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..6322e62 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -685,7 +685,7 @@ static inline void tcp_set_rto(struct sock *sk)
 	 *    is invisible. Actually, Linux-2.4 also generates erratic
 	 *    ACKs in some circumstances.
 	 */
-	inet_csk(sk)->icsk_rto = __tcp_set_rto(tp);
+	inet_csk(sk)->icsk_rto = (tp->srtt >> 3) + tp->rttvar;
 
 	/* 2. Fixups made earlier cannot be right.
 	 *    If we do not estimate RTO correctly without them,
@@ -696,7 +696,8 @@ static inline void tcp_set_rto(struct sock *sk)
 	/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
 	 * guarantees that rto is higher.
 	 */
-	tcp_bound_rto(sk);
+	if (inet_csk(sk)->icsk_rto > TCP_RTO_MAX)
+		inet_csk(sk)->icsk_rto = TCP_RTO_MAX;
 }
 
 /* Save metrics learned by this TCP session.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..702ce88 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -328,29 +328,26 @@ static void do_pmtu_discovery(struct sock *sk, struct iphdr *iph, u32 mtu)
  *
  */
 
-void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
+void tcp_v4_err(struct sk_buff *skb, u32 info)
 {
-	struct iphdr *iph = (struct iphdr *)icmp_skb->data;
-	struct tcphdr *th = (struct tcphdr *)(icmp_skb->data + (iph->ihl << 2));
-	struct inet_connection_sock *icsk;
+	struct iphdr *iph = (struct iphdr *)skb->data;
+	struct tcphdr *th = (struct tcphdr *)(skb->data + (iph->ihl << 2));
 	struct tcp_sock *tp;
 	struct inet_sock *inet;
-	const int type = icmp_hdr(icmp_skb)->type;
-	const int code = icmp_hdr(icmp_skb)->code;
+	const int type = icmp_hdr(skb)->type;
+	const int code = icmp_hdr(skb)->code;
 	struct sock *sk;
-	struct sk_buff *skb;
 	__u32 seq;
-	__u32 remaining;
 	int err;
-	struct net *net = dev_net(icmp_skb->dev);
+	struct net *net = dev_net(skb->dev);
 
-	if (icmp_skb->len < (iph->ihl << 2) + 8) {
+	if (skb->len < (iph->ihl << 2) + 8) {
 		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
 		return;
 	}
 
 	sk = inet_lookup(net, &tcp_hashinfo, iph->daddr, th->dest,
-			iph->saddr, th->source, inet_iif(icmp_skb));
+			iph->saddr, th->source, inet_iif(skb));
 	if (!sk) {
 		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
 		return;
@@ -370,7 +367,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 	if (sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	icsk = inet_csk(sk);
 	tp = tcp_sk(sk);
 	seq = ntohl(th->seq);
 	if (sk->sk_state != TCP_LISTEN &&
@@ -397,39 +393,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 		}
 
 		err = icmp_err_convert[code].errno;
-		/* check if icmp_skb allows revert of backoff
-		 * (see draft-zimmermann-tcp-lcd) */
-		if (code != ICMP_NET_UNREACH && code != ICMP_HOST_UNREACH)
-			break;
-		if (seq != tp->snd_una  || !icsk->icsk_retransmits ||
-		    !icsk->icsk_backoff)
-			break;
-
-		icsk->icsk_backoff--;
-		inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
-					 icsk->icsk_backoff;
-		tcp_bound_rto(sk);
-
-		skb = tcp_write_queue_head(sk);
-		BUG_ON(!skb);
-
-		remaining = icsk->icsk_rto - min(icsk->icsk_rto,
-				tcp_time_stamp - TCP_SKB_CB(skb)->when);
-
-		if (remaining) {
-			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-						  remaining, TCP_RTO_MAX);
-		} else if (sock_owned_by_user(sk)) {
-			/* RTO revert clocked out retransmission,
-			 * but socket is locked. Will defer. */
-			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
-						  HZ/20, TCP_RTO_MAX);
-		} else {
-			/* RTO revert clocked out retransmission.
-			 * Will retransmit now */
-			tcp_retransmit_timer(sk);
-		}
-
 		break;
 	case ICMP_TIME_EXCEEDED:
 		err = EHOSTUNREACH;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index cdb2ca7..c520fb6 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -137,14 +137,13 @@ static int tcp_write_timeout(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int retry_until;
-	bool do_reset;
 
 	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
 		if (icsk->icsk_retransmits)
 			dst_negative_advice(&sk->sk_dst_cache);
 		retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
 	} else {
-		if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
+		if (icsk->icsk_retransmits >= sysctl_tcp_retries1) {
 			/* Black hole detection */
 			tcp_mtu_probing(icsk, sk);
 
@@ -156,15 +155,13 @@ static int tcp_write_timeout(struct sock *sk)
 			const int alive = (icsk->icsk_rto < TCP_RTO_MAX);
 
 			retry_until = tcp_orphan_retries(sk, alive);
-			do_reset = alive ||
-				   !retransmits_timed_out(sk, retry_until);
 
-			if (tcp_out_of_resources(sk, do_reset))
+			if (tcp_out_of_resources(sk, alive || icsk->icsk_retransmits < retry_until))
 				return 1;
 		}
 	}
 
-	if (retransmits_timed_out(sk, retry_until)) {
+	if (icsk->icsk_retransmits >= retry_until) {
 		/* Has it gone just too far? */
 		tcp_write_err(sk);
 		return 1;
@@ -282,7 +279,7 @@ static void tcp_probe_timer(struct sock *sk)
  *	The TCP retransmit timer.
  */
 
-void tcp_retransmit_timer(struct sock *sk)
+static void tcp_retransmit_timer(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -388,7 +385,7 @@ void tcp_retransmit_timer(struct sock *sk)
 out_reset_timer:
 	icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
-	if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
+	if (icsk->icsk_retransmits > sysctl_tcp_retries1)
 		__sk_dst_reset(sk);
 
 out:;
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:29                                 ` Ilpo Järvinen
  2009-12-03 10:34                                   ` David Miller
@ 2009-12-03 11:57                                   ` Damian Lukowski
  2009-12-03 12:19                                     ` Damian Lukowski
  2009-12-03 12:49                                     ` Ilpo Järvinen
  2009-12-03 12:11                                   ` Frederic Leroy
                                                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 63+ messages in thread
From: Damian Lukowski @ 2009-12-03 11:57 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Ilpo Järvinen schrieb:
> I've added Greg as CC to make him aware of this issue in early as it now 
> affects 2.6.32 too (rather important to get dealt quickly in stable once 
> we have a tested solution since TCP is pretty broken with the silent 
> deaths this problem seems to cause). ...One possibility would be to just 
> queue the tested revert to stable and sort this thing out for 2.6.33 in 
> net-2.6.
> 
> Opinions, Dave?, Greg?
> 
> Now back to the issue...
> 
> You said in the other mail that "All further test are on linus-stable 
> tree.", which has this contradiction that Linus does not maintain stable 
> trees. Which exactly was the tree used for the .9. test, Linus' tree or 
> the 2.6.31 stable tree? I suppose the former since the revert wouldn't 
> apply to 2.6.31 so I just want to confirm.
> 
> 
> On Thu, 3 Dec 2009, Frederic Leroy wrote:
>> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
>>> could you please printk retrans_stamp just before the return in 
>>> include/net/tcp.h:retransmits_timed_out()?
>>> If the value is not monotonically increasing but is reset to 0 at some
>>> point, this might lead to problems in tcp_write_timeout().
>>> It's the only idea I have now.
>> Your idea is good.
>> Only one out of 4 value is not null.
>>
>> Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10
>>
>> I make 2 attempts. Printk corresponding to .10 are those after the line 
>> "wlan1 enter promiscuous mode"
> 
> Nice thinking indeed Damian, thanks. ...But but, where exactly did you 
> print? ...There are multiple returns and the return false branch is 
> expected to have a zero retrans_stamp in a typical case but that is not
> a problem because we never use the value.

Yes, it's the retrans_stamp in the subtraction I suspected to be 0.
I also suspect this to happen only in the ca_state < CA_Loss case,
so one first solution might be to return true whenever retrans_stamp == 0.
Unluckily, I still cannot reproduce the scp stalls here, so it would be nice
if Frederic printed retrans_stamp together with icsk_ca_state and
icsk_retransmits, please.

Damian

> ...Anyway, if I'm wrong with my suspicion and it still holds that we have 
> zero retrans_stamp in the substraction too, it could have something to do 
> with this snippet:
> 
> static void tcp_try_to_open(struct sock *sk, int flag)
> {
>         struct tcp_sock *tp = tcp_sk(sk);
> 
>         tcp_verify_left_out(tp);
> 
>         if (!tp->frto_counter && tp->retrans_out == 0)
>                 tp->retrans_stamp = 0;
> 
> ...It bit me last time when FRTO was enabled after very small modification 
> (without running a full verification after the trivial looking 
> modification). ...So I've worked around this clearing for FRTO as you 
> can see :-).
> 
> 
> Also, we have the another mystery to be solved, the fast retransmission is 
> not triggered for some reason (or alternatively not captured in to a 
> log), even in the working .9. case. It would be easy to see whether it 
> works at all from TCP point of view by looking into mibs once you have 
> have some transfers in a working configuration:
> 
> grep -A1 TCP /proc/net/netstat
> 
> ...luckily this fast retransmit issue is less crucial as almost all people 
> are pretty happy already if their RTO-based recovery works even if the 
> fast recovery would not. So figuring it out can be postponed (if one has 
> to prioritize) until the silent death issue is out of the way.
> 
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:29                                 ` Ilpo Järvinen
  2009-12-03 10:34                                   ` David Miller
  2009-12-03 11:57                                   ` Damian Lukowski
@ 2009-12-03 12:11                                   ` Frederic Leroy
  2009-12-03 12:44                                     ` Ilpo Järvinen
  2009-12-03 13:37                                     ` Arnd Hannemann
  2009-12-03 14:27                                   ` Arnd Hannemann
  2009-12-03 18:32                                   ` Greg KH
  4 siblings, 2 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-12-03 12:11 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Damian Lukowski, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Le Thu, 3 Dec 2009 12:29:39 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> Opinions, Dave?, Greg?
> 
> Now back to the issue...
> 
> You said in the other mail that "All further test are on linus-stable 
> tree.", which has this contradiction that Linus does not maintain
> stable trees. Which exactly was the tree used for the .9. test

Sorry I'm confused and so confuse you.
For .9 .10 and now I'm only using : 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
 
> Linus' tree or the 2.6.31 stable tree? I suppose the former since the
> revert wouldn't apply to 2.6.31 so I just want to confirm.

I didn't keep the source of the old 2.6.31 kernel I have. 
So it's either 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
or
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6-stable.git

> Nice thinking indeed Damian, thanks. ...But but, where exactly did
> you print? ...There are multiple returns and the return false branch
> is expected to have a zero retrans_stamp in a typical case but that
> is not a problem because we never use the value.

Here is the code :
http://www.starox.org/pub/scp_stall/printk_retrans_stamp.patch

> ...Anyway, if I'm wrong with my suspicion and it still holds that we
> have zero retrans_stamp in the substraction too, it could have
> something to do with this snippet:
> 
> static void tcp_try_to_open(struct sock *sk, int flag)
> {
>         struct tcp_sock *tp = tcp_sk(sk);
> 
>         tcp_verify_left_out(tp);
> 
>         if (!tp->frto_counter && tp->retrans_out == 0)
>                 tp->retrans_stamp = 0;
> 
> ...It bit me last time when FRTO was enabled after very small
> modification (without running a full verification after the trivial
> looking modification). ...So I've worked around this clearing for
> FRTO as you can see :-).

:)

> Also, we have the another mystery to be solved, the fast
> retransmission is not triggered for some reason (or alternatively not
> captured in to a log), even in the working .9. case. It would be easy
> to see whether it works at all from TCP point of view by looking into
> mibs once you have have some transfers in a working configuration:
> 
> grep -A1 TCP /proc/net/netstat

I will try this evening. I can do test only outside office hours.

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:48                                     ` Ilpo Järvinen
@ 2009-12-03 12:19                                       ` Asdo
  0 siblings, 0 replies; 63+ messages in thread
From: Asdo @ 2009-12-03 12:19 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, fredo, damian, Netdev, eric.dumazet, Herbert Xu,
	Greg KH

To people in: "scp stalls mysteriously":
Hi all,
I am quite sure as of now, having followed the discussion and having 
tested linux to linux transfers which didn't stall for me, that this is 
not my bug :-/
You can remove me from the CC as I cannot provide any useful info to this.
Good luck...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 11:57                                   ` Damian Lukowski
@ 2009-12-03 12:19                                     ` Damian Lukowski
  2009-12-03 12:49                                     ` Ilpo Järvinen
  1 sibling, 0 replies; 63+ messages in thread
From: Damian Lukowski @ 2009-12-03 12:19 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Damian Lukowski schrieb:
> Ilpo Järvinen schrieb:
>> I've added Greg as CC to make him aware of this issue in early as it now 
>> affects 2.6.32 too (rather important to get dealt quickly in stable once 
>> we have a tested solution since TCP is pretty broken with the silent 
>> deaths this problem seems to cause). ...One possibility would be to just 
>> queue the tested revert to stable and sort this thing out for 2.6.33 in 
>> net-2.6.
>>
>> Opinions, Dave?, Greg?
>>
>> Now back to the issue...
>>
>> You said in the other mail that "All further test are on linus-stable 
>> tree.", which has this contradiction that Linus does not maintain stable 
>> trees. Which exactly was the tree used for the .9. test, Linus' tree or 
>> the 2.6.31 stable tree? I suppose the former since the revert wouldn't 
>> apply to 2.6.31 so I just want to confirm.
>>
>>
>> On Thu, 3 Dec 2009, Frederic Leroy wrote:
>>> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
>>>> could you please printk retrans_stamp just before the return in 
>>>> include/net/tcp.h:retransmits_timed_out()?
>>>> If the value is not monotonically increasing but is reset to 0 at some
>>>> point, this might lead to problems in tcp_write_timeout().
>>>> It's the only idea I have now.
>>> Your idea is good.
>>> Only one out of 4 value is not null.
>>>
>>> Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10
>>>
>>> I make 2 attempts. Printk corresponding to .10 are those after the line 
>>> "wlan1 enter promiscuous mode"
>> Nice thinking indeed Damian, thanks. ...But but, where exactly did you 
>> print? ...There are multiple returns and the return false branch is 
>> expected to have a zero retrans_stamp in a typical case but that is not
>> a problem because we never use the value.
> 
> Yes, it's the retrans_stamp in the subtraction I suspected to be 0.
> I also suspect this to happen only in the ca_state < CA_Loss case,
> so one first solution might be to return true whenever retrans_stamp == 0.

return false, of course.

> Unluckily, I still cannot reproduce the scp stalls here, so it would be nice
> if Frederic printed retrans_stamp together with icsk_ca_state and
> icsk_retransmits, please.
> 
> Damian
> 
>> ...Anyway, if I'm wrong with my suspicion and it still holds that we have 
>> zero retrans_stamp in the substraction too, it could have something to do 
>> with this snippet:
>>
>> static void tcp_try_to_open(struct sock *sk, int flag)
>> {
>>         struct tcp_sock *tp = tcp_sk(sk);
>>
>>         tcp_verify_left_out(tp);
>>
>>         if (!tp->frto_counter && tp->retrans_out == 0)
>>                 tp->retrans_stamp = 0;
>>
>> ...It bit me last time when FRTO was enabled after very small modification 
>> (without running a full verification after the trivial looking 
>> modification). ...So I've worked around this clearing for FRTO as you 
>> can see :-).
>>
>>
>> Also, we have the another mystery to be solved, the fast retransmission is 
>> not triggered for some reason (or alternatively not captured in to a 
>> log), even in the working .9. case. It would be easy to see whether it 
>> works at all from TCP point of view by looking into mibs once you have 
>> have some transfers in a working configuration:
>>
>> grep -A1 TCP /proc/net/netstat
>>
>> ...luckily this fast retransmit issue is less crucial as almost all people 
>> are pretty happy already if their RTO-based recovery works even if the 
>> fast recovery would not. So figuring it out can be postponed (if one has 
>> to prioritize) until the silent death issue is out of the way.
>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 12:11                                   ` Frederic Leroy
@ 2009-12-03 12:44                                     ` Ilpo Järvinen
  2009-12-03 13:37                                     ` Arnd Hannemann
  1 sibling, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 12:44 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1549 bytes --]

On Thu, 3 Dec 2009, Frederic Leroy wrote:

> Le Thu, 3 Dec 2009 12:29:39 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > Opinions, Dave?, Greg?
> > 
> > Now back to the issue...
> > 
> > You said in the other mail that "All further test are on linus-stable 
> > tree.", which has this contradiction that Linus does not maintain
> > stable trees. Which exactly was the tree used for the .9. test
> 
> Sorry I'm confused and so confuse you.
> For .9 .10 and now I'm only using : 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

Thanks for the confirmation.

> > Nice thinking indeed Damian, thanks. ...But but, where exactly did
> > you print? ...There are multiple returns and the return false branch
> > is expected to have a zero retrans_stamp in a typical case but that
> > is not a problem because we never use the value.
> 
> Here is the code :
> http://www.starox.org/pub/scp_stall/printk_retrans_stamp.patch

So I was wrong.

> > Also, we have the another mystery to be solved, the fast
> > retransmission is not triggered for some reason (or alternatively not
> > captured in to a log), even in the working .9. case. It would be easy
> > to see whether it works at all from TCP point of view by looking into
> > mibs once you have have some transfers in a working configuration:
> > 
> > grep -A1 TCP /proc/net/netstat
> 
> I will try this evening. I can do test only outside office hours.

Ok. Please take before test and after test capture to allow easy 
compare of what changed.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 11:57                                   ` Damian Lukowski
  2009-12-03 12:19                                     ` Damian Lukowski
@ 2009-12-03 12:49                                     ` Ilpo Järvinen
  2009-12-03 14:10                                       ` Damian Lukowski
  1 sibling, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 12:49 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Frederic Leroy, Netdev, Asdo, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

On Thu, 3 Dec 2009, Damian Lukowski wrote:

> > On Thu, 3 Dec 2009, Frederic Leroy wrote:
> >> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
> >>> could you please printk retrans_stamp just before the return in 
> >>> include/net/tcp.h:retransmits_timed_out()?
> >>> If the value is not monotonically increasing but is reset to 0 at some
> >>> point, this might lead to problems in tcp_write_timeout().
> >>> It's the only idea I have now.
> >> Your idea is good.
> >> Only one out of 4 value is not null.
> >>
> >> Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10
> >>
> >> I make 2 attempts. Printk corresponding to .10 are those after the line 
> >> "wlan1 enter promiscuous mode"
> > 
> > Nice thinking indeed Damian, thanks. ...But but, where exactly did you 
> > print? ...There are multiple returns and the return false branch is 
> > expected to have a zero retrans_stamp in a typical case but that is not
> > a problem because we never use the value.
> 
> Yes, it's the retrans_stamp in the subtraction I suspected to be 0.
> I also suspect this to happen only in the ca_state < CA_Loss case,
> so one first solution might be to return true whenever retrans_stamp == 0.

I suppose adding || !tp->retrans_stamp into the false condition is fine 
as long as we don't then have a connection that can cause a connection 
to hang there forever for some reason (this needs to be understood well 
enough, not just test driven in stables :-)).

> Unluckily, I still cannot reproduce the scp stalls here, so it would be nice
> if Frederic printed retrans_stamp together with icsk_ca_state and
> icsk_retransmits, please.

It wouldn't hurt to know tp->packets_out and tp->retrans_out too, that 
might have some significant w.r.t what happens because of FRTO.


-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 12:11                                   ` Frederic Leroy
  2009-12-03 12:44                                     ` Ilpo Järvinen
@ 2009-12-03 13:37                                     ` Arnd Hannemann
  1 sibling, 0 replies; 63+ messages in thread
From: Arnd Hannemann @ 2009-12-03 13:37 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Ilpo Järvinen, Damian Lukowski, Netdev, Asdo, David Miller,
	Eric Dumazet, Herbert Xu, Greg KH

Frederic Leroy wrote:
> Le Thu, 3 Dec 2009 12:29:39 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
>> Opinions, Dave?, Greg?
>>
>> Now back to the issue...
>>
>> You said in the other mail that "All further test are on linus-stable 
>> tree.", which has this contradiction that Linus does not maintain
>> stable trees. Which exactly was the tree used for the .9. test
> 
> Sorry I'm confused and so confuse you.
> For .9 .10 and now I'm only using : 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>  
>> Linus' tree or the 2.6.31 stable tree? I suppose the former since the
>> revert wouldn't apply to 2.6.31 so I just want to confirm.
> 
> I didn't keep the source of the old 2.6.31 kernel I have. 
> So it's either 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> or
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6-stable.git
> 
>> Nice thinking indeed Damian, thanks. ...But but, where exactly did
>> you print? ...There are multiple returns and the return false branch
>> is expected to have a zero retrans_stamp in a typical case but that
>> is not a problem because we never use the value.
> 
> Here is the code :
> http://www.starox.org/pub/scp_stall/printk_retrans_stamp.patch
> 
>> ...Anyway, if I'm wrong with my suspicion and it still holds that we
>> have zero retrans_stamp in the substraction too, it could have
>> something to do with this snippet:
>>
>> static void tcp_try_to_open(struct sock *sk, int flag)
>> {
>>         struct tcp_sock *tp = tcp_sk(sk);
>>
>>         tcp_verify_left_out(tp);
>>
>>         if (!tp->frto_counter && tp->retrans_out == 0)
>>                 tp->retrans_stamp = 0;
>>
>> ...It bit me last time when FRTO was enabled after very small
>> modification (without running a full verification after the trivial
>> looking modification). ...So I've worked around this clearing for
>> FRTO as you can see :-).
> 
> :)
> 
>> Also, we have the another mystery to be solved, the fast
>> retransmission is not triggered for some reason (or alternatively not
>> captured in to a log), even in the working .9. case. It would be easy
>> to see whether it works at all from TCP point of view by looking into
>> mibs once you have have some transfers in a working configuration:
>>
>> grep -A1 TCP /proc/net/netstat
> 
> I will try this evening. I can do test only outside office hours.

If you don't mind, could you also post the output of
"sysctl -a | grep net.ipv4.tcp", please.
The tars you posted (proc_net_tcp.tbz2) seem to be empty.

Thanks.

Best regards,
Arnd

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 12:49                                     ` Ilpo Järvinen
@ 2009-12-03 14:10                                       ` Damian Lukowski
  2009-12-03 19:23                                         ` Frederic Leroy
  0 siblings, 1 reply; 63+ messages in thread
From: Damian Lukowski @ 2009-12-03 14:10 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

Ilpo Järvinen schrieb:
> On Thu, 3 Dec 2009, Damian Lukowski wrote:
> 
>>> On Thu, 3 Dec 2009, Frederic Leroy wrote:
>>>> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
>>>>> could you please printk retrans_stamp just before the return in 
>>>>> include/net/tcp.h:retransmits_timed_out()?
>>>>> If the value is not monotonically increasing but is reset to 0 at some
>>>>> point, this might lead to problems in tcp_write_timeout().
>>>>> It's the only idea I have now.
>>>> Your idea is good.
>>>> Only one out of 4 value is not null.
>>>>
>>>> Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10
>>>>
>>>> I make 2 attempts. Printk corresponding to .10 are those after the line 
>>>> "wlan1 enter promiscuous mode"
>>> Nice thinking indeed Damian, thanks. ...But but, where exactly did you 
>>> print? ...There are multiple returns and the return false branch is 
>>> expected to have a zero retrans_stamp in a typical case but that is not
>>> a problem because we never use the value.
>> Yes, it's the retrans_stamp in the subtraction I suspected to be 0.
>> I also suspect this to happen only in the ca_state < CA_Loss case,
>> so one first solution might be to return true whenever retrans_stamp == 0.
> 
> I suppose adding || !tp->retrans_stamp into the false condition is fine 
> as long as we don't then have a connection that can cause a connection 
> to hang there forever for some reason (this needs to be understood well 
> enough, not just test driven in stables :-)).
> 
>> Unluckily, I still cannot reproduce the scp stalls here, so it would be nice
>> if Frederic printed retrans_stamp together with icsk_ca_state and
>> icsk_retransmits, please.
> 
> It wouldn't hurt to know tp->packets_out and tp->retrans_out too, that 
> might have some significant w.r.t what happens because of FRTO.

I made a patch for Frederic with Codebase
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

Thanks for testing.


diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..c170948 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1268,7 +1268,8 @@ static inline bool retransmits_timed_out(const struct sock *sk,
 {
 	unsigned int timeout, linear_backoff_thresh;
 
-	if (!inet_csk(sk)->icsk_retransmits)
+	if (!inet_csk(sk)->icsk_retransmits ||
+	    !tcp_sk(sk)->retrans_stamp)
 		return false;
 
 	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
@@ -1279,6 +1280,11 @@ static inline bool retransmits_timed_out(const struct sock *sk,
 		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
 			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 
+	printk("stamp, rstamp, retrans, ca, p_out, retr_out: "
+	       "%u, %u, %u, %u, %u, %u\n", tcp_time_stamp,
+	       tcp_sk(sk)->retrans_stamp, inet_csk(sk)->icsk_retransmits,
+	       inet_csk(sk)->icsk_ca_state, tcp_sk(sk)->packets_out,
+	       tcp_sk(sk)->retrans_out);
 	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
 }
 
-- 
1.6.4.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:29                                 ` Ilpo Järvinen
                                                     ` (2 preceding siblings ...)
  2009-12-03 12:11                                   ` Frederic Leroy
@ 2009-12-03 14:27                                   ` Arnd Hannemann
  2009-12-03 14:36                                     ` Ilpo Järvinen
  2009-12-03 18:32                                   ` Greg KH
  4 siblings, 1 reply; 63+ messages in thread
From: Arnd Hannemann @ 2009-12-03 14:27 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Damian Lukowski, Netdev, David Miller,
	Eric Dumazet, Herbert Xu, Greg KH

Ilpo Järvinen wrote:

[snipped]

> Also, we have the another mystery to be solved, the fast retransmission is 
> not triggered for some reason (or alternatively not captured in to a 
> log), even in the working .9. case. It would be easy to see whether it 
> works at all from TCP point of view by looking into mibs once you have 
> have some transfers in a working configuration:
> 
> grep -A1 TCP /proc/net/netstat
> 
> ...luckily this fast retransmit issue is less crucial as almost all people 
> are pretty happy already if their RTO-based recovery works even if the 
> fast recovery would not. So figuring it out can be postponed (if one has 
> to prioritize) until the silent death issue is out of the way.
> 
> 

I looked at the working .9 case stream from 192.168.1.15 to 192.168.1.19.
I don't think it is a mystery that fast retransmit does not trigger.
The condition SACKED_DATA > 3* SMSS is simply not fulfilled.
Neither are there 3 non-continuous SACK sequences.
The segments sent are too small :-(
Interesting though, seems to me in this case non-SACK would be better than SACK.
Or did I miss something?

Hey we could cook up a draft for this problem ;-)

Anyway, real problem is, RTO does not trigger...

Best regards,
Arnd

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 14:27                                   ` Arnd Hannemann
@ 2009-12-03 14:36                                     ` Ilpo Järvinen
  2009-12-03 15:34                                       ` Arnd Hannemann
  0 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 14:36 UTC (permalink / raw)
  To: Arnd Hannemann
  Cc: Frederic Leroy, Damian Lukowski, Netdev, David Miller,
	Eric Dumazet, Herbert Xu, Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1797 bytes --]

On Thu, 3 Dec 2009, Arnd Hannemann wrote:

> Ilpo Järvinen wrote:
> 
> [snipped]
> 
> > Also, we have the another mystery to be solved, the fast retransmission is 
> > not triggered for some reason (or alternatively not captured in to a 
> > log), even in the working .9. case. It would be easy to see whether it 
> > works at all from TCP point of view by looking into mibs once you have 
> > have some transfers in a working configuration:
> > 
> > grep -A1 TCP /proc/net/netstat
> > 
> > ...luckily this fast retransmit issue is less crucial as almost all people 
> > are pretty happy already if their RTO-based recovery works even if the 
> > fast recovery would not. So figuring it out can be postponed (if one has 
> > to prioritize) until the silent death issue is out of the way.
> > 
> > 
> 
> I looked at the working .9 case stream from 192.168.1.15 to 192.168.1.19.
> I don't think it is a mystery that fast retransmit does not trigger.
> The condition SACKED_DATA > 3* SMSS is simply not fulfilled.
> Neither are there 3 non-continuous SACK sequences.
> The segments sent are too small :-(
> Interesting though, seems to me in this case non-SACK would be better than SACK.
> Or did I miss something?

Yes, a particularly big one, linux does not count SACKs bytes but packets. 
In the first recovery, plenty of packets are SACKed:

    135 sack 1 {2598:2646}>
    108 sack 1 {2598:2694}>
    121 sack 1 {2598:2742}>
     95 sack 1 {2598:2790}>
    426 sack 1 {2598:2838}>

fackets_out should be 6 now which is way more than 3 which is the 
default tp->reordering.

> Hey we could cook up a draft for this problem ;-)
>
> Anyway, real problem is, RTO does not trigger...

There are two problems. ...Both are real. ;-) But significance of the 
other is much worse than the other.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 14:36                                     ` Ilpo Järvinen
@ 2009-12-03 15:34                                       ` Arnd Hannemann
  2009-12-03 15:48                                         ` Ilpo Järvinen
  0 siblings, 1 reply; 63+ messages in thread
From: Arnd Hannemann @ 2009-12-03 15:34 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Damian Lukowski, Netdev, David Miller,
	Eric Dumazet, Herbert Xu, Greg KH

Ilpo Järvinen wrote:
> On Thu, 3 Dec 2009, Arnd Hannemann wrote:
> 
>> Ilpo Järvinen wrote:
>>
>> [snipped]
>>
>>> Also, we have the another mystery to be solved, the fast retransmission is 
>>> not triggered for some reason (or alternatively not captured in to a 
>>> log), even in the working .9. case. It would be easy to see whether it 
>>> works at all from TCP point of view by looking into mibs once you have 
>>> have some transfers in a working configuration:
>>>
>>> grep -A1 TCP /proc/net/netstat
>>>
>>> ...luckily this fast retransmit issue is less crucial as almost all people 
>>> are pretty happy already if their RTO-based recovery works even if the 
>>> fast recovery would not. So figuring it out can be postponed (if one has 
>>> to prioritize) until the silent death issue is out of the way.
>>>
>>>
>> I looked at the working .9 case stream from 192.168.1.15 to 192.168.1.19.
>> I don't think it is a mystery that fast retransmit does not trigger.
>> The condition SACKED_DATA > 3* SMSS is simply not fulfilled.
>> Neither are there 3 non-continuous SACK sequences.
>> The segments sent are too small :-(
>> Interesting though, seems to me in this case non-SACK would be better than SACK.
>> Or did I miss something?
> 
> Yes, a particularly big one, linux does not count SACKs bytes but packets. 
> In the first recovery, plenty of packets are SACKed:
> 
>     135 sack 1 {2598:2646}>
>     108 sack 1 {2598:2694}>
>     121 sack 1 {2598:2742}>
>      95 sack 1 {2598:2790}>
>     426 sack 1 {2598:2838}>
> 
> fackets_out should be 6 now which is way more than 3 which is the 
> default tp->reordering.

Ok, you probable know better than me.
But, aren't the SKBs collapsed to SMSS size segments and then
counted? I thought so.
The 3*SMSS restriction is from RFC 3517, but of course you know.

> 
>> Hey we could cook up a draft for this problem ;-)
>>
>> Anyway, real problem is, RTO does not trigger...
> 
> There are two problems. ...Both are real. ;-) But significance of the 
> other is much worse than the other.

I agree.
I'm already trying to get scp stalling, but no luck so far. Neither with
artificially dropping packets, nor using WLAN :-(


Best regards,
Arnd

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 15:34                                       ` Arnd Hannemann
@ 2009-12-03 15:48                                         ` Ilpo Järvinen
  0 siblings, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 15:48 UTC (permalink / raw)
  To: Arnd Hannemann
  Cc: Frederic Leroy, Damian Lukowski, Netdev, David Miller,
	Eric Dumazet, Herbert Xu, Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3379 bytes --]

On Thu, 3 Dec 2009, Arnd Hannemann wrote:

> Ilpo Järvinen wrote:
> > On Thu, 3 Dec 2009, Arnd Hannemann wrote:
> > 
> >> Ilpo Järvinen wrote:
> >>
> >> [snipped]
> >>
> >>> Also, we have the another mystery to be solved, the fast retransmission is 
> >>> not triggered for some reason (or alternatively not captured in to a 
> >>> log), even in the working .9. case. It would be easy to see whether it 
> >>> works at all from TCP point of view by looking into mibs once you have 
> >>> have some transfers in a working configuration:
> >>>
> >>> grep -A1 TCP /proc/net/netstat
> >>>
> >>> ...luckily this fast retransmit issue is less crucial as almost all people 
> >>> are pretty happy already if their RTO-based recovery works even if the 
> >>> fast recovery would not. So figuring it out can be postponed (if one has 
> >>> to prioritize) until the silent death issue is out of the way.
> >>>
> >>>
> >> I looked at the working .9 case stream from 192.168.1.15 to 192.168.1.19.
> >> I don't think it is a mystery that fast retransmit does not trigger.
> >> The condition SACKED_DATA > 3* SMSS is simply not fulfilled.
> >> Neither are there 3 non-continuous SACK sequences.
> >> The segments sent are too small :-(
> >> Interesting though, seems to me in this case non-SACK would be better than SACK.
> >> Or did I miss something?
> > 
> > Yes, a particularly big one, linux does not count SACKs bytes but packets. 
> > In the first recovery, plenty of packets are SACKed:
> > 
> >     135 sack 1 {2598:2646}>
> >     108 sack 1 {2598:2694}>
> >     121 sack 1 {2598:2742}>
> >      95 sack 1 {2598:2790}>
> >     426 sack 1 {2598:2838}>
> > 
> > fackets_out should be 6 now which is way more than 3 which is the 
> > default tp->reordering.
> 
> Ok, you probable know better than me.
> But, aren't the SKBs collapsed to SMSS size segments and then
> counted? I thought so.
> The 3*SMSS restriction is from RFC 3517, but of course you know.

On the sender side (for SACKed skbs) we should retrain the segment 
counter still for the collapsed skb (at least in SACK code this was my 
intention but it could be that there is something wrong in that area).
Besides, I think I've seen the fast rexmit missing with "sack 3" (ie., 
three holes) case too so that would point out into some other bug.

Btw, we can potentially go well beyond MSS sized collapse too for the 
sacked skbs as long as there is room in sg frags. It's a different store 
for the rexmits though but that "collapse" is not significant here I 
think.

> >> Hey we could cook up a draft for this problem ;-)
> >>
> >> Anyway, real problem is, RTO does not trigger...
> > 
> > There are two problems. ...Both are real. ;-) But significance of the 
> > other is much worse than the other.
> 
> I agree.
> I'm already trying to get scp stalling, but no luck so far. Neither with
> artificially dropping packets, nor using WLAN :-(

I got it to happen but sadly scp stalled because of another issue related 
to rtt bloat (check this thread in archive if you're interested). I think 
that might need some clarification for 1323bis too but I'm currently 
thinking it through before giving my input/feedback on that on tcpm.

Are you sure you drop for the right direction, ie., for the ACK/scp flow 
control direction which sends those small packets? Data direction losses 
seem somewhat insignificant here.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 10:29                                 ` Ilpo Järvinen
                                                     ` (3 preceding siblings ...)
  2009-12-03 14:27                                   ` Arnd Hannemann
@ 2009-12-03 18:32                                   ` Greg KH
  2009-12-03 21:37                                     ` David Miller
  4 siblings, 1 reply; 63+ messages in thread
From: Greg KH @ 2009-12-03 18:32 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Damian Lukowski, Netdev, Asdo, David Miller,
	Eric Dumazet, Herbert Xu

On Thu, Dec 03, 2009 at 12:29:39PM +0200, Ilpo Järvinen wrote:
> I've added Greg as CC to make him aware of this issue in early as it now 
> affects 2.6.32 too (rather important to get dealt quickly in stable once 
> we have a tested solution since TCP is pretty broken with the silent 
> deaths this problem seems to cause). ...One possibility would be to just 
> queue the tested revert to stable and sort this thing out for 2.6.33 in 
> net-2.6.
> 
> Opinions, Dave?, Greg?

As always, if you have a patch that you want in the -stable tree, send
it, with the git commit id of the patch that is in Linus's tree, to the
stable@kernel.org email address so I don't loose it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 14:10                                       ` Damian Lukowski
@ 2009-12-03 19:23                                         ` Frederic Leroy
  2009-12-03 20:34                                           ` Damian Lukowski
                                                             ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-12-03 19:23 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Ilpo Järvinen, Netdev, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Le Thu, 03 Dec 2009 15:10:11 +0100,
Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> > I suppose adding || !tp->retrans_stamp into the false condition is
> > fine as long as we don't then have a connection that can cause a
> > connection to hang there forever for some reason (this needs to be
> > understood well enough, not just test driven in stables :-)).
> > 
> >> Unluckily, I still cannot reproduce the scp stalls here, so it
> >> would be nice if Frederic printed retrans_stamp together with
> >> icsk_ca_state and icsk_retransmits, please.
> > 
> > It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> > that might have some significant w.r.t what happens because of FRTO.
> 
> I made a patch for Frederic with Codebase
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> 
> Thanks for testing.

I made a new .11 trace with damian patch.
The copy went to the end. 

http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.pcap.bz2
http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.dmesg.bz2
http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-before.bz2
http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-after.bz2
http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.sysctl_net.ipv4.tcp-before.bz2

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 19:23                                         ` Frederic Leroy
@ 2009-12-03 20:34                                           ` Damian Lukowski
  2009-12-03 22:03                                             ` Frederic Leroy
  2009-12-04  9:36                                             ` Frederic Leroy
  2009-12-03 20:36                                           ` Ilpo Järvinen
  2009-12-05 22:32                                           ` Damian Lukowski
  2 siblings, 2 replies; 63+ messages in thread
From: Damian Lukowski @ 2009-12-03 20:34 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Ilpo Järvinen, Netdev, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Frederic Leroy schrieb:
> Le Thu, 03 Dec 2009 15:10:11 +0100,
> Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
>>> I suppose adding || !tp->retrans_stamp into the false condition is
>>> fine as long as we don't then have a connection that can cause a
>>> connection to hang there forever for some reason (this needs to be
>>> understood well enough, not just test driven in stables :-)).
>>>
>>>> Unluckily, I still cannot reproduce the scp stalls here, so it
>>>> would be nice if Frederic printed retrans_stamp together with
>>>> icsk_ca_state and icsk_retransmits, please.
>>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
>>> that might have some significant w.r.t what happens because of FRTO.
>> I made a patch for Frederic with Codebase
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>>
>> Thanks for testing.
> 
> I made a new .11 trace with damian patch.
> The copy went to the end. 

Ok, at least this "fix" seems to work at first glance, but the printk
is quite useless now. Could you run another test with the printk's but
without the retrans_stamp == 0 check, please?

Thanks.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 19:23                                         ` Frederic Leroy
  2009-12-03 20:34                                           ` Damian Lukowski
@ 2009-12-03 20:36                                           ` Ilpo Järvinen
  2009-12-03 21:37                                             ` Frederic Leroy
  2009-12-05 22:32                                           ` Damian Lukowski
  2 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-03 20:36 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2172 bytes --]

On Thu, 3 Dec 2009, Frederic Leroy wrote:

> Le Thu, 03 Dec 2009 15:10:11 +0100,
> Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> > > I suppose adding || !tp->retrans_stamp into the false condition is
> > > fine as long as we don't then have a connection that can cause a
> > > connection to hang there forever for some reason (this needs to be
> > > understood well enough, not just test driven in stables :-)).
> > > 
> > >> Unluckily, I still cannot reproduce the scp stalls here, so it
> > >> would be nice if Frederic printed retrans_stamp together with
> > >> icsk_ca_state and icsk_retransmits, please.
> > > 
> > > It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> > > that might have some significant w.r.t what happens because of FRTO.
> > 
> > I made a patch for Frederic with Codebase
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> > 
> > Thanks for testing.
> 
> I made a new .11 trace with damian patch.
> The copy went to the end. 
> 
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.pcap.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.dmesg.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-before.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-after.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.sysctl_net.ipv4.tcp-before.bz2

So the || !tp->retrans_stamp certainly solves the silent death, however, 
annoying thing is that now we didn't get get the zero events logged :-).
Can you please print them also in that branch. Also, I forgot previously 
that tp->frto_counter would be nice to know. And since there seems to be a 
need for yet another case, in order to solve the other problem related to 
fast retransmit, also tp->lost_out and tp->sacked_out would be nice to
know.

MIBs tell:
...24 SACK recoveries started but never any retransmission in them...
...DSACK undo counter is probably miscounting, 28 is way too much to be 
realistic for those two DSACKs received :-). It might relate to this not 
retransmitting anything.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 18:32                                   ` Greg KH
@ 2009-12-03 21:37                                     ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2009-12-03 21:37 UTC (permalink / raw)
  To: gregkh; +Cc: ilpo.jarvinen, fredo, damian, netdev, asdo, eric.dumazet, herbert

From: Greg KH <gregkh@suse.de>
Date: Thu, 3 Dec 2009 10:32:07 -0800

> On Thu, Dec 03, 2009 at 12:29:39PM +0200, Ilpo J^[$(D+#^[(Brvinen wrote:
>> I've added Greg as CC to make him aware of this issue in early as it now 
>> affects 2.6.32 too (rather important to get dealt quickly in stable once 
>> we have a tested solution since TCP is pretty broken with the silent 
>> deaths this problem seems to cause). ...One possibility would be to just 
>> queue the tested revert to stable and sort this thing out for 2.6.33 in 
>> net-2.6.
>> 
>> Opinions, Dave?, Greg?
> 
> As always, if you have a patch that you want in the -stable tree, send
> it, with the git commit id of the patch that is in Linus's tree, to the
> stable@kernel.org email address so I don't loose it.

Yep, we'll do that once we track this down and make a final decision
on how to fix it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 20:36                                           ` Ilpo Järvinen
@ 2009-12-03 21:37                                             ` Frederic Leroy
  0 siblings, 0 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-12-03 21:37 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

Le Thu, 3 Dec 2009 22:36:53 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> however, annoying thing is that now we didn't get get the zero events
> logged :-). Can you please print them also in that branch. Also, I
> forgot previously that tp->frto_counter would be nice to know. And
> since there seems to be a need for yet another case, in order to
> solve the other problem related to fast retransmit, also tp->lost_out
> and tp->sacked_out would be nice to know.

.12 is online, including patch

I added the fields and moved the printk at the beginning of
retransmits_timed_out as sk is not modified in the function.

I'm preparing one test without the "|| !tp->retrans_stamp"

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 20:34                                           ` Damian Lukowski
@ 2009-12-03 22:03                                             ` Frederic Leroy
  2009-12-04 10:41                                               ` Ilpo Järvinen
  2009-12-04  9:36                                             ` Frederic Leroy
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-03 22:03 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Ilpo Järvinen, Netdev, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Le Thu, 03 Dec 2009 21:34:00 +0100,
Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :

> Frederic Leroy schrieb:
> > Le Thu, 03 Dec 2009 15:10:11 +0100,
> > Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> >>> I suppose adding || !tp->retrans_stamp into the false condition is
> >>> fine as long as we don't then have a connection that can cause a
> >>> connection to hang there forever for some reason (this needs to be
> >>> understood well enough, not just test driven in stables :-)).
> >>>
> >>>> Unluckily, I still cannot reproduce the scp stalls here, so it
> >>>> would be nice if Frederic printed retrans_stamp together with
> >>>> icsk_ca_state and icsk_retransmits, please.
> >>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> >>> that might have some significant w.r.t what happens because of
> >>> FRTO.
> >> I made a patch for Frederic with Codebase
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> >>
> >> Thanks for testing.
> > 
> > I made a new .11 trace with damian patch.
> > The copy went to the end. 
> 
> Ok, at least this "fix" seems to work at first glance, but the printk
> is quite useless now. Could you run another test with the printk's but
> without the retrans_stamp == 0 check, please?

Done, it's .13

It stalled.

I only manage to make a failing stream only once. It seems
after 23h the network is not interferenced.

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 20:34                                           ` Damian Lukowski
  2009-12-03 22:03                                             ` Frederic Leroy
@ 2009-12-04  9:36                                             ` Frederic Leroy
  2009-12-04 11:14                                               ` Ilpo Järvinen
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-04  9:36 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Ilpo Järvinen, Netdev, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

On Thu, Dec 03, 2009 at 09:34:00PM +0100, Damian Lukowski wrote:
> Frederic Leroy schrieb:
>
> Ok, at least this "fix" seems to work at first glance, but the printk
> is quite useless now. Could you run another test with the printk's but
> without the retrans_stamp == 0 check, please?

I just made 3 more test (.14 .15 .16), all streams has failed.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 22:03                                             ` Frederic Leroy
@ 2009-12-04 10:41                                               ` Ilpo Järvinen
  0 siblings, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-04 10:41 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2973 bytes --]

On Thu, 3 Dec 2009, Frederic Leroy wrote:

> Le Thu, 03 Dec 2009 21:34:00 +0100,
> Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> 
> > Frederic Leroy schrieb:
> > > Le Thu, 03 Dec 2009 15:10:11 +0100,
> > > Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> > >>> I suppose adding || !tp->retrans_stamp into the false condition is
> > >>> fine as long as we don't then have a connection that can cause a
> > >>> connection to hang there forever for some reason (this needs to be
> > >>> understood well enough, not just test driven in stables :-)).
> > >>>
> > >>>> Unluckily, I still cannot reproduce the scp stalls here, so it
> > >>>> would be nice if Frederic printed retrans_stamp together with
> > >>>> icsk_ca_state and icsk_retransmits, please.
> > >>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> > >>> that might have some significant w.r.t what happens because of
> > >>> FRTO.
> > >> I made a patch for Frederic with Codebase
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> > >>
> > >> Thanks for testing.
> > > 
> > > I made a new .11 trace with damian patch.
> > > The copy went to the end. 
> > 
> > Ok, at least this "fix" seems to work at first glance, but the printk
> > is quite useless now. Could you run another test with the printk's but
> > without the retrans_stamp == 0 check, please?
> 
> Done, it's .13
> 
> It stalled.
> 
> I only manage to make a failing stream only once. It seems
> after 23h the network is not interferenced.

...Your neighbors fall early into sleep :-).

It seems that TCP was in recovery (ca_state 3) when the timeout triggered 
but for some reason no retransmissions in flight. Also, TCPCB_LOST was 
marked for skbs, thus making them available for retransmission. I've read 
tcp_xmit_retransmit_queue multiple times through (and also 
tcp_mark_head_lost) but cannot find out what would prevent the rexmit 
loop from sending the retransmissions. And they seem to work for the
opposite end though.

But there is more, also for the working case and RTOs (.12.) we see 
rexmits not always flying out, see this:

t, sacked_out: 208867, 208867, 1, 1, 6, 1, 1, 1, 4
t, sacked_out: 209426, 0, 0, 3, 6, 0, 0, 1, 5
t, sacked_out: 209426, 0, 0, 3, 6, 0, 0, 1, 5
t, sacked_out: 209426, 0, 1, 1, 6, 0, 1, 1, 5
t, sacked_out: 210256, 0, 1, 1, 6, 0, 1, 1, 5
t, sacked_out: 210256, 0, 1, 1, 6, 0, 1, 1, 5
t, sacked_out: 210256, 210256, 2, 1, 6, 1, 1, 1, 5
t, sacked_out: 210855, 0, 0, 3, 6, 0, 0, 1, 4

The fourth row has increase isck_retransmit (in tcp_retransmit_timer, 
called near the end) but there are no retransmission in flight (and that 
previous line is printed right before that so there is no room for races 
or so)? ...I suspect we find the problem from tcp_retransmit_skb or from 
the stuff it calls into if looking carefully enough. Besides, that is the 
only common denominator for fast recovery retransmissions and for the RTO 
retransmission.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-04  9:36                                             ` Frederic Leroy
@ 2009-12-04 11:14                                               ` Ilpo Järvinen
  2009-12-04 13:58                                                 ` Frederic Leroy
  0 siblings, 1 reply; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-04 11:14 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

On Fri, 4 Dec 2009, Frederic Leroy wrote:

> On Thu, Dec 03, 2009 at 09:34:00PM +0100, Damian Lukowski wrote:
> > Frederic Leroy schrieb:
> >
> > Ok, at least this "fix" seems to work at first glance, but the printk
> > is quite useless now. Could you run another test with the printk's but
> > without the retrans_stamp == 0 check, please?
> 
> I just made 3 more test (.14 .15 .16), all streams has failed.

I noticed that we'll absolutely need that || !retrans_stamp thing as 
at least this can result in icsk_retransmits without retrans_stamp:

        if (tcp_retransmit_skb(sk, tcp_write_queue_head(sk)) > 0) {
                /* Retransmission failed because of local congestion,
                 * do not backoff.
                 */
                if (!icsk->icsk_retransmits)
                        icsk->icsk_retransmits = 1;

But there may be other error handling related things here too. Can you try 
with the following once you can (probably can print quite much, once per 
arriving packet when in recovery):

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fcd278a..19934d2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1898,18 +1898,24 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
-		return -EAGAIN;
+	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf)) {
+		err = -EAGAIN;
+		goto out;
+	}
 
 	if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
 		if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
 			BUG();
-		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
-			return -ENOMEM;
+		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq)) {
+			err = -ENOMEM;
+			goto out;
+		}
 	}
 
-	if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
-		return -EHOSTUNREACH; /* Routing failure or similar. */
+	if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) {
+		err = -EHOSTUNREACH; /* Routing failure or similar. */
+		goto out;
+	}
 
 	cur_mss = tcp_current_mss(sk);
 
@@ -1919,12 +1925,16 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 * our retransmit serves as a zero window probe.
 	 */
 	if (!before(TCP_SKB_CB(skb)->seq, tcp_wnd_end(tp))
-	    && TCP_SKB_CB(skb)->seq != tp->snd_una)
-		return -EAGAIN;
+	    && TCP_SKB_CB(skb)->seq != tp->snd_una) {
+	    	err = -EAGAIN;
+	    	goto out;
+	}
 
 	if (skb->len > cur_mss) {
-		if (tcp_fragment(sk, skb, cur_mss, cur_mss))
-			return -ENOMEM; /* We'll try again later. */
+		if (tcp_fragment(sk, skb, cur_mss, cur_mss)) {
+			err = -ENOMEM; /* We'll try again later. */
+			goto out;
+		}
 	} else {
 		int oldpcount = tcp_skb_pcount(skb);
 
@@ -1986,6 +1996,9 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 		 */
 		TCP_SKB_CB(skb)->ack_seq = tp->snd_nxt;
 	}
+out:
+	if (err)
+		printk("rxt %p %u %d ca:%d\n", sk, TCP_SKB_CB(skb)->seq, err, icsk->icsk_ca_state);
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-04 11:14                                               ` Ilpo Järvinen
@ 2009-12-04 13:58                                                 ` Frederic Leroy
  2009-12-04 15:09                                                   ` Ilpo Järvinen
  0 siblings, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-04 13:58 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

On Fri, Dec 04, 2009 at 01:14:46PM +0200, Ilpo Järvinen wrote:
> On Fri, 4 Dec 2009, Frederic Leroy wrote:
> 
> I noticed that we'll absolutely need that || !retrans_stamp thing as 
> [...] 
> But there may be other error handling related things here too. Can you try 
> with the following once you can (probably can print quite much, once per 
> arriving packet when in recovery):

I made 2 more trace .15 and .16 but ... 
I applied your patch and forgot to add "|| !retrans_stamp" thing.

I can't test another kernel until this evening :(

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-04 13:58                                                 ` Frederic Leroy
@ 2009-12-04 15:09                                                   ` Ilpo Järvinen
  0 siblings, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-04 15:09 UTC (permalink / raw)
  To: Frederic Leroy, Eric Dumazet
  Cc: Damian Lukowski, Netdev, David Miller, Herbert Xu, Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1199 bytes --]

On Fri, 4 Dec 2009, Frederic Leroy wrote:

> On Fri, Dec 04, 2009 at 01:14:46PM +0200, Ilpo Järvinen wrote:
> > On Fri, 4 Dec 2009, Frederic Leroy wrote:
> > 
> > I noticed that we'll absolutely need that || !retrans_stamp thing as 
> > [...] 
> > But there may be other error handling related things here too. Can you try 
> > with the following once you can (probably can print quite much, once per 
> > arriving packet when in recovery):
> 
> I made 2 more trace .15 and .16 but ... 
> I applied your patch and forgot to add "|| !retrans_stamp" thing.
> 
> I can't test another kernel until this evening :(

I think it's fine without it too (I just meant that we'll need that || 
!retrans_stamp into mainline/stable for sure even if we find the cause for 
the another problem) and your traces now confirm what I was suspecting.
The retransmissions do not succeed because of error (-EAGAIN). Possibly 
from this check: 

        if (atomic_read(&sk->sk_wmem_alloc) >
            min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
                return -EAGAIN;

...Or alternatively it comes from queue_xmit.

Eric, can you please take a look if you have a clue on this?


-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-03 19:23                                         ` Frederic Leroy
  2009-12-03 20:34                                           ` Damian Lukowski
  2009-12-03 20:36                                           ` Ilpo Järvinen
@ 2009-12-05 22:32                                           ` Damian Lukowski
  2009-12-06 10:29                                             ` Ilpo Järvinen
                                                               ` (2 more replies)
  2 siblings, 3 replies; 63+ messages in thread
From: Damian Lukowski @ 2009-12-05 22:32 UTC (permalink / raw)
  To: Frederic Leroy
  Cc: Ilpo Järvinen, Netdev, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Frederic Leroy schrieb:
> Le Thu, 03 Dec 2009 15:10:11 +0100,
> Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
>>> I suppose adding || !tp->retrans_stamp into the false condition is
>>> fine as long as we don't then have a connection that can cause a
>>> connection to hang there forever for some reason (this needs to be
>>> understood well enough, not just test driven in stables :-)).
>>>
>>>> Unluckily, I still cannot reproduce the scp stalls here, so it
>>>> would be nice if Frederic printed retrans_stamp together with
>>>> icsk_ca_state and icsk_retransmits, please.
>>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
>>> that might have some significant w.r.t what happens because of FRTO.
>> I made a patch for Frederic with Codebase
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>>
>> Thanks for testing.
> 
> I made a new .11 trace with damian patch.
> The copy went to the end.

Could you please make another test and unplug the cable or drop ACKs for
several seconds, so that some RTO retransmissions are performed?
I'd like to see if retrans_stamp remains constant. In the dmesg output of
the 11th run, it seems to change while icsk_retransmits also increases.
This is kind of bad for connection timeout calculation in the RTO case ...

Damian
 
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.pcap.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.dmesg.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-before.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.proc_net_netstat-after.bz2
> http://www.starox.org/pub/scp_stall/scp_stall-houba.11-patch_damian.sysctl_net.ipv4.tcp-before.bz2


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-05 22:32                                           ` Damian Lukowski
@ 2009-12-06 10:29                                             ` Ilpo Järvinen
  2009-12-06 17:48                                               ` Ilpo Järvinen
  2009-12-06 22:44                                               ` Damian Lukowski
  2009-12-06 20:32                                             ` Frederic Leroy
  2009-12-07 14:01                                             ` Ilpo Järvinen
  2 siblings, 2 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-06 10:29 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Frederic Leroy, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2911 bytes --]

On Sat, 5 Dec 2009, Damian Lukowski wrote:

> Frederic Leroy schrieb:
> > Le Thu, 03 Dec 2009 15:10:11 +0100,
> > Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> >>> I suppose adding || !tp->retrans_stamp into the false condition is
> >>> fine as long as we don't then have a connection that can cause a
> >>> connection to hang there forever for some reason (this needs to be
> >>> understood well enough, not just test driven in stables :-)).
> >>>
> >>>> Unluckily, I still cannot reproduce the scp stalls here, so it
> >>>> would be nice if Frederic printed retrans_stamp together with
> >>>> icsk_ca_state and icsk_retransmits, please.
> >>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> >>> that might have some significant w.r.t what happens because of FRTO.
> >> I made a patch for Frederic with Codebase
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> >>
> >> Thanks for testing.
> > 
> > I made a new .11 trace with damian patch.
> > The copy went to the end.
> 
> Could you please make another test and unplug the cable or drop ACKs for
> several seconds, so that some RTO retransmissions are performed?
> I'd like to see if retrans_stamp remains constant. In the dmesg output of
> the 11th run, it seems to change while icsk_retransmits also increases.
> This is kind of bad for connection timeout calculation in the RTO case ...

...Good point, I think that's another bug in this area. We should prevent 
retrans_stamp update when the RTO itself caused tp->retrans_out to become
zero. This bug affects also other things that are based on retrans_stamp, 
not only your code but there the effect is just less devastating.

...Anyway, we know already what happens by reading code when we know the 
case what to look for. If tcp_retransmit_skb encounters any error no 
retrans_stamp is changed (besides, cable might not be enough if you don't 
lose the route, depends on configuration what happens). And if retrans_out 
is zero, the stamp gets updated (assuming the rexmit was successfully 
made without some error condition).

What I think we need is something like this to handle all error cases 
cleanly (plus to fix that another bug I mentioned above):

	if (!icsk->icsk_retransmits)
		return false;

	if (unlikely(!tp->retrans_stamp)) {
		start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when;
	} else {
		start_ts = tp->retrans_stamp;
	}

...and then use that timestamp in the substraction. It handles the case 
where retrans_stamp was never updated (agreeably there are corner cases 
even then where the connection doesn't die exactly when we would want when 
retrans_stamp got set on some high icsk_retransmits but still it would 
only be less than double of the specified time). ...We just need to audit 
that tcp_write_queue_head and when are always valid when call happens (I 
think they should be but that has to checked).

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-06 10:29                                             ` Ilpo Järvinen
@ 2009-12-06 17:48                                               ` Ilpo Järvinen
  2009-12-06 22:44                                               ` Damian Lukowski
  1 sibling, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-06 17:48 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Frederic Leroy, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 288 bytes --]

On Sun, 6 Dec 2009, Ilpo Järvinen wrote:

[...snip...]

> And if retrans_out is zero, the stamp gets updated (assuming the rexmit 
> was successfully made without some error condition).

This of course isn't true (before somebody starts to wonder why it 
doesn't match the code).

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-05 22:32                                           ` Damian Lukowski
  2009-12-06 10:29                                             ` Ilpo Järvinen
@ 2009-12-06 20:32                                             ` Frederic Leroy
  2009-12-07 14:01                                             ` Ilpo Järvinen
  2 siblings, 0 replies; 63+ messages in thread
From: Frederic Leroy @ 2009-12-06 20:32 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Ilpo Järvinen, Netdev, David Miller, Eric Dumazet,
	Herbert Xu, Greg KH

Le Sat, 05 Dec 2009 23:32:25 +0100,
Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
>
> Could you please make another test and unplug the cable or drop ACKs
> for several seconds, so that some RTO retransmissions are performed?
> I'd like to see if retrans_stamp remains constant. In the dmesg
> output of the 11th run, it seems to change while icsk_retransmits
> also increases. This is kind of bad for connection timeout
> calculation in the RTO case ...

.17 is online :)
I disconnected physically the stream 2 times. The first was maybe too
short and recover immediatly. The second disconnect is longer and
took several second to recover.
I hope this help !

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-06 10:29                                             ` Ilpo Järvinen
  2009-12-06 17:48                                               ` Ilpo Järvinen
@ 2009-12-06 22:44                                               ` Damian Lukowski
  2009-12-06 23:09                                                 ` Ilpo Järvinen
  1 sibling, 1 reply; 63+ messages in thread
From: Damian Lukowski @ 2009-12-06 22:44 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Frederic Leroy, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

Ilpo Järvinen schrieb:
> On Sat, 5 Dec 2009, Damian Lukowski wrote:
> 
>> Frederic Leroy schrieb:
>>> Le Thu, 03 Dec 2009 15:10:11 +0100,
>>> Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
>>>>> I suppose adding || !tp->retrans_stamp into the false condition is
>>>>> fine as long as we don't then have a connection that can cause a
>>>>> connection to hang there forever for some reason (this needs to be
>>>>> understood well enough, not just test driven in stables :-)).
>>>>>
>>>>>> Unluckily, I still cannot reproduce the scp stalls here, so it
>>>>>> would be nice if Frederic printed retrans_stamp together with
>>>>>> icsk_ca_state and icsk_retransmits, please.
>>>>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
>>>>> that might have some significant w.r.t what happens because of FRTO.
>>>> I made a patch for Frederic with Codebase
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>>>>
>>>> Thanks for testing.
>>> I made a new .11 trace with damian patch.
>>> The copy went to the end.
>> Could you please make another test and unplug the cable or drop ACKs for
>> several seconds, so that some RTO retransmissions are performed?
>> I'd like to see if retrans_stamp remains constant. In the dmesg output of
>> the 11th run, it seems to change while icsk_retransmits also increases.
>> This is kind of bad for connection timeout calculation in the RTO case ...
> 
> 
> ...Good point, I think that's another bug in this area. We should prevent 
> retrans_stamp update when the RTO itself caused tp->retrans_out to become
> zero. This bug affects also other things that are based on retrans_stamp, 
> not only your code but there the effect is just less devastating.
> 
> ...Anyway, we know already what happens by reading code when we know the 
> case what to look for. If tcp_retransmit_skb encounters any error no 
> retrans_stamp is changed (besides, cable might not be enough if you don't 
> lose the route, depends on configuration what happens). And if retrans_out 
> is zero, the stamp gets updated (assuming the rexmit was successfully 
> made without some error condition).
> 
> What I think we need is something like this to handle all error cases 
> cleanly (plus to fix that another bug I mentioned above):
> 
> 	if (!icsk->icsk_retransmits)
> 		return false;
> 
> 	if (unlikely(!tp->retrans_stamp)) {
> 		start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when;
> 	} else {
> 		start_ts = tp->retrans_stamp;
> 	}
> 
> ...and then use that timestamp in the substraction. It handles the case 
> where retrans_stamp was never updated (agreeably there are corner cases 
> even then where the connection doesn't die exactly when we would want when 
> retrans_stamp got set on some high icsk_retransmits but still it would 
> only be less than double of the specified time). ...We just need to audit 
> that tcp_write_queue_head and when are always valid when call happens (I 
> think they should be but that has to checked).

retransmits_timed_out() is called directly by tcp_retransmit_timer() and by
tcp_write_timeout() which itself is only called by tcp_retransmit_timer().
As tcp_retransmit_timer() calls tcp_write_queue_head() at two points,
I think it is safe to assume, that it will be valid in
retransmits_timed_out() as well.
Also I have checked, that any call of tcp_transmit_skb() is preceded by
an update of cb->when. So it should be set and valid in
retransmits_timed_out(). Do you agree?

Anyway, how to proceed now? Should I make a patch and post it? If yes, on
which mailing list, which codebase? Or maybe it should be posted in a
series of patches, which also fixes the ENOMEM issue (which I do not really
have a clue about)?

Damian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-06 22:44                                               ` Damian Lukowski
@ 2009-12-06 23:09                                                 ` Ilpo Järvinen
  0 siblings, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-06 23:09 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Frederic Leroy, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5007 bytes --]

On Sun, 6 Dec 2009, Damian Lukowski wrote:

> Ilpo Järvinen schrieb:
> > On Sat, 5 Dec 2009, Damian Lukowski wrote:
> > 
> >> Frederic Leroy schrieb:
> >>> Le Thu, 03 Dec 2009 15:10:11 +0100,
> >>> Damian Lukowski <damian@tvk.rwth-aachen.de> a écrit :
> >>>>> I suppose adding || !tp->retrans_stamp into the false condition is
> >>>>> fine as long as we don't then have a connection that can cause a
> >>>>> connection to hang there forever for some reason (this needs to be
> >>>>> understood well enough, not just test driven in stables :-)).
> >>>>>
> >>>>>> Unluckily, I still cannot reproduce the scp stalls here, so it
> >>>>>> would be nice if Frederic printed retrans_stamp together with
> >>>>>> icsk_ca_state and icsk_retransmits, please.
> >>>>> It wouldn't hurt to know tp->packets_out and tp->retrans_out too,
> >>>>> that might have some significant w.r.t what happens because of FRTO.
> >>>> I made a patch for Frederic with Codebase
> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> >>>>
> >>>> Thanks for testing.
> >>> I made a new .11 trace with damian patch.
> >>> The copy went to the end.
> >> Could you please make another test and unplug the cable or drop ACKs for
> >> several seconds, so that some RTO retransmissions are performed?
> >> I'd like to see if retrans_stamp remains constant. In the dmesg output of
> >> the 11th run, it seems to change while icsk_retransmits also increases.
> >> This is kind of bad for connection timeout calculation in the RTO case ...
> > 
> > 
> > ...Good point, I think that's another bug in this area. We should prevent 
> > retrans_stamp update when the RTO itself caused tp->retrans_out to become
> > zero. This bug affects also other things that are based on retrans_stamp, 
> > not only your code but there the effect is just less devastating.
> > 
> > ...Anyway, we know already what happens by reading code when we know the 
> > case what to look for. If tcp_retransmit_skb encounters any error no 
> > retrans_stamp is changed (besides, cable might not be enough if you don't 
> > lose the route, depends on configuration what happens). And if retrans_out 
> > is zero, the stamp gets updated (assuming the rexmit was successfully 
> > made without some error condition).
> > 
> > What I think we need is something like this to handle all error cases 
> > cleanly (plus to fix that another bug I mentioned above):
> > 
> > 	if (!icsk->icsk_retransmits)
> > 		return false;
> > 
> > 	if (unlikely(!tp->retrans_stamp)) {
> > 		start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when;
> > 	} else {
> > 		start_ts = tp->retrans_stamp;
> > 	}
> > 
> > ...and then use that timestamp in the substraction. It handles the case 
> > where retrans_stamp was never updated (agreeably there are corner cases 
> > even then where the connection doesn't die exactly when we would want when 
> > retrans_stamp got set on some high icsk_retransmits but still it would 
> > only be less than double of the specified time). ...We just need to audit 
> > that tcp_write_queue_head and when are always valid when call happens (I 
> > think they should be but that has to checked).
> 
> retransmits_timed_out() is called directly by tcp_retransmit_timer() and by
> tcp_write_timeout() which itself is only called by tcp_retransmit_timer().
> As tcp_retransmit_timer() calls tcp_write_queue_head() at two points,
> I think it is safe to assume, that it will be valid in
> retransmits_timed_out() as well.
> Also I have checked, that any call of tcp_transmit_skb() is preceded by
> an update of cb->when. So it should be set and valid in
> retransmits_timed_out(). Do you agree?

That's what I've been reading out too.

> Anyway, how to proceed now? Should I make a patch and post it? If yes, on
> which mailing list, which codebase?

To netdev, keep ccs from this thread but change the subject. Once 
Frederic has tested them we can put them onward with his Tested-by:.

I was kind of hoping you would have posted a more official version of the 
!tp->retrans_stamp patch already earlier (without any debug garbage and 
proper description etc.), after all, you did most the hard work in 
figuring out what was wrong with it :-). Feel still free to send, 
otherwise I'll be doing that tomorrow if I've some time. ...I intend to 
take a look on this retrans_stamp bug and come up with some solution and 
see if there is some clean up work necessary to be done in net-next 
timeframe for that stuff.

> Or maybe it should be posted in a series of patches, which also fixes 
> the ENOMEM issue (which I do not really have a clue about)?

...Plus I intend to provided a debug patch to the -EAGAIN problem as it 
is still unclear to me what exactly happens there, however, the other 
problems need fix regardless of that and are independent enough to not be 
included into a same fix. This !retrans_stamp issue is really a high 
priority one.

...In addition, I have the RTT bloat problem to handle but not with so 
high priority.

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-05 22:32                                           ` Damian Lukowski
  2009-12-06 10:29                                             ` Ilpo Järvinen
  2009-12-06 20:32                                             ` Frederic Leroy
@ 2009-12-07 14:01                                             ` Ilpo Järvinen
  2009-12-07 22:18                                               ` Frederic Leroy
  2009-12-09  4:54                                               ` David Miller
  2 siblings, 2 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-07 14:01 UTC (permalink / raw)
  To: Damian Lukowski
  Cc: Frederic Leroy, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5259 bytes --]

On Sat, 5 Dec 2009, Damian Lukowski wrote:

> Could you please make another test and unplug the cable or drop ACKs for
> several seconds, so that some RTO retransmissions are performed?
> I'd like to see if retrans_stamp remains constant. In the dmesg output of
> the 11th run, it seems to change while icsk_retransmits also increases.
> This is kind of bad for connection timeout calculation in the RTO case ...

After taking some more look into this, this is partly a red herring. It 
looks like that because of the place of the printk that was still in the 
end of the function. You can see the full trace of what happens in .13., 
they come from independent incarnations of RTO recovery (when finally no 
error happens in tcp_retransmit_skb).

However, the problem itself could occur. Here's the patch which should 
prevent that (I'm rather convinced that this really isn't stable worthy 
but net-next or net-2.6 would be fine):

--
[PATCH] tcp: fix retrans_stamp advancing in error cases

It can happen, that tcp_retransmit_skb fails due to some error.
In such cases we might end up into a state where tp->retrans_out
is zero but that's only because we removed the TCPCB_SACKED_RETRANS
bit from a segment but couldn't retransmit it because of the error
that happened. Therefore some assumptions that retrans_out checks
are based do not necessarily hold, as there still can be an old
retransmission but that is only visible in TCPCB_EVER_RETRANS bit.
As retransmission happen in sequential order (except for some very
rare corner cases), it's enough to check the head skb for that bit.

Main reason for all this complexity is the fact that connection dying
time now depends on the validity of the retrans_stamp, in particular,
that successive retransmissions of a segment must not advance
retrans_stamp under any conditions. It seems after quick thinking that
this has relatively low impact as eventually TCP will go into CA_Loss
and either use the existing check for !retrans_stamp case or send a
retransmission successfully, setting a new base time for the dying
timer (can happen only once). At worst, the dying time will be
approximately the double of the intented time. In addition,
tcp_packet_delayed() will return wrong result (has some cc aspects
but due to rarity of these errors, it's hardly an issue).

One of retrans_stamp clearing happens indirectly through first going
into CA_Open state and then a later ACK lets the clearing to happen.
Thus tcp_try_keep_open has to be modified too.

Thanks to Damian Lukowski <damian@tvk.rwth-aachen.de> for hinting
that this possibility exists (though the particular case discussed
didn't after all have it happening but was just a debug patch
artifact).

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   35 ++++++++++++++++++++++++++++++++---
 1 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..5608691 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2717,6 +2717,35 @@ static void tcp_try_undo_dsack(struct sock *sk)
 	}
 }

+/* We can clear retrans_stamp when there are no retransmissions in the
+ * window. It would seem that it is trivially available for us in
+ * tp->retrans_out, however, that kind of assumptions doesn't consider
+ * what will happen if errors occur when sending retransmission for the
+ * second time. ...It could the that such segment has only
+ * TCPCB_EVER_RETRANS set at the present time. It seems that checking
+ * the head skb is enough except for some reneging corner cases that
+ * are not worth the effort.
+ *
+ * Main reason for all this complexity is the fact that connection dying
+ * time now depends on the validity of the retrans_stamp, in particular,
+ * that successive retransmissions of a segment must not advance
+ * retrans_stamp under any conditions.
+ */
+static int tcp_any_retrans_done(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+
+	if (tp->retrans_out)
+		return 1;
+
+	skb = tcp_write_queue_head(sk);
+	if (unlikely(skb && TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS))
+		return 1;
+
+	return 0;
+}
+
 /* Undo during fast recovery after partial ACK. */

 static int tcp_try_undo_partial(struct sock *sk, int acked)
@@ -2729,7 +2758,7 @@ static int tcp_try_undo_partial(struct sock *sk, int acked)
 		/* Plain luck! Hole if filled with delayed
 		 * packet, rather than with a retransmit.
 		 */
-		if (tp->retrans_out == 0)
+		if (!tcp_any_retrans_done(sk))
 			tp->retrans_stamp = 0;

 		tcp_update_reordering(sk, tcp_fackets_out(tp) + acked, 1);
@@ -2788,7 +2817,7 @@ static void tcp_try_keep_open(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	int state = TCP_CA_Open;

-	if (tcp_left_out(tp) || tp->retrans_out || tp->undo_marker)
+	if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker)
 		state = TCP_CA_Disorder;

 	if (inet_csk(sk)->icsk_ca_state != state) {
@@ -2803,7 +2832,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)

 	tcp_verify_left_out(tp);

-	if (!tp->frto_counter && tp->retrans_out == 0)
+	if (!tp->frto_counter && !tcp_any_retrans_done(sk))
 		tp->retrans_stamp = 0;

 	if (flag & FLAG_ECE)
-- 
1.5.6.3

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-07 14:01                                             ` Ilpo Järvinen
@ 2009-12-07 22:18                                               ` Frederic Leroy
  2009-12-07 22:38                                                 ` Ilpo Järvinen
  2009-12-09  4:54                                               ` David Miller
  1 sibling, 1 reply; 63+ messages in thread
From: Frederic Leroy @ 2009-12-07 22:18 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Damian Lukowski, Netdev, David Miller, Eric Dumazet, Herbert Xu,
	Greg KH

Le Mon, 7 Dec 2009 16:01:53 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Sat, 5 Dec 2009, Damian Lukowski wrote:
> 
> > Could you please make another test and unplug the cable or drop
> > [...]
> After taking some more look into this, this is partly a red herring.
> It looks like that because of the place of the printk that was still
> in the end of the function. You can see the full trace of what
> happens in .13., they come from independent incarnations of RTO
> recovery (when finally no error happens in tcp_retransmit_skb).

Doh ! Sorry :( 

> However, the problem itself could occur. Here's the patch which
> should prevent that (I'm rather convinced that this really isn't
> stable worthy but net-next or net-2.6 would be fine):
> 
> --
> [PATCH] tcp: fix retrans_stamp advancing in error cases
> [...]

Tonight, I made 2 more tests : .20 and .21 . 

The first with latest damian patch from today.
Added the printk (This time I doubled checked ;).
Start the copy, wait 20s, disconnect cable 20s, reconnect. 

The second try was identical, but I added your patch.
The copy was slower comparing to the first try.

I didn't took time to understand tcp retransmission timeout and read
the code. So, I'm not sure the printk is at the good place and usefull.

-- 
Frédéric Leroy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-07 22:18                                               ` Frederic Leroy
@ 2009-12-07 22:38                                                 ` Ilpo Järvinen
  0 siblings, 0 replies; 63+ messages in thread
From: Ilpo Järvinen @ 2009-12-07 22:38 UTC (permalink / raw)
  To: Frederic Leroy; +Cc: Damian Lukowski, David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2820 bytes --]

Trimmed Ccs.

On Mon, 7 Dec 2009, Frederic Leroy wrote:

> Le Mon, 7 Dec 2009 16:01:53 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :
> 
> > On Sat, 5 Dec 2009, Damian Lukowski wrote:
> > 
> > > Could you please make another test and unplug the cable or drop
> > > [...]
> > After taking some more look into this, this is partly a red herring.
> > It looks like that because of the place of the printk that was still
> > in the end of the function. You can see the full trace of what
> > happens in .13., they come from independent incarnations of RTO
> > recovery (when finally no error happens in tcp_retransmit_skb).
> 
> Doh ! Sorry :( 

Now I think we have had just too many testcases and are all confused :-).
I was referring to the case .11. (the same case as Damian did) ...Not 
something too newish you did, sorry about that.

> > However, the problem itself could occur. Here's the patch which
> > should prevent that (I'm rather convinced that this really isn't
> > stable worthy but net-next or net-2.6 would be fine):
> > 
> > --
> > [PATCH] tcp: fix retrans_stamp advancing in error cases
> > [...]
> 
> Tonight, I made 2 more tests : .20 and .21 . 
> 
> The first with latest damian patch from today.
> Added the printk (This time I doubled checked ;).
> Start the copy, wait 20s, disconnect cable 20s, reconnect. 
> 
> The second try was identical, but I added your patch.
> The copy was slower comparing to the first try.

The losses you are getting are somewhat random process, so it is usually 
the main explination on different transfer rates. One thing leads to 
another and therefore one case suffers more than other.

> I didn't took time to understand tcp retransmission timeout and read
> the code. So, I'm not sure the printk is at the good place and usefull.

Thanks anyway for all testing so far. I'll try to come up with the other 
debug patch tomorrow to get some information on that -EAGAIN. Unless you 
want to do it yourself and printk all the variables involved in this check 
(in tcp_output.c):

        /* Do not sent more than we queued. 1/4 is reserved for possible
         * copying overhead: fragmentation, tunneling, mangling etc.
         */
        if (atomic_read(&sk->sk_wmem_alloc) >
            min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
                return -EAGAIN;

...better to print them before the check it regardless of the actual 
result of the comparison.

...So far only the Damian's patch is clearly required for stable (but I 
suppose DaveM will handle the stable submissions as usual, hopefully it 
won't take too long though as some other people might start reporting this 
same issue once some time has passed and they notice that something is 
wrong with TCP of their new and shiny 2.6.32 :-)).

-- 
 i.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: scp stalls mysteriously
  2009-12-07 14:01                                             ` Ilpo Järvinen
  2009-12-07 22:18                                               ` Frederic Leroy
@ 2009-12-09  4:54                                               ` David Miller
  1 sibling, 0 replies; 63+ messages in thread
From: David Miller @ 2009-12-09  4:54 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: damian, fredo, netdev, eric.dumazet, herbert, gregkh

From: "Ilpo J^[$(D+#^[(Brvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 7 Dec 2009 16:01:53 +0200 (EET)

> [PATCH] tcp: fix retrans_stamp advancing in error cases

Applied to net-2.6

Not queued to -stable, let me know if that becomes necessary.

Thanks!

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2009-12-09  4:54 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-27 21:35 scp stalls mysteriously Frederic Leroy
2009-11-27 22:12 ` Ilpo Järvinen
     [not found]   ` <20091128010156.1219012a@houba>
2009-11-28 11:31     ` Ilpo Järvinen
2009-11-29 22:13       ` Ilpo Järvinen
2009-11-30 18:50       ` Frederic Leroy
2009-11-30 19:10         ` Ilpo Järvinen
2009-11-30 20:18         ` Ilpo Järvinen
2009-11-30 20:37           ` Frederic Leroy
2009-11-30 21:37             ` Ilpo Järvinen
2009-11-30 22:19               ` Frederic Leroy
2009-12-01 20:19               ` Frederic Leroy
2009-12-01 20:27                 ` Ilpo Järvinen
2009-12-02  7:59                   ` Frederic Leroy
2009-12-02 12:59                     ` Ilpo Järvinen
2009-12-02 15:44                       ` Frederic Leroy
2009-12-02 16:05                         ` Ilpo Järvinen
2009-12-02 17:34                           ` Frederic Leroy
2009-12-02 19:17                             ` Damian Lukowski
2009-12-03  8:59                               ` Frederic Leroy
2009-12-03 10:29                                 ` Ilpo Järvinen
2009-12-03 10:34                                   ` David Miller
2009-12-03 10:48                                     ` Ilpo Järvinen
2009-12-03 12:19                                       ` Asdo
2009-12-03 11:57                                   ` Damian Lukowski
2009-12-03 12:19                                     ` Damian Lukowski
2009-12-03 12:49                                     ` Ilpo Järvinen
2009-12-03 14:10                                       ` Damian Lukowski
2009-12-03 19:23                                         ` Frederic Leroy
2009-12-03 20:34                                           ` Damian Lukowski
2009-12-03 22:03                                             ` Frederic Leroy
2009-12-04 10:41                                               ` Ilpo Järvinen
2009-12-04  9:36                                             ` Frederic Leroy
2009-12-04 11:14                                               ` Ilpo Järvinen
2009-12-04 13:58                                                 ` Frederic Leroy
2009-12-04 15:09                                                   ` Ilpo Järvinen
2009-12-03 20:36                                           ` Ilpo Järvinen
2009-12-03 21:37                                             ` Frederic Leroy
2009-12-05 22:32                                           ` Damian Lukowski
2009-12-06 10:29                                             ` Ilpo Järvinen
2009-12-06 17:48                                               ` Ilpo Järvinen
2009-12-06 22:44                                               ` Damian Lukowski
2009-12-06 23:09                                                 ` Ilpo Järvinen
2009-12-06 20:32                                             ` Frederic Leroy
2009-12-07 14:01                                             ` Ilpo Järvinen
2009-12-07 22:18                                               ` Frederic Leroy
2009-12-07 22:38                                                 ` Ilpo Järvinen
2009-12-09  4:54                                               ` David Miller
2009-12-03 12:11                                   ` Frederic Leroy
2009-12-03 12:44                                     ` Ilpo Järvinen
2009-12-03 13:37                                     ` Arnd Hannemann
2009-12-03 14:27                                   ` Arnd Hannemann
2009-12-03 14:36                                     ` Ilpo Järvinen
2009-12-03 15:34                                       ` Arnd Hannemann
2009-12-03 15:48                                         ` Ilpo Järvinen
2009-12-03 18:32                                   ` Greg KH
2009-12-03 21:37                                     ` David Miller
2009-12-03  8:56                           ` Frederic Leroy
2009-11-30 21:24           ` Frederic Leroy
2009-11-30 21:26             ` Ilpo Järvinen
2009-11-30 21:54               ` Frederic Leroy
2009-11-27 22:22 ` Frederic Leroy
2009-11-27 22:28   ` Ilpo Järvinen
2009-11-27 22:30   ` Ilpo Järvinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).