netdev.vger.kernel.org archive mirror
* Debugging TCP: Treason Uncloaked
@ 2008-05-08  2:29 Chris Bredesen
  2008-05-08 15:01 ` John Heffner
  2008-05-13 11:21 ` Ilpo Järvinen
  0 siblings, 2 replies; 5+ messages in thread
From: Chris Bredesen @ 2008-05-08  2:29 UTC (permalink / raw)
  To: netdev

Hello,

As per below thread, I'm forwarding to this list/alias.  tcpdump output
is attached to the forum post.

Thanks!

-------- Original Message --------
Subject: Re: F8 Treason Uncloaked
Date: Tue, 6 May 2008 20:00:51 +0100
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Organization: Red Hat UK Cyf., Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 1TE, Y Deyrnas Gyfunol. Cofrestrwyd yng Nghymru
a Lloegr o'r rhif cofrestru 3798903
To: For users of Fedora <fedora-list@redhat.com>
CC: cbredesen@redhat.com
References: <4820A59F.7010005@redhat.com>

On Tue, 06 May 2008 14:38:23 -0400
Chris Bredesen <cbredesen@redhat.com> wrote:

> Hello list,
> 
> I'm trying to get this issue out to a wider audience.  I posted it on 
> the forum but got no responses:
> 
> http://www.fedoraforum.org/forum/showthread.php?t=186331
> 
> Sorry to cross post, but I'm not sure where else to turn...

netdev@linux.kernel.org

but the TCP messages really indicate either a Linux bug or that the NAS
isn't talking proper TCP. Turning off window scaling was a good test, but
beyond that tcpdump data will be needed.

Alan



* Re: Debugging TCP: Treason Uncloaked
  2008-05-08  2:29 Debugging TCP: Treason Uncloaked Chris Bredesen
@ 2008-05-08 15:01 ` John Heffner
  2008-05-13 11:21 ` Ilpo Järvinen
  1 sibling, 0 replies; 5+ messages in thread
From: John Heffner @ 2008-05-08 15:01 UTC (permalink / raw)
  To: Chris Bredesen; +Cc: netdev

On Wed, May 7, 2008 at 10:29 PM, Chris Bredesen <cbredesen@redhat.com> wrote:
> Hello,
>
>  As per below thread, I'm forwarding to this list/alias.  tcpdump output
>  is attached to the forum post.

Full binary tcpdumps (including the SYN packets) are more useful.

From the brief snippet you have, we can't see where the announced
window was reneged, but it's 192.168.1.130 doing it, not the nas box.
You're also getting DSACKs from the nas box, which is also a bit
strange.

Another thing to try is disabling TSO, though I can't think why that
might be causing a problem here.
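
For anyone following along, the two suggestions above (a full binary capture that includes the SYN packets, and a test with TSO disabled) could be scripted roughly as below. This is only a sketch: the interface name, peer host name, and output file are placeholders, the helper names are made up for illustration, and the commands are only printed here because actually running them needs root.

```python
import shlex

def capture_command(interface: str, peer: str, outfile: str) -> list:
    """Build a tcpdump command line for a full binary capture.

    -s 0 asks for whole packets rather than truncated ones, and -w writes
    the raw packets to a file instead of printing a text summary.
    Start it before the connection is opened so the SYN packets are included.
    """
    return ["tcpdump", "-i", interface, "-s", "0", "-w", outfile, "host", peer]

def disable_tso_command(interface: str) -> list:
    """Build an ethtool command line that turns TCP segmentation offload off."""
    return ["ethtool", "-K", interface, "tso", "off"]

# Placeholder values, not taken from the thread.
for argv in (capture_command("eth0", "nas", "treason.pcap"),
             disable_tso_command("eth0")):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argument lists separately from running them keeps the sketch harmless; pass a list to subprocess.run() as root to execute it.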

  -John


* Re: Debugging TCP: Treason Uncloaked
  2008-05-08  2:29 Debugging TCP: Treason Uncloaked Chris Bredesen
  2008-05-08 15:01 ` John Heffner
@ 2008-05-13 11:21 ` Ilpo Järvinen
  2008-05-19 16:40   ` Chris Bredesen
  1 sibling, 1 reply; 5+ messages in thread
From: Ilpo Järvinen @ 2008-05-13 11:21 UTC (permalink / raw)
  To: Chris Bredesen; +Cc: Netdev

On Wed, 7 May 2008, Chris Bredesen wrote:

> Hello,
> 
> As per below thread, I'm forwarding to this list/alias.  tcpdump output
> is attached to the forum post.
> 
> Thanks!
> 
> -------- Original Message --------
> Subject: Re: F8 Treason Uncloaked
> Date: Tue, 6 May 2008 20:00:51 +0100
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Organization: Red Hat UK Cyf., Amberley Place, 107-111 Peascod Street,
> Windsor, Berkshire, SL4 1TE, Y Deyrnas Gyfunol. Cofrestrwyd yng Nghymru
> a Lloegr o'r rhif cofrestru 3798903
> To: For users of Fedora <fedora-list@redhat.com>
> CC: cbredesen@redhat.com
> References: <4820A59F.7010005@redhat.com>
> 
> On Tue, 06 May 2008 14:38:23 -0400
> Chris Bredesen <cbredesen@redhat.com> wrote:
> 
> > Hello list,
> > 
> > I'm trying to get this issue out to a wider audience.  I posted it on the
> > forum but got no responses:
> > 
> > http://www.fedoraforum.org/forum/showthread.php?t=186331
> > 
> > Sorry to cross post, but I'm not sure where else to turn...
> 
> netdev@linux.kernel.org
> 
> but the TCP messages really indicate either a Linux bug or that the NAS
> isn't talking proper TCP. Turning off window scaling was a good test, but
> beyond that tcpdump data will be needed.


This report lacks the kernel version (no, I won't try to figure out what F8
or whatever is running on your box, just tell us :-)). For example, some
2.6.25-rc kernels had this problem.

The tcp_window_scaling sysctl has nothing to do with window resizing. It
just decides whether a scaling factor can be used or not. It won't guarantee
you a constant window!
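
To make the distinction concrete, here is a minimal sketch of what a scaling factor does to the 16-bit window field. It is not kernel code: the function name, the example shift of 7, and the sample field values are all illustrative assumptions. The point is only that the advertised value can still change from segment to segment whether or not scaling is in use.

```python
# Illustrative only: these names are hypothetical, not kernel identifiers.
MAX_WINDOW_FIELD = 0xFFFF  # the window field in the TCP header is 16 bits
MAX_SHIFT = 14             # largest window scale shift allowed by RFC 1323

def effective_window(window_field: int, shift: int, scaling_enabled: bool) -> int:
    """Return the window in bytes that a given header field advertises.

    shift is the scale factor exchanged in the SYN packets. With scaling
    disabled the field is taken literally, as if the shift were 0.
    """
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << (shift if scaling_enabled else 0)

# The field itself still varies over the life of a connection in both modes.
for field in (1448, 724, 0):
    print(field,
          effective_window(field, 7, True),
          effective_window(field, 7, False))
```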

What happened when the window was shrunk is hard to know because the log
snippet doesn't include the point where the window was reduced.

-- 
 i.


* Re: Debugging TCP: Treason Uncloaked
  2008-05-13 11:21 ` Ilpo Järvinen
@ 2008-05-19 16:40   ` Chris Bredesen
  2008-05-20 10:17     ` Ilpo Järvinen
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Bredesen @ 2008-05-19 16:40 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, johnwheffner


Ilpo Järvinen wrote:
> This report lacks the kernel version (no, I won't try to figure out what F8
> or whatever is running on your box, just tell us :-)). For example, some
> 2.6.25-rc kernels had this problem.
> 
> The tcp_window_scaling sysctl has nothing to do with window resizing. It
> just decides whether a scaling factor can be used or not. It won't guarantee
> you a constant window!
> 
> What happened when the window was shrunk is hard to know because the log
> snippet doesn't include the point where the window was reduced.

John - thanks for the explanation - I understand the relationship 
between scaling and resizing now.  If my notes are correct, it happened 
with both these kernels on the client:

2.6.24.3-12.fc8.i686
2.6.24.3-34.fc8.i686

RHEL and CentOS guys are reporting this issue as well so I wonder if 
it's something specific to a RH kernel (not sure how close they are to 
upstream but my understanding is that Fedora kernels are pretty close, 
but this is *clearly* not my area of expertise).

Kernel on the NAS device is 2.6.9 AFAIK but the distro has proprietary 
bits in it so I'm not sure what's been done there.  It's a Netgear 
ReadyNAS appliance.

In any case, I'm attaching an archive of the whole tcpdump session so 
you can have a look.   Please let me know if you need more info.  I 
really *really* appreciate your help on this -- I'm paying the results 
of our findings forward so others won't get tripped up on this issue.

Best,

Chris

[-- Attachment #2: tcp.dump.filtered.tar.gz --]
[-- Type: application/x-gzip, Size: 106507 bytes --]


* Re: Debugging TCP: Treason Uncloaked
  2008-05-19 16:40   ` Chris Bredesen
@ 2008-05-20 10:17     ` Ilpo Järvinen
  0 siblings, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2008-05-20 10:17 UTC (permalink / raw)
  To: Chris Bredesen; +Cc: Netdev, johnwheffner

On Mon, 19 May 2008, Chris Bredesen wrote:

> Kernel on the NAS device is 2.6.9 AFAIK but the distro has proprietary bits in
> it so I'm not sure what's been done there.  It's a Netgear ReadyNAS appliance.

It could well be the NAS's fault too. The recent case with the 2.6.25-rcs
had TCP transmitting _past_ snd_nxt (i.e., too far, which of course is not
right either), rather than the window actually being shrunk as the message
suggests.

> In any case, I'm attaching an archive of the whole tcpdump session so you can
> have a look.   Please let me know if you need more info. 

Hmm, this actually seems to be a fault of that type in the NAS's TCP:

20:06:57.976848 ... > nas.rsync: . 23744483:23745931(1448) ack 130667 win 1448 
20:06:57.977241 nas.rsync > ...: . 130667:132115(1448) 
20:06:57.977294 nas.rsync > ...: . 132115:133563(1448) 
20:06:57.977308 nas.rsync > ...: P 133563:134259(696)

How could it send up to 134259 when the advertised window ends at
130667+1448 = 132115, and assume that would work? The TCP at the NAS end
then finally gives up later because all it gets in response to a number of
RTO retransmissions is a cumulative ACK, as the window remains zero at
133563. If the window opened from zero, the situation would resolve once
an RTO retransmission was received. But it doesn't open, which may be the
"fault" of the client-side user-space application, as it doesn't seem too
eager to read from TCP(?) :-/. Nevertheless, the NAS violated the spec and
cannot cope with the results. And yes, the client didn't shrink the window
anywhere (I checked that too), so those transmissions are obviously out of
window by the spec.
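
The arithmetic above can be checked with a short sketch. It assumes the numbers are used exactly as tcpdump printed them (relative sequence numbers, "win" taken as a byte count, as in the calculation above) and ignores 32-bit sequence wraparound. The function names are made up for illustration.

```python
def right_edge(ack: int, win: int) -> int:
    """Highest sequence number (exclusive) the receiver has offered to accept."""
    return ack + win

def classify(segments, ack: int, win: int):
    """Return (start, end, in_window) for each (start, end) segment.

    A segment counts as in window only if it ends at or before the
    right edge of the advertised window.
    """
    edge = right_edge(ack, win)
    return [(start, end, end <= edge) for start, end in segments]

# Values from the trace: the client sent "ack 130667 win 1448",
# then the NAS sent these three segments back to back.
ACK, WIN = 130667, 1448
SEGMENTS = [(130667, 132115), (132115, 133563), (133563, 134259)]

print("right edge:", right_edge(ACK, WIN))
for start, end, ok in classify(SEGMENTS, ACK, WIN):
    print(f"{start}:{end}", "in window" if ok else "out of window")
```

Only the first segment fits; the second starts exactly at the right edge and the third lies entirely beyond it.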

If some other client works, it may be just due to luck, e.g., user space
works differently or there is a subtle difference in TCP implementation
behavior.

As a workaround, one could try a larger receive buffer at the client.
I don't think window scaling contributes to this problem as you suggested
earlier, except that there are some bugs related to it in 2.6.9 that have
since been fixed (and even 2.6.24 might have the rounding bug unless
somebody sent the fix to stable; I don't remember whether that happened.
The fix is commit 607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723).
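
For reference, a larger receive buffer can be requested per socket with SO_RCVBUF. The sketch below shows only the general mechanism, not how the application in this report configures its sockets; the helper name and the 256 KiB figure are arbitrary assumptions.

```python
import socket

def make_client_socket(rcvbuf_bytes: int) -> socket.socket:
    """Create a TCP socket and request a receive buffer of the given size.

    The request is made before connect() so that it can influence what
    the connection negotiates and advertises from the start.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock

REQUESTED = 256 * 1024  # arbitrary example size, not a recommendation
sock = make_client_socket(REQUESTED)
# The kernel may adjust the request (on Linux it is capped by
# net.core.rmem_max), so read back what was actually granted.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested", REQUESTED, "granted", granted)
sock.close()
```

System-wide defaults live in the net.ipv4.tcp_rmem sysctl, which is the other place to look if the application cannot be changed.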

-- 
 i.
