* Re: 2.6.17: networking bug??
       [not found] ` <448ECB09.3010308@rtr.ca>
@ 2006-06-13 15:00 ` Mark Lord
  2006-06-13 15:28   ` Mark Lord
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Lord @ 2006-06-13 15:00 UTC (permalink / raw)
To: Mark Lord; +Cc: Linux Kernel, netdev, Linus Torvalds

Mark Lord wrote:
..
> The differences I see are widely varying "window sizes".
> What would cause this?

This is from (working) 2.6.16.18:

> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: P 1:206(205) ack 607 win 32798 <nop,nop,timestamp 134760217 730448>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 206 win 1728 <nop,nop,timestamp 730626 134760217>

This is from (failing) 2.6.17-rc6-git2:

> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: . ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294760162 134771817>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: . ack 607 win 32798 <nop,nop,timestamp 134771918 4294760162>

Both kernels default to /proc/sys/net/ipv4/tcp_window_scaling == 1,
and 2.6.16.18 works regardless of whether I turn it off/on again.

But 2.6.17-rc6-git2 fails to work with the webserver at www.everymac.com
when /proc/sys/net/ipv4/tcp_window_scaling == 1.
Setting this to 0 "fixes" the problem.

BUG.

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.17: networking bug??
From: Mark Lord @ 2006-06-13 15:28 UTC
To: Linux Kernel; +Cc: netdev, Linus Torvalds

Mmm. I notice that 2.6.17 has a new sysctl related to this stuff:

    /proc/sys/net/ipv4/tcp_workaround_signed_windows

It makes no difference whatsoever for me here when varied
while /proc/sys/net/ipv4/tcp_window_scaling==1.

The site www.everymac.com is still not browseable until
setting /proc/sys/net/ipv4/tcp_window_scaling==0.

There's one other difference I see in the tcpdump traces.
The first packets from each trace below show different
values for "wscale". The old (working) kernels use "wscale 2",
whereas 2.6.17 uses "wscale 6". In both cases, the value
seen in /proc/sys/net/ipv4/tcp_adv_win_scale is 2.

This is from (working) 2.6.16.18:

>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: S 2933486277:2933486277(0) win 5840 <mss 1460,sackOK,timestamp 730285 0,nop,wscale 2>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: S 2545625510:2545625510(0) ack 2933486278 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134760199 730285>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 1460 <nop,nop,timestamp 730448 134760199>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.56224: P 1:206(205) ack 607 win 32798 <nop,nop,timestamp 134760217 730448>
> IP silvy.localnet.56224 > 216-145-246-23.rev.dls.net.www: . ack 206 win 1728 <nop,nop,timestamp 730626 134760217>

This is from (failing) 2.6.17-rc6-git2:

>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: S 3000518105:3000518105(0) win 5840 <mss 1460,sackOK,timestamp 4294759165 0,nop,wscale 6>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: S 3368494549:3368494549(0) ack 3000518106 win 65535 <mss 1452,nop,wscale 1,nop,nop,timestamp 134771817 4294759165>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: . ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294759337 134771817>
> IP silvy.localnet.33472 > 216-145-246-23.rev.dls.net.www: P 1:607(606) ack 1 win 92 <nop,nop,timestamp 4294760162 134771817>
> IP 216-145-246-23.rev.dls.net.www > silvy.localnet.33472: . ack 607 win 32798 <nop,nop,timestamp 134771918 4294760162>

Something is broken somewhere.
* Re: 2.6.17: networking bug??
From: Mark Lord @ 2006-06-13 16:58 UTC
To: Linux Kernel; +Cc: netdev, Linus Torvalds

..
> The site www.everymac.com is still not browseable until
> setting /proc/sys/net/ipv4/tcp_window_scaling==0.
>
> There's one other difference I see in the tcpdump traces.
> The first packets from each trace below show different
> values for "wscale". The old (working) kernels use "wscale 2",
> whereas 2.6.17 uses "wscale 6". In both cases, the value
> seen in /proc/sys/net/ipv4/tcp_adv_win_scale is 2.

Okay. More progress here. The calculation of the "wscale" values
is based on the "tcp_rmem" sysctl numbers.

The defaults for these *differ* between 2.6.16.18 and 2.6.17-rc*.

    2.6.16:  4096 87380 174760
    2.6.17:  4096 87380 2097152

If I change the tcp_rmem setting on 2.6.17 to match the old value,
then the website www.everymac.com becomes accessible again:

    echo 4096 87380 174760 > /proc/sys/net/ipv4/tcp_rmem

Looking at diffs between 2.6.16 and 2.6.17, I see a big rework
of the tcp_rmem code in linux/net/ipv4/tcp.c

Looks like something got broken there, or possibly the wscale
calculations have a bug that is only triggered by the new rmem values ??
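The link Mark describes between the tcp_rmem maximum and the "wscale" value can be illustrated with a small sketch. It is not the kernel's code; it assumes the scale is chosen as the smallest shift that lets the maximum receive buffer fit in the 16-bit window field, capped at 14 as RFC 1323 allows. The function name `pick_wscale` is made up for illustration. Under that assumption the two tcp_rmem maxima quoted above give the two "wscale" values seen in the traces.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value the 16-bit TCP window field can hold
MAX_WSCALE = 14              # largest shift permitted by RFC 1323


def pick_wscale(max_rcvbuf: int) -> int:
    """Return the smallest shift that makes max_rcvbuf fit in 16 bits.

    Hypothetical helper: an assumed model of the selection, not kernel code.
    """
    space = max_rcvbuf
    wscale = 0
    while space > MAX_UNSCALED_WINDOW and wscale < MAX_WSCALE:
        space >>= 1
        wscale += 1
    return wscale


if __name__ == "__main__":
    # Third field of tcp_rmem on each kernel, as reported in the thread.
    for kernel, rmem_max in (("2.6.16", 174760), ("2.6.17", 2097152)):
        print(kernel, rmem_max, "-> wscale", pick_wscale(rmem_max))
```

With these inputs the sketch yields a shift of 2 for 174760 and 6 for 2097152, which is consistent with the SYN options in the two traces.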
* Re: 2.6.17: networking bug??
From: Mark Lord @ 2006-06-13 17:22 UTC
To: Linux Kernel; +Cc: netdev, Linus Torvalds, jheffner, davem

Mark Lord wrote:
> ..
>> The site www.everymac.com is still not browseable until
>> setting /proc/sys/net/ipv4/tcp_window_scaling==0.
>>
>> There's one other difference I see in the tcpdump traces.
>> The first packets from each trace below show different
>> values for "wscale". The old (working) kernels use "wscale 2",
>> whereas 2.6.17 uses "wscale 6". In both cases, the value
>> seen in /proc/sys/net/ipv4/tcp_adv_win_scale is 2.
>
> Okay. More progress here. The calculation of the "wscale" values
> is based on the "tcp_rmem" sysctl numbers.
>
> The defaults for these *differ* between 2.6.16.18 and 2.6.17-rc*.
>
> 2.6.16: 4096 87380 174760
> 2.6.17: 4096 87380 2097152
>
> If I change the tcp_rmem setting on 2.6.17 to match the old value,
> then the website www.everymac.com becomes accessible again:
>
> echo 4096 87380 174760 > /proc/sys/net/ipv4/tcp_rmem
>
> Looking at diffs between 2.6.16 and 2.6.17, I see a big rework
> of the tcp_rmem code in linux/net/ipv4/tcp.c
>
> Looks like something got broken there, or possibly the wscale
> calculations have a bug that is only triggered by the new rmem values ??
>

Okay, here's the blob that broke it.

> [TCP]: Set default max buffers from memory pool size
> author    John Heffner <jheffner@psc.edu>
>           Sat, 25 Mar 2006 09:34:07 +0000 (01:34 -0800)
> committer David S. Miller <davem@davemloft.net>
>           Sat, 25 Mar 2006 09:34:07 +0000 (01:34 -0800)
> commit    7b4f4b5ebceab67ce440a61081a69f0265e17c2a
> tree      ac02c685ce23f2440fecbebaa5b55cd47947c03e
> parent    2babf9daae4a3561f3264638a22ac7d0b14a6f52
>
> [TCP]: Set default max buffers from memory pool size
>
> This patch sets the maximum TCP buffer sizes (available to automatic
> buffer tuning, not to setsockopt) based on the TCP memory pool size.
> The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
> than 1/128 of the memory pressure threshold.
>
> Signed-off-by: John Heffner <jheffner@psc.edu>
> Signed-off-by: David S. Miller <davem@davemloft.net>

John / David: Any ideas on what's gone awry here?
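The rule stated in the commit message ("up to 4 MB, but no more than 1/128 of the memory pressure threshold") can be sketched numerically. This is only a reading of the commit text, not the patch itself. The 4 KiB page size, the idea that the threshold is counted in pages, the lower bound of 87380 bytes, and the example threshold of 65536 pages are all assumptions, the last one chosen because it reproduces the 2097152 value Mark reported.

```python
PAGE_SIZE = 4096                   # assumption: 4 KiB pages
FOUR_MB = 4 * 1024 * 1024          # hard upper bound named in the commit message


def max_auto_buffer(pressure_pages: int, floor: int = 87380) -> int:
    """Largest buffer available to auto-tuning under the quoted rule.

    pressure_pages: memory pressure threshold in pages (assumed unit).
    floor: assumed lower bound so the maximum never drops below the default.
    """
    share = pressure_pages * PAGE_SIZE // 128   # 1/128 of the threshold, in bytes
    return max(floor, min(FOUR_MB, share))


if __name__ == "__main__":
    for pages in (1024, 65536, 1_000_000):
        print(pages, "pages ->", max_auto_buffer(pages), "bytes")
```

Because the threshold scales with physical memory, a calculation of this shape would explain why the resulting maximum, and therefore the window scale, can change when RAM is added or removed.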
* Re: 2.6.17: networking bug??
From: John Heffner @ 2006-06-13 17:39 UTC
To: Mark Lord; +Cc: Linux Kernel, netdev, Linus Torvalds, davem

Mark Lord wrote:
> John / David: Any ideas on what's gone awry here?
>
>

Yes, you have some sort of a broken middlebox in your path (firewall,
transparent proxy, or similar) that doesn't correctly handle window
scaling. Check out this thread:
<http://marc.theaimsgroup.com/?l=linux-netdev&m=114478312100641&w=2>.

The best thing you can do is try to find this broken box and inform its
owner that it needs to be fixed. (If you can find out what it is, I'd
be interested to know.) In the meantime, disabling window scaling will
work around the problem for you.

  -John
* Re: 2.6.17: networking bug??
From: Linus Torvalds @ 2006-06-13 17:50 UTC
To: John Heffner; +Cc: Mark Lord, Linux Kernel, netdev, davem

On Tue, 13 Jun 2006, John Heffner wrote:
>
> The best thing you can do is try to find this broken box and inform its owner
> that it needs to be fixed. (If you can find out what it is, I'd be interested
> to know.) In the meantime, disabling window scaling will work around the
> problem for you.

Well, arguably, we shouldn't necessarily have defaults that use window
scaling, or we should have ways to recognize automatically when it
doesn't work (which may not be possible).

It's not like there aren't broken boxes out there, and it might be better
to make the default buffer sizes just be low enough that window scaling
simply isn't an issue.

I suspect that the people who really want/need window scaling know about
it, and could be assumed to know enough to raise their limits, no?

		Linus
* Re: 2.6.17: networking bug??
From: Mark Lord @ 2006-06-13 18:26 UTC
To: John Heffner; +Cc: Linus Torvalds, Linux Kernel, netdev, davem

Linus Torvalds wrote:
>
> On Tue, 13 Jun 2006, John Heffner wrote:
>> The best thing you can do is try to find this broken box and inform its owner
>> that it needs to be fixed. (If you can find out what it is, I'd be interested
>> to know.) In the meantime, disabling window scaling will work around the
>> problem for you.
>
> Well, arguably, we shouldn't necessarily have defaults that use window
> scaling, or we should have ways to recognize automatically when it
> doesn't work (which may not be possible).
>
> It's not like there aren't broken boxes out there, and it might be better
> to make the default buffer sizes just be low enough that window scaling
> simply isn't an issue.
>
> I suspect that the people who really want/need window scaling know about
> it, and could be assumed to know enough to raise their limits, no?

Agreed. It's taken me over a month here to realize that the particular
webserver in question (www.everymac.com) wasn't "dead", but merely being
blocked by my 2.6.17 kernel. All was fine with 2.6.16, as I discovered
today.

I wonder how many other "dead sites" there are out there,
that will be shut off from people when they "upgrade" to 2.6.17 ?

I'm a kernel hacker. Most users of 2.6.17 will not be.
The default should be something that works "by default".

Cheers
* Re: 2.6.17: networking bug??
From: Mark Lord @ 2006-06-13 19:08 UTC
To: John Heffner; +Cc: Linus Torvalds, Linux Kernel, netdev, davem

Mark Lord wrote:
> Linus Torvalds wrote:
>
>> It's not like there aren't broken boxes out there, and it might be
>> better to make the default buffer sizes just be low enough that window
>> scaling simply isn't an issue.
>>
>> I suspect that the people who really want/need window scaling know
>> about it, and could be assumed to know enough to raise their limits, no?
>
> Agreed. It's taken me over a month here to realize that the particular
> webserver in question (www.everymac.com) wasn't "dead", but merely being
> blocked by my 2.6.17 kernel. All was fine with 2.6.16, as I discovered
> today.
>
> I wonder how many other "dead sites" there are out there,
> that will be shut off from people when they "upgrade" to 2.6.17 ?
>
> I'm a kernel hacker. Most users of 2.6.17 will not be.
> The default should be something that works "by default".

Further to this, the current behaviour is badly unpredictable.

A machine could be working perfectly, not (noticeably) affected
by this bug. And then the user adds another stick of RAM to it.

Poof.. many sites from the internet stop responding.
Obviously the RAM upgrade broke things.. must be bad RAM, right?

Err.. no, the networking stack simply decided to become incompatible
with certain sites, as a result of the user adding more RAM to their
machine.

BbD.
* Re: 2.6.17: networking bug??
From: David Miller @ 2006-06-13 21:26 UTC
To: lkml; +Cc: jheffner, torvalds, linux-kernel, netdev

From: Mark Lord <lkml@rtr.ca>
Date: Tue, 13 Jun 2006 15:08:59 -0400

> Err.. no, the networking stack simply decided to become incompatible
> with certain sites, as a result of the user adding more RAM to their
> machine.

Let's discuss some facts.

First, you are getting window scaling by default with the older
kernel too. It's just a smaller window scale, using a shift
value of say 1 or 2.

What these broken middle boxes do is ignore the window scale
entirely.

So they don't apply a window scale to the advertised windows in each
packet. Therefore, they think a smaller amount of window space is
being advertised than really is. So they will silently drop packets
they think are outside of this bogus window they've calculated.

Now, when the window scale is smaller, the connection can still limp
along, albeit slowly, making forward progress even in the face of such
broken devices because half or a quarter of the window is still
available. It will retransmit a lot, and the congestion window won't
grow at all.

When the window scale is larger, this middle box bug makes it such
that not even one packet can fit into the miscalculated window and
things wedge. The box thinks that your window is "94" instead of
"94 << WINDOW_SCALE".

I think OpenBSD's claim (they did have the bug and probably still do
for all that I know) was that they wanted to make their firewalling
"stateless". This is a bogus argument because by definition you
cannot interpret the TCP window without having seen the initial
connection startup where the parameters are negotiated, and in
particular the window scale which will be used.

And you want to say we should try to work around systems designed
by people who think this is ok? :-)

It is impossible to fill a cross-continental connection without using
window scaling. A 64K window is all you get without scaling. Big
buffers are absolutely necessary, and as John Heffner showed this need
is growing exponentially and not slowing down. 6 megabit downlink is
pretty commonplace in the US, and the standard is much higher in well
connected countries such as South Korea.

Also, as John Heffner mentioned, even if we could detect the broken
boxes you can't just "turn off window scaling" after it's been
negotiated. It's immutably active for the entire connection once
enabled.

Window scaling has been standardized and around for 14 years, RFC1323
was published in May of 1992. How much longer can we wait for it to
be deployed properly? :-)

So the broken boxes, which to be honest are few and far between these
days, need to go, they really do.
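The failure mode David describes can be made concrete with the numbers from Mark's traces. This is a minimal sketch, assuming a middlebox that simply reads the raw 16-bit window field and drops any segment longer than that. The function names are hypothetical, and real devices may track sequence space in more detail.

```python
def window_bytes(win_field: int, wscale: int) -> int:
    """Window the receiver really advertises: raw field shifted by the scale."""
    return win_field << wscale


def segment_passes(segment_len: int, win_field: int, wscale: int,
                   honours_scale: bool) -> bool:
    """Would a box let a segment of segment_len bytes through?

    A box that ignores window scaling (honours_scale=False) compares the
    segment against the raw field, as if the shift were zero.
    """
    effective_scale = wscale if honours_scale else 0
    return segment_len <= window_bytes(win_field, effective_scale)


if __name__ == "__main__":
    reply_len = 205  # size of the server's first data segment in the working trace

    # 2.6.16.18: "win 1460" with "wscale 2"
    print(window_bytes(1460, 2), segment_passes(reply_len, 1460, 2, False))
    # 2.6.17-rc6-git2: "win 92" with "wscale 6"
    print(window_bytes(92, 6), segment_passes(reply_len, 92, 6, False))
    # The same advertisement seen by a box that applies the scale
    print(segment_passes(reply_len, 92, 6, True))
```

Both kernels advertise nearly the same real window (5840 and 5888 bytes), but a box that ignores the shift sees 1460 bytes in one case and 92 bytes in the other, and only the former leaves room for the 205-byte reply.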
* Re: 2.6.17: networking bug??
From: Mark Lord @ 2006-06-13 21:49 UTC
To: David Miller; +Cc: jheffner, torvalds, linux-kernel, netdev

David Miller wrote:
>..
> First, you are getting window scaling by default with the older
> kernel too. It's just a smaller window scale, using a shift
> value of say 1 or 2.
>
> What these broken middle boxes do is ignore the window scale
> entirely.
>
> So they don't apply a window scale to the advertised windows in each
> packet. Therefore, they think a smaller amount of window space is
> being advertised than really is. So they will silently drop packets
> they think are outside of this bogus window they've calculated.
>
> Now, when the window scale is smaller, the connection can still limp
> along, albeit slowly, making forward progress even in the face of such
> broken devices because half or a quarter of the window is still
> available. It will retransmit a lot, and the congestion window won't
> grow at all.
>
> When the window scale is larger, this middle box bug makes it such
> that not even one packet can fit into the miscalculated window and
> things wedge. The box thinks that your window is "94" instead of
> "94 << WINDOW_SCALE".
..

Unilaterally following the standard is all well and good
for those who know how to get around it when a site becomes
inaccessible, but not for Joe User.

If it always fails, or always works, that's not such a big problem.
I would never have complained if I had never been able to access
the web sites in question. But since it IS working in 2.6.16,
and got broken in 2.6.17, I'm bloody well going to complain.

I suppose the most important objection to our current behaviour
is that this behaviour *changes* when something totally unrelated
(to Joe User) happens: adding or removing a stick of RAM.

So I'm not against the window scaling, just against its apparent
randomness (to the vast majority who are not "in the know").

We should perhaps just have a fixed upper memory setting, as we
currently do in 2.6.16, so that the behaviour is predictable.

On a related note.. I wonder if we can choose better values for
the window size, so that if the scale factor is ignored, we still
end up with reasonably sized packets? So that the other box
will not think our window is a mere "94" when the scale factor
is lost?

-ml
* Re: 2.6.17: networking bug??
From: Rick Jones @ 2006-06-13 22:12 UTC
To: Mark Lord; +Cc: David Miller, jheffner, torvalds, linux-kernel, netdev

Mark

From everything I have read so far (which admittedly hasn't been
everything) it sounds like the firewall in question was a ticking
timebomb. If 2.6.17 hadn't set it off, something else might very well
have done so.

Or, if you prefer another metaphor, 2.6.17 was simply the last in a
series of straws on the back of the camel that was the firewall. Meta
issues of whether or not the camel that is firewalls should have ever
been allowed to poke its nose in the Internet Tent notwithstanding :)

At the very least, the firewall, if it is going to be "stateless," has
to strip the window scaling option from the SYN's that go past.
Otherwise, I would be inclined to agree with David that the firewall is
fundamentally broken.

rick jones
* Re: 2.6.17: networking bug??
From: David Miller @ 2006-06-13 22:23 UTC
To: lkml; +Cc: jheffner, torvalds, linux-kernel, netdev

From: Mark Lord <lkml@rtr.ca>
Date: Tue, 13 Jun 2006 17:49:21 -0400

> I suppose the most important objection to our current behaviour
> is that this behaviour *changes* when something totally unrelated
> (to Joe User) happens: adding or removing a stick of RAM.

We are pretty much required to choose the TCP memory parameters
based upon how much physical memory is in the machine, and these
parameters in turn are inextricably linked to what kind of window
scale we try to use for connections.

The behavior is unfortunate, but more unfortunate are the boxes that
create these problems in the first place. I believe their lifespan is
quite limited.

> We should perhaps just have a fixed upper memory setting, as we
> currently do in 2.6.16, so that the behaviour is predictable.

The change in 2.6.17 was exactly that we needed to increase this
upper limit to ~4MB.

> On a related note.. I wonder if we can choose better values for
> the window size, so that if the scale factor is ignored, we still
> end up with reasonably sized packets? So that the other box
> will not think our window is a mere "94" when the scale factor
> is lost?

We have an algorithm that tries to pick something based upon the set
of the values we might need to represent in the window field. If the
scale is too high, you lose accuracy, since the lower bits get chopped
off when the TCP header is being built and the computed window size is
shifted down. So we try to pick the smallest scale necessary to
represent the largest window size we might end up needing to
advertise.

A complication here is that we dynamically size both receive and send
buffers in response to our growing knowledge of the connection's
characteristics over time. So at the beginning we'll use a small
buffer size, and as the congestion window grows we'll increase our
buffer sizes to fill the pipe. This adds even more considerations for
window scale selection, as you can imagine.

One final word about window sizes. If you have a connection whose
bandwidth-delay-product needs an N byte buffer to fill, you actually
have to have an "N * 2" sized buffer available in order for fast
retransmit to work.
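The accuracy loss David mentions, where the low bits are chopped off when the window is shifted down into the 16-bit field, can be shown with a short sketch. The helper names are hypothetical, and the model is just the shift-down and shift-up round trip implied by the window scale option.

```python
def representable_window(window: int, wscale: int) -> int:
    """Window the peer reconstructs after the sender shifts the value down.

    The low `wscale` bits are discarded when the 16-bit field is built.
    """
    return (window >> wscale) << wscale


def worst_case_loss(wscale: int) -> int:
    """Largest number of bytes that can be lost to truncation at this scale."""
    return (1 << wscale) - 1


if __name__ == "__main__":
    window = 5840  # initial window from the SYN packets in the traces
    for wscale in (2, 6, 14):
        seen = representable_window(window, wscale)
        print(f"wscale {wscale}: peer sees {seen}, "
              f"lost {window - seen}, worst case {worst_case_loss(wscale)}")
```

At a shift of 2 the 5840-byte window survives intact, at 6 it is rounded down to a multiple of 64, and at 14 a window this small would be advertised as zero, which is one reason to prefer the smallest scale that still covers the largest buffer.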
* Re: 2.6.17: networking bug??
From: Rick Jones @ 2006-06-13 22:40 UTC
To: David Miller; +Cc: lkml, jheffner, torvalds, linux-kernel, netdev

> One final word about window sizes. If you have a connection whose
> bandwidth-delay-product needs an N byte buffer to fill, you actually
> have to have an "N * 2" sized buffer available in order for fast
> retransmit to work.

Is that as important in the presence of SACK?

rick jones
* Re: 2.6.17: networking bug??
From: David Miller @ 2006-06-13 23:01 UTC
To: rick.jones2; +Cc: lkml, jheffner, torvalds, linux-kernel, netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Tue, 13 Jun 2006 15:40:53 -0700

> > One final word about window sizes. If you have a connection whose
> > bandwidth-delay-product needs an N byte buffer to fill, you actually
> > have to have an "N * 2" sized buffer available in order for fast
> > retransmit to work.
>
> Is that as important in the presence of SACK?

The concern is identical, SACK or not. The only difference SACK
introduces for fast retransmit is that we know with more certainty
which holes need to be filled and thus which packets to fast
retransmit.
* Re: 2.6.17: networking bug??
From: John Heffner @ 2006-06-14 1:25 UTC
To: Rick Jones; +Cc: David Miller, netdev

Rick Jones wrote:
>> One final word about window sizes. If you have a connection whose
>> bandwidth-delay-product needs an N byte buffer to fill, you actually
>> have to have an "N * 2" sized buffer available in order for fast
>> retransmit to work.
>
> Is that as important in the presence of SACK?

With SACK you do need up to N * 2, but no more. With New Reno, you
potentially need ~ N * (number of losses) or it will time out... yuck.
SACK is a good thing. :)

  -John
* Re: 2.6.17: networking bug??
From: Matt Mackall @ 2006-06-13 23:22 UTC
To: Mark Lord; +Cc: David Miller, jheffner, torvalds, linux-kernel, netdev

On Tue, Jun 13, 2006 at 05:49:21PM -0400, Mark Lord wrote:
>
>
> David Miller wrote:
> >..
> >First, you are getting window scaling by default with the older
> >kernel too. It's just a smaller window scale, using a shift
> >value of say 1 or 2.
> >
> >What these broken middle boxes do is ignore the window scale
> >entirely.
> >
> >So they don't apply a window scale to the advertised windows in each
> >packet. Therefore, they think a smaller amount of window space is
> >being advertised than really is. So they will silently drop packets
> >they think are outside of this bogus window they've calculated.
> >
> >Now, when the window scale is smaller, the connection can still limp
> >along, albeit slowly, making forward progress even in the face of such
> >broken devices because half or a quarter of the window is still
> >available. It will retransmit a lot, and the congestion window won't
> >grow at all.
> >
> >When the window scale is larger, this middle box bug makes it such
> >that not even one packet can fit into the miscalculated window and
> >things wedge. The box thinks that your window is "94" instead of
> >"94 << WINDOW_SCALE".
> ..
>
> Unilaterally following the standard is all well and good
> for those who know how to get around it when a site becomes
> inaccessible, but not for Joe User.

We had very similar issues with ECN. But unlike ECN, window scaling is
not something we can just shrug our shoulders and say "oh well" about.
We will have to deal with it eventually. It might as well be sooner.

-- 
Mathematics is the supreme nostalgia of our time.
* Re: 2.6.17: networking bug??
From: Helge Hafting @ 2006-06-19 7:07 UTC
To: Mark Lord; +Cc: David Miller, jheffner, torvalds, linux-kernel, netdev

Mark Lord wrote:
>
> Unilaterally following the standard is all well and good
> for those who know how to get around it when a site becomes
> inaccessible, but not for Joe User.
>

So let's enable it in the kernel, and let the distros turn it off.
The Joe User who isn't a kernel hacker won't be running 2.6.17 for
a long time. He'll be running whatever his distro packages for
him, and they will know how to disable (or patch out) window scaling.

Someone who compiles his own kernel runs into all sorts of
issues, this is just one more of them.

> If it always fails, or always works, that's not such a big problem.
> I would never have complained if I had never been able to access
> the web sites in question. But since it IS working in 2.6.16,
> and got broken in 2.6.17, I'm bloody well going to complain.

Yes. And make sure you complain to those running the bad box as well.

Helge Hafting
* Re: 2.6.17: networking bug??
From: Andi Kleen @ 2006-06-14 5:18 UTC
To: David Miller; +Cc: lkml, jheffner, torvalds, linux-kernel, netdev

> Also, as John Heffner mentioned, even if we could detect the broken
> boxes you can't just "turn off window scaling" after it's been
> negotiated. It's immutably active for the entire connection once
> enabled.

In theory you could set a bit in the dst entry and not use it next
time you connect to that host. That would be ok for web browsing
at least, which creates new connections all the time.

But it's unclear how to even detect this situation reliably,
e.g. you don't want to disable it just because there was a bit
of packet loss on a connection to a particular host earlier,
and there is no clear heuristic to detect that this particular
problem happened.

> So the broken boxes, which to be honest are few and far between these
> days, need to go, they really do.

Agreed.

-Andi
* Re: 2.6.17: networking bug??
From: Daniel Drake @ 2006-06-14 8:09 UTC
To: Mark Lord; +Cc: John Heffner, Linus Torvalds, Linux Kernel, netdev, davem

Mark Lord wrote:
> Further to this, the current behaviour is badly unpredictable.
>
> A machine could be working perfectly, not (noticeably) affected
> by this bug. And then the user adds another stick of RAM to it.

This "bug" already existed in 2.6.16 to a certain extent: you were
losing out on a lot of TCP performance. Go back to 2.6.7, measure TCP
performance, and you'll probably find it was significantly better.

Also, there aren't that many broken end-points out there.
www.everymac.com loads fine for me and does not ignore the window scale
factor. The problem in your case is a broken router in the middle.

I had the same problem: certain sites would not load, but there is
absolutely nothing wrong with the servers that run these sites:

http://marc.theaimsgroup.com/?l=linux-netdev&m=114478312100641&w=2

I contacted my ISP and informed them of the issue. They fixed it
nationwide within a few weeks. You might try confirming that your
problem only applies to HTTP like mine did (ISP runs some lame
transparent webcaches), and it was a bug in the software there (NetApp).

We already had the "some routers are broken, should we do anything"
discussion back at the time of 2.6.8:

http://lwn.net/Articles/92727/

Daniel
* Re: 2.6.17: networking bug?? 2006-06-13 17:50 ` Linus Torvalds 2006-06-13 18:26 ` Mark Lord @ 2006-06-13 18:28 ` John Heffner 2006-06-13 20:45 ` Barry K. Nathan 2006-06-13 22:09 ` Chase Venters 1 sibling, 2 replies; 24+ messages in thread From: John Heffner @ 2006-06-13 18:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mark Lord, Linux Kernel, netdev, davem Linus Torvalds wrote: > > On Tue, 13 Jun 2006, John Heffner wrote: >> The best thing you can do is try to find this broken box and inform its owner >> that it needs to be fixed. (If you can find out what it is, I'd be interested >> to know.) In the meantime, disabling window scaling will work around the >> problem for you. > > Well, arguably, we shouldn't necessarily have defaults that use window > scaling, or we should have ways to recognize automatically when it > doesn't work (which may not be possible). > > It's not like there aren't broken boxes out there, and it might be better > to make the default buffer sizes just be low enough that window scaling > simply isn't an issue. > > I suspect that the people who really want/need window scaling know about > it, and could be assumed to know enough to raise their limits, no? > > Linus Unfortunately, there's really no way to detect this, at least not until it's too late. You can't un-negotiate window scale after the connection is initiated. 64k buffers, the largest you can use without window scaling, are adequate for most home users on DSL or cable modems (good to about 10 Mbps across the US, not quite that over trans-oceanic links). Unfortunately, that's about a factor of ten too small for that average university user, and a factor of 100-1000 too small for high end use. Check out the figure at <http://people.internet2.edu/~ghb/pmwiki/pmwiki.php/BridgingTheGap/BtGWizGap>, which has some data points. 
(The bottom line is the best "normal" users can get with system default buffers, the top line is what high-end users have gotten with tuned systems over the wide area. Note that this gap is increasing at an exponential rate.)

In the last couple years, we've added code that can automatically size the buffers as appropriate for each connection, but it's completely crippled unless you use a window scale. Personally, I think it's not a question of *whether* we have to start using a window scale by default, but *when*. I don't know that we want to let a small number of unambiguously broken middleboxes kill our forward progress.

Though I haven't gotten my hands on it, I believe Windows will soon have this capability, too. I'm sure Windows is big enough that whenever they turn this on, it will flush out all these boxes pretty quickly. We could wait for them to do it first, but that's not my favored approach.

BTW, as one data point, I've been personally running with a large window scale for about 5 years, and only seen a small handful of problems, most of which were corrected fairly quickly after I sent email to the admin of the box in question. No "big" sites have been an issue.

Thanks,
  -John
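The "64k is good to about 10 Mbps across the US" figure above follows from the usual window/RTT bound: a sender can have at most one window of data in flight per round trip. Below is a minimal sketch of that arithmetic. The round-trip times (50 ms cross-country, 150 ms trans-oceanic) and the 100 Mbit/s target rate are illustrative assumptions, not numbers from the message, and the function names are made up for this example.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput with one window in flight per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("round-trip time must be positive")
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(rate_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain rate_bps."""
    if rtt_seconds <= 0:
        raise ValueError("round-trip time must be positive")
    return rate_bps * rtt_seconds / 8


# Assumed round-trip times; real paths vary.
paths = (
    ("cross-US, assumed 50 ms RTT", 0.050),
    ("trans-oceanic, assumed 150 ms RTT", 0.150),
)

for label, rtt in paths:
    mbps = max_throughput_bps(MAX_UNSCALED_WINDOW, rtt) / 1e6
    print(f"{label}: at most {mbps:.1f} Mbit/s without window scaling")

# Window needed for an assumed 100 Mbit/s path at 50 ms RTT.
needed = window_needed_bytes(100e6, 0.050)
print(f"100 Mbit/s at 50 ms needs about {needed:.0f} bytes in flight "
      f"({needed / MAX_UNSCALED_WINDOW:.1f}x the unscaled maximum)")
```

Under these assumptions the unscaled limit works out to roughly 10 Mbit/s at 50 ms, and a 100 Mbit/s path at the same RTT needs roughly ten times the unscaled maximum, which lines up with the "factor of ten" estimate above.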
* Re: 2.6.17: networking bug?? 2006-06-13 18:28 ` John Heffner @ 2006-06-13 20:45 ` Barry K. Nathan 2006-06-13 22:09 ` Chase Venters 1 sibling, 0 replies; 24+ messages in thread
From: Barry K. Nathan @ 2006-06-13 20:45 UTC
To: John Heffner; +Cc: Linus Torvalds, Mark Lord, Linux Kernel, netdev, davem

On 6/13/06, John Heffner <jheffner@psc.edu> wrote:
> Though I haven't gotten my hands on it, I believe Windows will soon have
> this capability, too. I'm sure Windows is big enough that whenever they
> turn this on, it will flush out all these boxes pretty quickly. We
> could wait for them to do it first, but that's not my favored approach.

Yes, that appears to be the case with Windows Vista:

http://blogs.msdn.com/wndp/archive/2006/05/05/Winhec_blog_tcpip_2.aspx

*However*, they're also adding "black hole detection", for working around broken boxes that drop ICMP. And that kinda makes me wonder how long they're going to stick to their guns for enabling window scaling. (For instance, if too many sites break, maybe they'll disable it again with Vista Service Pack 1 or something.)

--
-Barry K. Nathan <barryn@pobox.com>
* Re: 2.6.17: networking bug?? 2006-06-13 18:28 ` John Heffner 2006-06-13 20:45 ` Barry K. Nathan @ 2006-06-13 22:09 ` Chase Venters 2006-06-13 22:23 ` David Miller 1 sibling, 1 reply; 24+ messages in thread
From: Chase Venters @ 2006-06-13 22:09 UTC
To: John Heffner; +Cc: Linus Torvalds, Mark Lord, Linux Kernel, netdev, davem

On Tue, 13 Jun 2006, John Heffner wrote:
>
> In the last couple years, we've added code that can automatically size the
> buffers as appropriate for each connection, but it's completely crippled
> unless you use a window scale. Personally, I think it's not a question of
> *whether* we have to start using a window scale by default, but *when*. I
> don't know that we want to let a small number of unambiguously broken
> middleboxes kill our forward progress.

Another example - Same thing happened with ECN. I recall setting up a mail server at the time and noticing that I had to disable ECN because some dumbass PIX routers out there were dropping packets with a _reserved bit_ set! Sure, there was a firmware upgrade, but the dingbat admins I tried to alert didn't seem (at the time) too interested in fixing their problem.

Does anyone have any interesting statistics on how often end-users are likely to run into this crap? It really is a shame when you have to suck just because someone else does.

>
> Thanks,
> -John

Cheers,
Chase
* Re: 2.6.17: networking bug?? 2006-06-13 22:09 ` Chase Venters @ 2006-06-13 22:23 ` David Miller 0 siblings, 0 replies; 24+ messages in thread
From: David Miller @ 2006-06-13 22:23 UTC
To: chase.venters; +Cc: jheffner, torvalds, lkml, linux-kernel, netdev

From: Chase Venters <chase.venters@clientec.com>
Date: Tue, 13 Jun 2006 17:09:16 -0500 (CDT)

> Does anyone have any interesting statistics on how often end-users
> are likely to run into this crap?

I think it's much less likely than the ECN stuff, by a long shot.
* Re: 2.6.17: networking bug?? 2006-06-13 17:39 ` John Heffner 2006-06-13 17:50 ` Linus Torvalds @ 2006-07-02 17:39 ` Jan Knutar 1 sibling, 0 replies; 24+ messages in thread
From: Jan Knutar @ 2006-07-02 17:39 UTC
To: John Heffner; +Cc: Mark Lord, Linux Kernel, netdev, Linus Torvalds, davem

On Tuesday 13 June 2006 20:39, John Heffner wrote:
> The best thing you can do is try to find this broken box and inform its
> owner that it needs to be fixed. (If you can find out what it is, I'd
> be interested to know.) In the meantime, disabling window scaling will
> work around the problem for you.

I was bit by this "networking bug" too. The broken box turned out to be the OpenBSD box I was trying to connect to. The owner removed scrub from pf.conf, and connectivity was restored.
end of thread, newest message ~2006-07-02 17:39 UTC
Thread overview: 24+ messages
[not found] <448EC6F3.3060002@rtr.ca>
[not found] ` <448ECB09.3010308@rtr.ca>
2006-06-13 15:00 ` 2.6.17: networking bug?? Mark Lord
2006-06-13 15:28 ` Mark Lord
2006-06-13 16:58 ` Mark Lord
2006-06-13 17:22 ` Mark Lord
2006-06-13 17:39 ` John Heffner
2006-06-13 17:50 ` Linus Torvalds
2006-06-13 18:26 ` Mark Lord
2006-06-13 19:08 ` Mark Lord
2006-06-13 21:26 ` David Miller
2006-06-13 21:49 ` Mark Lord
2006-06-13 22:12 ` Rick Jones
2006-06-13 22:23 ` David Miller
2006-06-13 22:40 ` Rick Jones
2006-06-13 23:01 ` David Miller
2006-06-14 1:25 ` John Heffner
2006-06-13 23:22 ` Matt Mackall
2006-06-19 7:07 ` Helge Hafting
2006-06-14 5:18 ` Andi Kleen
2006-06-14 8:09 ` Daniel Drake
2006-06-13 18:28 ` John Heffner
2006-06-13 20:45 ` Barry K. Nathan
2006-06-13 22:09 ` Chase Venters
2006-06-13 22:23 ` David Miller
2006-07-02 17:39 ` Jan Knutar