netdev.vger.kernel.org archive mirror
* [PATCH] TCP acts like it is always out of memory.
       [not found]     ` <20040630153049.3ca25b76.davem@redhat.com>
@ 2004-07-01 20:37       ` Stephen Hemminger
  2004-07-01 21:04         ` David S. Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2004-07-01 20:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: Arnaldo Carvalho de Melo, netdev

Current 2.6.7 tree acts as if it is always under memory pressure because
a recent change did a s/tcp_memory_pressure/tcp_prot.memory_pressure/.
The problem is tcp_prot.memory_pressure is a pointer, so it is always non-zero!

Rather than using *tcp_prot.memory_pressure, just go back to looking at
tcp_memory_pressure.
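
To make the failure mode concrete, here is a minimal userspace sketch, not
kernel code. The struct and the two helper functions are hypothetical
stand-ins; only the names tcp_prot.memory_pressure and tcp_memory_pressure
come from the patch. It shows why testing the pointer itself always reports
pressure, while reading the flag it points at reports the real state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; only the names mirror the patch. */
static int tcp_memory_pressure;     /* the shared flag, 0 = no pressure */

struct proto_sketch {
    int *memory_pressure;           /* pointer to the shared flag */
};

static struct proto_sketch tcp_prot = {
    .memory_pressure = &tcp_memory_pressure,
};

/* Buggy form: tests the pointer, which is never NULL here. */
static int under_pressure_buggy(void)
{
    return tcp_prot.memory_pressure != NULL;
}

/* Fixed form: tests the flag that the pointer refers to. */
static int under_pressure_fixed(void)
{
    assert(tcp_prot.memory_pressure != NULL);
    return *tcp_prot.memory_pressure != 0;
}
```

With the flag at 0, the buggy helper still returns 1 and the fixed helper
returns 0, which matches the "always under memory pressure" symptom.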

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

diff -Nru a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c	2004-07-01 13:36:58 -07:00
+++ b/net/ipv4/tcp_input.c	2004-07-01 13:36:58 -07:00
@@ -259,7 +259,7 @@
 	/* Check #1 */
 	if (tp->rcv_ssthresh < tp->window_clamp &&
 	    (int)tp->rcv_ssthresh < tcp_space(sk) &&
-	    !tcp_prot.memory_pressure) {
+	    !tcp_memory_pressure) {
 		int incr;
 
 		/* Check #2. Increase window, if skb with such overhead
@@ -349,7 +349,7 @@
 	if (ofo_win) {
 		if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
 		    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
-		    !tcp_prot.memory_pressure &&
+		    !tcp_memory_pressure &&
 		    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0])
 			sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
 					    sysctl_tcp_rmem[2]);
@@ -3764,7 +3764,7 @@
 
 	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
 		tcp_clamp_window(sk, tp);
-	else if (tcp_prot.memory_pressure)
+	else if (tcp_memory_pressure)
 		tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
 
 	tcp_collapse_ofo_queue(sk);
@@ -3844,7 +3844,7 @@
 
 	if (tp->packets_out < tp->snd_cwnd &&
 	    !(sk->sk_userlocks & SOCK_SNDBUF_LOCK) &&
-	    !tcp_prot.memory_pressure &&
+	    !tcp_memory_pressure &&
 	    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
  		int sndmem = max_t(u32, tp->mss_clamp, tp->mss_cache) +
 			MAX_TCP_HEADER + 16 + sizeof(struct sk_buff),
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c	2004-07-01 13:36:58 -07:00
+++ b/net/ipv4/tcp_output.c	2004-07-01 13:36:58 -07:00
@@ -672,7 +672,7 @@
 	if (free_space < full_space/2) {
 		tp->ack.quick = 0;
 
-		if (tcp_prot.memory_pressure)
+		if (tcp_memory_pressure)
 			tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U*tp->advmss);
 
 		if (free_space < mss)
diff -Nru a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
--- a/net/ipv4/tcp_timer.c	2004-07-01 13:36:58 -07:00
+++ b/net/ipv4/tcp_timer.c	2004-07-01 13:36:58 -07:00
@@ -257,7 +257,7 @@
 	TCP_CHECK_TIMER(sk);
 
 out:
-	if (tcp_prot.memory_pressure)
+	if (tcp_memory_pressure)
 		sk_stream_mem_reclaim(sk);
 out_unlock:
 	bh_unlock_sock(sk);

* Re: [PATCH] TCP acts like it is always out of memory.
  2004-07-01 20:37       ` [PATCH] TCP acts like it is always out of memory Stephen Hemminger
@ 2004-07-01 21:04         ` David S. Miller
  2004-07-02  1:32           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-01 21:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: acme, netdev

On Thu, 1 Jul 2004 13:37:38 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:

> Current 2.6.7 tree acts as if it is always under memory pressure because
> a recent change did a s/tcp_memory_pressure/tcp_prot.memory_pressure/.
> The problem is tcp_prot.memory_pressure is a pointer, so it is always non-zero!
> 
> Rather than using *tcp_prot.memory_pressure, just go back to looking at
> tcp_memory_pressure.

Hehe, applied thanks Stephen.

* Re: [PATCH] TCP acts like it is always out of memory.
  2004-07-01 21:04         ` David S. Miller
@ 2004-07-02  1:32           ` Arnaldo Carvalho de Melo
  2004-07-06  9:35             ` analysis of TCP window size issues still around - several reports / SACK involved? bert hubert
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2004-07-02  1:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: Stephen Hemminger, netdev

Em Thu, Jul 01, 2004 at 02:04:06PM -0700, David S. Miller escreveu:
> On Thu, 1 Jul 2004 13:37:38 -0700
> Stephen Hemminger <shemminger@osdl.org> wrote:
> 
> > Current 2.6.7 tree acts as if it is always under memory pressure because
> > a recent change did a s/tcp_memory_pressure/tcp_prot.memory_pressure/.
> > The problem is tcp_prot.memory_pressure is a pointer, so it is always non-zero!
> > 
> > Rather than using *tcp_prot.memory_pressure, just go back to looking at
> > tcp_memory_pressure.
> 
> Hehe, applied thanks Stephen.

:-) Thanks, Stephen, for the fix. This was a leftover from the conversion of
the memory pressure members in struct proto to pointers, done to cover the
case pointed out by David related to the ipv6_mapped functionality in the
1.1722.122.23 changeset (i.e. tcp_prot and tcpv6_prot having to share the
same accounting variables). I forgot to convert all the places where
tcp_prot.memory_pressure is used; the fix is exactly what I should
have done. Due to family health problems I was unable to promptly fix this
thinko, so, again, thank you very much.

Best Regards,

- Arnaldo

* analysis of TCP window size issues still around - several reports / SACK involved?
  2004-07-02  1:32           ` Arnaldo Carvalho de Melo
@ 2004-07-06  9:35             ` bert hubert
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
  2004-07-06 20:19               ` analysis of TCP window size issues still around - several reports / SACK involved? David S. Miller
  0 siblings, 2 replies; 37+ messages in thread
From: bert hubert @ 2004-07-06  9:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David S. Miller,
	linux-kernel@vger.kernel.org, Stephen Hemminger, netdev,
	alessandro.suardi, phyprabab

On Thu, Jul 01, 2004 at 10:32:25PM -0300, Arnaldo Carvalho de Melo wrote:

> > > Rather than using *tcp_prot.memory_pressure, just go back to looking at
> > > tcp_memory_pressure.
> > 
> > Hehe, applied thanks Stephen.

People,

There are still persistent reports of TCP problems, even after patching away
the memory_pressure pointer problem. Below is one trace I've seen, from
Alessandro Suardi; I lack the SACK knowledge to fully interpret it:

22:42:40.890025 192.168.1.6.32843 > 204.152.189.116.http: S 1994994484:1994994484(0) win 5840 <mss 1460,sackOK,timestamp 4294940315 0,nop,wscale 7> (DF)
22:42:41.143063 204.152.189.116.http > 192.168.1.6.32843: S 1404108869:1404108869(0) ack 1994994485 win 5792 <mss 1452,sackOK,timestamp 3383469176 4294940315,nop,wscale 0> (DF)
22:42:41.143123 192.168.1.6.32843 > 204.152.189.116.http: . ack 1 win 45 <nop,nop,timestamp 4294940568 3383469176> (DF)

Alessandro's machine does perform window scaling, tcpdump however does not
understand that and neglects to multiply 45 by 2^7 (=5760). Kernel.org does do
wscale, but defaults to 2^0.
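
The arithmetic above can be sketched as a small helper. This is an
illustration only: the function name effective_window is made up for this
note, and the limit of 14 is the maximum shift count allowed by RFC 1323.
Note that the window field of a SYN or SYN+ACK segment is never scaled, so
the shift applies to later segments such as the "win 45" ACK, not to the
"win 5840" SYN.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WSCALE 14   /* RFC 1323 upper bound on the shift count */

/*
 * Bytes actually advertised by a non-SYN segment: the raw 16-bit window
 * field shifted left by the scale the sender announced in its SYN.
 */
static uint32_t effective_window(uint16_t raw_window, uint8_t wscale)
{
    assert(wscale <= MAX_WSCALE);
    return (uint32_t)raw_window << wscale;
}
```

For the trace above, effective_window(45, 7) gives 5760 bytes for
Alessandro's side, and a field of 6432 with a shift of 0 stays at 6432 bytes
for kernel.org's side.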

22:42:41.143362 192.168.1.6.32843 > 204.152.189.116.http: P 1:421(420) ack 1 win 45 <nop,nop,timestamp 4294940568 3383469176> (DF)

Alessandro's machine sends a GET request.

22:42:41.147669 204.152.189.116.http > 192.168.1.6.32843: S 1404108869:1404108869(0) ack 1994994485 win 5792 <mss 1452,sackOK,timestamp 3383469180 4294940315,nop,wscale 0> (DF)

www.kernel.org acts like it did not see our ACK.

22:42:41.147723 192.168.1.6.32843 > 204.152.189.116.http: . ack 1 win 45 <nop,nop,timestamp 4294940572 3383469180,nop,nop,sack sack 1 {0:1} > (DF)

Alessandro's machine sends a selective ACK - could have gotten away with a
regular one I'd think?

22:42:41.408763 204.152.189.116.http > 192.168.1.6.32843: . ack 421 win 6432 <nop,nop,timestamp 3383469440 4294940568> (DF)

www.kernel.org acks the GET, but from then on does not send anything.

After a minute, Alessandro gets bored and presses STOP in Mozilla:

22:43:41.051537 192.168.1.6.32843 > 204.152.189.116.http: F 421:421(0) ack 1 win 45 <nop,nop,timestamp 33189 3383469440> (DF)

k.o acks this FIN, but also sends a selective ack:

22:43:41.304371 204.152.189.116.http > 192.168.1.6.32843: . ack 422 win 6432 <nop,nop,timestamp 3383529343 33189,nop,nop,sack sack 1 {421:422} > (DF)

Here are the underlying reports:

http://lkml.org/lkml/2004/7/4/116 (Alessandro Suardi)
	The _only_ site I found I can browse without disabling TCP
	  window scaling is http://www.google.it.

	tcpdump at: http://xoomer.virgilio.it/incident/tcpdump.out 

http://lkml.org/lkml/2004/7/5/105 (Phy Prabab)
	Concerning my issue with ftp'ing from remote sites,
	using these sysctl's I was able to get the performance
	back:

	net.ipv4.tcp_default_win_scale=0
	net.ipv4.tcp_moderate_rcvbuf=0

	2.6.7-bk18 w/out sysctls:
	2573621 bytes received in 2e+02 seconds (12 Kbytes/s)

	2.6.7-bk18 w/sysctls:
	2573621 bytes received in 0.69 seconds (3.6e+03
	Kbytes/s)

These both refer to bk after the *tcp_prot.memory_pressure patch was
applied.

Thanks for your attention.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* [PATCH] fix tcp_default_win_scale.
  2004-07-06  9:35             ` analysis of TCP window size issues still around - several reports / SACK involved? bert hubert
@ 2004-07-06 18:47               ` Stephen Hemminger
  2004-07-06 19:40                 ` Jamie Lokier
                                   ` (5 more replies)
  2004-07-06 20:19               ` analysis of TCP window size issues still around - several reports / SACK involved? David S. Miller
  1 sibling, 6 replies; 37+ messages in thread
From: Stephen Hemminger @ 2004-07-06 18:47 UTC (permalink / raw)
  To: David S. Miller
  Cc: bert hubert, Arnaldo Carvalho de Melo, netdev, alessandro.suardi,
	phyprabab, netdev, linux-net, linux-kernel

Recent TCP changes exposed the problem that there are lots of really broken firewalls 
that strip or alter TCP options.
When the options are modified TCP gets busted now.  The problem is that when
we propose window scaling, we expect that the other side receives the same initial
SYN request that we sent.  If there is corrupting firewalls that strip it then
the window we send is not correctly scaled; so the other side thinks there is not
enough space to send.

I propose the following, which will avoid sending window scaling that
is big enough to break in these cases unless the tcp_rmem has been increased.
It will keep default configuration from blowing up in a corrupt world.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

diff -Nru a/include/linux/sysctl.h b/include/linux/sysctl.h
--- a/include/linux/sysctl.h	2004-07-06 11:45:18 -07:00
+++ b/include/linux/sysctl.h	2004-07-06 11:45:18 -07:00
@@ -337,7 +337,7 @@
  	NET_TCP_BIC=102,
  	NET_TCP_BIC_FAST_CONVERGENCE=103,
 	NET_TCP_BIC_LOW_WINDOW=104,
-	NET_TCP_DEFAULT_WIN_SCALE=105,
+/*	NET_TCP_DEFAULT_WIN_SCALE */
 	NET_TCP_MODERATE_RCVBUF=106,
 };
 
diff -Nru a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h	2004-07-06 11:45:18 -07:00
+++ b/include/net/tcp.h	2004-07-06 11:45:18 -07:00
@@ -611,7 +611,6 @@
 extern int sysctl_tcp_bic;
 extern int sysctl_tcp_bic_fast_convergence;
 extern int sysctl_tcp_bic_low_window;
-extern int sysctl_tcp_default_win_scale;
 extern int sysctl_tcp_moderate_rcvbuf;
 
 extern atomic_t tcp_memory_allocated;
@@ -1690,6 +1689,13 @@
 		*ptr++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_WINDOW << 16) | (TCPOLEN_WINDOW << 8) | (wscale));
 }
 
+/* Default window scaling based on the size of the maximum window  */
+static inline __u8 tcp_default_win_scale(void)
+{
+	int b = ffs(sysctl_tcp_rmem[2]);
+	return (b < 17) ? 0 : b-16;
+}
+
 /* Determine a window scaling and initial window to offer.
  * Based on the assumption that the given amount of space
  * will be offered. Store the results in the tp structure.
@@ -1732,8 +1738,7 @@
 		    space - max((space>>sysctl_tcp_app_win), mss>>*rcv_wscale) < 65536/2)
 			(*rcv_wscale)--;
 
-		*rcv_wscale = max((__u8)sysctl_tcp_default_win_scale,
-				  *rcv_wscale);
+		*rcv_wscale = max(tcp_default_win_scale(), *rcv_wscale);
 	}
 
 	/* Set initial window to value enough for senders,
diff -Nru a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
--- a/net/ipv4/sysctl_net_ipv4.c	2004-07-06 11:45:18 -07:00
+++ b/net/ipv4/sysctl_net_ipv4.c	2004-07-06 11:45:18 -07:00
@@ -667,14 +667,6 @@
 		.proc_handler	= &proc_dointvec,
 	},
 	{
-		.ctl_name	= NET_TCP_DEFAULT_WIN_SCALE,
-		.procname	= "tcp_default_win_scale",
-		.data		= &sysctl_tcp_default_win_scale,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
-	},
-	{
 		.ctl_name	= NET_TCP_MODERATE_RCVBUF,
 		.procname	= "tcp_moderate_rcvbuf",
 		.data		= &sysctl_tcp_moderate_rcvbuf,
diff -Nru a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c	2004-07-06 11:45:18 -07:00
+++ b/net/ipv4/tcp.c	2004-07-06 11:45:18 -07:00
@@ -276,8 +276,6 @@
 
 atomic_t tcp_orphan_count = ATOMIC_INIT(0);
 
-int sysctl_tcp_default_win_scale = 7;
-
 int sysctl_tcp_mem[3];
 int sysctl_tcp_wmem[3] = { 4 * 1024, 16 * 1024, 128 * 1024 };
 int sysctl_tcp_rmem[3] = { 4 * 1024, 87380, 87380 * 2 };
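
For readers checking the arithmetic of tcp_default_win_scale(), the sketch
below is a userspace approximation, not the kernel code. All function names
in it are hypothetical. It evaluates the patch's expression twice: once with
ffs(), which returns the 1-based position of the lowest set bit, and once
with a highest-set-bit helper. The message does not say which bit position
was intended, so the second variant is shown only as an assumption for
comparison.

```c
#include <assert.h>
#include <strings.h>    /* ffs() (POSIX) */

/* 1-based index of the highest set bit, 0 when x == 0. */
static int highest_set_bit(unsigned int x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* The expression from the patch, parameterised on the bit position b. */
static unsigned char scale_from_bit(int b)
{
    return (b < 17) ? 0 : (unsigned char)(b - 16);
}

/* Variant as posted: lowest set bit of the maximum receive buffer. */
static unsigned char scale_lowest_bit(int rmem_max)
{
    assert(rmem_max > 0);
    return scale_from_bit(ffs(rmem_max));
}

/* Assumed alternative: highest set bit of the maximum receive buffer. */
static unsigned char scale_highest_bit(int rmem_max)
{
    assert(rmem_max > 0);
    return scale_from_bit(highest_set_bit((unsigned int)rmem_max));
}
```

With the default sysctl_tcp_rmem[2] of 87380 * 2 = 174760 (0x2AAA8), the
lowest set bit is at position 4, so the posted form yields a scale of 0,
while the highest set bit is at position 18, which would yield 2.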

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
@ 2004-07-06 19:40                 ` Jamie Lokier
  2004-07-06 20:05                   ` Stephen Hemminger
  2004-07-06 20:12                   ` David S. Miller
  2004-07-06 20:00                 ` Nivedita Singhvi
                                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 37+ messages in thread
From: Jamie Lokier @ 2004-07-06 19:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-net, linux-kernel

Stephen Hemminger wrote:
> Recent TCP changes exposed the problem that there are lots of really
> broken firewalls that strip or alter TCP options.  When the options
> are modified TCP gets busted now.  The problem is that when we
> propose window scaling, we expect that the other side receives the
> same initial SYN request that we sent.  If there are corrupting
> firewalls that strip it then the window we send is not correctly
> scaled; so the other side thinks there is not enough space to send.

If a firewall strips the window scaling option in both directions,
then window scaling is disabled (RFC 1323 section 2.2).

Are you saying there are broken firewalls which strip TCP options in
one direction only?

-- Jamie

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
  2004-07-06 19:40                 ` Jamie Lokier
@ 2004-07-06 20:00                 ` Nivedita Singhvi
  2004-07-06 20:16                   ` David S. Miller
       [not found]                 ` <20040706185856.GN18841@lug-owl.de>
                                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 37+ messages in thread
From: Nivedita Singhvi @ 2004-07-06 20:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bert hubert, Arnaldo Carvalho de Melo, netdev,
	alessandro.suardi, phyprabab, linux-net, linux-kernel

Stephen Hemminger wrote:
> Recent TCP changes exposed the problem that there are lots of really broken firewalls 
> that strip or alter TCP options.

We should not be accepting of this situation, surely. I mean, the firewalls
have to get fixed. Multiple things are breaking here, due to this. What
are the other options they are messing with, and any idea why?

> When the options are modified TCP gets busted now.  The problem is that when
> we propose window scaling, we expect that the other side receives the same initial
> SYN request that we sent.  If there are corrupting firewalls that strip it then
> the window we send is not correctly scaled; so the other side thinks there is not
> enough space to send.

If the firewall is actually stripping the TCP window scaling option,
then that tells the other end that we can't *receive* scaled windows
either, since the option indicates both, we are sending and capable
of receiving. i.e. The other end will not send us scaled windows.
There is no way we can fix this on the rcv end.

> I propose the following, which will avoid sending window scaling that
> is big enough to break in these cases unless the tcp_rmem has been increased.
> It will keep default configuration from blowing up in a corrupt world.

Does this need to be the default behaviour? Just how prevalent is
this??

thanks,
Nivedita


* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 19:40                 ` Jamie Lokier
@ 2004-07-06 20:05                   ` Stephen Hemminger
  2004-07-06 20:28                     ` David S. Miller
  2004-07-06 20:12                   ` David S. Miller
  1 sibling, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2004-07-06 20:05 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004 20:40:34 +0100
Jamie Lokier <jamie@shareable.org> wrote:

> Stephen Hemminger wrote:
> > Recent TCP changes exposed the problem that there are lots of really
> > broken firewalls that strip or alter TCP options.  When the options
> > are modified TCP gets busted now.  The problem is that when we
> > propose window scaling, we expect that the other side receives the
> > same initial SYN request that we sent.  If there are corrupting
> > firewalls that strip it then the window we send is not correctly
> > scaled; so the other side thinks there is not enough space to send.
> 
> If a firewall strips the window scaling option in both directions,
> then window scaling is disabled (RFC 1323 section 2.2).
> 
> Are you saying there are broken firewalls which strip TCP options in
> one direction only?

It appears so.

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 19:40                 ` Jamie Lokier
  2004-07-06 20:05                   ` Stephen Hemminger
@ 2004-07-06 20:12                   ` David S. Miller
  2004-07-06 22:44                     ` bert hubert
  1 sibling, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:12 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: shemminger, netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004 20:40:34 +0100
Jamie Lokier <jamie@shareable.org> wrote:

> If a firewall strips the window scaling option in both directions,
> then window scaling is disabled (RFC 1323 section 2.2).
> 
> Are you saying there are broken firewalls which strip TCP options in
> one direction only?

It is this specific case:

1) SYN packet contains window scale option of ZERO.

   This says two things, that the system will use a window
   scale of ZERO and that it SUPPORTS send and receive window
   scaling.

   If the firewall were to delete this, we'd be OK, but it
   does not.  It leaves the option with zero in there.

2) SYN+ACK goes back out with non-zero window scale option.

   Note that because of #1, it is impossible for the system
   which sent the SYN packet to "refuse" the window scale
   option sent in the SYN+ACK.

   Here is where we have problems.  If the firewall patches
   the scale to zero, which is what some of these things
   are doing, it is then the firewall's responsibility to
   scale the window to make it appear to be zero-scaled.

   And this is not being done by these broken firewalls.

BTW, this is why it is so important to get tcpdump traces
at both ends of the connection to analyze problems like
this.  If you look at only one side with dumps, you might
not get the side that is getting packets edited by a
firewall or other device.

These machines are so broken that I absolutely refuse to change
how we behave to work around them.

If they want window scaling to be effectively disabled, they should
patch out the window scale option in the "SYN" packet, this prevents
the SYN+ACK sending system from advertising any window scaling support.

What these broken devices are doing is effectively making window
scaling unusable on the internet, and I refuse to swallow such
crap.
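
The failure described above can be put into numbers with a small sketch. It
is not taken from any firewall or kernel source; both function names are
hypothetical, and the values 45 and 7 are borrowed from the trace earlier in
the thread purely as illustration. The first helper shows what a peer
believes when it applies the shift it saw in the (possibly patched) option.
The second shows what a device that rewrites the scale would have to write
into the 16-bit window field to keep the byte count intact.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the peer thinks it may send, given the shift it saw on the wire. */
static uint32_t bytes_peer_assumes(uint16_t raw_window, uint8_t shift_seen)
{
    assert(shift_seen <= 14);
    return (uint32_t)raw_window << shift_seen;
}

/*
 * Window field a scale-rewriting device would need to emit so that the
 * peer, using patched_shift, still sees the intended number of bytes.
 * The result is clamped because the field is only 16 bits wide.
 */
static uint16_t rescale_window_field(uint16_t raw_window,
                                     uint8_t real_shift,
                                     uint8_t patched_shift)
{
    uint32_t bytes, field;

    assert(real_shift <= 14 && patched_shift <= 14);
    bytes = (uint32_t)raw_window << real_shift;
    field = bytes >> patched_shift;
    return field > 0xFFFF ? 0xFFFF : (uint16_t)field;
}
```

A sender that advertises a field of 45 with a shift of 7 means 5760 bytes.
If the option is patched to 0 and the field is left alone, the peer assumes
only 45 bytes; a device doing the rewrite properly would send a field of
5760 instead.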

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:00                 ` Nivedita Singhvi
@ 2004-07-06 20:16                   ` David S. Miller
  2004-07-06 20:26                     ` David Ford
  0 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:16 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: shemminger, ahu, acme, netdev, alessandro.suardi, phyprabab,
	linux-net, linux-kernel

On Tue, 06 Jul 2004 13:00:07 -0700
Nivedita Singhvi <niv@us.ibm.com> wrote:

> Stephen Hemminger wrote:
> > Recent TCP changes exposed the problem that there are lots of really broken firewalls 
> > that strip or alter TCP options.
> 
> We should not be accepting of this situation, surely. I mean, the firewalls
> have to get fixed. Multiple things are breaking here, due to this. What
> are the other options they are messing with, and any idea why?

I totally agree with Nivedita, and that's why I'm not going to
apply Stephen's patch.

> If the firewall is actually stripping the TCP window scaling option,
> then that tells the other end that we can't *receive* scaled windows
> either, since the option indicates both, we are sending and capable
> of receiving. i.e. The other end will not send us scaled windows.
> There is no way we can fix this on the rcv end.
> 

That's correct.  If the SYN contains a window scale option, this tells
the SYN+ACK sending side that both receive and send side window scaling
is supported.  I think what's really happening is that the firewall is
patching the non-zero window scale option in the SYN+ACK  packet to be
zero, yet not adjusting the window field of packets in the rest of the
TCP stream.

> Does this need to be the default behaviour? Just how prevalent is
> this??

Frankly, I've personally seen none of this.  I sit on a DSL line with
no firewalling at my end and I can access all sites just fine.  This
seems to indicate that most of the breakage is local to the user's
point of access to the net, rather than a firewall at google.com
or kernel.org or similar.

* Re: [PATCH] fix tcp_default_win_scale.
       [not found]                 ` <20040706185856.GN18841@lug-owl.de>
@ 2004-07-06 20:17                   ` David S. Miller
  2004-07-06 20:31                     ` Stephen Hemminger
  0 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:17 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: linux-net, linux-kernel, netdev

On Tue, 6 Jul 2004 20:58:56 +0200
Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:

> On Tue, 2004-07-06 11:47:41 -0700, Stephen Hemminger <shemminger@osdl.org>
> wrote in message <20040706114741.1bf98bbe@dell_ss3.pdx.osdl.net>:
> 
> > I propose the following, which will avoid sending window scaling that
> > is big enough to break in these cases unless the tcp_rmem has been increased.
> > It will keep default configuration from blowing up in a corrupt world.
> 
> I'm not sure if this is the right way to react. I'd think it's okay to
> give the user the possibility to scale the window so that it works with
> his b0rk3d firewall, but default behavior should be to do whatever the
> protocol dictates/allows.

I totally agree, and that's why the sysctl is there for people to
tweak as they desire.

Jan, any particular reason you removed so much stuff (in particular
netdev@oss.sgi.com) from the CC: list in your posting here?

* Re: analysis of TCP window size issues still around - several reports / SACK involved?
  2004-07-06  9:35             ` analysis of TCP window size issues still around - several reports / SACK involved? bert hubert
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
@ 2004-07-06 20:19               ` David S. Miller
  2004-07-06 20:27                 ` bert hubert
  1 sibling, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:19 UTC (permalink / raw)
  To: bert hubert; +Cc: acme, shemminger, netdev, alessandro.suardi, phyprabab

On Tue, 6 Jul 2004 11:35:03 +0200
bert hubert <ahu@ds9a.nl> wrote:

> 22:42:40.890025 192.168.1.6.32843 > 204.152.189.116.http: S 1994994484:1994994484(0) win 5840 <mss 1460,sackOK,timestamp 4294940315 0,nop,wscale 7> (DF)
> 22:42:41.143063 204.152.189.116.http > 192.168.1.6.32843: S 1404108869:1404108869(0) ack 1994994485 win 5792 <mss 1452,sackOK,timestamp 3383469176 4294940315,nop,wscale 0> (DF)
> 22:42:41.143123 192.168.1.6.32843 > 204.152.189.116.http: . ack 1 win 45 <nop,nop,timestamp 4294940568 3383469176> (DF)
> 
> Alessandro's machine does perform window scaling, tcpdump however does not
> understand that and neglects to multiply 45 by 2^7 (=5760). Kernel.org does do
> wscale, but defaults to 2^0.

tcpdump's behavior is correct, it's just reporting the raw window
field in the TCP header, unscaled, and that is fine.  In fact I'd
rather it do this, so that diagnosing dumps is easier.  If tcpdump
tries to be too clever, scaling the window, then I might end up
chasing down a tcpdump bug rather than a TCP one :-)

What would be more interesting is to get the tcpdump trace from the
other side of this connection.  This is crucial, as it will show how
and in what way exactly the window scale options and/or window fields
are being edited by a firewall or other device and thus causing
the problems.

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
                                   ` (2 preceding siblings ...)
       [not found]                 ` <20040706185856.GN18841@lug-owl.de>
@ 2004-07-06 20:24                 ` David S. Miller
  2004-07-06 23:16                   ` Andi Kleen
  2004-07-06 23:19                 ` Redeeman
  2004-07-07 19:47                 ` John Heffner
  5 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: ahu, acme, netdev, alessandro.suardi, phyprabab, linux-net,
	linux-kernel

On Tue, 6 Jul 2004 11:47:41 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:

> The problem is that when
> we propose window scaling, we expect that the other side receives the same initial
> SYN request that we sent.  If there are corrupting firewalls that strip it then
> the window we send is not correctly scaled; so the other side thinks there is not
> enough space to send.

Inaccurate analysis Stephen.

If the window option is edited out from the SYN by the firewall,
it is impossible for the receiving system to respond with any
window scaling option in the SYN+ACK packet.

If a window scale option is not present in the SYN, it means that
it does not support window scaling at all.
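
The negotiation rule stated here follows RFC 1323: scaling is used only if
both SYN segments carry the option, and each side then applies the shift the
other side announced. A minimal sketch of that rule, with hypothetical type
and function names that do not come from the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WSCALE 14   /* RFC 1323 upper bound on the shift count */

struct syn_opts {
    bool    has_wscale; /* window scale option present in the SYN */
    uint8_t wscale;     /* shift count carried by the option */
};

struct wscale_state {
    bool    enabled;
    uint8_t snd_shift;  /* applied to window fields received from the peer */
    uint8_t rcv_shift;  /* applied by the peer to window fields we send */
};

static struct wscale_state negotiate_wscale(struct syn_opts ours,
                                            struct syn_opts theirs)
{
    struct wscale_state st = { false, 0, 0 };

    assert(ours.wscale <= MAX_WSCALE);
    if (ours.has_wscale && theirs.has_wscale) {
        st.enabled = true;
        /* An out-of-range shift from the peer is treated as 14. */
        st.snd_shift = theirs.wscale > MAX_WSCALE ? MAX_WSCALE
                                                  : theirs.wscale;
        st.rcv_shift = ours.wscale;
    }
    return st;
}
```

If the option is missing from either SYN, both shifts stay at 0, which is
why a firewall that deletes the option outright would leave the connection
working, just without scaling.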

What must be really happening, therefore, is that the firewall is
patching the scale factor in the option, not deleting it outright.
And then it isn't properly rescaling the window field in the TCP
headers for the rest of the connection's lifetime.  That would explain
all of this.

We can confirm this by getting a trace at both ends of a sick connection,
and seeing if a non-zero window scale option gets patched to some other
value by the time it reaches the receiving system.

Then we will be aware of two bugs:

1) Cisco IOS, when NAT'ing, can mis-adjust SACK block options such
   that the sequence numbers are corrupt.

2) Some firewalls patch non-zero window scale options to be zero ones
   yet do not properly adjust the window field in TCP headers for the
   rest of the connection.

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:16                   ` David S. Miller
@ 2004-07-06 20:26                     ` David Ford
  0 siblings, 0 replies; 37+ messages in thread
From: David Ford @ 2004-07-06 20:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Nivedita Singhvi, shemminger, ahu, acme, netdev,
	alessandro.suardi, phyprabab, linux-net, linux-kernel

It's been a while since I used a 1460 MTU for PPTP over DSL, but unless 
OSDN got a clue recently, their firewalls drop the ICMP for PMTU 
discovery.  Does anyone have a tool that exercises a bunch of TCP/IP 
options to detect such broken firewalls?

David

David S. Miller wrote:

>[...]
>Frankly, I've personally seen none of this.  I sit on a DSL line with
>no firewalling at my end and I can access all sites just fine.  This
>seems to indicate that most of the breakage is local to the user's
>point of access to the net, rather than a firewall at google.com
>or kernel.org or similar.
>


* Re: analysis of TCP window size issues still around - several reports / SACK involved?
  2004-07-06 20:19               ` analysis of TCP window size issues still around - several reports / SACK involved? David S. Miller
@ 2004-07-06 20:27                 ` bert hubert
  2004-07-06 20:31                   ` David S. Miller
  2004-07-07 21:25                   ` Alessandro Suardi
  0 siblings, 2 replies; 37+ messages in thread
From: bert hubert @ 2004-07-06 20:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: acme, shemminger, netdev, alessandro.suardi, phyprabab

On Tue, Jul 06, 2004 at 01:19:55PM -0700, David S. Miller wrote:

> rather it do this, so that diagnosing dumps is easier.  If tcpdump
> tries to be too clever, scaling the window, then I might end up
> chasing down a tcpdump bug rather than a TCP one :-)

True - it might want to print '45 (*128=5760)' or something like that.

> What would be more interesting is to get the tcpdump trace from the
> other side of this connection.  This is crucial, as it will show how
> and in what way exactly the window scale options and/or window fields
> are being edited by a firewall or other device and thus causing
> the problems.

I have an appointment with Alessandro tomorrow evening at 11PM CEST to do
just that.

It sounds like window scaling may become yet another ECN...

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:05                   ` Stephen Hemminger
@ 2004-07-06 20:28                     ` David S. Miller
  2004-07-06 20:36                       ` Stephen Hemminger
  0 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: jamie, netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004 13:05:49 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:

> On Tue, 6 Jul 2004 20:40:34 +0100
> Jamie Lokier <jamie@shareable.org> wrote:
> 
> > Are you saying there are broken firewalls which strip TCP options in
> > one direction only?
> 
> It appears so.

Ok, this is a possibility.  And why it breaks is that if the ACK
for the SYN+ACK comes back, the SYN+ACK sender can only assume
that the window scale was accepted.

Stephen, do you have a trace showing exactly this?

* Re: analysis of TCP window size issues still around - several reports / SACK involved?
  2004-07-06 20:27                 ` bert hubert
@ 2004-07-06 20:31                   ` David S. Miller
  2004-07-07 21:25                   ` Alessandro Suardi
  1 sibling, 0 replies; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:31 UTC (permalink / raw)
  To: bert hubert; +Cc: acme, shemminger, netdev, alessandro.suardi, phyprabab

On Tue, 6 Jul 2004 22:27:08 +0200
bert hubert <ahu@ds9a.nl> wrote:

> On Tue, Jul 06, 2004 at 01:19:55PM -0700, David S. Miller wrote:
> 
> > What would be more interesting is to get the tcpdump trace from the
> > other side of this connection.  This is crucial, as it will show how
> > and in what way exactly the window scale options and/or window fields
> > are being edited by a firewall or other device and thus causing
> > the problems.
> 
> I have an appointment with Alessandro tomorrow evening at 11PM CEST to do
> just that.

That's great, thanks a lot.

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:17                   ` David S. Miller
@ 2004-07-06 20:31                     ` Stephen Hemminger
  2004-07-06 20:33                       ` David S. Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2004-07-06 20:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jan-Benedict Glaw, linux-net, linux-kernel, netdev

On Tue, 6 Jul 2004 13:17:31 -0700
"David S. Miller" <davem@redhat.com> wrote:

> On Tue, 6 Jul 2004 20:58:56 +0200
> Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
> 
> > On Tue, 2004-07-06 11:47:41 -0700, Stephen Hemminger <shemminger@osdl.org>
> > wrote in message <20040706114741.1bf98bbe@dell_ss3.pdx.osdl.net>:
> > 
> > > I propose the following, which will avoid sending window scaling that
> > > is big enough to break in these cases unless the tcp_rmem has been increased.
> > > It will keep default configuration from blowing in a corrupt world.
> > 
> > I'm not sure if this is the right way to react. I'd think it's okay to
> > give the user the possibility to scale the window so that it works with
> > his b0rk3d firewall, but default behavior should be to do whatever the
> > protocol dictates/allows.
> 
> I totally agree, and that's why the sysctl is there for people to
> tweak as they desire.
> 
> Jan, any particular reason you removed so much stuff (in particular
> netdev@oss.sgi.com) from the CC: list in your posting here?

The point is we are sending a bigger window scale than we need to.
The maximum receive window is limited by tcp_rmem[2], so we only need to
allow that much.  Having a different sysctl just for that is unnecessary and
potentially confusing.

The default tcp_rmem[2] is 174760, so we only need a wscale of 2 to represent
that. We were sending 7.
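
As a rough user-space check of the arithmetic above (a sketch only, not the kernel code; `min_wscale_for` is an illustrative name, not an existing function): the 16-bit window field holds at most 65535, so the smallest sufficient shift is the first one for which 65535 << shift covers the buffer limit.

```c
#include <assert.h>

/* Illustrative helper, not a kernel function: smallest window scale
 * (shift count) whose largest advertisable window, 65535 << shift,
 * is at least max_window bytes.  RFC 1323 caps the shift at 14. */
unsigned int min_wscale_for(unsigned long max_window)
{
	unsigned int shift = 0;

	while (shift < 14 && (65535UL << shift) < max_window)
		shift++;
	assert(shift <= 14);
	return shift;
}
```

For the default tcp_rmem[2] of 174760 quoted above, a shift of 1 tops out at 131070 bytes and a shift of 2 at 262140 bytes, so 2 is the first value that fits.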

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:31                     ` Stephen Hemminger
@ 2004-07-06 20:33                       ` David S. Miller
  0 siblings, 0 replies; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: jbglaw, linux-net, linux-kernel, netdev

On Tue, 6 Jul 2004 13:31:46 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:

> The default tcp_rmem[2] is 174760, so we only need a wscale of 2 to represent
> that. We were sending 7.

It's only going to paper over this problem, because a window scale
of 2 still gets edited by the firewalls yet doesn't cause the
kind of damage 7 does.

Also, using a value of 7 is very safe, because it handles even the
tiniest of MTUs in use today: a window scale of 7 allows a granularity
of 128 octets, so 512-byte SLIP connections, for example, can still
advertise sub-MTU sized chunks in the window.
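
To make the granularity point concrete, here is a small sketch (helper names are illustrative, not kernel API). With a shift of s, the window can only be expressed in multiples of 2^s bytes, because the low s bits are shifted away before the value goes into the 16-bit field.

```c
#include <assert.h>

/* Illustrative helpers, not kernel API. */

/* Step size, in bytes, of a window advertised with the given shift. */
unsigned int wscale_granularity(unsigned int wscale)
{
	assert(wscale <= 14);	/* RFC 1323 limit */
	return 1U << wscale;
}

/* Bytes actually conveyed once the low bits are shifted away. */
unsigned int advertisable_bytes(unsigned int bytes, unsigned int wscale)
{
	assert(wscale <= 14);
	return (bytes >> wscale) << wscale;
}
```

With a shift of 7, a 300-byte window is conveyed as 256 bytes, still a usable fraction of a 512-byte MTU. With the maximum shift of 14 the same window would round down to zero.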

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:36                       ` Stephen Hemminger
@ 2004-07-06 20:35                         ` David S. Miller
  2004-07-06 21:55                           ` John Heffner
  2004-07-06 23:01                           ` PLS help fix: recent 2.6.7 won't connect to anything " bert hubert
  0 siblings, 2 replies; 37+ messages in thread
From: David S. Miller @ 2004-07-06 20:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: jamie, netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004 13:36:41 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:

> > Ok, this is a possibility.  And why it breaks is that if the ACK
> > for the SYN+ACK comes back, the SYN+ACK sender can only assume
> > that the window scale was accepted.
> > 
> > Stephen, do you have a trace showing exactly this?
> 
> No, I don't have a br0ken firewall here.  I can get out fine.
> When I setup with same kernel as packages.gentoo.org, it works fine as well.

Therefore we do not know which of the following two it really is:

1) window scale option being stripped from SYN+ACK

2) non-zero window scale option being patched into a zero window
   scale option

The trace Bert Hubert will get with Alessandro will give us the
information we need.

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:28                     ` David S. Miller
@ 2004-07-06 20:36                       ` Stephen Hemminger
  2004-07-06 20:35                         ` David S. Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2004-07-06 20:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: jamie, netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004 13:28:22 -0700
"David S. Miller" <davem@redhat.com> wrote:

> On Tue, 6 Jul 2004 13:05:49 -0700
> Stephen Hemminger <shemminger@osdl.org> wrote:
> 
> > On Tue, 6 Jul 2004 20:40:34 +0100
> > Jamie Lokier <jamie@shareable.org> wrote:
> > 
> > > Are you saying there are broken firewalls which strip TCP options in
> > > one direction only?
> > 
> > It appears so.
> 
> Ok, this is a possibility.  And why it breaks is that if the ACK
> for the SYN+ACK comes back, the SYN+ACK sender can only assume
> that the window scale was accepted.
> 
> Stephen, do you have a trace showing exactly this?

No, I don't have a br0ken firewall here.  I can get out fine.
When I setup with same kernel as packages.gentoo.org, it works fine as well.

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:35                         ` David S. Miller
@ 2004-07-06 21:55                           ` John Heffner
  2004-07-06 22:50                             ` David S. Miller
  2004-07-06 23:01                           ` PLS help fix: recent 2.6.7 won't connect to anything " bert hubert
  1 sibling, 1 reply; 37+ messages in thread
From: John Heffner @ 2004-07-06 21:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-net, linux-kernel

Another bit to add to the firewall / window scale mess:  I remember from
a while ago that the Cisco PIX firewalls would not allow a window scale of
greater than 8.  I don't know if they've fixed this or not.  It seems
like some sort of arbitrary limit.

This is obviously not the problem people are seeing now, but could be a
problem in the future.

  -John


* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:12                   ` David S. Miller
@ 2004-07-06 22:44                     ` bert hubert
  2004-07-06 22:49                       ` David S. Miller
  0 siblings, 1 reply; 37+ messages in thread
From: bert hubert @ 2004-07-06 22:44 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jamie Lokier, shemminger, netdev, linux-net, linux-kernel

On Tue, Jul 06, 2004 at 01:12:35PM -0700, David S. Miller wrote:

> It is this specific case:
> 
> 1) SYN packet contains window scale option of ZERO.

Not true - the outgoing SYN packet had window scale 7, when it was sent. The
SYN|ACK had window scale 0, when received by the initiating system.

Also - even if the remote were to assume a 47 byte window size, would it not
be able to send small packets? Or does the window size also include
packet headers?

Regards,

bert


-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 22:44                     ` bert hubert
@ 2004-07-06 22:49                       ` David S. Miller
  2004-07-07 18:06                         ` Stephen Hemminger
  0 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 22:49 UTC (permalink / raw)
  To: bert hubert; +Cc: jamie, shemminger, netdev, linux-net, linux-kernel

On Wed, 7 Jul 2004 00:44:53 +0200
bert hubert <ahu@ds9a.nl> wrote:

> Not true - the outgoing SYN packet had window scale 7, when it was sent. The
> SYN|ACK had window scale 0, when received by the initiating system.
> 
> Also - even if the remote were to assume a 47 byte window size, would it not
> be able to send small packets? Or does the window size also include
> packet headers?

SWS avoidance makes us not send packets.  See this quote in an email
from John Heffner the other week:

================================
To elaborate on my earlier mail, my hypothesis is that somehow the web
server believes that we sent a winscale of 0.  In such a case, when we try
to advertise our initial 4*MSS (5840 bytes) of window, with a window scale
of 3 we use a value of 730 in the window field.  All sender SWS avoidance
(RFC1122) tests will fail, most notably 1 (because we already advertised
5840 bytes and 730 < 5840/2) and 3 (because 730 < 1460).  With a winscale
of 2, we will use a value of 1460 in the window field, so both tests will
succeed.
================================
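
The numbers in that quote can be reproduced with a short sketch. It is a simplification under stated assumptions: only the full-segment and half-of-maximum-window checks from the RFC 1122 sender SWS algorithm are modelled (either one permits sending), the PUSH and timeout cases are ignored, and all function names are illustrative rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Value placed in the 16-bit window field for a window of 'bytes'. */
unsigned int window_field(unsigned int bytes, unsigned int wscale)
{
	assert(wscale <= 14);
	return bytes >> wscale;
}

/* Window the peer computes if it applies 'assumed_wscale' to the field. */
unsigned int peer_view(unsigned int field, unsigned int assumed_wscale)
{
	assert(assumed_wscale <= 14);
	return field << assumed_wscale;
}

/* Simplified sender-side SWS check: send if a full-sized segment fits,
 * or if at least half of the largest window seen so far is usable. */
bool peer_would_send(unsigned int usable, unsigned int max_seen,
		     unsigned int mss)
{
	return usable >= mss || usable >= max_seen / 2;
}
```

With a 5840-byte window and a shift of 3 the field is 730; a peer that wrongly assumes a shift of 0 sees 730 bytes, which is below both 1460 and 5840/2, so under this model it never sends. With a shift of 2 the field is 1460 and the full-segment check passes.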

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 21:55                           ` John Heffner
@ 2004-07-06 22:50                             ` David S. Miller
  2004-07-07  1:32                               ` John Heffner
  0 siblings, 1 reply; 37+ messages in thread
From: David S. Miller @ 2004-07-06 22:50 UTC (permalink / raw)
  To: John Heffner; +Cc: netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004 17:55:12 -0400 (EDT)
John Heffner <jheffner@psc.edu> wrote:

> Another bit to add to the firewall / window scale mess:  I remember from
> a while ago that the Cisco PIX firewalls would not allow a window scale of
> greater than 8.  I don't know if they've fixed this or not.  It seems
> like some sort of arbitrary limit.

In what manner did it deal with > 8 window scales?  By rewriting the option
or deleting the option entirely from the SYN or SYN+ACK packets?

* PLS help fix: recent 2.6.7 won't connect to anything Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:35                         ` David S. Miller
  2004-07-06 21:55                           ` John Heffner
@ 2004-07-06 23:01                           ` bert hubert
  1 sibling, 0 replies; 37+ messages in thread
From: bert hubert @ 2004-07-06 23:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: Stephen Hemminger, jamie, netdev, linux-net, linux-kernel

On Tue, Jul 06, 2004 at 01:35:59PM -0700, David S. Miller wrote:

> Therefore we do not know which of the following two it really is:

Anybody with this problem is kindly invited to try to connect to
213.244.168.210, port 10000, http://213.244.168.210:10000/ should work.

If you have a problem, email me with your IP address, I have a tcpdump
running.

> 1) window scale option being stripped from SYN+ACK

The remote is in fact zeus-pub.kernel.org. I assume it does not have a
broken firewall, and I sure haven't, and it sends out to me:

00:46:31.936667 192.168.1.4.34018 > 204.152.189.116.80: S 2786942165:2786942165(0) win 5840 
	<mss 1460,sackOK,timestamp 269093190,nop,wscale 7> (DF)

00:46:32.097745 204.152.189.116.80 > 192.168.1.4.34018: S 2888442437:2888442437(0) 
	ack 2786942166 win 5792 <mss 1460,sackOK,timestamp 3563902477 26909319,nop,wscale 0> (DF)
	                                                                                  ^
00:46:32.098170 192.168.1.4.34018 > 204.152.189.116.80: . ack 1 win 45 
	<nop,nop,timestamp 26909481 3563902477> (DF)

So I would rule out 1), as this is a network that does not have the problem. 

> 2) non-zero window scale option being patched into a zero window
>    scale option

This looks more likely, on the outgoing SYN. We'll know tomorrow evening
(CEST) or earlier if somebody with the problem volunteers.
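
When comparing captures taken at both ends of a connection, it can help to pull the wscale value out of each SYN and SYN+ACK line mechanically. A minimal sketch, assuming the tcpdump text format shown in the trace above (`parse_wscale` is an illustrative name, not part of tcpdump or libpcap):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper: return the shift count from a "wscale N" option
 * in one line of tcpdump text output, or -1 if the option is absent. */
int parse_wscale(const char *line)
{
	const char *opt;
	int shift;

	assert(line != NULL);
	opt = strstr(line, "wscale ");
	if (opt == NULL || sscanf(opt, "wscale %d", &shift) != 1)
		return -1;
	return shift;
}
```

Running this over the same SYN as seen on the sending and on the receiving side would show directly whether a device in between rewrote or removed the option.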

Regards,

bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 20:24                 ` David S. Miller
@ 2004-07-06 23:16                   ` Andi Kleen
  2004-07-07  7:50                     ` Chris Wedgwood
  0 siblings, 1 reply; 37+ messages in thread
From: Andi Kleen @ 2004-07-06 23:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Stephen Hemminger, ahu, acme, netdev, alessandro.suardi,
	phyprabab, linux-net, linux-kernel


I would not change anything, just suggest that users who sit
behind such a broken device do

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

and yell loudly at their ISPs to get this fixed. Crippling the stack
by default just to work around such obvious bugs would be wrong.

In the past there were similar bugs with broken VJ header compression
algorithms that also corrupted window scaling. We just ignored
these and suggested to the users to turn it off. That worked fine.

[btw it's quite possible that this isn't a firewall, but also
some kind of header compression that is doing the wrong thing]

-Andi

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
                                   ` (3 preceding siblings ...)
  2004-07-06 20:24                 ` David S. Miller
@ 2004-07-06 23:19                 ` Redeeman
  2004-07-07 19:47                 ` John Heffner
  5 siblings, 0 replies; 37+ messages in thread
From: Redeeman @ 2004-07-06 23:19 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bert hubert, Arnaldo Carvalho de Melo, netdev,
	alessandro.suardi, phyprabab, linux-net, LKML Mailinglist

On Tue, 2004-07-06 at 11:47 -0700, Stephen Hemminger wrote:
> Recent TCP changes exposed the problem that there are lots of really broken firewalls
> that strip or alter TCP options.
> When the options are modified TCP gets busted now.  The problem is that when
> we propose window scaling, we expect that the other side receives the same initial
> SYN request that we sent.  If there are corrupting firewalls that strip it then
> the window we send is not correctly scaled; so the other side thinks there is not
> enough space to send.
> 
> I propose the following, which will avoid sending window scaling that
> is big enough to break in these cases unless the tcp_rmem has been increased.
> It will keep default configuration from blowing in a corrupt world.
So this should fix the issues? Can you also tell me why this suddenly happened? That would make me a real happy man.

> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 22:50                             ` David S. Miller
@ 2004-07-07  1:32                               ` John Heffner
  0 siblings, 0 replies; 37+ messages in thread
From: John Heffner @ 2004-07-07  1:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004, David S. Miller wrote:

> On Tue, 6 Jul 2004 17:55:12 -0400 (EDT)
> John Heffner <jheffner@psc.edu> wrote:
>
> > Another bit to add to the firewall / window scale mess:  I remember from
> > a while ago that the Cisco PIX firewalls would not allow a window scale of
> > greater than 8.  I don't know if they've fixed this or not.  It seems
> > like some sort of arbitrary limit.
>
> In what manner did it deal with > 8 window scales?  By rewriting the option
> or deleting the option entirely from the SYN or SYN+ACK packets?

I don't recall.  It was not as ugly as changing the option value.  It may
have just sent a RST.

  -John


* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 23:16                   ` Andi Kleen
@ 2004-07-07  7:50                     ` Chris Wedgwood
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Wedgwood @ 2004-07-07  7:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David S. Miller, Stephen Hemminger, ahu, acme, netdev,
	alessandro.suardi, phyprabab, linux-net, linux-kernel

On Wed, Jul 07, 2004 at 01:16:00AM +0200, Andi Kleen wrote:

> [btw it's quite possible that this isn't a firewall, but also
> some kind of header compression that is doing the wrong thing]

... or some kind of nasty intrusive bandwidth molesting device like a
PacketShaper...


   --cw

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 22:49                       ` David S. Miller
@ 2004-07-07 18:06                         ` Stephen Hemminger
  2004-07-07 19:31                           ` Jamie Lokier
                                             ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Stephen Hemminger @ 2004-07-07 18:06 UTC (permalink / raw)
  To: David S. Miller; +Cc: bert hubert, jamie, netdev, linux-net, linux-kernel

I do not dispute that the correct thing to do is to use window scaling
and find/fix the poor saps stuck behind busted networks.

But: isn't it better to have just one sysctl parameter set (tcp_rmem)
and set the window scale as needed rather than increasing the already
bewildering array of dials and knobs?  I can't see why it would be advantageous
to set a window scale of 7 if the largest possible window ever offered
is limited to a smaller value? 

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-07 18:06                         ` Stephen Hemminger
@ 2004-07-07 19:31                           ` Jamie Lokier
  2004-07-07 19:38                             ` bert hubert
  2004-07-07 19:41                           ` John Heffner
  2004-07-09 23:14                           ` David S. Miller
  2 siblings, 1 reply; 37+ messages in thread
From: Jamie Lokier @ 2004-07-07 19:31 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bert hubert, netdev, linux-net, linux-kernel

Stephen Hemminger wrote:
> But: isn't it better to have just one sysctl parameter set
> (tcp_rmem) and set the window scale as needed rather than increasing
> the already bewildering array of dials and knobs?  I can't see why
> it would be advantageous to set a window scale of 7 if the largest
> possible window ever offered is limited to a smaller value?

That's a fair question.

It seems to me the only effects of a larger scale than necessary
are (a) the buffer size can be increased after the connection is
established, and (b) coarser granularity which can only degrade
performance over low mss links.

So why do we set a larger window scale than necessary?
Is it to support (a)?

-- Jamie


* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-07 19:31                           ` Jamie Lokier
@ 2004-07-07 19:38                             ` bert hubert
  0 siblings, 0 replies; 37+ messages in thread
From: bert hubert @ 2004-07-07 19:38 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Stephen Hemminger, David S. Miller, netdev, linux-net,
	linux-kernel

On Wed, Jul 07, 2004 at 08:31:25PM +0100, Jamie Lokier wrote:

> So why do we set a larger window scale than necessary?
> Is it to support (a)?

It might be useful to shake out the bugs of the internet - so far I have
indications that at least one residential ADSL router is responsible; it
removes wscale when doing TCP port forwarding.

If and when we decide to do larger rmem than now, we might have a better
chance.

Regards,

bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-07 18:06                         ` Stephen Hemminger
  2004-07-07 19:31                           ` Jamie Lokier
@ 2004-07-07 19:41                           ` John Heffner
  2004-07-09 23:14                           ` David S. Miller
  2 siblings, 0 replies; 37+ messages in thread
From: John Heffner @ 2004-07-07 19:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bert hubert, jamie, netdev, linux-net,
	linux-kernel

On Wed, 7 Jul 2004, Stephen Hemminger wrote:

> I do not dispute that the correct thing to do is to use window scaling
> and find/fix the poor saps stuck behind busted networks.
>
> But: isn't it better to have just one sysctl parameter set (tcp_rmem)
> and set the window scale as needed rather than increasing the already
> bewildering array of dials and knobs?  I can't see why it would be advantageous
> to set a window scale of 7 if the largest possible window ever offered
> is limited to a smaller value?


I personally agree with this.  One can imagine cases where it would be
useful to have a control on window scale, but the added complexity is
probably not worth it.

  -John





* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
                                   ` (4 preceding siblings ...)
  2004-07-06 23:19                 ` Redeeman
@ 2004-07-07 19:47                 ` John Heffner
  5 siblings, 0 replies; 37+ messages in thread
From: John Heffner @ 2004-07-07 19:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bert hubert, Arnaldo Carvalho de Melo, netdev,
	alessandro.suardi, phyprabab, netdev, linux-net, linux-kernel

On Tue, 6 Jul 2004, Stephen Hemminger wrote:

> +/* Default window scaling based on the size of the maximum window  */
> +static inline __u8 tcp_default_win_scale(void)
> +{
> +	int b = fls(sysctl_tcp_rmem[2]);
> +	return (b < 17) ? 0 : b-16;
> +}


I would actually change this to be:

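static inline __u8 tcp_select_win_scale(void)
{
	/* fls() gives the position of the highest set bit, i.e. the bit length */
	int b = fls(tcp_win_from_space(max(sysctl_tcp_rmem[2], sysctl_rmem_max)));
	b = (b < 17) ? 0 : b-16;
	/* Clamp to 14, the largest shift count RFC 1323 allows */
	return min(b, 14);
}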

Then you can also get rid of all the window scale calculation code in
tcp_select_initial_window().
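
A user-space sketch of the bit-position arithmetic used in these helpers, with stand-ins for the kernel's ffs() and fls() (the names `lowest_bit`, `highest_bit` and `scale_from_bit` are illustrative, and tcp_win_from_space() is left out). ffs() reports the lowest set bit and fls() the highest; only the highest-bit position gives the scale of 2 mentioned elsewhere in the thread for tcp_rmem[2] = 174760.

```c
#include <assert.h>

/* Stand-in for ffs(): position of the lowest set bit, counting from 1,
 * or 0 when x is 0. */
int lowest_bit(unsigned int x)
{
	int pos = 1;

	if (x == 0)
		return 0;
	while (!(x & 1)) {
		x >>= 1;
		pos++;
	}
	return pos;
}

/* Stand-in for fls(): position of the highest set bit, counting from 1,
 * or 0 when x is 0. */
int highest_bit(unsigned int x)
{
	int pos = 0;

	while (x != 0) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Window scale from a bit position, clamped to the RFC 1323 maximum. */
int scale_from_bit(int b)
{
	int scale = (b < 17) ? 0 : b - 16;

	assert(scale >= 0);
	return scale > 14 ? 14 : scale;
}
```

For 174760 (0x2AAA8) the lowest set bit is at position 4, which maps to a scale of 0, while the highest set bit is at position 18, which maps to a scale of 2.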

  -John


* Re: analysis of TCP window size issues still around - several reports / SACK involved?
  2004-07-06 20:27                 ` bert hubert
  2004-07-06 20:31                   ` David S. Miller
@ 2004-07-07 21:25                   ` Alessandro Suardi
  1 sibling, 0 replies; 37+ messages in thread
From: Alessandro Suardi @ 2004-07-07 21:25 UTC (permalink / raw)
  To: bert hubert; +Cc: David S. Miller, acme, shemminger, netdev, phyprabab

bert hubert wrote:
> On Tue, Jul 06, 2004 at 01:19:55PM -0700, David S. Miller wrote:
> 
> 
>>rather it do this, so that diagnosing dumps are easier.  If tcpdump
>>tries to be too clever, scaling the window, then I might end up
>>chasing down a tcpdump bug rather than a TCP one :-)
> 
> 
> True - it might want to print '43 (*128=5504)' or something like that.
> 
> 
>>What would be more interesting is to get the tcpdump trace from the
>>other side of this connection.  This is crucial, as it will show how
>>and in what way exactly the window scale options and/or window fields
>>are being edited by a firewall or other device and thus causing
>>the problems.
> 
> 
> I have an appointment with Alessandro tomorrow evening at 11PM CEST to do
> just that.

Sorry about being slightly late - I read the thread from my VPN
  link (which does work), now I'm turning it off and will email
  Bert with my actual IP address and my connection.

Thanks,

--alessandro

  "Practice is more important than theory. A _lot_ more important."
     (Linus Torvalds on lkml, 1 June 2004)

* Re: [PATCH] fix tcp_default_win_scale.
  2004-07-07 18:06                         ` Stephen Hemminger
  2004-07-07 19:31                           ` Jamie Lokier
  2004-07-07 19:41                           ` John Heffner
@ 2004-07-09 23:14                           ` David S. Miller
  2 siblings, 0 replies; 37+ messages in thread
From: David S. Miller @ 2004-07-09 23:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: ahu, jamie, netdev, linux-net, linux-kernel

On Wed, 7 Jul 2004 11:06:53 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:

> But: isn't it better to have just one sysctl parameter set (tcp_rmem)
> and set the window scale as needed rather than increasing the already
> bewildering array of dials and knobs?  I can't see why it would be advantageous
> to set a window scale of 7 if the largest possible window ever offered
> is limited to a smaller value? 

Stephen, here is what is going to happen if we apply your patch.

The default window scale will be 2, which is just below 3, the value
at which the problems start to appear.

So things will silently work, and most people will not notice the
problem.

I'd much rather bugs scream out saying "I'm a bug, fix me!" than
just silently linger around mostly unnoticed.

Thread overview: 37+ messages
     [not found] <32886.63.170.215.71.1088564087.squirrel@www.osdl.org>
     [not found] ` <20040629222751.392f0a82.davem@redhat.com>
     [not found]   ` <20040630152750.2d01ca51@dell_ss3.pdx.osdl.net>
     [not found]     ` <20040630153049.3ca25b76.davem@redhat.com>
2004-07-01 20:37       ` [PATCH] TCP acts like it is always out of memory Stephen Hemminger
2004-07-01 21:04         ` David S. Miller
2004-07-02  1:32           ` Arnaldo Carvalho de Melo
2004-07-06  9:35             ` analysis of TCP window size issues still around - several reports / SACK involved? bert hubert
2004-07-06 18:47               ` [PATCH] fix tcp_default_win_scale Stephen Hemminger
2004-07-06 19:40                 ` Jamie Lokier
2004-07-06 20:05                   ` Stephen Hemminger
2004-07-06 20:28                     ` David S. Miller
2004-07-06 20:36                       ` Stephen Hemminger
2004-07-06 20:35                         ` David S. Miller
2004-07-06 21:55                           ` John Heffner
2004-07-06 22:50                             ` David S. Miller
2004-07-07  1:32                               ` John Heffner
2004-07-06 23:01                           ` PLS help fix: recent 2.6.7 won't connect to anything " bert hubert
2004-07-06 20:12                   ` David S. Miller
2004-07-06 22:44                     ` bert hubert
2004-07-06 22:49                       ` David S. Miller
2004-07-07 18:06                         ` Stephen Hemminger
2004-07-07 19:31                           ` Jamie Lokier
2004-07-07 19:38                             ` bert hubert
2004-07-07 19:41                           ` John Heffner
2004-07-09 23:14                           ` David S. Miller
2004-07-06 20:00                 ` Nivedita Singhvi
2004-07-06 20:16                   ` David S. Miller
2004-07-06 20:26                     ` David Ford
     [not found]                 ` <20040706185856.GN18841@lug-owl.de>
2004-07-06 20:17                   ` David S. Miller
2004-07-06 20:31                     ` Stephen Hemminger
2004-07-06 20:33                       ` David S. Miller
2004-07-06 20:24                 ` David S. Miller
2004-07-06 23:16                   ` Andi Kleen
2004-07-07  7:50                     ` Chris Wedgwood
2004-07-06 23:19                 ` Redeeman
2004-07-07 19:47                 ` John Heffner
2004-07-06 20:19               ` analysis of TCP window size issues still around - several reports / SACK involved? David S. Miller
2004-07-06 20:27                 ` bert hubert
2004-07-06 20:31                   ` David S. Miller
2004-07-07 21:25                   ` Alessandro Suardi
