* TCP connection stops after high load.
@ 2007-04-12 21:11 Robert Iakobashvili
2007-04-12 21:15 ` David Miller
0 siblings, 1 reply; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-12 21:11 UTC (permalink / raw)
To: netdev; +Cc: Ben Greear
Hi Ben,
On 4/11/07, Ben Greear <greearb@candelatech.com> wrote:
> The problem is that I set up a TCP connection with bi-directional traffic
> of around 800Mbps, doing large (20k - 64k) writes and reads between two ports on
> the same machine (this 2.6.18.2 kernel is tainted with my full patch set,
> but I also reproduced it with only the non-tainted send-to-self patch applied
> last May on the 2.6.16 kernel, so I assume the bug is not particular to my patch
> set).
>
> At first, all is well, but within 5-10 minutes, the TCP connection will stall
> and I only see a massive amount of duplicate ACKs on the link.
>
Just today I faced some problems in a setup with a lighttpd server
(epoll demultiplexing and an increased max-fds number) running against
curl-loader, which generates HTTP client load, both on the same host.
curl-loader adds 1000-8000 secondary IPv4 addresses to the eth0
interface. Then it opens 20-200 virtual HTTP clients per second, up to
the steady-state number. Each client opens its own socket, binds to a
secondary IP address, and connects to the web server, issuing HTTP
GET/POST requests and handling the responses.
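For illustration, each virtual client does essentially the following
(a minimal sketch with hypothetical addresses; curl-loader's real code
differs):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a client socket to one secondary IP, then connect to the
 * server; the HTTP GET/POST traffic then flows over the returned fd. */
static int open_client(const char *src_ip, const char *dst_ip, int port)
{
	struct sockaddr_in src, dst;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&src, 0, sizeof(src));
	src.sin_family = AF_INET;
	src.sin_port = 0;                       /* any local port */
	inet_pton(AF_INET, src_ip, &src.sin_addr);
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	inet_pton(AF_INET, dst_ip, &dst.sin_addr);
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}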
It works well with 2.6.11.8 and the Debian 2.6.18.3-i686 image.
On the same Intel Pentium 4 PC with roughly the same kernel
configuration (make oldconfig using the Debian config-2.6.18.3-i686),
the setup fails, with TCP connections stalled after 1000 established
connections, when the kernel is 2.6.20.6 or 2.6.19.5.
It stalls even earlier: after 500 connections when lighttpd is used
with the default poll() demultiplexing, or after 100 connections when
the apache2 web server is used (memory?).
I am currently going to try vanilla 2.6.18.3 and, if it also fails, to
look through the Debian patches, trying to figure out what the delta
is.
strace-ing and the logs have actually revealed two failure scenarios.
Connections are established successfully and then:
- a request is sent and there is no response;
- a partial response is received and the connection stalls.
I will also try to collect some streams with tcpdump, filtering by a
client-side source IP.
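For example (the client address here is hypothetical):
tcpdump -ni eth0 -s 0 -w /tmp/stall.pcap host 192.168.1.107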
I already tried switching from BIC to Reno - not helpful - and
generating the load over loopback (lo) - same picture.
Don't feel you are alone; it may be the same problem that we
encounter.
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* Re: TCP connection stops after high load.
2007-04-12 21:11 TCP connection stops after high load Robert Iakobashvili
@ 2007-04-12 21:15 ` David Miller
2007-04-15 12:14 ` Robert Iakobashvili
2007-04-15 13:52 ` Robert Iakobashvili
0 siblings, 2 replies; 41+ messages in thread
From: David Miller @ 2007-04-12 21:15 UTC (permalink / raw)
To: coroberti; +Cc: netdev, greearb
From: "Robert Iakobashvili" <coroberti@gmail.com>
Date: Thu, 12 Apr 2007 23:11:14 +0200
> It works well with 2.6.11.8 and the Debian 2.6.18.3-i686 image.
>
> On the same Intel Pentium 4 PC with roughly the same kernel
> configuration (make oldconfig using the Debian config-2.6.18.3-i686),
> the setup fails, with TCP connections stalled after 1000 established
> connections, when the kernel is 2.6.20.6 or 2.6.19.5.
>
> It stalls even earlier: after 500 connections when lighttpd is used
> with the default poll() demultiplexing, or after 100 connections when
> the apache2 web server is used (memory?).
>
> I am currently going to try vanilla 2.6.18.3 and, if it also fails,
> to look through the Debian patches, trying to figure out what the
> delta is.
>
> strace-ing and the logs have actually revealed two failure scenarios.
> Connections are established successfully and then:
> - a request is sent and there is no response;
> - a partial response is received and the connection stalls.
The following patch is not the cause, but it likely exacerbates the
problem. Can you revert it from your kernel and see if it changes the
behavior?
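(For example, saving the diff below to a file and backing it out from
the top of your kernel source tree with something like
"patch -p1 -R < tcp_mem_init.diff"; the file name is just an
illustration.)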
commit 7b4f4b5ebceab67ce440a61081a69f0265e17c2a
Author: John Heffner <jheffner@psc.edu>
Date: Sat Mar 25 01:34:07 2006 -0800
[TCP]: Set default max buffers from memory pool size
This patch sets the maximum TCP buffer sizes (available to automatic
buffer tuning, not to setsockopt) based on the TCP memory pool size.
The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
than 1/128 of the memory pressure threshold.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4b0272c..591e96d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -276,8 +276,8 @@ atomic_t tcp_orphan_count = ATOMIC_INIT(0);
EXPORT_SYMBOL_GPL(tcp_orphan_count);
int sysctl_tcp_mem[3];
-int sysctl_tcp_wmem[3] = { 4 * 1024, 16 * 1024, 128 * 1024 };
-int sysctl_tcp_rmem[3] = { 4 * 1024, 87380, 87380 * 2 };
+int sysctl_tcp_wmem[3];
+int sysctl_tcp_rmem[3];
EXPORT_SYMBOL(sysctl_tcp_mem);
EXPORT_SYMBOL(sysctl_tcp_rmem);
@@ -2081,7 +2081,8 @@ __setup("thash_entries=", set_thash_entries);
void __init tcp_init(void)
{
struct sk_buff *skb = NULL;
- int order, i;
+ unsigned long limit;
+ int order, i, max_share;
if (sizeof(struct tcp_skb_cb) > sizeof(skb->cb))
__skb_cb_too_small_for_tcp(sizeof(struct tcp_skb_cb),
@@ -2155,12 +2156,16 @@ void __init tcp_init(void)
sysctl_tcp_mem[1] = 1024 << order;
sysctl_tcp_mem[2] = 1536 << order;
- if (order < 3) {
- sysctl_tcp_wmem[2] = 64 * 1024;
- sysctl_tcp_rmem[0] = PAGE_SIZE;
- sysctl_tcp_rmem[1] = 43689;
- sysctl_tcp_rmem[2] = 2 * 43689;
- }
+ limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
+ max_share = min(4UL*1024*1024, limit);
+
+ sysctl_tcp_wmem[0] = SK_STREAM_MEM_QUANTUM;
+ sysctl_tcp_wmem[1] = 16*1024;
+ sysctl_tcp_wmem[2] = max(64*1024, max_share);
+
+ sysctl_tcp_rmem[0] = SK_STREAM_MEM_QUANTUM;
+ sysctl_tcp_rmem[1] = 87380;
+ sysctl_tcp_rmem[2] = max(87380, max_share);
printk(KERN_INFO "TCP: Hash tables configured "
"(established %d bind %d)\n",
* Re: TCP connection stops after high load.
2007-04-12 21:15 ` David Miller
@ 2007-04-15 12:14 ` Robert Iakobashvili
2007-04-15 15:31 ` John Heffner
2007-04-15 13:52 ` Robert Iakobashvili
1 sibling, 1 reply; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-15 12:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev, greearb
On 4/13/07, David Miller <davem@davemloft.net> wrote:
> From: "Robert Iakobashvili" <coroberti@gmail.com>
> Date: Thu, 12 Apr 2007 23:11:14 +0200
>
> > It works well with 2.6.11.8 and the Debian 2.6.18.3-i686 image.
> >
> > On the same Intel Pentium 4 PC with roughly the same kernel
> > configuration (make oldconfig using the Debian config-2.6.18.3-i686),
> > the setup fails, with TCP connections stalled after 1000 established
> > connections, when the kernel is 2.6.20.6 or 2.6.19.5.
> >
> > It stalls even earlier: after 500 connections when lighttpd is used
> > with the default poll() demultiplexing, or after 100 connections when
> > the apache2 web server is used (memory?).
> >
> > I am currently going to try vanilla 2.6.18.3 and, if it also fails,
> > to look through the Debian patches, trying to figure out what the
> > delta is.
Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and
2.6.20.6 do not.
Looking into the TCP /proc entries of 2.6.18.3 versus 2.6.19.5,
tcp_rmem and tcp_wmem are the same, whereas tcp_mem is
much different:
kernel tcp_mem
---------------------------------------
2.6.18.3 12288 16384 24576
2.6.19.5 3072 4096 6144
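(For scale: tcp_mem is counted in pages, so assuming 4 KB pages these
thresholds are about 48/64/96 MB on 2.6.18.3 but only 12/16/24 MB on
2.6.19.5, i.e. all TCP sockets together are capped at 24 MB.)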
Wasn't this done deliberately by the patch below:
commit 9e950efa20dc8037c27509666cba6999da9368e8
Author: John Heffner <jheffner@psc.edu>
Date: Mon Nov 6 23:10:51 2006 -0800
[TCP]: Don't use highmem in tcp hash size calculation.
This patch removes consideration of high memory when determining TCP
hash table sizes. Taking into account high memory results in tcp_mem
values that are too large.
Is it a feature?
My machine has:
MemTotal: 484368 kB
and all the kernel configurations are actually the same, with
CONFIG_HIGHMEM4G=y.
Thanks,
--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* Re: TCP connection stops after high load.
2007-04-15 12:14 ` Robert Iakobashvili
@ 2007-04-15 15:31 ` John Heffner
2007-04-15 15:49 ` Robert Iakobashvili
0 siblings, 1 reply; 41+ messages in thread
From: John Heffner @ 2007-04-15 15:31 UTC (permalink / raw)
To: Robert Iakobashvili; +Cc: David Miller, netdev, greearb
Robert Iakobashvili wrote:
> Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and
> 2.6.20.6 do not.
>
> Looking into the TCP /proc entries of 2.6.18.3 versus 2.6.19.5,
> tcp_rmem and tcp_wmem are the same, whereas tcp_mem is
> much different:
>
> kernel tcp_mem
> ---------------------------------------
> 2.6.18.3 12288 16384 24576
> 2.6.19.5 3072 4096 6144
>
>
> Wasn't this done deliberately by the patch below:
>
> commit 9e950efa20dc8037c27509666cba6999da9368e8
> Author: John Heffner <jheffner@psc.edu>
> Date: Mon Nov 6 23:10:51 2006 -0800
>
> [TCP]: Don't use highmem in tcp hash size calculation.
>
> This patch removes consideration of high memory when determining TCP
> hash table sizes. Taking into account high memory results in tcp_mem
> values that are too large.
>
> Is it a feature?
>
> My machine has:
> MemTotal: 484368 kB
> and all the kernel configurations are actually the same, with
> CONFIG_HIGHMEM4G=y.
>
> Thanks,
>
Another patch that went in right around that time:
commit 52bf376c63eebe72e862a1a6e713976b038c3f50
Author: John Heffner <jheffner@psc.edu>
Date: Tue Nov 14 20:25:17 2006 -0800
[TCP]: Fix up sysctl_tcp_mem initialization.
Fix up tcp_mem initial settings to take into account the size of the
hash entries (different on SMP and non-SMP systems).
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
(This has been changed again for 2.6.21.)
In the dmesg, there should be some messages like this:
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
What do yours say?
Thanks,
-John
* Re: TCP connection stops after high load.
2007-04-15 15:31 ` John Heffner
@ 2007-04-15 15:49 ` Robert Iakobashvili
2007-04-16 18:07 ` John Heffner
0 siblings, 1 reply; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-15 15:49 UTC (permalink / raw)
To: John Heffner; +Cc: David Miller, netdev, greearb
Hi John,
On 4/15/07, John Heffner <jheffner@psc.edu> wrote:
> Robert Iakobashvili wrote:
> > Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and
> > 2.6.20.6 do not.
> >
> > Looking into the TCP /proc entries of 2.6.18.3 versus 2.6.19.5,
> > tcp_rmem and tcp_wmem are the same, whereas tcp_mem is
> > much different:
> >
> > kernel tcp_mem
> > ---------------------------------------
> > 2.6.18.3 12288 16384 24576
> > 2.6.19.5 3072 4096 6144
> Another patch that went in right around that time:
>
> commit 52bf376c63eebe72e862a1a6e713976b038c3f50
> Author: John Heffner <jheffner@psc.edu>
> Date: Tue Nov 14 20:25:17 2006 -0800
>
> [TCP]: Fix up sysctl_tcp_mem initialization.
> (This has been changed again for 2.6.21.)
>
> In the dmesg, there should be some messages like this:
> IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
>
> What do yours say?
For 2.6.19.5, where we have this problem, from dmesg:
IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
TCP established hash table entries: 16384 (order: 5, 131072 bytes)
TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
#cat /proc/sys/net/ipv4/tcp_mem
3072 4096 6144
MemTotal: 484368 kB
CONFIG_HIGHMEM4G=y
Thanks,
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* Re: TCP connection stops after high load.
2007-04-15 15:49 ` Robert Iakobashvili
@ 2007-04-16 18:07 ` John Heffner
2007-04-16 18:51 ` Robert Iakobashvili
0 siblings, 1 reply; 41+ messages in thread
From: John Heffner @ 2007-04-16 18:07 UTC (permalink / raw)
To: Robert Iakobashvili; +Cc: David Miller, netdev, greearb
Robert Iakobashvili wrote:
> Hi John,
>
> On 4/15/07, John Heffner <jheffner@psc.edu> wrote:
>> Robert Iakobashvili wrote:
>> > Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and
>> > 2.6.20.6 do not.
>> >
>> > Looking into the TCP /proc entries of 2.6.18.3 versus 2.6.19.5,
>> > tcp_rmem and tcp_wmem are the same, whereas tcp_mem is
>> > much different:
>> >
>> > kernel tcp_mem
>> > ---------------------------------------
>> > 2.6.18.3 12288 16384 24576
>> > 2.6.19.5 3072 4096 6144
>
>> Another patch that went in right around that time:
>>
>> commit 52bf376c63eebe72e862a1a6e713976b038c3f50
>> Author: John Heffner <jheffner@psc.edu>
>> Date: Tue Nov 14 20:25:17 2006 -0800
>>
>> [TCP]: Fix up sysctl_tcp_mem initialization.
>> (This has been changed again for 2.6.21.)
>>
>> In the dmesg, there should be some messages like this:
>> IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
>> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
>> TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
>> TCP: Hash tables configured (established 131072 bind 65536)
>>
>> What do yours say?
>
> For 2.6.19.5, where we have this problem, from dmesg:
> IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
> TCP established hash table entries: 16384 (order: 5, 131072 bytes)
> TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
>
> #cat /proc/sys/net/ipv4/tcp_mem
> 3072 4096 6144
>
> MemTotal: 484368 kB
> CONFIG_HIGHMEM4G=y
Yes, this difference is caused by the commit above. The old way didn't
really make a lot of sense, since it differed based on SMP/non-SMP and
page size, and had large discontinuities at 512 MB and at every power
of two. When the limit was based on the hash table size, it was hard
to keep it from exceeding the memory pool without also making it too
small.
The current net-2.6 (2.6.21) has a redesigned tcp_mem initialization
that should give you more appropriate values, something like 45408 60546
90816. For reference:
Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c
Author: John Heffner <jheffner@psc.edu> Fri, 16 Mar 2007 15:04:03 -0700
[TCP]: Fix tcp_mem[] initialization.
Change tcp_mem initialization function. The fraction of total memory
is now a continuous function of memory size, and independent of page
size.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks,
-John
* Re: TCP connection stops after high load.
2007-04-16 18:07 ` John Heffner
@ 2007-04-16 18:51 ` Robert Iakobashvili
2007-04-16 19:11 ` John Heffner
2007-04-16 19:15 ` David Miller
0 siblings, 2 replies; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-16 18:51 UTC (permalink / raw)
To: John Heffner; +Cc: David Miller, netdev, greearb
> >> Robert Iakobashvili wrote:
> >> > Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and
> >> > 2.6.20.6 do not.
> >> >
> >> > Looking into the TCP /proc entries of 2.6.18.3 versus 2.6.19.5,
> >> > tcp_rmem and tcp_wmem are the same, whereas tcp_mem is
> >> > much different:
> >> >
> >> > kernel tcp_mem
> >> > ---------------------------------------
> >> > 2.6.18.3 12288 16384 24576
> >> > 2.6.19.5 3072 4096 6144
> >
> >> Another patch that went in right around that time:
> >>
> >> commit 52bf376c63eebe72e862a1a6e713976b038c3f50
> >> Author: John Heffner <jheffner@psc.edu>
> >> Date: Tue Nov 14 20:25:17 2006 -0800
> >>
> >> [TCP]: Fix up sysctl_tcp_mem initialization.
> >> (This has been changed again for 2.6.21.)
> >>
> Yes, this difference is caused by the commit above.
> The current net-2.6 (2.6.21) has a redesigned tcp_mem initialization
> that should give you more appropriate values, something like 45408 60546
> 90816. For reference:
> Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c
> Author: John Heffner <jheffner@psc.edu> Fri, 16 Mar 2007 15:04:03 -0700
>
> [TCP]: Fix tcp_mem[] initialization.
> Change tcp_mem initialization function. The fraction of total memory
> is now a continuous function of memory size, and independent of page
> size.
The 2.6.19 and 2.6.20 kernel series are effectively broken right now.
Wouldn't you like to patch them?
--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* Re: TCP connection stops after high load.
2007-04-16 18:51 ` Robert Iakobashvili
@ 2007-04-16 19:11 ` John Heffner
2007-04-16 19:17 ` David Miller
2007-04-16 19:15 ` David Miller
1 sibling, 1 reply; 41+ messages in thread
From: John Heffner @ 2007-04-16 19:11 UTC (permalink / raw)
To: Robert Iakobashvili; +Cc: David Miller, netdev, greearb
Robert Iakobashvili wrote:
> The 2.6.19 and 2.6.20 kernel series are effectively broken right now.
> Wouldn't you like to patch them?
>
I don't know if this qualifies as an unconditional bug. The commit
above was actually a bugfix so that the limits were not higher than
total memory on some systems, but had the side effect that it made them
even smaller on your particular configuration. Also, having initial
sysctl values that are conservatively small probably doesn't qualify as
a bug (for patching stable trees). You might ask the -stable
maintainers if they have a different opinion.
For most people, 2.6.19 and 2.6.20 work fine. For those who really care
about the tcp_mem values (are using a substantial fraction of physical
memory for TCP connections), the best bet is to set the tcp_mem sysctl
values in the startup scripts, or use the new initialization function in
2.6.21.
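For example, something along these lines in a startup script (the
values are just the ones the 2.6.21 initialization would produce on a
machine of this class):
echo "45408 60546 90816" > /proc/sys/net/ipv4/tcp_mem
or the equivalent "net.ipv4.tcp_mem = 45408 60546 90816" entry in
/etc/sysctl.conf.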
Thanks,
-John
* Re: TCP connection stops after high load.
2007-04-16 19:11 ` John Heffner
@ 2007-04-16 19:17 ` David Miller
0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-04-16 19:17 UTC (permalink / raw)
To: jheffner; +Cc: coroberti, netdev, greearb
From: John Heffner <jheffner@psc.edu>
Date: Mon, 16 Apr 2007 15:11:07 -0400
> I don't know if this qualifies as an unconditional bug. The commit
> above was actually a bugfix so that the limits were not higher than
> total memory on some systems, but had the side effect that it made them
> even smaller on your particular configuration. Also, having initial
> sysctl values that are conservatively small probably doesn't qualify as
> a bug (for patching stable trees). You might ask the -stable
> maintainers if they have a different opinion.
>
> For most people, 2.6.19 and 2.6.20 work fine. For those who really care
> about the tcp_mem values (are using a substantial fraction of physical
> memory for TCP connections), the best bet is to set the tcp_mem sysctl
> values in the startup scripts, or use the new initialization function in
> 2.6.21.
What's most important is determining if that tcp_mem[] patch actually
fixes his problem, so it is his responsibility to see whether this
is the case.
If it does fix the problem, I'm happy to submit the backport to -stable.
But until such tests are made, it's just speculation whether the patch
fixes the problem or not, and therefore there is zero justification to
submit it to -stable.
* Re: TCP connection stops after high load.
2007-04-16 18:51 ` Robert Iakobashvili
2007-04-16 19:11 ` John Heffner
@ 2007-04-16 19:15 ` David Miller
2007-04-17 7:58 ` Robert Iakobashvili
1 sibling, 1 reply; 41+ messages in thread
From: David Miller @ 2007-04-16 19:15 UTC (permalink / raw)
To: coroberti; +Cc: jheffner, netdev, greearb
From: "Robert Iakobashvili" <coroberti@gmail.com>
Date: Mon, 16 Apr 2007 20:51:54 +0200
> > Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c
> > Author: John Heffner <jheffner@psc.edu> Fri, 16 Mar 2007 15:04:03 -0700
> >
> > [TCP]: Fix tcp_mem[] initialization.
> > Change tcp_mem initialization function. The fraction of total memory
> > is now a continuous function of memory size, and independent of page
> > size.
>
>
> The 2.6.19 and 2.6.20 kernel series are effectively broken right now.
> Wouldn't you like to patch them?
Can you verify that this patch actually fixes your problem?
* Re: TCP connection stops after high load.
2007-04-16 19:15 ` David Miller
@ 2007-04-17 7:58 ` Robert Iakobashvili
2007-04-17 19:39 ` David Miller
0 siblings, 1 reply; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-17 7:58 UTC (permalink / raw)
To: David Miller; +Cc: jheffner, netdev, greearb, Michael Moser
David,
On 4/16/07, David Miller <davem@davemloft.net> wrote:
> > > Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c
> > > Author: John Heffner <jheffner@psc.edu> Fri, 16 Mar 2007 15:04:03 -0700
> > >
> > > [TCP]: Fix tcp_mem[] initialization.
> > > Change tcp_mem initialization function. The fraction of total memory
> > > is now a continuous function of memory size, and independent of page
> > > size.
> >
> >
> > The 2.6.19 and 2.6.20 kernel series are effectively broken right now.
> > Wouldn't you like to patch them?
>
> Can you verify that this patch actually fixes your problem?
Yes, it fixes it.
After the patch, curl-loader works smoothly with patched 2.6.19.7 and
with patched 2.6.20.7, using 3000 simultaneous local connections, and
even better than with the kernel referred to as "good", 2.6.18.3.
Besides that, here is the tcp_mem status for my machine:
kernel tcp_mem
------------------------------------------------------
2.6.19.7 3072 4096 6144
2.6.19.7-patched 45696 60928 91392
2.6.20.7 3072 4096 6144
2.6.20.7-patched 45696 60928 91392
The patch applied smoothly, just with line offsets.
--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* Re: TCP connection stops after high load.
2007-04-17 7:58 ` Robert Iakobashvili
@ 2007-04-17 19:39 ` David Miller
2007-04-17 19:47 ` John Heffner
2007-04-17 19:58 ` Robert Iakobashvili
0 siblings, 2 replies; 41+ messages in thread
From: David Miller @ 2007-04-17 19:39 UTC (permalink / raw)
To: coroberti; +Cc: jheffner, netdev, greearb, moser.michael
From: "Robert Iakobashvili" <coroberti@gmail.com>
Date: Tue, 17 Apr 2007 10:58:04 +0300
> David,
>
> On 4/16/07, David Miller <davem@davemloft.net> wrote:
> > > > Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c
> > > > Author: John Heffner <jheffner@psc.edu> Fri, 16 Mar 2007 15:04:03 -0700
> > > >
> > > > [TCP]: Fix tcp_mem[] initialization.
> > > > Change tcp_mem initialization function. The fraction of total memory
> > > > is now a continuous function of memory size, and independent of page
> > > > size.
> > >
> > >
> > > The 2.6.19 and 2.6.20 kernel series are effectively broken right now.
> > > Wouldn't you like to patch them?
> >
> > Can you verify that this patch actually fixes your problem?
>
> Yes, it fixes it.
Thanks, I will submit it to -stable branch.
* Re: TCP connection stops after high load.
2007-04-17 19:39 ` David Miller
@ 2007-04-17 19:47 ` John Heffner
2007-04-17 19:51 ` David Miller
2007-04-17 19:58 ` Robert Iakobashvili
1 sibling, 1 reply; 41+ messages in thread
From: John Heffner @ 2007-04-17 19:47 UTC (permalink / raw)
To: David Miller; +Cc: coroberti, netdev, greearb, moser.michael
David Miller wrote:
> From: "Robert Iakobashvili" <coroberti@gmail.com>
> Date: Tue, 17 Apr 2007 10:58:04 +0300
>
>> David,
>>
>> On 4/16/07, David Miller <davem@davemloft.net> wrote:
>>>>> Commit: 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c
>>>>> Author: John Heffner <jheffner@psc.edu> Fri, 16 Mar 2007 15:04:03 -0700
>>>>>
>>>>> [TCP]: Fix tcp_mem[] initialization.
>>>>> Change tcp_mem initialization function. The fraction of total memory
>>>>> is now a continuous function of memory size, and independent of page
>>>>> size.
>>>>
>>>> The 2.6.19 and 2.6.20 kernel series are effectively broken right now.
>>>> Wouldn't you like to patch them?
>>> Can you verify that this patch actually fixes your problem?
>> Yes, it fixes it.
>
> Thanks, I will submit it to -stable branch.
My only reservation in submitting this to -stable is that it will in
many cases increase the default tcp_mem values, which in turn can
increase the default tcp_rmem values, and therefore the window scale.
There will be some set of people with broken firewalls who trigger that
problem for the first time by upgrading along the stable branch. While
it's not our fault, it could cause some complaints...
Thanks,
-John
* Re: TCP connection stops after high load.
2007-04-17 19:47 ` John Heffner
@ 2007-04-17 19:51 ` David Miller
0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-04-17 19:51 UTC (permalink / raw)
To: jheffner; +Cc: coroberti, netdev, greearb, moser.michael
From: John Heffner <jheffner@psc.edu>
Date: Tue, 17 Apr 2007 15:47:58 -0400
> My only reservation in submitting this to -stable is that it will in
> many cases increase the default tcp_mem values, which in turn can
> increase the default tcp_rmem values, and therefore the window scale.
> There will be some set of people with broken firewalls who trigger that
> problem for the first time by upgrading along the stable branch. While
> it's not our fault, it could cause some complaints...
It is a very valid concern.
However, this is fixing a problem where we are in the wrong,
whereas the firewall issues are external and should not
block us from being able to fix our own bugs :-)
* Re: TCP connection stops after high load.
2007-04-17 19:39 ` David Miller
2007-04-17 19:47 ` John Heffner
@ 2007-04-17 19:58 ` Robert Iakobashvili
1 sibling, 0 replies; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-17 19:58 UTC (permalink / raw)
To: David Miller; +Cc: jheffner, netdev, greearb, moser.michael
> > Yes, it fixes it.
>
> Thanks, I will submit it to -stable branch.
>
David and John,
Thanks for your care and attention.
--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* Re: TCP connection stops after high load.
2007-04-12 21:15 ` David Miller
2007-04-15 12:14 ` Robert Iakobashvili
@ 2007-04-15 13:52 ` Robert Iakobashvili
1 sibling, 0 replies; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-15 13:52 UTC (permalink / raw)
To: David Miller; +Cc: netdev, greearb
On 4/13/07, David Miller <davem@davemloft.net> wrote:
> From: "Robert Iakobashvili" <coroberti@gmail.com>
> Date: Thu, 12 Apr 2007 23:11:14 +0200
>
> > It works well with 2.6.11.8 and the Debian 2.6.18.3-i686 image.
> >
> > On the same Intel Pentium 4 PC with roughly the same kernel
> > configuration (make oldconfig using the Debian config-2.6.18.3-i686),
> > the setup fails, with TCP connections stalled after 1000 established
> > connections, when the kernel is 2.6.20.6 or 2.6.19.5.
> >
> > It stalls even earlier: after 500 connections when lighttpd is used
> > with the default poll() demultiplexing, or after 100 connections when
> > the apache2 web server is used (memory?).
> >
> > I am currently going to try vanilla 2.6.18.3 and, if it also fails,
> > to look through the Debian patches, trying to figure out what the
> > delta is.
> Vanilla 2.6.18.3 works for me perfectly, whereas 2.6.19.5 and
> 2.6.20.6 do not.
>
> Looking into the TCP /proc entries of 2.6.18.3 versus 2.6.19.5,
> tcp_rmem and tcp_wmem are the same, whereas tcp_mem is
> much different:
>
> kernel tcp_mem
> ---------------------------------------
> 2.6.18.3 12288 16384 24576
> 2.6.19.5 3072 4096 6144
>
>
> Wasn't this done deliberately by the patch below:
>
> commit 9e950efa20dc8037c27509666cba6999da9368e8
> Author: John Heffner <jheffner@psc.edu>
> Date: Mon Nov 6 23:10:51 2006 -0800
Sorry, the commit is innocent. Something else has been broken in the
tcp_mem initialization logic.
> My machine has:
> MemTotal: 484368 kB
> and all the kernel configurations are actually the same, with
> CONFIG_HIGHMEM4G=y.
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.
* TCP connection stops after high load.
@ 2007-04-11 18:50 Ben Greear
2007-04-11 20:26 ` Ben Greear
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Ben Greear @ 2007-04-11 18:50 UTC (permalink / raw)
To: NetDev
Back in May of last year, I reported this problem, but worked
around it at the time by changing the kernel memory settings
in the networking stack. I reproduced the problem again today
with the previously working kernel memory settings... which is not
surprising, since I just papered over the bug last time.
The problem is that I set up a TCP connection with bi-directional
traffic of around 800Mbps, doing large (20k - 64k) writes and reads
between two ports on the same machine (this 2.6.18.2 kernel is tainted
with my full patch set, but I also reproduced it with only the
non-tainted send-to-self patch applied last May on the 2.6.16 kernel,
so I assume the bug is not particular to my patch set).
At first, all is well, but within 5-10 minutes, the TCP connection will stall
and I only see a massive amount of duplicate ACKs on the link. Before,
I sometimes saw OOM messages, but this time there are no OOM messages. The system
has a two-port pro/1000 fibre NIC, 1GB RAM, kernel 2.6.18.2 + hacks, etc.
Stopping and starting the connection allows traffic to flow again (if briefly).
Starting a new connection works fine even if the old one is still stalled,
so it's not a global memory exhaustion problem.
So, I would like to dig into this problem myself since no one else
is reporting this type of problem, but I am quite ignorant of the TCP
stack implementation. Based on the dup-acks I see on the wire, I assume
the TCP state machine is messed up somehow. Could anyone point me to
likely places in the TCP stack to start looking for this bug?
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: TCP connection stops after high load.
2007-04-11 18:50 Ben Greear
@ 2007-04-11 20:26 ` Ben Greear
2007-04-11 20:48 ` David Miller
2007-04-11 20:41 ` David Miller
2007-04-12 6:12 ` Ilpo Järvinen
2 siblings, 1 reply; 41+ messages in thread
From: Ben Greear @ 2007-04-11 20:26 UTC (permalink / raw)
To: NetDev
Ben Greear wrote:
> Back in May of last year, I reported this problem, but worked
> around it at the time by changing the kernel memory settings
> in the networking stack. I reproduced the problem again today
> with the previously working kernel memory settings... which is not
> surprising, since I just papered over the bug last time.
So, I have been poking around. Disabling TSO makes the problem happen
sooner (< 1 minute). Changing the tcp_congestion_control does not help.
Interestingly, I found this page mentioning a SACK problem in Linux:
http://www-didc.lbl.gov/TCP-tuning/linux.html
I tried disabling SACK, but the problem still happens. However,
I do see the CWND go to 1 as soon as the connection stalls (I'm not
sure exactly which happens first). Before the stall, I see the CWND
reported in the ~40 range.
Maybe something similar to the SACK bug can happen on very fast, very
low latency links, with large send/receive buffers configured?
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: TCP connection stops after high load.
2007-04-11 20:26 ` Ben Greear
@ 2007-04-11 20:48 ` David Miller
2007-04-11 21:06 ` Ben Greear
0 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2007-04-11 20:48 UTC (permalink / raw)
To: greearb; +Cc: netdev
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 11 Apr 2007 13:26:36 -0700
> Interestingly, I found this page mentioning a SACK problem in Linux:
> http://www-didc.lbl.gov/TCP-tuning/linux.html
Don't read that page; it is the last place in the world you should
take hints and advice from. Most of the problems they speak of there
were fixed years ago.
Please start instrumenting the TCP code instead of "poking around"
hoping you'll hit the grand jackpot by manipulating some sysctl
setting.
It doesn't help us and it won't help you. Start reading and
understanding the TCP code, add debugging printk's, do anything to get
more information about this.
And please don't report anything here until you have some solid piece
of debugging information, else I'll just sit here replying and
prodding you along ever so slowly. :(
* Re: TCP connection stops after high load.
2007-04-11 20:48 ` David Miller
@ 2007-04-11 21:06 ` Ben Greear
2007-04-11 21:11 ` David Miller
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Ben Greear @ 2007-04-11 21:06 UTC (permalink / raw)
To: David Miller; +Cc: netdev
David Miller wrote:
> From: Ben Greear <greearb@candelatech.com>
> Date: Wed, 11 Apr 2007 13:26:36 -0700
>
>> Interestingly, I found this page mentioning a SACK problem in Linux:
>> http://www-didc.lbl.gov/TCP-tuning/linux.html
>
> Don't read that page; it is the last place in the world you should
> take hints and advice from. Most of the problems they speak of there
> were fixed years ago.
Many of their memory and buffer settings are similar to what I've
seen elsewhere... and to what I use, but it could be that we're all
getting the same info from the same faulty source. Suggestions of a
proper site for tuning TCP for high-speed/high-latency links are
welcome.
> Please start instrumenting the TCP code instead of "poking around"
> hoping you'll hit the grand jackpot by manipulating some sysctl
> setting.
>
> It doesn't help us and it won't help you. Start reading and
> understanding the TCP code, add debugging printk's, do anything to get
> more information about this.
>
> And please don't report anything here until you have some solid piece
> of debugging information, else I'll just sit here replying and
> prodding you along ever so slowly. :(
Does the CWND == 1 count as solid? Any idea how/why this would go
to 1 in conjunction with the dup acks?
For the dup acks, I see nothing *but* dup acks on the wire...going in
both directions interestingly, at greater than 100,000 packets per second.
I don't mind adding printks...and I've started reading through the code,
but there is a lot of it, and indiscriminate printks will likely just
hide the problem because it will slow down performance so much.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: TCP connection stops after high load.
2007-04-11 21:06 ` Ben Greear
@ 2007-04-11 21:11 ` David Miller
2007-04-11 21:31 ` Ben Greear
2007-04-12 1:06 ` Benjamin LaHaise
2007-04-12 14:48 ` Andi Kleen
2 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2007-04-11 21:11 UTC (permalink / raw)
To: greearb; +Cc: netdev
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 11 Apr 2007 14:06:31 -0700
> Does the CWND == 1 count as solid? Any idea how/why this would go
> to 1 in conjunction with the dup acks?
>
> For the dup acks, I see nothing *but* dup acks on the wire...going in
> both directions interestingly, at greater than 100,000 packets per second.
>
> I don't mind adding printks...and I've started reading through the code,
> but there is a lot of it, and indiscriminate printks will likely just
> hide the problem because it will slow down performance so much.
If you know that, it doesn't take Einstein to figure out that maybe
you should add logging when CWND is one and we're sending out an ACK.
This is why I think you're very lazy, Ben, and why I get very agitated
with all of your reports: you put zero effort into thinking about how
to debug the problem even though you know full well how to do it.
* Re: TCP connection stops after high load.
2007-04-11 21:11 ` David Miller
@ 2007-04-11 21:31 ` Ben Greear
2007-04-11 21:39 ` David Miller
2007-04-12 2:44 ` SANGTAE HA
0 siblings, 2 replies; 41+ messages in thread
From: Ben Greear @ 2007-04-11 21:31 UTC (permalink / raw)
To: David Miller; +Cc: netdev
David Miller wrote:
> From: Ben Greear <greearb@candelatech.com>
> Date: Wed, 11 Apr 2007 14:06:31 -0700
>
>> Does the CWND == 1 count as solid? Any idea how/why this would go
>> to 1 in conjunction with the dup acks?
>>
>> For the dup acks, I see nothing *but* dup acks on the wire...going in
>> both directions interestingly, at greater than 100,000 packets per second.
>>
>> I don't mind adding printks...and I've started reading through the code,
>> but there is a lot of it, and indiscriminate printks will likely just
>> hide the problem because it will slow down performance so much.
>
> If you know that, it doesn't take Einstein to figure out that maybe
> you should add logging when CWND is one and we're sending out an ACK.
>
> This is why I think you're very lazy, Ben, and why I get very agitated
> with all of your reports: you put zero effort into thinking about how
> to debug the problem even though you know full well how to do it.
I've spent solid weeks tracking down obscure races. I'm hoping that
someone who knows the tcp stack will have some idea of places to look
based on the reported symptoms so that I don't have to spend another
solid week chasing this one. If not, so be it... I'm still working on
this between sending emails. For what it's worth, the problem (or something similar)
is reproducible on a stock FC5 .18-ish kernel as well, running between
two machines, 2 ports each.
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: TCP connection stops after high load.
2007-04-11 21:31 ` Ben Greear
@ 2007-04-11 21:39 ` David Miller
2007-04-12 2:44 ` SANGTAE HA
1 sibling, 0 replies; 41+ messages in thread
From: David Miller @ 2007-04-11 21:39 UTC (permalink / raw)
To: greearb; +Cc: netdev
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 11 Apr 2007 14:31:00 -0700
> I've spent solid weeks tracking down obscure races.
I've spent solid weeks tracking down kernel stack corruption and SCSI
problems on sparc64, as well as attending to my network maintainer
duties. What is your point?
> I'm hoping that someone who knows the tcp stack will have some idea
> of places to look based on the reported symptoms so that I don't
> have to spend another solid week chasing this one.
If you can reproduce the bug and others cannot, you are the one in the
best possible situation to add diagnostics and figure out what's
wrong. Please do this.
You get a lot from Linux in your work, but you sure grumble a lot when
you might need to give even a smidgen back. You just dump random
pieces of information at this list and expect other people to just fix
it for you. It's this part of your attitude that I absolutely do not
like. Other people are able to report bugs in a pleasant and
non-selfish way that makes me want to go and fix the bug for them
proactively, you do not.
* Re: TCP connection stops after high load.
2007-04-11 21:31 ` Ben Greear
2007-04-11 21:39 ` David Miller
@ 2007-04-12 2:44 ` SANGTAE HA
1 sibling, 0 replies; 41+ messages in thread
From: SANGTAE HA @ 2007-04-12 2:44 UTC (permalink / raw)
To: Ben Greear; +Cc: David Miller, netdev
I also noticed this happening with the 2.6.18 kernel, but it was not
severe with Linux 2.6.20.3. So the short-term solution would be
upgrading to the latest FC-6 kernel.
A long blackout is mostly observed when a lot of packet losses happen
in slow start. You can prevent this by applying a (limited slow start)
patch to your slow start. Did you have the same problems with CUBIC,
which employs a less aggressive slow start? I'll leave this debugging
for some later kernel version, but you are welcome to debug this
problem.
I recommend you install tcp_probe and recreate the problem. Whenever
an ACK arrives from the receiver, the probe will print the current
congestion information. You can also easily include any other
information you want in that module. And you can get some information
from the statistics in /proc/net/tcp and /proc/net/netstat.
See http://netsrv.csc.ncsu.edu/wiki/index.php/Efficiency_of_SACK_processing
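Roughly (assuming tcp_probe is built for your kernel; the module
parameters vary a bit between versions), using the port from your
netstat output:
modprobe tcp_probe port=33011
cat /proc/net/tcpprobe > /tmp/tcpprobe.out &
then run the test and look at /tmp/tcpprobe.out.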
Thanks,
Sangtae
On 4/11/07, Ben Greear <greearb@candelatech.com> wrote:
> David Miller wrote:
> > From: Ben Greear <greearb@candelatech.com>
> > Date: Wed, 11 Apr 2007 14:06:31 -0700
> >
> >> Does the CWND == 1 count as solid? Any idea how/why this would go
> >> to 1 in conjunction with the dup acks?
> >>
> >> For the dup acks, I see nothing *but* dup acks on the wire...going in
> >> both directions interestingly, at greater than 100,000 packets per second.
> >>
> >> I don't mind adding printks...and I've started reading through the code,
> >> but there is a lot of it, and indiscriminate printks will likely just
> >> hide the problem because it will slow down performance so much.
> >
> > If you know that, it doesn't take Einstein to figure out that maybe
> > you should add logging when CWND is one and we're sending out an ACK.
> >
> > This is why I think you're very lazy, Ben, and why I get very agitated
> > with all of your reports: you put zero effort into thinking about how
> > to debug the problem even though you know full well how to do it.
>
> I've spent solid weeks tracking down obscure races. I'm hoping that
> someone who knows the tcp stack will have some idea of places to look
> based on the reported symptoms so that I don't have to spend another
> solid week chasing this one. If not, so be it... I'm still working on
> this between sending emails. For what it's worth, the problem (or something similar)
> is reproducible on a stock FC5 .18-ish kernel as well, running between
> two machines, 2 ports each.
>
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc http://www.candelatech.com
>
* Re: TCP connection stops after high load.
2007-04-11 21:06 ` Ben Greear
2007-04-11 21:11 ` David Miller
@ 2007-04-12 1:06 ` Benjamin LaHaise
2007-04-12 14:48 ` Andi Kleen
2 siblings, 0 replies; 41+ messages in thread
From: Benjamin LaHaise @ 2007-04-12 1:06 UTC (permalink / raw)
To: Ben Greear; +Cc: David Miller, netdev
On Wed, Apr 11, 2007 at 02:06:31PM -0700, Ben Greear wrote:
> For the dup acks, I see nothing *but* dup acks on the wire...going in
> both directions interestingly, at greater than 100,000 packets per second.
>
> I don't mind adding printks...and I've started reading through the code,
> but there is a lot of it, and indiscriminate printks will likely just
> hide the problem because it will slow down performance so much.
What do the timestamps look like? PAWS contains logic which will drop
packets if the timestamps are too old compared to what the receiver
expects.
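A simplified sketch of that check (the RFC 1323 logic, not the
kernel's literal code):

#include <stdint.h>
#include <time.h>

/* Reject a segment whose timestamp went backwards, unless the stored
 * timestamp is itself more than 24 days old. */
static int paws_reject(int32_t seg_tsval, int32_t ts_recent,
                       time_t ts_recent_stamp, time_t now)
{
	/* wraparound-safe "seg_tsval < ts_recent" comparison */
	if ((int32_t)(seg_tsval - ts_recent) < 0 &&
	    now - ts_recent_stamp <= 24 * 24 * 60 * 60)
		return 1;	/* drop: timestamp too old */
	return 0;
}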
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.
* Re: TCP connection stops after high load.
2007-04-11 21:06 ` Ben Greear
2007-04-11 21:11 ` David Miller
2007-04-12 1:06 ` Benjamin LaHaise
@ 2007-04-12 14:48 ` Andi Kleen
2007-04-12 17:59 ` Ben Greear
2 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2007-04-12 14:48 UTC (permalink / raw)
To: Ben Greear; +Cc: David Miller, netdev
Ben Greear <greearb@candelatech.com> writes:
>
> I don't mind adding printks...and I've started reading through the code,
> but there is a lot of it, and indiscriminate printks will likely just
> hide the problem because it will slow down performance so much.
You could add /proc/net/snmp counters for interesting events (e.g. GFP_ATOMIC
allocations failing). Perhaps netstat -s already shows something interesting.
-Andi
* Re: TCP connection stops after high load.
2007-04-12 14:48 ` Andi Kleen
@ 2007-04-12 17:59 ` Ben Greear
2007-04-12 18:19 ` Eric Dumazet
0 siblings, 1 reply; 41+ messages in thread
From: Ben Greear @ 2007-04-12 17:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: netdev, bcrl
Andi Kleen wrote:
> Ben Greear <greearb@candelatech.com> writes:
>> I don't mind adding printks...and I've started reading through the code,
>> but there is a lot of it, and indiscriminate printks will likely just
>> hide the problem because it will slow down performance so much.
>
> You could add /proc/net/snmp counters for interesting events (e.g. GFP_ATOMIC
> allocations failing). Perhaps netstat -s already shows something interesting.
I will look for more interesting events to add counters for; thanks
for the suggestion. Thanks for the rest of the suggestions and patches
from others as well; I will be trying those out today and will let you
know how it goes. I can also try this on the 2.6.20 kernel.
This is on the machine connected to itself, which is by far the
easiest way to reproduce the problem. The output below is from the
stalled state. About 3-5 minutes later (I wasn't watching too
closely), the connection briefly started up again and then stalled
again. While it is stalled and sending ACKs, the netstat -an counters
remain the same. It appears this run/stall behaviour happens
repeatedly, as the overall bits-per-second average overnight was
around 90Mbps, while it runs at ~800Mbps at full speed.
from netstat -an:
tcp 0 759744 20.20.20.30:33012 20.20.20.20:33011 ESTABLISHED
tcp 0 722984 20.20.20.20:33011 20.20.20.30:33012 ESTABLISHED
I'm not sure if netstat -s shows interesting things or not... it does
show a very large number of packets in and out. I ran it twice, about
5 seconds apart. I pasted some values from the second run on the
right-hand side where the numbers looked interesting. This info is at
the bottom of this email.
For GFP_ATOMIC allocation failures, don't those show up as order-X
allocation failure messages in the kernel log (I see no messages of
this type)?
Here is a tcpdump of the connection in the stalled state. As you can
see from the 'time' output, it's running at around 100,000 packets per
second; tcpdump dropped the vast majority of these. Based on the
network interface stats, I believe both sides of the connection are
sending ACKs at about the same rate (about 160 kpps when tcpdump is
not running, it seems).
10:46:46.541490 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158912 84963208>
10:46:46.541494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.541567 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.541653 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.541886 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.541891 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158912 84963208>
10:46:46.541895 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.541988 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.542077 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
10:46:46.542307 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542312 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158913 84963208>
10:46:46.542321 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158913 84963208>
10:46:46.542410 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542708 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542718 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542735 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542818 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158913 84963208>
10:46:46.542899 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158913 84963208>
4214 packets captured
253889 packets received by filter
244719 packets dropped by kernel
real 0m2.640s
user 0m0.067s
sys 0m0.079s
Two netstat -s outputs... about 5 seconds apart.
[root@lf1001-240 ipv4]# netstat -s
Ip:
2823452436 total packets received 2840939253 total packets received
1 with invalid addresses
0 forwarded
0 incoming packets discarded
2823452435 incoming packets delivered 2840939252 incoming packets delivered
1549687963 requests sent out 1565951477 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
77 active connections openings
74 passive connection openings
0 failed connection attempts
122 connection resets received
10 connections established
2823426197 segments received 2840914122 segments received
1549683727 segments send out 1565948373 segments send out
2171 segments retransmited 2187 segments retransmited
0 bad segments received.
2203 resets sent
Udp:
21739 packets received
0 packets to unknown port received.
0 packet receive errors
4236 packets sent
TcpExt:
1164 invalid SYN cookies received
31323 packets pruned from receive queue because of socket buffer overrun 31337
4 TCP sockets finished time wait in fast timer
8 packets rejects in established connections because of timestamp
91542 delayed acks sent 91645
1902 delayed acks further delayed because of locked socket
Quick ack mode was activated 2201 times
2 packets directly queued to recvmsg prequeue.
1323185164 packets header predicted 1324477473
63077636 acknowledgments not containing data received 63141338
17021279 predicted acknowledgments 17043867
2035 times recovered from packet loss due to fast retransmit
8 times recovered from packet loss due to SACK data
Detected reordering 13 times using reno fast retransmit
Detected reordering 642 times using time stamp
1971 congestion windows fully recovered
16017 congestion windows partially recovered using Hoe heuristic
19 congestion windows recovered after partial ack
0 TCP data loss events
1 timeouts in loss state
225 fast retransmits
3 forward retransmits
151 other TCP timeouts
TCPRenoRecoveryFail: 1
11658529 packets collapsed in receive queue due to low socket buffer 11664170
123 DSACKs sent for old packets
70 DSACKs received
132 connections aborted due to timeout
[root@lf1001-240 ipv4]# netstat -s
Ip:
2840939253 total packets received
1 with invalid addresses
0 forwarded
0 incoming packets discarded
2840939252 incoming packets delivered
1565951477 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
77 active connections openings
74 passive connection openings
0 failed connection attempts
122 connection resets received
10 connections established
2840914122 segments received
1565948373 segments send out
2187 segments retransmited
0 bad segments received.
2203 resets sent
Udp:
21755 packets received
0 packets to unknown port received.
0 packet receive errors
4239 packets sent
TcpExt:
1164 invalid SYN cookies received
31337 packets pruned from receive queue because of socket buffer overrun
4 TCP sockets finished time wait in fast timer
8 packets rejects in established connections because of timestamp
91645 delayed acks sent
1912 delayed acks further delayed because of locked socket
Quick ack mode was activated 2217 times
2 packets directly queued to recvmsg prequeue.
1324477473 packets header predicted
63141338 acknowledgments not containing data received
17043867 predicted acknowledgments
2037 times recovered from packet loss due to fast retransmit
8 times recovered from packet loss due to SACK data
Detected reordering 13 times using reno fast retransmit
Detected reordering 642 times using time stamp
1973 congestion windows fully recovered
16021 congestion windows partially recovered using Hoe heuristic
19 congestion windows recovered after partial ack
0 TCP data loss events
1 timeouts in loss state
225 fast retransmits
3 forward retransmits
153 other TCP timeouts
TCPRenoRecoveryFail: 1
11664170 packets collapsed in receive queue due to low socket buffer
123 DSACKs sent for old packets
70 DSACKs received
132 connections aborted due to timeout
>
> -Andi
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: TCP connection stops after high load.
2007-04-12 17:59 ` Ben Greear
@ 2007-04-12 18:19 ` Eric Dumazet
2007-04-12 19:12 ` Ben Greear
2007-04-13 16:10 ` Daniel Schaffrath
0 siblings, 2 replies; 41+ messages in thread
From: Eric Dumazet @ 2007-04-12 18:19 UTC (permalink / raw)
To: Ben Greear; +Cc: Andi Kleen, netdev, bcrl
On Thu, 12 Apr 2007 10:59:19 -0700
Ben Greear <greearb@candelatech.com> wrote:
>
> Here is a tcpdump of the connection in the stalled state. As you can
> see from the 'time' output, it's running at around 100,000 packets per
> second; tcpdump dropped the vast majority of these. Based on the
> network interface stats, I believe both sides of the connection are
> sending ACKs at about the same rate (about 160 kpps when tcpdump is
> not running, it seems).
Warning: tcpdump can lie, showing you packets as being transmitted
several times. And yes, tcpdump slows things down, because it enables
accurate timestamping of packets.
>
>
> 10:46:46.541490 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541494 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541567 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541653 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541886 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541891 IP 20.20.20.20.33011 > 20.20.20.30.33012: . ack 48 win 6132 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541895 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
> 10:46:46.541988 IP 20.20.20.30.33012 > 20.20.20.20.33011: . ack 1 win 114 <nop,nop,timestamp 85158912 84963208>
>
What do
"tc -s -d qdisc"
"ifconfig -a"
"cat /proc/interrupts"
"cat /proc/net/sockstat" and
"cat /proc/net/softnet_stat" show?
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-12 18:19 ` Eric Dumazet
@ 2007-04-12 19:12 ` Ben Greear
2007-04-12 20:41 ` Eric Dumazet
2007-04-13 16:10 ` Daniel Schaffrath
1 sibling, 1 reply; 41+ messages in thread
From: Ben Greear @ 2007-04-12 19:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andi Kleen, netdev, bcrl
Eric Dumazet wrote:
> What
> "tc -s -d qdisc"
> "ifconfig -a"
> "cat /proc/interrupts"
> "cat /proc/net/sockstat" and
> "cat /proc/net/softnet_stat" are telling ?
In this test, eth2 is talking to eth3, using something similar to this
send-to-self patch:
http://www.candelatech.com/oss/sts.patch
[root@lf1001-240 ipv4]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:30:48:89:74:60
inet addr:192.168.100.187 Bcast:192.168.100.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1672220 errors:0 dropped:0 overruns:0 frame:0
TX packets:1560305 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:151896589 (144.8 MiB) TX bytes:1375280163 (1.2 GiB)
Interrupt:17
eth1 Link encap:Ethernet HWaddr 00:30:48:89:74:61
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:18
eth2 Link encap:Ethernet HWaddr 00:07:E9:1F:CE:02
inet addr:20.20.20.20 Bcast:20.20.20.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2175144684 errors:0 dropped:2 overruns:0 frame:0
TX packets:2196560123 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1321380186 (1.2 GiB) TX bytes:2274982574 (2.1 GiB)
Base address:0xd000 Memory:d0000000-d0020000
eth3 Link encap:Ethernet HWaddr 00:07:E9:1F:CE:03
inet addr:20.20.20.30 Bcast:20.20.20.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2196315966 errors:0 dropped:0 overruns:0 frame:0
TX packets:2174900538 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2257901986 (2.1 GiB) TX bytes:1304493504 (1.2 GiB)
Base address:0xd100 Memory:d0020000-d0040000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1159378 errors:0 dropped:0 overruns:0 frame:0
TX packets:1159378 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1133646590 (1.0 GiB) TX bytes:1133646590 (1.0 GiB)
[root@lf1001-240 ipv4]# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1367521025 bytes 1324808 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev eth1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev eth2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1815657070136 bytes 1536674488 pkt (dropped 0, overlimits 0 requeues 1448094)
rate 0bit 0pps backlog 0b 0p requeues 1448094
qdisc pfifo_fast 0: dev eth3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1752033393324 bytes 1536906566 pkt (dropped 0, overlimits 0 requeues 1063672)
rate 0bit 0pps backlog 0b 0p requeues 1063672
[root@lf1001-240 ipv4]# cat /proc/interrupts
CPU0 CPU1
0: 46020594 44501954 IO-APIC-edge timer
1: 9 0 IO-APIC-edge i8042
7: 0 0 IO-APIC-edge parport0
8: 1 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-level acpi
12: 96 0 IO-APIC-edge i8042
14: 394023 407282 IO-APIC-edge ide0
16: 0 0 IO-APIC-level uhci_hcd:usb4
17: 1134346 1034006 IO-APIC-level uhci_hcd:usb3, eth0
18: 81605 83739 IO-APIC-level libata, uhci_hcd:usb2, eth1
19: 0 0 IO-APIC-level uhci_hcd:usb1, ehci_hcd:usb5
20: 53056128 46598235 IO-APIC-level eth2
21: 47534577 52189674 IO-APIC-level eth3
NMI: 0 0
LOC: 90485383 90485382
ERR: 0
MIS: 0
[root@lf1001-240 ipv4]# cat /proc/net/sockstat
sockets: used 334
TCP: inuse 27 orphan 0 tw 0 alloc 27 mem 360
UDP: inuse 12
RAW: inuse 0
FRAG: inuse 0 memory 0
[root@lf1001-240 ipv4]# cat /proc/net/softnet_stat
d58236f1 00000000 023badc3 00000000 00000000 00000000 00000000 00000000 0004ef01
3a4354a1 00000000 01b57b4b 00000000 00000000 00000000 00000000 00000000 0005445f
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-12 19:12 ` Ben Greear
@ 2007-04-12 20:41 ` Eric Dumazet
2007-04-12 21:36 ` Ben Greear
0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2007-04-12 20:41 UTC (permalink / raw)
To: Ben Greear; +Cc: Andi Kleen, netdev, bcrl
Ben Greear wrote:
> Eric Dumazet wrote:
>
>> What "tc -s -d qdisc"
>> "ifconfig -a"
>> "cat /proc/interrupts" "cat /proc/net/sockstat" and
>> "cat /proc/net/softnet_stat" are telling ?
>
>
> In this test, eth2 is talking to eth3, using something similar to this
> send-to-self patch:
> http://www.candelatech.com/oss/sts.patch
>
> [root@lf1001-240 ipv4]# cat /proc/interrupts
> CPU0 CPU1
> 0: 46020594 44501954 IO-APIC-edge timer
> 1: 9 0 IO-APIC-edge i8042
> 7: 0 0 IO-APIC-edge parport0
> 8: 1 0 IO-APIC-edge rtc
> 9: 0 0 IO-APIC-level acpi
> 12: 96 0 IO-APIC-edge i8042
> 14: 394023 407282 IO-APIC-edge ide0
> 16: 0 0 IO-APIC-level uhci_hcd:usb4
> 17: 1134346 1034006 IO-APIC-level uhci_hcd:usb3, eth0
> 18: 81605 83739 IO-APIC-level libata, uhci_hcd:usb2, eth1
> 19: 0 0 IO-APIC-level uhci_hcd:usb1, ehci_hcd:usb5
> 20: 53056128 46598235 IO-APIC-level eth2
> 21: 47534577 52189674 IO-APIC-level eth3
> NMI: 0 0
> LOC: 90485383 90485382
> ERR: 0
> MIS: 0
>
Hmm, could you try binding the NIC IRQs to separate CPUs?
eth2 -> CPU0 and eth3 -> CPU1
# echo 1 >/proc/irq/20/smp_affinity   # bitmask 0x1 = CPU0 (eth2 is IRQ 20)
# echo 2 >/proc/irq/21/smp_affinity   # bitmask 0x2 = CPU1 (eth3 is IRQ 21)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-12 20:41 ` Eric Dumazet
@ 2007-04-12 21:36 ` Ben Greear
2007-04-13 7:09 ` Evgeniy Polyakov
0 siblings, 1 reply; 41+ messages in thread
From: Ben Greear @ 2007-04-12 21:36 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andi Kleen, netdev, bcrl
Eric Dumazet wrote:
> Hum, could you try to bind nic irqs on separate cpus ?
I just started a run on 2.6.20.4, and so far (~20 minutes) it
is behaving perfectly, running at around 925Mbps in both directions.
CWND averages about 600, bouncing from a low of 300 up to 800, but
that could very well be perfectly normal. I'm quite pleased with
the faster performance in this kernel as well; it seems the old one
would rarely get above 800Mbps even when it was passing traffic!
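(For reference: one way to sample CWND from userspace is the TCP_INFO
getsockopt. The snippet below is a minimal sketch assuming a connected
TCP socket fd; it is illustrative, not the tool that produced the
numbers above.)

#include <linux/tcp.h>		/* struct tcp_info, TCP_INFO */
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <stdio.h>
#include <sys/socket.h>

/* Print the sender's current congestion window and related state. */
static void print_cwnd(int fd)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
		printf("snd_cwnd=%u ssthresh=%u rtt=%u us\n",
		       ti.tcpi_snd_cwnd, ti.tcpi_snd_ssthresh, ti.tcpi_rtt);
}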
I am not sure if the problem is fixed or just harder to hit,
but for now it looks good.
I'm going to also try a 2.6.19 kernel and see if the problem hits there
in an attempt to figure out what patch changed the behaviour.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-12 21:36 ` Ben Greear
@ 2007-04-13 7:09 ` Evgeniy Polyakov
2007-04-13 16:42 ` Ben Greear
0 siblings, 1 reply; 41+ messages in thread
From: Evgeniy Polyakov @ 2007-04-13 7:09 UTC (permalink / raw)
To: Ben Greear; +Cc: Eric Dumazet, Andi Kleen, netdev, bcrl
On Thu, Apr 12, 2007 at 02:36:34PM -0700, Ben Greear (greearb@candelatech.com) wrote:
> I am not sure if the problem is fixed or just harder to hit,
> but for now it looks good.
Wasn't the default congestion control algorithm changed between those
kernel releases?
With such a small RTT as in your setup there could be some obscure bug;
try setting a different algorithm and check whether it still works.
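For a per-socket switch there is the TCP_CONGESTION socket option
(available since 2.6.13); the system-wide default lives in
/proc/sys/net/ipv4/tcp_congestion_control. A minimal sketch, assuming
the chosen algorithm is compiled in or loaded as a module:

#include <netinet/in.h>
#include <netinet/tcp.h>	/* TCP_CONGESTION; <linux/tcp.h> on older libcs */
#include <string.h>
#include <sys/socket.h>

/* Select a congestion control algorithm, e.g. "reno" or "bic",
 * for one socket. Returns 0 on success, -1 on error. */
static int set_cc(int fd, const char *algo)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
			  algo, strlen(algo));
}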
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-13 7:09 ` Evgeniy Polyakov
@ 2007-04-13 16:42 ` Ben Greear
0 siblings, 0 replies; 41+ messages in thread
From: Ben Greear @ 2007-04-13 16:42 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: Eric Dumazet, Andi Kleen, netdev, bcrl
Evgeniy Polyakov wrote:
> On Thu, Apr 12, 2007 at 02:36:34PM -0700, Ben Greear (greearb@candelatech.com) wrote:
>
>> I am not sure if the problem is fixed or just harder to hit,
>> but for now it looks good.
>>
>
> Wasn't the default congestion control algorithm changed between those
> kernel releases?
> With such a small RTT as in your setup there could be some obscure bug;
> try setting a different algorithm and check whether it still works.
>
I had earlier tried changing between bic and reno (the only two I had
compiled into that kernel), and it did not affect anything. I also
realized that I had been reproducing the bug (and the traces I sent to
this list earlier) on a 2.6.17.4 kernel, not 2.6.18 as I had supposed.
So, it's possible that the problem was fixed between 2.6.17.4 and
2.6.18.2 as well.
I also figured out yesterday that rebooting to go to a new kernel makes
it slower to reproduce, even on kernels known to have the problem. This
is probably because lots of memory is available after a reboot. I am
going to set up some long-term tests on 2.6.18, 2.6.19 and 2.6.20 and
let them cook for several days to make sure the problem is truly fixed
in the later kernels.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-12 18:19 ` Eric Dumazet
2007-04-12 19:12 ` Ben Greear
@ 2007-04-13 16:10 ` Daniel Schaffrath
2007-04-13 16:41 ` Eric Dumazet
1 sibling, 1 reply; 41+ messages in thread
From: Daniel Schaffrath @ 2007-04-13 16:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ben Greear, Andi Kleen, netdev, bcrl
On 2007/04/12, at 20:19, Eric Dumazet wrote:
> On Thu, 12 Apr 2007 10:59:19 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>>
>> Here is a tcpdump of the connection in the stalled state. As you
>> can see by
>> the 'time' output, it's running at around 100,000 packets per
>> second. tcpdump
>> dropped the vast majority of these. Based on the network
>> interface stats, I
>> believe both sides of the connection are sending acks at about the
>> same
>> rate (about 160kpps when tcpdump is not running it seems).
>
> Warning: tcpdump can lie, telling you a packet was transmitted
> several times when it was only sent once.
Do you have any further pointers on why tcpdump reports
duplicated packets?
Thanks a lot,
Daniel
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-13 16:10 ` Daniel Schaffrath
@ 2007-04-13 16:41 ` Eric Dumazet
2007-04-14 4:21 ` Herbert Xu
0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2007-04-13 16:41 UTC (permalink / raw)
To: Daniel Schaffrath; +Cc: Ben Greear, Andi Kleen, netdev, bcrl
On Fri, 13 Apr 2007 18:10:12 +0200
Daniel Schaffrath <danielschaffrath@mac.com> wrote:
>
> On 2007/04/12, at 20:19, Eric Dumazet wrote:
> >
> > Warning: tcpdump can lie, telling you a packet was transmitted
> > several times when it was only sent once.
> Do you have any further pointers on why tcpdump reports
> duplicated packets?
>
dev_queue_xmit_nit() is called before attempting to send the packet to the device.
If the device could not accept the packet (hard_start_xmit() returns an error), the packet is requeued and retried later.
Each retry calls dev_queue_xmit_nit() again, so tcpdump/sniffers can 'see' the packet transmitted several times.
This is why I asked for the "tc -s -d qdisc" results: to check the requeue counter (not its absolute value, but relative to the number of packets sent).
See dev_hard_start_xmit() in net/core/dev.c
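In outline (a condensed sketch of the 2.6-era code in net/core/dev.c
and net/sched/sch_generic.c; the function and constant names are real,
the bodies are simplified and locking/error handling are elided):

/* Condensed sketch, not the literal kernel source. */
static int sketch_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* Taps (tcpdump/af_packet) get a copy *before* the driver runs. */
	if (netdev_nit)
		dev_queue_xmit_nit(skb, dev);

	/* The driver may refuse the packet, e.g. NETDEV_TX_BUSY or
	 * NETDEV_TX_LOCKED when its ring or lock is unavailable. */
	return dev->hard_start_xmit(skb, dev);
}

static void sketch_qdisc_restart(struct net_device *dev, struct sk_buff *skb)
{
	if (sketch_hard_start_xmit(skb, dev) != NETDEV_TX_OK)
		/* Requeue and retry later: the tap delivery above runs
		 * again, so sniffers record the same packet twice. */
		dev->qdisc->ops->requeue(skb, dev->qdisc);
}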
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-13 16:41 ` Eric Dumazet
@ 2007-04-14 4:21 ` Herbert Xu
2007-04-14 4:25 ` David Miller
2007-04-14 5:31 ` Eric Dumazet
0 siblings, 2 replies; 41+ messages in thread
From: Herbert Xu @ 2007-04-14 4:21 UTC (permalink / raw)
To: Eric Dumazet; +Cc: danielschaffrath, greearb, andi, netdev, bcrl
Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> dev_queue_xmit_nit() is called before attempting to send the packet to the device.
>
> If the device could not accept the packet (hard_start_xmit() returns an error), the packet is requeued and retried later.
> Each retry calls dev_queue_xmit_nit() again, so tcpdump/sniffers can 'see' the packet transmitted several times.
This should only happen with LLTX drivers. In fact, LLTX drivers are
really more trouble than they're worth. They should all be rewritten
to follow the model used in tg3.
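The contrast, schematically (a sketch only; struct my_priv and its
lock are hypothetical, not taken from any particular driver):

struct my_priv {			/* hypothetical driver state */
	spinlock_t tx_lock;
};

/* LLTX: the core takes no lock before calling hard_start_xmit(), so
 * the driver must trylock its own and may refuse the packet, which
 * forces the requeue/retry path described above. */
static int lltx_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	if (!spin_trylock(&priv->tx_lock))
		return NETDEV_TX_LOCKED;	/* requeued; taps may see a dup */
	/* ... hand skb to the hardware ring ... */
	spin_unlock(&priv->tx_lock);
	return NETDEV_TX_OK;
}

/* Non-LLTX (tg3-style): the core serializes on dev->xmit_lock before
 * calling hard_start_xmit(), so the driver never returns
 * NETDEV_TX_LOCKED and the normal path has nothing to duplicate. */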
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: TCP connection stops after high load.
2007-04-14 4:21 ` Herbert Xu
@ 2007-04-14 4:25 ` David Miller
2007-04-14 5:31 ` Eric Dumazet
1 sibling, 0 replies; 41+ messages in thread
From: David Miller @ 2007-04-14 4:25 UTC (permalink / raw)
To: herbert; +Cc: dada1, danielschaffrath, greearb, andi, netdev, bcrl
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 14 Apr 2007 14:21:44 +1000
> Eric Dumazet <dada1@cosmosbay.com> wrote:
> >
> > dev_queue_xmit_nit() is called before attempting to send the packet to the device.
> >
> > If the device could not accept the packet (hard_start_xmit() returns an error), the packet is requeued and retried later.
> > Each retry calls dev_queue_xmit_nit() again, so tcpdump/sniffers can 'see' the packet transmitted several times.
>
> This should only happen with LLTX drivers. In fact, LLTX drivers are
> really more trouble than they're worth. They should all be rewritten
> to follow the model used in tg3.
Agreed.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-14 4:21 ` Herbert Xu
2007-04-14 4:25 ` David Miller
@ 2007-04-14 5:31 ` Eric Dumazet
2007-04-14 5:37 ` David Miller
1 sibling, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2007-04-14 5:31 UTC (permalink / raw)
To: Herbert Xu; +Cc: danielschaffrath, greearb, andi, netdev, bcrl
Herbert Xu wrote:
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>> dev_queue_xmit_nit() is called before attempting to send the packet to the device.
>>
>> If the device could not accept the packet (hard_start_xmit() returns an error), the packet is requeued and retried later.
>> Each retry calls dev_queue_xmit_nit() again, so tcpdump/sniffers can 'see' the packet transmitted several times.
>
> This should only happen with LLTX drivers. In fact, LLTX drivers are
> really more trouble than they're worth. They should all be rewritten
> to follow the model used in tg3.
When did the tg3 model change, exactly?
Because I remember having this 'problem' with tg3 devices not long ago...
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-14 5:31 ` Eric Dumazet
@ 2007-04-14 5:37 ` David Miller
0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-04-14 5:37 UTC (permalink / raw)
To: dada1; +Cc: herbert, danielschaffrath, greearb, andi, netdev, bcrl
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Sat, 14 Apr 2007 07:31:35 +0200
> When did tg3 model changed exactly ?
June of 2006:
commit 00b7050426da8e7e58c889c5c80a19920d2d41b3
Author: Michael Chan <mchan@broadcom.com>
Date: Sat Jun 17 21:58:45 2006 -0700
[TG3]: Convert to non-LLTX
Herbert Xu pointed out that it is unsafe to call netif_tx_disable()
from LLTX drivers because it uses dev->xmit_lock to synchronize
whereas LLTX drivers use private locks.
Convert tg3 to non-LLTX to fix this issue. tg3 is a lockless driver
where hard_start_xmit and tx completion handling can run concurrently
under normal conditions. A tx_lock is only needed to prevent
netif_stop_queue and netif_wake_queue race conditions when the queue
is full.
So whether we use LLTX or non-LLTX, it makes practically no
difference.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-11 18:50 Ben Greear
2007-04-11 20:26 ` Ben Greear
@ 2007-04-11 20:41 ` David Miller
2007-04-12 6:12 ` Ilpo Järvinen
2 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-04-11 20:41 UTC (permalink / raw)
To: greearb; +Cc: netdev
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 11 Apr 2007 11:50:18 -0700
> So, I would like to dig into this problem myself since no one else
> is reporting this type of problem, but I am quite ignorant of the TCP
> stack implementation. Based on the dup-acks I see on the wire, I assume
> the TCP state machine is messed up somehow. Could anyone point me to
> likely places in the TCP stack to start looking for this bug?
Dup acks mean that packets are being dropped and there are thus holes
in the sequence seen at the receiver.
Likely what happens is that we hit the global memory pressure
limit, start dropping packets, and never recover even after the
memory pressure is within its limits again.
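The limit in question is the three-value tcp_mem sysctl ({low,
pressure, high}, counted in pages). Roughly, the global accounting
behaves like the sketch below (a simplification of
sk_stream_mem_schedule() in net/core/stream.c, with the per-socket
details elided):

/* Simplified model of the global TCP memory thresholds;
 * sysctl_tcp_mem and tcp_memory_pressure are the real 2.6 symbols. */
static int tcp_mem_sketch(long allocated_pages)
{
	if (allocated_pages <= sysctl_tcp_mem[0]) {
		tcp_memory_pressure = 0;  /* under "low": pressure clears */
		return 1;                 /* allocation succeeds */
	}
	if (allocated_pages > sysctl_tcp_mem[1])
		tcp_memory_pressure = 1;  /* over "pressure": economize */
	if (allocated_pages > sysctl_tcp_mem[2])
		return 0;                 /* over "high": allocations fail */
	return 1;
}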
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: TCP connection stops after high load.
2007-04-11 18:50 Ben Greear
2007-04-11 20:26 ` Ben Greear
2007-04-11 20:41 ` David Miller
@ 2007-04-12 6:12 ` Ilpo Järvinen
2 siblings, 0 replies; 41+ messages in thread
From: Ilpo Järvinen @ 2007-04-12 6:12 UTC (permalink / raw)
To: Ben Greear; +Cc: NetDev
On Wed, 11 Apr 2007, Ben Greear wrote:
> The problem is that I set up a TCP connection with bi-directional traffic
> of around 800Mbps, doing large (20k - 64k) writes and reads between two
> ports on the same machine (this 2.6.18.2 kernel is tainted with my full
> patch set, but I also reproduced with only the non-tainted send-to-self
> patch applied last May on the 2.6.16 kernel, so I assume the bug is not
> particular to my patch set).
>
> At first, all is well, but within 5-10 minutes, the TCP connection will
> stall and I only see a massive amount of duplicate ACKs on the link.
> Before, I sometimes saw OOM messages, but this time there are no OOM
> messages. The system has a two-port pro/1000 fibre NIC, 1GB RAM, kernel
> 2.6.18.2 + hacks, etc. Stopping and starting the connection allows
> traffic to flow again (if briefly). Starting a new connection works
> fine even if the old one is still stalled, so it's not a global memory
> exhaustion problem.
>
> So, I would like to dig into this problem myself since no one else
> is reporting this type of problem, but I am quite ignorant of the TCP
> stack implementation. Based on the dup-acks I see on the wire, I assume
> the TCP state machine is messed up somehow. Could anyone point me to
> likely places in the TCP stack to start looking for this bug?
Since you're doing bidirectional traffic, try the patch below (you'll
probably have to apply it manually to the 2.6.18 series, due to
whitespace changes made after it in the net/ hierarchy). I suspect it's
part of the problem, but there could be other things as well, because
this should only hinder TCP before an RTO occurs:
[PATCH] [TCP]: Fix ratehalving with bidirectional flows
Actually, the ratehalving seems to work too well, as cwnd is
reduced on every second ACK even though the packets in flight
remains unchanged. Recoveries in bidirectional flows suffer
quite badly because of this, both NewReno and SACK are affected.
After this patch, rate halving is performed per ACK only if
packets in flight was supposedly changed too.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 23 +++++++++++++----------
1 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 322e43c..bf0f74c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1823,19 +1823,22 @@ static inline u32 tcp_cwnd_min(const str
 }
 
 /* Decrease cwnd each second ack. */
-static void tcp_cwnd_down(struct sock *sk)
+static void tcp_cwnd_down(struct sock *sk, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int decr = tp->snd_cwnd_cnt + 1;
 
-	tp->snd_cwnd_cnt = decr&1;
-	decr >>= 1;
+	if ((flag&FLAG_FORWARD_PROGRESS) ||
+	    (IsReno(tp) && !(flag&FLAG_NOT_DUP))) {
+		tp->snd_cwnd_cnt = decr&1;
+		decr >>= 1;
 
-	if (decr && tp->snd_cwnd > tcp_cwnd_min(sk))
-		tp->snd_cwnd -= decr;
+		if (decr && tp->snd_cwnd > tcp_cwnd_min(sk))
+			tp->snd_cwnd -= decr;
 
-	tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
-	tp->snd_cwnd_stamp = tcp_time_stamp;
+		tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
+		tp->snd_cwnd_stamp = tcp_time_stamp;
+	}
 }
 
 /* Nothing was retransmitted or returned timestamp is less
@@ -2020,7 +2023,7 @@ static void tcp_try_to_open(struct sock
 		}
 		tcp_moderate_cwnd(tp);
 	} else {
-		tcp_cwnd_down(sk);
+		tcp_cwnd_down(sk, flag);
 	}
 }
@@ -2220,7 +2223,7 @@ tcp_fastretrans_alert(struct sock *sk, u
 	if (is_dupack || tcp_head_timedout(sk, tp))
 		tcp_update_scoreboard(sk, tp);
-	tcp_cwnd_down(sk);
+	tcp_cwnd_down(sk, flag);
 	tcp_xmit_retransmit_queue(sk);
 }
--
1.4.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
end of thread, other threads:[~2007-04-17 19:58 UTC | newest]
Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-12 21:11 TCP connection stops after high load Robert Iakobashvili
2007-04-12 21:15 ` David Miller
2007-04-15 12:14 ` Robert Iakobashvili
2007-04-15 15:31 ` John Heffner
2007-04-15 15:49 ` Robert Iakobashvili
2007-04-16 18:07 ` John Heffner
2007-04-16 18:51 ` Robert Iakobashvili
2007-04-16 19:11 ` John Heffner
2007-04-16 19:17 ` David Miller
2007-04-16 19:15 ` David Miller
2007-04-17 7:58 ` Robert Iakobashvili
2007-04-17 19:39 ` David Miller
2007-04-17 19:47 ` John Heffner
2007-04-17 19:51 ` David Miller
2007-04-17 19:58 ` Robert Iakobashvili
2007-04-15 13:52 ` Robert Iakobashvili
-- strict thread matches above, loose matches on Subject: below --
2007-04-11 18:50 Ben Greear
2007-04-11 20:26 ` Ben Greear
2007-04-11 20:48 ` David Miller
2007-04-11 21:06 ` Ben Greear
2007-04-11 21:11 ` David Miller
2007-04-11 21:31 ` Ben Greear
2007-04-11 21:39 ` David Miller
2007-04-12 2:44 ` SANGTAE HA
2007-04-12 1:06 ` Benjamin LaHaise
2007-04-12 14:48 ` Andi Kleen
2007-04-12 17:59 ` Ben Greear
2007-04-12 18:19 ` Eric Dumazet
2007-04-12 19:12 ` Ben Greear
2007-04-12 20:41 ` Eric Dumazet
2007-04-12 21:36 ` Ben Greear
2007-04-13 7:09 ` Evgeniy Polyakov
2007-04-13 16:42 ` Ben Greear
2007-04-13 16:10 ` Daniel Schaffrath
2007-04-13 16:41 ` Eric Dumazet
2007-04-14 4:21 ` Herbert Xu
2007-04-14 4:25 ` David Miller
2007-04-14 5:31 ` Eric Dumazet
2007-04-14 5:37 ` David Miller
2007-04-11 20:41 ` David Miller
2007-04-12 6:12 ` Ilpo Järvinen