netdev.vger.kernel.org archive mirror
* TCP connection stops after high load.
@ 2007-04-12 21:11 Robert Iakobashvili
  2007-04-12 21:15 ` David Miller
  0 siblings, 1 reply; 41+ messages in thread
From: Robert Iakobashvili @ 2007-04-12 21:11 UTC (permalink / raw)
  To: netdev; +Cc: Ben Greear

Hi Ben,

On 4/11/07, Ben Greear <greearb@candelatech.com> wrote:
>  The problem is that I set up a TCP connection with bi-directional traffic
> of around 800Mbps, doing large (20k - 64k writes and reads) between two ports on
> the same machine (this 2.6.18.2  kernel is tainted with my full patch set,
> but I also reproduced with only the non-tainted send-to-self patch applied
> last may on the 2.6.16 kernel, so I assume the bug is not particular to my patch
> set).
>
>  At first, all is well, but within 5-10 minutes, the TCP connection will stall
> and I only see a massive amount of duplicate ACKs on the link.
>

Just today I ran into problems with a setup in which a lighttpd server
(epoll demultiplexing and an increased max-fds limit) runs against
curl-loader, which generates the HTTP client load, both on the same host.

curl-loader adds 1000-8000 secondary IPv4 addresses to the eth0
interface. It then opens 20-200 virtual HTTP clients per second until the
steady-state number is reached. Each client opens its socket, binds to a
secondary IP address, connects to the web server, and then issues HTTP
GET/POST requests and reads the responses.
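
The per-client sequence described above (open a socket, bind it to one of
the secondary addresses, then connect) can be sketched as follows. This is
only an illustration in Python, not curl-loader's actual code; the function
name connect_from and the addresses in the usage comment are hypothetical.

```python
import socket


def connect_from(source_ip, dest, timeout=5.0):
    """Open a TCP socket, bind it to source_ip, and connect to dest.

    source_ip must already be configured on a local interface (for
    example as a secondary address on eth0). Port 0 lets the kernel
    pick an ephemeral source port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.bind((source_ip, 0))   # choose the source address explicitly
        sock.connect(dest)          # dest is a (host, port) tuple
    except OSError:
        sock.close()
        raise
    return sock


# Hypothetical usage, one connection per secondary address:
#   clients = [connect_from(ip, ("192.0.2.1", 80))
#              for ip in ("192.0.2.10", "192.0.2.11")]
```

Binding before connect is what makes each virtual client appear to the
server as a distinct source IP, even though all of them share one host.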

It works well with 2.6.11.8 and the Debian 2.6.18.3-i686 image.

On the same Intel Pentium 4 PC, with about the same kernel configuration
(make oldconfig using Debian's config-2.6.18.3-i686), the setup fails with
the TCP connections stalled after 1000 established connections when the
kernel is 2.6.20.6 or 2.6.19.5.

It stalls even earlier when lighttpd is used with the default (poll())
demultiplexing, after 500 connections, or when the apache2 web server is
used (memory?), after 100 connections.

I am going to try vanilla 2.6.18.3 next and, if it also fails there, to
look through the Debian patches to figure out what the delta is.

strace and the logs have revealed two failure scenarios.
Connections are established successfully and:
- request sent and there is no response;
- partial response received and the connection stalls.
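
One way to tell these two scenarios apart from the client side is to read
with a timeout and count the bytes that arrive. The sketch below is an
assumption about how such a probe could look, not what curl-loader does;
classify_response and the expected_len parameter are hypothetical, and the
caller is assumed to know how many response bytes to expect.

```python
import socket


def classify_response(host, port, request, expected_len, timeout=2.0):
    """Send request, then read until expected_len bytes, EOF, or timeout.

    Returns "no_response" if nothing arrived, "partial" if some but not
    all expected bytes arrived, and "complete" otherwise. A peer that
    closes the connection without sending data is also reported as
    "no_response".
    """
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            while received < expected_len:
                chunk = sock.recv(65536)
                if not chunk:
                    break               # peer closed the connection
                received += len(chunk)
        except socket.timeout:
            pass                        # no data within the timeout: stalled
    if received == 0:
        return "no_response"
    if received < expected_len:
        return "partial"
    return "complete"
```

The timeout has to be well above the normal response time under load,
otherwise a slow server is misreported as a stalled one.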

I will also try to capture some streams with tcpdump, filtering by a
client-side source IP.

I have already tried switching from BIC to Reno, which did not help, and
loading over the loopback (lo), which gives the same picture.
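
The message does not say how the switch from BIC to Reno was made; the
system-wide net.ipv4.tcp_congestion_control sysctl is the usual route. As
a per-socket alternative, Linux also offers the TCP_CONGESTION socket
option, which lets a single test connection use Reno without touching the
rest of the host. A minimal sketch, assuming a Linux host and a Python
build that exposes socket.TCP_CONGESTION; the helper name
set_congestion_control is hypothetical.

```python
import socket


def set_congestion_control(sock, name):
    """Try to select a congestion control algorithm for one TCP socket.

    Returns the algorithm name the kernel reports afterwards, or None if
    the option is not available on this platform or the kernel rejects
    the requested name.
    """
    opt = getattr(socket, "TCP_CONGESTION", None)  # Linux-only constant
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name; keep the part before the NUL.
    return raw.split(b"\x00", 1)[0].decode("ascii")


# Hypothetical usage: call before connect() on the client socket.
#   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
#   print(set_congestion_control(s, "reno"))
```

Reading the value back with getsockopt confirms which algorithm is really
in effect for that connection, which is worth checking before concluding
that the algorithm makes no difference.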

Don't feel alone; it may be the same problem that we are encountering.

Sincerely,
 Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...................................................................
Navigare necesse est, vivere non est necesse
...................................................................
http://curl-loader.sourceforge.net
An open-source HTTP/S, FTP/S traffic
generating, and web testing tool.

* TCP connection stops after high load.
@ 2007-04-11 18:50 Ben Greear
  2007-04-11 20:26 ` Ben Greear
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Ben Greear @ 2007-04-11 18:50 UTC (permalink / raw)
  To: NetDev

Back in May of last year, I reported this problem, but worked
around it at the time by changing the kernel memory settings
in the networking stack.  I reproduced the problem again today
with the previously working kernel memory settings, which is not
surprising since I just papered over the bug last time.

The problem is that I set up a TCP connection with bi-directional traffic
of around 800Mbps, doing large (20k - 64k writes and reads) between two ports on
the same machine (this 2.6.18.2 kernel is tainted with my full patch set,
but I also reproduced with only the non-tainted send-to-self patch applied
last May on the 2.6.16 kernel, so I assume the bug is not particular to my patch
set).

At first, all is well, but within 5-10 minutes, the TCP connection will stall
and I only see a massive amount of duplicate ACKs on the link.  Before,
I sometimes saw OOM messages, but this time there are no OOM messages.  The system
has a two-port pro/1000 fibre NIC, 1GB RAM, kernel 2.6.18.2 + hacks, etc.
Stopping and starting the connection allows traffic to flow again (if briefly).
Starting a new connection works fine even if the old one is still stalled,
so it's not a global memory exhaustion problem.

So, I would like to dig into this problem myself since no one else
is reporting this type of problem, but I am quite ignorant of the TCP
stack implementation.  Based on the dup-acks I see on the wire, I assume
the TCP state machine is messed up somehow.  Could anyone point me to
likely places in the TCP stack to start looking for this bug?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



end of thread, other threads:[~2007-04-17 19:58 UTC | newest]

Thread overview: 41+ messages
2007-04-12 21:11 TCP connection stops after high load Robert Iakobashvili
2007-04-12 21:15 ` David Miller
2007-04-15 12:14   ` Robert Iakobashvili
2007-04-15 15:31     ` John Heffner
2007-04-15 15:49       ` Robert Iakobashvili
2007-04-16 18:07         ` John Heffner
2007-04-16 18:51           ` Robert Iakobashvili
2007-04-16 19:11             ` John Heffner
2007-04-16 19:17               ` David Miller
2007-04-16 19:15             ` David Miller
2007-04-17  7:58               ` Robert Iakobashvili
2007-04-17 19:39                 ` David Miller
2007-04-17 19:47                   ` John Heffner
2007-04-17 19:51                     ` David Miller
2007-04-17 19:58                   ` Robert Iakobashvili
2007-04-15 13:52   ` Robert Iakobashvili
  -- strict thread matches above, loose matches on Subject: below --
2007-04-11 18:50 Ben Greear
2007-04-11 20:26 ` Ben Greear
2007-04-11 20:48   ` David Miller
2007-04-11 21:06     ` Ben Greear
2007-04-11 21:11       ` David Miller
2007-04-11 21:31         ` Ben Greear
2007-04-11 21:39           ` David Miller
2007-04-12  2:44           ` SANGTAE HA
2007-04-12  1:06       ` Benjamin LaHaise
2007-04-12 14:48       ` Andi Kleen
2007-04-12 17:59         ` Ben Greear
2007-04-12 18:19           ` Eric Dumazet
2007-04-12 19:12             ` Ben Greear
2007-04-12 20:41               ` Eric Dumazet
2007-04-12 21:36                 ` Ben Greear
2007-04-13  7:09                   ` Evgeniy Polyakov
2007-04-13 16:42                     ` Ben Greear
2007-04-13 16:10             ` Daniel Schaffrath
2007-04-13 16:41               ` Eric Dumazet
2007-04-14  4:21                 ` Herbert Xu
2007-04-14  4:25                   ` David Miller
2007-04-14  5:31                   ` Eric Dumazet
2007-04-14  5:37                     ` David Miller
2007-04-11 20:41 ` David Miller
2007-04-12  6:12 ` Ilpo Järvinen
