TCP kernel tables overflowing after sustained 1000 new connections per second

public inbox for linux-kernel@vger.kernel.org

* TCP kernel tables overflowing after sustained 1000 new connections per second
@ 2009-09-09 18:46 Paul Sheer
  2009-09-09 19:16 ` Chuck Ebbert
  2009-09-10  0:08 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Paul Sheer @ 2009-09-09 18:46 UTC
  To: linux-kernel, roque

I am developing a high-performance application, and testing against Apache.
It makes 1000 new connections to Apache per second.

After 16 seconds the test grinds to a halt, which looks like a Linux kernel
problem. There are several hurdles to overcome when trying to sustain such
throughput. Some are configuration issues; others, I believe, are real
problems with the kernel internals. I'll discuss them all below.

Configuration:

These are the relevant kernel configuration parameters:

 /proc/sys/net/ipv4/tcp_tw_recycle
 /proc/sys/net/ipv4/tcp_tw_reuse
 /proc/sys/net/ipv4/tcp_max_tw_buckets
 /proc/sys/net/ipv4/ip_local_port_range
 /proc/sys/net/ipv4/tcp_timestamps
 /proc/sys/net/ipv4/tcp_fin_timeout
 /proc/sys/net/ipv4/tcp_orphan_retries
 /proc/sys/net/ipv4/tcp_rfc1337
 /proc/sys/net/ipv4/tcp_max_orphans
 /proc/sys/net/ipv4/tcp_max_syn_backlog
 /proc/sys/net/ipv4/tcp_mem

On a gigabit local LAN I can set the timeouts very low to encourage
port reuse. This is a well-known configuration issue on all OSes; just
search for MyOS+TIME_WAIT on Google. No problems here.
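
For reference, a minimal sketch for dumping the current values of the
parameters listed above. The helper name show_tunables and its directory
argument are made up for illustration; entries that a given kernel does
not provide are simply skipped.

```shell
#!/bin/sh
# show_tunables DIR NAME...: print "NAME = VALUE" for each readable file
# DIR/NAME; entries that do not exist on this kernel are skipped.
show_tunables() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            # tr flattens tab-separated multi-value entries (e.g. tcp_mem)
            printf '%s = %s\n' "$name" "$(tr '\t' ' ' < "$dir/$name")"
        fi
    done
}

show_tunables /proc/sys/net/ipv4 \
    tcp_tw_recycle tcp_tw_reuse tcp_max_tw_buckets ip_local_port_range \
    tcp_timestamps tcp_fin_timeout tcp_orphan_retries tcp_rfc1337 \
    tcp_max_orphans tcp_max_syn_backlog tcp_mem
```

Saving this output before and after tuning makes it easier to see which
change actually affected a test run.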

The second problem is the ip_conntrack module.

If you don't know that your distribution has enabled this module
by default, it is not easy to work out that it has internal tables
that max out at 16384 entries. This explains why my system
stops accepting connections after exactly 16 seconds.
If you stop the application, give it a few minutes, and try again,
you can do another 16 seconds of flat-out load before it
grinds to a halt again. Deleting the module's .ko file and
rebooting fixed *this* problem.
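
The 16-second figure matches simple arithmetic: a 16384-entry table
filling at 1000 new connections per second lasts about 16 seconds. A
rough sketch for checking this on a given box follows. The function
names are made up for illustration, and the list of /proc paths is an
assumption, since the location of the limit differs between the older
ip_conntrack module and the newer nf_conntrack one.

```shell
#!/bin/sh
# seconds_to_fill MAX RATE: whole seconds until a table of MAX entries is
# full when RATE new entries arrive per second and none expire meanwhile.
seconds_to_fill() {
    echo $(( $1 / $2 ))
}

# conntrack_max: print the first conntrack limit found. The candidate
# paths are assumptions; which one exists depends on the kernel version.
conntrack_max() {
    for f in /proc/sys/net/netfilter/nf_conntrack_max \
             /proc/sys/net/ipv4/netfilter/ip_conntrack_max \
             /proc/sys/net/ipv4/ip_conntrack_max; do
        if [ -r "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    return 1
}

rate=1000    # new connections per second, as in the test described above
if max=$(conntrack_max); then
    echo "conntrack limit $max: full after about $(seconds_to_fill "$max" "$rate") s"
else
    echo "no conntrack limit found (module not loaded?)"
fi
```

Raising the limit through the same file is a gentler alternative to
deleting the module, if connection tracking is needed for anything else.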

The third problem seems to be connected to /proc/net/tcp6.

Look at the output of the script

while true ; do echo "`date`: `cat /proc/net/tcp6 | wc -l`  vs  `cat /proc/net/tcp | wc -l`" ; sleep 1 ; done

while I run my load test:

 Wed Sep  9 20:39:26 SAST 2009: 5  vs  20
 Wed Sep  9 20:39:27 SAST 2009: 5  vs  20
 Wed Sep  9 20:39:28 SAST 2009: 5  vs  20
 Wed Sep  9 20:39:29 SAST 2009: 5  vs  20
 Wed Sep  9 20:39:31 SAST 2009: 1233  vs  20
 Wed Sep  9 20:39:32 SAST 2009: 2640  vs  21
 Wed Sep  9 20:39:33 SAST 2009: 4190  vs  20
 Wed Sep  9 20:39:34 SAST 2009: 5813  vs  20
 Wed Sep  9 20:39:35 SAST 2009: 7527  vs  20
 Wed Sep  9 20:39:37 SAST 2009: 9568  vs  44
 Wed Sep  9 20:39:38 SAST 2009: 11819  vs  21
 Wed Sep  9 20:39:40 SAST 2009: 14510  vs  21
 Wed Sep  9 20:39:42 SAST 2009: 16971  vs  20
 Wed Sep  9 20:39:44 SAST 2009: 16971  vs  20
 Wed Sep  9 20:39:46 SAST 2009: 17013  vs  20
 Wed Sep  9 20:39:48 SAST 2009: 17013  vs  20
 Wed Sep  9 20:39:50 SAST 2009: 17013  vs  20

So it is clear "something" is filling up in tcp_ipv6.c

any ideas Pedro?
anyone?

Many thanks.

-paul


* Re: TCP kernel tables overflowing after sustained 1000 new connections per second
  2009-09-09 18:46 TCP kernel tables overflowing after sustained 1000 new connections per second Paul Sheer
@ 2009-09-09 19:16 ` Chuck Ebbert
  2009-09-10  0:08 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Chuck Ebbert @ 2009-09-09 19:16 UTC
  To: Paul Sheer; +Cc: linux-kernel, roque

On Wed, 9 Sep 2009 20:46:07 +0200
Paul Sheer <paulsheer@gmail.com> wrote:

> I am developing a high-performance application, and testing against Apache.
> It makes 1000 new connections to Apache per second.
> 
> After 16 seconds the test grinds to a halt. A Linux kernel problem. There are
> several hurdles to overcome when trying to sustain such through-put. Some
> are configuration issues, others I believe are real problems with the kernel
> internals. I'll discuss these all below.
> 

You should send this to netdev@vger.kernel.org .


* Re: TCP kernel tables overflowing after sustained 1000 new connections per second
  2009-09-09 18:46 TCP kernel tables overflowing after sustained 1000 new connections per second Paul Sheer
  2009-09-09 19:16 ` Chuck Ebbert
@ 2009-09-10  0:08 ` David Miller
  2009-09-10  0:26   ` Brian Haley
  2009-09-10  9:24   ` Andi Kleen
  1 sibling, 2 replies; 5+ messages in thread
From: David Miller @ 2009-09-10  0:08 UTC
  To: paulsheer; +Cc: linux-kernel, roque, netdev

From: Paul Sheer <paulsheer@gmail.com>
Date: Wed, 9 Sep 2009 20:46:07 +0200

Can you please send networking reports and questions at least
CC:'d to netdev@vger.kernel.org, which is where the networking
developers are subscribed?  I've added it to the CC:

> I am developing a high-performance application, and testing against Apache.
> It makes 1000 new connections to Apache per second.
> 
> After 16 seconds the test grinds to a halt. A Linux kernel problem. There are
> several hurdles to overcome when trying to sustain such through-put. Some
> are configuration issues, others I believe are real problems with the kernel
> internals. I'll discuss these all below.
> 
> Configuration:
> 
> These are the relevant kernel configuration parameters:
> 
>  /proc/sys/net/ipv4/tcp_tw_recycle
>  /proc/sys/net/ipv4/tcp_tw_reuse
>  /proc/sys/net/ipv4/tcp_max_tw_buckets
>  /proc/sys/net/ipv4/ip_local_port_range
>  /proc/sys/net/ipv4/tcp_timestamps
>  /proc/sys/net/ipv4/tcp_fin_timeout
>  /proc/sys/net/ipv4/tcp_orphan_retries
>  /proc/sys/net/ipv4/tcp_rfc1337
>  /proc/sys/net/ipv4/tcp_max_orphans
>  /proc/sys/net/ipv4/tcp_max_syn_backlog
>  /proc/sys/net/ipv4/tcp_mem
> 
> On a gigabit local LAN I can set the timeouts very low to encourage
> port reuse. A well known configuration issue with all OS's - just search
> for MyOS+TIMED_WAIT on google. No problems here.
> 
> 
> The second problem is the ip_conntrack module.
> 
> If you don't know that your distribution has enabled this module
> by default, it is not easy to work out that it has internal tables
> that max out at 16384 entries. This explains why my system
> stops accepting connections after exactly 16 seconds.
> If you stop the application, give it a few minutes, and try again,
> you can do another 16 seconds of flat-out load before it
> grinds to a halt again. Deleting the module's .ko file and
> rebooting fixed *this* problem.
> 
> The third problem seems to be connected to /proc/net/tcp6
> 
> look at the output of the script
> 
> while true ; do echo "`date`: `cat /proc/net/tcp6 | wc -l`  vs  `cat /proc/net/tcp | wc -l`" ; sleep 1 ; done
> 
> while I run my load test:
> 
> 
>  Wed Sep  9 20:39:26 SAST 2009: 5  vs  20
>  Wed Sep  9 20:39:27 SAST 2009: 5  vs  20
>  Wed Sep  9 20:39:28 SAST 2009: 5  vs  20
>  Wed Sep  9 20:39:29 SAST 2009: 5  vs  20
>  Wed Sep  9 20:39:31 SAST 2009: 1233  vs  20
>  Wed Sep  9 20:39:32 SAST 2009: 2640  vs  21
>  Wed Sep  9 20:39:33 SAST 2009: 4190  vs  20
>  Wed Sep  9 20:39:34 SAST 2009: 5813  vs  20
>  Wed Sep  9 20:39:35 SAST 2009: 7527  vs  20
>  Wed Sep  9 20:39:37 SAST 2009: 9568  vs  44
>  Wed Sep  9 20:39:38 SAST 2009: 11819  vs  21
>  Wed Sep  9 20:39:40 SAST 2009: 14510  vs  21
>  Wed Sep  9 20:39:42 SAST 2009: 16971  vs  20
>  Wed Sep  9 20:39:44 SAST 2009: 16971  vs  20
>  Wed Sep  9 20:39:46 SAST 2009: 17013  vs  20
>  Wed Sep  9 20:39:48 SAST 2009: 17013  vs  20
>  Wed Sep  9 20:39:50 SAST 2009: 17013  vs  20
> 
> So it is clear "something" is filling up in tcp_ipv6.c
> 
> any ideas Pedro?
> anyone?
> 
> Many thanks.
> 
> -paul


* Re: TCP kernel tables overflowing after sustained 1000 new connections per second
  2009-09-10  0:08 ` David Miller
@ 2009-09-10  0:26   ` Brian Haley
  2009-09-10  9:24   ` Andi Kleen
  1 sibling, 0 replies; 5+ messages in thread
From: Brian Haley @ 2009-09-10  0:26 UTC
  To: paulsheer; +Cc: David Miller, linux-kernel, roque, netdev

>> The third problem seems to be connected to /proc/net/tcp6
>>
>> look at the output of the script
>>
>> while true ; do echo "`date`: `cat /proc/net/tcp6 | wc -l`  vs  `cat /proc/net/tcp | wc -l`" ; sleep 1 ; done
>>
>> while I run my load test:
>>
>>
>>  Wed Sep  9 20:39:26 SAST 2009: 5  vs  20
>>  Wed Sep  9 20:39:27 SAST 2009: 5  vs  20
>>  Wed Sep  9 20:39:28 SAST 2009: 5  vs  20
>>  Wed Sep  9 20:39:29 SAST 2009: 5  vs  20
>>  Wed Sep  9 20:39:31 SAST 2009: 1233  vs  20
>>  Wed Sep  9 20:39:32 SAST 2009: 2640  vs  21
>>  Wed Sep  9 20:39:33 SAST 2009: 4190  vs  20
>>  Wed Sep  9 20:39:34 SAST 2009: 5813  vs  20
>>  Wed Sep  9 20:39:35 SAST 2009: 7527  vs  20
>>  Wed Sep  9 20:39:37 SAST 2009: 9568  vs  44
>>  Wed Sep  9 20:39:38 SAST 2009: 11819  vs  21
>>  Wed Sep  9 20:39:40 SAST 2009: 14510  vs  21
>>  Wed Sep  9 20:39:42 SAST 2009: 16971  vs  20
>>  Wed Sep  9 20:39:44 SAST 2009: 16971  vs  20
>>  Wed Sep  9 20:39:46 SAST 2009: 17013  vs  20
>>  Wed Sep  9 20:39:48 SAST 2009: 17013  vs  20
>>  Wed Sep  9 20:39:50 SAST 2009: 17013  vs  20
>>
>> So it is clear "something" is filling up in tcp_ipv6.c

By default, Apache is going to open an IPv6 socket, so every connection,
even an IPv4 one, will use an IPv6 socket with a mapped address:

# netstat -anp | grep apache
tcp6       0      0 :::80                   :::*                    LISTEN     27795/apache2       
tcp6       0      0 ::ffff:10.0.0.1:80   ::ffff:10.0.0.2:35271      ESTABLISHED27813/apache2

I'm guessing that 17013 is your 16384 plus a few in time-wait, right?
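
One way to check that guess is to break the table down by state rather
than just counting lines. A minimal sketch, assuming the usual
/proc/net/tcp layout in which the fourth column is the socket state in
hex; the function name count_states is made up for illustration:

```shell
#!/bin/sh
# count_states FILE: tally sockets per TCP state from a /proc/net/tcp or
# /proc/net/tcp6 style table. The state is the fourth column, in hex
# (01 = ESTABLISHED, 06 = TIME_WAIT, 0A = LISTEN). The header is skipped.
count_states() {
    awk 'NR > 1 { n[$4]++ }
         END    { for (s in n) print s, n[s] }' "$1" | LC_ALL=C sort
}

if [ -r /proc/net/tcp6 ]; then
    count_states /proc/net/tcp6
fi
```

A large count next to 06 during the load test would mean most of the
entries are sockets sitting in time-wait.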

There's a way to change it to be IPv4-only in the conf file from what I remember.
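
If memory serves, the directive in question is Listen: giving it an
explicit IPv4 address instead of a bare port should keep Apache from
opening the IPv6 socket. A sketch, assuming port 80; the file it lives
in (httpd.conf, ports.conf, ...) depends on the distribution:

```apache
# Bind to IPv4 only instead of the dual-stack ":::80" socket
Listen 0.0.0.0:80
```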

-Brian



* Re: TCP kernel tables overflowing after sustained 1000 new connections per second
  2009-09-10  0:08 ` David Miller
  2009-09-10  0:26   ` Brian Haley
@ 2009-09-10  9:24   ` Andi Kleen
  1 sibling, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2009-09-10  9:24 UTC
  To: David Miller; +Cc: paulsheer, linux-kernel, roque, netdev

> On a gigabit local LAN I can set the timeouts very low to encourage
> port reuse. A well known configuration issue with all OS's - just search
> for MyOS+TIMED_WAIT on google. No problems here.

The timeouts are what they are for a reason: to detect old packets in
the network and prevent data corruption. That's why the RFCs require
them.

Unless you never run on WANs or have very strong data integrity checking
in your application (e.g. SSL), it's normally not a good idea to mess
with them.

When you run out of port space you should use more local IP addresses.

Possibly if you don't have problems with firewalls you could
also increase the port space, but that's still limited.
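
To put rough numbers on the port-space limit: each closed connection
keeps its local port tied up for the time-wait period (60 seconds on
Linux), so one local address talking to one server address and port can
sustain at most (number of local ports) / 60 new connections per second.
A back-of-the-envelope sketch, assuming the connecting side is the one
left holding the time-wait state; the function name is made up for
illustration:

```shell
#!/bin/sh
# max_conn_rate LOW HIGH TW_SECS: upper bound on new connections per
# second from one local IP to one remote IP:port, when every closed
# connection holds its local port in TIME_WAIT for TW_SECS seconds.
max_conn_rate() {
    echo $(( ($2 - $1 + 1) / $3 ))
}

range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    # The file holds two whitespace-separated numbers: low and high port
    read -r low high < "$range_file"
    echo "ports $low-$high: at most $(max_conn_rate "$low" "$high" 60) conn/s"
fi
```

With a range of 32768-61000, for example, this gives about 470 per
second, well short of 1000. Each additional local IP address brings its
own port space, which is why it raises the ceiling.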

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
