From: Gaspar Chilingarov
Subject: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
Date: Wed, 21 Apr 2010 03:17:08 +0500
To: netdev

[1.] A large number of outgoing TCP connections fail to bind properly to their IP/ports

[2.] Full description of the problem/report:

I'm trying to establish a huge number of outgoing TCP connections (several hundred thousand) from a single machine. I need to load-test a server which should be able to handle that many connections :)

The number of connections that can be established from a single IP is governed by
net.ipv4.ip_local_port_range = 32768 61000
which gives 28233 local ports (the range is inclusive), and so that many connections. Good.

I expect each socket to be identified on the local side by a unique local_ip:local_port pair. So I've added some more IP addresses (say 10) to the machine, as aliases on the same network interface, and I expect to be able to establish 10 times more connections than before. (I know about the per-process file descriptor limit, the system-wide limit on the total number of file descriptors and so on; these are already tuned to high values.)

And here the fun part begins: the local port range of the first source IP (say 10.0.0.10) is fully used, with about 28,000 connections all in the ESTABLISHED state. Now I try to establish one more connection with nc, specifying 10.0.0.11 as the source IP, and I get an "unable to bind" error.

Notes about the example: 10.0.0.1:8192 is a server which just accepts connections and holds them open forever.
It's written in Erlang and can handle great loads, so there are no problems on that side. Using the same script I was able to establish more than 20,000 connections without any problems (with the standard local port range set).

To make the experiment easily reproducible I've done the following.

Decreased the number of available local ports to 1001:
net.ipv4.ip_local_port_range = 60000 61000

I have a script like this (writing from memory):

#!/bin/sh
I=0
IP=10.0.0.10

# connection stats before run
netstat -n | grep ESTABLISHED | fgrep "$IP" | wc -l

# 1001 attempts (I = 0..1000), one per port in the range
while [ $I -le 1000 ]; do
    # run nc in background, suppress any output
    nc -s "$IP" 10.0.0.1 8192 > /dev/null 2>&1 &
    I=$(($I + 1))
done

# connection stats after run
netstat -n | grep ESTABLISHED | fgrep "$IP" | wc -l

EVEN on the first run I get only 990 successful connections! Something fails, which is strange: nc 10.0.0.1 8192 fails with the error "unable to bind" and establishes a connection only on the 5th-10th try.

OK, well: run the script again until you have all 1001 possible connections, and then change the source IP to 10.0.0.11.

If you run it several times you will get roughly the following numbers of established connections after each run (for the given source IP): ~650, ~870, 950, 980, 990, 995, 995, 1000, and then several more runs to reach 1001.

Then, if you change the IP to the next available one and run it again, you will get practically the same numbers, and this continues for the 3rd, 4th, 5th and further IPs.

As a programmer, I feel that there is some hash table for local_ip:local_port pairs in the kernel (maybe also incorporating the PID) which has collisions, and in case of a collision it just fails to reserve/bind this pair for the socket. I hope I'm right, but I've failed to find where the allocation is done :)

When the PID does not change the picture is much worse: I've also tried running the tests from a primitive client in Erlang, and there getting a new socket becomes simply impossible.
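
For anyone repeating the experiment, here is a minimal sketch of two helpers for the counting steps used above. The function names port_count and count_established_by_ip are hypothetical (they are not part of the original script), and the awk filter assumes the usual Linux "netstat -n" column layout, with the local address in the 4th column and the state in the 6th.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original report.

# port_count LOW HIGH
# Number of local ports in an inclusive range, e.g. the two values of
# net.ipv4.ip_local_port_range: "60000 61000" gives 1001.
port_count() {
    echo $(( $2 - $1 + 1 ))
}

# count_established_by_ip
# Reads "netstat -n" style output on stdin and prints one line per
# local IPv4 address: "<local_ip> <number of ESTABLISHED connections>".
# Assumes column 4 is local_ip:local_port and column 6 is the state.
count_established_by_ip() {
    awk '$1 == "tcp" && $6 == "ESTABLISHED" {
             split($4, addr, ":")   # addr[1] = local IP, addr[2] = port
             count[addr[1]]++
         }
         END { for (ip in count) print ip, count[ip] }' | LC_ALL=C sort
}

# Example usage (commented out so that sourcing this file has no effect):
#   port_count 60000 61000
#   netstat -n | count_established_by_ip
```

Running "netstat -n | count_established_by_ip" after each pass of the script shows every source IP at once, so it is easy to see whether 10.0.0.10 is full while 10.0.0.11 still has free ports at the moment a bind fails.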

I think that even if only one port is available for that IP, it should be possible to bind (even if the kernel has to do a full scan of the local port range to find that unused port).

I would be grateful for hints on where to look in the source; maybe I can produce some working patches for it.

[3.] Keywords (i.e., modules, networking, kernel):
does not matter, I think.

[4.] Kernel version (from /proc/version):
Ubuntu Karmic Koala on amd64 with the latest shipped kernel.
Linux version 2.6.31-21-generic (buildd@yellow) (gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) ) #59-Ubuntu SMP Wed Mar 24 07:28:27 UTC 2010

[5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt)
n/a

[6.] A small shell script or example program which triggers the problem (if possible)

[7.] Environment

Thanks in advance,
Gaspar

--
Gaspar Chilingarov
tel +37493 419763 (mobile - leave voice mail message)
icq 63174784
skype://gasparch
e mailto:nm@web.am mailto:gasparch@gmail.com
w http://gasparchilingarov.com/