From: Thomas Graf <tgraf@infradead.org>
To: netdev@vger.kernel.org
Subject: [RFC] random SYN drops causing connect() delays
Date: Mon, 12 Apr 2010 04:06:33 -0400
Message-ID: <20100412080633.GA27418@bombadil.infradead.org>

Hello,

I have been tracking down an issue commonly referred to as the 3-sec
connect() delay. It has been present in 2.6.x kernels and has never
actually been fixed: it merely disappeared in recent releases, and it
comes back as soon as sched_child_runs_first is set to 1 again.

What happens is that if a client attempts to open many connections to
a socket with only minimal delay between attempts, some SYNs are
randomly dropped on the server side. The client then resends the SYN
after the 3 sec TCP timeout, so connect()s are randomly delayed.

Steps to reproduce:
 1. Compile the reproducer attached below
 2. Run ./test_delay 127.0.0.1 22 10000 0 > log
 3. awk -F: '{if ($2>2990) print $1 $2;}' log
 4. All connection attempts listed were delayed by roughly 3 s or more
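
The filter in step 3 can also be expressed in C, e.g. when the check is
to be built into a test harness. This is only a sketch: the helper names
parse_delay_ms() and is_delayed() are made up for illustration, and it
assumes log lines in exactly the format printed by the reproducer, with
the millisecond value after the last ':'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DELAY_THRESHOLD_MS 2990  /* same cutoff as the awk one-liner */

/*
 * Hypothetical helper: extract the millisecond value that follows the
 * last ':' in a reproducer log line. Returns -1 if the line has no ':'
 * or no number after it.
 */
static long parse_delay_ms(const char *line)
{
        const char *colon = strrchr(line, ':');
        char *end;
        long ms;

        if (colon == NULL)
                return -1;
        ms = strtol(colon + 1, &end, 10);
        if (end == colon + 1)   /* nothing was parsed */
                return -1;
        return ms;
}

/* Returns 1 if the line describes a connect() slower than the cutoff. */
static int is_delayed(const char *line)
{
        return parse_delay_ms(line) > DELAY_THRESHOLD_MS;
}
```

Feeding every line of the log through is_delayed() should select the
same lines as the awk command, as long as each line contains a single
':' separator.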

Facts:
 - The issue can be reproduced over loopback or real networks.
 - Enabling SO_LINGER on the client side makes the issue disappear!
 - While the issue is occurring, the accept queue seems to be overflowing.
   Both LISTENOVERFLOWS and LISTENDROPS increase, although not by the exact
   number of delayed connects. inetdiag reports sk_max_ack_backlog as 0,
   so one possibility that comes to mind is that sk_ack_backlog
   underflows due to a race.
 - The issue disappeared in recent kernels. I bisected that change down to
   the following commit:
	commit 2bba22c50b06abe9fd0d23933b1e64d35b419262
	Author: Mike Galbraith <efault@gmx.de>
	Date:   Wed Sep 9 15:41:37 2009 +0200

	    sched: Turn off child_runs_first
	    
	    Set child_runs_first default to off.

   Setting kernel.sched_child_runs_first=1 makes the issue reappear in recent
   kernels. This supports the theory of a race condition.
 - It looks like the issue can only be reproduced if the server socket
   sends out data immediately after the connection has been established,
   but I cannot prove this theory.
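
For anyone who wants to check the SO_LINGER observation, the option can
be set on the client socket roughly as follows. This is a sketch under
assumptions: the report does not say which l_onoff/l_linger values were
used, so the timeout is a parameter here, and enable_linger() is a
hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: turn on SO_LINGER for a client socket.
 * linger_secs is an assumed value, not taken from the report. With
 * l_linger == 0, close() aborts the connection with a RST instead of
 * going through the normal FIN handshake.
 */
static int enable_linger(int sock, int linger_secs)
{
        struct linger lg;

        lg.l_onoff = 1;
        lg.l_linger = linger_secs;
        return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

In the reproducer this would be called between socket() and connect().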

I will continue to look into the sk_ack_backlog underflow theory but would
appreciate any comments or theories.
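
To compare the counter increase with the number of delayed connects, the
two counters can be sampled before and after a run. The sketch below
assumes they are exported as the ListenOverflows and ListenDrops columns
of the TcpExt lines in /proc/net/netstat; netstat_field() and
read_tcpext() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given a header line and the matching value line
 * (both starting with the same "TcpExt:" style prefix), return the
 * value in the column named `field`, or -1 if it is not present.
 */
static long long netstat_field(const char *header, const char *values,
                               const char *field)
{
        char name[64];
        long long val;
        int hoff, voff;

        /* Skip the prefix token on both lines. */
        if (sscanf(header, "%63s%n", name, &hoff) != 1)
                return -1;
        header += hoff;
        if (sscanf(values, "%63s%n", name, &voff) != 1)
                return -1;
        values += voff;

        /* Walk column names and numbers in lockstep. */
        while (sscanf(header, "%63s%n", name, &hoff) == 1 &&
               sscanf(values, "%lld%n", &val, &voff) == 1) {
                if (strcmp(name, field) == 0)
                        return val;
                header += hoff;
                values += voff;
        }
        return -1;
}

/* Read one TcpExt counter from /proc/net/netstat, or -1 on failure. */
static long long read_tcpext(const char *field)
{
        static char header[8192], values[8192];
        long long result = -1;
        FILE *f = fopen("/proc/net/netstat", "r");

        if (f == NULL)
                return -1;
        /* The file consists of header/value line pairs. */
        while (fgets(header, sizeof(header), f) &&
               fgets(values, sizeof(values), f)) {
                if (strncmp(header, "TcpExt:", 7) == 0) {
                        result = netstat_field(header, values, field);
                        break;
                }
        }
        fclose(f);
        return result;
}
```

Calling read_tcpext("ListenOverflows") and read_tcpext("ListenDrops")
on the server before and after step 2 gives the deltas to compare
against the number of lines reported in step 3.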

Thanks,

Reproducer:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{

        int sock, i;
        struct timeval tim;
        double start, end;

        struct hostent *host;
        struct sockaddr_in server_addr, local;
        socklen_t len = sizeof(local);

        char *hostname;
        int port, count, delay;

        if( argc < 3 ){
           printf("Usage:\n\t%s host port [count=1000] [delay=0]\n",argv[0]);
           return 1;
        }

        hostname = argv[1];
        port = atoi(argv[2]);

        if( argc > 3 )
           count = atoi(argv[3]);
        else
           count = 1000;

        if( argc > 4 )
           delay = atoi(argv[4]);
        else
           delay = 0;

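        host = gethostbyname(hostname);
        if (host == NULL) {
           /* gethostbyname() returns NULL if the name cannot be resolved */
           fprintf(stderr, "Cannot resolve %s\n", hostname);
           return 1;
        }

        memset(&server_addr, 0, sizeof(server_addr));
        server_addr.sin_family = AF_INET;
        server_addr.sin_port = htons(port);
        server_addr.sin_addr = *((struct in_addr *)host->h_addr);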

        for (i = 0; i < count; i++) {
          gettimeofday(&tim, NULL);
          /* wall-clock time in ms; double math avoids integer overflow */
          start = tim.tv_sec * 1000.0 + tim.tv_usec / 1000.0;

          if ((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
            perror("Socket");
            exit(1);
          }

          if (connect(sock, (struct sockaddr *)&server_addr,
                      sizeof(server_addr)) == -1) {
            perror("Connect");
            exit(1);
          }

          /* getsockname() updates len, so reset it on every iteration */
          len = sizeof(local);
          getsockname(sock, (struct sockaddr *)&local, &len);
          close(sock);

          gettimeofday(&tim, NULL);
          end = tim.tv_sec * 1000.0 + tim.tv_usec / 1000.0;
          printf("[%d] %u-> Time to open socket (clock): %d\n",
                 i, (unsigned)ntohs(local.sin_port), (int)(end - start));
          usleep(delay * 1000);
        }
/*
        printf("Time to open socket (ms): %d\n", ((end - start)*1000)/CLOCKS_PER_SEC);
*/

        return 0;
}


