From: Thomas Graf <tgraf@infradead.org>
To: netdev@vger.kernel.org
Subject: [RFC] random SYN drops causing connect() delays
Date: Mon, 12 Apr 2010 04:06:33 -0400
Message-ID: <20100412080633.GA27418@bombadil.infradead.org>
Hello,
I have been tracking down an issue commonly referred to as the 3-sec
connect() delay. It has existed since recent 2.6.x kernels and was never
actually fixed; it merely no longer shows up in recent releases unless
sched_child_runs_first is set to 1 again.
What happens is that if a client attempts to open many connections to
a socket with only minimal delay in between attempts, some SYNs are
randomly dropped on the server side. The client then resends the SYN after
the 3 sec TCP timeout, so connect() calls are randomly delayed.
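To put numbers on the delay described above, here is a minimal sketch of the
timeout arithmetic. It assumes the initial SYN timeout is the 3 sec mentioned
above and that the timeout doubles after every further retransmission
(ordinary exponential backoff). The macro and function names are made up for
illustration and are not taken from the kernel.

```c
/* Assumed initial SYN retransmission timeout in milliseconds (3 sec). */
#define SYN_RTO_INIT_MS 3000

/*
 * Extra delay in milliseconds that a connect() sees when its first
 * `dropped` SYNs are lost, assuming the timeout doubles after every
 * retransmission. Hypothetical helper, for illustration only.
 */
static long syn_drop_delay_ms(int dropped)
{
	long delay = 0;
	long rto = SYN_RTO_INIT_MS;
	int i;

	for (i = 0; i < dropped; i++) {
		delay += rto;	/* wait out the current timeout */
		rto *= 2;	/* exponential backoff */
	}
	return delay;
}
```

Under these assumptions a single dropped SYN costs 3000 ms, which matches the
delays seen with the reproducer, and a second drop on the same connection
would push the total to 9000 ms.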
Steps to reproduce:
1. Compile the reproducer below
2. Run ./test_delay 127.0.0.1 22 10000 0 > log
3. awk -F: '{if ($2>2990) print $1 $2;}' log
4. All listed connection attempts will have been delayed for about 3s
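For anyone who prefers to post-process the log in C rather than awk, the
filter in step 3 could be sketched roughly as follows. It assumes the exact
output format printed by the reproducer; parse_delay_ms() and is_delayed()
are hypothetical helper names, not part of the reproducer.

```c
#include <stdio.h>

/*
 * Parse one line of reproducer output, e.g.
 *   "[12] 40312-> Time to open socket (clock): 3001"
 * Returns 1 and stores the elapsed milliseconds in *ms on success,
 * 0 if the line does not match the expected format.
 */
static int parse_delay_ms(const char *line, int *ms)
{
	int idx;
	unsigned int port;

	return sscanf(line, "[%d] %u-> Time to open socket (clock): %d",
		      &idx, &port, ms) == 3;
}

/* Returns 1 if the line reports a connect() slower than threshold_ms. */
static int is_delayed(const char *line, int threshold_ms)
{
	int ms;

	return parse_delay_ms(line, &ms) && ms > threshold_ms;
}
```

Calling is_delayed(line, 2990) on each line read with fgets() selects the
same lines as the awk command.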
Facts:
- Issue can be reproduced over loopback or real networks.
- Enabling SO_LINGER on the client side will make the issue disappear!!
- While the issue is occurring, the acceptq seems to be overflowing. Both
  LISTENOVERFLOWS and LISTENDROPS are increasing, although not by the exact
  number of delay occurrences. inetdiag reports sk_max_ack_backlog to be 0,
  so one possibility that comes to mind is that sk_ack_backlog
  underflows due to a race.
- The issue disappeared in recent kernels; I bisected that down to the
  following commit:

  commit 2bba22c50b06abe9fd0d23933b1e64d35b419262
  Author: Mike Galbraith <efault@gmx.de>
  Date: Wed Sep 9 15:41:37 2009 +0200

      sched: Turn off child_runs_first

      Set child_runs_first default to off.

  Setting kernel.sched_child_runs_first=1 makes the issue reappear in recent
  kernels. This supports the theory of a race condition.
- It looks like the issue can only be reproduced if the server
  socket sends out data immediately after the connection has been established,
  but I cannot prove this theory.
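For reference, enabling SO_LINGER on the client side (second fact above) could
look roughly like this. It is only a sketch: the linger timeout is an
assumption, since the list above does not say which l_linger value was used,
and enable_linger() is a hypothetical helper name.

```c
#include <sys/socket.h>

/*
 * Enable SO_LINGER on a client socket, to be called before connect().
 * The timeout is an assumption; pick whatever value is being tested.
 * Returns 0 on success, -1 on error (errno set by setsockopt()).
 */
static int enable_linger(int sock, int seconds)
{
	struct linger lg;

	lg.l_onoff = 1;		/* turn lingering on */
	lg.l_linger = seconds;	/* linger timeout in seconds */

	return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

In the reproducer this would go between socket() and connect(). Note that
with l_linger set to 0, close() normally resets the connection instead of
going through the usual FIN sequence, which may matter when comparing results.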
I will continue to look into the sk_ack_backlog underflow theory but would
appreciate any comments or theories.
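To make the underflow theory concrete, here is a small userspace model of what
I have in mind. This is NOT kernel code: the struct only mirrors the two
counters by name, the 16-bit unsigned type is an assumption, and the
"backlog > max" comparison is my reading of how the full check works.

```c
/*
 * Userspace model of the accept queue accounting, NOT kernel code.
 * Field names mirror sk_ack_backlog / sk_max_ack_backlog; the 16-bit
 * unsigned type is an assumption made for this illustration.
 */
struct acceptq_model {
	unsigned short ack_backlog;	/* connections waiting for accept() */
	unsigned short max_ack_backlog;	/* limit requested via listen() */
};

/* Queue counts as full once the backlog exceeds the limit. */
static int acceptq_is_full(const struct acceptq_model *q)
{
	return q->ack_backlog > q->max_ack_backlog;
}

/* Unchecked decrement, e.g. if a removal were accounted for twice. */
static void acceptq_removed(struct acceptq_model *q)
{
	q->ack_backlog--;
}
```

Starting from ack_backlog == 0, a single stray acceptq_removed() wraps the
counter to 65535, and acceptq_is_full() then reports a full queue for any
limit below that. New connections would be dropped until the counter is
corrected, which would look like the overflow counters described above.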
Thanks,
Reproducer:
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <sys/time.h>

/* Current wall-clock time in milliseconds. */
static double now_ms(void)
{
	struct timeval tim;

	gettimeofday(&tim, NULL);
	return tim.tv_sec * 1000.0 + tim.tv_usec / 1000.0;
}

int main(int argc, char *argv[])
{
	int sock, i;
	double start, end;
	struct hostent *host;
	struct sockaddr_in server_addr, local;
	socklen_t len;
	char *hostname;
	int port, count, delay;

	if (argc < 3) {
		printf("Usage:\n\t%s host port [count=1000] [delay=0]\n", argv[0]);
		return 1;
	}

	hostname = argv[1];
	port = atoi(argv[2]);
	count = (argc > 3) ? atoi(argv[3]) : 1000;
	delay = (argc > 4) ? atoi(argv[4]) : 0;	/* ms between attempts */

	host = gethostbyname(hostname);
	if (host == NULL) {
		fprintf(stderr, "Cannot resolve %s\n", hostname);
		return 1;
	}

	memset(&server_addr, 0, sizeof(server_addr));
	server_addr.sin_family = AF_INET;
	server_addr.sin_port = htons(port);
	server_addr.sin_addr = *((struct in_addr *)host->h_addr_list[0]);

	for (i = 0; i < count; i++) {
		start = now_ms();

		if ((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
			perror("Socket");
			exit(1);
		}

		if (connect(sock, (struct sockaddr *)&server_addr,
			    sizeof(server_addr)) == -1) {
			perror("Connect");
			exit(1);
		}

		/* len is a value-result argument, so reset it every time */
		len = sizeof(local);
		getsockname(sock, (struct sockaddr *)&local, &len);
		close(sock);

		end = now_ms();

		/* Elapsed time in ms covers socket(), connect() and close() */
		printf("[%d] %u-> Time to open socket (clock): %d\n",
		       i, ntohs(local.sin_port), (int)(end - start));

		usleep(delay * 1000);
	}

	return 0;
}