Date: Tue, 22 Jan 2013 17:10:38 +0100
From: Leandro Lucarella
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Doubts about listen backlog and tcp_max_syn_backlog
Message-ID: <20130122161038.GG4608@sociomantic.com>

Hi, I'm having some problems with missing SYNs in a server with a high
rate of incoming connections and, even though I'm far from understanding
the kernel, I ended up looking at the kernel's source to try to
understand better what's going on, because some stuff doesn't make a lot
of sense to me.

The path I followed is this (line numbers for Linux 3.7):

net/socket.c[3]
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
	backlog is truncated to sysctl_somaxconn and
	sock->ops->listen(sock, backlog) is called, which I guess calls
	inet_listen().
net/ipv4/af_inet.c[4]
int inet_listen(struct socket *sock, int backlog)
	the backlog is assigned to sk->sk_max_ack_backlog and
	inet_csk_listen_start(sk, backlog) is called (if the socket
	wasn't already in TCP_LISTEN state)

net/ipv4/inet_connection_sock.c[5]
int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
	reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries) is
	called, which I guess creates the actual queue

net/core/request_sock.c[6]
int reqsk_queue_alloc(struct request_sock_queue *queue,
		      unsigned int nr_table_entries)
	nr_table_entries is first adjusted to satisfy:
	8 <= nr_table_entries <= sysctl_max_syn_backlog
	and then incremented by one and rounded up to the next power of 2.

So here are a couple of questions:

1. What's the relation between the socket backlog and the queue created
   by reqsk_queue_alloc()? I ask because the backlog is only capped at
   sysctl_somaxconn, while the queue size can be quite different.

2. The comment just above the definition of reqsk_queue_alloc() about
   sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
   queue per LISTEN socket.". But then nr_table_entries is not only
   rounded up to the next power of 2, it is also incremented by one
   before that, so a backlog of, for example, 128 would end up with 256
   table entries even if sysctl_max_syn_backlog is 128.

3. Why is there a nr_table_entries + 1 at all in there? Looking at the
   commit that introduced this[1] I can't find any explanation, and I've
   read that some big projects are using backlogs of 511 because of
   this[2] (which BTW, if the queue is really a hash table, looks like
   an awful idea).

4. I found some places where sk->sk_ack_backlog is checked against
   sk->sk_max_ack_backlog to see if new requests should be dropped, but
   I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
   inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.

Thanks a lot.
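To make the arithmetic behind questions 2 and 3 concrete, here is a minimal user-space sketch of the sizing as I read it from the path above. It is not the kernel code: roundup_pow2() and syn_table_entries() are names I made up for illustration, the three parameters stand in for the listen() backlog, sysctl_somaxconn and sysctl_max_syn_backlog, and it assumes small inputs (well below 2^31).

```c
#include <stdint.h>

/* Smallest power of two >= v, for v >= 1. Hypothetical user-space
 * stand-in for the kernel's rounding step; assumes v <= 2^31. */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Hypothetical helper mirroring the steps described in the mail:
 * truncate to somaxconn (listen syscall), clamp to
 * [8, max_syn_backlog] (reqsk_queue_alloc), then add one and
 * round up to the next power of two. */
static uint32_t syn_table_entries(uint32_t backlog, uint32_t somaxconn,
				  uint32_t max_syn_backlog)
{
	uint32_t n = backlog;

	if (n > somaxconn)
		n = somaxconn;		/* listen(): cap at somaxconn */
	if (n > max_syn_backlog)
		n = max_syn_backlog;	/* upper clamp */
	if (n < 8)
		n = 8;			/* lower clamp */
	return roundup_pow2(n + 1);	/* the "+ 1" from question 3 */
}
```

Under these assumptions, syn_table_entries(128, 128, 128) gives 256, while syn_table_entries(511, 1024, 1024) gives 512, which seems to be why a backlog of 511 avoids the doubling.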
[1] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=72a3effaf633bcae9034b7e176bdbd78d64a71db
[2] http://blog.dubbelboer.com/2012/04/09/syn-cookies.html#a_reasonably_backlog_size
[3] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/socket.c;h=2ca51c719ef984cdadef749008456cf7bd5e1ae4;hb=HEAD#l1544
[4] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/af_inet.c;h=24b384b7903ea7a59a11e7a4cbf06db996498924;hb=HEAD#l192
[5] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/ipv4/inet_connection_sock.c;h=d0670f00d5243f95bec4536f60edf32fa2ded850;hb=HEAD#l729
[6] http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=net/core/request_sock.c;h=c31d9e8668c30346894adbf3be55eed4beeb1258;hb=HEAD#l23

-- 
Leandro Lucarella
sociomantic labs GmbH
http://www.sociomantic.com