From: Leandro Lucarella <leandro.lucarella@sociomantic.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Doubts about listen backlog and tcp_max_syn_backlog
Date: Tue, 22 Jan 2013 17:59:29 +0100
Message-ID: <20130122165929.GH4608@sociomantic.com>
In-Reply-To: <1358873142.3464.3964.camel@edumazet-glaptop>

On Tue, Jan 22, 2013 at 08:45:42AM -0800, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 17:10 +0100, Leandro Lucarella wrote:
> > Hi, I'm having some problems with missing SYNs in a server with a high
> > rate of incoming connections and, even though I'm far from understanding
> > the kernel, I ended up looking at the kernel's source to try to
> > understand better what's going on, because some stuff doesn't make a lot
> > of sense to me.
[snip]
> > 1. What's the relation between the socket backlog and the queue created
> >    by reqsk_queue_alloc()? Because the backlog is only adjusted not to
> >    be greater than sysctl_somaxconn, but the queue size can be quite
> >    different.
> > 2. The comment just above the definition of reqsk_queue_alloc() about
> >    sysctl_max_syn_backlog says "Maximum number of SYN_RECV sockets in
> >    queue per LISTEN socket.". But then nr_table_entries is not only
> >    rounded up to the next power of 2, it is also incremented by one
> >    before that, so a backlog of, for example, 128 would end up with 256
> >    table entries even if sysctl_max_syn_backlog is 128.
> > 3. Why is there a nr_table_entries + 1 at all in there? Looking at the
> >    commit that introduced this[1] I can't find any explanation, and I've
> >    read that some big projects are using backlogs of 511 because of
> >    this[2] (which, BTW, if the queue is really a hash table, looks like
> >    an awful idea).
> > 4. I found some places where sk->sk_ack_backlog is checked against
> >    sk->sk_max_ack_backlog to see if new requests should be dropped, but
> >    I also saw checks like inet_csk_reqsk_queue_young(sk) > 1 or
> >    inet_csk_reqsk_queue_is_full(sk), so I guess the queue is used too.
[snip]
> 
> What particular problem do you have ?

What I'm seeing is clients taking either microseconds to connect, or 3
seconds, which suggests SYNs are getting lost, but the network doesn't
seem to be the problem. I'm still investigating this, so unfortunately
I'm not really sure.

> A serious rewrite of LISTEN code is needed, because the current
> implementation doesn't scale :
> 
> The SYNACK retransmits are done by a single timer wheel, holding the
> socket lock for too long. So increasing the backlog to 2^16 or 2^17 is
> not really an option.
> 
> Hash tables are nice, but if we have to scan them, holding a single lock,
> they are not so nice.

So the queue really is a hash table, then? And using any (2^n)-1 would
be a bad idea because, when the backlog is nearly full, the hash table
will be really slow? Is that why the + 1 is there? Is it assuming everyone
will use a power of 2 and thus have a load factor of 0.5 at most?

-- 
Leandro Lucarella
Senior R&D Developer
-----------------------------------------------------------
sociomantic labs GmbH
Paul-Lincke-Ufer 39/40
10999 Berlin
DEUTSCHLAND
-----------------------------------------------------------
http://www.sociomantic.com
-----------------------------------------------------------
Fon:       +49 (0) 30 3087 4615
Fax:       +49 (0) 30 3087 4619
Mobile:    +49 (0)157 3636 7373
Skype:     llucarella
Twitter:   http://www.twitter.com/sociomantic
Facebook:  http://bit.ly/labsfacebook
-----------------------------------------------------------
sociomantic labs GmbH, Location: Berlin
Commercial Register - AG Charlottenburg: HRB 121302 B
VAT No. - USt-ID: DE 266262100
Managing Directors: Thomas Nicolai, Thomas Brandhoff
