linux-nfs.vger.kernel.org archive mirror
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Neil Brown <neilb@suse.de>
Cc: Mark Hills <mark@pogo.org.uk>, linux-nfs@vger.kernel.org
Subject: Re: Listen backlog set to 64
Date: Mon, 29 Nov 2010 15:59:35 -0500
Message-ID: <20101129205935.GD9897@fieldses.org>
In-Reply-To: <20101117090826.4b2724da@notabene.brown>

On Wed, Nov 17, 2010 at 09:08:26AM +1100, Neil Brown wrote:
> On Tue, 16 Nov 2010 13:20:26 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Mon, Nov 15, 2010 at 06:43:52PM +0000, Mark Hills wrote:
> > > I am looking into an issue of hanging clients to a set of NFS servers, on 
> > > a large HPC cluster.
> > > 
> > > My investigation took me to the RPC code, svc_create_socket().
> > > 
> > > 	if (protocol == IPPROTO_TCP) {
> > > 		if ((error = kernel_listen(sock, 64)) < 0)
> > > 			goto bummer;
> > > 	}
> > > 
> > > A fixed backlog of 64 connections at the server seems like it could be too 
> > > low on a cluster like this, particularly when the protocol opens and 
> > > closes the TCP connection.
> > > 
> > > I wondered what the rationale is behind this number, particularly as it 
> > > is a fixed value. Perhaps there is a reason why this has no effect on 
> > > nfsd, or is this a FAQ for people on large systems?
> > > 
> > > The servers show overflow of a listening queue, which I imagine is 
> > > related.
> > > 
> > >   $ netstat -s
> > >   [...]
> > >   TcpExt:
> > >     6475 times the listen queue of a socket overflowed
> > >     6475 SYNs to LISTEN sockets ignored
> > > 
> > > The affected servers are old, kernel 2.6.9. But this limit of 64 is 
> > > consistent across that and the latest kernel source.
> > 
> > Looks like the last time that was touched was 8 years ago, by Neil (below, from
> > historical git archive).
> > 
> > I'd be inclined to just keep doubling it until people don't complain,
> > unless it's very expensive.  (How much memory (or whatever else) does a
> > pending connection tie up?)
> 
> Surely we should "keep multiplying by 13" as that is what I did :-)
> 
> There is a sysctl 'somaxconn' which limits what a process can ask for in the
> listen() system call, but as we bypass this syscall it doesn't directly
> affect nfsd.
> It defaults to SOMAXCONN == 128 but can be raised arbitrarily by the sysadmin.
> 
> There is another sysctl 'max_syn_backlog' which looks like a system-wide
> limit to the connect backlog.
> This defaults to 256.  The comment says it is
> adjusted between 128 and 1024 based on memory size, though that isn't clear
> in the code (to me at least).

This comment?:

/*
 * Maximum number of SYN_RECV sockets in queue per LISTEN socket.
 * One SYN_RECV socket costs about 80bytes on a 32bit machine.
 * It would be better to replace it with a global counter for all sockets
 * but then some measure against one socket starving all other sockets
 * would be needed.
 *
 * It was 128 by default. Experiments with real servers show, that
 * it is absolutely not enough even at 100conn/sec. 256 cures most
 * of problems. This value is adjusted to 128 for very small machines
 * (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb).
 * Note : Dont forget somaxconn that may limit backlog too.
 */
int sysctl_max_syn_backlog = 256;

Looks like net/ipv4/tcp.c:tcp_init() does the memory-based calculation.
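
Going only by what that comment says, the policy would look roughly like
the sketch below.  This is a user-space illustration of the comment, not
a copy of tcp_init(); the function name and the megabyte-based argument
are made up for the example, and the real code may well compute the
thresholds differently.

```c
/*
 * Hypothetical sketch of the policy described in the comment above:
 * 128 on very small machines (<= 32MB), 1024 on normal or better ones
 * (>= 256MB), and the compiled-in default of 256 in between.
 * Not the actual tcp_init() logic.
 */
static int syn_backlog_for_mem_mb(unsigned long mem_mb)
{
	if (mem_mb <= 32)
		return 128;
	if (mem_mb >= 256)
		return 1024;
	return 256;
}
```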

80 bytes sounds small.

> So we could:
>   - hard code a new number
>   - make this another sysctl configurable
>   - auto-adjust it so that it "just works".
> 
> I would prefer the latter if it is possible.   Possibly we could adjust it
> based on the number of nfsd threads, like we do for receive buffer space.
> Maybe something arbitrary like:
>    min(16 + 2 * number of threads, sock_net(sk)->core.sysctl_somaxconn)
> 
> which would get the current 64 at 24 threads, and can easily push up to 128
> and beyond with more threads.
> 
> Or is that too arbitrary?

I kinda like the idea of piggybacking on an existing constant like
sysctl_max_syn_backlog.  Somebody else hopefully keeps it set to something
reasonable, and as a last resort it gives you a knob to twiddle.

But number of threads would work OK too.
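
For concreteness, a minimal sketch of the thread-based formula quoted
above, written as plain user-space C so the numbers can be checked.  The
helper names are hypothetical and nothing like this exists in the sunrpc
code today; somaxconn is passed in as an ordinary argument here, where
the kernel would read sock_net(sk)->core.sysctl_somaxconn.

```c
/* Hypothetical helper: smaller of two ints. */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical sketch of the proposed heuristic:
 *   min(16 + 2 * number of threads, somaxconn)
 */
static int nfsd_listen_backlog(int nthreads, int somaxconn)
{
	return min_int(16 + 2 * nthreads, somaxconn);
}
```

With somaxconn at its default of 128, 24 threads gives the current 64,
and anything from 56 threads up is capped at 128 until the sysadmin
raises somaxconn.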

At a minimum we should make sure we solve the original problem....
Mark, have you had a chance to check whether increasing that number to
128 or more is enough to solve your problem?

--b.
