From: "J. Bruce Fields" <bfields@fieldses.org>
To: Mark Hills <mark@pogo.org.uk>
Cc: Neil Brown <neilb@suse.de>, linux-nfs@vger.kernel.org
Subject: Re: Listen backlog set to 64
Date: Tue, 30 Nov 2010 15:00:13 -0500
Message-ID: <20101130200013.GA2108@fieldses.org>
In-Reply-To: <alpine.NEB.2.01.1011301705220.26378@jrf.vwaro.pbz>

On Tue, Nov 30, 2010 at 05:50:52PM +0000, Mark Hills wrote:
> On Mon, 29 Nov 2010, J. Bruce Fields wrote:
> 
> > On Wed, Nov 17, 2010 at 09:08:26AM +1100, Neil Brown wrote:
> > > On Tue, 16 Nov 2010 13:20:26 -0500
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > 
> > > > On Mon, Nov 15, 2010 at 06:43:52PM +0000, Mark Hills wrote:
> > > > > I am looking into an issue of hanging clients to a set of NFS servers, on 
> > > > > a large HPC cluster.
> > > > > 
> > > > > My investigation took me to the RPC code, svc_create_socket().
> > > > > 
> > > > > 	if (protocol == IPPROTO_TCP) {
> > > > > 		if ((error = kernel_listen(sock, 64)) < 0)
> > > > > 			goto bummer;
> > > > > 	}
> > > > > 
> > > > > A fixed backlog of 64 connections at the server seems like it could be too 
> > > > > low on a cluster like this, particularly when the protocol opens and 
> > > > > closes the TCP connection.
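
Worth noting for context: whatever constant ends up here, the kernel clamps
the requested backlog to net.core.somaxconn at listen() time, so a larger
hard-coded value would be silently capped.  Roughly, paraphrasing the
generic listen() syscall of this era (not a verbatim quote of net/socket.c):

	/* Paraphrase of the listen() syscall: the requested backlog is
	 * capped by the namespace's somaxconn before the protocol's
	 * listen op ever sees it. */
	somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
	if ((unsigned)backlog > somaxconn)
		backlog = somaxconn;
	err = sock->ops->listen(sock, backlog);

So raising the constant only helps up to whatever somaxconn (128 by
default) is set to on the server.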
> [...] 
> > > So we could:
> > >   - hard code a new number
> > >   - make this another sysctl configurable
> > >   - auto-adjust it so that it "just works".
> > > 
> > > I would prefer the last of these, if possible.  Possibly we could adjust it
> > > based on the number of nfsd threads, like we do for receive buffer space.
> > > Maybe something arbitrary like:
> > >    min(16 + 2 * number of threads, sock_net(sk)->core.sysctl_somaxconn)
> > > 
> > > which would get the current 64 at 24 threads, and can easily push up to 128
> > > and beyond with more threads.
> > > 
> > > Or is that too arbitrary?
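
Something like that would slot straight into svc_create_socket(), which
already has the svc_serv to hand.  A hypothetical sketch of the formula
above (svc_sock_backlog() is not an existing helper):

	/* Hypothetical helper for the suggestion above: scale the listen
	 * backlog with the nfsd thread count, capped by the namespace's
	 * somaxconn. */
	static int svc_sock_backlog(struct svc_serv *serv, struct sock *sk)
	{
		int want = 16 + 2 * serv->sv_nrthreads;

		return min(want, sock_net(sk)->core.sysctl_somaxconn);
	}

	...
	if (protocol == IPPROTO_TCP) {
		error = kernel_listen(sock, svc_sock_backlog(serv, sock->sk));
		if (error < 0)
			goto bummer;
	}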
> > 
> > I kinda like the idea of piggybacking on an existing constant like
> > sysctl_max_syn_backlog.  Somebody else hopefully keeps it set to something
> > reasonable, and as a last resort it gives you a knob to twiddle.
> > 
> > But number of threads would work OK too.
> > 
> > At a minimum we should make sure we solve the original problem....
> > Mark, have you had a chance to check whether increasing that number to
> > 128 or more is enough to solve your problem?
> 
> I think we can hold off changing the queue size, for now at least. We 
> reduced the reported queue overflows by increasing the number of mountd 
> threads, allowing it to service the queue more quickly.

Apologies, I should have thought to suggest that at the start.

> However, this did 
> not fix the common problem, and I was hoping to have more information in 
> this follow-up email.
> 
> Our investigation brings us to rpc.mountd and mount.nfs communicating. In 
> the client log we see messages like:
> 
>   Nov 24 12:09:43 nyrd001 automount[3782]: >> mount.nfs: mount to NFS server 'ss1a:/mnt/raid1/banana' failed: timed out, giving up
> 
> Using strace and isolating one of these, I can see a non-blocking connect 
> has already managed to make a connection and even send/receive some data. 
> 
> But then a 9999 millisecond timeout in poll() expires while mount.nfs is 
> waiting for a response. The socket in question is a connection to mountd:
> 
>   26512 futex(0x7ff76affa540, FUTEX_WAKE_PRIVATE, 1) = 0
>   26512 write(3, "\200\0\0(j\212\254\365\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\0\0\0\0\0"..., 44) = 44
>   26512 poll([{fd=3, events=POLLIN}], 1, 9999 <unfinished ...>
> 
> When it returns:
> 
>   26512 <... poll resumed> )              = 0 (Timeout)
>   26512 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>   26512 rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
>   26512 close(3)                          = 0
>   26512 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>   26512 write(2, "mount.nfs: mount to NFS server '"..., 100) = 100
> 
> There's no retry from here, just a failed mount.

That does sound wrong.  I'm not at all familiar with automount,
unfortunately; how is it invoking mount.nfs?
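
For reference, the failing sequence in the strace boils down to the classic
write-then-poll pattern below.  This is only an illustrative sketch of what
the trace shows, not the actual mount.nfs source:

	/* Illustrative only: send an RPC call, poll() up to 9999 ms for
	 * the reply, and treat a timeout as a hard failure with no
	 * retry, as mount.nfs appears to do in the trace above. */
	#include <poll.h>
	#include <unistd.h>

	static int wait_for_reply(int fd, const void *req, size_t len)
	{
		struct pollfd pfd = { .fd = fd, .events = POLLIN };

		if (write(fd, req, len) != (ssize_t)len)
			return -1;
		if (poll(&pfd, 1, 9999) <= 0)	/* 0 == the timeout seen here */
			return -1;
		return 0;	/* a reply is ready to read */
	}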

> What is the source of this 9999 millisecond timeout used by poll() in 
> mount.nfs? It was not clear in an initial search of nfs-utils and glibc, 
> but I need more time to investigate.
> 
> If the server is too slow to respond, what could be causing it? Multiple 
> threads are already in use, but they do not all appear to be busy, since 
> a thread was free to accept() the connection. I haven't 
> been able to pin this on the forward/reverse DNS lookup used by 
> authentication and logging.

Can you tell where the mountd threads are typically waiting?
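
If it would help, one low-overhead way to sample that is to walk
/proc/<pid>/task/*/stack, run once per mountd process (needs root and a
kernel built with CONFIG_STACKTRACE).  A rough sketch, with the pid
hard-coded as a placeholder:

	/* Rough sketch: print the kernel stack of every task of one
	 * process via /proc/<pid>/task/<tid>/stack.  "1234" is a
	 * placeholder for a real rpc.mountd pid. */
	#include <dirent.h>
	#include <stdio.h>

	int main(void)
	{
		const char *pid = "1234";	/* placeholder pid */
		char path[64], file[128], line[256];
		struct dirent *e;
		DIR *d;
		FILE *f;

		snprintf(path, sizeof(path), "/proc/%s/task", pid);
		d = opendir(path);
		if (!d)
			return 1;
		while ((e = readdir(d)) != NULL) {
			if (e->d_name[0] == '.')
				continue;
			snprintf(file, sizeof(file), "%s/%s/stack",
				 path, e->d_name);
			f = fopen(file, "r");
			if (!f)
				continue;
			printf("thread %s:\n", e->d_name);
			while (fgets(line, sizeof(line), f))
				fputs(line, stdout);
			fclose(f);
		}
		closedir(d);
		return 0;
	}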

--b.

