From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Mark Hills <mark@pogo.org.uk>, Neil Brown <neilb@suse.de>,
linux-nfs@vger.kernel.org
Subject: Re: Listen backlog set to 64
Date: Wed, 1 Dec 2010 13:46:57 -0500
Message-ID: <20101201184656.GD17142@fieldses.org>
In-Reply-To: <F5FCB254-E6CE-4B71-ABD9-33F098F9E0BA@oracle.com>

On Wed, Dec 01, 2010 at 01:28:40PM -0500, Chuck Lever wrote:
>
> On Dec 1, 2010, at 1:18 PM, Mark Hills wrote:
>
> > On Tue, 30 Nov 2010, J. Bruce Fields wrote:
> >
> >> On Tue, Nov 30, 2010 at 05:50:52PM +0000, Mark Hills wrote:
> > [...]
> >>> Our investigation points to the communication between rpc.mountd and mount.nfs. In
> >>> the client log we see messages like:
> >>>
> >>> Nov 24 12:09:43 nyrd001 automount[3782]: >> mount.nfs: mount to NFS server 'ss1a:/mnt/raid1/banana' failed: timed out, giving up
> >>>
> >>> Using strace and isolating one of these, I can see a non-blocking connect
> >>> has already managed to make a connection and even send/receive some data.
> >>>
> >>> But soon a timeout of 9999 milliseconds in poll() causes a problem in
> >>> mount.nfs when waiting for a response of some sort. The socket in question
> >>> is a connection to mountd:
> >>>
> >>> 26512 futex(0x7ff76affa540, FUTEX_WAKE_PRIVATE, 1) = 0
> >>> 26512 write(3, "\200\0\0(j\212\254\365\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\0\0\0\0\0"..., 44) = 44
> >>> 26512 poll([{fd=3, events=POLLIN}], 1, 9999 <unfinished ...>
> >>>
> >>> When it returns:
> >>>
> >>> 26512 <... poll resumed> ) = 0 (Timeout)
> >>> 26512 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> >>> 26512 rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
> >>> 26512 close(3) = 0
> >>> 26512 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> >>> 26512 write(2, "mount.nfs: mount to NFS server '"..., 100) = 100
> >>>
> >>> There's no re-try from here, just a failed mount.
> >>
> >> That does sound wrong. I'm not at all familiar with automount,
> >> unfortunately; how is it invoking mount.nfs?
> >>
> >>> What is the source of this 9999 millisecond timeout used by poll() in
> >>> mount.nfs? It was not clear in an initial search of nfs-utils and glibc,
> >>> but I need more time to investigate.
> >>>
> >>> If the server is being too slow to respond, what could the cause of this
> >>> be? Multiple threads are already in use, but it seems they are not all
> >>> busy, since a thread is still able to accept() the connection. I haven't
> >>> been able to pin this on the forward/reverse DNS lookup used by
> >>> authentication and logging.
> >>
> >> Can you tell where the mountd threads are typically waiting?
> >
> > Here's a trace from mountd. Note the long pause after fdatasync():
> >
> > 31156 12:09:40 open("/var/lib/nfs/rmtab", O_WRONLY|O_CREAT|O_APPEND, 0666) = 14 <0.000010>
> > 31156 12:09:40 fstat(14, {st_mode=S_IFREG|0644, st_size=62642, ...}) = 0 <0.000005>
> > 31156 12:09:40 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af00c929000 <0.000007>
> > 31156 12:09:40 fstat(14, {st_mode=S_IFREG|0644, st_size=62642, ...}) = 0 <0.000005>
> > 31156 12:09:40 lseek(14, 62642, SEEK_SET) = 62642 <0.000006>
> > 31156 12:09:40 write(14, "192.168.14.165:/mnt/raid1/abcde_"..., 57) = 57 <0.000013>
> > 31156 12:09:40 fdatasync(14) = 0 <15.413401>
> > 31156 12:09:55 close(14) = 0 <0.000008>
> > 31156 12:09:55 munmap(0x2af00c929000, 4096) = 0 <0.000083>
> > 31156 12:09:55 close(13) = 0 <0.000008>
> >
> > This pause exceeds the mysterious 9999 millisecond timeout allowed at the
> > client.
> >
> > fdatasync() happens at every write of rmtab -- which is frequent.
> > Occasionally it takes a long time. The syncing of rmtab seems a reasonable
> > explanation of why adding more threads does not help -- all mountd threads
> > lock up together.
> >
> > The strace I am analysing here contains 670 calls to fdatasync() in a 16
> > minute period. All of them take < 1 second, except for this burst:
> >
> > 31156 12:09:11 fdatasync(14) = 0 <0.000010>
> > 31156 12:09:11 fdatasync(15) = 0 <0.136285>
> > 31156 12:09:12 fdatasync(14) = 0 <0.000010>
> > 31156 12:09:12 fdatasync(15) = 0 <14.911323>
> > 31156 12:09:27 fdatasync(14) = 0 <1.298896>
> > 31156 12:09:29 fdatasync(14) = 0 <1.159262>
> > 31156 12:09:33 fdatasync(14) = 0 <1.447169>
> > 31156 12:09:35 fdatasync(14) = 0 <1.345562>
> > 31156 12:09:40 fdatasync(14) = 0 <0.000010>
> > 31156 12:09:40 fdatasync(14) = 0 <15.413401>
> > 31156 12:09:55 fdatasync(17) = 0 <0.000010>
> > 31156 12:09:55 fdatasync(17) = 0 <1.068503>
> > 31156 12:09:57 fdatasync(18) = 0 <0.000010>
> > 31156 12:09:57 fdatasync(18) = 0 <0.089682>
> >
> > /var/lib/nfs/rmtab[.tmp] is on an ext3 filesystem, the system disk.
> >
> > The server is only acting as a 'regular' and dedicated file server, but
> > presumably some other process is causing /var to become erratic. Unless
> > continual writing and syncing to a small (64KiB) file could cause this to
> > happen periodically?
>
> fdatasync() will write all outstanding dirty data on that partition to persistent storage, not just the dirty data for that file. If /var is on a partition that has a lot of other activity, then this is the expected outcome, unfortunately. I would expect such delays to be due to a large amount of outstanding writes, not to periodic synchronous writes.
>
> This problem may be addressed in recent versions of nfs-utils. I see a commit from Ben Myers on February 12 of this year that seems on point. nfs-utils-1.2.2 has it, I think?

Doh, yes, looks like the code I quoted went into nfs-utils 1.2.2 with
Ben's commit.
--b.