From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Mark Hills <mark@pogo.org.uk>, Neil Brown <neilb@suse.de>,
linux-nfs@vger.kernel.org
Subject: Re: Listen backlog set to 64
Date: Wed, 1 Dec 2010 13:46:57 -0500
Message-ID: <20101201184656.GD17142@fieldses.org>
In-Reply-To: <F5FCB254-E6CE-4B71-ABD9-33F098F9E0BA@oracle.com>

On Wed, Dec 01, 2010 at 01:28:40PM -0500, Chuck Lever wrote:
>
> On Dec 1, 2010, at 1:18 PM, Mark Hills wrote:
>
> > On Tue, 30 Nov 2010, J. Bruce Fields wrote:
> >
> >> On Tue, Nov 30, 2010 at 05:50:52PM +0000, Mark Hills wrote:
> > [...]
> >>> Our investigation brings us to rpc.mountd and mount.nfs communicating. In
> >>> the client log we see messages like:
> >>>
> >>> Nov 24 12:09:43 nyrd001 automount[3782]: >> mount.nfs: mount to NFS server 'ss1a:/mnt/raid1/banana' failed: timed out, giving up
> >>>
> >>> Using strace and isolating one of these, I can see a non-blocking connect
> >>> has already managed to make a connection and even send/receive some data.
> >>>
> >>> But soon a 9999 millisecond timeout in poll() causes a problem in
> >>> mount.nfs while it waits for a response of some sort. The socket in
> >>> question is a connection to mountd:
> >>>
> >>> 26512 futex(0x7ff76affa540, FUTEX_WAKE_PRIVATE, 1) = 0
> >>> 26512 write(3, "\200\0\0(j\212\254\365\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\0\0\0\0\0"..., 44) = 44
> >>> 26512 poll([{fd=3, events=POLLIN}], 1, 9999 <unfinished ...>
> >>>
> >>> When it returns:
> >>>
> >>> 26512 <... poll resumed> ) = 0 (Timeout)
> >>> 26512 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> >>> 26512 rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
> >>> 26512 close(3) = 0
> >>> 26512 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> >>> 26512 write(2, "mount.nfs: mount to NFS server '"..., 100) = 100
> >>>
> >>> There's no retry from here, just a failed mount.
> >>
> >> That does sound wrong. I'm not at all familiar with automount,
> >> unfortunately; how is it invoking mount.nfs?
> >>
> >>> What is the source of this 9999 millisecond timeout used by poll() in
> >>> mount.nfs? An initial search of nfs-utils and glibc did not turn it
> >>> up, but I need more time to investigate.
> >>>
> >>> If the server is being too slow to respond, what could the cause of this
> >>> be? Multiple threads are already in use, but it seems they are not all
> >>> busy, since a thread is still able to accept() the connection. I haven't
> >>> been able to pin this on the forward/reverse DNS lookup used by
> >>> authentication and logging.
> >>
> >> Can you tell where the mountd threads are typically waiting?
> >
> > Here's a trace from mountd. Note the long pause after fdatasync():
> >
> > 31156 12:09:40 open("/var/lib/nfs/rmtab", O_WRONLY|O_CREAT|O_APPEND, 0666) = 14 <0.000010>
> > 31156 12:09:40 fstat(14, {st_mode=S_IFREG|0644, st_size=62642, ...}) = 0 <0.000005>
> > 31156 12:09:40 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af00c929000 <0.000007>
> > 31156 12:09:40 fstat(14, {st_mode=S_IFREG|0644, st_size=62642, ...}) = 0 <0.000005>
> > 31156 12:09:40 lseek(14, 62642, SEEK_SET) = 62642 <0.000006>
> > 31156 12:09:40 write(14, "192.168.14.165:/mnt/raid1/abcde_"..., 57) = 57 <0.000013>
> > 31156 12:09:40 fdatasync(14) = 0 <15.413401>
> > 31156 12:09:55 close(14) = 0 <0.000008>
> > 31156 12:09:55 munmap(0x2af00c929000, 4096) = 0 <0.000083>
> > 31156 12:09:55 close(13) = 0 <0.000008>
> >
> > This pause exceeds the mysterious 9999 millisecond timeout allowed at the
> > client.
> >
> > fdatasync() is happening at every write of rmtab -- a lot. Occasionally
> > it takes a long time. The syncing of rmtab seems a reasonable explanation
> > for why adding more threads does not help -- all mountd threads lock up.
> >
> > The strace I am analysing here contains 670 calls to fdatasync() in a 16
> > minute period. All of them take < 1 second, except for this burst:
> >
> > 31156 12:09:11 fdatasync(14) = 0 <0.000010>
> > 31156 12:09:11 fdatasync(15) = 0 <0.136285>
> > 31156 12:09:12 fdatasync(14) = 0 <0.000010>
> > 31156 12:09:12 fdatasync(15) = 0 <14.911323>
> > 31156 12:09:27 fdatasync(14) = 0 <1.298896>
> > 31156 12:09:29 fdatasync(14) = 0 <1.159262>
> > 31156 12:09:33 fdatasync(14) = 0 <1.447169>
> > 31156 12:09:35 fdatasync(14) = 0 <1.345562>
> > 31156 12:09:40 fdatasync(14) = 0 <0.000010>
> > 31156 12:09:40 fdatasync(14) = 0 <15.413401>
> > 31156 12:09:55 fdatasync(17) = 0 <0.000010>
> > 31156 12:09:55 fdatasync(17) = 0 <1.068503>
> > 31156 12:09:57 fdatasync(18) = 0 <0.000010>
> > 31156 12:09:57 fdatasync(18) = 0 <0.089682>
> >
> > /var/lib/nfs/rmtab[.tmp] is on an ext3 filesystem, the system disk.
> >
> > The server is only acting as a 'regular', dedicated file server, but
> > presumably some other process is causing /var to become erratic --
> > unless continual writing and syncing of a small (64KiB) file could
> > itself cause this to happen periodically?
>
> fdatasync() will write all outstanding dirty data on that partition to
> persistent storage, not just the dirty data for that file. If /var is
> on a partition that has a lot of other activity, then this is the
> expected outcome, unfortunately. I would expect such delays to be due
> to a large amount of outstanding writes, not to periodic synchronous
> writes.
>
> This problem may be addressed in recent versions of nfs-utils. I see a
> commit from Ben Myers on February 12 of this year that seems on point.
> nfs-utils-1.2.2 has it, I think?

Doh, yes, looks like the code I quoted went into nfs-utils 1.2.2 with
Ben's commit.
--b.