From: Dave Chinner <david@fromorbit.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Brian Foster <bfoster@redhat.com>, Christoph Hellwig <hch@lst.de>,
it+linux-nfs@molgen.mpg.de, linux-nfs@vger.kernel.org,
linux-xfs@vger.kernel.org,
"J. Bruce Fields" <bfields@fieldses.org>,
Jeff Layton <jlayton@poochiereds.net>
Subject: Re: Locking problems with Linux 4.9 and 4.11 with NFSD and `fs/iomap.c`
Date: Wed, 2 Aug 2017 08:51:44 +1000
Message-ID: <20170801225144.GP17762@dastard>
In-Reply-To: <979473d1-9e8a-51ba-28d9-9ace63f8105b@molgen.mpg.de>
On Tue, Aug 01, 2017 at 07:49:50PM +0200, Paul Menzel wrote:
> Dear Brian, dear Christoph,
>
>
> On 06/27/17 13:59, Paul Menzel wrote:
>
> >Just a small update that we were hit by the problem on a different
> >machine (identical model) with Linux 4.9.32 and the exact same
> >symptoms.
> >
> >```
> >$ sudo cat /proc/2085/stack
> >[<ffffffff811f920c>] iomap_write_begin+0x8c/0x120
> >[<ffffffff811f982b>] iomap_zero_range_actor+0xeb/0x210
> >[<ffffffff811f9a82>] iomap_apply+0xa2/0x110
> >[<ffffffff811f9c58>] iomap_zero_range+0x58/0x80
> >[<ffffffff8133c7de>] xfs_zero_eof+0x4e/0xb0
> >[<ffffffff8133c9dd>] xfs_file_aio_write_checks+0x19d/0x1c0
> >[<ffffffff8133ce89>] xfs_file_buffered_aio_write+0x79/0x2d0
> >[<ffffffff8133d17e>] xfs_file_write_iter+0x9e/0x150
> >[<ffffffff81198dc0>] do_iter_readv_writev+0xa0/0xf0
> >[<ffffffff81199fba>] do_readv_writev+0x18a/0x230
> >[<ffffffff8119a2ac>] vfs_writev+0x3c/0x50
> >[<ffffffffffffffff>] 0xffffffffffffffff
> >```
> >
> >We haven’t had time to set up a test system yet to analyze that further.
>
> Today, two systems with Linux 4.9.23 exhibited the problem of `top`
> showing that `nfsd` is at 100 %. Restarting one machine into Linux
> *4.9.38* showed the same problem. One of them with a 1 GBit/s
> network device got traffic from a 10 GBit/s system, so the
> connection was saturated.
So the question is this: is there IO being issued here, is the page
cache growing, or is it in a tight loop doing nothing? Details of
your hardware, XFS config and NFS server config are kinda important
here, too.
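Something like this should answer those three questions (commands
not verified against your setup; assuming the spinning nfsd is PID
2085, as in the trace you posted):

$ iostat -x 5 3                # is write IO hitting the backing device?
$ grep -E 'Cached|Dirty|Writeback' /proc/meminfo    # page cache growing?
$ for i in $(seq 5); do sudo cat /proc/2085/stack; sleep 2; done

If the stack looks identical on every sample and iostat shows no
write IO, it's likely stuck in a tight loop; if Dirty/Writeback keep
climbing, it's busy zeroing pages into the page cache.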
For example, if the NFS server IO patterns trigger a large
speculative delayed allocation and the client then does a write at
the end of the speculative delalloc range, we will zero the entire
speculative delalloc range. That could be several GB of zeros that
need to be written here. It's sub-optimal, yes, but large zeroing
is rare enough that we haven't needed to optimise it by allocating
unwritten extents instead. It would be really handy to know what
application the NFS client is running, as that might give insight
into the trigger behaviour and whether you are hitting this case.
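A rough way to check whether that is what's happening (the export
path here is only a placeholder) is to dump the extent map of the
file being written and look for delalloc/unwritten extents past EOF:

$ sudo xfs_bmap -elpv /export/path/to/file

If a huge speculative preallocation turns out to be the trigger, the
allocsize= mount option can be used to cap its size on that
filesystem.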
Also, if the NFS client is only writing to one file, then all the
other writes that are on the wire will end up being serviced by nfsd
threads that then block waiting for the inode lock. If the client
issues more writes on the wire than the NFS server has worker
threads, the client side writes will starve the NFS server of
worker threads until the zeroing completes. This is the behaviour
you are seeing - it's a common server side config error that's been
known for at least 15 years...
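You can usually confirm that from the server side by sampling the
top stack frame of every nfsd thread - expect one down in the
zeroing path and the rest parked waiting on the inode lock:

$ for p in $(pgrep nfsd); do sudo head -1 /proc/$p/stack; done | sort | uniq -c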
FWIW, it used to be that a linux NFS client could have 16 concurrent
outstanding NFS RPCs to a server at a time - I don't know if that
limit still exists or whether it's been increased. However, the
typical knfsd default is (still) only 8 worker threads, meaning a
single client and server using default configs can cause the above
server DOS issue, e.g. on a bleeding edge Debian distro install:
$ head -2 /etc/default/nfs-kernel-server
# Number of servers to start up
RPCNFSDCOUNT=8
$
So, yeah, distros still only configure the nfs server with 8 worker
threads by default. If it's a dedicated NFS server, then I'd be using
somewhere around 64 NFSD threads *per CPU* as a starting point for
the server config...
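As a rough sketch of bumping that (exact file locations vary by
distro), the thread count can be changed on the fly and then made
persistent:

$ sudo rpc.nfsd 64            # takes effect immediately
$ cat /proc/fs/nfsd/threads   # confirm the new count
64

and then set RPCNFSDCOUNT (or your distro's equivalent) in
/etc/default/nfs-kernel-server so it survives a restart.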
At minimum, you need to ensure that the NFS server has at least
double the number of server threads as the largest client side
concurrent RPC count so that a single client can't DOS the NFS
server with a single blocked write stream.
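On a Linux client the concurrent RPC limit is the sunrpc slot table;
as a rough check (this is the TCP parameter; newer kernels also grow
the table dynamically up to tcp_max_slot_table_entries):

$ cat /proc/sys/sunrpc/tcp_slot_table_entries

and then size the server's thread count at a couple of multiples of
whatever the client is actually allowed to keep in flight.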
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 13+ messages
2017-05-07 19:09 Locking problems with Linux 4.9 with NFSD and `fs/iomap.c` Paul Menzel
2017-05-08 13:18 ` Brian Foster
2017-05-09 9:05 ` Christoph Hellwig
[not found] ` <7ae18b0d-38e3-9b12-0989-ede68956ad43@molgen.mpg.de>
[not found] ` <358037e8-6784-ebca-9fbb-ec7eef3977d6@molgen.mpg.de>
[not found] ` <20170510171757.GA10534@localhost.localdomain>
2017-06-27 11:59 ` Locking problems with Linux 4.9 and 4.11 " Paul Menzel
2017-06-28 16:41 ` Christoph Hellwig
2017-08-01 17:49 ` Paul Menzel
2017-08-01 22:51 ` Dave Chinner [this message]
2017-08-10 14:11 ` Paul Menzel
2017-08-10 19:54 ` AW: " Markus Stockhausen
2017-08-11 10:15 ` Christoph Hellwig
2017-08-11 15:14 ` Paul Menzel
2017-05-10 9:08 ` Locking problems with Linux 4.9 " Paul Menzel
2017-05-10 17:23 ` J. Bruce Fields