linux-nfs.vger.kernel.org archive mirror
From: Bruce Fields <bfields@fieldses.org>
To: Frank van der Linden <fllinden@amazon.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org
Subject: Re: nfsd filecache issues with v4
Date: Thu, 25 Jun 2020 13:10:21 -0400	[thread overview]
Message-ID: <20200625171021.GC30655@fieldses.org> (raw)
In-Reply-To: <20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com>

On Mon, Jun 08, 2020 at 07:21:22PM +0000, Frank van der Linden wrote:
> We recently noticed that, with 5.4+ kernels, the generic/531 test takes
> a very long time to finish for v4, especially when run on larger systems.
> 
> Case in point: a 72 VCPU, 144G EC2 instance as a client will make the test
> last about 20 hours.
> 
> So, I had a look to see what was going on. First of all, the test generates
> a lot of files - what it does is generate 50000 files per process, where
> it starts 2 * NCPU processes. So that's 144 processes in this case, 50000
> files each. Also, it does it by setting the file ulimit to 50000, and then
> just opening files, keeping them open, until it hits the limit.
> 
> So that's 7 million new/open files - that's a lot, but the problem can
> be triggered with far fewer than that as well.
> 
> Looking at what the server was doing, I noticed a lot of lock contention
> for nfsd_file_lru. Then I noticed that nfsd_filecache_count kept
> going up, reflecting the number of open files by the client processes,
> eventually reaching, for example, that 7 million number.
> 
> So here's what happens: for NFSv4, files that are associated with an
> open stateid can stick around for a long time, as long as there's no
> CLOSE done on them. That's what's happening here. Also, since those files
> have a refcount of >= 2 (one for the hash table, one for being pointed to
> by the state), they are never eligible for removal from the file cache.
> Worse, since the code calls nfsd_file_gc() inline if the upper bound is crossed
> (8192), every single operation that calls nfsd_file_acquire will end up
> walking the entire LRU, trying to free files, and failing every time.
> Walking a list with millions of files every single time isn't great.
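
To make the failure mode concrete, here is a small toy model in Python (purely illustrative; the real code is kernel C, and the names and numbers here are simplifications). Entries pinned by an extra reference can never be freed, so once the cache crosses the limit, every acquire walks the entire LRU and frees nothing:

```python
# Toy model of the behavior described above (not the kernel code):
# entries pinned by an extra reference (refcount >= 2) can never be
# freed, so every acquire over the limit rescans the whole LRU.

LIMIT = 8192  # stands in for the 8192 upper bound mentioned above

class Entry:
    def __init__(self, pinned):
        # one reference for the hash table, plus one if v4 open
        # state points at the entry
        self.refcount = 2 if pinned else 1

lru = []
entries_walked = 0

def gc():
    """Walk the whole LRU, freeing only unpinned entries."""
    global entries_walked
    kept = []
    freed = 0
    for e in lru:
        entries_walked += 1
        if e.refcount == 1:
            freed += 1          # no extra reference: freeable
        else:
            kept.append(e)      # pinned: stays, will be rescanned
    lru[:] = kept
    return freed

def acquire(pinned):
    lru.append(Entry(pinned))
    if len(lru) > LIMIT:
        gc()                    # called inline on every acquire

# a client opens 10000 files and holds them open via v4 state
for _ in range(10000):
    acquire(pinned=True)
```

After this run, nothing was ever freed, and the acquires past the limit walked over 16 million entries between them, which is why per-operation cost collapses as the cache grows.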

Thanks for tracking this down.

> 
> There are some ways to fix this behavior like:
> 
> * Always allow v4 cached file structures to be purged from the cache.
>   They will stick around, since they still have a reference, but
>   at least they won't slow down cache handling to a crawl.

If they have to stick around anyway it seems too bad not to be able to
use them.

I mean, just because a file's opened first by a v4 user doesn't mean it
might not also have other users, right?

Would it be that hard to make nfsd_file_gc() a little smarter?

I don't know, maybe it's not worth it.

--b.
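
One illustrative shape such a smarter nfsd_file_gc() could take (a sketch in Python rather than kernel C, with invented structure, not the nfsd implementation) is to park entries it cannot free on a separate list, so repeated GC passes stop rescanning them until their pinning reference is dropped:

```python
# Illustrative sketch: unfreeable entries are parked on a separate
# list and only return to the LRU when their extra reference
# (e.g. v4 open state) goes away.

class Entry:
    def __init__(self, refcount):
        self.refcount = refcount

lru = []      # candidates for freeing
parked = []   # refcount >= 2: skip until unpinned

def gc():
    """Free unpinned entries; park the rest instead of rescanning."""
    freed = 0
    for e in lru:
        if e.refcount == 1:
            freed += 1          # freeable: drop it
        else:
            parked.append(e)    # park it; not rescanned next pass
    lru.clear()
    return freed

def unpin(e):
    """Called when, say, a v4 CLOSE drops the state's reference."""
    e.refcount -= 1
    parked.remove(e)
    lru.append(e)               # a normal GC candidate again

lru.extend([Entry(2) for _ in range(5)] + [Entry(1) for _ in range(3)])

assert gc() == 3        # frees only the three unpinned entries
assert gc() == 0        # second pass has nothing left to rescan

unpin(parked[0])
assert gc() == 1        # the unpinned entry is now collectable
```

The trade-off is bookkeeping on the state-transition paths (every place a pinning reference is taken or dropped has to move the entry between lists), which may be why a simple full walk was used to begin with.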

> * Don't add v4 files to the cache to begin with.
> 
> * Since the only advantage of the file cache for v4 is the caching
>   of files linked to special stateids (as far as I can tell), only
>   cache files associated with special state ids.
> 
> * Don't bother with v4 files at all, and revert the changes that
>   made v4 use the file cache.
> 
> In general, the resource control for files OPENed by the client is
> probably an issue. Even if you fix the cache, what if there are
> N clients that open millions of files and keep them open? Maybe
> there should be a fallback to start using temporary open files
> if a client goes beyond a reasonable limit and threatens to eat
> all resources.
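
The fallback suggested here could look something like a per-client cap beyond which opens stop being cached. This is a purely hypothetical sketch (none of these names, limits, or structures exist in nfsd):

```python
# Hypothetical sketch of the suggested fallback: once a client
# holds more than some per-client cap of cache-pinned open files,
# further opens bypass the cache and use a temporary
# open-per-operation file instead.

PER_CLIENT_CAP = 1024  # invented limit, not a real nfsd tunable

cached_opens = {}  # client id -> number of cache-held open files

def acquire(client):
    """Return True if this open may pin a cache entry, False to
    fall back to a temporary open/close per operation."""
    held = cached_opens.get(client, 0)
    if held >= PER_CLIENT_CAP:
        return False            # over the cap: stop pinning files
    cached_opens[client] = held + 1
    return True

# a greedy client eventually falls back; others are unaffected
for _ in range(PER_CLIENT_CAP + 10):
    last = acquire("greedy")
assert last is False
assert acquire("polite") is True
```

The cost would be extra open/close traffic for the greedy client, but the server's resource usage stays bounded regardless of how many files N clients keep open.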
> 
> Thoughts?
> 
> - Frank


Thread overview: 5+ messages
2020-06-08 19:21 nfsd filecache issues with v4 Frank van der Linden
2020-06-25 17:10 ` Bruce Fields [this message]
2020-06-25 19:12   ` Frank van der Linden
2020-06-25 19:20     ` Frank van der Linden
2020-06-25 19:48     ` Bruce Fields
