From: Chuck Lever <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Jeff Layton <jlayton@redhat.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Bruce Fields <bfields@redhat.com>
Subject: Re: [PATCH 00/16] Cache open file descriptors in knfsd
Date: Mon, 1 Jul 2019 11:39:25 -0400 [thread overview]
Message-ID: <2A2E890C-8AD2-482D-A51C-AB6D254B9DB0@oracle.com> (raw)
In-Reply-To: <98d5ef75d1fa2b8775f52d378ca7d8dd1a542ae1.camel@hammerspace.com>
> On Jul 1, 2019, at 11:17 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote:
>> Interesting work! Kudos to you and Jeff.
>>
>>
>>> On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com>
>>> wrote:
>>>
>>> When a NFSv3 READ or WRITE request comes in, the first thing knfsd
>>> has
>>> to do is open a new file descriptor. While this is often a
>>> relatively
>>> inexpensive thing to do for most local filesystems, it is usually
>>> less
>>> so for FUSE, clustered or networked filesystems that are being
>>> exported
>>> by knfsd.
>>
>> True, I haven't measured much effect if any of open and close on
>> local
>> file systems. It would be valuable if the cover letter provided a
>> more
>> quantified assessment of the cost for these other use cases. It
>> sounds
>> plausible to me that they would be more expensive, but I'm wondering
>> if
>> the additional complexity of an open file cache is warranted and
>> effective. Do you have any benchmark results to share?
>>
>> Are there particular workloads where you believe open caching will be
>> especially beneficial?
>
> I'd expect pretty much anything with a nontrivial open() method. i.e.:
> FUSE, GFS2, OCFS2, CEPH, etc. to benefit.
>
> I've seen no slowdowns so far with traditional filesystems: i.e. ext4
> and xfs.
>
> Note that the removal of the raparms cache does in many way compensate
> for the new need to lookup the struct file.
>
>>> This set of patches attempts to reduce some of that cost by caching
>>> open file descriptors so that they may be reused by other incoming
>>> READ/WRITE requests for the same file.
>>
>> Is the open file cache a single cache per server? Wondering if there
>> can be significant interference (eg lock contention or cache
>> sloshing)
>> between separate workloads on different exports, for example.
>
> The file cache is global. Cache lookups are lockless (i.e. RCU
> protected), so there is little contention for the case where there is
> already an entry. For the case where we have to add an entry, there is
> a mutex that might get contended in the case of workloads with lots of
> small file open+closes.
>> Do you have any benchmark results that show that removing the raparms
>> cache is harmless?
>
> The same information is carried in struct file. The whole raparms cache
> was just a hack in order to allow us to port the readahead information
> across struct file instances. Now that we are caching the struct file
> itself, the raparms hack is unnecessary.
OK. I see the patch description of 11/16 mentions something about "stop
fiddling with raparms" but IMO the patch description for 13/16 should be
changed to make the above clear. Thanks!
> IOW: I haven't seen any slowdowns so far, however I don't have access
> to a bleeding edge networking setup that would push this further.
>
>>> One danger when doing this, is that knfsd may end up caching file
>>> descriptors for files that have been unlinked. In order to deal
>>> with
>>> this issue, we use fsnotify to monitor the files, and have hooks to
>>> evict those descriptors from the file cache if the i_nlink value
>>> goes to 0.
>>>
>>> Jeff Layton (12):
>>> sunrpc: add a new cache_detail operation for when a cache is
>>> flushed
>>> locks: create a new notifier chain for lease attempts
>>> nfsd: add a new struct file caching facility to nfsd
>>> nfsd: hook up nfsd_write to the new nfsd_file cache
>>> nfsd: hook up nfsd_read to the nfsd_file cache
>>> nfsd: hook nfsd_commit up to the nfsd_file cache
>>> nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>>> nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>>> nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>>> nfsd: have nfsd_test_lock use the nfsd_file cache
>>> nfsd: rip out the raparms cache
>>> nfsd: close cached files prior to a REMOVE or RENAME that would
>>> replace target
>>>
>>> Trond Myklebust (4):
>>> notify: export symbols for use by the knfsd file cache
>>> vfs: Export flush_delayed_fput for use by knfsd.
>>> nfsd: Fix up some unused variable warnings
>>> nfsd: Fix the documentation for svcxdr_tmpalloc()
>>>
>>> fs/file_table.c | 1 +
>>> fs/locks.c | 62 +++
>>> fs/nfsd/Kconfig | 1 +
>>> fs/nfsd/Makefile | 3 +-
>>> fs/nfsd/blocklayout.c | 3 +-
>>> fs/nfsd/export.c | 13 +
>>> fs/nfsd/filecache.c | 885
>>> +++++++++++++++++++++++++++++++
>>> fs/nfsd/filecache.h | 60 +++
>>> fs/nfsd/nfs4layouts.c | 12 +-
>>> fs/nfsd/nfs4proc.c | 83 +--
>>> fs/nfsd/nfs4state.c | 183 ++++---
>>> fs/nfsd/nfs4xdr.c | 31 +-
>>> fs/nfsd/nfssvc.c | 16 +-
>>> fs/nfsd/state.h | 10 +-
>>> fs/nfsd/trace.h | 140 +++++
>>> fs/nfsd/vfs.c | 295 ++++-------
>>> fs/nfsd/vfs.h | 9 +-
>>> fs/nfsd/xdr4.h | 19 +-
>>> fs/notify/fsnotify.h | 2 -
>>> fs/notify/group.c | 2 +
>>> fs/notify/mark.c | 6 +
>>> include/linux/fs.h | 5 +
>>> include/linux/fsnotify_backend.h | 2 +
>>> include/linux/sunrpc/cache.h | 1 +
>>> net/sunrpc/cache.c | 3 +
>>> 25 files changed, 1465 insertions(+), 382 deletions(-)
>>> create mode 100644 fs/nfsd/filecache.c
>>> create mode 100644 fs/nfsd/filecache.h
>>>
>>> --
>>> 2.21.0
>>>
>>
>> --
>> Chuck Lever
>>
>>
>>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
--
Chuck Lever
next prev parent reply other threads:[~2019-07-01 15:40 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-30 13:52 [PATCH 00/16] Cache open file descriptors in knfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 01/16] sunrpc: add a new cache_detail operation for when a cache is flushed Trond Myklebust
2019-06-30 13:52 ` [PATCH 02/16] locks: create a new notifier chain for lease attempts Trond Myklebust
2019-06-30 13:52 ` [PATCH 03/16] notify: export symbols for use by the knfsd file cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 04/16] vfs: Export flush_delayed_fput for use by knfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 05/16] nfsd: add a new struct file caching facility to nfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 06/16] nfsd: hook up nfsd_write to the new nfsd_file cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 07/16] nfsd: hook up nfsd_read to the " Trond Myklebust
2019-06-30 13:52 ` [PATCH 08/16] nfsd: hook nfsd_commit up " Trond Myklebust
2019-06-30 13:52 ` [PATCH 09/16] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Trond Myklebust
2019-06-30 13:52 ` [PATCH 10/16] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Trond Myklebust
2019-06-30 13:52 ` [PATCH 11/16] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 12/16] nfsd: have nfsd_test_lock use " Trond Myklebust
2019-06-30 13:52 ` [PATCH 13/16] nfsd: rip out the raparms cache Trond Myklebust
2019-06-30 13:52 ` [PATCH 14/16] nfsd: close cached files prior to a REMOVE or RENAME that would replace target Trond Myklebust
2019-06-30 13:52 ` [PATCH 15/16] nfsd: Fix up some unused variable warnings Trond Myklebust
2019-06-30 13:52 ` [PATCH 16/16] nfsd: Fix the documentation for svcxdr_tmpalloc() Trond Myklebust
2019-06-30 15:57 ` [PATCH 05/16] nfsd: add a new struct file caching facility to nfsd Matthew Wilcox
2019-06-30 16:15 ` Trond Myklebust
2019-06-30 15:27 ` [PATCH 02/16] locks: create a new notifier chain for lease attempts Matthew Wilcox
2019-06-30 15:50 ` Trond Myklebust
2019-07-01 15:02 ` [PATCH 00/16] Cache open file descriptors in knfsd Chuck Lever
2019-07-01 15:17 ` Trond Myklebust
2019-07-01 15:39 ` Chuck Lever [this message]
2019-07-31 22:05 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2A2E890C-8AD2-482D-A51C-AB6D254B9DB0@oracle.com \
--to=chuck.lever@oracle.com \
--cc=bfields@redhat.com \
--cc=jlayton@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).