linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Jeff Layton <jlayton@redhat.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Bruce Fields <bfields@redhat.com>
Subject: Re: [PATCH 00/16] Cache open file descriptors in knfsd
Date: Mon, 1 Jul 2019 11:39:25 -0400	[thread overview]
Message-ID: <2A2E890C-8AD2-482D-A51C-AB6D254B9DB0@oracle.com> (raw)
In-Reply-To: <98d5ef75d1fa2b8775f52d378ca7d8dd1a542ae1.camel@hammerspace.com>



> On Jul 1, 2019, at 11:17 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote:
>> Interesting work! Kudos to you and Jeff.
>> 
>> 
>>> On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com>
>>> wrote:
>>> 
>>> When a NFSv3 READ or WRITE request comes in, the first thing knfsd
>>> has
>>> to do is open a new file descriptor. While this is often a
>>> relatively
>>> inexpensive thing to do for most local filesystems, it is usually
>>> less
>>> so for FUSE, clustered or networked filesystems that are being
>>> exported
>>> by knfsd.
>> 
>> True, I haven't measured much effect if any of open and close on
>> local
>> file systems. It would be valuable if the cover letter provided a
>> more
>> quantified assessment of the cost for these other use cases. It
>> sounds
>> plausible to me that they would be more expensive, but I'm wondering
>> if
>> the additional complexity of an open file cache is warranted and
>> effective. Do you have any benchmark results to share?
>> 
>> Are there particular workloads where you believe open caching will be
>> especially beneficial?
> 
> I'd expect pretty much anything with a nontrivial open() method. i.e.:
> FUSE, GFS2, OCFS2, CEPH, etc. to benefit.
> 
> I've seen no slowdowns so far with traditional filesystems: i.e. ext4
> and xfs.
> 
> Note that the removal of the raparms cache does in many way compensate
> for the new need to lookup the struct file.
> 
>>> This set of patches attempts to reduce some of that cost by caching
>>> open file descriptors so that they may be reused by other incoming
>>> READ/WRITE requests for the same file.
>> 
>> Is the open file cache a single cache per server? Wondering if there
>> can be significant interference (eg lock contention or cache
>> sloshing)
>> between separate workloads on different exports, for example.
> 
> The file cache is global. Cache lookups are lockless (i.e. RCU
> protected), so there is little contention for the case where there is
> already an entry. For the case where we have to add an entry, there is
> a mutex that might get contended in the case of workloads with lots of
> small file open+closes.

>> Do you have any benchmark results that show that removing the raparms
>> cache is harmless?
> 
> The same information is carried in struct file. The whole raparms cache
> was just a hack in order to allow us to port the readahead information
> across struct file instances. Now that we are caching the struct file
> itself, the raparms hack is unnecessary.

OK. I see the patch description of 11/16 mentions something about "stop
fiddling with raparms" but IMO the patch description for 13/16 should be
changed to make the above clear. Thanks!


> IOW: I haven't seen any slowdowns so far, however I don't have access
> to a bleeding edge networking setup that would push this further.
> 
>>> One danger when doing this, is that knfsd may end up caching file
>>> descriptors for files that have been unlinked. In order to deal
>>> with
>>> this issue, we use fsnotify to monitor the files, and have hooks to
>>> evict those descriptors from the file cache if the i_nlink value
>>> goes to 0.
>>> 
>>> Jeff Layton (12):
>>> sunrpc: add a new cache_detail operation for when a cache is
>>> flushed
>>> locks: create a new notifier chain for lease attempts
>>> nfsd: add a new struct file caching facility to nfsd
>>> nfsd: hook up nfsd_write to the new nfsd_file cache
>>> nfsd: hook up nfsd_read to the nfsd_file cache
>>> nfsd: hook nfsd_commit up to the nfsd_file cache
>>> nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>>> nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>>> nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>>> nfsd: have nfsd_test_lock use the nfsd_file cache
>>> nfsd: rip out the raparms cache
>>> nfsd: close cached files prior to a REMOVE or RENAME that would
>>>   replace target
>>> 
>>> Trond Myklebust (4):
>>> notify: export symbols for use by the knfsd file cache
>>> vfs: Export flush_delayed_fput for use by knfsd.
>>> nfsd: Fix up some unused variable warnings
>>> nfsd: Fix the documentation for svcxdr_tmpalloc()
>>> 
>>> fs/file_table.c                  |   1 +
>>> fs/locks.c                       |  62 +++
>>> fs/nfsd/Kconfig                  |   1 +
>>> fs/nfsd/Makefile                 |   3 +-
>>> fs/nfsd/blocklayout.c            |   3 +-
>>> fs/nfsd/export.c                 |  13 +
>>> fs/nfsd/filecache.c              | 885
>>> +++++++++++++++++++++++++++++++
>>> fs/nfsd/filecache.h              |  60 +++
>>> fs/nfsd/nfs4layouts.c            |  12 +-
>>> fs/nfsd/nfs4proc.c               |  83 +--
>>> fs/nfsd/nfs4state.c              | 183 ++++---
>>> fs/nfsd/nfs4xdr.c                |  31 +-
>>> fs/nfsd/nfssvc.c                 |  16 +-
>>> fs/nfsd/state.h                  |  10 +-
>>> fs/nfsd/trace.h                  | 140 +++++
>>> fs/nfsd/vfs.c                    | 295 ++++-------
>>> fs/nfsd/vfs.h                    |   9 +-
>>> fs/nfsd/xdr4.h                   |  19 +-
>>> fs/notify/fsnotify.h             |   2 -
>>> fs/notify/group.c                |   2 +
>>> fs/notify/mark.c                 |   6 +
>>> include/linux/fs.h               |   5 +
>>> include/linux/fsnotify_backend.h |   2 +
>>> include/linux/sunrpc/cache.h     |   1 +
>>> net/sunrpc/cache.c               |   3 +
>>> 25 files changed, 1465 insertions(+), 382 deletions(-)
>>> create mode 100644 fs/nfsd/filecache.c
>>> create mode 100644 fs/nfsd/filecache.h
>>> 
>>> -- 
>>> 2.21.0
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

--
Chuck Lever




  reply	other threads:[~2019-07-01 15:40 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-30 13:52 [PATCH 00/16] Cache open file descriptors in knfsd Trond Myklebust
2019-06-30 13:52 ` [PATCH 01/16] sunrpc: add a new cache_detail operation for when a cache is flushed Trond Myklebust
2019-06-30 13:52   ` [PATCH 02/16] locks: create a new notifier chain for lease attempts Trond Myklebust
2019-06-30 13:52     ` [PATCH 03/16] notify: export symbols for use by the knfsd file cache Trond Myklebust
2019-06-30 13:52       ` [PATCH 04/16] vfs: Export flush_delayed_fput for use by knfsd Trond Myklebust
2019-06-30 13:52         ` [PATCH 05/16] nfsd: add a new struct file caching facility to nfsd Trond Myklebust
2019-06-30 13:52           ` [PATCH 06/16] nfsd: hook up nfsd_write to the new nfsd_file cache Trond Myklebust
2019-06-30 13:52             ` [PATCH 07/16] nfsd: hook up nfsd_read to the " Trond Myklebust
2019-06-30 13:52               ` [PATCH 08/16] nfsd: hook nfsd_commit up " Trond Myklebust
2019-06-30 13:52                 ` [PATCH 09/16] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Trond Myklebust
2019-06-30 13:52                   ` [PATCH 10/16] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Trond Myklebust
2019-06-30 13:52                     ` [PATCH 11/16] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Trond Myklebust
2019-06-30 13:52                       ` [PATCH 12/16] nfsd: have nfsd_test_lock use " Trond Myklebust
2019-06-30 13:52                         ` [PATCH 13/16] nfsd: rip out the raparms cache Trond Myklebust
2019-06-30 13:52                           ` [PATCH 14/16] nfsd: close cached files prior to a REMOVE or RENAME that would replace target Trond Myklebust
2019-06-30 13:52                             ` [PATCH 15/16] nfsd: Fix up some unused variable warnings Trond Myklebust
2019-06-30 13:52                               ` [PATCH 16/16] nfsd: Fix the documentation for svcxdr_tmpalloc() Trond Myklebust
2019-06-30 15:57           ` [PATCH 05/16] nfsd: add a new struct file caching facility to nfsd Matthew Wilcox
2019-06-30 16:15             ` Trond Myklebust
2019-06-30 15:27     ` [PATCH 02/16] locks: create a new notifier chain for lease attempts Matthew Wilcox
2019-06-30 15:50       ` Trond Myklebust
2019-07-01 15:02 ` [PATCH 00/16] Cache open file descriptors in knfsd Chuck Lever
2019-07-01 15:17   ` Trond Myklebust
2019-07-01 15:39     ` Chuck Lever [this message]
2019-07-31 22:05 ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2A2E890C-8AD2-482D-A51C-AB6D254B9DB0@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=bfields@redhat.com \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).