From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>,
Neil Brown <neilb@suse.de>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v15 16/26] nfsd: add LOCALIO support
Date: Tue, 3 Sep 2024 12:09:41 -0400 [thread overview]
Message-ID: <Ztc0xZIrAtZszJNf@kernel.org> (raw)
In-Reply-To: <67405117-1C08-4CA9-B0CE-743DFC7BCE3F@oracle.com>
On Tue, Sep 03, 2024 at 03:59:31PM +0000, Chuck Lever III wrote:
>
>
> > On Sep 3, 2024, at 11:29 AM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > On Tue, Sep 03, 2024 at 11:19:45AM -0400, Jeff Layton wrote:
> >> On Tue, 2024-09-03 at 11:00 -0400, Mike Snitzer wrote:
> >>> On Tue, Sep 03, 2024 at 10:40:28AM -0400, Jeff Layton wrote:
> >>>> On Tue, 2024-09-03 at 10:34 -0400, Chuck Lever wrote:
> >>>>> On Sat, Aug 31, 2024 at 06:37:36PM -0400, Mike Snitzer wrote:
> >>>>>> From: Weston Andros Adamson <dros@primarydata.com>
> >>>>>>
> >>>>>> Add server support for bypassing NFS for localhost reads, writes, and
> >>>>>> commits. This is only useful when both the client and server are
> >>>>>> running on the same host.
> >>>>>>
> >>>>>> If nfsd_open_local_fh() fails then the NFS client will both retry and
> >>>>>> fallback to normal network-based read, write and commit operations if
> >>>>>> localio is no longer supported.
> >>>>>>
> >>>>>> Care is taken to ensure the same NFS security mechanisms are used
> >>>>>> (authentication, etc) regardless of whether localio or regular NFS
> >>>>>> access is used. The auth_domain established as part of the traditional
> >>>>>> NFS client access to the NFS server is also used for localio. Store
> >>>>>> auth_domain for localio in nfsd_uuid_t and transfer it to the client
> >>>>>> if it is local to the server.
> >>>>>>
> >>>>>> Relative to containers, localio gives the client access to the network
> >>>>>> namespace the server has. This is required to allow the client to
> >>>>>> access the server's per-namespace nfsd_net struct.
> >>>>>>
> >>>>>> This commit also introduces the use of NFSD's percpu_ref to interlock
> >>>>>> nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
> >>>>>> not destroyed while in use by nfsd_open_local_fh and other LOCALIO
> >>>>>> client code.
> >>>>>>
> >>>>>> CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO.
> >>>>>>
> >>>>>> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> >>>>>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> >>>>>> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> >>>>>> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> >>>>>> Co-developed-by: NeilBrown <neilb@suse.de>
> >>>>>> Signed-off-by: NeilBrown <neilb@suse.de>
> >>>>>>
> >>>>>> Not-Acked-by: Chuck Lever <chuck.lever@oracle.com>
> >>>>>> Not-Reviewed-by: Jeff Layton <jlayton@kernel.org>
> >>>>>> ---
> >>>>>> fs/nfsd/Makefile | 1 +
> >>>>>> fs/nfsd/filecache.c | 2 +-
> >>>>>> fs/nfsd/localio.c | 112 +++++++++++++++++++++++++++++++++++++
> >>>>>> fs/nfsd/netns.h | 4 ++
> >>>>>> fs/nfsd/nfsctl.c | 25 ++++++++-
> >>>>>> fs/nfsd/trace.h | 3 +-
> >>>>>> fs/nfsd/vfs.h | 2 +
> >>>>>> include/linux/nfslocalio.h | 8 +++
> >>>>>> 8 files changed, 154 insertions(+), 3 deletions(-)
> >>>>>> create mode 100644 fs/nfsd/localio.c
> >>>>>>
> >>>>>> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> >>>>>> index b8736a82e57c..18cbd3fa7691 100644
> >>>>>> --- a/fs/nfsd/Makefile
> >>>>>> +++ b/fs/nfsd/Makefile
> >>>>>> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> >>>>>> nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> >>>>>> nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> >>>>>> nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> >>>>>> +nfsd-$(CONFIG_NFS_LOCALIO) += localio.o
> >>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> >>>>>> index 89ff380ec31e..348c1b97092e 100644
> >>>>>> --- a/fs/nfsd/filecache.c
> >>>>>> +++ b/fs/nfsd/filecache.c
> >>>>>> @@ -52,7 +52,7 @@
> >>>>>> #define NFSD_FILE_CACHE_UP (0)
> >>>>>>
> >>>>>> /* We only care about NFSD_MAY_READ/WRITE for this cache */
> >>>>>> -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> >>>>>> +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> >>>>>>
> >>>>>> static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> >>>>>> static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> >>>>>> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> >>>>>> new file mode 100644
> >>>>>> index 000000000000..75df709c6903
> >>>>>> --- /dev/null
> >>>>>> +++ b/fs/nfsd/localio.c
> >>>>>> @@ -0,0 +1,112 @@
> >>>>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>>>> +/*
> >>>>>> + * NFS server support for local clients to bypass network stack
> >>>>>> + *
> >>>>>> + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> >>>>>> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> >>>>>> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> >>>>>> + * Copyright (C) 2024 NeilBrown <neilb@suse.de>
> >>>>>> + */
> >>>>>> +
> >>>>>> +#include <linux/exportfs.h>
> >>>>>> +#include <linux/sunrpc/svcauth.h>
> >>>>>> +#include <linux/sunrpc/clnt.h>
> >>>>>> +#include <linux/nfs.h>
> >>>>>> +#include <linux/nfs_common.h>
> >>>>>> +#include <linux/nfslocalio.h>
> >>>>>> +#include <linux/string.h>
> >>>>>> +
> >>>>>> +#include "nfsd.h"
> >>>>>> +#include "vfs.h"
> >>>>>> +#include "netns.h"
> >>>>>> +#include "filecache.h"
> >>>>>> +
> >>>>>> +static const struct nfsd_localio_operations nfsd_localio_ops = {
> >>>>>> + .nfsd_open_local_fh = nfsd_open_local_fh,
> >>>>>> + .nfsd_file_put_local = nfsd_file_put_local,
> >>>>>> + .nfsd_file_file = nfsd_file_file,
> >>>>>> +};
> >>>>>> +
> >>>>>> +void nfsd_localio_ops_init(void)
> >>>>>> +{
> >>>>>> + memcpy(&nfs_to, &nfsd_localio_ops, sizeof(nfsd_localio_ops));
> >>>>>> +}
> >>>>>
> >>>>> Same comment as Neil: this should surface a pointer to the
> >>>>> localio_ops struct. Copying the whole set of function pointers is
> >>>>> generally unnecessary.
> >>>>>
> >>>>>
> >>>>>> +
> >>>>>> +/**
> >>>>>> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> >>>>>> + *
> >>>>>> + * @uuid: nfs_uuid_t which provides the 'struct net' to get the proper nfsd_net
> >>>>>> + * and the 'struct auth_domain' required for LOCALIO access
> >>>>>> + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
> >>>>>> + * @cred: cred that the client established
> >>>>>> + * @nfs_fh: filehandle to lookup
> >>>>>> + * @fmode: fmode_t to use for open
> >>>>>> + *
> >>>>>> + * This function maps a local fh to a path on a local filesystem.
> >>>>>> + * This is useful when the nfs client has the local server mounted - it can
> >>>>>> + * avoid all the NFS overhead with reads, writes and commits.
> >>>>>> + *
> >>>>>> + * On successful return, returned nfsd_file will have its nf_net member
> >>>>>> + * set. Caller (NFS client) is responsible for calling nfsd_serv_put and
> >>>>>> + * nfsd_file_put (via nfs_to.nfsd_file_put_local).
> >>>>>> + */
> >>>>>> +struct nfsd_file *
> >>>>>> +nfsd_open_local_fh(nfs_uuid_t *uuid,
> >>>>>> + struct rpc_clnt *rpc_clnt, const struct cred *cred,
> >>>>>> + const struct nfs_fh *nfs_fh, const fmode_t fmode)
> >>>>>> + __must_hold(rcu)
> >>>>>> +{
> >>>>>> + int mayflags = NFSD_MAY_LOCALIO;
> >>>>>> + struct nfsd_net *nn = NULL;
> >>>>>> + struct net *net;
> >>>>>> + struct svc_cred rq_cred;
> >>>>>> + struct svc_fh fh;
> >>>>>> + struct nfsd_file *localio;
> >>>>>> + __be32 beres;
> >>>>>> +
> >>>>>> + if (nfs_fh->size > NFS4_FHSIZE)
> >>>>>> + return ERR_PTR(-EINVAL);
> >>>>>> +
> >>>>>> + /*
> >>>>>> + * Not running in nfsd context, so must safely get reference on nfsd_serv.
> >>>>>> + * But the server may already be shutting down, if so disallow new localio.
> >>>>>> + * uuid->net is NOT a counted reference, but caller's rcu_read_lock() ensures
> >>>>>> + * that if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe
> >>>>>> + * and if it succeeds we will have an implied reference to the net.
> >>>>>> + */
> >>>>>> + net = rcu_dereference(uuid->net);
> >>>>>> + if (net)
> >>>>>> + nn = net_generic(net, nfsd_net_id);
> >>>>>> + if (unlikely(!nn || !nfsd_serv_try_get(nn)))
> >>>>>> + return ERR_PTR(-ENXIO);
> >>>>>> +
> >>>>>> + /* Drop the rcu lock for nfsd_file_acquire_local() */
> >>>>>> + rcu_read_unlock();
> >>>>>
> >>>>> I'm struggling with the locking logistics. Caller takes the RCU read
> >>>>> lock, this function drops the lock, then takes it again. So:
> >>>>>
> >>>>> - A caller might rely on the lock being held continuously, but
> >>>>> - The API contract documented above doesn't indicate that this
> >>>>> function drops that lock
> >>>>> - The __must_hold(rcu) annotation doesn't indicate that this
> >>>>> function drops that lock, IIUC
> >>>>>
> >>>>> Dropping and retaking the lock in here is an anti-pattern that
> >>>>> should be avoided. I suggest we are better off in the long run if
> >>>>> the caller does not need to take the RCU read lock, but instead,
> >>>>> nfsd_open_local_fh takes it right here just for the rcu_dereference.
> >>>
> >>> I thought so too when I first saw how Neil approached fixing this to
> >>> be safe. It was only after putting further time to it (and having the
> >>> benefit of being so close to all this) that I realized the nuance at
> >>> play (please see my reply to Jeff below for the nuance I'm speaking
> >>> of).
> >>>
> >>>>>
> >>>>> OTOH, Why drop the lock before calling nfsd_file_acquire_local()?
> >>>>> The RCU read lock can safely be taken more than once in succession.
> >>>>>
> >>>>> Let's rethink the locking strategy.
> >>>>>
> >>>
> >>> Yes, _that_ is a very valid point. I did wonder the same: it seems
> >>> perfectly fine to simply retain the RCU throughout the entirety of
> >>> nfsd_open_local_fh().
> >>>
> >>
> >> Nope. nfsd_file_do_acquire can allocate, so you can't hold the
> >> rcu_read_lock over the whole thing.
> >
> > Ah, yeap.. sorry, I knew that ;)
> >
> >>>> Agreed. The only caller does this:
> >>>>
> >>>> rcu_read_lock();
> >>>> if (!rcu_access_pointer(uuid->net)) {
> >>>> rcu_read_unlock();
> >>>> return ERR_PTR(-ENXIO);
> >>>> }
> >>>> localio = nfs_to.nfsd_open_local_fh(uuid, rpc_clnt, cred,
> >>>> nfs_fh, fmode);
> >>>> rcu_read_unlock();
> >>>>
> >>>> Maybe just move the check for uuid->net down into nfsd_open_local_fh,
> >>>> and it can acquire the rcu_read_lock for itself?
> >>>
> >>> No, sorry we cannot. The call to nfs_to.nfsd_open_local_fh (which is
> >>> a symbol provided by nfsd) is only safe if the RCU protected pre-check
> >>> shows the uuid->net valid.
> >>
> >> Ouch, ok.
> >
> > I had to double check but I did add a comment that speaks directly to
> > this "nuance" above the code you quoted:
> >
> > /*
> > * uuid->net must not be NULL, otherwise NFS may not have ref
> > * on NFSD and therefore cannot safely make 'nfs_to' calls.
> > */
> >
> > So yeah, this code needs to stay like this. The __must_hold(rcu) just
> > ensures the RCU is held on entry and exit.. the bouncing of RCU
> > (dropping and retaking) isn't of immediate concern is it? While I
> > agree it isn't ideal, it is what it is given:
> > 1) NFS caller of NFSD symbol is only safe if it has RCU amd verified
> > uuid->net valid
> > 2) nfsd_file_do_acquire() can allocate.
>
> OK, understood, but the annotation is still wrong. The lock
> is dropped here so I think you need __releases and __acquires
> in that case. However...
Sure, that seems like more precise context with which to train
lockdep.
> Let's wait for Neil's comments, but I think this needs to be
> properly addressed before merging. The comments are not going
> to be enough IMO.
I obviously have no issues with Neil confirming/expanding what I
shared about the need for checking uuid->net with RCU held to ensure
it safe to call this nfs_to method. Without it we cannot make the
call, which happens to then take other references (nfsd_serv and
nfsd_file) that we can then lean on for the duration of the NFS client
issuing IO and then dropping the references/interlock when completing
the IO.
The NFS client maintainers need to give a good review anyway, so
plenty of time for Neil to weigh in.
Thanks,
Mike
next prev parent reply other threads:[~2024-09-03 16:09 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-31 22:37 [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 01/26] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 02/26] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 03/26] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 04/26] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 05/26] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 06/26] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 07/26] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 08/26] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 09/26] nfsd: add nfsd_file_acquire_local() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 10/26] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 11/26] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 12/26] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 13/26] SUNRPC: replace program list with program array Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 14/26] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-09-01 23:25 ` NeilBrown
2024-09-03 16:33 ` Mike Snitzer
2024-09-05 19:24 ` Anna Schumaker
2024-09-05 19:38 ` Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 15/26] nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO Mike Snitzer
2024-09-01 23:37 ` NeilBrown
2024-08-31 22:37 ` [PATCH v15 16/26] nfsd: add LOCALIO support Mike Snitzer
2024-09-01 23:46 ` NeilBrown
2024-09-03 14:34 ` Chuck Lever
2024-09-03 14:40 ` Jeff Layton
2024-09-03 15:00 ` Mike Snitzer
2024-09-03 15:19 ` Jeff Layton
2024-09-03 15:29 ` Mike Snitzer
2024-09-03 15:59 ` Chuck Lever III
2024-09-03 16:09 ` Mike Snitzer [this message]
2024-09-03 17:07 ` Chuck Lever III
2024-09-03 22:31 ` NeilBrown
2024-09-04 5:01 ` NeilBrown
2024-09-04 13:47 ` Chuck Lever
2024-09-05 14:21 ` Mike Snitzer
2024-09-05 15:41 ` Chuck Lever III
2024-09-05 23:34 ` NeilBrown
2024-09-06 15:04 ` Mike Snitzer
2024-09-06 18:07 ` Mike Snitzer
2024-09-06 21:56 ` NeilBrown
2024-09-06 22:33 ` Chuck Lever III
2024-09-06 23:14 ` NeilBrown
2024-09-07 15:17 ` Mike Snitzer
2024-09-07 16:09 ` Chuck Lever III
2024-09-07 19:08 ` Mike Snitzer
2024-09-07 21:12 ` Chuck Lever III
2024-09-08 15:05 ` Chuck Lever III
2024-09-07 15:52 ` Mike Snitzer
2024-09-04 13:54 ` Jeff Layton
2024-09-04 13:56 ` Chuck Lever III
2024-08-31 22:37 ` [PATCH v15 17/26] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-09-03 14:11 ` Chuck Lever
2024-08-31 22:37 ` [PATCH v15 18/26] nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 19/26] nfs: add LOCALIO support Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 20/26] nfs: enable localio for non-pNFS IO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 21/26] pnfs/flexfiles: enable localio support Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 22/26] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 23/26] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 24/26] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 25/26] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 26/26] nfs: add "NFS Client and Server Interlock" section to localio.rst Mike Snitzer
2024-09-01 23:52 ` [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO NeilBrown
2024-09-03 14:49 ` Jeff Layton
2024-09-06 19:31 ` Anna Schumaker
2024-09-06 20:34 ` Mike Snitzer
2024-09-06 21:09 ` Chuck Lever III
2024-09-10 16:45 ` Mike Snitzer
2024-09-10 19:14 ` Mike Snitzer
2024-09-10 19:24 ` Anna Schumaker
2024-09-10 20:31 ` Anna Schumaker
2024-09-10 22:11 ` Mike Snitzer
2024-09-11 17:51 ` Mike Snitzer
2024-09-11 18:48 ` Mike Snitzer
2024-09-13 18:12 ` Mike Snitzer
2024-09-11 0:43 ` NeilBrown
2024-09-11 16:03 ` Chuck Lever III
2024-09-12 23:31 ` NeilBrown
2024-09-12 23:42 ` Chuck Lever III
2024-09-13 12:27 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ztc0xZIrAtZszJNf@kernel.org \
--to=snitzer@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).