From: Mike Snitzer <snitzer@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trondmy@hammerspace.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v15 16/26] nfsd: add LOCALIO support
Date: Fri, 6 Sep 2024 14:07:28 -0400 [thread overview]
Message-ID: <ZttE4DKrqqVa0ACn@kernel.org> (raw)
In-Reply-To: <ZtsZ8IEoV-DyqAzj@kernel.org>
On Fri, Sep 06, 2024 at 11:04:16AM -0400, Mike Snitzer wrote:
> On Fri, Sep 06, 2024 at 09:34:08AM +1000, NeilBrown wrote:
> > On Fri, 06 Sep 2024, Mike Snitzer wrote:
> > > On Wed, Sep 04, 2024 at 09:47:07AM -0400, Chuck Lever wrote:
> > > > On Wed, Sep 04, 2024 at 03:01:46PM +1000, NeilBrown wrote:
> > > > > On Wed, 04 Sep 2024, NeilBrown wrote:
> > > > > >
> > > > > > I agree that dropping and reclaiming a lock is an anti-pattern and in
> > > > > > best avoided in general. I cannot see a better alternative in this
> > > > > > case.
> > > > >
> > > > > It occurred to me what I should spell out the alternate that I DO see so
> > > > > you have the option of disagreeing with my assessment that it isn't
> > > > > "better".
> > > > >
> > > > > We need RCU to call into nfsd, we need a per-cpu ref on the net (which
> > > > > we can only get inside nfsd) and NOT RCU to call
> > > > > nfsd_file_acquire_local().
> > > > >
> > > > > The current code combines these (because they are only used together)
> > > > > and so the need to drop rcu.
> > > > >
> > > > > I thought briefly that it could simply drop rcu and leave it dropped
> > > > > (__releases(rcu)) but not only do I generally like that LESS than
> > > > > dropping and reclaiming, I think it would be buggy. While in the nfsd
> > > > > module code we need to be holding either rcu or a ref on the server else
> > > > > the code could disappear out from under the CPU. So if we exit without
> > > > > a ref on the server - which we do if nfsd_file_acquire_local() fails -
> > > > > then we need to reclaim RCU *before* dropping the ref. So the current
> > > > > code is slightly buggy.
> > > > >
> > > > > We could instead split the combined call into multiple nfs_to
> > > > > interfaces.
> > > > >
> > > > > So nfs_open_local_fh() in nfs_common/nfslocalio.c would be something
> > > > > like:
> > > > >
> > > > > rcu_read_lock();
> > > > > net = READ_ONCE(uuid->net);
> > > > > if (!net || !nfs_to.get_net(net)) {
> > > > > rcu_read_unlock();
> > > > > return ERR_PTR(-ENXIO);
> > > > > }
> > > > > rcu_read_unlock();
> > > > > localio = nfs_to.nfsd_open_local_fh(....);
> > > > > if (IS_ERR(localio))
> > > > > nfs_to.put_net(net);
> > > > > return localio;
> > > > >
> > > > > So we have 3 interfaces instead of 1, but no hidden unlock/lock.
> > > >
> > > > Splitting up the function call occurred to me as well, but I didn't
> > > > come up with a specific bit of surgery. Thanks for the suggestion.
> > > >
> > > > At this point, my concern is that we will lose your cogent
> > > > explanation of why the release/lock is done. Having it in email is
> > > > great, but email is more ephemeral than actually putting it in the
> > > > code.
> > > >
> > > >
> > > > > As I said, I don't think this is a net win, but reasonable people might
> > > > > disagree with me.
> > > >
> > > > The "win" here is that it makes this code self-documenting and
> > > > somewhat less likely to be broken down the road by changes in and
> > > > around this area. Since I'm more forgetful these days I lean towards
> > > > the more obvious kinds of coding solutions. ;-)
> > > >
> > > > Mike, how do you feel about the 3-interface suggestion?
> > >
> > > I dislike expanding from 1 indirect function call to 2 in rapid
> > > succession (3 for the error path, not a problem, just being precise.
> > > But I otherwise like it.. maybe.. heh.
> > >
> > > FYI, I did run with the suggestion to make nfs_to a pointer that just
> > > needs a simple assignment rather than memcpy to initialize. So Neil's
> > > above code becames:
> > >
> > > rcu_read_lock();
> > > net = rcu_dereference(uuid->net);
> > > if (!net || !nfs_to->nfsd_serv_try_get(net)) {
> > > rcu_read_unlock();
> > > return ERR_PTR(-ENXIO);
> > > }
> > > rcu_read_unlock();
> > > /* We have an implied reference to net thanks to nfsd_serv_try_get */
> > > localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
> > > cred, nfs_fh, fmode);
> > > if (IS_ERR(localio))
> > > nfs_to->nfsd_serv_put(net);
> > > return localio;
> > >
> > > I do think it cleans the code up... full patch is here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=nfs-localio-for-next.v15-with-fixups&id=e85306941878a87070176702de687f2779436061
> > >
> > > But I'm still on the fence.. someone help push me over!
> >
> > I think the new code is unquestionable clearer, and not taking this
> > approach would be a micro-optimisation which would need to be
> > numerically justified. So I'm pushing for the three-interface version
> > (despite what I said before).
> >
> > Unfortunately the new code is not bug-free - not quite.
> > As soon as nfs_to->nfsd_serv_put() calls percpu_ref_put() the nfsd
> > module can be unloaded, and the "return" instruction might not be
> > present. For this to go wrong would require a lot of bad luck, but if
> > the CPU took an interrupt at the wrong time were would be room.
> >
> > [Ever since module_put_and_exit() was added (now ..and_kthread_exit)
> > I've been sensitive to dropping the ref to a module in code running in
> > the module]
> >
> > So I think nfsd_serv_put (and nfsd_serv_try_get() __must_hold(RCU) and
> > nfs_open_local_fh() needs rcu_read_lock() before calling
> > nfs_to->nfsd_serv_put(net).
>
> OK, yes I can see that, I implemented what you suggested at the end of
> your reply (see inline patch below)...
>
> But I'd just like to point out that something like the below patch
> wouldn't be needed if we kept my "heavy" approach (nfs reference on
> nfsd modules via nfs_common using request_symbol):
> https://marc.info/?l=linux-nfs&m=172499445027800&w=2
> (that patch has stuff I since cleaned up, e.g. removed typedefs and
> EXPORT_SYMBOL_GPLs..)
>
> I knew we were going to pay for being too cute with how nfs took its
> reference on nfsd.
>
> So here we are, needing fiddly incremental fixes like this to close a
> really-small-yet-will-be-deadly race:
<snip required delicate rcu re-locking requirements patch>
I prefer this incremental re-implementation of my symbol_request patch
that eliminates all concerns about the validity of 'nfs_to' calls:
---
fs/nfs/localio.c | 5 +++
fs/nfs_common/nfslocalio.c | 84 +++++++++++++++++++++++++++++++-------
fs/nfsd/localio.c | 2 +-
include/linux/nfslocalio.h | 7 +++-
4 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index c29cdf51c458..43520ac0fde8 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -124,6 +124,10 @@ const struct rpc_program nfslocalio_program = {
static void nfs_local_enable(struct nfs_client *clp)
{
spin_lock(&clp->cl_localio_lock);
+ if (!nfs_to_nfsd_localio_ops_get()) {
+ spin_unlock(&clp->cl_localio_lock);
+ return;
+ }
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
trace_nfs_local_enable(clp);
spin_unlock(&clp->cl_localio_lock);
@@ -138,6 +142,7 @@ void nfs_local_disable(struct nfs_client *clp)
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
nfs_uuid_invalidate_one_client(&clp->cl_uuid);
+ nfs_to_nfsd_localio_ops_put();
}
spin_unlock(&clp->cl_localio_lock);
}
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 42b479b9191f..9039e0f1afa3 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -7,6 +7,7 @@
#include <linux/module.h>
#include <linux/rculist.h>
#include <linux/nfslocalio.h>
+#include <linux/refcount.h>
#include <net/netns/generic.h>
MODULE_LICENSE("GPL");
@@ -53,11 +54,8 @@ static nfs_uuid_t * nfs_uuid_lookup_locked(const uuid_t *uuid)
return NULL;
}
-static struct module *nfsd_mod;
-
void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
- struct net *net, struct auth_domain *dom,
- struct module *mod)
+ struct net *net, struct auth_domain *dom)
{
nfs_uuid_t *nfs_uuid;
@@ -73,9 +71,6 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
*/
list_move(&nfs_uuid->list, list);
rcu_assign_pointer(nfs_uuid->net, net);
-
- __module_get(mod);
- nfsd_mod = mod;
}
spin_unlock(&nfs_uuid_lock);
}
@@ -83,10 +78,8 @@ EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
{
- if (nfs_uuid->net) {
- module_put(nfsd_mod);
- nfs_uuid->net = NULL;
- }
+ if (nfs_uuid->net)
+ RCU_INIT_POINTER(nfs_uuid->net, NULL);
if (nfs_uuid->dom) {
auth_domain_put(nfs_uuid->dom);
nfs_uuid->dom = NULL;
@@ -123,14 +116,14 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
struct nfsd_file *localio;
/*
- * Not running in nfsd context, so must safely get reference on nfsd_serv.
+ * NFS has a reference to NFSD and can safely make 'nfs_to' calls.
+ *
+ * But not running in NFSD context, so must safely get reference to nfsd_serv.
* But the server may already be shutting down, if so disallow new localio.
+ *
* uuid->net is NOT a counted reference, but rcu_read_lock() ensures that
* if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe
* and if it succeeds we will have an implied reference to the net.
- *
- * Otherwise NFS may not have ref on NFSD and therefore cannot safely
- * make 'nfs_to' calls.
*/
rcu_read_lock();
net = rcu_dereference(uuid->net);
@@ -153,6 +146,7 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
* but cannot be statically linked, because that will make the NFS
* module always depend on the NFSD module.
*
+ * [FIXME: must adjust following 2 paragraphs]
* 'nfs_to' provides NFS access to NFSD functions needed for LOCALIO,
* its lifetime is tightly coupled to the NFSD module and will always
* be available to NFS LOCALIO because any successful client<->server
@@ -170,3 +164,63 @@ EXPORT_SYMBOL_GPL(nfs_open_local_fh);
*/
const struct nfsd_localio_operations *nfs_to;
EXPORT_SYMBOL_GPL(nfs_to);
+
+static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
+static refcount_t nfs_to_ref;
+
+bool nfs_to_nfsd_localio_ops_get(void)
+{
+ spin_lock(&nfs_to_nfsd_lock);
+
+ /* Only get nfsd_localio_operations on first reference */
+ if (refcount_read(&nfs_to_ref) == 0) {
+ refcount_set(&nfs_to_ref, 1);
+ /* fallthru */
+ } else {
+ refcount_inc(&nfs_to_ref);
+ spin_unlock(&nfs_to_nfsd_lock);
+ return true;
+ }
+
+ /* Must drop spinlock before call to symbol_request */
+ spin_unlock(&nfs_to_nfsd_lock);
+
+ /*
+ * If NFSD isn't available LOCALIO isn't possible.
+ * Use nfsd_open_local_fh symbol as the bellwether, if
+ * available then nfs_common has NFSD module reference
+ * on NFS's behalf and can safely call 'nfs_to' functions.
+ */
+ if (!symbol_request(nfsd_open_local_fh))
+ return false;
+ return true;
+}
+EXPORT_SYMBOL_GPL(nfs_to_nfsd_localio_ops_get);
+
+void nfs_to_nfsd_localio_ops_put(void)
+{
+ spin_lock(&nfs_to_nfsd_lock);
+
+ if (!refcount_dec_and_test(&nfs_to_ref))
+ goto out;
+
+ symbol_put(nfsd_open_local_fh);
+ nfs_to = NULL;
+out:
+ spin_unlock(&nfs_to_nfsd_lock);
+}
+EXPORT_SYMBOL_GPL(nfs_to_nfsd_localio_ops_put);
+
+static int __init nfslocalio_init(void)
+{
+ refcount_set(&nfs_to_ref, 0);
+
+ return 0;
+}
+
+static void __exit nfslocalio_exit(void)
+{
+}
+
+module_init(nfslocalio_init);
+module_exit(nfslocalio_exit);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 291e9c69cae4..291ad916d67a 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -114,7 +114,7 @@ static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
- net, rqstp->rq_client, THIS_MODULE);
+ net, rqstp->rq_client);
return rpc_success;
}
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index b353abe00357..2e6b9107a7d1 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -35,7 +35,7 @@ typedef struct {
void nfs_uuid_begin(nfs_uuid_t *);
void nfs_uuid_end(nfs_uuid_t *);
void nfs_uuid_is_local(const uuid_t *, struct list_head *,
- struct net *, struct auth_domain *, struct module *);
+ struct net *, struct auth_domain *);
void nfs_uuid_invalidate_clients(struct list_head *list);
void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
@@ -58,9 +58,12 @@ struct nfsd_localio_operations {
struct file *(*nfsd_file_file)(struct nfsd_file *);
} ____cacheline_aligned;
-extern void nfsd_localio_ops_init(void);
extern const struct nfsd_localio_operations *nfs_to;
+extern void nfsd_localio_ops_init(void);
+bool nfs_to_nfsd_localio_ops_get(void);
+void nfs_to_nfsd_localio_ops_put(void);
+
struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
const struct nfs_fh *, const fmode_t);
--
2.39.3
next prev parent reply other threads:[~2024-09-06 18:07 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-31 22:37 [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 01/26] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 02/26] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 03/26] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 04/26] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 05/26] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 06/26] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 07/26] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 08/26] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 09/26] nfsd: add nfsd_file_acquire_local() Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 10/26] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 11/26] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 12/26] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 13/26] SUNRPC: replace program list with program array Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 14/26] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-09-01 23:25 ` NeilBrown
2024-09-03 16:33 ` Mike Snitzer
2024-09-05 19:24 ` Anna Schumaker
2024-09-05 19:38 ` Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 15/26] nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO Mike Snitzer
2024-09-01 23:37 ` NeilBrown
2024-08-31 22:37 ` [PATCH v15 16/26] nfsd: add LOCALIO support Mike Snitzer
2024-09-01 23:46 ` NeilBrown
2024-09-03 14:34 ` Chuck Lever
2024-09-03 14:40 ` Jeff Layton
2024-09-03 15:00 ` Mike Snitzer
2024-09-03 15:19 ` Jeff Layton
2024-09-03 15:29 ` Mike Snitzer
2024-09-03 15:59 ` Chuck Lever III
2024-09-03 16:09 ` Mike Snitzer
2024-09-03 17:07 ` Chuck Lever III
2024-09-03 22:31 ` NeilBrown
2024-09-04 5:01 ` NeilBrown
2024-09-04 13:47 ` Chuck Lever
2024-09-05 14:21 ` Mike Snitzer
2024-09-05 15:41 ` Chuck Lever III
2024-09-05 23:34 ` NeilBrown
2024-09-06 15:04 ` Mike Snitzer
2024-09-06 18:07 ` Mike Snitzer [this message]
2024-09-06 21:56 ` NeilBrown
2024-09-06 22:33 ` Chuck Lever III
2024-09-06 23:14 ` NeilBrown
2024-09-07 15:17 ` Mike Snitzer
2024-09-07 16:09 ` Chuck Lever III
2024-09-07 19:08 ` Mike Snitzer
2024-09-07 21:12 ` Chuck Lever III
2024-09-08 15:05 ` Chuck Lever III
2024-09-07 15:52 ` Mike Snitzer
2024-09-04 13:54 ` Jeff Layton
2024-09-04 13:56 ` Chuck Lever III
2024-08-31 22:37 ` [PATCH v15 17/26] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-09-03 14:11 ` Chuck Lever
2024-08-31 22:37 ` [PATCH v15 18/26] nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 19/26] nfs: add LOCALIO support Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 20/26] nfs: enable localio for non-pNFS IO Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 21/26] pnfs/flexfiles: enable localio support Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 22/26] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 23/26] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 24/26] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 25/26] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-31 22:37 ` [PATCH v15 26/26] nfs: add "NFS Client and Server Interlock" section to localio.rst Mike Snitzer
2024-09-01 23:52 ` [PATCH v15 00/26] nfs/nfsd: add support for LOCALIO NeilBrown
2024-09-03 14:49 ` Jeff Layton
2024-09-06 19:31 ` Anna Schumaker
2024-09-06 20:34 ` Mike Snitzer
2024-09-06 21:09 ` Chuck Lever III
2024-09-10 16:45 ` Mike Snitzer
2024-09-10 19:14 ` Mike Snitzer
2024-09-10 19:24 ` Anna Schumaker
2024-09-10 20:31 ` Anna Schumaker
2024-09-10 22:11 ` Mike Snitzer
2024-09-11 17:51 ` Mike Snitzer
2024-09-11 18:48 ` Mike Snitzer
2024-09-13 18:12 ` Mike Snitzer
2024-09-11 0:43 ` NeilBrown
2024-09-11 16:03 ` Chuck Lever III
2024-09-12 23:31 ` NeilBrown
2024-09-12 23:42 ` Chuck Lever III
2024-09-13 12:27 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZttE4DKrqqVa0ACn@kernel.org \
--to=snitzer@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).